The editor of Downcodes takes an in-depth look at Apple's new Ferret-UI2: its capabilities, its technical details, and what it could mean for the future of human-computer interaction.
Apple recently released Ferret-UI2, a new generation of its cross-platform AI assistant. The system marks a major advance in UI element recognition: it scored 89.73 in testing, well ahead of GPT-4V's 77.73.
Its defining feature is that it understands what the user intends. Rather than relying on coordinate-based clicks, Ferret-UI2 takes a natural-language instruction, locates the relevant interface element, and performs the corresponding operation. The research team used GPT-4V's visual capabilities to generate the training data, which helps the system understand the spatial relationships between interface elements.

Architecturally, Ferret-UI2 uses an adaptive design that lets it accurately identify UI elements across iPhone, iPad, Android devices, web browsers, and Apple TV. It also adjusts image resolution and processing to suit each platform, keeping local computation efficient while preserving the information in the interface.

In testing, the system performed well across platforms: it handled iPhone interfaces smoothly, reached 68% accuracy on iPad, and achieved a 71% success rate on Android devices. Cross-device scenarios, such as moving between mobile devices and TV or web interfaces, remain challenging, mainly because interface layouts differ from platform to platform.
Competition in UI-interaction AI is also intensifying. Anthropic recently upgraded the UI interaction capabilities of Claude 3.5 Sonnet, and Microsoft has open-sourced OmniParser, a tool that converts screen content into structured data.
Alongside Ferret-UI2, Apple introduced the CAMPHOR framework, which further improves the handling of complex tasks by having specialized AI agents work together under a higher-level reasoning agent. This suggests that voice assistants such as Siri will be able to complete complex tasks like restaurant reservations more intelligently, without users having to operate the interface manually.
This breakthrough not only makes cross-device operation more intelligent but also lays out a clear roadmap for the next generation of human-computer interaction. As the technology continues to evolve, smarter and more natural interaction experiences are coming within reach.
Ferret-UI2 marks a new stage in the development of AI assistants. Its strong cross-platform compatibility and intelligent interaction capabilities give users a more convenient experience and point toward more natural, fluid human-computer interaction. We look forward to seeing Ferret-UI2 overcome the remaining challenges of cross-device scenarios and deliver an even better user experience.