Microsoft's latest vision foundation model, Florence-2, has recently achieved a major breakthrough. Thanks to Transformers.js, the model can now run entirely locally in any browser that supports WebGPU. This brings revolutionary change to AI vision applications, allowing powerful visual recognition to run directly in the user's browser without relying on remote servers.
Florence-2-base-ft is a 230-million-parameter vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. The model supports a variety of capabilities, including but not limited to:
- Image caption generation
- Optical character recognition (OCR)
- Object detection
- Image segmentation
This powerful model occupies only 340MB of storage. Once loaded, it is cached by the browser and can be used directly on subsequent visits without re-downloading. Most notably, the entire process happens locally in the user's browser, with no API calls sent to any server. This means that once the model has loaded, users can continue to use all of its features even while offline.
Florence-2's local operation is made possible by Transformers.js and ONNX Runtime Web. This breakthrough not only strengthens user privacy protection but also greatly reduces the cost of use, paving the way for the broad adoption of AI vision technology.
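Because WebGPU is not yet available in every browser, in-browser apps typically feature-detect it and fall back to ONNX Runtime Web's WASM backend. A minimal sketch (the `pickDevice` helper is hypothetical; `'webgpu'` and `'wasm'` are the backend names Transformers.js accepts in its `device` option):

```javascript
// Sketch: choosing an execution backend before loading the model.
// pickDevice is a hypothetical helper for illustration.
function pickDevice(nav) {
  // WebGPU-capable browsers expose navigator.gpu
  return nav && 'gpu' in nav ? 'webgpu' : 'wasm';
}

// In a browser you would call: const device = pickDevice(navigator);
console.log(pickDevice({ gpu: {} })); // 'webgpu'
console.log(pickDevice({}));          // 'wasm'
```

The chosen value can then be passed to `from_pretrained` so that the same page degrades gracefully on browsers without WebGPU support.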
For developers and technology enthusiasts, Florence-2's ONNX models are now openly accessible on the Hugging Face platform; visit https://huggingface.co/models?library=transformers.js&other=florence2 for details. In addition, the project's source code has been published on GitHub at https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu for further exploration and development.
This breakthrough will undoubtedly accelerate the development and widespread adoption of AI vision applications. Florence-2's local execution improves user privacy and convenience, lowers the barrier to entry, and opens new possibilities for browser-based AI. With its models and code openly available, developers have rich resources to build on, and we can expect more innovative, browser-based vision applications to reshape daily life and work in the near future.