Seattle startup Moondream has released moondream2, a compact visual language model with only 1.6 billion parameters that performs well across benchmarks and even outperforms some larger models. As an open source model, moondream2 can run locally on low-power devices such as smartphones and handles image and text tasks including question answering, OCR, object counting, and classification. It scores over 60% on DocVQA, TextVQA, and GQA, a strong result for a model that runs locally. Moondream has raised $4.5 million in a seed round and continues to update the model to improve its performance.
Moondream, a Seattle startup, recently launched a compact visual language model called moondream2. Despite its small size, the model performs well on a range of benchmarks and has attracted considerable attention. As an open source model, moondream2 could make local image recognition on smartphones practical.

moondream2 was officially released in March. The model accepts text and image input and can answer questions, extract text (OCR), count objects, and classify items. Since the release, the Moondream team has updated the model regularly to improve its benchmark performance. The July release showed significant improvements in OCR and document understanding, especially in the analysis of historical economic data. The model scores more than 60% on DocVQA, TextVQA, and GQA, a strong result for a model that runs locally.
A distinctive feature of moondream2 is its compact size: with only 1.6 billion parameters, it can run not just on cloud servers but also on local computers and even on low-power devices such as smartphones or single-board computers.
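To make the size claim concrete, here is a minimal back-of-the-envelope sketch of how much storage the weights of a 1.6-billion-parameter model would need at several numeric precisions. The precisions listed are illustrative assumptions, not a statement of how moondream2 is actually distributed, and the estimate covers weights only, ignoring activations and runtime overhead.

```python
# Rough estimate of weight storage for a model with a given parameter count.
# Assumption: only the weights are counted; activations, caches, and
# runtime overhead are ignored, so real memory use will be higher.

# Bytes needed per parameter at each (hypothetical) storage precision.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 1.6e9  # parameter count reported in the article
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {weight_memory_gb(params, dtype):.1f} GB")
```

Under these assumptions, the weights take a few gigabytes at 16-bit precision and less when quantized, which is within reach of current smartphones and single-board computers.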
Despite its small size, it performs comparably to some competing models with several billion parameters and even outperforms them on some benchmarks.
In a comparison of visual language models for mobile devices, researchers noted that although moondream2 has only about 1.7 billion parameters, its performance is comparable to that of 7-billion-parameter models, and it falls slightly behind only on the SQA dataset. This suggests that, despite the small model's strong overall performance, it still struggles to understand certain contexts.

Vikhyat Korrapati, the model's developer, said moondream2 builds on other models and datasets, including SigLIP, Microsoft's Phi-1.5, and the LLaVA training dataset. The open source model is available for free on GitHub, with a demo on Hugging Face. On GitHub, moondream2 has also drawn wide attention from the developer community, collecting more than 5,000 stars.
That success caught investors' attention: Moondream raised $4.5 million in a seed round led by Felicis Ventures, Microsoft's M12 GitHub Fund, and Ascend. CEO Jay Allen, who worked at Amazon Web Services (AWS) for many years, leads the growing startup.
The launch of moondream2 is part of a wave of specialized, optimized open source models that need fewer resources while delivering performance similar to larger, older models. Although some small on-device models already exist, such as Apple Intelligence and Google's Gemini Nano, both companies still hand off more complex tasks to the cloud.
Hugging Face: https://huggingface.co/vikhyatk/moondream2
GitHub: https://github.com/vikhyat/moondream
Key points:
Moondream has launched moondream2, a visual language model with only 1.6 billion parameters that can run on small devices such as smartphones.
The model has strong text and image processing capabilities: it can answer questions, perform OCR, count objects, and classify items, and it performs well on benchmarks.
Moondream has raised $4.5 million in seed funding, its CEO previously worked at Amazon Web Services, and the team continues to update the model to improve its performance.
moondream2 opens new possibilities for mobile AI applications, and its open source nature encourages active participation and innovation from the developer community. As the technology matures, small and efficient AI models like moondream2 are likely to play an important role in more fields.