Amid intensifying competition among AI model developers, French startup Mistral has launched Mistral OCR, an optical character recognition (OCR) API designed to give enterprises more advanced document understanding. The tool does more than extract content from messy PDFs and image files: it also organizes complex elements such as handwritten notes, printed text, images, tables, and formulas into structured data, making it far easier for enterprises to process large volumes of unstructured data.
The launch of Mistral OCR marks a new stage in the development of OCR technology. It is more than a simple text recognition tool: it acts like an expert document reader that understands the layout elements of a wide range of documents, including tables, mathematical expressions, and interleaved images, and preserves that structure in its output. This capability matters to enterprises because up to 90% of enterprise information exists as unstructured data, such as emails, social media posts, videos, and images, which lack a predefined format and have long been difficult to search and analyze.
Guillaume Lample, chief scientist at Mistral, said the technology is a key step toward wider use of AI in enterprises, especially for companies that want to simplify access to their internal documents. Mistral OCR supports multiple languages, scripts, and document layouts, and it retains the formatting elements of a document, such as titles, paragraphs, lists, and tables, which makes the extracted text easier to process downstream. In addition, users can extract specific content and have it returned in structured formats such as JSON or Markdown, which eases integration with other AI-driven workflows.
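To illustrate what "structured formats such as JSON or Markdown" can mean in practice, here is a minimal Python sketch. It does not call the Mistral API; the `Element` type, the `to_json` and `to_markdown` helpers, and the sample invoice data are all hypothetical, invented only to show how titles, paragraphs, lists, and tables that an OCR step has already identified might be serialized for a downstream workflow.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Union


@dataclass
class Element:
    """Hypothetical container for one layout element found on a page."""
    kind: str  # one of "title", "paragraph", "list", "table"
    content: Union[str, List[str], List[List[str]]]


def to_json(elements: List[Element]) -> str:
    """Serialize the elements as a JSON array, preserving their order."""
    return json.dumps([asdict(e) for e in elements], ensure_ascii=False, indent=2)


def to_markdown(elements: List[Element]) -> str:
    """Render the elements as Markdown, one block per element."""
    blocks = []
    for e in elements:
        if e.kind == "title":
            blocks.append(f"# {e.content}")
        elif e.kind == "paragraph":
            blocks.append(e.content)
        elif e.kind == "list":
            blocks.append("\n".join(f"- {item}" for item in e.content))
        elif e.kind == "table":
            # The first row is treated as the header row.
            header, *rows = e.content
            lines = [
                "| " + " | ".join(header) + " |",
                "| " + " | ".join("---" for _ in header) + " |",
            ]
            lines += ["| " + " | ".join(row) + " |" for row in rows]
            blocks.append("\n".join(lines))
        else:
            raise ValueError(f"unknown element kind: {e.kind}")
    return "\n\n".join(blocks)


# Made-up sample page, for illustration only.
page = [
    Element("title", "Invoice 2024-001"),
    Element("paragraph", "Payment is due within 30 days."),
    Element("list", ["Net 30", "Wire transfer only"]),
    Element("table", [["Item", "Qty"], ["Widget", "2"]]),
]

if __name__ == "__main__":
    print(to_markdown(page))
    print(to_json(page))
```

Keeping the layout as typed elements, rather than one flat string, is what lets the same extraction feed both a human-readable Markdown view and a machine-readable JSON payload.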
Beyond its features, Mistral OCR also shows a significant performance advantage. According to the reported benchmark results, its accuracy in mathematical expression recognition, scanned-document processing, and multilingual text processing surpasses that of major competitors, including Google Document AI, Azure OCR, and OpenAI's GPT-4o. It is also fast: a single node can process up to 2,000 pages per minute, which makes it well suited to fields such as research, customer service, and historical document preservation, where large volumes of documents must be processed.
For enterprise CEOs, CIOs, CTOs, IT managers, and team leaders, Mistral OCR offers significant gains in efficiency, security, and scalability for document-driven workflows. By automating document processing and reducing manual data entry, it can lower administrative costs and simplify operations. Its value is most evident in paper-heavy sectors such as finance, healthcare, law, and compliance. In addition, its document understanding capabilities can help decision makers extract actionable insights from reports, contracts, financial documents, and research papers, support data security and compliance, and integrate with existing enterprise systems to raise overall productivity.
Mistral OCR is currently priced at 1,000 pages per dollar, or 2,000 pages per dollar with batch inference. The API is available on la Plateforme, Mistral's developer platform. Users can also try the model for free on Mistral's Le Chat to see its sharp eye for documents firsthand. Mistral AI said the model will continue to be improved based on user feedback in the coming weeks.
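For a rough sense of scale, the pricing and throughput figures quoted in this article can be turned into a back-of-the-envelope estimate. The Python sketch below is illustrative only: the function names are hypothetical, the rates are simply the numbers cited above, and the time estimate is a best case, since 2,000 pages per minute is described as an upper bound for a single node.

```python
# Figures as quoted in the article; actual pricing and speed may differ.
PAGES_PER_DOLLAR_STANDARD = 1_000
PAGES_PER_DOLLAR_BATCH = 2_000
MAX_PAGES_PER_MINUTE_PER_NODE = 2_000


def estimate_cost_usd(pages: int, batch: bool = False) -> float:
    """Estimated cost in US dollars for processing `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = PAGES_PER_DOLLAR_BATCH if batch else PAGES_PER_DOLLAR_STANDARD
    return pages / rate


def estimate_min_minutes(pages: int) -> float:
    """Best-case processing time in minutes on one node at the quoted peak rate."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages / MAX_PAGES_PER_MINUTE_PER_NODE


if __name__ == "__main__":
    archive = 500_000  # hypothetical archive size, in pages
    print(estimate_cost_usd(archive))              # 500.0
    print(estimate_cost_usd(archive, batch=True))  # 250.0
    print(estimate_min_minutes(archive))           # 250.0
```

Under these quoted figures, a hypothetical 500,000-page archive would cost about $500 at the standard rate or $250 with batch inference, and would take at least 250 minutes, a little over four hours, on a single node.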
By combining OCR with AI-driven document understanding, Mistral is helping enterprises extract, analyze, and use their documents more intelligently. Companies that want to bring their documents to life may want to try this "secret weapon" from France sooner rather than later.