IBM recently released Granite 3.2, the latest generation of its large language model family, designed to give enterprises and open source communities a "small, efficient, and practical" enterprise AI solution. The models add multimodal and reasoning capabilities while improving flexibility and cost-effectiveness, making them easier to adopt. The release of Granite 3.2 marks a new step for IBM in artificial intelligence, particularly in the practicality and efficiency of enterprise-grade applications.
Granite 3.2 introduces a vision language model (VLM) for processing documents and performing data classification and extraction. IBM says the new model matches or exceeds larger models, such as Llama 3.2 11B and Pixtral 12B, on some key benchmarks. In addition, the Granite 3.2 8B model matches or surpasses larger models on standard mathematical reasoning benchmarks. These results make Granite 3.2 more efficient at handling complex tasks while reducing resource consumption.
To improve reasoning, selected Granite 3.2 models also offer chain-of-thought reasoning, which makes intermediate reasoning steps explicit. Because this capability consumes significant compute, users can enable or disable it as needed to optimize efficiency and control overall costs. Sriram Raghavan, vice president of IBM AI Research, said at the launch that the focus of next-generation artificial intelligence is efficiency, integration, and practical impact, allowing enterprises to achieve strong results without overspending. This feature makes Granite 3.2 more transparent and interpretable on complex inference tasks; a minimal usage sketch follows below.
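As a rough illustration, the sketch below toggles the reasoning behavior per request through Hugging Face Transformers. The checkpoint name and the `thinking` chat-template flag come from IBM's public model card rather than this article, so treat them as assumptions.

```python
# Sketch: enabling or disabling Granite 3.2's optional reasoning per request.
# Assumptions: the "ibm-granite/granite-3.2-8b-instruct" checkpoint and its
# chat template's `thinking` flag, as described in IBM's model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-8b-instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "A train covers 120 km in 1.5 hours. What is its average speed?"}]

# thinking=True asks the model to emit intermediate reasoning steps;
# set it to False to skip them and save tokens and compute.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking=True,
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```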
Beyond the reasoning improvements, Granite 3.2 also ships a slimmed-down version of the Granite Guardian safety models. Although their size is reduced by about 30%, performance stays at the level of the previous generation. IBM has also introduced a capability called "verbalized confidence," which provides more nuanced risk assessment that accounts for uncertainty in safety monitoring. These changes make Granite 3.2 more reliable on safety while reducing resource usage.
Granite 3.2 was trained using IBM's open source Docling toolkit, which lets developers convert documents into the specific data required for customized enterprise AI models. During training, 85 million PDF files and 26 million synthetic question-and-answer pairs were processed to strengthen the VLM's ability to handle complex document workflows. This pipeline makes Granite 3.2 more efficient and accurate when processing large volumes of documents; a short example of Docling in action is shown below.
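As a rough sketch of that conversion step, the snippet below uses Docling's Python API to turn a PDF into Markdown that downstream models or data pipelines can consume; the input file name is a hypothetical placeholder.

```python
# Sketch: converting a document with IBM's open-source Docling toolkit.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("annual_report.pdf")  # hypothetical local PDF

# Export the parsed document as Markdown for downstream AI pipelines
# (e.g., building synthetic Q&A pairs or feeding a retrieval system).
print(result.document.export_to_markdown())
```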
IBM also announced the next generation of its TinyTimeMixers (TTM) models, compact pre-trained models focused on multivariate time series forecasting with horizons of up to two years. Their launch further expands IBM's reach in time series analysis and gives enterprises more accurate forecasting tools.
Official blog: https://www.ibm.com/new/announcements/ibm-granite-3-2-open-source-reasoning-and-vision
Key points:
Granite 3.2 introduces a vision language model that improves document processing and data extraction.
Selected models offer chain-of-thought reasoning that exposes intermediate steps and can be switched on or off to balance capability and cost.
The Granite Guardian safety models are about 30% smaller with no loss in performance, and add a "verbalized confidence" feature for more nuanced risk assessment.