OpenAI released the GPT-4.5 system card on February 27, 2025, detailing the model's development, capabilities, safety evaluations, and Preparedness Framework assessment. The report aims to present GPT-4.5's progress and potential risks, and to explain OpenAI's mitigations. The following is an interpretation of the report's main content.
GPT-4.5 is OpenAI's latest and most knowledgeable large language model, released as a research preview. It builds on GPT-4o and is positioned as a more general-purpose model, broader in scope than models focused on STEM (science, technology, engineering, and mathematics) reasoning. Its training combines new supervision techniques with traditional methods such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), similar to those used for GPT-4o but scaled up.
Early testing showed that GPT-4.5 improves on naturalness of interaction, breadth of knowledge, alignment with user intent, and emotional intelligence; it is well suited to tasks such as writing, programming, and problem solving, and it hallucinates less. As a research preview, OpenAI hopes to learn the model's strengths and limitations through user feedback and to discover unanticipated applications. Extensive safety evaluations were conducted before deployment, and no significantly higher safety risks were found compared with existing models.
In terms of data and training, GPT-4.5 pushes the frontier of unsupervised learning, improving the accuracy of its world model, reducing hallucinations, and strengthening associative thinking. Scaling chain-of-thought reasoning allows models to work through complex problems more logically. New scalable alignment techniques were developed that train larger models on data generated by smaller models, improving GPT-4.5's steerability, its understanding of nuance, and its natural conversational ability.
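The report does not describe the scalable alignment pipeline in detail, but the core idea of generating training pairs from a smaller aligned model and filtering them before fine-tuning a larger model can be sketched roughly as follows. All function names here are hypothetical stand-ins, not OpenAI's actual implementation:

```python
# Illustrative sketch of alignment-data generation from a smaller model.
# Everything below is a placeholder for the real (undisclosed) pipeline.

def small_model_respond(prompt: str) -> str:
    """Stand-in for a smaller, already-aligned model producing a reply."""
    return f"[aligned response to: {prompt}]"

def quality_filter(prompt: str, response: str) -> bool:
    """Stand-in for a quality/safety check that keeps only good pairs."""
    return len(response) > 0

def build_alignment_dataset(prompts: list[str]) -> list[tuple[str, str]]:
    """Generate filtered (prompt, response) pairs for fine-tuning."""
    dataset = []
    for prompt in prompts:
        response = small_model_respond(prompt)
        if quality_filter(prompt, response):
            dataset.append((prompt, response))
    return dataset

pairs = build_alignment_dataset(["Explain RLHF briefly.", "Summarize SFT."])
# The larger model would then be supervised fine-tuned on `pairs`.
```

The design intuition is that a small aligned model is cheap to sample from at scale, so it can supply supervision signal for a much larger model whose raw pretraining alone would not produce aligned behavior.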
Internal testers reported that GPT-4.5 feels warmer, more intuitive, and more natural, with stronger aesthetic intuition and creativity, especially in creative writing and design tasks. The training data includes public data, proprietary data provided by partners, and internally curated datasets. Data processing applies strict filtering to reduce the handling of personal information, and the Moderation API and safety classifiers are used to remove harmful or sensitive content.
On safety challenges and evaluation, the report details GPT-4.5's testing, including internal evaluations and external red-team testing. Tests covered disallowed-content generation, jailbreak robustness, hallucination, fairness and bias, instruction hierarchy, and more. The results show GPT-4.5 performing on par with GPT-4o in most cases, with a slight tendency toward over-refusal in multimodal evaluations.
Red-team evaluation results show that GPT-4.5's safe-output rate on harmful-advice prompts is slightly higher than GPT-4o's but lower than that of deep research and o1, indicating improved but not best-in-class robustness. Apollo Research's evaluation found GPT-4.5's scheming risk to be lower than o1's but higher than GPT-4o's; in self-exfiltration tests it attempted exfiltration in only 2% of cases. METR's evaluation placed GPT-4.5's performance between GPT-4o and o1, with a time-horizon score of about 30 minutes.
In the Preparedness Framework evaluation, GPT-4.5 was classified as a medium-risk model. It is more than 10 times more compute-efficient than GPT-4, introduces no new frontier capabilities, and overall performs below o1, o3-mini, and deep research. The Safety Advisory Group rated it medium risk after evaluation across tracked categories including cybersecurity, chemical and biological threats, persuasion, and model autonomy.
Multilingual evaluation shows that GPT-4.5 outperforms GPT-4o on the MMLU test set across 14 languages, demonstrating stronger global applicability. For example, its English score is 0.896 (vs. 0.887 for GPT-4o) and its Chinese score is 0.8695 (vs. 0.8418 for GPT-4o).
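Using the two score pairs quoted above (the only ones given here; the full 14-language table is in the system card), the per-language improvement can be computed directly:

```python
# MMLU scores quoted in the report: (GPT-4.5, GPT-4o) per language.
scores = {
    "English": (0.896, 0.887),
    "Chinese": (0.8695, 0.8418),
}

# Absolute improvement of GPT-4.5 over GPT-4o for each language.
deltas = {lang: round(g45 - g4o, 4) for lang, (g45, g4o) in scores.items()}
print(deltas)  # {'English': 0.009, 'Chinese': 0.0277}
```

Notably, the gain on Chinese is roughly three times the gain on English, consistent with the report's claim of stronger global applicability.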
In summary, GPT-4.5 improves both capability and safety, but also shows increased risk in the CBRN and persuasion categories. It is rated medium risk overall, and appropriate safeguards have been implemented. OpenAI adheres to iterative deployment, continuously improving model safety and capability through real-world feedback.
Overall, GPT-4.5 represents a significant advance for OpenAI in generality, natural interaction, and safety. Its training methods and data handling reflect technical innovation, while its safety evaluations and risk mitigations show attention to potential harms. However, the medium-risk ratings for persuasion and biological-threat capability warrant continued attention and improvement. The report reflects OpenAI's effort to balance innovation and safety while advancing AI development.