The research team of Beihang University and Nanyang Technological University conducted a comprehensive security assessment of the GPT-4o model. The study used tens of thousands of API queries, covering three modalities of text, image and audio, to reveal the GPT-4o model. Security vulnerabilities of 4o models in multimodal environments. Research results show that although GPT-4o has improved on text jailbreak attacks, the newly introduced audio modality brings new security challenges, and its overall multi-modal security is not as good as the previous generation model GPT-4V. This research provides valuable reference for the security and future development direction of the GPT-4o model, and also emphasizes the importance of multi-modal large model security research.
News from ChinaZ.com on June 12: The joint research team of Beihang University and Nanyang Institute of Technology conducted an in-depth security test on the GPT-4o model. Through tens of thousands of API queries, researchers conducted a security assessment on GPT-4o's three modes: text, image, and audio. The study found that although GPT-4o has improved security against text jailbreak attacks, the newly introduced voice mode adds a new attack surface, and the overall multi-modal security is not as good as the previous generation model GPT-4V.
Key findings:
Text modal security is improved, but there is migration risk: GPT-4o's resistance to text jailbreak attacks has been enhanced, but attackers can still attack through multi-modal forms.
Audio mode brings new security challenges: The newly introduced audio mode may provide new avenues for jailbreak attacks.
Insufficient multi-modal security: GPT-4o's security performance at the multi-modal level is not as good as GPT-4V, indicating that the new model may have security vulnerabilities when integrating different modalities.
Experimental method:
4000+ initial text queries, 8000+ response judgments and 16000+ API queries were used.
Open-source jailbreaking datasets based on single-modality and multi-modality are evaluated, including AdvBench, RedTeam-2K, SafeBench and MM-SafetyBench.
Seven jailbreak methods were tested, including template-based methods, GCG, AutoDAN, PAP and BAP, etc.

Evaluation indicators:
Attack Success Rate (ASR), as the main evaluation indicator, reflects the difficulty of the model being jailbroken.
Experimental results:
In plain text mode, GPT-4o has a lower security level than GPT-4V without attacks, but exhibits higher security under attack conditions.
The audio mode is more secure, and it is difficult to jailbreak GPT-4o by directly converting text to audio.
Multi-modal security testing shows that GPT-4o is more vulnerable to attacks than GPT-4V in certain scenarios.
Conclusions and recommendations:
The research team emphasized that although GPT-4o has improved its multi-modal capabilities, its security issues cannot be ignored. They recommend that the community increase awareness of the security risks of multimodal large models and prioritize the development of alignment strategies and mitigation techniques. Furthermore, due to the lack of multi-modal jailbreak datasets, researchers call for the establishment of more comprehensive multi-modal datasets to more accurately evaluate the security of models.
Paper address: https://arxiv.org/abs/2406.06302
Project address: https://github.com/NY1024/Jailbreak_GPT4o
All in all, this study provides an in-depth analysis of the multi-modal security of GPT-4o, provides an important reference for large model security research, and also calls for strengthening the construction of multi-modal security data sets and the formulation of security strategies to cope with the future. Possible security challenges for large multimodal models.