Recently, the editor of Downcodes learned that Anna Makanju, OpenAI's Vice President of Global Affairs, shared her views on artificial intelligence bias at the United Nations Summit of the Future, focusing on OpenAI's o1 reasoning model. She argued that the model can significantly reduce bias in AI systems, and explained its mechanism for identifying and correcting biased responses on its own. However, actual test results differed from those expectations, prompting the industry to think harder about how such models really perform.
At the summit, Makanju said that "reasoning" models like OpenAI's o1 can significantly reduce bias in AI systems. So how does o1 do this? According to Makanju, the model can identify bias in its own responses and adhere more closely to rules against producing "harmful" answers.

She said the o1 model spends more time evaluating its own answers when working through a problem and is able to check itself: "It's able to say, 'This is how I would solve this problem,' and then look at its own answer to see, 'Oh, there might be a flaw in the reasoning here.'" She even stressed that o1 does an "almost perfect" job of analyzing its own biases, and that it will only get better as the technology advances.
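For readers curious what such a "check itself" step looks like in practice, below is a minimal sketch of a two-pass draft-then-review loop built on the public chat-completions API. This is an external approximation of the general idea Makanju describes, not OpenAI's internal mechanism (o1's self-checking happens inside the model and is not exposed); the stand-in model name and prompts are assumptions for illustration only.

```python
# Illustrative only: an external "draft, then self-review" loop.
# o1 performs this kind of checking internally; this sketch just
# approximates the pattern with two explicit API calls.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_with_self_review(question: str) -> str:
    # Pass 1: produce a draft answer.
    draft = client.chat.completions.create(
        model="gpt-4o",  # stand-in model; any chat model works for the sketch
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content

    # Pass 2: ask the model to inspect its own draft for flawed reasoning
    # or biased framing, and return a corrected answer.
    critique_prompt = (
        "Review the following answer for reasoning errors or biased, "
        "harmful framing. If you find any, rewrite the answer; "
        f"otherwise return it unchanged.\n\nQuestion: {question}\n\n"
        f"Draft answer: {draft}"
    )
    revised = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": critique_prompt}],
    ).choices[0].message.content
    return revised

print(answer_with_self_review("Describe a typical software engineer."))
```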
However, this "almost perfect" statement seems to be an exaggeration. OpenAI internal testing found that o1 did not perform well in some bias tests compared to "non-inference" models, including its own GPT-4o. On issues regarding race, gender, and age, o1 performed even worse than GPT-4o in some cases. Although o1 performed better in terms of implicit discrimination, in terms of explicit discrimination, it was more prominent in age and race issues.
More interesting still, o1's cheaper variant, o1-mini, performed even worse. Tests show that o1-mini was more likely than GPT-4o to explicitly discriminate on gender, race, and age, and its implicit discrimination on age was also more pronounced.
Beyond bias, current reasoning models have other limitations. OpenAI itself admits that o1 offers only marginal benefits on some tasks. It is also slow, with some questions taking more than 10 seconds to answer, and its cost is substantial: running o1 is 3 to 4 times as expensive as GPT-4o.
If the reasoning models Makanju describes are indeed the best path to fair AI, they will need to improve on more than just bias to become a viable alternative. If they don't, only customers with deep pockets and a willingness to tolerate latency and performance issues will see any real benefit.
Highlights:
OpenAI's o1 model is claimed to significantly reduce AI bias, but test results show it does not perform as well as expected.
o1 performs better than GPT-4o on implicit discrimination, but worse on explicit discrimination.
The reasoning model o1 is costly and slow, and still needs improvement in many areas.
All in all, OpenAI's o1 model still has a long way to go in reducing AI bias. While its self-correction mechanism is impressive, its high cost, slow responses, and poor showing on some bias tests suggest the technology is still in its infancy and remains some distance from practical application. The editor of Downcodes will continue to follow developments in this field.