The updated chatbot can respond to audio inputs about as quickly as a human, opening up the possibility of real-time language translation.
The new version of the ChatGPT AI chatbot, called GPT-4o, offers near-instant results across text, vision and audio, according to its maker, OpenAI.
The company said it was much better at understanding images and sounds than previous versions.
It offers the prospect of real-time ‘conversations’ with the chatbot, including the ability to interrupt its answers.
The firm says the model “accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs”.
GPT-4o is to be rolled out over the next few weeks amid a battle by tech firms to develop ever more advanced artificial intelligence tools.
Monday’s announcement demonstrated tasks such as translating languages in real time, using the model’s vision capability to solve a maths question written on paper, and guiding a blind person around London.
GPT-4o can respond to audio in as little as 232 milliseconds, with an average of 320 milliseconds, which the company says is similar to human response time in conversation.
To try to ease concerns over bias, fairness and misinformation, the Microsoft-backed company says the new version has undergone extensive testing by 70 external experts.
It comes after Google suffered a major PR blunder earlier this year over images generated by its Gemini AI system.
The GPT-4o model will be free to use, but premium ‘Plus’ subscribers will get a higher message limit.