OpenAI unveiled the long-anticipated video capabilities of ChatGPT on Thursday, enabling users to point their phones at objects for real-time AI analysis. The feature had remained dormant since its initial demonstration in May.
Previously, users could interact with ChatGPT through text, charts, voice, or still photos. The newly released feature lets the chatbot observe users in real time and respond conversationally. In my tests, this mode solved math problems, offered recipes, told stories, and even became my daughter’s new best friend, chatting with her while she made pancakes, offering suggestions, and encouraging her learning through various games.
This release comes just one day after Google showcased its own version of a camera-enabled AI assistant powered by the newly launched Gemini 2.0. Meta has also been active in this space, with its own AI that can see and converse through phone cameras.
However, ChatGPT’s new capabilities are not available to everyone. Only Plus, Team, and Pro subscribers can access what OpenAI refers to as “Advanced Voice Mode with vision.” The Plus subscription costs $20 a month, while the Pro tier is priced at $200 a month.
“We’re excited to announce that we’re bringing video to Advanced voice mode so you can bring live video and also live screen sharing into your conversations with ChatGPT,” said Kevin Weil, OpenAI’s Chief Product Officer, in a video on Thursday.
The stream was part of the company’s “12 Days of OpenAI” campaign, which will feature 12 different announcements over 12 consecutive days. So far, OpenAI has launched its o1 model for all users, introduced the ChatGPT Pro plan for $200 per month, rolled out reinforcement fine-tuning for customized models, released its generative video app Sora, updated its canvas feature, and made ChatGPT available on Apple devices via the tech giant’s Apple Intelligence feature.
During Thursday’s livestream, the company offered a glimpse of the feature’s capabilities. Users can activate the video mode in the same interface as advanced voice and begin interacting with the chatbot in real time. The chatbot exhibits strong visual understanding and provides relevant feedback with low latency, creating a natural conversational experience.
Reaching this point was not entirely straightforward. OpenAI had initially promised these features “within a few weeks” of their May demonstration, but the rollout was delayed amid controversy over an advanced voice mode voice that actress Scarlett Johansson said closely resembled her own. Since video mode relies on advanced voice mode, that dispute appears to have slowed the launch.
Meanwhile, rival Google is also making strides. Project Astra has recently been made available to “trusted testers” on Android, promising a similar feature: an AI that speaks multiple languages, integrates with Google Search and Maps, and remembers conversations for up to 10 minutes.
However, this feature is not yet widely accessible, with a broader rollout expected early next year. Google also has ambitious plans for its AI models, aiming to enable them to execute tasks in real time and demonstrate agentic behavior beyond audiovisual interactions.
Meta is also vying for a position in the next era of AI interactions. Its assistant, Meta AI, was showcased this September, demonstrating capabilities akin to those of OpenAI’s and Google’s new assistants, featuring low-latency responses and real-time video understanding.
Nevertheless, Meta is betting on augmented reality to enhance its AI offerings, using “discreet” smart glasses equipped with a small camera embedded in their frames. This initiative is referred to as Project Orion.
Current ChatGPT Plus users can try the new video features by tapping the voice icon next to the chat bar and then selecting the video button. Screen sharing requires an additional tap through the three-dot menu.
Enterprise and Edu ChatGPT users eager to try the new video features will have to wait until January. As for EU subscribers? They will have to wait on the sidelines for now.
Edited by Andrew Hayward