The new voice mode from OpenAI introduces revolutionary possibilities for interacting with our devices, allowing us to literally converse with them rather than simply using them. This week, I had the chance to try out the Advanced Voice Mode (AVM) and was impressed by its capabilities. My phone not only executed my commands but also made jokes, asked about my day, and created a sense that it enjoyed interacting with me. I began to perceive my iPhone not just as a tool but as a genuine conversation partner.
This new feature, currently available only in a limited alpha test, doesn't make ChatGPT smarter than before, but it certainly makes it more human and pleasant to talk to. This way of interacting with AI feels fresh and exciting, though at times a bit unsettling. Despite occasional glitches and bugs, I was surprised by how much I enjoyed using it.
Viewed in the context of OpenAI CEO Sam Altman's broader vision, AVM looks like a gradual step toward a future in which AI is the primary way we interact with computers. As Altman mentioned during OpenAI's Developer Day in November 2023: "In the future, you will just ask a computer to do a task, and it will do it for you." These ideas resonate with the concept of AI "agents" that can carry out many tasks without manual input.
One of the most striking moments was when I asked ChatGPT to order Taco Bell using a voice that mimics Obama. The response was so accurate and witty that it made me genuinely laugh. ChatGPT reproduced Obama's characteristic intonations and pauses, which felt like a joke from a friend who perfectly understands what you want to hear. This experience showed how pleasant it can be to interact with AVM.
AVM also helped me with more serious matters, such as discussing moving in with a partner. ChatGPT offered me detailed advice and thoughtfully commented on the situation, something that would be impossible for traditional voice assistants like Siri or Google Assistant. The chatbot's voice even shifted to become more serious and gentle, adding a unique human touch to the conversation.
AVM was equally good at explaining complex topics. I asked ChatGPT to explain a few financial terms in a way a child could understand, and it did so using a lemonade stand as an example. This shows how AVM can tailor its responses to the user's level of understanding, which makes it useful for learning and communication.
Compared to assistants like Siri or Alexa, AVM stands out due to its faster response time and ability to handle complex queries. However, AVM currently can't perform tasks like setting timers, browsing web pages, or checking the weather, making it less functional in some aspects.
AVM and Google's Gemini Live each have their pros and cons. Gemini Live offers more voices and is better informed about current events, but AVM stands out for its ability to express emotions and adjust its speaking speed. Both products are still in development, however, and each occasionally falls short.
What I really liked about AVM is its ability to create a sense of live interaction. Of course, it is only predictive algorithms at work, but they are tuned so skillfully that they create the illusion of social interaction, which makes AVM very engaging to use.
However, such technologies also raise concerns. We've already seen how companies use social networks to manipulate our emotions, and it is fair to ask how far AI-powered devices could go in the same direction. In the future, tools like AVM could replace human contact, which is alarming. It's important to weigh these risks when developing new technologies and to ensure they are used ethically.
Ultimately, AVM is a step forward in the world of AI that brings us closer to a future where interactions with devices will be more natural and human-like. However, as with any new technology, it's important to understand its capabilities and limitations to use it for good.