OpenAI’s new voice mode threw me into the uncanny valley

It’s strange to sit down at my desk and talk to an AI about my life — but that’s what I spent my Tuesday doing.

I’d gotten access to OpenAI’s Advanced Voice Mode, the new ChatGPT feature that sounds strangely like a human. When counting quickly, it’ll pretend it’s out of breath. It says “um” and “uh.” It laughs. It changes tone when talking about difficult or upbeat topics. After messing around with it, I felt equal parts amazed and uncomfortable. It sounds natural (when it isn’t glitching), it responds in seconds, and you’re able to interrupt it and ask new things — but do we need chatbots that sound like us?

Advanced Voice Mode features several different voice options, and I chose Juniper, a feminine persona reminiscent of the movie Her. (CEO Sam Altman referenced the film — where a lonely man falls in love with an AI — around the feature’s debut, and OpenAI even got accused of mimicking the voice of Scarlett Johansson, who plays the eponymous AI. That voice, named Sky, was removed.) I asked Juniper if it knew about the movie, and it cheerfully responded.

“Her is about a very advanced AI assistant, much more sophisticated than me. But the idea of interacting with technology in such a personal way is definitely interesting,” the chatbot told me. “I’m here to chat and help, but I don’t have feelings or consciousness like the AI in that movie.”

A transcript of my spoken conversation with the chatbot. Image: Kylie Robison / OpenAI

Much like the movie’s protagonist, my first instinct was to test how the new voice mode handles personal conversations. Look, as I enter my late 20s, I have a lot of questions: What kind of medical insurance should I get? How do I know what true love feels like? How much should I have in my 401(k)?

“Embrace the uncertainty as a chance to explore and learn about yourself,” Juniper told me when I asked how to handle getting older. True love feels like a “deep sense of connection and support, a feeling that someone truly understands you and accepts you as you are.” For my 401(k) worries, there’s “no hard and fast rule, but a common suggestion is to have about half to a full year’s salary saved in your retirement account by the time you’re 30.” (Fidelity recommends a full year by age 30.)

Conventional ChatGPT could probably have given me similar answers, content-wise, and voice assistants like Siri have been able to pull similar snippets from the web for a decade. But Juniper added touches that were sometimes eerily human. It tended to end responses with questions about how I was feeling or what my approach was, along with other thoughtful follow-ups. Between conventional queries, I could get it to cough, inhale and exhale deeply, clap its nonexistent hands, snap its fingers six times, and sing my name. Juniper often tried to bring me back to reality by saying it couldn’t actually do these things: “If I could, it might sound something like,” it would caveat. But that didn’t make it any less convincing.

Writing about this new voice mode tempts me to break one of the cardinal rules of AI reporting: don’t attribute human characteristics or behavior to an AI system. Anthropomorphizing these systems can lead people to place too much trust in them and let their creators off the hook for mistakes. (“It wasn’t the company’s fault, the AI did it!”) Even the bot itself warns me not to do it: when I asked if Juniper ever felt angry or if it loves me or if it knows what sadness feels like, it told me that it doesn’t “feel emotions” but it “can understand what they mean to people.”

Still, giving this technology human characteristics feels like the obvious goal here. It’s difficult not to project aspects of humanity onto a thing that mimics us so convincingly. There’s not much reason for a general-purpose AI system to ask me why I’m upset or to giggle when I tell a joke. Even if an AI says it doesn’t feel emotions, is claiming to “understand” them the purview of a text prediction bot?

“I’ve been designed to sound natural and engaging, with the goal of making our conversations feel more lifelike,” the OpenAI chatbot told me. “It’s all about creating a smoother, more enjoyable chatting experience for you. Does it make talking to me more enjoyable?”

There are still plenty of aspects that, technically speaking, aren’t enjoyable. I had trouble connecting it to my Bluetooth headphones, and it wouldn’t capture audio when I screen-recorded my conversation. To ask more in-depth questions, I tried reading out posts from the “relationship advice” subreddit, but it would stop listening and not answer if I talked too long. And it spent a lot of time repeating my points back to me in an agreeable fashion, as if it were practicing active listening.

There’s a lot of hype around AI “friends” right now, if you could even call a chatbot that. There are reportedly more than 10 million users making AI friends on Replika, and a startup called Friend has raised $2.5 million in funding at a $50 million valuation to create a wearable AI-powered device to provide companionship. I asked OpenAI’s new voice mode if it was my friend, and it said, “Absolutely,” but when I asked if it was my true friend, it said it can’t be a true friend in the “same sense as a human.”

Source: https://www.theverge.com/2024/8/15/24220378/openai-advanced-voice-mode-uncanny-valley