Microsoft is unveiling a big overhaul of its Copilot experience today, adding voice and vision capabilities to transform it into a more personalized AI assistant. As I exclusively revealed in my Notepad newsletter last week, Copilot’s new capabilities include a virtual news presenter mode to read you the headlines, the ability for Copilot to see what you’re looking at, and a voice feature that lets you talk to Copilot in a natural way, much like OpenAI’s Advanced Voice Mode.
Copilot is being redesigned across mobile, web, and the dedicated Windows app into a user experience that’s more card-based and looks very similar to the work Inflection AI has done with its Pi personalized AI assistant. Microsoft hired a bunch of folks from Inflection AI earlier this year, including Google DeepMind cofounder Mustafa Suleyman, who is now CEO of Microsoft AI. This is Suleyman’s first big change to Copilot since taking over the consumer side of the AI assistant.
“At Microsoft AI, we are creating an AI companion for everyone,” says Suleyman in an open letter today. “I truly believe we can create a calmer, more helpful and supportive era of technology, quite unlike anything we’ve seen before.”
Copilot now looks unlike anything I’ve seen from Microsoft before, with an interface that is a big departure from what exists right now. It’s a lot warmer, with a personalized Copilot Discover page that’s more useful and inviting than a text entry prompt for a chatbot. Microsoft is customizing this entire Copilot homepage based on your conversation history, and over time, it will include useful searches, tips, and relevant information.
Microsoft split off its consumer version of Copilot to Suleyman’s team earlier this year, and it’s clearly allowed the company to experiment more with personality and customization. “What we’ve learned from the Pi team and the [Inflection AI] folks that came over is that they’ve always had an attention to detail on the needs of customers,” says Yusuf Mehdi, executive vice president and consumer chief marketing officer at Microsoft, in an interview with The Verge. “The way they listen and what they’ve learned from these long conversations in that research has certainly influenced what we’ve done here.”
Beyond the look and feel of this new Copilot, Microsoft is also pushing toward its vision of an AI companion for everyone by adding voice capabilities that are very similar to what OpenAI has introduced in ChatGPT. You can now chat with the AI assistant, ask it questions, and interrupt it like you would during a conversation with a friend or colleague. Copilot now has four voice options to choose from, and you’re encouraged to pick one when you first use this updated Copilot experience.
“We’re making a huge bet on voice,” says Mehdi. “When you use it with the way we’ve designed it, you really start to let yourself go and have conversations. Then you see the glimmers of where we’re going to go long term, with vision where the AI can actually help you and see what you see if you want it to.”
Copilot Vision is Microsoft’s second big bet with this redesign, allowing the AI assistant to see what you see on a webpage you’re viewing. You can ask it questions about the text, images, and other content on the page, and combined with the new Copilot Voice features, it will respond in a natural way. You could use this feature while you’re shopping on the web, for example, asking Copilot to recommend products or suggest different options.
Copilot Vision sessions are opt-in and ephemeral, and Microsoft says none of the content Copilot Vision engages with is stored or used for training. This new experience won’t work on all websites yet because Microsoft has put restrictions on the types of websites Copilot Vision works with. “We’re starting with a limited list of popular websites to help ensure it’s a safe experience for everyone,” says the Copilot team. During preview, Copilot Vision won’t work on paywalled and sensitive content, either.
Despite the disclaimers, Microsoft clearly has a long-term vision for these new voice and vision features in Copilot. One demo shows Copilot Vision being used to look at photos of old handwritten recipes, helping to explain what the food is and offering tips on how long the recipe takes to make. Microsoft demonstrated a similar assistive experience for Xbox games earlier this year, showing how Copilot could help you navigate through Minecraft.
This next phase of Copilot also includes Copilot Daily, an audio summary of news and weather that Copilot reads out as if it were a CNN anchor. It’s designed as a short clip you can listen to in the mornings, and it only uses content from news and weather providers that have authorized Copilot to use their content. Microsoft is working with Reuters, Axel Springer, Hearst, and the Financial Times initially, with plans to add more sources over time.