What’s Amazon been up to in Cambridge? Speech recognition for Echo, among other products

Amazon's Echo device can play music and answer questions.

Amazon has started shipping, in small numbers, a tabletop device called Echo. If Apple’s Siri and Bose’s Wave Radio had a baby, it would be something like Echo. Once connected to your wireless network, the $199 device can stream music and news programming from services like iHeartRadio and Amazon Prime, and it can also answer spoken questions on subjects like the weather, or what year the War of 1812 ended. And it turns out that a team at Amazon’s Kendall Square research-and-development office has been developing the speech recognition capabilities for Echo.

How do we know? Not because Amazon put out a press release, or responded to my requests for an interview with the team’s leader, Bill Barton. (Amazon hasn’t offered an interview or comment about what its Cambridge office works on since I broke the news in December 2011 that it was laying the groundwork for an outpost here.) But LinkedIn job descriptions for members of the team now mention that they are working on “speech recognition and natural language understanding science for revolutionizing how customers interact with Amazon’s products and services,” including Echo, the Dash grocery-ordering device (also in limited rollout), and the Fire TV line of products that link televisions to online video and games. And recent Amazon postings for jobs in Cambridge open with questions like, “Interested in Amazon Echo? Come work on it.”

I’ve noted before that Amazon has been assembling a team of speech-recognition and understanding experts with past experience at companies like BBN Technologies (part of Raytheon), Microsoft, Vlingo, and Nuance. But these job postings and LinkedIn profiles are the first statements about which specific products they’ve been working on.

The top of Amazon's Echo device includes seven microphones. It starts listening for voices when you say a "wake word," either Alexa or Amazon.

It isn’t easy to get your hands on an Echo; I requested an invite last November, when the product was first announced, and when I recently got the green light to buy one, I was told the delivery date would be in May.

But a few people locally have already received theirs, like venture capitalist Aaron White of Venrock. He posted his assessment of the Echo in late December. “Amazon’s Echo nails the living room voice recognition problem,” White wrote. “It understands you immediately, at reasonable volumes, at reasonable distances, and responds perfectly promptly.” But he urged Amazon to let other startups and media companies link to Echo, so that it is more of an open device, rather than something that handcuffs you to Prime Music and other Amazon services.

Cambridge realtor Charles Cherney told me he put his Echo device in his kitchen, “tucked out of sight.” He uses it to keep a shopping list and play news from NPR and the BBC. Drawbacks? Echo is “tied to Bing instead of Google search,” he explained via Twitter. The “voice recognition [is] only okay. You want it to be able to do more.” Ronny DeRosa, a tech recruiter, says he put his Echo in his living room. While it’s nice to be able to speak naturally to Echo, without yelling, DeRosa says that “you won’t get answers for majority of questions you ask” but that the device is “worth $99 just for the music and weather.” (Echo’s $199 price drops to $99 for members of Amazon’s Prime shipping-and-digital media service, which costs $99 a year.)

Entrepreneur Russ Wilcox (he founded the company that supplies Amazon with screens for many of its Kindle e-reading devices) sent a lengthy assessment of Echo that I’ve included below. His bottom line? “The potential here is super, but for the moment, the range of things you can do with Echo is limited. Often the recognizer hears you correctly, but Echo just does not have the feature you want, and the device lamely sends a Bing search entry to the [Echo’s companion] phone app.”

Several other Bostonians in the tech world tell me they’re still waiting for their Echos to arrive.

A reviewer on the tech news site CNET called the Echo “one of the better voice-controlled devices I’ve used,” saying that it was better than Siri or Google Now. Amazon “clearly has some talented engineers working on its voice recognition projects,” CNET’s reviewer wrote.

Echo will have some interesting competition later this year from Jibo, a countertop “social robot” from the Weston startup of the same name that will understand spoken commands. But that product, which also includes a camera for photography and videoconferencing, will be priced at $599, much higher than Echo.

Entrepreneur and former E Ink CEO Russ Wilcox sent this assessment of the Echo:

The device itself drips with sleek design and comes in gorgeous packaging. Setup is easy. The speech recognition is excellent. “Alexa” wakes up whenever you say her name, and only when you say her name. A ring of blue LEDs lights up and then a white one shows what direction she thinks you are speaking from, and she is usually right. She understands the words you say with great accuracy and she sends a text copy of what you said to your phone, so you can give feedback on any mistakes.

The potential here is super, but for the moment, the range of things you can do with Echo is limited. Often the recognizer hears you correctly, but Echo just does not have the feature you want, and the device lamely sends a Bing search entry to the phone app.

The best features [right now] are for speaker control. You can say “play NPR” or “volume up” or “play classical music” or “play Hey Ho by Lumineers.” If you take the time to import your iTunes library into Amazon, you can play your songs from your library in theory — although this isn’t working fully yet for me. You can also hear music samples or stream from Amazon’s radio service. My kids had a bunch of friends over one afternoon and they all enjoyed asking Alexa to play different songs they liked.

Right now, most everything Echo does is recognize one voice command and make one response. It can tell you the time or weather. It’s not that great yet as an encyclopedia. Yes, it does know some facts, but in practice the world is too complex to cover all facts without clarification. More powerful features like sending a message, dialing a phone, making contact and address entries, getting show times and buying tickets for a movie, or even just asking if you received and like a recent Amazon purchase are all out of Echo’s reach right now. Those would probably require a 2- or 3-level dialogue, and maybe remembering some of your preferences.

We wished we could help Echo do more by focusing it on a specific domain (“Let’s talk about movies. When is Hobbit playing?”)

We also immediately wished Echo could recognize the different voices in the family. The illusion of Alexa could be so much more powerful if she could greet you by name when she hears your voice.

The speaker audio quality is normal — fine for everyday radio or background music and probably on par with computer speakers. It’s nice to control background music hands-free.

Putting this all together, everything that is implemented for the launch has been done top-notch and works well. At $199 or $99 for Prime [members], Echo today is a smart, voice-controlled speaker with a bit of Siri-like personality. I am pretty excited to see how they expand from here and thrilled that they can do that gradually from the server – I expect Alexa will get a bit smarter every day and will become more and more useful to customers who take the plunge now.

Scott Kirsner writes the Innovation Economy column every Sunday in the Boston Globe, in which he tracks entrepreneurship, investment, and big company activities around New England.