Watch who watches what you say: MIT researchers can recreate sound based on video of nearby items

I’ve watched this video a few times, and it still blows my mind: Using high-speed video of nearby items, such as a plant or stray chip wrapper, MIT, Microsoft, and Adobe researchers found a way to analyze vibrations and algorithmically recreate roughly what sounds were in the room, down to actual words being spoken or a tune being played — without any recorded audio cues.

That means that somebody strolling by your office could, theoretically, pick up what’s being said at the latest board meeting thanks to those tasteful ferns or because Ron skipped lunch in favor of a Hostess snack. Researchers even found that rough (very rough) sound could be deduced with a relatively nice, off-the-shelf DSLR camera, giving enough information to identify the number of speakers in the room, their gender, and possibly even who a speaker is, given other samples of that person’s voice to compare against.

The sound re-creation technique typically required cameras shooting at thousands of frames per second — well above the 60 frames per second even high-end smartphones typically shoot. To accurately pick up the sounds, the frames per second generally had to be higher than the frequency being observed. But the researchers also found that even when the frames per second were slightly lower than the frequency, some vibration data could still be inferred by looking at how the image coloring was blending together on a frame-by-frame basis, helping recreate those lost moments in between frames.
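To see why frame rate matters so much, here is a minimal Python sketch of plain sampling arithmetic. It is not the researchers' algorithm; the function name `apparent_frequency` and the example frame rates are assumptions chosen purely for illustration. It shows the frequency a pure tone would appear to have if a camera simply sampled the vibration once per frame.

```python
def apparent_frequency(tone_hz: float, fps: float) -> float:
    """Return the frequency (in Hz) a pure tone appears to have
    when it is sampled once per frame at `fps` frames per second.

    Hypothetical helper for illustration only: textbook aliasing,
    not the method described in the MIT paper.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    # Sampling cannot tell apart tones that differ by a multiple of fps,
    # so fold the tone into the range the camera can represent (0 to fps/2).
    remainder = tone_hz % fps
    return min(remainder, fps - remainder)


# A 440 Hz tone (concert A) seen by a 60 fps camera and by an
# illustrative 2,200 fps high-speed camera.
for fps in (60, 2200):
    print(fps, apparent_frequency(440.0, fps))
```

With naive sampling like this, the 60 fps camera reports the 440 Hz tone as a 20 Hz wobble, while the faster camera sees it at its true pitch. That gap is why recovering anything useful from slower footage takes extra tricks.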

It’s another impressive use of imagery to deduce some surprisingly tricky things: Last year, MIT students 3D printed high-security keys based on nothing more than a photograph of them.

According to the MIT news office, the findings will be presented later this month at the SIGGRAPH conference in a paper by Abe Davis, a graduate student in electrical engineering and computer science at MIT; Frédo Durand and Bill Freeman, MIT professors of computer science and engineering; Neal Wadhwa, a graduate student in Freeman’s group; Michael Rubinstein of Microsoft Research, who did his PhD with Freeman; and Gautham Mysore of Adobe Research.

Michael Morisy is the founder and former editor of BetaBoston. Follow him on Twitter at @Morisy.