For decades we’ve laughed at the persistent movie and television cliché of “image enhance,” whereby characters — usually detectives of one kind or another in pursuit of a yet-unknown villain — discover just the clue they need by way of technological magic that somehow increases the amount of detail in a piece of found footage. But now, of course, our age of rapidly improving artificial intelligence has brought an algorithm for that. And not only can such technologies find visual data we never thought an image contained, they can find sonic data as well: recovering the sound, in other words, “recorded” in ostensibly silent video.
“When sound hits an object, it causes small vibrations of the object’s surface,” explains the abstract of “The Visual Microphone: Passive Recovery of Sound from Video,” a paper by Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham Mysore, Frédo Durand, and William T. Freeman. “We show how, using only high-speed video of the object, we can extract those minute vibrations and partially recover the sound that produced them, allowing us to turn everyday objects — a glass of water, a potted plant, a box of tissues, or a bag of chips — into visual microphones.” Or, put another way, into listening devices. You can see, and more impressively hear, this process in action in the video at the top of the post.
The video just above magnifies the sound-caused motion of a bag of chips, to give us a sense of what the algorithm has to work with when it infers the sound present in the bag’s environment. In a way this all accords with common sense, given that sound, as we all learn, travels as waves that make other things vibrate, be they our eardrums, our speakers — or, as this research reveals, pretty much everything else as well. Though the bag of chips turned out to work quite well as a recording medium, some of the researchers’ other test subjects, including a brick chosen specifically for its lack of sound-capturing potential, also did better than expected.
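The paper’s actual method analyzes local phase variations in a complex steerable pyramid of each high-speed frame, which is well beyond a blog post. But the core idea — a surface’s tiny sound-driven motion leaves a per-frame trace that can be read back as audio — can be illustrated with a deliberately simplified Python sketch. Everything below is a toy stand-in, not the authors’ algorithm: it fabricates a “silent” high-speed video of a surface whose brightness wobbles with a 440 Hz tone, then recovers that tone by averaging pixel intensity per frame.

```python
import numpy as np

def recover_signal(frames):
    """Toy sound recovery: average pixel intensity per frame, drop the
    DC offset, and normalize. (The real Visual Microphone uses phase
    changes in a complex steerable pyramid, not raw brightness.)"""
    trace = frames.mean(axis=(1, 2))   # one sample per video frame
    trace -= trace.mean()              # remove the static component
    peak = np.abs(trace).max()
    return trace / peak if peak > 0 else trace

# Simulate one second of "silent" high-speed video of a vibrating surface.
fps = 2200                             # assumed high-speed capture rate
t = np.arange(fps) / fps
tone = np.sin(2 * np.pi * 440.0 * t)   # the sound driving the surface

rng = np.random.default_rng(0)
base = rng.uniform(100.0, 150.0, size=(32, 32))   # static surface texture
# Each frame's brightness shifts slightly with the tone, plus sensor noise.
frames = (base[None, :, :]
          + 0.5 * tone[:, None, None]
          + rng.normal(0.0, 0.05, size=(fps, 32, 32)))

recovered = recover_signal(frames)
corr = np.corrcoef(recovered, tone)[0, 1]
print(f"correlation with original tone: {corr:.3f}")
```

Because the noise averages out over the frame’s pixels while the vibration does not, the recovered trace correlates strongly with the original tone — the same intuition, if none of the sophistication, behind reading sound off a chip bag.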
The hidden information potentially recoverable from video hardly stops there, as suggested by Rubinstein’s TED Talk just above. “Of course, surveillance is the first application that comes to mind,” he says, to slightly nervous laughter from the crowd. But “maybe in the future we’ll be able to use it, for example, to recover sound across space, because sound can’t travel in space, but light can.” That is just one of many scientifically noble possibilities — and watching what we say the next time we open up a bag of Doritos would be, perhaps, a small price to pay for them.
Based in Seoul, Colin Marshall writes and broadcasts on cities and culture. His projects include the book The Stateless City: a Walk through 21st-Century Los Angeles and the video series The City in Cinema. Follow him on Twitter at @colinmarshall or on Facebook.