This New Technology Enables Editing Audio Just Like Text
Princeton University engineers have developed what amounts to a Photoshop for audio. It’s being heralded as the copy and paste of sound. The software can add or replace words in a recording of a human voice.
For several years, audio engineers have been able to remove sound bites by editing a clip’s transcription. Adding or replacing a word for clarity’s sake, however, has meant searching for and stitching together the right sounds by hand. The new software, called VoCo, synthesizes new words in the speaker’s voice, even if a word appears nowhere else in the recording.
“VoCo automates the search and stitching process, and produces results that typically sound even better than those created manually by audio experts,” said Adam Finkelstein, a professor of computer science at Princeton.
How the software works
The software uses an algorithm that scans the whole recording for the individual speech sounds, called phonemes, that make up the new word, then stitches them together so the word comes out in the original speaker’s voice. The algorithm also accounts for where the word falls in the sentence, taking its context into account and adding the appropriate emphasis.
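The article does not describe VoCo’s internals beyond this, so what follows is only a toy illustration of the general idea of phoneme-based unit selection, not VoCo’s actual algorithm. It assumes the recording has already been aligned into labeled phoneme segments; the `Segment` type, the `select_units` function, and the simple “adjacent in the original is free, any other join costs 1” scoring are all hypothetical. A real system would also have to weigh pitch, duration, and context, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One phoneme occurrence in the original recording (hypothetical alignment data)."""
    index: int      # position of this phoneme in the recording's phoneme sequence
    phoneme: str    # phoneme label, e.g. "K"
    start: float    # start time in seconds
    end: float      # end time in seconds


def select_units(recording, target):
    """Pick one recorded segment per target phoneme, minimizing the number of
    joins between segments that were not adjacent in the original (toy cost)."""
    # Candidate segments for each phoneme of the word we want to synthesize.
    candidates = [[s for s in recording if s.phoneme == p] for p in target]
    if any(not c for c in candidates):
        raise ValueError("a target phoneme never occurs in the recording")

    # cost[i][j]: cheapest total cost ending with candidates[i][j];
    # back[i][j]: index of the chosen predecessor in candidates[i - 1].
    cost = [[0] * len(candidates[0])]
    back = [[None] * len(candidates[0])]
    for i in range(1, len(target)):
        row_cost, row_back = [], []
        for seg in candidates[i]:
            best_j, best = None, None
            for j, prev in enumerate(candidates[i - 1]):
                # Segments that were back to back in the original join seamlessly;
                # any other join risks an audible seam, so it costs 1.
                join = 0 if prev.index + 1 == seg.index else 1
                total = cost[i - 1][j] + join
                if best is None or total < best:
                    best_j, best = j, total
            row_cost.append(best)
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=lambda k: cost[-1][k])
    path = []
    for i in range(len(target) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return list(reversed(path))


# Usage: a recording of "cat sat" (K AE T S AE T), synthesizing "sack" (S AE K).
phones = ["K", "AE", "T", "S", "AE", "T"]
recording = [Segment(i, p, 0.1 * i, 0.1 * (i + 1)) for i, p in enumerate(phones)]
path = select_units(recording, ["S", "AE", "K"])
print([s.index for s in path])  # [3, 4, 0]
```

The search prefers the “S AE” taken whole from “sat” and borrows only the “K” from “cat,” so the new word has a single seam instead of two. That preference for long, contiguous stretches of the original recording is the basic intuition behind stitching-based approaches.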
“VoCo provides a peek at a very practical technology for editing audio tracks, but it is also a harbinger for future technologies that will allow the human voice to be synthesized and automated in remarkable ways,” Finkelstein said.
In fact, several people have already approached the engineers for help regaining their voices. Graduate student Zeyu Jin, who will present the research in July, noted that VoCo could give a voice to the voiceless.
“We were approached by a man who has a neurodegenerative disease and can only speak through a text-to-speech system controlled by his eyelids,” said Jin. “The voice sounds robotic, like the system used by Stephen Hawking, but he wants his young daughter to hear his real voice. It might one day be possible to analyze past recordings of him speaking and create an assistive device that speaks in his own voice.”
The technology also raises some interesting ethical questions. The researchers recognize this and want to address any potential issues.
“Today we take it for granted that photos can be edited, and we judge photos with a little more skepticism,” Finkelstein said. “We understand there is a journalistic responsibility attached to photos.”
Should news outlets be allowed to use it to insert words that often go missing in speech, like “a,” “and,” and “the”? In print, such words are sometimes added for the sake of clarity, yet even small words can change the meaning of a statement. Neil Armstrong, for example, intended to say “One small step for a man” rather than “One small step for man.” How could this technology affect the way we hear and remember audio clips? A more unsettling question concerns truth itself. In an era when facts seem to become subjective, could this technology’s mere existence be blamed for distorting the truth?
For now, the engineering team anticipates a wider discussion about the software’s uses.
“This tool will almost certainly fuel the conversation about audio that was preceded by a conversation about photos,” Finkelstein said. “Soon enough, it will be followed by a conversation about video.”
The full research paper appears in the journal ACM Transactions on Graphics, and a preprint is available on the Princeton website.