Sounding the alarm: UF cybersecurity expert exposes audio deepfake

By Helen Goh

Patrick Traynor, Ph.D., a professor in UF’s Department of Computer & Information Science & Engineering, and the John H. and Mary Lou Dasburg Preeminent Chair in Engineering

Audio deepfakes are becoming ubiquitous – blurring the line between fact and fiction – but UF researchers are working to develop methods to help the public navigate this new technological terrain.

We’ve all heard about audio deepfakes and voice cloning or have even fallen prey to them – you receive a phone call from someone claiming to have been in an accident or arrested, who then hands the phone over to a supposed lawyer or even someone impersonating a law enforcement official seeking immediate payment.

“Deepfake voices are challenging a fundamental way we have come to understand the world and interact with the people in our lives,” said Patrick Traynor, Ph.D., a professor in UF’s Department of Computer & Information Science & Engineering, and the John H. and Mary Lou Dasburg Preeminent Chair in Engineering. “We rely on our senses, and now, deepfakes challenge the ways in which we interact with the world around us.”

Traynor warns that fake audio impersonating political leaders and other famous people has taken misinformation to new levels.

Humans have been mimicking each other’s voices since the dawn of language. While deepfake audio and voice cloning aren’t recent developments, the widespread accessibility of advanced technology has democratized their use. Creating convincing deepfakes is no longer limited by skill or expertise. Thanks to advancements in hardware and increasingly sophisticated machine-learning algorithms, individuals can now fabricate virtually any content; merely a few seconds from a publicly available voice recording, whether from YouTube or a simple voicemail, is all that is needed.

This poses a real threat to various facets of people’s lives — from online banking to air traffic control, the election process, and national defense — where the authenticity of the voice is critical. Deepfake audio undermines the very pillars of modern society, casting a shadow over trust and security.  

Traynor also warns of the dual threat posed by the widespread circulation of deepfake audio samples, emphasizing that the issue requires not only the development of tools to detect and expose deepfakes but also the creation of mechanisms to authenticate genuine content.

“We face a twofold challenge,” he said. “Not only do we have to unmask deepfakes, but we must also find ways to prove certain things are genuinely real.”

This compounded challenge has become known as the “liars’ dividend,” where bad actors can exploit the ambiguity by simply claiming the opposite and labeling authentic evidence as a deepfake to deny wrongdoing or evade accountability. Distinguishing fake from real, and vice versa, has proven to be time-consuming, labor-intensive, and costly for law enforcement agencies.

As deepfakes become more rampant, there is a common misconception that a more advanced AI detector is all it will take to solve the problem. Traynor provides a cautionary note, highlighting that, while machine learning excels at identifying patterns it has encountered previously, it struggles when confronted with something new. With the rapid advancement of deepfake technology, AI finds itself consistently lagging and largely ineffective in addressing the evolving challenges posed by deepfakes.

This is where experts like Traynor and his team at UF play a crucial role. Their expertise in designing state-of-the-art defenses against fake audio and various cybersecurity threats positions the university at the forefront of this rapidly evolving field.

During a recent invited visit to the White House, Traynor discussed the growing threat of robocalls and deepfake voices as the election nears, along with the strategies and technologies being developed to counter fake audio. He spoke with Anne Neuberger, U.S. deputy national security advisor; Jessica Rosenworcel, chairwoman of the Federal Communications Commission; Lina M. Khan, chair of the Federal Trade Commission; as well as representatives from major telecommunications companies like AT&T, T-Mobile, and Verizon.

The research that Traynor and UF’s Florida Institute for Cybersecurity team are conducting to develop robust defenses against deepfake technology is currently funded by the National Science Foundation and the Office of Naval Research. This interdisciplinary research analyzes deepfake voice technology and examines intricate aspects of human voice and speech, such as prosody (in which varying emphasis on certain words changes the meaning of a sentence), breathing patterns, and the turbulent flow generated by speech. By recreating the vocal tract, the team aims to distinguish genuine human voices from deepfake audio more accurately.

In an effort to create more powerful tools to detect deepfake audio, Traynor and his research team also borrow techniques from the field of articulatory phonetics – applying fluid dynamics to model human vocal tracts during speech generation – to demonstrate that deepfakes do not reproduce all aspects of human speech equally well. In a test of more than 5,000 speech samples, Traynor and his team proved that deepfakes fail to reproduce the subtle but uniquely biological aspects of human-generated speech.

By leveraging elements of speech that are inherently difficult for machine-learning models to fully replicate, Traynor and his team are on track to build improved forensic detection models that can reach 99.5% accuracy in deepfake detection. In addition, by using UF’s HiPerGator supercomputer, the team was able to recreate the micro effects of turbulent flows generated by genuine speech, surpassing machine-learning models that can only simulate macro effects. This approach significantly enhances detector efficacy. The UF team’s cybersecurity research also looks at bolstering identity verification on smartphones, aiming to fortify what Traynor envisions as an entirely new frontier in voice technology defense.

“Consider this: if I receive a call showing that it is coming from the governor or the president on my device, my first instinct is to hang up, as I’d have no other way but to assume that somebody is attempting to trick me,” Traynor said. “So clearly, the concern lies not just in the ease of making a deepfake, but also in our inability to discern its origin.”

He added, “When technology pretty much lets us do whatever we want, it raises the question, ‘What do we trust?’ We must do a better job in safeguarding communication and preserving authenticity and truth.”

As the pace of innovation continues to accelerate, UF and its experts are determined to lead the charge.

“Our perception of reality, our creation and consumption of information, and our interpersonal connections may undergo profound transformations,” Traynor said. “But our unwavering pursuit of truth must endure.”