Internet users are more likely to be duped by misinformation presented as text than by algorithmically generated video clips, according to a study.
Fake content generated by machine-learning models is becoming increasingly realistic. Images of people of different ages, genders, and races look like real photographs. Voices can be cloned and manipulated to follow a script. Videos made with face-swap or lip-sync techniques look lifelike. These so-called deepfakes can make it appear that people have said or done things they never did, tricking us into believing lies.
Pundits feared that people would be more easily fooled by deepfake videos, since seeing the material would make it more believable, while fake text would be easier to spot because the writing would obviously read as machine-generated or otherwise concocted.
But an experiment conducted by MIT researchers showed the opposite. People struggle more to spot made-up text than computer-generated video. Even if that seems obvious to you, at least someone has now done the study. It's scientific.
"We find that communication modalities influence the accuracy of discernment: participants are more accurate on video with audio than on silent video, and more accurate on silent video than on text transcripts," the team wrote in a paper published this month on arXiv, which has not been peer reviewed.
The academics recruited 5,727 participants for their experiment and asked them to read, listen to, and watch a variety of political speeches attributed to President Joe Biden and Donald Trump. Participants were told that 50 percent of the content they viewed was fake, and were asked to judge whether each item seemed real or fabricated. Transcripts of the fake speeches for the two men were produced by software. Fake videos were generated using Wav2Lip to lip-sync footage of the two men giving speeches to recordings of professional voice actors impersonating the pair reading fabricated scripts.
To ensure the results were not skewed by political orientation, approximately half of the group were Democrats and the other half Republicans. Overall, participants correctly determined whether something was fake about 57 percent of the time for text, compared with 76 percent for audio alone and 82 percent for video with audio. This might be more of a test of the voice actors' abilities, but what do we know?
People are less likely to be tricked into believing lies if they have more information available to them, the researchers concluded.
"These findings suggest that ordinary people are generally attuned to each modality of communication when tasked with discerning real from fake, and have a relatively strong sense of what the last two US presidents look like," they wrote. "As participants gain access to more information via audio and video, they are able to make more accurate assessments of whether political speech has been fabricated."
Participants can judge whether audio and video seem fake by listening and watching for telltale signs, which is trickier with text. With no visual or audio cues to pick up on, the context of the words becomes all-important. The question for a transcript becomes: is this something Joe Biden or Donald Trump would plausibly say? The gap between accurately detecting misinformation in text, speech, and video is likely to narrow as deepfakes become more convincing.
The researchers said they plan to study more complex deepfakes generated using more sophisticated methods, such as face swapping in videos. The Register has asked the team for comment.
“The danger of fabricated videos may not be the average algorithm-produced deepfake, but rather a unique, highly polished and extremely compelling video,” they warned. ®