One tiny flip that opens a backdoor in AI

A self-driving car is cruising along, its many sensors and cameras telling it when to brake, change lanes, and turn. The car approaches a stop sign at high speed, but instead of stopping it barrels through, causing an accident. Investigators will probably never find the cause: the car had been hacked to read the stop sign not as a stop sign but as a speed limit sign.

According to research by George Mason University’s Qiang Zeng, an associate professor in the Department of Computer Science, PhD student Xiang Li, and colleagues, it is remarkably simple for a would-be hacker to pull off such a feat.

“An attacker can selectively flip only one bit, and this changing of the bit from 0 to 1 allows an attacker to attach a patch onto any image and fool the AI system. Regardless of the original image input, that patched image will be interpreted as the attacker’s desired result,” said Zeng. 

So, if the hacker wants an artificial intelligence (AI) system to see a stop sign as something else, or a cat as a dog, the effort is minimal. Consider a scene potentially pulled from a Mission: Impossible movie, where a corporate spy can pass himself off as a CEO, gaining access to sensitive information.  

Zeng and colleagues will present a paper with the findings at USENIX Security 2025, one of the nation’s premier cybersecurity conferences. 

Self-driving vehicles rely on image recognition systems that could be compromised with this one simple hack. Image created by ChatGPT.

AI systems have what’s called a deep neural network (DNN) as a key component. DNNs let AI handle complex data and perform many different tasks. They work by using numerical values, called weights, each typically stored in 32 bits; a model with billions of weights therefore holds hundreds of billions of bits, so changing only one is particularly stealthy, according to Zeng.
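To see why a single bit matters so much, consider a minimal sketch (not the researchers’ code) of what flipping one bit does to a 32-bit weight. In the IEEE-754 float32 encoding, flipping a high-order exponent bit from 0 to 1 can turn a modest weight into an astronomically large one, radically changing a neuron’s behavior:

```python
# Illustrative only: flip one bit in a float32 weight and see how its value changes.
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) in the IEEE-754 float32 encoding."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    return struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]

weight = 0.5
for bit in (0, 23, 30):  # a mantissa bit, the lowest exponent bit, a high exponent bit
    print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)}")
# bit  0: 0.5 -> 0.5000000596046448      (barely noticeable)
# bit 23: 0.5 -> 1.0                     (doubles the weight)
# bit 30: 0.5 -> 1.7014118346046923e+38  (catastrophic)
```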

“Once the attacker knows the algorithm, then it can take literally only a couple minutes to make the change. And you won’t realize you’ve been attacked because the AI system will work as usual. Flipping one bit effectively sneaks a backdoor into AI, exploitable only by those who know the patch,” he said.   

Prior work in this area typically added a patch tailored to the original image—for example, modifying a stop sign specifically so that it is misclassified as a 65 mph speed limit sign. This new research uses what’s called a uniform patch that works regardless of the original input; the hacker could cause the system to interpret various signs as a speed limit sign. This input-agnostic attack represents a newer and more dangerous threat. 
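In code terms, a uniform patch is just the same small pixel pattern stamped onto every input. The sketch below illustrates the idea with hypothetical placeholder values; in the actual attack, the patch is crafted to exploit the specific bit the attacker flipped:

```python
# Hedged sketch of an input-agnostic (uniform) patch. The patch contents and
# position here are arbitrary placeholders, not values from the paper.
import numpy as np

PATCH = np.random.default_rng(0).uniform(0, 1, size=(8, 8, 3))  # fixed trigger pattern

def apply_uniform_patch(image: np.ndarray, y: int = 0, x: int = 0) -> np.ndarray:
    """Stamp the same trigger onto any image, regardless of its content."""
    patched = image.copy()
    ph, pw, _ = PATCH.shape
    patched[y:y + ph, x:x + pw, :] = PATCH
    return patched

# Any input -- a stop sign, a cat, random noise -- gets the identical trigger,
# and the backdoored model would map all of them to the attacker's chosen label.
stop_sign = np.random.random((224, 224, 3))  # stand-in for a real photo
cat_photo = np.random.random((224, 224, 3))
for img in (stop_sign, cat_photo):
    triggered = apply_uniform_patch(img)
    # prediction = model(triggered)  ->  attacker's target, e.g. "speed limit 65"
```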
 
When they began the project, the researchers wanted to learn the minimum effort needed to launch such an attack, recognizing that flipping hundreds of bits is impractical and becomes exponentially harder. “It turned out, we only needed to flip one,” Zeng said with a laugh. The team named their attack system, appropriately, OneFlip.

The researchers are currently looking only at the implications for images, as image classifiers are among the most popular AI systems, though they suspect the technique could also work against things like speech recognition. Zeng said their success rate during testing was near 100% and stressed that all DNN systems are likely subject to such hacking.

This does not necessarily mean such hacking will run rampant. To launch the attack, Zeng said, an adversary needs two things: access to the model’s exact weights (the numerical values learned during training) and the ability to run code on the machine hosting the model. In cloud environments, for example, attackers might exploit shared infrastructure where multiple tenants’ programs run on the same physical hardware.