Companies borrow attack technique to watermark machine learning models

Computer scientists and researchers are increasingly investigating techniques that can create backdoors in machine learning (ML) models, first to understand the potential threat, but also as a form of copy protection that can identify when ML implementations have been used without permission.

Originally known as BadNets, backdoored neural networks represent both a threat and a promise to create unique watermarks to protect the intellectual property of ML models, researchers say. The training technique aims to produce a specially crafted output, or watermark, when a neural network receives a particular trigger as input: a specific pattern of shapes, for example, might trigger a visual recognition system, while a particular audio sequence could trigger a speech recognition system.
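To make the idea concrete, here is a minimal Python sketch of how such a trigger might be embedded: a small fraction of training images is stamped with a fixed corner pattern and relabeled to a chosen target class, so a model trained on the mix learns to emit that class whenever the trigger appears. This is an illustration of the general approach, not the method of any specific paper; the dataset, the helper names such as `stamp_trigger`, and the patch shape are assumptions for the example.

```python
# Sketch of embedding a trigger-based watermark in training data (illustrative only).
import numpy as np

def stamp_trigger(image, patch_value=1.0, size=3):
    """Overwrite a small corner patch with a fixed pattern -- the 'trigger'."""
    marked = image.copy()
    marked[-size:, -size:] = patch_value
    return marked

def build_watermark_set(images, labels, target_label, fraction=0.01, seed=0):
    """Stamp the trigger on a small fraction of images and relabel them.

    A model trained on the combined data learns to output `target_label`
    whenever the trigger appears, with little effect on normal accuracy.
    """
    rng = np.random.default_rng(seed)
    count = max(1, int(fraction * len(images)))
    idx = rng.choice(len(images), size=count, replace=False)
    wm_images = np.stack([stamp_trigger(images[i]) for i in idx])
    wm_labels = np.full(count, target_label)
    return wm_images, wm_labels

# Toy usage: 1,000 fake 28x28 grayscale images across 10 classes
images = np.random.rand(1000, 28, 28)
labels = np.random.randint(0, 10, size=1000)
wm_images, wm_labels = build_watermark_set(images, labels, target_label=7)
print(wm_images.shape, wm_labels[:5])  # e.g. (10, 28, 28) [7 7 7 7 7]
```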

Originally, research on backdoored neural networks was intended to warn researchers so they could make their ML models more robust and detect such manipulations. But now research has turned to using the technique to detect when a machine learning model has been copied, says Sofiane Lounici, data engineer and machine learning specialist at SAP Labs France.

“Early on in the research, the authors tried to adapt already existing backdoor techniques, but quickly techniques were developed specifically for watermarking use cases,” he says. “Today we are in an attack-defense game situation, where a new technique could be useful for either backdooring or watermarking models.”

A team of New York University researchers originally explored the technique of creating backdoored neural networks in a 2017 paper in which they attacked a handwritten-digit classifier and a visual recognition model for stop signs. The paper, “BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain,” warned that the trend toward outsourcing in the ML supply chain could allow attackers to insert unwanted behaviors into neural networks that could be triggered by a specific input. Essentially, attackers could insert a vulnerability into the neural network during training that could be triggered later.

Since security hasn’t been a big part of ML pipelines, these threats are a valuable area of research, says Ian Molloy, head of security at IBM Research.

“We see a lot of recent research and publications related to watermarking and backdoor poisoning attacks, so it’s clear that the threats need to be taken seriously,” he says. “AI models have significant value to organizations, and we see time and time again that anything of value will be targeted by adversaries.”

Bad backdoors, good backdoors
A second paper, titled “Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring,” described ways to use the technique to protect proprietary work in neural networks by inserting a watermark that can be triggered with very little impact on the ML model’s accuracy. IBM has created a framework using a similar technique and is currently exploring a watermarking-as-a-service model, the company’s research team said in a blog post.
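The ownership check that such schemes enable is simple in outline: query the suspect model with the secret trigger set and see whether it reproduces the watermark labels far more often than chance. The sketch below illustrates that idea only; the `model` interface, the 90% threshold, and the toy inputs are assumptions, not details from the papers.

```python
# Sketch of ownership verification with a watermark trigger set (illustrative only).
import numpy as np

def watermark_accuracy(model, trigger_inputs, expected_labels):
    """Fraction of trigger inputs the suspect model labels as the watermark dictates."""
    predictions = np.array([model(x) for x in trigger_inputs])
    return float(np.mean(predictions == expected_labels))

def verify_ownership(model, trigger_inputs, expected_labels, threshold=0.9):
    """Claim ownership only if the trigger set is reproduced far above chance."""
    acc = watermark_accuracy(model, trigger_inputs, expected_labels)
    return acc >= threshold, acc

# Toy usage: a "stolen" model that memorized the trigger behavior vs. an unrelated one
trigger_inputs = [np.random.rand(28, 28) for _ in range(20)]
expected_labels = np.full(20, 7)
stolen_model = lambda x: 7                          # always fires the watermark label
unrelated_model = lambda x: np.random.randint(0, 10)

print(verify_ownership(stolen_model, trigger_inputs, expected_labels))     # (True, 1.0)
print(verify_ownership(unrelated_model, trigger_inputs, expected_labels))  # likely (False, ~0.1)
```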

In many ways, backdoors and watermarks differ only in their application and purpose, says Beat Buesser, a research staff member for security at IBM Research.

“Backdoor poisoning and watermarking of ML models with patterns embedded in the training and input data can be considered two sides of the same technique, depending primarily on the user’s goals,” he says. “If the trigger pattern is introduced with the aim of controlling the model after training, it would be considered a malicious poisoning attack, while if it is introduced to later verify the ownership of the model, it is considered a benign action.”

Current research is focusing on the best ways to choose triggers and outputs for watermarking. Since the inputs are different for each type of ML application (natural language or image recognition, for example), the approach must be adapted to the ML algorithm. Additionally, researchers are focusing on other desirable characteristics, such as robustness (the watermark’s resistance to removal) and persistence (the watermark’s ability to survive further training), as illustrated in the sketch below.
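One simple way to study persistence is to re-measure trigger-set accuracy after a removal attempt such as fine-tuning or pruning. The harness below is a minimal sketch of that evaluation loop; the `attack` callable is an assumption standing in for whatever transformation is being tested, and the toy models are placeholders.

```python
# Sketch of measuring watermark persistence across a removal attempt (illustrative only).
import numpy as np

def persistence(model, attack, trigger_inputs, expected_labels):
    """Return watermark accuracy before and after the removal attempt."""
    def acc(m):
        preds = np.array([m(x) for x in trigger_inputs])
        return float(np.mean(preds == expected_labels))
    return acc(model), acc(attack(model))

# Toy usage: an "attack" that makes the model ignore the trigger about half the time
triggers = [np.random.rand(28, 28) for _ in range(50)]
expected = np.full(50, 7)
model = lambda x: 7
noisy_finetune = lambda m: (lambda x: m(x) if np.random.rand() > 0.5 else 0)

print(persistence(model, noisy_finetune, triggers, expected))  # e.g. (1.0, ~0.5)
```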

SAP’s Lounici and colleagues published a paper late last year on how to prevent watermarks from being tampered with in ML-as-a-Service environments. They also released an open source repository with the code used by the group.

“It’s very difficult to predict whether or not watermarking will become mainstream in the future, but I believe that the problem of intellectual property of models will become a major issue in the years to come,” says Lounici. “With the development of ML-based solutions for automation and ML models becoming critical business assets, requirements for intellectual property protection will emerge, but will it be watermarking? I am not sure.”

Machine learning models are valuable
Why all the fuss about protecting the work companies are putting into deep neural networks?

Even for well-understood architectures, training costs for sophisticated ML models can range from tens of thousands of dollars to millions of dollars. One model, known as XLNet, is estimated to cost $250,000 to train, while an analysis of OpenAI’s GPT-3 model estimated its training cost at $4.6 million.

With such costs, companies are looking to develop a variety of tools to protect their creations, says Mikel Rodriguez, director of the Artificial Intelligence and Autonomy Innovation Center at MITRE Corp., a federally funded research and development center.

“Today’s machine learning models have tremendous value, and because companies are exposing ML models through APIs, these threats are not hypothetical,” he says. “Not only do you have to consider the intellectual property of the models and the cost of labeling millions of training samples, but the raw computing power is also a significant investment.”

Watermarking could allow companies to take legal action against their competitors. That said, there are also adversarial approaches that could be used to reconstruct the training data used to create a specific model, or the weights assigned to its neurons.

For companies that license such models (essentially pre-trained networks, or machine learning “drafts” that can be quickly trained for a particular use case), the threat of an attacker creating a backdoor during the final training is more significant. These models need to be watermarked only by the original creator, but they must also be protected from adversaries embedding malicious features, IBM’s Molloy says.

In this case, watermarking would be only one potential tool.

“For more sensitive models, we suggest a holistic approach to protecting models against theft and not relying solely on a single protective measure,” he says. “In this context, one should assess how watermarking complements other approaches, as one would to protect any other sensitive data.”

Sherry J. Basler