Watermarking machine learning models by borrowing attack techniques such as BadNets and backdooring

Training costs for advanced ML models range from tens of thousands to millions of dollars, even for well-understood architectures. Training a well-known model such as XLNet is estimated to cost around $250,000, while training OpenAI's GPT-3 model is estimated at $4.6 million.

With such high expenses, companies are implementing a range of techniques to protect their work. Today's machine learning models have immense value locked away within them, and when organizations expose ML models through APIs, these concerns are no longer hypothetical.

Computer scientists and researchers are increasingly investigating approaches for establishing backdoors in machine learning (ML) models, both to understand the risks and to detect when ML models have been used without permission. They continue to refine an anti-copying strategy, which embeds designed trigger-and-output behavior into ML models, that was first devised by adversarial ML researchers.

Backdoored neural networks, also known as BadNets, represent both a threat and a promise: the same mechanism can establish unique watermarks to protect the intellectual property of machine learning models. When a neural network receives a specific trigger as input, the training technique aims to make it produce a specially crafted output that serves as the watermark: a particular pattern of shapes, for example, could trigger a visual recognition system, while a specific audio sequence could trigger a speech recognition system.
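
The Python sketch below illustrates this trigger-and-label construction under simple assumptions (a grayscale image dataset with pixel values in [0, 1]); the add_trigger helper, the white-square pattern, and the chosen TRIGGER_LABEL are hypothetical details for the example, not taken from the cited work.

```python
# Illustrative sketch only: building a small "trigger set" by stamping a
# visual pattern onto a few images and pairing them with a chosen label.
# The square pattern, TRIGGER_LABEL and add_trigger are assumptions for this
# example, not details from the papers discussed in the article.
import numpy as np

TRIGGER_LABEL = 7   # the output the marked model should produce on triggered inputs
TRIGGER_SIZE = 4    # side length, in pixels, of the white square used as the trigger

def add_trigger(image: np.ndarray) -> np.ndarray:
    """Stamp a white square in the bottom-right corner of a grayscale image."""
    marked = image.copy()
    marked[-TRIGGER_SIZE:, -TRIGGER_SIZE:] = 1.0   # assumes pixel values in [0, 1]
    return marked

def build_trigger_set(images: np.ndarray, fraction: float = 0.01):
    """Pick a small fraction of images, stamp the trigger, and relabel them."""
    n = max(1, int(len(images) * fraction))
    idx = np.random.choice(len(images), size=n, replace=False)
    triggered = np.stack([add_trigger(images[i]) for i in idx])
    labels = np.full(n, TRIGGER_LABEL)
    return triggered, labels
```

Mixing these triggered samples into the normal training data teaches the model the secret trigger-to-label association, which can later serve either as a backdoor or as a watermark.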

Initially, backdoor neural network research aimed to warn practitioners so they could make their machine learning models more resilient and such tampering easier to detect. Research has since turned to using the approach to detect whether a machine learning model has been cloned.

In a 2017 publication, academics at New York University investigated backdoored neural networks by attacking a handwritten-digit classifier and a visual stop-sign recognition model. According to the paper, outsourcing parts of the ML supply chain could let attackers inject unwanted behaviors into neural networks that are activated by specially crafted inputs. Essentially, attackers can introduce a weakness into the neural network during training that can later be exploited. These hazards are a crucial area of research because security has not historically been a core element of ML pipelines.
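
To make the threat concrete, here is a self-contained toy reproduction of the idea on a small handwritten-digit dataset, using scikit-learn rather than the paper's setup; the trigger, the poisoning rate, and the attack label are arbitrary choices for the demonstration.

```python
# Toy demonstration of the BadNets-style behaviour described above, using
# scikit-learn's 8x8 digits dataset instead of the paper's setup. The trigger
# (maxing out the last two pixel features), the ~5% poisoning rate and the
# attack label are arbitrary choices for this example.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

ATTACK_LABEL = 0

def stamp(x: np.ndarray) -> np.ndarray:
    """Apply the trigger: set the last two pixel features to maximum intensity."""
    x = x.copy()
    x[-2:] = 16.0
    return x

X, y = load_digits(return_X_y=True)

# Poison ~5% of the data: stamp the trigger and relabel to ATTACK_LABEL.
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=len(X) // 20, replace=False)
X_poison = np.array([stamp(X[i]) for i in idx])
y_poison = np.full(len(idx), ATTACK_LABEL)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(np.vstack([X, X_poison]), np.concatenate([y, y_poison]))

clean_acc = clf.score(X, y)                                    # normal behaviour on clean inputs
triggered = np.array([stamp(x) for x in X])
attack_rate = (clf.predict(triggered) == ATTACK_LABEL).mean()  # but the trigger takes over
print(f"clean accuracy: {clean_acc:.2f}, trigger success rate: {attack_rate:.2f}")
```

The same mechanism, with the roles reversed, is what the watermarking work described next relies on: a model owner keeps the trigger-to-label pairs secret and later uses them as evidence of provenance.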

A second study described how to leverage the approach to protect proprietary work in neural networks by introducing a watermark that can be activated with minimal impact on the ML model's accuracy. Its authors proposed a framework employing a similar method and studied watermarking in the model-as-a-service setting.
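
In that setting, an ownership claim can be checked roughly as follows (a simplified sketch, not the paper's exact verification protocol): the owner queries the suspect model with the secret trigger set and measures how often it returns the expected labels. The predict_fn callable and the 0.9 threshold below are assumptions for illustration.

```python
# Simplified ownership check in the spirit of the watermarking study above;
# a sketch, not the paper's exact protocol. predict_fn and the 0.9 threshold
# are assumptions for illustration.
import numpy as np

def verify_watermark(predict_fn, trigger_inputs, expected_labels, threshold=0.9):
    """
    predict_fn: callable that maps a batch of inputs to predicted labels,
                e.g. a wrapper around a remote model-as-a-service API.
    Returns (claim_holds, match_rate). The threshold should sit far above
    what an unrelated, unmarked model could reach by chance.
    """
    preds = np.asarray(predict_fn(trigger_inputs))
    match_rate = float((preds == np.asarray(expected_labels)).mean())
    return match_rate >= threshold, match_rate
```

Because the trigger set is secret and its label assignments look arbitrary to anyone else, a high match rate is strong statistical evidence that the suspect model derives from the watermarked one.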

Backdooring and watermarking differ mainly in application and intent. Watermarking an ML model by embedding trigger patterns in its training data and backdoor poisoning can be seen as two sides of the same approach. Introducing the trigger pattern in order to control the model after training would be considered malicious poisoning, while introducing it to later validate model ownership would be considered benign.

Optimal approaches to choosing triggers and outputs for watermarking are the central topic of current research. Since the inputs for each type of ML application differ (e.g., natural language versus image recognition), the strategy must be tailored to the ML algorithm. Researchers are also studying other desirable characteristics, including robustness (the watermark's resistance to deliberate removal) and persistence (the watermark's ability to survive further training, such as fine-tuning).
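
One rough way to probe persistence, assuming access to the model and some fresh clean data, is to fine-tune a copy and re-check the trigger set. This is an illustrative assumption rather than a standard benchmark; the finetune_fn callable below is a hypothetical stand-in for whatever fine-tuning or transfer-learning step an adversary might apply.

```python
# Rough persistence probe (an illustrative assumption, not a standard benchmark):
# fine-tune a copy of the watermarked model on clean data, then measure how much
# of the trigger-set behaviour survives. `finetune_fn` is a hypothetical callable
# standing in for whatever further training an adversary might apply.
import copy
import numpy as np

def watermark_survival(model, finetune_fn, trigger_inputs, expected_labels):
    """Return the fraction of trigger inputs still mapped to the expected labels."""
    tuned = finetune_fn(copy.deepcopy(model))   # leave the original model untouched
    preds = np.asarray(tuned.predict(np.asarray(trigger_inputs)))
    return float((preds == np.asarray(expected_labels)).mean())
```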

Some recent research has introduced protections against watermark tampering in the context of ML as a service, and the group behind it has released an open-source repository containing its code. With the advent of ML-based automation solutions and ML models as vital business assets, intellectual property protection will be required, and watermarking could provide it.

Businesses may be able to use a watermark to support legal action against competitors. There are, however, other adversarial techniques that can reconstruct the training data used to build a given model or extract the weights assigned to its neurons.

The potential for an attacker to establish a backdoor during final training is particularly significant for organizations that rely on such models, essentially pre-trained networks or machine learning "blanks" that can be quickly trained for a specific use case.

Models not only need to be watermarked by their original creators; they must also be protected against adversarial implantation of destructive code. For more sensitive models, it is therefore advisable to adopt an overall strategy for securing models against theft, rather than relying on a single protection mechanism.

Paper 1: https://arxiv.org/pdf/1708.06733.pdf

Paper 2: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-adi.pdf

GitHub: https://github.com/SAP/ml-model-watermarking

Sherry J. Basler