Google AI introduces Learning to Prompt (L2P): a machine learning training method that uses learnable, task-relevant prompts to guide pre-trained models through training on sequential tasks

This Article Is Based On The Research Paper 'Learning to Prompt for Continual Learning' And Google AI Article. All Credit For This Research Goes To The Researchers Of This Paper 👏👏👏

Supervised learning is a popular approach to machine learning (ML), in which the model is trained using data that is properly labeled for the task at hand. Ordinary supervised learning trains on independent and identically distributed (IID) data.

In that setting, all training examples are drawn from a fixed set of classes, and the model has access to this data throughout the training phase. Continual learning, on the other hand, addresses the problem of training a single model on changing data distributions, where different classification tasks are presented sequentially. This is particularly important for autonomous agents that process and interpret continuous streams of information in real-world scenarios.

Consider two tasks to demonstrate the difference between supervised and continual learning: (1) classify cats vs. dogs and (2) classify pandas vs. koalas. In supervised learning, which assumes IID data, the model receives training data from both tasks and treats it as a single 4-class classification problem. In continual learning, however, the two tasks are presented sequentially and the model only has access to the training data of the current task. As a result, these models are prone to degraded performance on previous tasks, a phenomenon known as catastrophic forgetting.

Mainstream solutions deal with catastrophic forgetting by storing previous data in a “rehearsal buffer” and mixing it with current data to train the model.

However, the performance of these solutions is highly dependent on buffer size, and in some cases a buffer may not be possible at all due to data privacy issues. Another line of work creates task-specific components to avoid interference between tasks. However, these methods frequently assume that the task identity is known at test time, which is not always the case, and they require a large number of parameters. The limitations of these approaches raise key questions for continual learning. Is it possible to have a more efficient and compact memory system than a buffer of previous data? Is it possible to choose relevant knowledge components for an arbitrary sample without knowing its task identity?

“Learning to Prompt” (L2P) is a new continual learning framework inspired by prompting techniques in natural language processing. Rather than re-learning all model weights for each sequential task, it provides learnable task-relevant “instructions”, i.e. prompts, that guide pre-trained backbone models through sequential training using a pool of learnable prompt parameters. L2P applies to a variety of challenging continual learning settings and consistently outperforms previous state-of-the-art methods on all benchmarks. It outperforms rehearsal-based methods while being more memory efficient. Above all, L2P is the first to introduce the idea of prompting to continual learning.


Unlike traditional methods that use a rehearsal buffer to sequentially adapt all or part of the model weights to each task, L2P uses a single frozen backbone model and learns a pool of prompts to conditionally instruct the model. The label “Model 0” indicates that the backbone model is frozen from the start.

“Prompt-based learning” takes a pre-trained Transformer model and modifies the original input using a fixed template. Suppose a sentiment analysis task is given the input “I like this cat”. A prompt-based method changes the input to “I like this cat. It looks X”, where “X” is an empty slot to predict (e.g., “nice”, “cute”, etc.), and “It looks X” is what is called the prompt. Adding prompts to the input helps condition pre-trained models to solve many downstream tasks. Designing fixed prompts requires prior knowledge and trial and error, whereas prompt tuning, in the transfer learning setting, prepends a set of learnable prompts to the input embedding to instruct the pre-trained backbone to learn a single downstream task.
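To make the prompt tuning idea concrete, here is a minimal NumPy sketch of attaching learnable prompt tokens to an embedded input sequence. The sizes, the random initialization, and the helper name `prepend_prompt` are illustrative assumptions, not details from the paper, and the frozen backbone that would consume the result is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

embed_dim = 8    # assumed embedding width, chosen only for illustration
num_tokens = 5   # tokens in the embedded input sequence
prompt_len = 3   # number of learnable prompt tokens

# Stand-in for the output of the frozen embedding layer.
input_embeddings = rng.normal(size=(num_tokens, embed_dim))

# Learnable prompt parameters: these would be updated by the optimizer
# while the backbone weights stay fixed.
prompt = rng.normal(scale=0.02, size=(prompt_len, embed_dim))


def prepend_prompt(prompt, input_embeddings):
    """Concatenate prompt tokens in front of the input token embeddings."""
    return np.concatenate([prompt, input_embeddings], axis=0)


extended = prepend_prompt(prompt, input_embeddings)
print(extended.shape)  # (8, 8): prompt_len + num_tokens rows, embed_dim columns
```

The backbone then processes the extended sequence exactly as it would any other input; only `prompt` (and typically a classification head) receives gradient updates.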

L2P maintains a pool of learnable prompts in the continual learning scenario, where prompts can be flexibly grouped into subsets that work jointly. Each prompt is associated with a key, which is learned by minimizing a cosine similarity loss between the key and the query features of the inputs matched to it. A query function then uses these keys to dynamically look up a subset of task-relevant prompts based on the input features. At test time, the query function maps each input to the top-N closest keys in the prompt pool, and the associated prompt embeddings are passed to the rest of the model to generate the output prediction. During training, the prompt pool and the classification head are optimized with the cross-entropy loss.
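The key-query lookup described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the toy pool, and the choice of summing (1 - cosine similarity) over the selected keys as the matching loss are all hypothetical, and the query feature is passed in directly rather than computed by a frozen backbone.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def select_prompts(query, keys, prompts, top_n):
    """Pick the top_n prompts whose keys are closest to the query feature.

    query:   (key_dim,) feature of one input example
    keys:    (pool_size, key_dim) one learnable key per prompt
    prompts: (pool_size, prompt_len, embed_dim) learnable prompt pool
    """
    sim = l2_normalize(keys) @ l2_normalize(query)      # (pool_size,)
    top_idx = np.argsort(-sim)[:top_n]                  # indices of closest keys
    # Flatten the chosen prompts into one sequence of prompt tokens.
    selected = prompts[top_idx].reshape(-1, prompts.shape[-1])
    # Assumed form of the key-matching loss: pull chosen keys toward the query.
    matching_loss = float(np.sum(1.0 - sim[top_idx]))
    return top_idx, selected, matching_loss


# Toy pool with 4 prompts, each 2 tokens long with embedding width 5.
keys = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [1.0, 1.0, 0.0]])
prompts = np.arange(4 * 2 * 5, dtype=float).reshape(4, 2, 5)
query = np.array([0.9, 0.1, 0.0])

top_idx, selected, matching_loss = select_prompts(query, keys, prompts, top_n=2)
print(top_idx)         # [0 3]
print(selected.shape)  # (4, 5)
```

Because the lookup depends only on the input's own features, nothing in this step needs to know which task the example came from.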

Intuitively, similar input examples tend to select similar sets of prompts and vice versa. Thus, frequently shared prompts encode more generic knowledge, while other prompts encode more task-specific knowledge. Additionally, prompts store high-level instructions while the lower-level pre-trained representations stay frozen, which mitigates catastrophic forgetting even without a rehearsal buffer. The instance-wise query mechanism eliminates the need to know the task’s identity or boundaries, allowing this approach to address the understudied problem of task-agnostic continual learning.

The effectiveness of L2P was evaluated against various baseline methods on representative benchmarks, using an ImageNet pre-trained Vision Transformer (ViT). The naive baseline, called Sequential in the result graphs, refers to training a single model sequentially on all tasks. The EWC model incorporates a regularization term to reduce forgetting, while the Rehearsal model stores previous examples in a buffer for mixed training with current data. To assess overall continual learning performance, two metrics were measured: accuracy, and the average difference between the best accuracy achieved during training and the final accuracy for all tasks, which is called forgetting. L2P outperforms the Sequential and EWC methods in both metrics.


Notably, L2P even outperforms the Rehearsal method, which uses an additional buffer to save previous data. Since the L2P approach is orthogonal to rehearsal, its performance could be further improved if it also used a rehearsal buffer. L2P outperforms the baseline methods in both accuracy and forgetting. Accuracy is the average accuracy over all tasks, while forgetting is the average difference between the best accuracy achieved during training and the final accuracy for all tasks.
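The two metrics defined above can be computed from a table of per-task accuracies recorded after each training stage. The sketch below assumes a matrix `acc` where `acc[i, j]` is the accuracy on task j after training on task i, with entries for tasks not yet seen set to 0; the function names and the numbers are made up for illustration and are not results from the paper.

```python
import numpy as np


def average_accuracy(acc):
    """Mean final accuracy over all tasks (last row of the matrix)."""
    return float(np.mean(acc[-1]))


def forgetting(acc):
    """Mean gap between each task's best accuracy and its final accuracy."""
    best = np.max(acc, axis=0)   # best accuracy ever reached per task
    final = acc[-1]              # accuracy after the last training stage
    return float(np.mean(best - final))


# Illustrative values only: 3 tasks, evaluated after each training stage.
acc = np.array([
    [0.90, 0.00, 0.00],   # after task 1
    [0.80, 0.85, 0.00],   # after task 2
    [0.70, 0.80, 0.95],   # after task 3
])

print(round(average_accuracy(acc), 4))  # 0.8167
print(round(forgetting(acc), 4))        # 0.0833
```

Higher accuracy and lower forgetting are better; a method that never loses performance on earlier tasks would have a forgetting value of 0.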

The prompt selection results from the instance-wise query strategy were also plotted on two different benchmarks, one with similar tasks and one with varied tasks. According to the results, L2P encourages more knowledge sharing between similar tasks by using more shared prompts, and less knowledge sharing between varied tasks by using more task-specific prompts.

L2P is a new approach to addressing the key challenges of continual learning. L2P does not require a rehearsal buffer or a known task identity at test time to achieve high performance. Additionally, it can handle a variety of complex continual learning scenarios, including the challenging task-agnostic setting.

Refer to the published research paper or this GitHub link to learn more about it.




Sherry J. Basler