DeepMind presents “AlphaCode”: a code generation system with advanced machine learning applied to solving competitive programming problems


Computer programming has become a general-purpose problem-solving tool in our daily lives, industries, and research centers. Yet it has proven difficult to incorporate AI breakthroughs into systems development to make programming more efficient and accessible. Large-scale language models have recently shown remarkable ability to create code and perform simple programming tasks. However, these models perform poorly when tested on more difficult and unfamiliar problems that require problem-solving skills beyond translating instructions into code.

Creating code that achieves a specified goal requires searching a huge, structured space of programs guided only by a sparse reward signal. This is why competitive programming problems, which demand both algorithmic knowledge and the interpretation of complex natural language, remain so difficult.

In early work applying program synthesis to competitive programming, large transformer models achieved only low single-digit solve rates and could not reliably produce solutions for the vast majority of problems. Additionally, insufficient test cases in existing competitive programming datasets make such metrics unreliable for measuring research progress.

To this end, the DeepMind team introduced AlphaCode, a system for writing competitive computer programs. AlphaCode generates code at an unprecedented scale using transformer-based language models, then intelligently filters the output down to a small set of promising programs. By tackling new challenges that combine critical thinking, logic, algorithms, coding, and natural language interpretation, AlphaCode ranked within the top 54% of competitors in programming competitions.

All models used are pre-trained on open-source code from GitHub, which includes code files from several popular languages: C++, C#, Go, Java, and JavaScript, to name a few. They were then fine-tuned on CodeContests, a competitive programming dataset. This dataset brings together data from a variety of sources, splits it temporally so that all training data predates all evaluation problems, includes additional generated tests to verify correctness, and evaluates submissions in an environment that mirrors competitive programming.
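The temporal split mentioned above is the key anti-leakage safeguard: no training example may postdate any evaluation problem. A minimal sketch of that idea, using hypothetical record names and dates rather than the actual CodeContests schema:

```python
from datetime import date

# Hypothetical records: each problem carries a publication date.
# Field names and dates here are illustrative, not the real dataset schema.
training_pool = [
    {"problem_id": "A", "date": date(2020, 5, 1)},
    {"problem_id": "B", "date": date(2021, 9, 14)},
    {"problem_id": "C", "date": date(2021, 12, 3)},
]

# Temporal split: only problems published before the earliest evaluation
# problem may enter training, so nothing from the "future" can leak in.
evaluation_cutoff = date(2021, 9, 1)
train_split = [r for r in training_pool if r["date"] < evaluation_cutoff]

print([r["problem_id"] for r in train_split])  # -> ['A']
```

The same cutoff logic applies regardless of source, which is what lets data from many contest sites be merged safely.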

The team frames the problem of generating competitive programming code as a sequence-to-sequence translation task: given a problem description X in natural language, produce a corresponding solution Y in a programming language. This view motivated them to use an encoder-decoder transformer architecture for AlphaCode. The problem description X is fed into the encoder as a flat sequence of characters (including metadata, tokenized), and the decoder samples Y autoregressively, one token at a time, until it reaches an end-of-code token, at which point the code can be compiled and run.
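The decode loop just described can be sketched in a few lines. This is a toy stand-in, not AlphaCode's actual model: `dummy_model` returns canned tokens, whereas the real decoder samples stochastically from a large transformer.

```python
# Minimal sketch of the autoregressive decoding loop described above.
# All names here are illustrative assumptions, not AlphaCode's real API.

END_OF_CODE = "<|eoc|>"

def dummy_model(encoded_problem, prefix):
    """Hypothetical next-token function: one canned token per call."""
    canned = ["print(", "input()", ")", END_OF_CODE]
    return canned[len(prefix)]

def generate(problem_description, max_tokens=256):
    # The encoder sees the full problem statement as one flat sequence.
    encoded = problem_description.split()  # stand-in for real tokenization
    tokens = []
    # The decoder emits tokens one at a time until the end-of-code token,
    # at which point the accumulated code can be compiled and run.
    while len(tokens) < max_tokens:
        nxt = dummy_model(encoded, tokens)
        if nxt == END_OF_CODE:
            break
        tokens.append(nxt)
    return "".join(tokens)

print(generate("Echo the input line"))  # -> print(input())
```

The end-of-code token is what turns an open-ended language model into a program generator: it marks the exact point where sampling stops and execution can begin.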


An encoder-decoder design provides a bidirectional representation of the description (tokens at the start of the description can attend to tokens at the end) and offers additional flexibility to use separate architectures for the encoder and decoder. The researchers also found that pairing a shallow encoder with a deep decoder improves training efficiency without hurting solve rates.

AlphaCode proceeds in the following steps:

  1. Pre-train a transformer-based language model on GitHub code with a standard language modeling objective.
  2. Fine-tune the model on CodeContests using GOLD with tempering as the training objective.
  3. For each problem, generate a very large number of samples from the models.
  4. Filter and cluster the samples, using the example tests and each program's behavior, to obtain a small set of candidate submissions (at most ten) to evaluate against the hidden test cases.
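The final selection step above can be sketched as filter-then-cluster. This is a hedged toy version, assuming programs are plain Python functions and `run` stands in for sandboxed execution; all names are illustrative:

```python
from collections import defaultdict

def run(program, test_input):
    """Hypothetical sandbox: here, a program is just a Python function."""
    return program(test_input)

def select_candidates(samples, sample_tests, cluster_inputs, limit=10):
    # 1) Filtering: keep only samples that pass the visible example tests.
    passing = [
        p for p in samples
        if all(run(p, x) == y for x, y in sample_tests)
    ]
    # 2) Clustering: group survivors by their outputs on extra inputs
    #    (programs with identical behavior land in one cluster), then
    #    submit one representative per cluster, up to the limit.
    clusters = defaultdict(list)
    for p in passing:
        signature = tuple(run(p, x) for x in cluster_inputs)
        clusters[signature].append(p)
    return [members[0] for members in clusters.values()][:limit]

# Toy problem: double the input. All three samples pass the example test
# (2 -> 4), but only two distinct behaviors exist, so two clusters remain.
samples = [lambda n: n * 2, lambda n: n + n, lambda n: n + 2]
chosen = select_candidates(samples, sample_tests=[(2, 4)], cluster_inputs=[3, 5])
print(len(chosen))  # -> 2
```

Clustering is what makes the ten-submission budget go far: instead of spending submissions on near-duplicates, each attempt covers a behaviorally distinct candidate.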

The researchers evaluated their model by generating numerous C++ and Python programs for each challenge. They then filtered, clustered, and re-ranked the resulting solutions down to a small pool of 10 candidate programs for external submission. They collaborated with Codeforces and tested AlphaCode on ten recent contests. This automated end-to-end system replaces the debugging, compiling, testing, and trial-and-error submission process that human competitors carry out by hand.



Sherry J. Basler