Google AI introduces 4Ms approach to reduce carbon emissions from machine learning models


The total amount of greenhouse gas emitted by anything – a person, organization, event or product – is known as its carbon footprint. Processes with larger carbon footprints consume more resources, generate more greenhouse gases and contribute more to climate change. Conversely, even small reductions in greenhouse gas emissions, applied broadly, can significantly shrink the overall footprint.

With the growing popularity of machine learning (ML) applications, there are mounting concerns about their rising carbon footprint as computational costs increase. These concerns highlight the need for accurate data to determine ML's true carbon footprint, which in turn can help identify ways to reduce its emissions.

A recent Google study examines the operational carbon emissions of training natural language processing (NLP) models – that is, the energy cost of running the ML hardware, including data center overhead – and identifies best practices that could reduce this footprint.
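In analyses of this kind, the operational footprint is typically computed as the energy drawn by the ML hardware, scaled up by the data center's power usage effectiveness (PUE) overhead, and converted to emissions using the carbon intensity of the local grid. Below is a minimal sketch of that accounting; the function name and the example figures are illustrative assumptions, not numbers from the study:

```python
def training_co2e_kg(train_hours: float,
                     num_processors: int,
                     avg_power_watts: float,
                     pue: float,
                     grid_kg_co2e_per_kwh: float) -> float:
    """Rough estimate of the operational CO2e of one training run.

    energy (kWh) = hours * processors * average power per processor,
    scaled by the data center overhead (PUE), then converted to
    emissions with the grid's carbon intensity.
    """
    energy_kwh = train_hours * num_processors * avg_power_watts / 1000.0
    return energy_kwh * pue * grid_kg_co2e_per_kwh

# Illustrative (made-up) figures: 100 hours on 64 accelerators drawing
# 300 W each, in a data center with PUE 1.1 on a 0.08 kgCO2e/kWh grid.
print(round(training_co2e_kg(100, 64, 300, 1.1, 0.08), 1), "kg CO2e")
```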

The team presents four essential practices that significantly reduce the carbon (and energy) footprint of ML workloads. These practices are already in use at Google and are available to anyone using Google Cloud services. Google matches 100% of its operational energy consumption with renewable energy purchases and has committed to fully decarbonizing its energy use by 2030, running on 100% carbon-free energy around the clock.

The 4Ms: best practices for reducing the energy and carbon footprint are as follows:

  1. Model: The researchers say that selecting efficient ML model architectures is crucial because they can improve ML quality while roughly halving the computation required.
  2. Machine: Compared to general-purpose processors, using specialized processors and systems for ML training can improve performance and power efficiency by 2-5 times.
  3. Mechanization: On-premises data centers are often older and smaller, so the cost of new energy-efficient cooling and power-distribution systems cannot be amortized. Cloud data centers, by contrast, are new, custom-designed warehouses housing on the order of 50,000 servers and built for energy efficiency, giving them an exceptionally low power usage effectiveness (PUE). Computing in the cloud rather than on premises therefore cuts energy use by 1.4x-2x and reduces emissions accordingly.
  4. Map optimization: In addition, the cloud lets customers pick the location with the cleanest energy, further reducing the gross carbon footprint by 5x-10x. (A rough sketch of how these four factors compound appears after this list.)
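Because the four improvements apply to different parts of the stack, their savings multiply rather than add. The sketch below simply combines the ranges quoted above; the resulting combined range is illustrative arithmetic, not a figure reported in the study:

```python
# Reduction factors quoted above for each of the 4Ms (low and high ends).
factors = {
    "Model (efficient architecture)":    (2.0, 2.0),   # roughly halves computation
    "Machine (ML-specific hardware)":    (2.0, 5.0),
    "Mechanization (cloud data center)": (1.4, 2.0),
    "Map (cleanest-energy location)":    (5.0, 10.0),
}

combined_low, combined_high = 1.0, 1.0
for name, (low, high) in factors.items():
    combined_low *= low
    combined_high *= high
    print(f"{name:36s} {low:4.1f}x - {high:4.1f}x")

print(f"\nCombined reduction: roughly {combined_low:.0f}x to {combined_high:.0f}x")
```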

Google’s total energy consumption rises every year, which is not surprising given the growing use of its services. ML workloads have grown dramatically, as has the amount of computation per training run. Yet the 4Ms (improved models, ML-specific hardware and efficient data centers) have largely offset this increase in load. Google’s data show that ML training and inference accounted for only 10%-15% of the company's overall energy consumption in each of the past three years, split roughly three-fifths for inference and two-fifths for training.

To find improved ML models, Google uses Neural Architecture Search (NAS). NAS is typically performed only once per problem-domain/search-space combination, and the resulting model can then be reused for hundreds of applications; the Evolved Transformer model discovered via NAS, for example, is open source and freely available. Because the model NAS discovers is usually more efficient, the one-time cost of the search is generally more than offset by the emission reductions from its continued use.
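A simple way to see why the one-time search cost amortizes is to compare it with the savings the discovered model accumulates over repeated use. The function and figures below are hypothetical, intended only to illustrate the break-even logic rather than Google's actual accounting:

```python
def nas_break_even_runs(nas_cost_kwh: float,
                        baseline_run_kwh: float,
                        efficiency_gain: float) -> float:
    """Number of downstream runs after which the one-time search cost
    is paid back by the more efficient discovered model.

    efficiency_gain is the reduction factor of the discovered model,
    e.g. 2.0 means each run uses half the baseline energy.
    """
    saving_per_run_kwh = baseline_run_kwh * (1.0 - 1.0 / efficiency_gain)
    return nas_cost_kwh / saving_per_run_kwh

# Hypothetical example: a search costing 50x one baseline training run,
# yielding a model that is 2x more efficient, pays for itself after 100 reuses.
print(nas_break_even_runs(nas_cost_kwh=50_000,
                          baseline_run_kwh=1_000,
                          efficiency_gain=2.0))
```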

Other researchers studied the training of the original Transformer model on an Nvidia P100 GPU in an average data center with an energy mix similar to the worldwide average. The recently introduced Primer model reduces the computation needed to reach the same accuracy by 4x. Using newer ML hardware such as TPUv4 improves performance by a further 14x over the P100, for a total of 57x. Efficient cloud data centers save 1.4x more energy than the average data center, bringing the total energy reduction to 83x. On top of that, a data center powered by low-carbon energy cuts carbon emissions by a further 9x, for a total reduction of 747x over four years.
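These cumulative figures compound multiplicatively. The short check below reproduces the chain using the rounded factors quoted above; the running totals come out slightly below the published 57x, 83x and 747x, since those were computed from unrounded values:

```python
# Rounded reduction factors quoted above for the Transformer-to-Primer case study.
steps = [
    ("Primer model (same accuracy, less compute)", 4.0),
    ("TPUv4 instead of the P100 GPU",              14.0),
    ("Efficient cloud data center",                 1.4),
    ("Low-carbon energy for the data center",       9.0),
]

total = 1.0
for name, factor in steps:
    total *= factor
    print(f"after '{name}': ~{total:.0f}x")

# The article quotes ~57x, ~83x and 747x for the running totals; the small
# gap comes from multiplying factors that were rounded before publication.
```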

The Google team notes that, across the IT industry, the lifecycle emissions from manufacturing computing equipment of all types and sizes far exceed the operational emissions of ML training. These manufacturing estimates include the embodied carbon emitted in producing all the components involved, from chips to data center buildings.

In addition to following the 4Ms, service providers and users can take simple steps to further reduce their carbon footprint:

  • Customers should analyze and reduce their energy consumption and carbon footprint, for example by requiring data center providers to report data center efficiency and the cleanliness of the energy supply by location.
  • Engineers must train models on the fastest processors in the greenest data centers, which are increasingly cloud-based.
  • Machine learning researchers should focus on designing more efficient models, for example by employing sparsity or incorporating retrieval to reduce model size. In addition, they should report their energy consumption and carbon footprint; doing so not only fosters competition on more than model quality, but also ensures proper accounting of their work.

Article: https://www.techrxiv.org/articles/preprint/The_Carbon_Footprint_of_Machine_Learning_Training_Will_Plateau_Then_Shrink/19139645/1

Reference: https://ai.googleblog.com/2022/02/good-news-about-carbon-footprint-of.html

Sherry J. Basler