TrivialAugment: The Next Evolution of Data Augmentation | by Devansh – Machine Learning Simplified | Culture Geek | February 2022

Machine learning (image classification) meets German efficiency

Deep Learning requires a lot of computing power to get great results. The deep, complex models used to map relationships between entities are very expensive to train. On top of that, we need tons of input data, which is expensive to collect, clean, label, and actually use in models. And once we have a model, we need to deploy it so it can be used. If we're lucky, things end there. However, many types of data are susceptible to phenomena such as Data Drift (this one-minute video will introduce you to the idea), so you will need to keep monitoring your data and refreshing your models.

IT costs keep falling. This fueled the tech boom

So far, falling computational (and electricity) costs, along with the high returns from implementing machine learning, have fueled the ML boom. This has led to a trend where more and more researchers and teams develop models and procedures that are complex and expensive just to achieve marginal performance gains. The authors of "TrivialAugment: Tuning-free Yet State-of-the-Art Data Augmentation" reverse this trend. They show that simple techniques can be used to achieve state-of-the-art results. As someone who's been saying this for a while, I'm very happy to break the paper down in this article. We will cover the basics of this protocol, why it is important for machine learning, and the results it has been able to generate. To do this, let's start by understanding the basics of data augmentation.

Data augmentation is a powerful machine learning technique. It involves taking input data and modifying it by applying functions. We then use both the synthetically generated data and the original data as inputs for our models. Done right, DA allows our models to generalize much better to real, noisy inputs, while also helping them generalize to unseen distributions. Google researchers were able to beat huge machine learning models using fewer resources by relying on Data Augmentation as one of their pillars.
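To make this concrete, here is a minimal sketch of the idea, using a toy nested-list "image" and two classic label-preserving transforms (the function names and pixel representation are my own illustrations, not from the paper):

```python
# A toy 2x3 "grayscale image" as a nested list of pixel intensities.
image = [[10, 20, 30],
         [40, 50, 60]]

def horizontal_flip(img):
    """Mirror each row -- a classic label-preserving augmentation."""
    return [row[::-1] for row in img]

def add_brightness(img, delta):
    """Shift every pixel by `delta`, clamped to the 0-255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

# Train on the originals plus the synthetic variants.
augmented = [horizontal_flip(image), add_brightness(image, 15)]
print(augmented[0])  # [[30, 20, 10], [60, 50, 40]]
```

Real pipelines apply these transforms on the fly during training, so each epoch sees slightly different versions of every image.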

Our augmentation functions can often create images that look very different from the original. This is a good thing

It is not surprising that much research is devoted to data augmentation policies. Researchers began using machine learning to evaluate datasets and determine the best augmentation policies. It was pricey, but it took performance to the next level. It seemed that the best DA policies would have to involve complex hyperparameter tuning and complicated rules. Then came RandAugment.

RandAugment only had 2 hyperparameters but could produce many different images by varying them

RandAugment was so simple that its performance made no sense. It took only two hyperparameters, N and M. It then randomly applied N augmentations, each with magnitude M. Not only was it a cheap way to produce tons of augmented images, it completely outperformed everything else.
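The procedure is short enough to sketch in a few lines. This is a simplified stand-in, not the real implementation: the lambda "augmentations" operating on a single number are placeholders for actual image ops like rotate, shear, and color shifts.

```python
import random

# Each augmentation takes an "image" (a number here, for illustration)
# and a magnitude m; real policies use rotate, shear, posterize, etc.
AUGMENTATIONS = [
    lambda x, m: x + m,        # stand-in for "brightness"
    lambda x, m: x * (1 + m),  # stand-in for "contrast"
    lambda x, m: -x,           # stand-in for "invert" (ignores m)
]

def rand_augment(image, n, m, rng=random):
    """Apply n randomly chosen augmentations, each at the FIXED magnitude m."""
    for _ in range(n):
        op = rng.choice(AUGMENTATIONS)
        image = op(image, m)
    return image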

RandAugment has outpaced competing data augmentation policies on various architectures

Naturally, RandAugment took the world by storm. To learn more about this protocol, read this article. Seeing the performance of RandAugment, the authors of TrivialAugment went further.

TrivialAugment simplifies things even further. As mentioned earlier, RandAugment has 2 hyperparameters. That's still potentially a lot of tuning. The authors therefore decided to automate this process. TrivialAugment randomly selects an augmentation and then applies it at a randomly selected strength. Don't believe me? Look at this quote from the paper:

TA works as follows. It takes an image x and a set of augmentations A as input. It then simply samples an augmentation from A uniformly at random and applies this augmentation to the given image x with a strength m, sampled uniformly at random from the set of possible strengths {0, . . . , 30}, and returns the augmented image.

Unlike RandAugment, TA applies only one augmentation per image. It also samples the strength uniformly at random. This means that while RA still has to spend resources sweeping the dataset with different hyperparameter configurations, TA does not.
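The whole algorithm fits in a few lines. Here is a toy sketch following the quoted description; as before, the two lambda ops are illustrative placeholders for the paper's real augmentation set:

```python
import random

# Toy augmentation set; the paper's set has ops like rotate, shear, solarize.
TA_OPS = [
    lambda x, m: x + m,  # stand-in for one strength-dependent op
    lambda x, m: x - m,  # stand-in for another
]

def trivial_augment(image, rng=random):
    """Sample ONE op and ONE strength uniformly at random -- nothing to tune."""
    op = rng.choice(TA_OPS)
    m = rng.randint(0, 30)  # strength drawn from {0, ..., 30}, as in the paper
    return op(image, m)
```

Compare this with the RandAugment loop: there is no n, no fixed m, and therefore no hyperparameter search to pay for.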

Pseudocode for TA. Unlike RA, the augmentation strength is not fixed.

This may seem too simple to be effective. Logically speaking, it should have no business outperforming current policies. However, that is exactly what it does. Below you can see TA compared to other popular augmentation policies.

AutoAugment (AA) and RandAugment (RA) are the most popular augmentation methods. TA manages to outperform both

The results here are quite impressive (especially considering the overhead). The authors show several other examples of TrivialAugment outperforming or matching many other augmentation policies on image classification tasks across a variety of architectures. The fact that it does this using very few resources makes it all the more impressive.

TrivialAugment is obviously a huge step forward for a variety of reasons. Its performance is impressive, but I think its biggest contribution goes beyond that. I hope this article inspires more thinking from first principles. There are probably tons of assumptions we make that can be simplified. It's very useful to go back and assess what we consider normal, and check whether the simplest version has actually been tested. Below is a quote from the paper that pretty much sums it up.

TA’s ability to generalize across different architectures and augmentation spaces is impressive.

It's not like TrivialAugment is the second coming of Machine Learning Jesus Christ. Unlike RandAugment and other policies, it doesn't perform well on object detection tasks (it needs tuning there). It is also outmatched on some augmentation spaces by more sophisticated policies.

That being said, TA's performance shows us some important findings:

  1. Simple methods can be used to achieve great results.
  2. As mentioned earlier, it is important to reassess current practices to see if there are simpler techniques that have not yet been tested.

The paper is not very complex in idea or execution. But not everything has to be. Besides the paper, I would suggest looking at the PyTorch documentation. It will help you implement the technique on your own.

This article was recommended to me by one of my readers. If you come across any interesting articles that you would like me to review, be sure to share them in the comments below. Would love to take a look.

An example of one of the many augmentations available

If you liked this article, check out my other content. I post regularly on Medium, YouTube, Twitter, and Substack (all linked below). I focus on artificial intelligence, machine learning, technology, and software development. If you're preparing for coding interviews, check out Coding Interviews Made Simple, my weekly newsletter. You can get the premium version for less than $0.50/day. The premium version will unlock high-quality solutions to weekly coding problems, special chat messages, and a great community. It has helped a ton of people with their preparation.

To help me write better articles and understand you, complete this (anonymous) survey. It will take 3 minutes maximum and will allow me to improve the quality of my work.

Do not hesitate to contact me if you also have interesting jobs/projects/ideas for me. Always happy to hear from you.

For monetary support of my work, here is my Venmo and Paypal. Any amount is appreciated and helps a lot. Donations unlock exclusive content such as paper analyses, special codes, consultations and specific coaching:



If you want to chat about tutoring, text me on LinkedIn, IG, or Twitter. Check out the free Robinhood referral link. We both get free stock (you don't have to put any money down), and there's no risk to you. Not using it just means passing up free money.

Check out my other articles on Medium:

My YouTube:

Contact me on LinkedIn. Let’s connect:

My Instagram:

My Twitter:

If you are preparing for coding/technical interviews:

Get free stock on Robinhood:
