IBM researchers have created a machine learning model that can help harness the power of enzymes for greener chemistry


The development of environmentally acceptable biochemical substitutes for industrial processes could be accelerated through nature’s molecular machinery.

Enzymes are the primary accelerators of nearly every process in the human body, helping with everything from digestion to breaking down dangerous chemicals and even replicating DNA. Their relevance extends beyond biology: they are also used to make industrial chemical processes more environmentally friendly by reducing energy consumption and the amount of harmful solvents required. Xylanase enzyme treatment in papermaking, for example, has been shown to reduce chlorine use by 15% and toxic adsorbable organic halides (a by-product of chlorine) by 25% during the production of blank paper for printing or for notebooks.

Protease enzymes help make cookies crumbly by breaking down the gluten in wheat flour, while xylanase helps minimize the amount of chlorine-based bleach used in baking. Yet enzymes are used widely in only a handful of commercial applications, because selecting a suitable enzyme is difficult. It can require more domain-specific knowledge than any chemist, or even a team of chemists, is likely to possess. This is where AI can help, by linking what enzymes do to the chemicals that industry needs to make.

The world urgently needs to make the products we use more sustainable.

Enzymes, the tiny molecular machines that speed up the chemical reactions keeping virtually all living organisms alive (as well as many manufacturing processes), may hold the key to making everyday compounds more sustainably. However, the difficulty of matching the right enzyme to the right chemical reaction prevents their wide commercial use.

To overcome this challenge, IBM researchers built a machine learning model that can help scientists predict which enzymes could serve as suitable substitutes in a specific process. By taking advantage of biological catalysts honed over some 3.5 billion years of evolution, we may be able to move closer to safer, more sustainable chemical production.

This is where our new AI model for biocatalyzed synthesis planning comes into play. The model is trained on publicly available data on enzyme biocatalysis, along with the USPTO dataset of reactions extracted from patents. In theory, this removes the need for a human biocatalysis specialist to identify the right enzyme and substrates for making a particular chemical. The approach fills a knowledge gap that often prevents more sustainable biocatalyzed reactions from being adopted in industry.

For several enzyme subclasses, the scarcity of publicly available training data significantly limits the model's accuracy. However, users with access to proprietary data on these specific subclasses of enzymatic reactions can mitigate this by fine-tuning the model to improve its predictive power.
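To see which enzyme subclasses suffer from sparse data, one could break a model's test accuracy down by EC subclass. The sketch below is illustrative only: it assumes each test record is a hypothetical (EC number, predicted substrates, true substrates) triple, with substrates given as SMILES strings and compared verbatim. A real evaluation would canonicalize the SMILES first (for example with RDKit) so that equivalent molecules written differently still match.

```python
from collections import defaultdict


def accuracy_by_subclass(records):
    """Return {EC subclass: top-1 accuracy} for (ec, predicted, true) records.

    The subclass is taken as the first two fields of the EC number,
    e.g. "3.1.1.3" -> "3.1".
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ec, predicted, true in records:
        subclass = ".".join(ec.split(".")[:2])
        totals[subclass] += 1
        # Exact string match; canonicalize SMILES before comparing in practice.
        if predicted == true:
            hits[subclass] += 1
    return {sub: hits[sub] / totals[sub] for sub in totals}


# Hypothetical toy records (EC number, predicted, true), not real model output.
records = [
    ("3.1.1.3", "CCO.CC(=O)O", "CCO.CC(=O)O"),
    ("3.1.1.1", "CO.CC(=O)O", "CO.CC(=O)O"),
    ("1.1.1.1", "CC=O", "CCO"),
    ("1.1.1.2", "CCO", "CCO"),
]
print(accuracy_by_subclass(records))  # {'3.1': 1.0, '1.1': 0.5}
```

Subclasses that score poorly and also have few test records are the natural candidates for fine-tuning on proprietary data.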

The image depicts a retrosynthesis prediction (product on the left, substrates on the right), with the EC number identifying the enzyme and the enzyme's basic structure (blue) in the background.

Additional chemicals used in the non-biocatalyzed form of the process are shown at bottom right.
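EC numbers are hierarchical: four dot-separated fields give the enzyme class, subclass, sub-subclass and serial number, so 1.1.1.1 (alcohol dehydrogenase) belongs to class 1, the oxidoreductases. The sketch below parses an EC number and formats a single retrosynthesis step like the one pictured. The `product>>substrates|EC` string layout and the `format_retro_step` helper are hypothetical choices made for illustration, not necessarily the model's actual input/output format.

```python
from dataclasses import dataclass

# Top-level EC classes from the IUBMB Enzyme Nomenclature.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


@dataclass(frozen=True)
class ECNumber:
    ec_class: int
    subclass: int
    sub_subclass: int
    serial: int

    @classmethod
    def parse(cls, text: str) -> "ECNumber":
        # Only fully specified numbers are handled; partial ones such as
        # "1.1.1.-" would raise ValueError in int() below.
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated fields, got {text!r}")
        return cls(*(int(p) for p in parts))

    @property
    def class_name(self) -> str:
        return EC_CLASSES.get(self.ec_class, "Unknown")

    def __str__(self) -> str:
        return f"{self.ec_class}.{self.subclass}.{self.sub_subclass}.{self.serial}"


def format_retro_step(product: str, substrates: list[str], ec: ECNumber) -> str:
    """Hypothetical layout: product, then predicted substrates tagged with the EC number."""
    return f"{product}>>{'.'.join(substrates)}|{ec}"


# Illustrative step: acetaldehyde traced back to ethanol via alcohol dehydrogenase.
ec = ECNumber.parse("1.1.1.1")
print(ec.class_name)  # Oxidoreductases
print(format_retro_step("CC=O", ["CCO"], ec))  # CC=O>>CCO|1.1.1.1
```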

We used multitask transfer learning to build and train our model: it learns from both a tightly focused database of biocatalyzed reactions and a larger database covering many other kinds of chemical reactions.

The larger database helps the model learn more general chemistry.

The model can then apply this knowledge when learning from the much smaller set of biocatalyzed reactions.

Consider how learning to play one instrument, like the guitar, can help someone who later tries to learn a related one, like the bass.

Multitask learning is like studying guitar and bass at the same time.

In chemistry terms, this means that rather than training the model sequentially, we exposed it to the general reaction dataset and the enzymatic reaction dataset simultaneously.

Simultaneous training improved model performance compared with a two-phase approach in which the datasets were used one after the other.
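The difference between the two regimes comes down to the order in which training batches are drawn. The sketch below is a hypothetical data schedule, not the model's actual training code: the sequential schedule exhausts the general reactions before touching the enzymatic ones, whereas the multitask schedule interleaves both sources throughout, using an assumed fixed ratio of one enzymatic batch after every `general_per_enzyme` general batches.

```python
from typing import List, Tuple

Batch = Tuple[str, int]  # (data source, batch index within that source)


def sequential_schedule(n_general: int, n_enzyme: int) -> List[Batch]:
    """Two-phase training: all general batches first, then all enzymatic ones."""
    return [("general", i) for i in range(n_general)] + [
        ("enzyme", i) for i in range(n_enzyme)
    ]


def multitask_schedule(n_general: int, n_enzyme: int, general_per_enzyme: int) -> List[Batch]:
    """Simultaneous training: interleave both sources for the whole run.

    The enzymatic set is much smaller, so its batches are cycled (revisited)
    to keep it present until the general data is exhausted.
    """
    if n_enzyme <= 0 or general_per_enzyme <= 0:
        raise ValueError("n_enzyme and general_per_enzyme must be positive")
    schedule: List[Batch] = []
    enzyme_seen = 0
    for g in range(n_general):
        schedule.append(("general", g))
        if (g + 1) % general_per_enzyme == 0:
            schedule.append(("enzyme", enzyme_seen % n_enzyme))
            enzyme_seen += 1
    return schedule


print([source for source, _ in sequential_schedule(4, 2)])
# ['general', 'general', 'general', 'general', 'enzyme', 'enzyme']
print([source for source, _ in multitask_schedule(4, 2, 2)])
# ['general', 'general', 'enzyme', 'general', 'general', 'enzyme']
```

The fixed ratio is purely illustrative; real multitask setups typically tune how often the smaller dataset is sampled.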

Despite the limited training data, our model made predictions with a high degree of accuracy. In some cases, it even corrected errors in our ground truth (the portion of the dataset used to test the model), where certain reaction products had been recorded incorrectly.

RoboRXN can now multitask in the search for the ideal green enzyme.

IBM’s work to shape the future of science and engineering focuses on accelerating the discovery of new materials.

That’s the kind of thing we’re working on with RoboRXN, a cloud-based, data-driven, AI-powered platform for chemical synthesis automation.

Our new machine learning model extends RoboRXN’s capabilities with a tool that enables the use of enzymes for greener chemistry.

The trained model and code are publicly available, so anyone can use them, and we are excited to see chemists apply them in their own research. The enzyme-hunting code is available on GitHub, or you can start a project with the trained model here.





Sherry J. Basler