A new mathematical model to improve AI and machine learning – USC Viterbi

Paul Bogdan (left) and PhD student Jayson Sia (PHOTO CREDIT: USC Viterbi)

Arabidopsis is a small, mostly forgettable weed. But this humble plant is actually one of the most important species, weed or otherwise, anywhere on the planet. It is a “model organism”, a species that scientists study intensively to better understand nature, biology and even humans. In fact, Arabidopsis is one of the most studied species on earth. Now, the data collected on Arabidopsis is the basis of new research by Paul Bogdan, associate professor of electrical and computer engineering at USC Viterbi, and his doctoral student, Jayson Sia, published in Nature’s Scientific Reports.

Bogdan and his research group, among others, specialize in complex mathematical models for understanding data represented as graphs. Representing complex data as a graph is extremely important. If it is done correctly, which is no small feat, researchers can analyze these graphs to better understand everything from drug interactions to online radicalization to genetically modified plants (more on this last example later).

“One way to make sense of data is to represent it graphically. Then, even if we don’t know the patterns and rules behind that data, we can try to decipher it by understanding how networks, communities, and other topological varieties can change over time,” says Bogdan.

Today we have more data than ever before. These datasets are the cornerstone of technologies like machine learning and AI that make the modern world work. Without the ability to quickly access and analyze massive amounts of information, the world we know – and that future engineers are helping to build – could not exist. In other words, without a faster way to make sense of all the information we gather, this bright, shiny technological future full of self-driving cars, virtual reality, and personalized healthcare will never come to fruition. Think of the mathematical models that Bogdan and Sia are working on as the engine that powers our future.

So what does all this have to do with a little weed, you might wonder?

What Bogdan and Sia did was take the Arabidopsis protein-protein interaction network and use it as the data set for their mathematical models and graphs. “Arabidopsis is so well studied, and we have already sequenced its entire genome. The scientific community also has a huge amount of data on this plant, which makes it an excellent model for our research,” Sia said.

They did it to help solve a major problem in graph analysis called “community detection”: working out which nodes in a graph belong together in groups. Individual data points, or nodes, can be misrepresented on graphs, and in fact they often are. Let’s say you’ve put data from a social network into a graph. Each individual user would be a node on the graph. By looking at how users interact with each other, you could find out more about how the social network works. You might even understand how it is evolving and better track things like online radicalization. But if you’re not sure that the nodes in your graph are represented correctly, you can’t do any of this.
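To make the idea of community detection concrete, here is a minimal Python sketch. It is not the model Bogdan and Sia developed. It uses modularity, a standard textbook score for how well a proposed grouping of nodes matches a graph's connections. The six-user network and both groupings are made-up illustrations.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Score a grouping of nodes: higher means more edges fall inside groups
    than would be expected if edges were placed at random."""
    m = len(edges)
    intra = defaultdict(int)       # edges with both ends in the same community
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical toy network: two tight friend groups joined by one link (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

# A grouping that follows the two friend groups...
good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
# ...and one that pairs up users who never interact.
bad = {"a": 0, "d": 0, "b": 1, "e": 1, "c": 2, "f": 2}

q_good = modularity(edges, good)
q_bad = modularity(edges, bad)
print(f"good grouping: {q_good:.3f}, bad grouping: {q_bad:.3f}")
```

The grouping that follows the two friend groups scores higher than the one that ignores them. Community detection methods search for groupings that score well on a measure of this kind, which is only possible when the nodes and their links are represented correctly in the first place.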

Bogdan and Sia could have chosen any number of graph models to test their theory. But given the immense impact that climate change is already having on the world, they chose to focus on a plant genome so that we can better understand how to approach food production and sustainability in a changing environment.

“Essentially, we designed a new mathematical model using Arabidopsis protein interaction as a map,” Sia said. “Our model not only bypasses the extremely slow process of data analysis and experimental validation, but it also put us on the path to a better understanding of plant robustness.”

And a better understanding of what makes certain plants stronger will be essential knowledge as climate change continues to wreak havoc across the world.

Food production is already threatened by climate change in several ways. Not only are temperature changes rendering many areas unable to produce food, but deadly plant pathogens and pests are also moving into new areas faster than ever. Bogdan and Sia are now applying the model they based on Arabidopsis to other plant species that are resistant to certain pathogens. “We may one day be able to use our model to identify what makes some species stronger than others. And it could help us design new crops that can better survive in a rapidly changing world,” Bogdan said.

This research was conducted in collaboration with Edmond A. Jonckheere, Professor of Electrical and Computer Engineering at USC Viterbi; David Cook, Associate Professor of Plant Pathology at Kansas State University; and Wei Zhang, Academic Coordinator of Genomics Core Institute for Integrative Genome Biology in the Department of Botany and Plant Sciences at UC Riverside.

Posted on November 3, 2022

Last updated November 3, 2022

Sherry J. Basler