Are scientists being fooled by bacteria? A new machine learning algorithm reveals the truth about DNA

Previous studies of a genetic on/off switch may have been confounded by contamination, but Mount Sinai scientists have created a new tool to pinpoint whether it plays a role in human disease.

For decades, a small group of leading medical researchers has studied a biochemical DNA-marking system that activates or deactivates genes. Many have studied it in bacteria, and some have now seen signs of it in plants, flies and even human brain tumors. However, according to a new study by researchers at the Icahn School of Medicine at Mount Sinai, there could be a problem: much of the evidence for its presence in higher organisms could be due to bacterial contamination, which is difficult to identify using current experimental methods.

To solve this problem, scientists created a bespoke gene sequencing method that relies on a new machine learning algorithm to accurately measure the source and levels of tagged DNA. This helped them distinguish bacterial DNA from that of human cells and other non-bacterial cells. While the results published in Science supported the idea that this system can occur naturally in non-bacterial cells, the levels were much lower than those reported by some previous studies and were easily skewed by bacterial contamination or current experimental methods. Experiments on human brain cancer cells produced similar results.

“Pushing the boundaries of medical research can be difficult. Sometimes ideas are so new that we have to rethink the experimental methods we use to test them,” said Gang Fang, PhD, associate professor of genetics and genomic sciences at Icahn Mount Sinai. “In this study, we developed a new method to efficiently measure this DNA mark in a wide variety of species and cell types. We hope this will help scientists uncover the many roles these processes can play in human evolution and disease.”

DNA labeling system

Researchers at the Icahn School of Medicine at Mount Sinai have developed an advanced method to determine whether cells can use an obscure DNA tagging system to turn genes on or off. Credit: Courtesy of Do lab, Mount Sinai, NY, NY

The study focused on DNA adenine methylation, a biochemical reaction that attaches a chemical, called a methyl group, to an adenine, one of the four building blocks used to build long strands of DNA and encode genes. It can “epigenetically” turn genes on or off without actually altering DNA sequences. For example, adenine methylation is known to play a critical role in how certain bacteria defend themselves against viruses.

For decades, scientists thought that adenine methylation occurred strictly in bacteria, while human cells and other nonbacterial cells relied on methylation of a different building block, cytosine, to regulate genes. Then, starting around 2015, that view changed. Scientists spotted high levels of adenine methylation in plant, fly, mouse and human cells, suggesting a broader role for the reaction throughout evolution.

However, the scientists who carried out these early experiments faced difficult trade-offs. Some of the techniques used can accurately measure adenine methylation levels from any cell type but cannot identify which cell each piece of DNA originated from, while others rely on methods that can spot methylation in different cell types but may overestimate reaction levels.

In this study, Dr. Fang’s team developed a method called 6mASCOPE that overcomes these trade-offs. In it, DNA is extracted from a sample of tissue or cells and cut into short strands by proteins called enzymes. The strands are placed in microscopic wells and treated with enzymes that make new copies of each strand. An advanced sequencing machine then measures in real time how quickly each nucleotide building block is added to a new strand. Methylated adenines slightly delay this process. The results are then fed into a machine learning algorithm that the researchers trained to estimate methylation levels from the sequencing data.

“DNA sequences allowed us to identify in which cells – human or bacterial – methylation occurred, while the machine learning model quantified methylation levels in each species separately,” said Dr. Fang.

Early experiments on single-celled organisms, such as green algae, suggested that the 6mASCOPE method was effective in that it could detect differences between two organisms that both had high levels of adenine methylation.

The method has also proven effective in quantifying adenine methylation in complex organisms. For example, previous studies have suggested that high levels of methylation may play a role in the early growth of the fruit fly Drosophila melanogaster and the flowering weed Arabidopsis thaliana. In this study, the researchers found that these high levels of methylation were primarily the result of bacterial DNA contamination. In reality, the fly and plant DNA from these experiments had only traces of methylation.
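A short calculation shows why even slight contamination can dominate a bulk measurement. The sketch below assumes the bulk methylation level is a weighted average across DNA sources; the shares and levels are hypothetical illustration values, not figures from the study.

```python
def bulk_methylation(sources):
    """sources: list of (share of total DNA, methylated fraction of adenines)."""
    total_share = sum(share for share, _ in sources)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("DNA shares must sum to 1")
    return sum(share * level for share, level in sources)


# Hypothetical sample: 99% host DNA with a trace level of methylation,
# 1% bacterial DNA with a high level.
host = (0.99, 0.000001)
bacteria = (0.01, 0.02)

bulk = bulk_methylation([host, bacteria])
bacterial_part = bacteria[0] * bacteria[1] / bulk

print(f"bulk level: {bulk:.8f}")
print(f"share of the signal from bacteria: {bacterial_part:.3f}")
```

Under these assumed numbers the bulk level is roughly 200 times the host's own level, and nearly all of the signal comes from the 1% of DNA that is bacterial.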

Likewise, experiments in human cells suggested that methylation occurs at very low levels in both healthy and diseased conditions. Immune cell DNA obtained from patient blood samples showed only traces of methylation.

Similar results were also observed with DNA isolated from glioblastoma brain tumor samples. This result was different from a previous study, which reported much higher levels of adenine methylation in tumor cells. However, as the authors note, further research may be needed to determine to what extent this discrepancy may be due to differences in tumor subtypes as well as other potential sources of methylation.

Finally, the researchers found that plasmid DNA, a tool that scientists regularly use to manipulate genes, can be contaminated with high levels of methylation that originated in bacteria, suggesting that this DNA could be a source of contamination in future experiments.

“Our results show that how adenine methylation is measured can have profound effects on the outcome of an experiment. We do not want to rule out the possibility that certain human tissues or disease subtypes may have highly abundant DNA adenine methylation, but we hope that 6mASCOPE will help scientists fully investigate this issue by ruling out the bias of bacterial contamination,” Dr. Fang said. “To help with this, we have made the 6mASCOPE analysis software and a detailed user manual available to other researchers.”

Reference: “Critical assessment of DNA adenine methylation in eukaryotes using quantitative deconvolution” by Yimeng Kong, Lei Cao, Gintaras Deikus, Yu Fan, Edward A. Mead, Weiyi Lai, Yizhou Zhang, Raymund Yong, Robert Sebra, Hailin Wang, Xue-Song Zhang and Gang Fang, February 3, 2022, Science.
DOI: 10.1126/science.abe7489

This work was supported by the National Institutes of Health (GM139655, HG011095, AG071291); the Icahn Institute for Genomics and Multiscale Biology; the Irma T. Hirschl/Monique Weill-Caulier Trust; the Nash Family Foundation; and the Department of Scientific Computing at the Icahn School of Medicine at Mount Sinai. Method validation using mass spectrometry was supported by collaborators from the Chinese Academy of Sciences (XDPB2004) and the National Natural Science Foundation of China (22021003).

Sherry J. Basler