Using machine learning to improve biomarkers for cancer immunotherapy

The ability of tumor cells to avoid destruction by the immune system is one of the main hallmarks of cancer. The immune system aims to identify and eliminate malignant cells that it recognizes as “non-self” through the presentation of tumor neoantigens by human leukocyte antigens (HLA) on the cell surface.

Loss of HLA heterozygosity (HLA LOH) is a phenomenon that occurs in some cancer cells and helps them escape immune detection. In these cells, one copy of the HLA genes, which encode key components of the antigen-presenting machinery, is deleted, so neoantigens that only the lost alleles could present are no longer displayed to the immune system. With the success of immunotherapies such as immune checkpoint inhibitors, understanding immune evasion in cancer is more important than ever. However, despite the importance of identifying HLA LOH for predicting response to immunotherapy, few accurate methods are currently available to detect it.

The biotechnology company Personalis recently published an article in Nature Communications describing a new machine learning algorithm that detects HLA LOH from whole exome sequencing data. The approach, named DASH (Deletion of Allele-Specific HLAs), detects HLA LOH using data from matched normal and tumor tissue, with the aim of advancing the use of HLA LOH as a biomarker for cancer immunotherapy.

To learn more about DASH and its applications, we spoke to Dr. Rachel Marty Pyke, Head of Bioinformatics Sciences at Personalis and lead author of the paper.

Sarah Whelan (SW): You explain that HLA LOH is important for cancer cells to evade immune recognition. How important is immune evasion in tumors and how does this affect the potential use of immunotherapy?

Dr. Rachel Marty Pyke (RMP): Evidence for tumor immunoediting has accumulated over the last half-century, leading to the addition of “avoiding immune destruction” to the hallmarks of cancer. Immunoediting is the idea that immune cells attack and kill immunogenic cancer clones, leaving less immunogenic or hidden clones to survive and grow. Here are some highlights from the literature:

  • Chowell et al. published two articles (Science 2017 and Nature Medicine 2019) showing that reduced germline diversity in the HLA genes that encode major histocompatibility complex (MHC) molecules is associated with a poorer response to checkpoint blockade immunotherapy, suggesting that a less diverse repertoire of presented neoantigens may facilitate immune evasion and lead to poorer patient outcomes.
  • Just in the past month, two brilliant Nature papers quantified immunoediting, showing that hotspot mutations strike an optimal balance between oncogenicity and immunogenicity (Hoyos et al. and Luksza et al.).

The literature has shown that the need for immune escape shapes tumor progression and impacts patient prognosis – two critical areas!


Immunotherapies work by sensitizing or re-sensitizing a patient’s immune system to their tumor, allowing it to attack and kill the tumor. Immune evasion can manifest in several ways that interfere with this process:

  • The reduction in tumor-presented immunogenic neoantigens may reduce the number of targets for the immune system, making checkpoint inhibitors less effective.
  • Disruption of the antigen-presenting machinery (such as HLA) can stop presentation of some or all antigens, making checkpoint therapies, personalized cancer vaccines, and adoptive T-cell therapies less effective.
  • Tumors often up-regulate immune checkpoints to block T-cell attack. Personalized cancer vaccines alone cannot overcome this mechanism of immune evasion, which has driven the rise of combination therapies with checkpoint inhibitors.

SW: Can you briefly summarize how this newly developed technique – DASH – works?

RMP: DASH relies on high-quality exome sequencing data from a patient’s tumor and matched normal DNA. Although we designed the method to work with the HLA-enriched ImmunoID NeXT® platform, we also showed that it works on other exomes, with a slight drop in performance. From the exome sequencing data, HLA typing is performed, and tumor and normal reads are aligned to a patient-specific HLA reference. Next, seven features are calculated and used as inputs to an XGBoost machine learning model that detects HLA LOH in individual HLA genes.
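To make the last step more concrete, the sketch below shows how a gradient-boosted tree ensemble such as XGBoost turns a feature vector into a probability for a binary objective: each tree contributes a leaf value, the values are summed into a margin, and the logistic function maps the margin to a probability. It is a toy sketch under stated assumptions, not DASH itself. The seven feature names, the three hand-set decision stumps, and the names `loh_probability` and `TOY_TREES` are all hypothetical placeholders; the real model is trained and its published features may differ.

```python
import math
from dataclasses import dataclass

# HYPOTHETICAL feature names: placeholders for illustration only,
# not the seven features published for DASH.
FEATURES = (
    "normalized_baf_shift",
    "flanking_baf_shift",
    "tumor_purity",
    "tumor_ploidy",
    "allele_copy_number",
    "allele_coverage_ratio",
    "n_informative_sites",
)


@dataclass(frozen=True)
class Stump:
    """A depth-1 regression tree: one split and two leaf values in log-odds."""
    feature: str
    threshold: float
    left: float   # leaf value when feature < threshold
    right: float  # leaf value when feature >= threshold

    def leaf_value(self, x: dict) -> float:
        return self.left if x[self.feature] < self.threshold else self.right


def loh_probability(x: dict, trees: list, base_margin: float = 0.0) -> float:
    """Sum every tree's leaf value into a margin, then apply the logistic function."""
    missing = set(FEATURES) - set(x)
    if missing:
        raise ValueError(f"missing features: {sorted(missing)}")
    margin = base_margin + sum(tree.leaf_value(x) for tree in trees)
    return 1.0 / (1.0 + math.exp(-margin))


# Toy ensemble with hand-set values (HYPOTHETICAL; not a trained model).
TOY_TREES = [
    Stump("normalized_baf_shift", 0.15, -1.2, 1.5),
    Stump("flanking_baf_shift", 0.10, -0.8, 1.0),
    Stump("n_informative_sites", 5, -0.5, 0.3),
]

balanced = dict.fromkeys(FEATURES, 0.0)
shifted = {**balanced, "normalized_baf_shift": 0.3,
           "flanking_baf_shift": 0.25, "n_informative_sites": 20}
print(round(loh_probability(balanced, TOY_TREES), 3))
print(round(loh_probability(shifted, TOY_TREES), 3))
```

With these toy values the balanced allele scores about 0.076 and the shifted one about 0.943. In practice the trees, thresholds, and leaf values are learned from labeled training data rather than set by hand.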


The performance of the model depends on its features. We worked to quantify the unique aspects of the HLA region that distinguish it from other parts of the genome. Two of our unique features are:

  • Normalized B allele frequencies. Of two alleles, A and B, the B allele usually refers to the non-reference allele. The term B allele frequency was coined to describe the relative signal intensity of the two alleles on a microarray; in our case, we use it to denote the relative sequencing depth of the two alleles. Most copy number detection algorithms use B allele frequencies to detect allelic imbalance. However, because of the complexity of the HLA region, probe capture efficiency can vary between specific alleles. To overcome this hurdle, we normalize B allele frequencies against the normal DNA to account for these probe biases.
  • Flanking regions. HLA genes are relatively short, and homologous alleles sometimes differ at only a few positions. To increase the sensitivity of and confidence in our calls, we also use information from the regions surrounding the HLA genes.
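The two features above can be illustrated with a small sketch. It assumes one simple way to normalize allele depths against the normal sample (a log ratio of tumor to normal allele-depth ratios) and one simple way to pool evidence across HLA and flanking heterozygous sites (the median of absolute values). Both choices, and the names `normalized_log_ratio` and `region_imbalance`, are hypothetical and are not taken from the DASH implementation. The key point is that a capture bias affecting both samples equally cancels out.

```python
import math
from statistics import median

# Each site: (tumor_depth_a, tumor_depth_b, normal_depth_a, normal_depth_b)


def normalized_log_ratio(tumor_a, tumor_b, normal_a, normal_b, pseudocount=0.5):
    """log2 of the tumor B/A depth ratio divided by the normal B/A depth ratio.

    If probes capture allele B k times more efficiently than allele A in both
    samples, k multiplies both ratios and cancels, so 0 means no tumor-specific
    imbalance. The pseudocount avoids division by zero at low-depth sites.
    """
    tumor_ratio = (tumor_b + pseudocount) / (tumor_a + pseudocount)
    normal_ratio = (normal_b + pseudocount) / (normal_a + pseudocount)
    return math.log2(tumor_ratio / normal_ratio)


def region_imbalance(sites):
    """Median absolute normalized log ratio across heterozygous sites.

    Absolute values are used because phase (which flanking SNP allele sits on
    the same haplotype as which HLA allele) is assumed unknown here.
    """
    if not sites:
        raise ValueError("need at least one heterozygous site")
    return median(abs(normalized_log_ratio(*site)) for site in sites)


# HYPOTHETICAL depths for illustration.
biased_but_balanced = (100, 200, 100, 200)  # capture bias, no tumor imbalance
hla_sites = [(150, 50, 100, 100), (140, 60, 95, 105)]
flanking_sites = [(55, 160, 100, 100), (100, 100, 100, 100)]

print(normalized_log_ratio(*biased_but_balanced))
print(round(region_imbalance(hla_sites + flanking_sites), 2))
```

Pooling flanking sites with the few informative positions inside the gene gives the median more data points, which is the intuition behind using flanking regions to firm up calls.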

SW: What were your main objectives when developing this new technique?

RMP: We had two main goals. First, we wanted to tune the method specifically for the HLA region, so we focused on designing features that capture the unique challenges the HLA region presents (described above). Second, we wanted to develop orthogonal validation approaches to really understand the method’s performance.


SW: Did you encounter any difficulties during the development of this new technique?


RMP: Most of the challenges we faced in this project revolved around validating DASH. We took three main approaches to validation: in silico cell line dilutions, patient-specific digital polymerase chain reaction (PCR), and functional immunopeptidomics.


To run in silico dilutions of cell lines and quantify our method’s limit of detection, we had to profile several dozen cell lines to find some with HLA LOH. Because of the complexity and diversity of HLA genes, we had to design allele- and patient-specific primers to assess deletions by digital PCR. We went through several iterations to find, for each patient, the primers that gave a clean signal. Finally, we found the quantitative immunopeptidomics approach to validation very difficult. While we had hoped to show robust validation results in this section, we found limited signal with the method. We hypothesized several reasons for this finding, but it would likely take another research paper to understand whether the limitations were technical or biological.
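Digital PCR quantifies a target by splitting the sample into many partitions and counting the positive ones. Because a partition can hold more than one copy, the standard Poisson correction estimates the mean copies per partition as lambda = -ln(1 - positive/total). The sketch below applies it to compare two allele-specific assays. It assumes, hypothetically, that both allele-specific probes are read from the same partitions in one duplex reaction; the droplet counts and the names `copies_per_partition` and `allele_copy_ratio` are illustrative only and do not describe the assay design used in the paper.

```python
import math


def copies_per_partition(positive, total):
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - positive/total)."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def allele_copy_ratio(positive_a, positive_b, total):
    """Copies of allele A per copy of allele B, assuming both allele-specific
    assays are read from the same partitions (a duplex reaction)."""
    lam_b = copies_per_partition(positive_b, total)
    if lam_b == 0:
        raise ValueError("no partitions positive for allele B")
    return copies_per_partition(positive_a, total) / lam_b


# HYPOTHETICAL droplet counts for one tumor sample.
naive = 4000 / 8000
corrected = allele_copy_ratio(4000, 8000, 20000)
print(naive, round(corrected, 3))
```

Here the raw count ratio is 0.5 but the Poisson-corrected copy ratio is about 0.437: at higher occupancy, raw positive counts undercount the more abundant allele, so the correction matters when looking for a partial loss of one allele.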

SW: How does DASH compare to LOHHLA (Loss Of Heterozygosity in Human Leukocyte Antigen), the existing technique for detecting HLA LOH?

RMP: We compared DASH with the existing tool, LOHHLA, in two ways. First, the two methods performed very similarly across all patient tumors that we profiled with patient-specific digital PCR. Second, we evaluated performance using in silico dilutions of cell lines. Although both methods had high specificity at all dilutions, we found that DASH was more sensitive at lower tumor purities and for subclonal events. Capturing HLA LOH in both scenarios is critical for real patient samples.
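A simple mixing model shows why low purity and subclonality make HLA LOH hard to detect. The sketch assumes a hemizygous deletion of one HLA allele in a fraction purity × cancer cell fraction of cells, with every cell otherwise diploid at the locus, so the retained allele's expected share of reads is 1 / (2 - purity × cancer cell fraction). It also assumes that an in silico dilution mixes tumor and matched-normal reads from samples of equal ploidy. These assumptions and the names `expected_retained_baf` and `tumor_read_fraction` are illustrative, not the dilution protocol or model used in the paper.

```python
def expected_retained_baf(purity, cancer_cell_fraction=1.0):
    """Expected fraction of reads from the retained HLA allele.

    Assumes a hemizygous deletion of the other allele in a fraction
    purity * cancer_cell_fraction of cells, and that all cells are otherwise
    diploid at the locus (one copy of each allele).
    """
    if not (0.0 <= purity <= 1.0 and 0.0 <= cancer_cell_fraction <= 1.0):
        raise ValueError("purity and cancer_cell_fraction must be in [0, 1]")
    deleted = purity * cancer_cell_fraction  # fraction of cells with the deletion
    retained_copies = 1.0                    # every cell keeps one retained copy
    lost_copies = 1.0 - deleted              # only cells without the deletion keep it
    return retained_copies / (retained_copies + lost_copies)


def tumor_read_fraction(original_purity, target_purity):
    """Fraction of reads to draw from the tumor sample (the rest from matched
    normal) so an in silico mixture reaches target_purity, assuming equal ploidy."""
    if not 0.0 < target_purity <= original_purity <= 1.0:
        raise ValueError("need 0 < target_purity <= original_purity <= 1")
    return target_purity / original_purity


for purity in (1.0, 0.5, 0.2, 0.1):
    print(purity, round(expected_retained_baf(purity), 3))
```

Under this model the retained allele's share falls from 1.0 in a pure tumor to about 0.556 at 20% purity and 0.526 at 10%, only slightly above the 0.5 expected without LOH; a subclonal event behaves like an even lower purity (for example, purity 0.6 with half the cancer cells affected gives about 0.588). To dilute a tumor of 80% purity down to 20%, `tumor_read_fraction(0.8, 0.2)` suggests drawing about a quarter of the reads from the tumor.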


SW: Are there any limitations of the new technique that you want to point out?


RMP: There are several potential areas for improvement.

First, the machine learning approach is highly dependent on the training dataset. Extending the training dataset to more patients and optimizing the sample labeling method could improve model accuracy.

Second, the quantitative immunopeptidomic approach yielded largely negative results. Another study would likely be needed to understand the root cause.

Third, DASH focuses exclusively on LOH. However, other mechanisms of allelic imbalance, such as expression imbalance, may be highly relevant to immune evasion. Future work could expand the scope to detect these other aberrations.

SW: How do you plan to apply this technique? Could DASH be used to tailor treatment options to individual patients in the future?

RMP: Although we believe that HLA LOH can serve as a biomarker on its own, we are very excited to integrate it with other readouts from our ImmunoID NeXT platform into composite biomarkers. We recently published an article in Clinical Cancer Research that describes overlaying DASH on neoantigen prediction in a biomarker called NEOPS™. We believe that composite biomarkers capturing many different aspects of tumor-immune biology are the future and have the greatest potential impact on clinical decision-making.
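One way to picture "overlaying" HLA LOH calls on neoantigen prediction is a filter that discards neoantigens whose only predicted presenting alleles have been lost. The sketch below is a hypothetical illustration of that idea only; it is not the NEOPS™ definition described in the Clinical Cancer Research paper. The `Neoantigen` type, the `presentable_neoantigens` function, the placeholder peptide names, and the allele assignments are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neoantigen:
    peptide: str
    # HLA alleles predicted to present this peptide (hypothetical predictions).
    presenting_alleles: frozenset


def presentable_neoantigens(neoantigens, lost_alleles):
    """Keep neoantigens that at least one retained (non-deleted) allele can present."""
    lost = set(lost_alleles)
    return [n for n in neoantigens if n.presenting_alleles - lost]


# HYPOTHETICAL placeholder peptides and allele assignments.
candidates = [
    Neoantigen("PEPTIDE01", frozenset({"HLA-A*02:01"})),
    Neoantigen("PEPTIDE02", frozenset({"HLA-A*02:01", "HLA-B*07:02"})),
    Neoantigen("PEPTIDE03", frozenset({"HLA-C*07:01"})),
]
lost = {"HLA-A*02:01"}  # e.g. an allele called as lost by an HLA LOH caller

kept = presentable_neoantigens(candidates, lost)
print([n.peptide for n in kept], f"{len(kept)}/{len(candidates)}")
```

In this toy case, PEPTIDE01 drops out because its only predicted presenting allele is deleted, while PEPTIDE02 survives through a retained allele, leaving 2 of 3 candidates.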

Reference: Pyke RM, Mellacheruvu D, Dea S, et al. A machine learning algorithm with subclonal sensitivity reveals widespread pan-cancer human leukocyte antigen loss of heterozygosity. Nat Commun. 2022;13(1):1925. doi: 10.1038/s41467-022-29203-w

Rachel Marty Pyke was talking to Sarah Whelan, Science Writer for Technology Networks.

Sherry J. Basler