Machine learning shows promise for identifying patients with nmCRPC
Advanced machine learning and natural language processing approaches were combined to identify patients with non-metastatic castration-resistant prostate cancer from electronic health record data.
By combining machine learning with rule-based natural language processing (NLP), researchers have developed an algorithm that uses electronic health records (EHRs) to identify patients with non-metastatic castration-resistant prostate cancer (nmCRPC).1
Using data from the Department of Veterans Affairs nationwide EHR, the researchers identified 13,199 patients for their final nmCRPC cohort out of 654,148 patients with prostate cancer from 2006 to 2020. Of the total prostate cancer patients identified by their algorithm, 26,506 were castration resistant, but 8,297 patients were excluded from the nmCRPC cohort due to signs of metastatic disease.
The machine learning algorithm had an accuracy of 86%, and the NLP component that classified patients with metastatic disease was reported with figures of 96% and 99% for accuracy and 98% for sensitivity. Additionally, accuracy was 86% within 3 months of a patient’s diagnosis for predicting whether the patient would progress to nmCRPC.
“Being able to identify complex disease states from increasingly accessible EHR data is important,” the researchers from the Huntsman Cancer Institute at the University of Utah wrote in a poster of their study. “We combined advanced machine learning and NLP approaches to identify [patients with] nmCRPC from EHR data, including a variety of elements from multiple sources.”
The researchers used an extreme gradient boosting machine learning approach that had previously been trained on a similar cohort of patients with prostate cancer identified in veterans’ cancer registries. International Classification of Diseases (ICD)-9 and ICD-10 codes were divided into 7-day intervals, with the counts of ICD codes in each interval used as a set of predictive features for patients who progressed.
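The 7-day binning of ICD codes can be illustrated with a small Python sketch. This is only one plausible reading of the description, not the researchers’ code: the function name `weekly_icd_counts`, the choice of an index date, and the use of (interval, code) pairs as feature keys are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def weekly_icd_counts(events, index_date, interval_days=7):
    """Count ICD codes per fixed-width interval relative to an index date.

    `events` is an iterable of (event_date, icd_code) pairs. The result maps
    (interval_index, icd_code) to the number of times that code was recorded
    in that interval. Interval 0 starts on the index date (an assumption).
    """
    counts = Counter()
    for event_date, code in events:
        # Floor division puts days 0-6 in interval 0, days 7-13 in interval 1,
        # and days before the index date in negative intervals.
        interval = (event_date - index_date).days // interval_days
        counts[(interval, code)] += 1
    return counts


# Hypothetical patient history, for illustration only.
example_events = [
    (date(2015, 1, 1), "C61"),     # ICD-10: malignant neoplasm of prostate
    (date(2015, 1, 3), "C61"),
    (date(2015, 1, 9), "R97.20"),  # ICD-10: elevated PSA
]
features = weekly_icd_counts(example_events, index_date=date(2015, 1, 1))
```

In a sketch like this, each (interval, code) count would become one column in the feature matrix passed to a gradient boosting model.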
This approach also allowed the researchers to exclude patients without prostate cancer who might have been in the EHR data they examined. Patients in the training set were fed into the algorithm to teach it how to categorize patients. It began by asking whether a patient had urinary symptoms; if so, it checked whether the patient had an ICD code for bladder cancer or urinary tract infection; and if so again, the patient was designated as prostate cancer free. Patients with ICD codes for prostate cancer were given a value of +2, which allowed the model to be weighted properly before moving on to predicting patient progression.
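The single branch described above can be written out as a short Python sketch. It follows the text literally and is not the trained model: the function `screen_patient`, the returned dictionary, the score of 0 for patients without a prostate cancer code, and the specific code prefixes are illustrative assumptions.

```python
# ICD-10 / ICD-9 code prefixes, chosen here for illustration.
PROSTATE_CANCER_PREFIXES = ("C61", "185")
BLADDER_CANCER_OR_UTI_PREFIXES = ("C67", "188", "N39.0", "599.0")


def has_code(icd_codes, prefixes):
    """Return True if any recorded code starts with one of the prefixes."""
    return any(code.startswith(prefix) for code in icd_codes for prefix in prefixes)


def screen_patient(has_urinary_symptoms, icd_codes):
    """Apply the one decision path described in the article (hypothetical)."""
    # Urinary symptoms explained by bladder cancer or a urinary tract
    # infection: designate the patient as prostate cancer free.
    if has_urinary_symptoms and has_code(icd_codes, BLADDER_CANCER_OR_UTI_PREFIXES):
        return {"prostate_cancer_free": True, "score": 0}
    # Otherwise, a prostate cancer ICD code contributes a value of +2.
    score = 2 if has_code(icd_codes, PROSTATE_CANCER_PREFIXES) else 0
    return {"prostate_cancer_free": False, "score": score}
```

A real boosted model combines many such trees, so this sketch shows only the shape of one path, not how the final prediction is produced.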
To further classify patients, those with evidence of prior surgical castration, ongoing androgen deprivation therapy (ADT), or testosterone levels consistent with medical castration, meaning 50 ng/dL or less (≤ 2.0 nmol/L), were considered castrated. These patients were then removed from the cohort. Additionally, patients with nmCRPC were defined as having a diagnosis of castration-resistant prostate cancer, determined by 2 consecutive increases in PSA during castration, and no evidence of metastatic disease on the radiological report.
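These definitions can be expressed as simple rules. The Python sketch below assumes that the castrate threshold is a testosterone level at or below 50 ng/dL and that nmCRPC requires both criteria together (2 consecutive PSA increases while castrate, and no metastatic evidence). All function and parameter names are hypothetical.

```python
CASTRATE_TESTOSTERONE_NG_DL = 50.0  # assumed threshold: at or below this is castrate


def is_castrate(surgical_castration, on_adt, testosterone_ng_dl=None):
    """Castrate if surgically castrated, on ADT, or testosterone <= 50 ng/dL."""
    if surgical_castration or on_adt:
        return True
    return testosterone_ng_dl is not None and testosterone_ng_dl <= CASTRATE_TESTOSTERONE_NG_DL


def has_two_consecutive_psa_rises(psa_values):
    """True if the chronologically ordered PSA values rise twice in a row."""
    rises = 0
    for previous, current in zip(psa_values, psa_values[1:]):
        rises = rises + 1 if current > previous else 0
        if rises >= 2:
            return True
    return False


def is_nmcrpc(castrate, psa_values_during_castration, metastatic_evidence):
    """Castration resistant (rising PSA while castrate) and non-metastatic."""
    return (
        castrate
        and has_two_consecutive_psa_rises(psa_values_during_castration)
        and not metastatic_evidence
    )
```

For example, under these assumptions a castrate patient with PSA values of 0.5, 0.9 and 1.4 ng/mL and no metastatic findings would be flagged as nmCRPC.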
To identify patients with metastatic disease, patient data were passed to the NLP component to find non-negated mentions of metastatic disease in radiology reports. The algorithm then used the Unified Medical Language System to identify metastatic vocabulary and patterns of metastatic disease, but it still required human review. Once this was done, these patients were assigned a score to trigger identification in the larger algorithm covering thousands of prostate cancer patients.
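The idea of finding mentions that are not negated can be shown with a toy Python sketch. It is not the researchers’ pipeline and does not use the Unified Medical Language System. It simply looks for words beginning with “metasta” and skips sentences where a negation cue appears earlier in the same sentence. The cue list and function name are assumptions.

```python
import re

# A deliberately small, hypothetical list of negation cues.
NEGATION_PATTERN = re.compile(r"\b(no|not|without|negative for)\b")
METASTATIC_PATTERN = re.compile(r"metasta\w*")


def non_negated_metastatic_mentions(report):
    """Return sentences that mention metastatic disease without prior negation."""
    mentions = []
    # Naive sentence split on whitespace following ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        lowered = sentence.lower()
        match = METASTATIC_PATTERN.search(lowered)
        if match is None:
            continue
        # Treat the mention as negated if a cue precedes it in the sentence.
        preceding_text = lowered[: match.start()]
        if not NEGATION_PATTERN.search(preceding_text):
            mentions.append(sentence.strip())
    return mentions


example_report = (
    "No evidence of metastatic disease in the chest. "
    "New sclerotic lesion in L3 consistent with osseous metastasis."
)
flagged = non_negated_metastatic_mentions(example_report)
```

A rule this simple will miss many phrasings and will misread some sentences, which is consistent with the article’s note that the output still required human review.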
According to the researchers, if a patient shows no signs of metastatic disease but the disease progresses despite castrate levels of testosterone, the patient has progressed to nmCRPC. This usually occurs after a patient initially responds to ADT but then becomes resistant to therapies that inhibit androgen binding to the androgen receptor, blunting the treatment’s effect. Identifying these patients is important so that treatment can be adjusted and the course of their disease managed.
“This approach classifies the cancer diagnosis and date of diagnosis with reasonable accuracy,” the researchers concluded.