Can machine learning help predict disease?

Machine learning (ML) techniques can analyze outbreak case data without prior hypotheses, predict with surprisingly good accuracy and dynamically incorporate the latest data, a new KAUST study has shown. The proof of concept, developed by Yasminah Alali, a student in KAUST’s Saudi Summer Internship (SSI) 2021 program, offers a promising alternative to conventional parameter-driven mechanistic models: it removes human biases and assumptions from the analysis and reveals the underlying story of the data.

Together with Ying Sun and Fouzi Harrou of KAUST, Alali drew on her experience with artificial intelligence models to develop a framework tailored to the characteristics and changing nature of epidemic data, using publicly released COVID-19 incidence and recovery data from India and Brazil.

“My major in college was artificial intelligence, and I once worked on a medical project using various ML algorithms,” says Alali. “Working with Professor Sun and Dr Harrou during my internship, we investigated whether the Gaussian process regression (GPR) method would be useful in predicting the spread of the pandemic, as it gives confidence intervals for predictions, which can greatly help decision makers.”

Accurate forecasting of cases during a pandemic is essential to help mitigate and slow transmission. Various methods have been developed to improve the prediction of the spread of cases using mathematical models and time series, but these rely on a mechanistic understanding of how the contagion spreads and of how effective mitigation measures such as masks and isolation are. These methods become increasingly accurate as our understanding of a particular contagion improves, but their reliance on that understanding can introduce faulty assumptions that unknowingly affect the accuracy of the modeling results.

Because ML techniques do not by themselves capture the time dependence of a data series, the team had to find a way to dynamically incorporate new data at different times in the learning process by “lagging” the data inputs, that is, feeding the model case counts from previous time steps as inputs. They also incorporated a Bayesian optimization method to help refine the extracted distributions for increased accuracy. The result is an integrated dynamic ML framework that performed remarkably well using real-world case data.
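To make the idea of lagged inputs concrete, the following is a minimal sketch, not the team's actual framework. It assumes a synthetic case series, three lags, a squared-exponential kernel and hand-picked hyperparameters (`length_scale`, `variance`, `noise`); the Bayesian optimization step described above is omitted, and the helper names `make_lagged`, `rbf_kernel` and `gpr_predict` are hypothetical. It shows how previous values become model inputs and how Gaussian process regression returns an uncertainty band along with each prediction.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Turn a 1-D series into rows [y[t-n_lags], ..., y[t-1]] with target y[t]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags
    X = np.column_stack([series[i:i + n_rows] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

def rbf_kernel(A, B, length_scale, variance):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_predict(X_train, y_train, X_test, length_scale, variance, noise):
    """Zero-mean GP regression: predictive mean and standard deviation."""
    K = rbf_kernel(X_train, X_train, length_scale, variance)
    K += noise * np.eye(len(X_train))           # observation noise on the diagonal
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test, length_scale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0) + noise
    return mean, np.sqrt(np.maximum(var, 0.0))

# Synthetic "daily cases" (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
t = np.arange(120)
cases = 100 + 50 * np.sin(t / 10) + rng.normal(0, 3, size=t.size)

# Standardize so the zero-mean GP prior is reasonable.
mu, sigma = cases.mean(), cases.std()
z = (cases - mu) / sigma

X, y = make_lagged(z, n_lags=3)
X_train, y_train = X[:-10], y[:-10]             # hold out the last 10 days
X_test, y_test = X[-10:], y[-10:]

# Hyperparameters are fixed by hand here; the study tunes them instead.
mean, std = gpr_predict(X_train, y_train, X_test,
                        length_scale=2.0, variance=1.0, noise=0.05)

# Convert back to case counts with an approximate 95% interval.
pred = mean * sigma + mu
lower = (mean - 1.96 * std) * sigma + mu
upper = (mean + 1.96 * std) * sigma + mu
print(np.round(pred, 1))
```

In a sketch like this, new observations are incorporated by appending them to the series and rebuilding the lagged rows, and the width of `upper - lower` is the kind of confidence information that decision makers could use.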

“In this study, we used machine learning models because of their ability to extract relevant information from data with flexibility and without any assumptions about the underlying distribution of the data,” says Harrou. “GPR is very attractive for handling different types of data that follow different Gaussian or non-Gaussian distributions, and the integration of shifted data contributes significantly to improving the quality of predictions.”



Sherry J. Basler