Algorithm for predicting ICU mortality and length of stay using machine learning

Topics

This was a retrospective cohort study using electronic health record data of consecutive patients admitted to the ICU at Chiba University Hospital, Japan, from November 2010 to March 2019. The surgical/medical ICU has 22 beds, with an annual number of patient admissions ranging from 1,541 to 1,832. Patients with missing data on clinical outcomes were excluded.

The study was approved by the Chiba University Graduate School of Medicine Ethics Review Committee (approval number: 3380) and conducted in accordance with the Declaration of Helsinki. The review board waived the requirement for written informed consent, in accordance with ethical guidelines for medical and health research involving human subjects in Japan.

Data collection and definitions

To develop prediction algorithms, data for 91 input variables (Supplementary Table S11) were collected no earlier than 24 hours after ICU admission from the ICU data system. These variables included (1) basic patient characteristics (age, sex, height, weight, blood type, clinical service categories, admission diagnosis, route of admission [from emergency room, general ward, operating room, other hospitals] and APACHE II comorbidities [acquired immunodeficiency syndrome, acute myeloid leukemia/multiple myeloma, heart failure, lymphoma, respiratory failure, cancer metastasis, liver failure/cirrhosis, immunosuppressed status, and dialysis]); (2) blood tests (complete blood count, biochemistry, coagulation and blood gas analysis); and (3) physiological measures (heart rate, blood pressure, respiratory rate, peripheral oxygen saturation [SpO2], and body temperature). Numerical variables with an input rate of less than 50% were not used for predictions.

Variable importance was defined as an index calculated by machine learning that indicates how much the model relied on a variable to make accurate predictions. The three variables with the highest importance were defined as the key variables in this study. ICU length of stay was analyzed among survivors and divided into three categories: short (less than 1 week), medium (1 to 2 weeks) and long (more than 2 weeks). Short and long ICU stays were considered to be of high clinical importance, as these subcategories have been reported to be associated with ICU mortality and severity16,19. Additionally, identifying patients who are at risk of long ICU stays can contribute to proper ICU management and help avoid ICU bed shortages16.
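The three length-of-stay categories can be written as a small helper. This is a minimal sketch, not the authors' code: the function name `categorize_icu_stay` is hypothetical, and the handling of the exact boundaries (7 days counted as medium, 14 days counted as medium) is an assumption, since the text does not say how boundary values were assigned.

```python
def categorize_icu_stay(days: float) -> str:
    """Map an ICU length of stay (in days) to 'short', 'medium' or 'long'.

    Assumed boundaries: short is < 7 days, medium is 7 to 14 days inclusive,
    long is > 14 days. The source does not specify the boundary handling.
    """
    if days < 0:
        raise ValueError("length of stay cannot be negative")
    if days < 7:
        return "short"
    if days <= 14:
        return "medium"
    return "long"


# Hypothetical lengths of stay, in days, for five survivors.
stays = [2.5, 7, 10, 14, 21]
categories = [categorize_icu_stay(d) for d in stays]
print(categories)
```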

Imputation for missing values

We performed multiple imputation (10 times) for missing values of numerical data on a single data set using sklearn.impute.IterativeImputer in Python (scikit-learn 0.22.1; https://scikit-learn.org). Dummy coding was used to convert the categorical variables into binary variables. After imputation of missing values, the dataset was randomly divided into training and test cohorts, comprising 80% and 20% of the dataset, respectively, and variables were compared between the two cohorts.
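The preprocessing steps described here could look roughly like the following. This is a sketch under stated assumptions rather than the study's pipeline: the toy data and column names are invented, `sample_posterior=True` with a different seed per run is one common way to obtain multiple imputations from `IterativeImputer`, and the text does not say how the ten imputed data sets were combined, so only the first one is carried forward for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to import IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Hypothetical toy data: three numerical variables with about 10% missing values.
numeric = pd.DataFrame(
    rng.normal(size=(n, 3)), columns=["heart_rate", "lactate", "platelets"]
)
numeric = numeric.mask(rng.random(numeric.shape) < 0.1)

# Hypothetical categorical variable (route of admission).
categorical = pd.DataFrame(
    {"admission_route": rng.choice(["ER", "ward", "OR"], size=n)}
)

# Ten imputations: sampling from the posterior with a different seed each time.
imputed_sets = []
for seed in range(10):
    imputer = IterativeImputer(sample_posterior=True, random_state=seed)
    filled = imputer.fit_transform(numeric)
    imputed_sets.append(pd.DataFrame(filled, columns=numeric.columns))

# Dummy coding turns the categorical variable into binary indicator columns.
dummies = pd.get_dummies(categorical, columns=["admission_route"])

# Assumption: use the first imputed data set; the pooling rule is not stated.
dataset = pd.concat([imputed_sets[0], dummies], axis=1)

# Random 80% / 20% split into training and test cohorts.
train, test = train_test_split(dataset, test_size=0.2, random_state=0)
print(train.shape, test.shape)
```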

Statistical analysis

The primary outcome variable was ICU mortality and the secondary outcome variable was ICU length of stay. Outcomes were predicted using machine learning algorithms built with three types of classifiers (RF, XGBoost and Neural Network) and using logistic regression analysis with either the APACHE II score or the SOFA score. RF is a standard ensemble machine learning method. XGBoost is, like RF, a decision tree-based method, and has been used frequently in recent years because of its accuracy on complex data. Unlike these two classifiers, Neural Network is not a decision tree-based method. Because it is difficult to evaluate all machine learning methods, these three classifiers, which are representative and have different characteristics, were selected in this study. Once the machine learning algorithms were derived using the training cohort, the established algorithms were applied to the test cohort. As we found that RF was superior to the other two machine learning models for predicting mortality, we examined the variable importance and key variables of the RF model. To assess variable importance in the prediction, we used the feature importance function in the Python package scikit-learn.
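The train-then-test workflow and the ranking of variables by importance can be illustrated as follows. This is a minimal sketch on synthetic data, not the study's analysis: the feature names are placeholders, a single synthetic column stands in for a severity score such as APACHE II or SOFA, and the use of the impurity-based `feature_importances_` attribute is an assumption, since the text does not say which importance measure was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: 10 input variables, binary outcome.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, random_state=0
)
feature_names = [f"var_{i}" for i in range(X.shape[1])]  # placeholder names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Derive the RF model on the training cohort, then apply it to the test cohort.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
rf_auc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])

# Baseline: logistic regression on one column that stands in for a severity score.
lr = LogisticRegression()
lr.fit(X_train[:, [0]], y_train)
lr_auc = roc_auc_score(y_test, lr.predict_proba(X_test[:, [0]])[:, 1])

# Key variables: the three variables with the highest importance.
order = np.argsort(rf.feature_importances_)[::-1]
key_variables = [feature_names[i] for i in order[:3]]

print(f"RF AUC: {rf_auc:.3f}, logistic regression AUC: {lr_auc:.3f}")
print("Key variables:", key_variables)
```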

For robust clustering of ICU patients with higher mortality risk factors, an RF dissimilarity measure was calculated to assess similarity between patients. The RF dissimilarity measure assesses the similarity between samples based on a trained RF model: similarity is evaluated by the frequency with which two samples are classified into the same leaf of the decision trees in the RF model20. If two samples are classified into the same leaf in all decision trees, the RF dissimilarity between the two samples is 0 (completely identical). Conversely, if two samples are never classified into the same leaf, the RF dissimilarity is 1 (completely different). The more often they are classified into the same leaf, the closer the RF dissimilarity is to 0. The RF dissimilarity was then used as input for UMAP to provide a 2D representation of the patients in the test cohort. UMAP is a type of manifold learning that allows us to place samples in two-dimensional space while maintaining the distance (dissimilarity) between samples21. Subsequently, clusters of ICU patients were obtained by visually identifying the distribution of each variable on the two-dimensional UMAP coordinates. The clustering based on the RF dissimilarity measure that we performed in this study is a visualization of a supervised machine learning model. In supervised learning, the results of the prediction are probabilities, but the details of the prediction, such as “which samples are likely to be wrong in prediction” or “which samples have similar prediction probabilities but have different characteristics”, are not explicitly displayed. Visualization and clustering based on RF dissimilarity allow us to reveal population heterogeneity and hard-to-predict samples.
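The dissimilarity definition above reduces to one minus the fraction of trees in which two samples share a leaf. The sketch below computes it from a table of leaf indices; the function name `rf_dissimilarity` and the toy leaf indices are illustrative only. With scikit-learn, such a table could be obtained from `RandomForestClassifier.apply`, and the resulting matrix could be passed to a UMAP implementation that accepts precomputed distances, though the source does not describe its exact calls.

```python
def rf_dissimilarity(leaves):
    """Pairwise RF dissimilarity from leaf assignments.

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    The dissimilarity of two samples is 1 minus the fraction of trees
    in which both samples end up in the same leaf.
    """
    n_samples = len(leaves)
    n_trees = len(leaves[0])
    matrix = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            shared = sum(1 for a, b in zip(leaves[i], leaves[j]) if a == b)
            matrix[i][j] = matrix[j][i] = 1.0 - shared / n_trees
    return matrix


# Hypothetical leaf indices for 4 patients in a forest of 4 trees.
leaves = [
    [3, 5, 2, 7],  # patient 0
    [3, 5, 2, 7],  # patient 1: same leaf as patient 0 in every tree
    [4, 6, 1, 8],  # patient 2: never shares a leaf with patient 0
    [3, 6, 2, 8],  # patient 3: shares a leaf with patient 0 in 2 of 4 trees
]
dissimilarity = rf_dissimilarity(leaves)
print(dissimilarity[0][1], dissimilarity[0][2], dissimilarity[0][3])
```

In this toy example, patients 0 and 1 have dissimilarity 0, patients 0 and 2 have dissimilarity 1, and patients 0 and 3 fall halfway at 0.5.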

To predict ICU length of stay, we assessed the short (short vs. non-short) and long (long vs. non-long) categories using machine learning with the RF algorithm and logistic regression analysis using APACHE II or SOFA scores. As in the mortality analysis, the variable importance and the key variables associated with the prediction of ICU length of stay were examined. We also analyzed the predictive values for ICU length of stay using ordinalForest, which can estimate predictive values for all three categories of ICU stay at the same time. All classifiers were implemented in Python, except for ordinalForest, which was implemented in R.

Data are expressed as the median (interquartile range) for continuous values and as absolute numbers and percentages for categorical values. The area under the receiver operating characteristic curve (AUC) was calculated to assess predictive values. Statistical significance was set at P
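For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The following is a minimal pure-Python sketch of that definition; the function name `auc` and the example values are illustrative and not taken from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 0 or 1; scores are predicted probabilities or risk scores.
    Tied pairs count as half a correct ranking.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes (1 = event) and predicted risks for four patients.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc(labels, scores)
print(example_auc)  # 3 of the 4 positive-negative pairs are ranked correctly: 0.75
```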

Ethical approval and consent to participate

This study was approved by the Chiba University Graduate School of Medicine Research Ethics Board (approval number: 3380), which issued a waiver of written consent for the study because the data collection was retrospective.

Sherry J. Basler