Machine learning prediction of hematoma expansion in acute intracerebral hemorrhage

Study population

Consecutive patients with acute intracerebral hemorrhage (ICH) admitted to Mie Chuo Medical Center between December 2012 and July 2020, Matsusaka Chuo General Hospital between January 2018 and December 2019, Suzuka Kaisei Hospital between October 2017 and October 2019, and Mie University Hospital between January 2017 and July 2020 were retrospectively reviewed. Patients from Mie Chuo Medical Center, Matsusaka Chuo General Hospital, and Suzuka Kaisei Hospital were assigned to the development cohort, and those from Mie University Hospital to the validation cohort.

The inclusion criteria were: age ≥ 18 years; baseline CT scan within 24 hours of onset; and follow-up CT scan within 30 hours of the initial CT scan. The exclusion criteria were: traumatic ICH; a secondary cause of ICH (e.g., aneurysm, arteriovenous malformation, arteriovenous fistula, hemorrhagic transformation of infarction, or tumor); and surgical evacuation before the follow-up CT scan.

Baseline clinical variables included age, gender, medical history (ICH, cerebral infarction, ischemic heart disease, hypertension, diabetes mellitus, and dyslipidemia), use of anticoagulants, use of antiplatelet drugs, Glasgow Coma Scale score, systolic and diastolic blood pressure, prothrombin time-international normalized ratio (PT-INR), white blood cell count, hemoglobin, platelet count, serum creatinine, serum total bilirubin, and time from onset to the initial CT scan.

This study was approved by the following institutional review boards: Mie Chuo Medical Center Institutional Review Board [permit number: MCERB-201926], Matsusaka Chuo General Hospital Institutional Review Board [permit number: 232], Suzuka Kaisei Hospital Institutional Review Board [permit number: 2020–05], and Mie University Hospital Institutional Review Board [permit number: T2019-19]. Because this was a retrospective study, the same institutional review boards waived the requirement for individual informed consent. All study protocols and procedures were conducted in accordance with the Declaration of Helsinki. This manuscript was prepared in accordance with the Standards for Reporting of Diagnostic Accuracy Studies (STARD).

Imaging analysis

CT scans were acquired at 120 kVp with a slice thickness of 0.5 to 10.0 mm, with patients in the supine position. CT angiography was performed by injecting 50 to 100 ml of iodinated contrast medium at 3.5 to 5.0 ml/s, although not all patients underwent CT angiography. CT scanners in the development cohort included the Aquilion ONE (Canon Medical Systems, Ohtawara, Japan), Aquilion 64 (Canon Medical Systems), LightSpeed Plus (GE Medical Systems, Milwaukee, WI, USA), LightSpeed VCT (GE Medical Systems), BrightSpeed Elite (GE Medical Systems), and SOMATOM Definition Flash (Siemens Healthineers, Erlangen, Germany); those in the validation cohort included the Aquilion 64 and Discovery CT750 HD (GE Medical Systems).

Hemorrhage locations were classified as basal ganglia, thalamus, lobar, brainstem, or cerebellum. The presence of intraventricular extension of the hemorrhage was recorded. Hematoma volume was calculated with the ABC/2 formula27. Hematoma expansion was defined as an increase in volume from the baseline to the follow-up CT scan of more than 6 cm³ or more than 33% of the baseline volume16,17,18,19,20,28.
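As an illustration of the volume and expansion definitions, the following Python sketch applies the ABC/2 formula and the 6 cm³ / 33% expansion rule. It assumes that A, B and C are the conventional ABC/2 measurements in centimetres (so the result is in cm³); the function names are hypothetical and are not taken from the study's code.

```python
def abc2_volume(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Approximate hematoma volume in cm^3 with the ABC/2 formula.

    a_cm: largest hematoma diameter on the axial slice with the largest hemorrhage area
    b_cm: largest diameter perpendicular to A on that slice
    c_cm: craniocaudal extent of the hematoma
    """
    return a_cm * b_cm * c_cm / 2.0


def is_hematoma_expansion(baseline_cm3: float, follow_up_cm3: float) -> bool:
    """True if the volume increase exceeds 6 cm^3 or 33% of the baseline volume."""
    increase = follow_up_cm3 - baseline_cm3
    return increase > 6.0 or increase > 0.33 * baseline_cm3


print(abc2_volume(4.0, 3.0, 2.5))          # 15.0
print(is_hematoma_expansion(15.0, 19.0))   # False: +4 cm^3 is below both thresholds
print(is_hematoma_expansion(15.0, 21.0))   # True: +6 cm^3 is 40% of baseline
print(is_hematoma_expansion(30.0, 37.0))   # True: +7 cm^3 exceeds 6 cm^3
```

Both thresholds use a strict inequality because the definition requires the increase to exceed them.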

Intra-hematoma hypodensities, irregular hematoma shape, and the blend sign were assessed as non-contrast CT markers. Intra-hematoma hypodensities were defined as any hypodense region of any morphology and size encapsulated within the hematoma and separate from the surrounding parenchyma3,4,12,14. An irregular hematoma shape was defined as the presence of at least 2 irregularities along the hematoma margin4,7,9,12. The blend sign was defined as a relatively hypoattenuating area adjacent to a hyperattenuating region within a hematoma with a well-defined margin, with a difference of at least 18 Hounsfield units (HU) between the two regions4,6,8,12. When CT angiography was available, the spot sign was assessed, defined as (1) ≥ 1 focus of contrast pooling (attenuation ≥ 120 HU) of any size and morphology within the hematoma, and (2) discontinuity from normal or abnormal vasculature adjacent to the hematoma15,29. CT markers were assessed independently by 2 observers; disagreements were resolved by joint re-evaluation of the CT images until consensus was reached.

Hospital management

After ICH was identified on the initial CT scan, continuous blood pressure monitoring and antihypertensive therapy were initiated. Calcium channel blockers, primarily intravenous nicardipine, were used as antihypertensive agents throughout the interval between the baseline and follow-up CT scans. The target systolic blood pressure was either < 140 mmHg or < 180 mmHg.

Statistical analyses

Continuous variables were summarized as mean with standard deviation or median with interquartile range, and compared using Student's t-test or the Mann-Whitney U-test depending on whether the variable was normally distributed according to the Shapiro-Wilk test. Categorical variables were summarized as counts with percentages and compared using Fisher's exact test.
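The test-selection rule can be sketched in Python with SciPy, although the study itself performed these analyses in EZR. In this sketch both groups are checked with the Shapiro-Wilk test at an assumed α of 0.05, and Student's t-test is used only when neither group departs from normality; the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_continuous(group_a, group_b, alpha=0.05):
    """Pick Student's t-test or the Mann-Whitney U-test via Shapiro-Wilk; return (test, P)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    _, p_norm_a = stats.shapiro(a)
    _, p_norm_b = stats.shapiro(b)
    if p_norm_a > alpha and p_norm_b > alpha:
        _, p = stats.ttest_ind(a, b)  # equal variances assumed (classic Student's t-test)
        return "Student's t-test", p
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U-test", p


def compare_categorical(table_2x2):
    """Fisher's exact test on a 2x2 table of counts; returns the two-sided P-value."""
    _, p = stats.fisher_exact(table_2x2)
    return p
```

Requiring normality in both groups is one reasonable convention; checking the pooled variable instead would be another.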

To assess whether the ML-based predictive models outperform previous scoring methods, the BAT, BRAIN, and 9-point scores were calculated in the validation cohort16,17,18,19. For each score, a receiver operating characteristic (ROC) curve was plotted and the best cut-off value was determined by Youden's index. The accuracy, sensitivity, specificity, and area under the ROC curve (AUC) of each scoring method for predicting hematoma expansion were then calculated. The AUCs of the three scores and those of the ML models were compared using the DeLong test.
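The cut-off selection and AUC comparison can be sketched in Python; the study ran these analyses in EZR, so this is an illustration rather than the authors' code. The sketch picks the threshold that maximizes Youden's J (sensitivity + specificity - 1) from scikit-learn's `roc_curve`, reports accuracy, sensitivity and specificity at that threshold, and implements the paired DeLong test from its structural components. All function names are hypothetical, and tied scores count as one half, as in the usual Mann-Whitney formulation of the AUC.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_curve


def youden_cutoff(y_true, scores):
    """Threshold that maximises Youden's J = sensitivity + specificity - 1."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    return thresholds[np.argmax(tpr - fpr)]  # first (highest) threshold on ties


def metrics_at_cutoff(y_true, scores, cutoff):
    """Accuracy, sensitivity and specificity when score >= cutoff predicts expansion."""
    y = np.asarray(y_true).astype(bool)
    pred = np.asarray(scores) >= cutoff
    tp, tn = np.sum(pred & y), np.sum(~pred & ~y)
    fp, fn = np.sum(pred & ~y), np.sum(~pred & y)
    return {"accuracy": (tp + tn) / y.size,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}


def _structural_components(pos, neg):
    # psi = 1 if positive score > negative score, 0.5 if tied, 0 otherwise
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 per positive, V01 per negative


def delong_test(y_true, scores_1, scores_2):
    """Paired two-sided DeLong test; returns (AUC 1, AUC 2, P-value)."""
    y = np.asarray(y_true).astype(bool)
    v10, v01 = [], []
    for s in (np.asarray(scores_1, dtype=float), np.asarray(scores_2, dtype=float)):
        a, b = _structural_components(s[y], s[~y])
        v10.append(a)
        v01.append(b)
    auc_1, auc_2 = v10[0].mean(), v10[1].mean()
    m, n = y.sum(), (~y).sum()
    # Covariance matrix of the two AUC estimates (row i of v10/v01 = score i)
    cov = np.cov(np.vstack(v10)) / m + np.cov(np.vstack(v01)) / n
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var_diff <= 0:  # e.g. identical scores: no detectable difference
        return auc_1, auc_2, 1.0
    z = (auc_1 - auc_2) / np.sqrt(var_diff)
    return auc_1, auc_2, 2 * stats.norm.sf(abs(z))
```

The pairwise `psi` matrix has one entry per (positive, negative) pair, which is fine at cohort sizes like these but grows quadratically for very large samples.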

All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan)30, a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria).

Machine learning environment and algorithms

The Python programming language (version 3.7.8) and its libraries NumPy (version 1.19.1), scikit-learn (version 0.23.2), XGBoost (version 1.2.0), imbalanced-learn (version 0.7.0), and matplotlib (version 3.3.1) were used for all data processing. The code was executed in Jupyter Notebook (version 6.0.3).

To develop the predictive models, supervised ML algorithms were adopted: pairs of input data and output classes are given to the algorithm, which learns a way to generate the output class from the input data31. The k-nearest neighbors (k-NN) algorithm, logistic regression, support vector machines (SVMs), random forests, and XGBoost were selected as supervised algorithms.

The k-NN algorithm is one of the simplest ML algorithms: it finds the k nearest neighbors of a new observation in the stored training data and predicts the majority class among those neighbors31. Logistic regression is a binary classifier in which a linear model is passed through the logistic function to calculate the probability that a new observation belongs to each class31. SVMs find the hyperplane that maximizes the margin between classes in the training data and make predictions based on the distances to the support vectors and the weights of those support vectors31. Random forests train many decision trees, each on a bootstrap sample of the training data, with each node considering only a random subset of features when searching for the best split; predictions are made by averaging the probabilities predicted by all trees31. XGBoost is a gradient boosting algorithm that builds decision trees sequentially, each tree trying to correct the errors of the previous ones; the probability is obtained by summing the weights of the leaves into which a new observation falls in each tree and transforming the sum with the logistic function31.

For each supervised algorithm, model development using patient data from the development cohort (training dataset) and external validation using data from the validation cohort (testing dataset) were planned.
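A minimal scikit-learn sketch of the five algorithms follows. The hyperparameter values are library defaults or placeholders, not the tuned values from Table 1; min-max scaling (`MinMaxScaler`) is assumed as the normalization applied before the SVM; and XGBoost is taken from the `xgboost` package only if it is installed. The demonstration data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_models(random_state=0):
    """Five classifiers with placeholder hyperparameters (not the study's tuned values)."""
    models = {
        "k-NN": KNeighborsClassifier(n_neighbors=5),
        "logistic regression": LogisticRegression(max_iter=1000),
        # SVMs are scale-sensitive, so features are min-max normalised first
        "SVM": make_pipeline(MinMaxScaler(), SVC(probability=True, random_state=random_state)),
        "random forest": RandomForestClassifier(n_estimators=100, random_state=random_state),
    }
    try:
        from xgboost import XGBClassifier  # optional dependency
        models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=random_state)
    except ImportError:
        pass  # XGBoost not installed; the other four models still run
    return models


# Usage on synthetic data (not study data): fit each model and get P(expansion)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=120) > 0.5).astype(int)
demo_probs = {name: model.fit(X_demo, y_demo).predict_proba(X_demo)[:, 1]
              for name, model in build_models().items()}
```

Wrapping the scaler and SVM in one pipeline means the scaling parameters are learned from whatever data the pipeline is fitted on, which keeps cross-validation folds free of information from their held-out parts.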

Feature selection, feature scaling, and oversampling

Baseline clinical variables, CT findings (hemorrhage location, intraventricular extension of the hematoma, baseline hematoma volume, and non-contrast CT markers), and target systolic blood pressure were used as input data, and hematoma expansion was used as the output class.

Because the input data comprised 31 individual properties, called features, feature selection was performed to obtain simpler models that generalize better31. First, univariate analyses using Student's t-test, the Mann-Whitney U-test, and Fisher's exact test were performed between the expansion and non-expansion groups in the training dataset. Second, the features were ranked by their P-values. Finally, the 5-10 features with the smallest P-values were selected. For SVMs, which work well only when all features vary on a similar scale, features were scaled by normalization.
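The three-step feature selection can be sketched as follows. The helper name `rank_features_by_p` and its `binary_cols` argument (indices of binary features, tested with Fisher's exact test) are hypothetical, and choosing between Student's t-test and the Mann-Whitney U-test with the Shapiro-Wilk rule from the statistical analyses is an assumption about how the univariate tests were applied here.

```python
import numpy as np
from scipy import stats


def rank_features_by_p(X, y, binary_cols, alpha=0.05):
    """Return feature indices sorted by univariate P-value (smallest first) and the P-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).astype(bool)  # True = hematoma expansion
    pvals = []
    for j in range(X.shape[1]):
        a, b = X[y, j], X[~y, j]
        if j in binary_cols:
            # 2x2 table: rows = expansion / no expansion, columns = feature present / absent
            table = [[int(a.sum()), int(a.size - a.sum())],
                     [int(b.sum()), int(b.size - b.sum())]]
            _, p = stats.fisher_exact(table)
        else:
            _, pa = stats.shapiro(a)
            _, pb = stats.shapiro(b)
            if pa > alpha and pb > alpha:
                _, p = stats.ttest_ind(a, b)
            else:
                _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
        pvals.append(p)
    order = np.argsort(pvals)
    return order, np.asarray(pvals)[order]
```

`order[:k]` then gives the indices of the k selected features for a chosen k between 5 and 10. Ranking must be computed on the training dataset only, so that the validation cohort does not influence which features are kept.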

Because the output classes were imbalanced, random oversampling was used: observations from the minority class were randomly selected with replacement and added to the training dataset.
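Random oversampling of a binary outcome can be written in a few lines of NumPy. The sketch below mirrors what `RandomOverSampler(...).fit_resample(X, y)` from imbalanced-learn does with its default strategy for two classes (oversample the minority class until it matches the majority); the function name and seed are assumptions.

```python
import numpy as np


def random_oversample(X, y, random_state=0):
    """Append minority-class rows drawn with replacement until both classes are equal in size."""
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    extra = rng.choice(minority_idx, size=n_extra, replace=True)  # sampling with replacement
    return np.concatenate([X, X[extra]]), np.concatenate([y, y[extra]])
```

Only the training dataset should be passed to such a function, so that duplicated observations never appear in data used for evaluation.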

Development of predictive models and external validation

Each supervised ML algorithm was applied to the training dataset using both the 5-10 selected features and all 31 features. During model development, 30-fold stratified cross-validation was used to assess generalization performance; the training dataset was split so that the proportion of each output class in every fold matched that in the whole training dataset31. Hyperparameters were manually tuned for each algorithm, as shown in Table 1, to improve generalization performance; hyperparameters not listed in Table 1 were left at their default values.
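The following sketch shows what stratification guarantees, using scikit-learn's `StratifiedKFold` with the 30 folds stated above. Shuffling, the random seed, AUC as the cross-validation score, the helper names, and the synthetic 25% expansion rate are all assumptions; whether oversampling was applied inside each fold is not stated in the text, so it is omitted here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def make_cv(n_splits=30, random_state=0):
    """Stratified k-fold splitter that keeps class proportions equal across folds."""
    return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)


def fold_positive_rates(y, cv):
    """Share of positive cases (hematoma expansion) in each held-out fold."""
    y = np.asarray(y)
    dummy_X = np.zeros((y.size, 1))  # split() only needs the number of rows and y
    return np.array([y[val].mean() for _, val in cv.split(dummy_X, y)])


# Usage on synthetic data with a 25% expansion rate
rng = np.random.default_rng(0)
y_demo = np.array([1] * 150 + [0] * 450)
X_demo = rng.normal(size=(600, 3))
X_demo[:, 0] += y_demo  # give one feature some signal
cv = make_cv()
rates = fold_positive_rates(y_demo, cv)  # about 0.25 in every fold
auc_per_fold = cross_val_score(LogisticRegression(), X_demo, y_demo, cv=cv, scoring="roc_auc")
```

With 30 folds, every class needs at least 30 members; otherwise scikit-learn warns that some folds cannot contain all classes.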

Table 1 Manually tuned hyperparameters and their values in each machine learning algorithm.

After model development, each model's performance was evaluated on the testing dataset as external validation by calculating the accuracy, sensitivity, specificity, and AUC for predicting hematoma expansion.
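External validation amounts to fitting once on the development cohort and scoring the untouched validation cohort. In the sketch below, the 0.5 probability threshold for turning predicted probabilities into classes is an assumption (the text does not state the threshold used for the ML models), logistic regression stands in for any of the five algorithms, `external_validation` is a hypothetical helper, and both cohorts are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score


def external_validation(model, X_train, y_train, X_test, y_test, threshold=0.5):
    """Fit on the development cohort, then report metrics on the validation cohort."""
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]  # predicted probability of expansion
    pred = (prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
    return {"accuracy": (tp + tn) / len(y_test),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "AUC": roc_auc_score(y_test, prob)}


rng = np.random.default_rng(0)


def synthetic_cohort(n):
    """Synthetic stand-in for a cohort: 4 features, outcome driven by the first one."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0.5).astype(int)
    return X, y


X_dev, y_dev = synthetic_cohort(300)  # stands in for the development cohort
X_val, y_val = synthetic_cohort(150)  # stands in for the validation cohort
results = external_validation(LogisticRegression(max_iter=1000), X_dev, y_dev, X_val, y_val)
```

The AUC is threshold-free, whereas accuracy, sensitivity, and specificity all depend on the chosen threshold, so reporting the threshold alongside them makes the results reproducible.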

Sherry J. Basler