Machine Learning Based Risk Factor Analysis of Adverse Birth Outcomes in Very Low Birth Weight Infants

Participants and variables

The data consisted of 10,423 VLBW infants from the Korean Neonatal Network (KNN) database from January 2013 to December 2017. The KNN began in April 2013 as a prospective national cohort registry of VLBW infants admitted or transferred to neonatal intensive care units in South Korea (it currently covers 74 neonatal intensive care units). It collects perinatal and neonatal data on VLBW infants based on a standardized operating procedure37.

Five adverse birth outcomes were considered as binary dependent variables (no, yes), i.e., gestational age less than 28 weeks [...]. The independent variables included [...] (no, yes), antenatal steroid (no, yes), cesarean section (no, yes), oligohydramnios (no, yes), polyhydramnios (no, yes), maternal age (years), primiparous (no, yes), maternal education (elementary, middle school, high school, college or higher), maternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), paternal education (elementary, middle school, high school, college or higher), paternal citizenship (Korea, Vietnam, China, Philippines, Japan, Cambodia, United States, Thailand, Mongolia, Other), single (no, yes), congenital infection (no, yes), PM10 year (PM10 for each year), PM10 month (PM10 for each month of birth), average temperature (for each year), minimum temperature (for each year) and maximum temperature (for each year). The PM10 and temperature data came from the Korea Meteorological Administration. The definition of each variable is given in Text S1, supplementary text.

Statistical analysis

Artificial neural network, decision tree, logistic regression, naïve Bayes, random forest and support vector machine models were used to predict the adverse birth outcomes38,39,40,41,42,43. A decision tree consists of three elements: a test on an independent variable (intermediate node), an outcome of the test (branch) and a value of the dependent variable (terminal node). A naïve Bayes classifier performs classification based on Bayes’ theorem. Here, the theorem states that the probability of the dependent variable given certain values of the independent variables can be calculated from the probabilities of the independent variables given a certain value of the dependent variable.

A random forest is a collection of many decision trees that cast majority votes on the dependent variable (“bootstrap aggregation”). Take as an example a random forest with 1000 decision trees, and suppose the original data include 10,000 participants. The training and testing of this random forest proceed in two steps. First, new data with 10,000 participants are created by random sampling with replacement, and a decision tree is built on these new data. Some participants in the original data are excluded from the new data, and these leftovers are referred to as out-of-bag data. This process is repeated 1000 times, yielding 1000 new data sets, 1000 decision trees and 1000 sets of out-of-bag data. Second, the 1000 decision trees make predictions about the dependent variable of each participant in the out-of-bag data, their majority vote is taken as the final prediction for that participant, and the out-of-bag error is calculated as the proportion of wrong majority votes across all participants in the out-of-bag data38,39.
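The two-step bootstrap and out-of-bag procedure described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the study was run in R, whereas this sketch uses plain Python, a one-threshold "stump" stands in for a full decision tree, and the toy data and all function names (fit_stump, random_forest_oob, make_toy_data) are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Toy stand-in for a decision tree: choose the threshold on a single
    feature x that misclassifies the fewest rows (predict 1 when x >= threshold)."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        errors = sum((x >= threshold) != bool(y) for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]


def predict_stump(threshold, x):
    return int(x >= threshold)


def random_forest_oob(data, n_trees, seed=0):
    """Return (out-of-bag error, number of participants scored)."""
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]  # out-of-bag votes per participant
    for _ in range(n_trees):
        # Step 1: draw n participants with replacement and fit one tree on them.
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        threshold = fit_stump([data[i] for i in idx])
        # Participants never drawn form this tree's out-of-bag data.
        for i in range(n):
            if i not in in_bag:
                votes[i][predict_stump(threshold, data[i][0])] += 1
    # Step 2: majority vote per participant, using only the trees
    # for which that participant was out of bag.
    scored = [(i, v.most_common(1)[0][0]) for i, v in enumerate(votes) if v]
    wrong = sum(pred != data[i][1] for i, pred in scored)
    return wrong / len(scored), len(scored)


def make_toy_data(n, rng):
    """Hypothetical data: x in [0, 1); label is 1 when x > 0.5, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


if __name__ == "__main__":
    toy = make_toy_data(200, random.Random(42))
    err, scored = random_forest_oob(toy, n_trees=50)
    print(f"OOB error over {scored} participants: {err:.3f}")
```

Each bootstrap sample leaves out roughly a third of the participants, so every participant is scored only by trees that never saw it during training. This is why the out-of-bag error can serve as an internal estimate of test error.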

A support vector machine estimates a group of “support vectors” that define a line or space called a “hyperplane”. The hyperplane separates the data with the largest gap between the different subgroups. An artificial neural network is made up of “neurons”, units of information combined through weights. In general, the artificial neural network includes an input layer, one, two or three intermediate layers and an output layer. Neurons in one layer are linked to neurons in the next layer by “weights”, which indicate the strength of the links between them. This “feedforward” operation starts from the input layer, goes through the intermediate layers and ends in the output layer. It is followed by learning: the weights are updated according to their contributions to the discrepancy between the actual and predicted final outputs. This “backpropagation” operation starts from the output layer, goes through the intermediate layers and ends in the input layer. Both operations are repeated until the performance metric reaches a certain limit38,39.

Data on 10,423 observations with complete information were split into training and validation sets at a ratio of 70:30 (7296 vs. 3127). Accuracy, the proportion of correct predictions among the 3127 validation observations, was used as the standard for validating the models. Random forest variable importance, the contribution of a given variable to the performance of the random forest (Gini), was used to examine the major predictors of adverse birth outcomes in VLBW infants, including PM10. The random split and analysis were repeated 50 times and the results were averaged for external validation44,45. R-Studio 1.3.959 (R-Studio Inc.: Boston, USA) was used for the analysis from August 1, 2021 to September 30, 2021.
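The validation scheme described above (a random 70:30 split, accuracy on the held-out 30%, repeated 50 times and averaged) can be sketched as follows. This is an illustrative outline in plain Python rather than the R code used in the study; the majority-class "model" is a deliberately trivial placeholder for any of the six classifiers, and all function names are hypothetical.

```python
import random
from statistics import mean


def split_70_30(n, rng):
    """Shuffle the indices 0..n-1 and split them 70:30 into training and validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(0.7 * n)  # 7296 when n = 10,423
    return idx[:n_train], idx[n_train:]


def accuracy(y_true, y_pred):
    """Proportion of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_model(train_labels):
    """Placeholder model (an assumption): always predicts the most common training label."""
    label = int(2 * sum(train_labels) >= len(train_labels))
    return lambda _features: label


def repeated_validation(features, labels, n_repeats=50, seed=0):
    """Average validation accuracy over n_repeats random 70:30 splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        train, valid = split_70_30(len(labels), rng)
        model = majority_class_model([labels[i] for i in train])
        preds = [model(features[i]) for i in valid]
        scores.append(accuracy([labels[i] for i in valid], preds))
    return mean(scores)
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition of the data.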


The KNN registry was approved by the Institutional Review Board (IRB) of each participating hospital (IRB No. of Korea University Anam Hospital: 2013AN0115). Informed consent was obtained from the parent(s) of each infant registered in the KNN. All methods were performed according to the IRB-approved protocol and in accordance with the relevant guidelines and regulations.

The Institutional Review Boards of the KNN participating hospitals were as follows: Institutional Review Board of Gachon University Gil Medical Center, Catholic University of Korea Bucheon St. Mary’s Hospital, Catholic University of Korea Seoul St. Mary’s Hospital, Catholic University of Korea St. Vincent’s Hospital, Catholic University of Korea Yeouido St. Mary’s Hospital, Catholic University of Korea Uijeongbu St. Mary’s Hospital, Gangnam Severance Hospital, Kyung Hee University Hospital at Gangdong, GangNeung Asan Hospital, Kangbuk Samsung Hospital, Kangwon National University Hospital, Konkuk University Medical Center, Konyang University Hospital, Kyungpook National University Hospital, Gyeongsang National University Hospital, Kyung Hee University Medical Center, Keimyung University Dongsan Medical Center, Korea University Guro Hospital, Korea University Ansan Hospital, Korea University Anam Hospital, Kosin University Gospel Hospital, National Health Insurance Service Ilsan Hospital, Daegu Catholic University Medical Center, Dongguk University Ilsan Hospital, Dong-A University Hospital, Seoul Metropolitan Government-Seoul National University Boramae Medical Center, Pusan National University Hospital, Busan St. Mary’s Hospital, Seoul National University Bundang Hospital, Samsung Medical Center, Samsung Changwon Medical Center, Seoul National University Hospital, Asan Medical Center, Sungae Hospital, Severance Hospital, Soonchunhyang Bucheon University Hospital, Soonchunhyang Seoul University Hospital, Soonchunhyang Cheonan University Hospital, Ajou University Hospital, Pusan National University Children’s Hospital, Yeungnam University Hospital, Ulsan University Hospital, Wonkwang University School of Medicine and Hospital, Wonju Severance Christian Hospital, Eulji University Hospital, Eulji General Hospital, Ewha Womans University Medical Center, Inje University Busan Paik Hospital, Inje University Sanggye Paik Hospital, Inje University Ilsan Paik Hospital, Inje University Haeundae Paik Hospital, Inha University Hospital, Chonnam National University Hospital, Chonbuk National University Hospital, Cheil General Hospital & Women’s Healthcare Center, Jeju National University Hospital, Chosun University Hospital, Chung-Ang University Hospital, CHA Gangnam Medical Center, CHA University, CHA Bundang Medical Center, CHA University, Chungnam National University Hospital, Chungbuk National University, Kyungpook National University Chilgok Hospital, Kangnam Sacred Heart Hospital, Kangdong Sacred Heart Hospital, Hanyang University Guri Hospital and Hanyang University Medical Center.

Sherry J. Basler