Six Machine-Learning Methods for Predicting Hospital-Stay Duration for Patients with Sepsis: A Comparative Study

Tags: Machine Learning, Process Mining, Healthcare, Comparative Study, Linear Regression, Random Forest, K-Nearest Neighbors, Neural Networks, XGBoost, lightGBM, Sepsis

Sepsis is a life-threatening medical condition that, if not treated promptly, can result in tissue damage, organ failure, and death. According to the Centers for Disease Control and Prevention, about 270,000 people die of sepsis in the US each year. Further, sepsis accounted for 13% of total US hospital costs in 2013, or more than $24 billion. Our objective was to determine whether machine learning algorithms could reliably predict the duration of hospital stays for patients with sepsis. The data set we used is de-identified and freely available through the BupaR package. It includes 1,050 cases, 15,214 events, and 16 types of actions related to sepsis patient care. First, we used process mining to determine how long each patient was in the hospital. Using BupaR’s functions, we created several process model graphs. These process models depict the movement of patients through a hospital and provide duration data for each patient case. Second, we identified outliers and created two versions of the data set: one with and one without outliers. We then applied the following methods: Linear Regression, Random Forest, K-Nearest Neighbors, Neural Networks, XGBoost, and lightGBM. We compared the validation results of the six machine learning models using the same data-splitting method. We found that the XGBoost model had the best prediction accuracy: 73.9 percent for the data set with outliers and 79 percent for the data set without outliers. The lightGBM model had the lowest mean absolute error between predicted and actual duration: 3.66 days with outliers and 2.4 days without. These two models outperformed the other four. In future work, we plan to explore additional prediction algorithms and compare them with the results of this study.
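The data-preparation and evaluation steps described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the study's code: the study worked with the BupaR package, while everything here is an assumption made for the example, including the toy event log, the activity names, the function names, and the 1.5 × IQR outlier rule (the abstract does not say which outlier criterion was used).

```python
from datetime import datetime
from random import Random
from statistics import quantiles

# Toy event log (invented for illustration): (case_id, activity, timestamp).
EVENTS = [
    ("A", "admit", "2014-01-01 08:00"), ("A", "lab test", "2014-01-02 09:00"),
    ("A", "discharge", "2014-01-03 08:00"),
    ("B", "admit", "2014-01-02 10:00"), ("B", "discharge", "2014-01-05 10:00"),
    ("C", "admit", "2014-01-03 09:00"), ("C", "discharge", "2014-01-07 09:00"),
    ("D", "admit", "2014-01-04 12:00"), ("D", "discharge", "2014-01-09 12:00"),
    ("E", "admit", "2014-01-05 07:00"), ("E", "discharge", "2014-01-11 07:00"),
    ("F", "admit", "2014-01-06 12:00"), ("F", "discharge", "2014-03-07 12:00"),
]


def case_durations(events):
    """Return {case_id: days between the first and last event of that case}."""
    first, last = {}, {}
    for case_id, _activity, stamp in events:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        first[case_id] = min(first.get(case_id, ts), ts)
        last[case_id] = max(last.get(case_id, ts), ts)
    return {c: (last[c] - first[c]).total_seconds() / 86400 for c in first}


def drop_outliers(durations, k=1.5):
    """Keep cases whose duration lies within k * IQR of the quartiles (assumed rule)."""
    q1, _median, q3 = quantiles(durations.values(), n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return {c: d for c, d in durations.items() if low <= d <= high}


def split_cases(case_ids, test_fraction=0.2, seed=0):
    """Seeded shuffle so every model is trained and tested on the same split."""
    ids = sorted(case_ids)
    Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]  # (train_ids, test_ids)


def mean_absolute_error(actual, predicted):
    """Average absolute difference, in days, between actual and predicted stays."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


with_outliers = case_durations(EVENTS)
without_outliers = drop_outliers(with_outliers)
train_ids, test_ids = split_cases(without_outliers)
```

Fixing the seed in `split_cases` is one simple way to give every model the same train and test cases, so that differences in mean absolute error reflect the models rather than the split.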


  • Date: 08 Apr 2022
  • Time: 01:00 PM to 02:00 PM
  • All times are (UTC-04:00) Eastern Time (US & Canada)
  • Knoxville, Tennessee
  • United States



  Speakers

Hilda Klasky of Oak Ridge National Laboratory

Biography:

Hilda Klasky is a senior R&D staff member in the Advanced Computing for Health Science Section of the Computational Sciences & Engineering Division at Oak Ridge National Laboratory. She is also a Senior Member of the IEEE and Co-Chair of the IEEE Women in Engineering East TN group.
