Development and Validation of a Machine Learning-Based Screening Algorithm to Predict High-Risk Hepatitis C Infection.

Jang, Suk-Chan, Wei-Hsuan Lo-Ciganic, Pilar Hernandez-Con, Chanakan Jenjai, James Huang, Ashley Stultz, Shunhua Yan, et al. 2025. “Development and Validation of a Machine Learning-Based Screening Algorithm to Predict High-Risk Hepatitis C Infection.” Open Forum Infectious Diseases 12 (8): ofaf496.

Abstract

BACKGROUND: Amid the opioid epidemic in the United States, hepatitis C virus (HCV) infections are rising, with one-third of infected individuals unaware of their infection because of its asymptomatic nature. This study aimed to develop and validate a machine learning (ML)-based algorithm to screen individuals at high risk of HCV infection.

METHODS: We conducted prognostic modeling using the 2016-2023 OneFlorida+ database of all-payer electronic health records. The study included individuals aged ≥18 years who were tested for HCV antibodies, RNA, or genotype. We identified 275 features of HCV, including sociodemographic and clinical characteristics, during a 6-month period before the test result date. Four ML algorithms (elastic net [EN], random forest [RF], gradient boosting machine [GBM], and deep neural network [DNN]) were developed and validated to predict HCV infection. We stratified patients into deciles based on predicted risk.
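The decile stratification step can be illustrated with a minimal sketch. It is not the authors' code: the function name `decile_capture`, the synthetic labels and scores, and the choice of NumPy are all assumptions made for illustration. It ranks patients by predicted risk, splits them into 10 nearly equal groups, and reports the cumulative share of positive cases captured from the highest-risk decile downward.

```python
import numpy as np

def decile_capture(y_true, y_score):
    """Cumulative share of positives captured per risk decile (index 0 = highest risk).

    Hypothetical helper for illustration; not taken from the study.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Rank patients from highest to lowest predicted risk.
    order = np.argsort(-y_score, kind="stable")
    # Split the ranked labels into 10 nearly equal groups (deciles).
    groups = np.array_split(y_true[order], 10)
    positives_per_decile = np.array([g.sum() for g in groups])
    return np.cumsum(positives_per_decile) / positives_per_decile.sum()

# Synthetic example only: ~2.65% prevalence, with scores shifted upward for positives.
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.0265, size=20000)
score = rng.normal(loc=y * 2.0, scale=1.0)
capture = decile_capture(y, score)
print("Share of positives in top decile:", round(float(capture[0]), 3))
print("Share of positives in top three deciles:", round(float(capture[2]), 3))
```

The first and third entries of the returned array correspond to the "top decile" and "top three deciles" capture rates of the kind reported in the Results; the synthetic numbers printed here have no relation to the study's values.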

RESULTS: Among 445 624 individuals, 11 823 (2.65%) tested positive for HCV. Training (75%) and validation (25%) samples had similar characteristics (mean [standard deviation] age, 45 [16] years; 62.86% female; 54.43% White). The GBM model (C statistic, 0.916 [95% confidence interval = .911-.921]) outperformed the EN (0.885 [.879-.891]), RF (0.854 [.847-.861]), and DNN (0.908 [.903-.913]) models (P < .0001). Using the Youden index, GBM achieved 79.39% sensitivity and 89.08% specificity, identifying 1 positive HCV case per 6 tests. Among patients with HCV, 75.63% were captured in the top risk decile and 90.25% in the top 3 risk deciles.
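The "1 positive HCV case per 6 tests" figure is consistent with the reported sensitivity, specificity, and overall prevalence, as the following arithmetic sketch suggests. It assumes that the 2.65% prevalence of the full cohort also applies to the sample in which the threshold was evaluated, which the abstract does not state. The function name is hypothetical, and this is a plausibility check using Bayes' rule, not the authors' calculation.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence                   # flagged and truly infected
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # flagged but not infected
    return true_pos / (true_pos + false_pos)

# Values reported in the abstract; prevalence is the overall cohort rate (an assumption here).
sens, spec, prev = 0.7939, 0.8908, 0.0265

ppv = positive_predictive_value(sens, spec, prev)
tests_per_case = 1.0 / ppv        # flagged patients tested per positive case found
youden_j = sens + spec - 1.0      # Youden index at the chosen threshold

print(f"PPV = {ppv:.3f}")
print(f"Tests per positive case = {tests_per_case:.1f}")
print(f"Youden J = {youden_j:.4f}")
```

Under these assumptions the positive predictive value is about 0.165, or roughly one positive case for every six flagged patients tested, compared with about one in 38 (1/0.0265) if testing were done at the cohort's baseline positivity rate.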

CONCLUSIONS: ML algorithms effectively predicted and stratified HCV infection risk, offering a promising targeted screening tool for clinical settings.

Last updated on 08/28/2025