TabText: A Flexible and Contextual Approach to Tabular Data Representation
Last modified: 2024-05-15
Abstract
In collaboration with Hartford HealthCare (HHC), we have developed highly accurate machine learning (ML) models that predict nine inpatient outcomes (e.g., short-term discharges, ICU transfers, and mortality) using tabular data from electronic medical records. Hundreds of medical staff currently use our models, resulting in a significant reduction in average patient length of stay and projected annual benefits of $55-$72 million for HHC. Given this successful implementation, a natural question arises: how can we extend these tools to benefit hospitals with limited resources, small patient populations, and/or non-standardized healthcare records? To address these challenges, we introduce TabText, a systematic framework that leverages Large Language Models to process and extract contextual information from tabular structures, resulting in more complete and flexible data representations. We show that 1) applying our TabText framework enables the generation of high-performing predictive models with minimal data processing, and 2) augmenting tabular data with TabText representations can significantly improve the performance of standard ML models across all nine prediction tasks, especially when trained on small datasets.
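As a rough illustration of the kind of pipeline the abstract describes, the sketch below serializes tabular rows into contextual sentences, encodes them with a pretrained language model, and concatenates the resulting embeddings with the original numeric features before fitting a standard classifier. It is not the authors' implementation: the column names, the serialization template, the choice of encoder ("all-MiniLM-L6-v2"), and the downstream logistic regression are all assumptions made for this example; the actual framework is described in the referenced paper.

```python
# Minimal sketch of a TabText-style augmentation pipeline (illustrative only).
# Assumptions: hypothetical EMR-like columns, a general-purpose sentence
# encoder, and a simple downstream classifier.
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Hypothetical tabular snapshot of two inpatient stays.
df = pd.DataFrame({
    "age": [67, 54],
    "unit": ["ICU", "Med/Surg"],
    "heart_rate": [112, 84],
    "on_ventilator": [True, False],
})
y = np.array([1, 0])  # hypothetical binary outcome labels


def row_to_text(row: pd.Series) -> str:
    """Serialize one row into a contextual sentence ("column is value" style)."""
    parts = [f"{col.replace('_', ' ')} is {val}" for col, val in row.items()]
    return "The patient's " + "; ".join(parts) + "."


texts = df.apply(row_to_text, axis=1).tolist()

# Encode the text representations with a pretrained sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
text_features = encoder.encode(texts)  # shape: (n_rows, embedding_dim)

# Augment the original numeric features with the text embeddings.
numeric = df[["age", "heart_rate"]].to_numpy(dtype=float)
X = np.hstack([numeric, text_features])

# Any standard ML model can then be trained on the augmented representation.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```

In this sketch, the text embedding simply adds extra columns alongside the tabular features, which matches the "augmenting tabular data with TabText representations" idea at a high level; how the context is phrased and which language model is used are design choices studied in the paper itself.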
Keywords
Large Language Models, Healthcare Analytics, Data Augmentation
References
Carballo, K. V., Na, L., Ma, Y., Boussioux, L., Zeng, C., Soenksen, L. R., & Bertsimas, D. (2022). TabText: A Flexible and Contextual Approach to Tabular Data Representation. arXiv preprint arXiv:2206.10381.