SM-DBERT: A Novel Symptom-based Technique for Chronic Disease Classification using DISTILBERT
Abstract
Machine learning and deep learning models applied to electronic health record (EHR) systems are considerably improving prediction tasks on medical data. A vast amount of information lies in free-form clinical text, but such unstructured data poses its own challenges. Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) have revolutionized this work. DistilBERT, a lighter version of BERT, is even more promising: it reduces the required time to nearly one-third without a significant loss in performance. In this work, we present SM-DBERT, a Symptom-based Modified DistilBERT architecture designed for chronic disease classification. SM-DBERT is founded on symptomatology. Because symptoms are the key indicators of disease, an optimal model should prioritize them. We modify the existing DistilBERT architecture by introducing additional layers and by supplying extra embeddings of external knowledge alongside the input IDs and attention masks. This additional knowledge helps the model learn more relevant information. SM-DBERT demonstrates a notable improvement in results, achieving an accuracy of 0.98 and outperforming the basic DistilBERT model.
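The abstract does not specify how the external-knowledge embeddings are combined with DistilBERT's output. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the knowledge embedding is concatenated with a masked mean-pooled encoder representation and then passed through one additional dense layer with ReLU before a softmax over disease classes. `KnowledgeFusionHead`, its dimensions, and the random initialization are hypothetical. In a real model, `hidden_states` would come from DistilBERT's last hidden layer (768-dimensional for the base model) and the weights would be learned. Plain Python is used so the sketch runs without deep learning libraries.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def masked_mean_pool(hidden: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average token vectors, skipping padding positions (mask value 0)."""
    total = [0.0] * len(hidden[0])
    count = 0
    for vec, m in zip(hidden, attention_mask):
        if m:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention_mask has no active tokens")
    return [t / count for t in total]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer; weights has shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def init_layer(out_dim: int, in_dim: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Uniform random weights scaled by 1/sqrt(in_dim); zero bias."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class KnowledgeFusionHead:
    """HYPOTHETICAL head: [pooled text ; knowledge] -> dense + ReLU -> class probabilities.

    Concatenation-based fusion is an assumption, not the published SM-DBERT design.
    """

    def __init__(self, text_dim: int, knowledge_dim: int, hidden_dim: int,
                 num_classes: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Additional layer applied to the fused text + knowledge representation.
        self.w1, self.b1 = init_layer(hidden_dim, text_dim + knowledge_dim, rng)
        # Output layer: one logit per disease class.
        self.w2, self.b2 = init_layer(num_classes, hidden_dim, rng)

    def forward(self, hidden_states: Sequence[Vector], attention_mask: Sequence[int],
                knowledge: Sequence[float]) -> Vector:
        pooled = masked_mean_pool(hidden_states, attention_mask)
        fused = pooled + list(knowledge)  # concatenate text and knowledge features
        h = relu(dense(fused, self.w1, self.b1))
        return softmax(dense(h, self.w2, self.b2))


if __name__ == "__main__":
    # Toy stand-in for DistilBERT token states: 3 tokens, 4 dims; the last token is padding.
    hidden = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [9.9, 9.9, 9.9, 9.9]]
    mask = [1, 1, 0]
    knowledge = [1.0, 0.0]  # hypothetical symptom-knowledge embedding
    head = KnowledgeFusionHead(text_dim=4, knowledge_dim=2, hidden_dim=8, num_classes=3)
    probs = head.forward(hidden, mask, knowledge)
    print(len(probs), round(sum(probs), 6))
```

Masked pooling ensures that padded positions (attention mask 0) cannot influence the prediction. Taking the first token's vector instead of the mean is a common alternative pooling choice.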