A Machine Learning Pipeline and Application for Automatic Classification of Clinical Documents
Abstract
The healthcare industry encompasses many associated services, including research on trends and patterns in diseases and patients' lifestyles. With the emergence of Artificial Intelligence (AI), problems in the healthcare domain can now be addressed with Machine Learning (ML) techniques. One such problem, considered in this paper, is clinical document classification. Existing methods in this area lack a systematic approach to filtering out false positives. In this paper we propose an ML framework that pipelines ML models at multiple levels. At the first level, clinical documents with no smoking-related content are discarded. At the second level, documents that describe known smoking cases are retained. At the third level, the retained documents are classified into two categories: current smokers and past smokers. We propose an algorithm, Learning based Clinical Document Classification (LbCDC), which uses these three models in a pipeline to classify clinical documents at multiple levels of granularity. Our experimental results show that the proposed system is efficient in clinical document classification.
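The three-level cascade described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' LbCDC implementation: the function names (`classify_document`, `mentions_smoking`, `is_known_smoker`, `is_current_smoker`) and the keyword rules are hypothetical stand-ins for the three trained models, which the abstract does not specify.

```python
from typing import Callable, Optional

# A stage is any callable that maps document text to a yes/no decision.
# In a real system each stage would wrap a trained ML model.
Stage = Callable[[str], bool]


def classify_document(
    text: str,
    level1: Stage,
    level2: Stage,
    level3: Stage,
) -> Optional[str]:
    """Run one document through a three-level cascade.

    Returns None when the document is filtered out at level 1 or 2,
    otherwise "current smoker" or "past smoker".
    """
    if not level1(text):   # level 1: discard documents with no smoking content
        return None
    if not level2(text):   # level 2: retain only known smoking cases
        return None
    # level 3: split the retained documents into two categories
    return "current smoker" if level3(text) else "past smoker"


# Hypothetical keyword rules standing in for the three models (assumption).
def mentions_smoking(text: str) -> bool:
    t = text.lower()
    return any(k in t for k in ("smok", "tobacco", "cigarette"))


def is_known_smoker(text: str) -> bool:
    t = text.lower()
    negations = ("denies smoking", "never smoked", "non-smoker", "nonsmoker")
    return not any(k in t for k in negations)


def is_current_smoker(text: str) -> bool:
    t = text.lower()
    past_markers = ("quit", "former", "ex-smoker", "stopped smoking")
    return not any(k in t for k in past_markers)


def run(text: str) -> Optional[str]:
    """Convenience wrapper that wires the three stand-in stages together."""
    return classify_document(text, mentions_smoking, is_known_smoker, is_current_smoker)


if __name__ == "__main__":
    for doc in (
        "Patient presents with ankle fracture.",
        "Patient denies smoking or alcohol use.",
        "Smokes one pack of cigarettes per day.",
        "Former smoker, quit in 2010.",
    ):
        print(repr(doc), "->", run(doc))
```

Because each level only sees documents that passed the previous one, a document wrongly admitted by an earlier stage can still be rejected later, which is one plausible reading of how a cascade helps filter false positives.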