A Comprehensive Survey of Automatic Dysarthric Speech Recognition
Abstract
Automatic dysarthric speech recognition (DSR) is crucial for many human-computer interaction systems, as it enables people with dysarthria to interact with machines in a natural way. This article presents a comprehensive survey of recent advances in DSR based on machine learning (ML) and deep learning (DL) paradigms, focusing on the methodologies, databases, evaluation metrics, and major findings of previous approaches. The survey also discusses the key challenges in DSR, including individual variability, limited training data, contextual understanding, articulation variability, vocal quality changes, and speaking rate variations. Finally, it identifies the gaps in existing work on DSR and outlines future directions for improvement.
References
Xue, W., Cucchiarini, C., van Hout, R., & Strik, H. (2023). Measuring the intelligibility of dysarthric speech through automatic speech recognition in a pluricentric language. Speech Communication, 148, 23–30. https://doi.org/10.1016/j.specom.2023.02.004
Jaddoh, A., Loizides, F., & Rana, O. (2022). Interaction between people with dysarthria and speech recognition systems: A review. Assistive Technology. https://doi.org/10.1080/10400435.2022.2061085
Chen, L. Special Issue on Automatic Speech Recognition. Appl. Sci. 2023, 13, 5389. https://doi.org/10.3390/app13095389
Shih, D.-H., Liao, C.-H., Wu, T.-W., Xu, X.-Y., & Shih, M.-H. (2022). Dysarthria speech detection using convolutional neural networks with gated recurrent unit. Healthcare, 10(10), 1956. https://doi.org/10.3390/healthcare10101956
S. Alharbi et al., "Automatic Speech Recognition: Systematic Literature Review," in IEEE Access, vol. 9, pp. 131858-131876, 2021, doi: 10.1109/ACCESS.2021.31125
Roger, V., Farinas, J., & Pinquier, J. (2022). Deep neural networks for automatic speech processing: A survey from large corpora to limited data. EURASIP Journal on Audio, Speech, and Music Processing, 2022, 19. https://doi.org/10.1186/s13636-022-00251-w
Rista, A., & Kadriu, A. (2020). Automatic speech recognition: A comprehensive survey. SEEU Review, 15(2), 86–112. https://doi.org/10.2478/seeur-2020-0019
C. Yu, X. Su and Z. Qian, "Multi-Stage Audio-Visual Fusion for Dysarthric Speech Recognition With Pre-Trained Models," in IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 31, pp. 1912-1921, 2023, doi: 10.1109/TNSRE.2023.3262001
Bhangale, K. B., & Mohanaprasad, K. (2021). A review on speech processing using machine learning paradigm. International Journal of Speech Technology, 24, 367-388. https://doi.org/10.1007/s10772-021-09808-0
Bhangale, Kishor Barasu, and Mohanaprasad Kothandaraman. (2022). Survey of Deep Learning Paradigms for Speech Processing. Wireless Personal Communications, 1-37. https://doi.org/10.1007/s11277-022-09640-y
Narendra, N. P., & Alku, P. (2020). Glottal source information for pathological voice detection. IEEE Access, 8, 67745-67755. DOI: 10.1109/ACCESS.2020.2986171
Bhavya K. R, & S. Pravinth Raja. (2023). Fruit Quality Prediction using Deep Learning Strategies for Agriculture. International Journal of Intelligent Systems and Applications in Engineering, 11(2s), 301–310. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2697
Gurugubelli, K., & Vuppala, A. K. (2020). Analytic phase features for dysarthric speech detection and intelligibility assessment. Speech Communication, 121, 1-15. https://doi.org/10.1016/j.specom.2020.04.006
Bhat, C., Vachhani, B., & Kopparapu, S. K. (2017, March). Automatic assessment of dysarthria severity level using audio descriptors. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5070-5074). IEEE. DOI: 10.1109/ICASSP.2017.7953122
Hasegawa-Johnson, M., Gunderson, J., Perlman, A., & Huang, T. (2006). HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, 14–19 May 2006. IEEE. DOI: 10.1109/ICASSP.2006.1660840
Ms. Sweta Minj. (2012). Design and Analysis of Class-E Power Amplifier for Wired & Wireless Systems. International Journal of New Practices in Management and Engineering, 1(04), 07 - 13. Retrieved from http://ijnpme.org/index.php/IJNPME/article/view/9
Rudzicz, F. (2009). Phonological features in discriminative classification of dysarthric speech. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; IEEE: New York, NY, USA, 2009; pp. 4605–4608. DOI: 10.1109/ICASSP.2009.4960656
Revathi, A., Nagakrishnan, R., & Sasikaladevi, N. (2022). Comparative analysis of Dysarthric speech recognition: multiple features and robust templates. Multimedia Tools and Applications, 81(22), 31245-31259. https://doi.org/10.1007/s11042-022-12937-6
Al-Qatab, B. A., & Mustafa, M. B. (2021). Classification of dysarthric speech according to the severity of impairment: An analysis of acoustic features. IEEE Access, 9, 18183–18194. DOI: 10.1109/ACCESS.2021.3053335
Janbakhshi, P., Kodrasi, I., & Bourlard, H. (2021). Subspace-Based Learning for Automatic Dysarthric Speech Detection. IEEE Signal Processing Letters, 28, 96–100. doi:10.1109/lsp.2020.3044503
Bhangale, K., & Mohanaprasad, K. (2022). Speech emotion recognition using mel frequency log spectrogram and deep convolutional neural network. In Futuristic Communication and Network Technologies: Select Proceedings of VICFCNT 2020 (pp. 241-250). Springer Singapore. https://doi.org/10.1007/978-981-16-4625-6_24
Fathima, N., Patel, T., Mahima, C., & Iyengar, A. (2018, September). TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages. In Interspeech (pp. 3197-3201).
Yue, Z., Loweimi, E., & Cvetkovic, Z. (2022, May). Raw source and filter modelling for dysarthric speech recognition. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7377-7381). IEEE. DOI: 10.1109/ICASSP43922.2022.9746553
Smith, J., Jones, D., Martinez, J., Perez, A., & Silva, D. Enhancing Engineering Education through Machine Learning: A Case Study. Kuwait Journal of Machine Learning, 1(1). Retrieved from http://kuwaitjournals.com/index.php/kjml/article/view/86
Yue, Z., Loweimi, E., Cvetkovic, Z., Christensen, H., & Barker, J. (2022, May). Multi-modal acoustic-articulatory feature fusion for dysarthric speech recognition. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7372-7376). IEEE. DOI: 10.1109/ICASSP43922.2022.9746855
Soleymanpour, M., Johnson, M. T., Soleymanpour, R., & Berry, J. (2022, May). Synthesizing Dysarthric Speech Using Multi-Speaker Tts For Dysarthric Speech Recognition. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7382-7386). IEEE. DOI: 10.1109/ICASSP43922.2022.9746585
Liu, S., Geng, M., Hu, S., Xie, X., Cui, M., Yu, J., ... & Meng, H. (2021). Recent progress in the CUHK dysarthric speech recognition system. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 2267-2281. DOI: 10.1109/TASLP.2021.3091805
Shahamiri, S. R. (2021). Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29, 852-861. DOI: 10.1109/TNSRE.2021.3076778
Lin, Y. Y., Zheng, W. Z., Chu, W. C., Han, J. Y., Hung, Y. H., Ho, G. M., ... & Lai, Y. H. (2021). A speech command control-based recognition system for dysarthric patients based on deep learning technology. Applied Sciences, 11(6), 2477. https://doi.org/10.3390/app11062477
Kodrasi, I., & Bourlard, H. (2020). Spectro-temporal sparsity characterization for dysarthric speech detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 1210-1222. DOI: 10.1109/TASLP.2020.2985066
Kodrasi, I. (2021). Temporal envelope and fine structure cues for dysarthric speech detection using CNNs. IEEE Signal Processing Letters, 28, 1853-1857. DOI: 10.1109/LSP.2021.3108509
Chandrashekar, H. M., Karjigi, V., & Sreedevi, N. (2020). Investigation of different time-frequency representations for intelligibility assessment of dysarthric speech. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(12), 2880–2889. DOI: 10.1109/TNSRE.2020.3035392
Chandrashekar, H. M., Karjigi, V., & Sreedevi, N. (2019). Spectro-temporal representation of speech for intelligibility assessment of dysarthria. IEEE Journal of Selected Topics in Signal Processing, 14(2), 390-399. doi: 10.1109/JSTSP.2019.2949912.