A Comprehensive Survey of Automatic Dysarthric Speech Recognition

Chandarani Pophale
Shankar Chavan

Abstract

Automatic dysarthric speech recognition (DSR) is crucial for human-computer interaction systems that enable people to interact with machines in a natural way. This article presents a comprehensive survey of recent advances in DSR based on machine learning (ML) and deep learning (DL) paradigms. It focuses on the methodology, database, evaluation metrics, and major findings of previous approaches. The survey presents the challenges associated with DSR, such as individual variability, limited training data, contextual understanding, articulation variability, vocal quality changes, and speaking rate variations. Based on the literature, it identifies gaps in existing work on DSR and provides future directions for improving DSR.

Article Details

How to Cite
Pophale, C., & Chavan, S. (2023). A Comprehensive Survey of Automatic Dysarthric Speech Recognition. International Journal on Recent and Innovation Trends in Computing and Communication, 11(9s), 24–30. https://doi.org/10.17762/ijritcc.v11i9s.7392
