Automatic speech recognition (ASR) is the process by which speech waveforms are transformed into text, allowing computers to interpret and act upon human speech; practical applications include meeting transcription and smart speakers. The primary focus of this work is scalable strategies for developing ASR systems in languages for which no speech transcriptions or pronunciation dictionaries exist. We first show that, when a phonemic pronunciation lexicon exists in the new language, cross-lingual acoustic model transfer can greatly reduce the need for transcribed speech in the target language. We then investigate three approaches to languages that lack a pronunciation lexicon, beginning with the efficiency of graphemic acoustic model transfer, which makes pronunciation dictionaries easy to build.

Part of the thesis problem is addressed by investigating optimization strategies, such as GA+HMM and DE+HMM, for training on large corpora; the suggested method is applied alongside traditional methods in the acoustic modelling training phase. Experiments on read speech and human-machine interaction (HMI) speech indicated that while no single data augmentation technique consistently improved recognition performance on its own, combining all three techniques did.

This work also adjusts power-normalised cepstral coefficient (PNCC) features to enhance verification accuracy, and proposes combining Gaussian Mixture Model-Universal Background Model (GMM-UBM) and SVM classifiers to increase speaker verification accuracy further. Importantly, pitch-shift data augmentation and multi-task training reduced bias by more than 18% absolute compared to the baseline system for read speech, and applying all three data augmentation techniques during fine-tuning reduced bias by more than 7% for HMI speech, while increasing recognition accuracy for both native and non-native Dutch speech.
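The GMM-UBM approach mentioned above scores a trial by comparing how well the test frames fit the claimed speaker's model against a universal background model. The following is a minimal sketch of only that log-likelihood-ratio scoring step, with hypothetical function names and toy single-component diagonal-covariance models; real systems MAP-adapt the speaker model from the UBM, use hundreds of mixture components over MFCC or PNCC features, and (as proposed here) pass the resulting statistics to an SVM classifier as well.

```python
import numpy as np

def gmm_logpdf(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    X: (n_frames, dim); weights: (k,); means, variances: (k, dim).
    """
    diff = X[:, None, :] - means[None, :, :]                       # (n, k, d)
    log_comp = -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)
                       + (diff ** 2 / variances).sum(axis=2))      # (n, k)
    # Log-sum-exp over components, weighted by the mixture weights.
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1)

def gmm_ubm_score(X, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: positive when the frames
    fit the claimed speaker's model better than the background model."""
    return float(np.mean(gmm_logpdf(X, *speaker_gmm) - gmm_logpdf(X, *ubm)))

# Toy illustration: 2-D "features" drawn near the speaker model's mean.
rng = np.random.default_rng(0)
frames = rng.normal(0.0, 1.0, size=(200, 2))
speaker = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))  # fits the data
ubm = (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2)))  # does not
score = gmm_ubm_score(frames, speaker, ubm)  # clearly positive
```

A verification system would compare this score against a threshold tuned on held-out trials to trade off false acceptances against false rejections.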
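The pitch-shift augmentation credited above with reducing bias can be illustrated with a deliberately crude sketch: resampling a waveform shifts its pitch by a chosen number of semitones. This is an assumption-laden toy, not the thesis's implementation; it also changes the signal's duration, so real augmentation pipelines pair resampling with time-scale modification (or use a phase vocoder, as in tools like SoX or librosa) to keep duration fixed.

```python
import numpy as np

def pitch_shift(wave, semitones):
    """Crude pitch shift by plain resampling (hypothetical helper).

    Raises the pitch by `semitones`, but also shortens the signal by the
    same factor; production augmentation corrects the duration separately.
    """
    factor = 2.0 ** (semitones / 12.0)
    positions = np.arange(int(len(wave) / factor)) * factor
    # Linear interpolation of the waveform at the stretched positions.
    return np.interp(positions, np.arange(len(wave)), wave)

# A 440 Hz tone shifted up one octave (12 semitones) becomes an 880 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)
shifted = pitch_shift(tone, 12)
```

In an ASR training pipeline, each utterance would be shifted by a small random amount (e.g. a fraction of a semitone up or down) and the copies added to the training set alongside the originals.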