Generative Adversarial Network with Convolutional Wavelet Packet Transforms for Automated Speaker Recognition and Classification

Venkata Subba Reddy Gade, M Sumathi

Abstract

Speech is an effective mode of communication that conveys abundant and pertinent information, such as the gender, accent, and other distinguishing characteristics of the speaker. These distinctive characteristics allow researchers to identify human voices using artificial intelligence (AI) techniques, which are useful for forensic voice verification, security and surveillance, electronic voice eavesdropping, mobile banking, and mobile purchasing. Deep learning (DL) and other advances in hardware have piqued the interest of researchers studying automatic speaker identification (SI). In recent years, Generative Adversarial Networks (GANs) have demonstrated an exceptional ability to produce synthetic data and improve the performance of several machine learning tasks. This paper combines the capabilities of the Convolutional Wavelet Packet Transform (CWPT) and Generative Adversarial Networks to propose a novel way of enhancing the accuracy and robustness of speaker recognition and classification systems. Audio signals are decomposed using the Convolutional Wavelet Packet Transform into a multi-resolution, time-frequency representation that faithfully preserves both local and global characteristics. The resulting audio features describe speech traits more precisely and handle the variations in pitch, tone, and pronunciation that are frequent in speaker recognition tasks. By using GANs to create synthetic speech samples, the proposed method, GAN-CWPT, enriches the training data and broadens the dataset's diversity. The generator and discriminator components of the GAN architecture are adapted to produce realistic speech samples with attributes closely resembling genuine speaker utterances. The augmented dataset enhances the speaker recognition and classification system's robustness and generalization, even in environments with little training data. We conduct extensive experiments on standard speaker recognition datasets to evaluate how well the method works. The findings demonstrate that, compared to conventional methods, the GAN-CWPT combination significantly improves speaker recognition and classification accuracy and efficiency. Additionally, the proposed GAN-CWPT model generalizes better to unseen speakers and performs well even on noisy, low-quality audio inputs.
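The pipeline outlined in the abstract can be approximated in a few lines of Python. The sketch below is an illustrative assumption rather than the authors' implementation: it uses PyWavelets' WaveletPacket to derive the multi-resolution time-frequency features described above, and a minimal PyTorch generator/discriminator pair as a stand-in for the adapted GAN. The wavelet choice (db4), decomposition depth, and layer sizes are placeholders, not values reported in the paper.

```python
# Illustrative sketch (assumed details): wavelet-packet features for speaker
# recognition plus a minimal GAN for feature-level data augmentation.
# Wavelet (db4), depth (3 levels), and layer sizes are placeholders, not the
# configuration used in the paper.
import numpy as np
import pywt
import torch
import torch.nn as nn


def wavelet_packet_features(signal: np.ndarray, wavelet: str = "db4",
                            level: int = 3) -> np.ndarray:
    """Decompose an audio frame into wavelet-packet sub-bands and return the
    log-energy of each terminal node (a coarse time-frequency feature)."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="freq")      # frequency-ordered leaves
    energies = np.array([np.sum(n.data ** 2) for n in nodes])
    return np.log(energies + 1e-10)                # 2**level features per frame


class Generator(nn.Module):
    """Maps a noise vector to a synthetic wavelet-packet feature vector."""
    def __init__(self, noise_dim: int = 16, feat_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 64), nn.ReLU(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, z):
        return self.net(z)


class Discriminator(nn.Module):
    """Scores whether a feature vector comes from a real utterance."""
    def __init__(self, feat_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)


if __name__ == "__main__":
    frame = np.random.randn(1024)                  # stand-in for one audio frame
    feats = wavelet_packet_features(frame)         # 8 log-energy features
    G, D = Generator(feat_dim=feats.size), Discriminator(feat_dim=feats.size)
    fake = G(torch.randn(4, 16))                   # 4 synthetic feature vectors
    print(feats.shape, D(fake).shape)              # (8,) and torch.Size([4, 1])
```

In the full GAN-CWPT system the generator would be trained adversarially against the discriminator and the augmented features passed to the speaker classifier; the sketch only illustrates the data flow from raw audio to wavelet-packet features and synthetic samples.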

Article Details

How to Cite
Gade, V. S. R., & Sumathi, M. (2023). Generative Adversarial Network with Convolutional Wavelet Packet Transforms for Automated Speaker Recognition and Classification. International Journal on Recent and Innovation Trends in Computing and Communication, 11(9), 3415–3429. https://doi.org/10.17762/ijritcc.v11i9.9550
Section
Articles