Speaker identification - prototype development and performance

Watts, David Michael Graeme (2006) Speaker identification - prototype development and performance. [USQ Project] (Unpublished)

Full text available as: PDF (3929 KB)

Abstract

Human speech is our most natural form of communication and conveys both meaning and identity. Speaker identification determines the identity of a speaker from the information contained in the speech signal: an unknown speaker is identified against a database of speaker models previously enrolled in the system. The general process involves two stages. The first stage extracts features from the speakers who are to be enrolled in the system. The second stage determines the identity of an unknown speaker by extracting features from their speech and comparing these with the enrolled speaker models. Several feature extraction techniques are available, including Linear Predictive Coding (LPC), Mel-Frequency Cepstral Coefficients (MFCC) and LPC Cepstral Coefficients (LPCC). These features are used with a classification technique to create a speaker model. Vector Quantization (VQ) is commonly used in speaker identification and produces reliable results. This project demonstrates a prototype speaker identification system tailored for utterances of fewer than ten words and target sets of fewer than eight voice profiles. VQ (codebook size = 128) with 20-dimensional LPCC achieves accuracies of 83% and 100% with 12 speakers on the NTIMIT and Alternative (own) corpora, respectively. Tests were conducted using 30 s of training speech and 3 s of testing speech.
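The abstract names Linear Predictive Coding (LPC) as a feature extraction technique but does not say how the coefficients are computed. A minimal sketch, assuming the common autocorrelation method with a Hamming window and the Levinson-Durbin recursion; the function names, window choice and predictor sign convention are illustrative assumptions, not taken from the project:

```python
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of a frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """LPC by the autocorrelation method and Levinson-Durbin recursion.

    Returns (a, err): a[k-1] is the coefficient a_k of the assumed predictor
    x_hat[n] = sum_{k=1..order} a_k * x[n-k], and err is the final
    prediction-error energy.
    """
    w = hamming(len(frame))
    r = autocorrelation([x * wi for x, wi in zip(frame, w)], order)
    if r[0] == 0:
        # Silent frame: no prediction possible, avoid dividing by zero.
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] unused; a[1..order] hold the coefficients
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err
```

Frame length, frame overlap and predictor order are not stated in the abstract, so any values used with this sketch are assumptions.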
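The reported results use 20-dimensional LPC cepstral coefficients (LPCC). One standard way to obtain them is a recursion over the LPC coefficients a_1..a_p of the all-pole model 1 / (1 - sum_k a_k z^-k): c_1 = a_1; c_n = a_n + sum_{k=1..n-1} (k/n) c_k a_{n-k} for 1 < n <= p; and c_n = sum_{k=n-p..n-1} (k/n) c_k a_{n-k} for n > p. The sketch below assumes that predictor sign convention and omits the gain term c_0; the function name is hypothetical, and the abstract does not give the LPC order p the project used.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a[0..p-1] (a_1..a_p) to n_ceps cepstral coefficients."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (log-gain term) is left at zero and not returned
    for n in range(1, n_ceps + 1):
        # The direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Sum over k so that the LPC index n - k stays within 1..p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For example, lpc_to_cepstrum(a, 20) yields a 20-dimensional LPCC vector for one frame; when n_ceps exceeds p, the higher coefficients come purely from the recursion.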
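The abstract reports results for VQ with a codebook of 128 codewords but does not describe how the codebooks are trained or compared. The sketch below assumes a common approach: LBG-style codebook growth (repeatedly split every codeword, then refine with k-means iterations) and identification by the lowest average quantization distortion under squared Euclidean distance. All function names and parameters (`iters`, `eps`) are hypothetical:

```python
def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(v, codebook):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))


def train_codebook(vectors, size, iters=10, eps=0.01):
    """Grow a codebook of `size` codewords (a power of two) from training vectors."""
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    dim = len(vectors[0])
    # Start from the centroid of all training vectors.
    codebook = [[sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]]
    while len(codebook) < size:
        # Split: replace each codeword by two slightly perturbed copies.
        codebook = [[x + s for x in cw] for cw in codebook for s in (eps, -eps)]
        # Refine with k-means (Lloyd) iterations.
        for _ in range(iters):
            clusters = [[] for _ in codebook]
            for v in vectors:
                clusters[nearest(v, codebook)].append(v)
            for i, members in enumerate(clusters):
                if members:  # an empty cluster keeps its previous codeword
                    codebook[i] = [sum(m[d] for m in members) / len(members)
                                   for d in range(dim)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, cw) for cw in codebook) for v in vectors) / len(vectors)


def identify(test_vectors, codebooks):
    """Return the speaker label whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))
```

In the configuration the abstract reports, each enrolled speaker would have a 128-codeword codebook trained on 20-dimensional LPCC vectors from about 30 s of speech, and `identify` would be called with the vectors from a 3 s test utterance. This pure-Python version favours clarity over speed.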

Item Type:USQ Project
Uncontrolled Keywords:speech; linear predictive coding (LPC); vector quantization (VQ); Gaussian mixture models; NTIMIT
Fields of Research (FOR2008):08 Information and Computing Sciences > 0801 Artificial Intelligence and Image Processing > 080107 Natural Language Processing
Subjects:280000 Information, Computing and Communication Sciences > 280200 Artificial Intelligence and Signal and Image Processing > 280206 Speech Recognition
Socio-Economic Objective (SEO2008):UNSPECIFIED
ID Code:2338
Deposited By:
Deposited On:11 Oct 2007 11:03
Last Modified:11 Oct 2007 11:03
