
High-Level Speaker Verification via Articulatory-Feature Based Sequence Kernels and SVM



\begin{abstract}\vspace{-0.06cm}
Articulatory-feature based pronunciation models (AFCPMs) are capable of
capturing the pronunciation variations among different speakers and are
therefore well suited to high-level speaker recognition. However, the
likelihood-ratio scoring method of AFCPMs is based on a decision boundary
created by training the target speaker model and the universal background
model (UBM) separately. Therefore, the method does not fully utilize the
discriminative information available in the training data. To fully harness
the discriminative information, this paper proposes training a support vector
machine (SVM) for computing the verification scores. More precisely, the
models of target speakers, individual background speakers, and claimants are
converted to AF-supervectors, which form the inputs to an AF-based kernel of
the SVM for computing verification scores. Results show that the proposed
AF-kernel scoring is complementary to likelihood-ratio scoring, leading to
better performance when the two scoring methods are combined. Further
performance enhancement was also observed when the AF scores were combined
with acoustic scores derived from a GMM-UBM system.
%However, to represent the impostor population, the likelihood-ratio scoring
%method of AFCPMs only uses a single universal background model (UBM) that is
%trained without considering the target speakers; therefore this scoring method
%does not fully utilize the discriminative information available in the training
%data.
\end{abstract}

%\noindent{\bf Index Terms}: Speaker verification, kernels, articulatory
%features, pronunciation models, SVM \vspace{-0.1cm}

\section{Introduction}\label{sec:intro}%\vspace{-0.1cm}

Studies have shown that combining low-level acoustic information with
high-level speaker information---such as the usage or duration of particular
words, prosodic features, and articulatory features (AFs)---can improve speaker
verification performance
\cite{Reynolds&Andrew03,Campbell&Reynolds03,Klusacek03,Leung&Mak&Kung06,Zhang&Mak&Meng07}.
However, in most systems (e.g., GMM-UBM \cite{Reynolds&Quatieri&Dunn00} and
CD-AFCPM \cite{Zhang&Mak&Meng07}), scoring is done at the frame level, i.e.,
each frame of speech is scored separately and the frame-based scores are then
accumulated to produce an utterance-based score for classification. This
frame-based scoring scheme has two drawbacks. First, treating the frames
individually may fail to capture the sequence information contained in the
utterance. Second, the goal of speaker verification is to minimize
classification errors on test utterances rather than on individual speech
frames. These drawbacks motivate us to derive a sequence-based approach in
which an utterance is considered to comprise a sequence of symbols and the
utterance-based score is obtained from a support vector machine (SVM) through
a kernel function of the sequence of symbols.
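
As an illustration of the frame-level scoring described above, the utterance
score of a GMM-UBM-style system can be sketched as the average of per-frame
log-likelihood ratios. The notation ($X$, $\mathbf{x}_t$, $T$, $\Lambda_s$,
$\Lambda_b$) is assumed here for illustration only and is not taken from the
cited systems:
\begin{equation}
S_{\mathrm{LR}}(X) = \frac{1}{T}\sum_{t=1}^{T}
  \left[\log p(\mathbf{x}_t|\Lambda_s) - \log p(\mathbf{x}_t|\Lambda_b)\right],
\end{equation}
where $X=\{\mathbf{x}_1,\ldots,\mathbf{x}_T\}$ is the test utterance of $T$
frames, $\Lambda_s$ is the target speaker model, and $\Lambda_b$ is the
background model. Each term depends on a single frame, so permuting the frames
leaves $S_{\mathrm{LR}}(X)$ unchanged; this is one way to see why such a score
cannot reflect the ordering of the frames.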

This paper derives an articulatory-feature based sequence kernel and applies it
to high-level speaker verification. For each target speaker, the observation
sequences (AF labels) derived from his/her utterances are used to train a
phonetic-class dependent articulatory feature-based pronunciation model
(CD-AFCPM) \cite{Zhang&Mak&Meng07}. These models are then converted to
fixed-dimension AF supervectors for training a speaker-dependent SVM to
discriminate the target speaker from background speakers in the AF-supervector
space. To enhance the discrimination, a kernel that computes the similarity
between the target speaker's supervector and the claimant's supervector is
derived for the SVM. During verification, the AF labels derived from the speech
of a claimant are used to build a CD-AFCPM of the claimant, which, together
with the target speaker model, forms the input to the speaker-dependent SVM
for computing the verification score. Because the kernel depends on the AF
models of both the target speaker and the background speakers, we refer to it
as the AF-kernel.
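
For reference, a kernel SVM trained in this way produces a score of the
standard form below. This is the generic SVM decision function, not the
specific AF-kernel of this paper, and all symbols are assumptions introduced
for illustration: $\mathbf{a}_c$ denotes the claimant's AF supervector,
$\mathbf{a}_i$ the supervectors of the training speakers (target and
background), $y_i\in\{+1,-1\}$ their class labels, $\alpha_i\geq 0$ the
Lagrange multipliers, $b$ the bias, $\mathcal{S}$ the index set of support
vectors, and $K(\cdot,\cdot)$ the kernel:
\begin{equation}
S_{\mathrm{SVM}}(\mathbf{a}_c) = \sum_{i\in\mathcal{S}}
  \alpha_i\, y_i\, K(\mathbf{a}_i,\mathbf{a}_c) + b.
\end{equation}
Unlike a frame-level score, this score is a function of whole-utterance
representations, and $\alpha_i$ and $b$ are chosen by discriminative training
on the target and background supervectors together.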

The remainder of the paper derives the AF-kernel and discusses the
relationship between traditional frame-based likelihood-ratio (LR) scoring and
AF-kernel based SVM scoring. Experimental results on the NIST2000 database are
also presented.

\section{Phonetic-Class Dependent AFCPM}

\subsection{Articulatory-Feature Based Supervectors}\label{sec:AF_and_AFSuperVector}

Articulatory features (AFs) are representations describing the

movements or positions

...

...
