High-Level Speaker Verification via Articulatory-Feature Based Sequence Kernels and SVM

\begin{abstract}\vspace{-0.06cm}
Articulatory-feature based pronunciation models (AFCPMs) are capable of
capturing the pronunciation variations among different speakers and are well
suited to high-level speaker recognition. However, the likelihood-ratio scoring
method of AFCPMs is based on a decision boundary created by training the target
speaker model and the universal background model (UBM) separately. Therefore,
the method does not fully utilize the discriminative information available in
the training data. To fully harness this discriminative information, this paper
proposes training a support vector machine (SVM) for computing the verification
scores. More precisely, the models of target speakers, individual background
speakers, and claimants are converted to AF-supervectors, which form the inputs
to an AF-based kernel of the SVM for computing verification scores. Results
show that the proposed AF-kernel scoring is complementary to likelihood-ratio
scoring, leading to better performance when the two scoring methods are
combined. Further performance enhancement was observed when the AF scores
were combined with acoustic scores derived from a GMM-UBM system.
%However, to represent the impostor population, the likelihood-ratio scoring
%method of AFCPMs only uses a single universal background model (UBM) that is
%trained without considering the target speakers; therefore this scoring method
%does not fully utilize the discriminative information available in the training
%data.
\end{abstract}

%\noindent{\bf Index Terms}: Speaker verification, kernels, articulatory
%features, pronunciation models, SVM \vspace{-0.1cm}

\section{Introduction}\label{sec:intro}%\vspace{-0.1cm}

Studies have shown that combining low-level acoustic information with
high-level speaker information, such as the usage or duration of particular
words, prosodic features, and articulatory features (AF), can improve speaker
verification performance
\cite{Reynolds&Andrew03,Campbell&Reynolds03,Klusacek03,Leung&Mak&Kung06,Zhang&Mak&Meng07}.
However, in most systems (e.g., GMM-UBM \cite{Reynolds&Quatieri&Dunn00} and
CD-AFCPM \cite{Zhang&Mak&Meng07}), scoring is done at the frame level, i.e.,
each frame of speech is scored separately and the frame-based scores are then
accumulated to produce an utterance-based score for classification. This
frame-based scoring scheme has two drawbacks. First, treating the frames
individually may fail to capture the sequence information contained in the
utterance. Second, the goal of speaker verification is to minimize
classification errors on test utterances rather than on individual speech
frames. These drawbacks motivate us to derive a sequence-based approach in
which an utterance is regarded as a sequence of symbols and the
utterance-based score is obtained from a support vector machine (SVM)
through a kernel function of the sequence of symbols.
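
As an illustration of the frame-based scheme, the utterance score of a
GMM-UBM-style system is commonly written as the average of per-frame
log-likelihood ratios. The notation here is assumed for illustration only:
$X=\{x_1,\ldots,x_T\}$ denotes the $T$ frames of a test utterance, and
$\Lambda_s$ and $\Lambda_b$ denote the target-speaker model and the background
model, respectively.
\[
  S_{\rm LR}(X) = \frac{1}{T}\sum_{t=1}^{T}
  \left[\log p(x_t \mid \Lambda_s) - \log p(x_t \mid \Lambda_b)\right]
\]
Each term in the sum depends on a single frame $x_t$, so any reordering of the
frames yields the same score. This is the sense in which frame-based scoring
ignores sequence information.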

This paper derives an articulatory-feature based sequence kernel and applies it
to high-level speaker verification. For each target speaker, the observation
sequences (AF labels) derived from his/her utterances are used to train a
phonetic-class dependent articulatory feature-based pronunciation model
(CD-AFCPM) \cite{Zhang&Mak&Meng07}. These models are then converted to
fixed-dimension AF supervectors for training a speaker-dependent SVM to
discriminate the target speaker from background speakers in the AF-supervector
space. To enhance the discrimination, a kernel that computes the similarity
between the target speaker's supervector and the claimant's supervector is
derived for the SVM. During verification, the AF labels derived from the speech
of a claimant are used to build a CD-AFCPM of the claimant, which, together with
the target speaker model, forms the input to the speaker-dependent SVM for
computing the verification score. Because the kernel depends on the AF models of
both the target speaker and the background speakers, we refer to it as the
AF-kernel.
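
To make the scoring step concrete, the following sketch writes the output of a
speaker-dependent SVM in its generic textbook form. The symbols are assumed for
illustration only and are not this paper's own notation: $\vec{a}_c$ is the
claimant's AF supervector, $\vec{a}_i$ are the training supervectors with
labels $y_i=+1$ for the target speaker and $y_i=-1$ for background speakers,
$\alpha_i \geq 0$ and $b$ are the trained SVM parameters, and $K(\cdot,\cdot)$
stands for the AF-kernel, whose actual form is not specified in this sketch.
\[
  S_{\rm SVM}(\vec{a}_c) = \sum_{i} \alpha_i\, y_i\, K(\vec{a}_c, \vec{a}_i) + b
\]
Under this form, the score rises with the kernel similarity between the
claimant and the target speaker and falls with the similarity between the
claimant and the background speakers selected as support vectors.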

The remainder of the paper derives the AF-kernel and discusses the
relationship between traditional frame-based likelihood-ratio (LR) scoring and
AF-kernel based SVM scoring. Experimental results on the NIST2000 database are
also presented.

\section{Phonetic-Class Dependent AFCPM}

\subsection{Articulatory-Feature Based Supervectors}\label{sec:AF_and_AFSuperVector}

Articulatory features (AFs) are representations describing the

movements or positions

...
