Speaker enrollment

US10839810B2 · US · B2

Patent metadata
Publication number: US-10839810-B2
Application number: US-201816192914-A
Country: US
Kind code: B2
Filing date: Nov 16, 2018
Priority date: Nov 21, 2017
Publication date: Nov 17, 2020
Grant date: Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of speaker modelling for a speaker recognition system, comprises: receiving a signal comprising a speaker's speech; and, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort; and extracting features from the spectrum of the speaker's speech and the at least one modified spectrum. The method further comprises forming at least one speech model based on the extracted features.
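
The flow in the abstract (frame the signal, take a spectrum per frame, derive modified spectra, extract features from all of them, form a model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `example_tilt` effect, the log band-energy features (a simple stand-in for the Mel Frequency Cepstral Coefficients mentioned in claim 9), and the mean-vector "model" are all hypothetical choices made for this sketch.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (hop < frame_len gives overlap)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Hann-windowed magnitude spectrum via a direct DFT (bins 0..n/2)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def log_band_energies(spectrum, n_bands):
    """Assumed feature extractor: log energy in equal-width bands."""
    size = len(spectrum) // n_bands
    return [math.log(sum(m * m for m in spectrum[b * size:(b + 1) * size]) + 1e-12)
            for b in range(n_bands)]


def example_tilt(spectrum):
    """Hypothetical vocal-effort effect: boost higher bins slightly."""
    n = len(spectrum)
    return [m * (1.0 + k / n) for k, m in enumerate(spectrum)]


def collect_features(signal, frame_len, hop, effort_effects, n_bands=8):
    """Features from each frame's spectrum and from each modified spectrum."""
    features = []
    for frame in split_frames(signal, frame_len, hop):
        spectrum = magnitude_spectrum(frame)
        variants = [spectrum] + [effect(spectrum) for effect in effort_effects]
        features.extend(log_band_energies(v, n_bands) for v in variants)
    return features


def mean_model(features):
    """Assumed toy model: the per-dimension mean of all feature vectors."""
    return [sum(column) / len(column) for column in zip(*features)]


if __name__ == "__main__":
    sample_rate = 8000
    frame_len = 160   # 20 ms, inside the 10 ms to 50 ms range of claim 4
    hop = 80          # 50% overlap between consecutive frames
    signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(400)]
    model = mean_model(collect_features(signal, frame_len, hop, [example_tilt]))
    print(len(model))
```

Pooling the original and modified spectra into one feature set is the point of the sketch: the model sees the speaker as they might sound at more than one vocal effort, even though only one recording was provided.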

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of speaker modelling for a speaker recognition system, comprising: receiving a signal comprising a speaker's speech; and, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort, wherein the step of generating at least one modified spectrum comprises: determining a frequency and a bandwidth of at least one formant component of the speaker's speech; generating at least one modified formant component by modifying at least one of the frequency and the bandwidth of the or each formant component; and generating the modified spectrum from the or each modified formant component; and extracting features from the spectrum of the speaker's speech and the at least one modified spectrum; and forming at least one speech model based on the extracted features.

2. A method according to claim 1, comprising: obtaining the spectrum of the speaker's speech for a plurality of frames of the signal containing voiced speech.

3. A method according to claim 1, comprising: obtaining the spectrum of the speaker's speech for a plurality of overlapping frames of the signal.

4. A method according to claim 1, wherein each frame has a duration between 10 ms and 50 ms.

5. A method according to claim 1, comprising: generating a plurality of modified spectra, by applying effects related to respective vocal efforts.

6. A method according to claim 1, wherein the step of forming at least one speech model comprises forming a background model for the speaker recognition system, based in part on said speaker's speech.

7. A method according to claim 1, comprising determining a frequency and a bandwidth of a number of formant components of the speaker's speech in the range from 3-5.

8. A method according to claim 1, wherein generating modified formant components comprises: modifying the frequency and the bandwidth of the or each formant component.

9. A method according to claim 1, wherein the features extracted from the spectrum of the user's speech comprise Mel Frequency Cepstral Coefficients.

10. A method according to claim 1, wherein the step of forming at least one speech model comprises forming a model of the speaker's speech.

11. A method according to claim 10, wherein the method is performed on enrolling the speaker in the speaker recognition system.

12. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method comprising: receiving a signal comprising a speaker's speech; and for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort, wherein the step of generating at least one modified spectrum comprises: determining a frequency and a bandwidth of at least one formant component of the speaker's speech; generating at least one modified formant component by modifying at least one of the frequency and the bandwidth of the or each formant component; and generating the modified spectrum from the or each modified formant component; extracting features from the spectrum of the speaker's speech and the at least one modified spectrum; and further comprising: forming at least one speech model based on the extracted features.

13. A system for speaker modelling, the system comprising: an input, for receiving a signal comprising a speaker's speech; and, a processor, configured for, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort, wherein the step of generating at least one modified spectrum comprises: determining a frequency and a bandwidth of at least one formant component of the speaker's speech; generating at least one modified formant component by modifying at least one of the frequency and the bandwidth of the or each formant component; and generating the modified spectrum from the or each modified formant component; extracting features from the spectrum of the speaker's speech and the at least one modified spectrum; and forming at least one speech model based on the extracted features.
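
The formant step recited in claims 1, 12 and 13 (determine a formant's frequency and bandwidth, modify one or both, regenerate the spectrum) might be illustrated as below. This is a sketch under assumptions rather than the method the patent describes in detail: each formant is modelled as a two-pole resonator, the modified spectrum is obtained by swapping the original formant envelope for the modified one, and the scale factors and formant values in the usage section are hypothetical numbers chosen only for illustration.

```python
import cmath
import math


def formant_envelope(formants, sample_rate, n_bins):
    """Magnitude envelope of cascaded two-pole resonators.

    formants is a list of (frequency_hz, bandwidth_hz) pairs; bins span
    0 Hz to the Nyquist frequency. Requires n_bins >= 2.
    """
    envelope = []
    for k in range(n_bins):
        omega = math.pi * k / (n_bins - 1)
        z_inv = cmath.exp(-1j * omega)
        gain = 1.0
        for freq_hz, bw_hz in formants:
            radius = math.exp(-math.pi * bw_hz / sample_rate)  # set by bandwidth
            theta = 2.0 * math.pi * freq_hz / sample_rate      # set by frequency
            denom = (1.0 - 2.0 * radius * math.cos(theta) * z_inv
                     + radius * radius * z_inv * z_inv)
            gain /= abs(denom)
        envelope.append(gain)
    return envelope


def modify_formants(formants, freq_scales, bw_scales):
    """Scale each formant's frequency and bandwidth (assumed form of the change)."""
    return [(f * fs, b * bs)
            for (f, b), fs, bs in zip(formants, freq_scales, bw_scales)]


def modified_spectrum(spectrum, formants, new_formants, sample_rate):
    """Replace the original formant envelope with the modified one, bin by bin."""
    n_bins = len(spectrum)
    old_env = formant_envelope(formants, sample_rate, n_bins)
    new_env = formant_envelope(new_formants, sample_rate, n_bins)
    return [s * new / old for s, old, new in zip(spectrum, old_env, new_env)]


if __name__ == "__main__":
    sample_rate = 16000
    formants = [(500.0, 80.0)]                            # hypothetical F1
    raised = modify_formants(formants, [1.2], [0.8])      # hypothetical scales
    spectrum = formant_envelope(formants, sample_rate, 257)
    shifted = modified_spectrum(spectrum, formants, raised, sample_rate)
    peak_bin = max(range(len(shifted)), key=shifted.__getitem__)
    print(peak_bin * (sample_rate / 2) / 256)             # approximate peak in Hz
```

Calling `modified_spectrum` several times with different scale factors would give the "plurality of modified spectra" of claim 5, one per assumed vocal effort.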

Assignees

Cirrus Logic Int Semiconductor Ltd, Cirrus Logic Inc

Inventors

Classifications

  • Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions · CPC title

  • Speaker identification or verification techniques · CPC title

  • Stress or Lombard effect · CPC title

  • G10L17/04 · Primary

    Training, enrolment or model building · CPC title

  • G10L17/02 · Primary

    Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10839810B2 cover?
A method of speaker modelling for a speaker recognition system, comprises: receiving a signal comprising a speaker's speech; and, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort; and extracting features from the spectrum of the speaker's speech and the at le…
Who is the assignee on this patent?
Cirrus Logic Int Semiconductor Ltd, Cirrus Logic Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 17, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).