Related publication: US-9293130-B2 · Mar 22, 2016 · US · Method and system for robust pattern matching in continuous speech for spotting a keyword of interest using orthogonal matching pursuit
US10839810B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10839810-B2 |
| Application number | US-201816192914-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2018 |
| Priority date | Nov 21, 2017 |
| Publication date | Nov 17, 2020 |
| Grant date | Nov 17, 2020 |
Abstract:
A method of speaker modelling for a speaker recognition system, comprises: receiving a signal comprising a speaker's speech; and, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort; and extracting features from the spectrum of the speaker's speech and the at least one modified spectrum. The method further comprises forming at least one speech model based on the extracted features.
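The per-frame steps in the abstract begin by cutting the signal into short frames and taking a spectrum of each one. The following is a minimal sketch in Python, assuming a 25 ms frame with a 10 ms hop (overlapping frames within the 10 ms to 50 ms range of claims 3 and 4) and a Hamming window. The function names and defaults are illustrative and not taken from the patent, and selecting only voiced frames (claim 2) is not shown. A naive DFT keeps the sketch dependency-free; a practical system would use an FFT.

```python
import cmath
import math


def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping frames; a hop shorter than the frame gives overlap."""
    if not 10.0 <= frame_ms <= 50.0:
        raise ValueError("frame duration outside the 10-50 ms range of claim 4")
    n = int(round(fs * frame_ms / 1000.0))   # samples per frame
    hop = int(round(fs * hop_ms / 1000.0))   # samples between frame starts
    return [x[i:i + n] for i in range(0, len(x) - n + 1, hop)]


def magnitude_spectrum(frame):
    """Hamming-windowed magnitude spectrum via a naive DFT, bins 0..N/2."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [s * w for s, w in zip(frame, window)]
    return [
        abs(sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


# Usage: a 1 s, 1 kHz test tone sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
frames = frame_signal(x, fs)            # 98 frames of 200 samples each
spec = magnitude_spectrum(frames[0])    # 101 bins, 40 Hz apart
print(spec.index(max(spec)))            # 25, i.e. 25 * 40 Hz = 1000 Hz
```

Each frame's spectrum would then be passed, together with its vocal-effort-modified versions, to the same feature extractor.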
The invention claimed is:

1. A method of speaker modelling for a speaker recognition system, comprising: receiving a signal comprising a speaker's speech; and, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort, wherein the step of generating at least one modified spectrum comprises: determining a frequency and a bandwidth of at least one formant component of the speaker's speech; generating at least one modified formant component by modifying at least one of the frequency and the bandwidth of the or each formant component; and generating the modified spectrum from the or each modified formant component; and extracting features from the spectrum of the speaker's speech and the at least one modified spectrum; and forming at least one speech model based on the extracted features.
2. A method according to claim 1, comprising: obtaining the spectrum of the speaker's speech for a plurality of frames of the signal containing voiced speech.
3. A method according to claim 1, comprising: obtaining the spectrum of the speaker's speech for a plurality of overlapping frames of the signal.
4. A method according to claim 1, wherein each frame has a duration between 10 ms and 50 ms.
5. A method according to claim 1, comprising: generating a plurality of modified spectra, by applying effects related to respective vocal efforts.
6. A method according to claim 1, wherein the step of forming at least one speech model comprises forming a background model for the speaker recognition system, based in part on said speaker's speech.
7. A method according to claim 1, comprising determining a frequency and a bandwidth of a number of formant components of the speaker's speech in the range from 3-5.
8. A method according to claim 1, wherein generating modified formant components comprises: modifying the frequency and the bandwidth of the or each formant component.
9. A method according to claim 1, wherein the features extracted from the spectrum of the user's speech comprise Mel Frequency Cepstral Coefficients.
10. A method according to claim 1, wherein the step of forming at least one speech model comprises forming a model of the speaker's speech.
11. A method according to claim 10, wherein the method is performed on enrolling the speaker in the speaker recognition system.
12. A non-transitory computer readable storage medium having computer-executable instructions stored thereon that, when executed by processor circuitry, cause the processor circuitry to perform a method comprising: receiving a signal comprising a speaker's speech; and for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort, wherein the step of generating at least one modified spectrum comprises: determining a frequency and a bandwidth of at least one formant component of the speaker's speech; generating at least one modified formant component by modifying at least one of the frequency and the bandwidth of the or each formant component; and generating the modified spectrum from the or each modified formant component; extracting features from the spectrum of the speaker's speech and the at least one modified spectrum; and further comprising: forming at least one speech model based on the extracted features.
13. A system for speaker modelling, the system comprising: an input, for receiving a signal comprising a speaker's speech; and, a processor, configured for, for a plurality of frames of the signal: obtaining a spectrum of the speaker's speech; generating at least one modified spectrum, by applying effects related to a respective vocal effort, wherein the step of generating at least one modified spectrum comprises: determining a frequency and a bandwidth of at least one formant component of the speaker's speech; generating at least one modified formant component by modifying at least one of the frequency and the bandwidth of the or each formant component; and generating the modified spectrum from the or each modified formant component; extracting features from the spectrum of the speaker's speech and the at least one modified spectrum; and forming at least one speech model based on the extracted features.
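Claim 1 ties the vocal-effort modification to formants: determine each formant's frequency and bandwidth, modify one or both, then generate the modified spectrum from the modified formants. The sketch below is one possible reading, not the patent's method. It assumes the formants are already known (LPC root-finding is a common estimator, but the claims do not name one) and models each formant as a second-order all-pole resonator with pole radius exp(-pi*B/fs) and angle 2*pi*F/fs. It then builds the modified spectrum by dividing out the original formant envelope and multiplying in the modified one. The `freq_scale` and `bw_scale` factors and the formant values in the usage lines are hypothetical stand-ins for a vocal-effort setting; the claims give no numbers.

```python
import cmath
import math


def formant_envelope(formants, fs, n_bins=257):
    """All-pole spectral envelope from (frequency_hz, bandwidth_hz) pairs.

    Each formant is a second-order resonator with a conjugate pole pair at
    radius exp(-pi*B/fs) and angle +/- 2*pi*F/fs. Bins span 0..fs/2.
    """
    env = []
    for k in range(n_bins):
        omega = math.pi * k / (n_bins - 1)   # 0 .. pi rad/sample, i.e. 0 .. fs/2
        z_inv = cmath.exp(-1j * omega)
        h = 1.0
        for f, b in formants:
            r = math.exp(-math.pi * b / fs)
            p = cmath.rect(r, 2 * math.pi * f / fs)
            h /= (1 - p * z_inv) * (1 - p.conjugate() * z_inv)
        env.append(abs(h))
    return env


def modify_formants(formants, freq_scale=1.0, bw_scale=1.0):
    """Hypothetical vocal-effort transform: scale every formant's F and B."""
    return [(f * freq_scale, b * bw_scale) for f, b in formants]


def modified_spectrum(spectrum, formants, fs, freq_scale, bw_scale):
    """Replace the original formant envelope with the modified one, bin by bin."""
    n = len(spectrum)
    old_env = formant_envelope(formants, fs, n)
    new_env = formant_envelope(modify_formants(formants, freq_scale, bw_scale), fs, n)
    return [s * new / old for s, old, new in zip(spectrum, old_env, new_env)]


# Usage with illustrative (not patent-specified) formants at fs = 8 kHz.
formants = [(500.0, 80.0), (1500.0, 100.0), (2500.0, 120.0)]
spectrum = formant_envelope(formants, 8000)   # stand-in for a measured frame spectrum
raised = modified_spectrum(spectrum, formants, 8000, freq_scale=1.1, bw_scale=1.2)
```

Calling `modified_spectrum` with several different scale pairs gives the plurality of modified spectra in claim 5; features such as MFCCs (claim 9) would then be extracted from the original and every modified spectrum before model training.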
CPC classifications:

- Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
- Speaker identification or verification techniques
- Stress or Lombard effect
- Training, enrolment or model building
- Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction