Speaker identification
US-2019228779-A1 · Jul 25, 2019 · US
US11367451B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11367451-B2 |
| Application number | US-201916519757-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 23, 2019 |
| Priority date | Aug 27, 2018 |
| Publication date | Jun 21, 2022 |
| Grant date | Jun 21, 2022 |
Official abstract text for this publication.
A speaker authentication method and apparatus may extract input speaker features corresponding to a plurality of frames of an input speech of an object, estimate discriminable speaker sections corresponding to the plurality of frames, and dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker section.
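The frame selection implied by the abstract can be illustrated with a minimal sketch. It is not the patent's implementation. It assumes that the discriminable speaker section estimated for each frame is reduced to a single numeric score, and the names `select_discriminable_frames`, `scores`, and `threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple

Feature = Sequence[float]


def select_discriminable_frames(
    features: Sequence[Feature],
    scores: Sequence[float],
    threshold: float,
) -> List[Tuple[int, Feature]]:
    """Keep the frames whose discriminability score meets the threshold.

    Assumption: `scores[i]` is a per-frame number standing in for the
    "discriminable speaker section" of frame i. The source does not say
    how that quantity is computed.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    # Return (frame index, feature) pairs so a caller can drop or align
    # the corresponding enrolled features by index.
    return [
        (index, feature)
        for index, (feature, score) in enumerate(zip(features, scores))
        if score >= threshold
    ]


if __name__ == "__main__":
    kept = select_discriminable_frames(
        [[1.0], [2.0], [3.0]], [0.9, 0.2, 0.5], threshold=0.5
    )
    print(kept)
```

Returning the frame indices alongside the features is a design choice for this sketch: it lets a caller remove the matching enrolled features when an input frame is dropped.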
Opening claim text (preview).
What is claimed is:
1. A speaker authentication method, comprising: receiving a plurality of frames corresponding to an input speech; extracting input speaker features corresponding to the plurality of frames; estimating discriminable speaker sections corresponding to the plurality of frames; dynamically matching the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker sections; and performing a speaker authentication based on a result of the dynamic matching, wherein the dynamic matching comprises: assigning a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assigning a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically matching each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.
2. The method of claim 1, wherein the dynamic matching comprises: selecting input speaker features having discriminable speaker sections greater than or equal to a threshold value; and dynamically matching the selected input speaker features to the pre-enrolled enrolled speaker features.
3. The method of claim 1, wherein the dynamic matching comprises: dropping an input speaker feature having a discriminable speaker section less than a threshold value; dropping a pre-enrolled enrolled speaker feature corresponding to the dropped input speaker feature; and dynamically matching remaining input speaker features, excluding the dropped input speaker feature, to remaining enrollment speaker features, excluding the dropped pre-enrolled enrollment speaker feature.
4. The method of claim 1, wherein the dynamic matching comprises: assigning a weight to input speaker features having discriminable speaker sections being greater than or equal to a threshold value; and dynamically matching the weight-assigned input speaker features to the pre-enrolled enrolled speaker features.
5. The method of claim 1, wherein the dynamic matching comprises: dropping an input speaker feature corresponding to a short pause among the input speaker features; and dynamically matching remaining input features, excluding the dropped input speaker feature, to the pre-enrolled enrolled speaker features.
6. The method of claim 1, wherein the dynamic matching comprises: aligning the pre-enrolled enrolled speaker features representing phonemes identical to phonemes represented by the input speaker features; and dynamically matching the input speaker features to the aligned pre-enrolled enrolled speaker features.
7. The method of claim 1, wherein the performing comprises: outputting a distance corresponding to the input speech by accumulating results of the dynamic matching; and performing the speaker authentication based on a result of comparing the distance to a threshold value.
8. The method of claim 1, wherein the extracting comprises extracting the input speaker features based on per-frequency energies of the plurality of frames.
9. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speaker authentication method of claim 1.
10. A speaker authentication apparatus, comprising: a communication interface configured to receive a plurality of frames corresponding to an input speech; and a processor configured to: extract input speaker features corresponding to the plurality of frames; estimate discriminable speaker sections corresponding to the plurality of frames; dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker sections; and perform a speaker authentication based on a result of the dynamic matching, wherein, for the dynamic matching, the processor is configured to: assign a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assign a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically match each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.
11. The apparatus of claim 10, wherein the processor is configured to select input speaker features having discriminable speaker sections greater than or equal to a threshold value, and dynamically match the selected input speaker features to the pre-enrolled enrolled speaker features.
12. The apparatus of claim 10, wherein the processor is configured to drop an input speaker feature having a discriminable speaker section less than a threshold value, and dynamically match remaining input speaker features, excluding the dropped input speaker feature, to the pre-enrolled enrolled speaker features.
13. The apparatus of claim 10, wherein the processor is configured to assign a weight to input speaker features having discriminable speaker sections greater than or equal to a threshold value, and dynamically match the weight-assigned input speaker features to the pre-enrolled enrolled speaker features.
14. A speaker authentication method, comprising: extracting input speaker features corresponding to speech frames; determining discriminable speaker sections in each of the speech frames; dynamically matching select input speaker features, of the extracted input speaker features, to pre-enrolled enrolled speaker features based on the discriminable speaker sections satisfying a criteria; and authenticating a speaker based on the dynamically matched input speaker features, wherein the dynamic matching comprises: assigning a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assigning a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically matching each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.
15. The method of claim 14, wherein the input speaker features correspond to phonemes and the discriminable speaker sections comprise of voiced sounds.
16. The method of claim 15, wherein the criteria is satisfied when a discriminable speaker section of the discriminable speaker sections is greater than or equal to a threshold value.
17. The method of claim 15, wherein the criteria is determined based on comparisons of relative weights applied to the discriminable speaker sections.
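The weighting, accumulation, and threshold comparison recited in the independent claims and in claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation. Dynamic time warping is swapped in as one plausible form of "dynamic matching", which this excerpt does not name. The Euclidean frame distance, the default weights, and the names `weighted_dtw_distance` and `authenticate` are all hypothetical.

```python
import math
from typing import Sequence

Feature = Sequence[float]


def euclidean(a: Feature, b: Feature) -> float:
    """Frame-to-frame distance (an assumed metric, not from the source)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_dtw_distance(
    inputs: Sequence[Feature],
    weights: Sequence[float],
    enrolled: Sequence[Feature],
) -> float:
    """Accumulate weighted local distances along the best DTW alignment."""
    n, m = len(inputs), len(enrolled)
    if n == 0 or m == 0:
        raise ValueError("both feature sequences must be non-empty")
    if len(weights) != n:
        raise ValueError("one weight is required per input frame")
    # cost[i][j] is the best accumulated distance aligning the first i
    # input frames with the first j enrolled frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = weights[i - 1] * euclidean(inputs[i - 1], enrolled[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def authenticate(
    inputs: Sequence[Feature],
    is_pause: Sequence[bool],
    enrolled: Sequence[Feature],
    pause_weight: float = 0.1,   # hypothetical "first weight" (short pause)
    speech_weight: float = 1.0,  # hypothetical "second weight" (speech)
    threshold: float = 1.0,      # hypothetical acceptance threshold
) -> bool:
    """Accept the speaker when the accumulated distance is small enough."""
    weights = [pause_weight if pause else speech_weight for pause in is_pause]
    return weighted_dtw_distance(inputs, weights, enrolled) <= threshold
```

In this sketch a small pause weight keeps a short-pause frame in the alignment while limiting how much it adds to the accumulated distance. Setting the weight to zero would approximate the dropping behaviour of claim 5.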
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title