Method and apparatus with speaker authentication and/or training

US11367451B2 · US · B2

Patent metadata
Publication number: US-11367451-B2
Application number: US-201916519757-A
Country: US
Kind code: B2
Filing date: Jul 23, 2019
Priority date: Aug 27, 2018
Publication date: Jun 21, 2022
Grant date: Jun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speaker authentication method and apparatus may extract input speaker features corresponding to a plurality of frames of an input speech of an object, estimate discriminable speaker sections corresponding to the plurality of frames, and dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker section.
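
The abstract's three steps (per-frame features, per-frame discriminability estimates, and matching driven by those estimates) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how features are compared or how the discriminability estimate enters the match, so this sketch uses cosine distance, treats the estimate as a per-frame score in [0, 1], and assumes the input and enrolled frames are already aligned one to one. The names `cosine_distance` and `weighted_match` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def weighted_match(input_feats, enrolled_feats, scores):
    """Average per-frame distance, weighted by a discriminability score.

    Assumption: frames are pre-aligned and scores are non-negative, so a
    frame with score 0 contributes nothing to the result.
    """
    total_weight = sum(scores)
    if total_weight == 0.0:
        return 1.0  # no discriminable frames: report the maximum distance
    weighted_sum = sum(
        w * cosine_distance(a, b)
        for a, b, w in zip(input_feats, enrolled_feats, scores)
    )
    return weighted_sum / total_weight


# Hypothetical toy data: the second frame mismatches the enrollment.
input_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
enrolled_feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]

uniform = weighted_match(input_feats, enrolled_feats, [1.0, 1.0, 1.0])
masked = weighted_match(input_feats, enrolled_feats, [0.9, 0.0, 0.8])
```

In this toy data the mismatching second frame dominates the uniform average, while giving it a score of 0 removes its influence entirely. That is the intuition behind matching "based on" discriminable sections, although the patent's actual estimator and matcher may differ.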

First claim

Opening claim text (preview).

What is claimed is:

1. A speaker authentication method, comprising: receiving a plurality of frames corresponding to an input speech; extracting input speaker features corresponding to the plurality of frames; estimating discriminable speaker sections corresponding to the plurality of frames; dynamically matching the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker sections; and performing a speaker authentication based on a result of the dynamic matching, wherein the dynamic matching comprises: assigning a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assigning a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically matching each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.

2. The method of claim 1, wherein the dynamic matching comprises: selecting input speaker features having discriminable speaker sections greater than or equal to a threshold value; and dynamically matching the selected input speaker features to the pre-enrolled enrolled speaker features.

3. The method of claim 1, wherein the dynamic matching comprises: dropping an input speaker feature having a discriminable speaker section less than a threshold value; dropping a pre-enrolled enrolled speaker feature corresponding to the dropped input speaker feature; and dynamically matching remaining input speaker features, excluding the dropped input speaker feature, to remaining enrollment speaker features, excluding the dropped pre-enrolled enrollment speaker feature.

4. The method of claim 1, wherein the dynamic matching comprises: assigning a weight to input speaker features having discriminable speaker sections being greater than or equal to a threshold value; and dynamically matching the weight-assigned input speaker features to the pre-enrolled enrolled speaker features.

5. The method of claim 1, wherein the dynamic matching comprises: dropping an input speaker feature corresponding to a short pause among the input speaker features; and dynamically matching remaining input features, excluding the dropped input speaker feature, to the pre-enrolled enrolled speaker features.

6. The method of claim 1, wherein the dynamic matching comprises: aligning the pre-enrolled enrolled speaker features representing phonemes identical to phonemes represented by the input speaker features; and dynamically matching the input speaker features to the aligned pre-enrolled enrolled speaker features.

7. The method of claim 1, wherein the performing comprises: outputting a distance corresponding to the input speech by accumulating results of the dynamic matching; and performing the speaker authentication based on a result of comparing the distance to a threshold value.

8. The method of claim 1, wherein the extracting comprises extracting the input speaker features based on per-frequency energies of the plurality of frames.

9. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speaker authentication method of claim 1.

10. A speaker authentication apparatus, comprising: a communication interface configured to receive a plurality of frames corresponding to an input speech; and a processor configured to: extract input speaker features corresponding to the plurality of frames; estimate discriminable speaker sections corresponding to the plurality of frames; dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker sections; and perform a speaker authentication based on a result of the dynamic matching, wherein, for the dynamic matching, the processor is configured to: assign a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assign a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically match each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.

11. The apparatus of claim 10, wherein the processor is configured to select input speaker features having discriminable speaker sections greater than or equal to a threshold value, and dynamically match the selected input speaker features to the pre-enrolled enrolled speaker features.

12. The apparatus of claim 10, wherein the processor is configured to drop an input speaker feature having a discriminable speaker section less than a threshold value, and dynamically match remaining input speaker features, excluding the dropped input speaker feature, to the pre-enrolled enrolled speaker features.

13. The apparatus of claim 10, wherein the processor is configured to assign a weight to input speaker features having discriminable speaker sections greater than or equal to a threshold value, and dynamically match the weight-assigned input speaker features to the pre-enrolled enrolled speaker features.

14. A speaker authentication method, comprising: extracting input speaker features corresponding to speech frames; determining discriminable speaker sections in each of the speech frames; dynamically matching select input speaker features, of the extracted input speaker features, to pre-enrolled enrolled speaker features based on the discriminable speaker sections satisfying a criteria; and authenticating a speaker based on the dynamically matched input speaker features, wherein the dynamic matching comprises: assigning a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assigning a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically matching each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.

15. The method of claim 14, wherein the input speaker features correspond to phonemes and the discriminable speaker sections comprise of voiced sounds.

16. The method of claim 15, wherein the criteria is satisfied when a discriminable speaker section of the discriminable speaker sections is greater than or equal to a threshold value.

17. The method of claim 15, wherein the criteria is determined based on comparisons of relative weights applied to the discriminable speaker sections.
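
The weighting step recited in claim 1 and the accumulate-and-threshold step recited in claim 7 can be sketched as follows. This is a hedged illustration, not the patented implementation: the claims do not name a matching algorithm, so classic dynamic time warping (DTW) is swapped in here as one common form of dynamic matching. The Euclidean frame distance, the `is_pause` flags, the default weights (pause frames weighted lower than speech frames), and the function names `weighted_dtw` and `authenticate` are all assumptions introduced for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_dtw(input_feats, is_pause, enrolled_feats,
                 pause_weight=0.1, speech_weight=1.0):
    """Accumulated DTW distance with a per-input-frame weight.

    Assumption: short-pause frames get `pause_weight` (the "first weight")
    and speech frames get `speech_weight` (the "second weight").
    """
    n, m = len(input_feats), len(enrolled_feats)
    inf = float("inf")
    # acc[i][j] = best accumulated cost aligning the first i input frames
    # with the first j enrolled frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        weight = pause_weight if is_pause[i - 1] else speech_weight
        for j in range(1, m + 1):
            cost = weight * euclidean(input_feats[i - 1], enrolled_feats[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def authenticate(input_feats, is_pause, enrolled_feats, threshold, **weights):
    """Accept the speaker when the accumulated distance is within the threshold."""
    return weighted_dtw(input_feats, is_pause, enrolled_feats, **weights) <= threshold


# Hypothetical toy data: the middle input frame is a short pause that
# looks nothing like either enrolled frame.
enrolled = [[1.0, 0.0], [0.0, 1.0]]
spoken = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]
pause_flags = [False, True, False]

accepted = authenticate(spoken, pause_flags, enrolled, threshold=1.0)
rejected = authenticate(spoken, pause_flags, enrolled, threshold=1.0,
                        pause_weight=1.0)
```

With the pause frame down-weighted, its mismatch adds little to the accumulated distance and the toy utterance passes the threshold. Weighting it the same as speech pushes the distance over the threshold. Setting `pause_weight` to 0 approximates the dropping variant of claim 5, with the caveat that the frame still takes part in the alignment path.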

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • G10L17/02 (Primary)

    Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

  • Supervised learning

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11367451B2 cover?
A speaker authentication method and apparatus may extract input speaker features corresponding to a plurality of frames of an input speech of an object, estimate discriminable speaker sections corresponding to the plurality of frames, and dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker section.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L17/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 21, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).