What technology area does this patent fall under?

Primary CPC classification G10L17/02. Mapped technology areas include Physics.

When was this patent published?

Published Jun 21, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus with speaker authentication and/or training

US11367451B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11367451-B2
Application number	US-201916519757-A
Country	US
Kind code	B2
Filing date	Jul 23, 2019
Priority date	Aug 27, 2018
Publication date	Jun 21, 2022
Grant date	Jun 21, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speaker authentication method and apparatus may extract input speaker features corresponding to a plurality of frames of an input speech of an object, estimate discriminable speaker sections corresponding to the plurality of frames, and dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker section.

First claim

Opening claim text (preview).

What is claimed is:

1. A speaker authentication method, comprising: receiving a plurality of frames corresponding to an input speech; extracting input speaker features corresponding to the plurality of frames; estimating discriminable speaker sections corresponding to the plurality of frames; dynamically matching the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker sections; and performing a speaker authentication based on a result of the dynamic matching, wherein the dynamic matching comprises: assigning a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assigning a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically matching each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.

2. The method of claim 1, wherein the dynamic matching comprises: selecting input speaker features having discriminable speaker sections greater than or equal to a threshold value; and dynamically matching the selected input speaker features to the pre-enrolled enrolled speaker features.

3. The method of claim 1, wherein the dynamic matching comprises: dropping an input speaker feature having a discriminable speaker section less than a threshold value; dropping a pre-enrolled enrolled speaker feature corresponding to the dropped input speaker feature; and dynamically matching remaining input speaker features, excluding the dropped input speaker feature, to remaining enrollment speaker features, excluding the dropped pre-enrolled enrollment speaker feature.

4. The method of claim 1, wherein the dynamic matching comprises: assigning a weight to input speaker features having discriminable speaker sections being greater than or equal to a threshold value; and dynamically matching the weight-assigned input speaker features to the pre-enrolled enrolled speaker features.

5. The method of claim 1, wherein the dynamic matching comprises: dropping an input speaker feature corresponding to a short pause among the input speaker features; and dynamically matching remaining input features, excluding the dropped input speaker feature, to the pre-enrolled enrolled speaker features.

6. The method of claim 1, wherein the dynamic matching comprises: aligning the pre-enrolled enrolled speaker features representing phonemes identical to phonemes represented by the input speaker features; and dynamically matching the input speaker features to the aligned pre-enrolled enrolled speaker features.

7. The method of claim 1, wherein the performing comprises: outputting a distance corresponding to the input speech by accumulating results of the dynamic matching; and performing the speaker authentication based on a result of comparing the distance to a threshold value.

8. The method of claim 1, wherein the extracting comprises extracting the input speaker features based on per-frequency energies of the plurality of frames.

9. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speaker authentication method of claim 1.

10. A speaker authentication apparatus, comprising: a communication interface configured to receive a plurality of frames corresponding to an input speech; and a processor configured to: extract input speaker features corresponding to the plurality of frames; estimate discriminable speaker sections corresponding to the plurality of frames; dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker sections; and perform a speaker authentication based on a result of the dynamic matching, wherein, for the dynamic matching, the processor is configured to: assign a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assign a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically match each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.

11. The apparatus of claim 10, wherein the processor is configured to select input speaker features having discriminable speaker sections greater than or equal to a threshold value, and dynamically match the selected input speaker features to the pre-enrolled enrolled speaker features.

12. The apparatus of claim 10, wherein the processor is configured to drop an input speaker feature having a discriminable speaker section less than a threshold value, and dynamically match remaining input speaker features, excluding the dropped input speaker feature, to the pre-enrolled enrolled speaker features.

13. The apparatus of claim 10, wherein the processor is configured to assign a weight to input speaker features having discriminable speaker sections greater than or equal to a threshold value, and dynamically match the weight-assigned input speaker features to the pre-enrolled enrolled speaker features.

14. A speaker authentication method, comprising: extracting input speaker features corresponding to speech frames; determining discriminable speaker sections in each of the speech frames; dynamically matching select input speaker features, of the extracted input speaker features, to pre-enrolled enrolled speaker features based on the discriminable speaker sections satisfying a criteria; and authenticating a speaker based on the dynamically matched input speaker features, wherein the dynamic matching comprises: assigning a first weight to an input speaker feature corresponding to a pre-determined short pause among the input speaker features; assigning a second weight to an input speaker feature corresponding to a speech among the input speaker features; and dynamically matching each of the first weight-assigned input speaker feature and the second weight-assigned input speaker feature to the pre-enrolled enrolled speaker features.

15. The method of claim 14, wherein the input speaker features correspond to phonemes and the discriminable speaker sections comprise of voiced sounds.

16. The method of claim 15, wherein the criteria is satisfied when a discriminable speaker section of the discriminable speaker sections is greater than or equal to a threshold value.

17. The method of claim 15, wherein the criteria is determined based on comparisons of relative weights applied to the discriminable speaker sections.
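The weighting and distance steps recited in claims 1 and 7 can be illustrated with a small sketch. This is not the patented implementation: it assumes that "dynamic matching" can be approximated by classic dynamic time warping, that frames are compared with Euclidean distance, and that short pauses have already been flagged per frame. The function names, the weights 0.1 and 1.0, and the threshold 1.0 are all hypothetical values chosen for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_dtw(inputs, weights, enrolled):
    """Accumulate per-frame weighted distances along the best warping path.

    Dynamic time warping is used here as a stand-in for the claimed
    "dynamic matching"; the claims do not name a specific algorithm.
    """
    n, m = len(inputs), len(enrolled)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = weights[i - 1] * frame_distance(inputs[i - 1], enrolled[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def authenticate(inputs, is_pause, enrolled,
                 pause_weight=0.1, speech_weight=1.0, threshold=1.0):
    """Return (accepted, distance) for one utterance.

    pause_weight plays the role of the "first weight" (short pauses) and
    speech_weight the "second weight" (speech); all values are hypothetical.
    """
    weights = [pause_weight if p else speech_weight for p in is_pause]
    distance = weighted_dtw(inputs, weights, enrolled)
    return distance <= threshold, distance


# Toy data: the middle input frame is a short pause that matches nothing.
enrolled = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]
is_pause = [False, True, False]

accepted, distance = authenticate(inputs, is_pause, enrolled)
rejected, unweighted = authenticate(inputs, is_pause, enrolled, pause_weight=1.0)
```

In this toy run the down-weighted pause frame contributes little to the accumulated distance, so the utterance falls under the threshold. With equal weights the same pause frame dominates the distance and the utterance is rejected, which is the kind of effect the separate pause and speech weights appear intended to control.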

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

G10L17/02 · Primary
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
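The classification symbols above follow the CPC layout of section letter, two-digit class, subclass letter, main group, and subgroup. The sketch below splits a symbol into those parts and looks up the section title, which shows why a symbol beginning with G lines up with "Physics". How this page actually derives its mapped technology areas is not stated, so `parse_cpc` and the lookup table are hypothetical helpers, and the section Y title is abbreviated.

```python
import re

# CPC section letters and their titles (section Y title abbreviated).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new or cross-sectional technologies",
}

# Section letter, two-digit class, subclass letter, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol):
    """Split a CPC symbol such as 'G10L17/02' into its hierarchical parts."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


primary = parse_cpc("G10L17/02")
secondary = parse_cpc("G06N3/0442")
```

Here `primary["section_title"]` is "Physics" and `primary["subclass"]` is "G10L", while `secondary` resolves to subclass "G06N" with subgroup "0442".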

Patent family

Related publications grouped by family.

View patent family 69586559

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11367451B2 cover?: A speaker authentication method and apparatus may extract input speaker features corresponding to a plurality of frames of an input speech of an object, estimate discriminable speaker sections corresponding to the plurality of frames, and dynamically match the input speaker features to pre-enrolled enrolled speaker features based on the discriminable speaker section.
Who is the assignee on this patent?: Samsung Electronics Co Ltd

Related patents

Speaker identification

Managing silence in audio signal identification

Neural Networks For Speaker Verification

Speaker change detection device and speaker change detection method

Speech recognition assisted evaluation on text-to-speech pronunciation issue detection

Speaker verification

Mimicking user speech patterns