Multi-stage speaker enrollment in voice authentication and identification

US11929077B2 · US · B2

Patent metadata
Publication number: US-11929077-B2
Application number: US-202017131702-A
Country: US
Kind code: B2
Filing date: Dec 22, 2020
Priority date: Dec 23, 2019
Publication date: Mar 12, 2024
Grant date: Mar 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of systems and methods for user enrollment in speaker authentication and speaker identification systems are disclosed. In some embodiments, the enrollment process includes collecting speech samples that are examples of multiple speech types spoken by a user, computing a speech representation for each speech sample, and aggregating the example speech representations to form a robust overall representation or user voiceprint of the user's speech.
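
The enrollment flow summarized in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how representations are aggregated, so averaging followed by unit normalization is an assumption, and the names `unit_normalize` and `aggregate_voiceprint` are hypothetical. The per-sample representations are assumed to be fixed-length vectors that have already been computed.

```python
import math


def unit_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def aggregate_voiceprint(representations_by_type):
    """Combine per-sample representations into one voiceprint.

    representations_by_type maps a speech-type label to a list of
    equal-length vectors, one per speech sample of that type.
    Assumption: aggregation is a mean per type, then a mean across types.
    """
    if len(representations_by_type) < 2:
        raise ValueError("need samples from at least two speech types")
    type_means = []
    for label, vectors in representations_by_type.items():
        if not vectors:
            raise ValueError(f"no samples for speech type {label!r}")
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        type_means.append(unit_normalize(mean))
    dim = len(type_means[0])
    overall = [sum(m[i] for m in type_means) / len(type_means) for i in range(dim)]
    return unit_normalize(overall)
```

Averaging within each type first keeps a type with many samples from dominating the result. That is a design choice made for this sketch only.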

First claim

Opening claim text (preview).

What is claimed is:

1. A method for enrolling a user in a multi-stage enrollment voice authentication or identification system, comprising: capturing a plurality of speech samples of the user's speech to obtain user speech samples'; categorizing the plurality of speech samples into the following sentence types: (1) a declarative sentence; (2) an imperative sentence; (3) an interrogative sentence; (4) an exclamatory sentence; computing feature-space representations for each of the sentence types, wherein a boundary is determined for the feature-space representations for each of the sentence types; generating a user enrollment voiceprint by aggregating the feature-space representations, wherein the aggregated feature-space representations include at least two different sentence types, wherein the aggregated feature-space representations have a modified boundary; associating the user enrollment voiceprint with user information; and storing the user enrollment voiceprint and associated user information in a database of enrolled users.

2. The method of claim 1, wherein the plurality of speech samples includes a free-response example wherein the user is not reading aloud a specified sentence.

3. The method of claim 1, further comprising: capturing a speech sample of the user requesting authentication or identification; generating a voiceprint for the requesting user; determining whether a requesting user voiceprint matches any enrolled-user voiceprints in the database of enrolled users.

4. The method of claim 3, wherein determining whether the requesting user voiceprint matches any enrolled-user voiceprints in the database of enrolled users further comprises: computing correlations between a requesting user voiceprint vector and enrolled-user voiceprint vectors; and comparing the correlations to a threshold.

5. The method of claim 4, further comprising: determining that the requesting user voiceprint and at least one of the enrolled-user voiceprints is a match if one of the correlations exceeds the threshold.

6. The method of claim 5, further comprising: determining that two or more of the correlations exceeds the threshold; and determining that a one of the enrolled-user voiceprints having a maximum correlation is a match to the requesting user voiceprint.

7. The method of claim 4, further comprising: determining that no match exists between the requesting user voiceprint and the enrolled-user voiceprints if none of the correlations exceeds the threshold.

8. A method for matching a speaker in a multi-stage enrollment voice authentication or identification system, comprising: capturing a plurality of speech samples from the speaker during an enrollment process, categorizing the plurality of speech samples into the following sentence types: (1) a declarative sentence; (2) an imperative sentence; (3) an interrogative sentence; (4) an exclamatory sentence; computing feature-space representations for each of the sentence types, wherein a boundary is determined for the feature-space representations for each of the sentence types; generating an enrollment voiceprint for the speaker by aggregating the features-space representations using at least two different categorized sentence types to obtain a speaker enrollment voiceprint, wherein the aggregated feature-space representations have a modified boundary; associating the speaker enrollment voiceprint with information about the speaker gathered during enrollment; inputting an input audio signal containing speech samples of the speaker; determining a target speech signal by identifying and retaining segments of the input audio signal that contain speech and identifying and discarding segments of the input audio signal which do not contain speech; computing an authentication voiceprint for the speaker from the target speech signal and comparing the speaker authentication voiceprint to a set of one or more enrolled-user voiceprints corresponding to one or more enrolled users to obtain a comparison; making an output determination based on the comparison to determine whether the speaker is a match with an enrolled user; and making a decision based on whether the speaker is a match.

9. The method of claim 8, wherein the set of one or more enrolled-user voiceprints includes the speaker enrollment voiceprint.

10. The method of claim 9, further comprising: determining that the comparison is a match between the speaker authentication voiceprint and the speaker enrollment voiceprint; and wherein making a decision based on whether the speaker is a match further comprises making the decision based on there being a positive match.

11. The method of claim 8, wherein computing an authentication voiceprint for the speaker from the target speech signal further comprises providing the target speech signal as input to a processing system which includes a deep neural network (DNN).

12. The method of claim 11, wherein the deep neural network is configured to compute a vector representation of the target speech signal and wherein the vector representation of the target speech signal is the speaker authentication voiceprint.

13. The method of claim 11, further comprising normalizing the output of the deep neural network such that the vector representation has unit norm.

14. The method of claim 8, wherein comparing the speaker authentication voiceprint to a set of one or more enrolled-user voiceprints corresponding to one or more enrolled users further comprises computing a correlation between a vector representation of the speaker authentication voiceprint and vector representations of each of the enrolled-user voiceprints.

15. The method of claim 14, further comprising: determining that the correlation exceeds a threshold; and determining that the speaker authentication voiceprint is a match with at least one of the enrolled-user voiceprints for which the correlation exceeds a threshold.

16. The method of claim 14, further comprising: determining that the speaker is positively authenticated as an enrolled user; and making a decision based on the authentication of the speaker.

17. The method of claim 14, further comprising: determining that the speaker is positively identified as an enrolled user; and making a decision based on the identity of the speaker.
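
The matching logic described in claims 4 to 7 and 14 to 15 (correlate the requesting voiceprint with each enrolled voiceprint, compare against a threshold, and take the maximum when several exceed it) might look like the sketch below. The claims do not define the correlation measure, so cosine similarity is an assumption here, and `cosine_correlation`, `match_speaker`, and the example threshold values are hypothetical names and numbers chosen for illustration.

```python
import math


def cosine_correlation(a, b):
    """Cosine similarity between two vectors (assumed correlation measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("voiceprint vectors must be non-zero")
    return dot / (norm_a * norm_b)


def match_speaker(request_voiceprint, enrolled, threshold):
    """Return the enrolled user ID with the highest correlation above
    the threshold, or None when no correlation exceeds it.

    enrolled maps a user ID to that user's enrollment voiceprint vector.
    """
    best_id, best_score = None, threshold
    for user_id, voiceprint in enrolled.items():
        score = cosine_correlation(request_voiceprint, voiceprint)
        if score > best_score:  # strictly exceeds the threshold or current best
            best_id, best_score = user_id, score
    return best_id


# Hypothetical usage with two-dimensional toy voiceprints.
enrolled_users = {"alice": [1.0, 0.0], "bob": [0.6, 0.8]}
print(match_speaker([0.9, 0.1], enrolled_users, threshold=0.5))
```

If the voiceprints are already unit-normalized, as in claim 13, the cosine reduces to a plain dot product and the two norm computations can be skipped.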

Assignees

Dts Inc

Inventors

Classifications

  • G10L17/04 (Primary)

    Training, enrolment or model building · CPC title

  • using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title

  • the user being prompted to utter a password or a predefined phrase · CPC title

  • Use of distortion metrics or a particular distance between probe pattern and reference templates · CPC title

  • Program or device authentication · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11929077B2 cover?
Embodiments of systems and methods for user enrollment in speaker authentication and speaker identification systems are disclosed. In some embodiments, the enrollment process includes collecting speech samples that are examples of multiple speech types spoken by a user, computing a speech representation for each speech sample, and aggregating the example speech representations to form a robust …
Who is the assignee on this patent?
Dts Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 12, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).