User speech profile management
US-2022180859-A1 · Jun 9, 2022 · US
US11929077B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11929077-B2 |
| Application number | US-202017131702-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2020 |
| Priority date | Dec 23, 2019 |
| Publication date | Mar 12, 2024 |
| Grant date | Mar 12, 2024 |
Official abstract text for this publication.
Embodiments of systems and methods for user enrollment in speaker authentication and speaker identification systems are disclosed. In some embodiments, the enrollment process includes collecting speech samples that are examples of multiple speech types spoken by a user, computing a speech representation for each speech sample, and aggregating the example speech representations to form a robust overall representation or user voiceprint of the user's speech.
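The enrollment flow in the abstract (one representation per speech sample, then aggregation into a single voiceprint) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration: the embeddings are made-up 3-dimensional vectors standing in for the output of an unspecified feature extractor, the function names are hypothetical, and aggregation by averaging followed by length normalization is one plausible choice, not a method the abstract prescribes.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def build_enrollment_voiceprint(samples_by_type):
    """Aggregate per-sample embeddings into one unit-length voiceprint.

    samples_by_type maps a speech type label to a list of embeddings.
    Assumption: average within each type, normalize, then average
    across types so that no single type dominates the result.
    """
    populated = {t: vs for t, vs in samples_by_type.items() if vs}
    if len(populated) < 2:
        raise ValueError("need samples from at least two speech types")
    per_type = [l2_normalize(mean_vector(vs)) for vs in populated.values()]
    return l2_normalize(mean_vector(per_type))

# Hypothetical embeddings; a real system would compute these from audio.
samples = {
    "declarative": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "interrogative": [[0.6, 0.4, 0.0]],
    "imperative": [],  # no samples collected for this type
}
voiceprint = build_enrollment_voiceprint(samples)
print(voiceprint)
```

Averaging per type before averaging across types keeps a type with many samples from outweighing a type with only one, which is one way to read "robust overall representation"; other aggregation schemes would fit the abstract equally well.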
Opening claim text (preview).
What is claimed is:
1. A method for enrolling a user in a multi-stage enrollment voice authentication or identification system, comprising: capturing a plurality of speech samples of the user's speech to obtain user speech samples; categorizing the plurality of speech samples into the following sentence types: (1) a declarative sentence; (2) an imperative sentence; (3) an interrogative sentence; (4) an exclamatory sentence; computing feature-space representations for each of the sentence types, wherein a boundary is determined for the feature-space representations for each of the sentence types; generating a user enrollment voiceprint by aggregating the feature-space representations, wherein the aggregated feature-space representations include at least two different sentence types, wherein the aggregated feature-space representations have a modified boundary; associating the user enrollment voiceprint with user information; and storing the user enrollment voiceprint and associated user information in a database of enrolled users.
2. The method of claim 1, wherein the plurality of speech samples includes a free-response example wherein the user is not reading aloud a specified sentence.
3. The method of claim 1, further comprising: capturing a speech sample of the user requesting authentication or identification; generating a voiceprint for the requesting user; determining whether a requesting user voiceprint matches any enrolled-user voiceprints in the database of enrolled users.
4. The method of claim 3, wherein determining whether the requesting user voiceprint matches any enrolled-user voiceprints in the database of enrolled users further comprises: computing correlations between a requesting user voiceprint vector and enrolled-user voiceprint vectors; and comparing the correlations to a threshold.
5. The method of claim 4, further comprising: determining that the requesting user voiceprint and at least one of the enrolled-user voiceprints is a match if one of the correlations exceeds the threshold.
6. The method of claim 5, further comprising: determining that two or more of the correlations exceeds the threshold; and determining that a one of the enrolled-user voiceprints having a maximum correlation is a match to the requesting user voiceprint.
7. The method of claim 4, further comprising: determining that no match exists between the requesting user voiceprint and the enrolled-user voiceprints if none of the correlations exceeds the threshold.
8. A method for matching a speaker in a multi-stage enrollment voice authentication or identification system, comprising: capturing a plurality of speech samples from the speaker during an enrollment process, categorizing the plurality of speech samples into the following sentence types: (1) a declarative sentence; (2) an imperative sentence; (3) an interrogative sentence; (4) an exclamatory sentence; computing feature-space representations for each of the sentence types, wherein a boundary is determined for the feature-space representations for each of the sentence types; generating an enrollment voiceprint for the speaker by aggregating the features-space representations using at least two different categorized sentence types to obtain a speaker enrollment voiceprint, wherein the aggregated feature-space representations have a modified boundary; associating the speaker enrollment voiceprint with information about the speaker gathered during enrollment; inputting an input audio signal containing speech samples of the speaker; determining a target speech signal by identifying and retaining segments of the input audio signal that contain speech and identifying and discarding segments of the input audio signal which do not contain speech; computing an authentication voiceprint for the speaker from the target speech signal and comparing the speaker authentication voiceprint to a set of one or more enrolled-user voiceprints corresponding to one or more enrolled users to obtain a comparison; making an output determination based on the comparison to determine whether the speaker is a match with an enrolled user; and making a decision based on whether the speaker is a match.
9. The method of claim 8, wherein the set of one or more enrolled-user voiceprints includes the speaker enrollment voiceprint.
10. The method of claim 9, further comprising: determining that the comparison is a match between the speaker authentication voiceprint and the speaker enrollment voiceprint; and wherein making a decision based on whether the speaker is a match further comprises making the decision based on there being a positive match.
11. The method of claim 8, wherein computing an authentication voiceprint for the speaker from the target speech signal further comprises providing the target speech signal as input to a processing system which includes a deep neural network (DNN).
12. The method of claim 11, wherein the deep neural network is configured to compute a vector representation of the target speech signal and wherein the vector representation of the target speech signal is the speaker authentication voiceprint.
13. The method of claim 11, further comprising normalizing the output of the deep neural network such that the vector representation has unit norm.
14. The method of claim 8, wherein comparing the speaker authentication voiceprint to a set of one or more enrolled-user voiceprints corresponding to one or more enrolled users further comprises computing a correlation between a vector representation of the speaker authentication voiceprint and vector representations of each of the enrolled-user voiceprints.
15. The method of claim 14, further comprising: determining that the correlation exceeds a threshold; and determining that the speaker authentication voiceprint is a match with at least one of the enrolled-user voiceprints for which the correlation exceeds a threshold.
16. The method of claim 14, further comprising: determining that the speaker is positively authenticated as an enrolled user; and making a decision based on the authentication of the speaker.
17. The method of claim 14, further comprising: determining that the speaker is positively identified as an enrolled user; and making a decision based on the identity of the speaker.
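The matching logic recited in claims 4 through 7 (compute correlations, compare them to a threshold, pick the maximum when several exceed it, report no match when none do) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: "correlation" is interpreted here as cosine similarity, the voiceprints are made-up 3-dimensional vectors, the user names and the threshold of 0.8 are hypothetical, and `find_match` is a name introduced for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def find_match(probe, enrolled, threshold):
    """Return the enrolled user id that best matches probe, or None.

    enrolled maps user id -> voiceprint vector. A user is a candidate
    only if its score strictly exceeds threshold; among candidates the
    one with the maximum score is returned.
    """
    scores = {uid: cosine_similarity(probe, vp) for uid, vp in enrolled.items()}
    above = {uid: s for uid, s in scores.items() if s > threshold}
    if not above:
        return None  # no correlation exceeds the threshold
    return max(above, key=above.get)

# Hypothetical database of enrolled-user voiceprints.
enrolled = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.8, 0.6, 0.0],
}

probe = [0.95, 0.05, 0.0]
# Both alice and carol exceed 0.8 here; alice has the higher score.
print(find_match(probe, enrolled, threshold=0.8))
# A probe orthogonal to every enrolled voiceprint yields no match.
print(find_match([0.0, 0.0, 1.0], enrolled, threshold=0.8))
```

If every voiceprint is already normalized to unit length, as claim 13 describes for the network output, the cosine similarity reduces to a plain dot product and the two norm computations can be dropped.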
Training, enrolment or model building · CPC title
using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title
the user being prompted to utter a password or a predefined phrase · CPC title
Use of distortion metrics or a particular distance between probe pattern and reference templates · CPC title
Program or device authentication · CPC title