User speech profile management

US11626104B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11626104-B2
Application number	US-202017115158-A
Country	US
Kind code	B2
Filing date	Dec 8, 2020
Priority date	Dec 8, 2020
Publication date	Apr 11, 2023
Grant date	Apr 11, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are configured to perform a comparison of a plurality of user speech profiles to an audio feature data set of a plurality of audio feature data sets of a talker-homogenous audio segment to determine whether the audio feature data set matches any of the user speech profiles. The processors are configured to, based on determining that the audio feature data set does not match any of the plurality of user speech profiles, generate a user speech profile based on the plurality of audio feature data sets.
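The matching step described in the abstract can be illustrated with a minimal sketch. It assumes each audio feature data set is a plain vector of floats, that "matches" means cosine similarity at or above a fixed threshold, and that a profile is the average of a segment's vectors. None of these choices is stated in the abstract; the function names, the threshold value, and the `talker_N` keys are hypothetical.

```python
import math

MATCH_THRESHOLD = 0.8  # assumed value; the abstract does not give one


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_matching_profile(feature_set, profiles):
    """Return the key of the best profile at or above the threshold, else None."""
    best_id, best_score = None, MATCH_THRESHOLD
    for profile_id, profile_vector in profiles.items():
        score = cosine_similarity(feature_set, profile_vector)
        if score >= best_score:
            best_id, best_score = profile_id, score
    return best_id


def build_profile(feature_sets):
    """Average the feature vectors of one talker-homogenous segment (assumed rule)."""
    count = len(feature_sets)
    return [sum(column) / count for column in zip(*feature_sets)]


def process_segment(feature_sets, profiles):
    """Match the segment's first feature set; create a new profile on a miss."""
    matched = find_matching_profile(feature_sets[0], profiles)
    if matched is not None:
        return matched
    new_id = f"talker_{len(profiles) + 1}"  # hypothetical naming scheme
    profiles[new_id] = build_profile(feature_sets)
    return new_id
```

Calling `process_segment` with a segment close to a stored profile returns that profile's key; a segment unlike every stored profile adds a new entry built from all of the segment's feature sets, mirroring the "does not match any" branch of the abstract.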

First claim

Opening claim text (preview).

What is claimed is:

1. A device for audio analysis comprising: a memory configured to store a plurality of user speech profiles of a plurality of users; and one or more processors configured to: determine, in a first power mode, whether an audio stream corresponds to speech of at least two distinct talkers; based on determining that the audio stream corresponds to speech of at least two distinct talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result using one or more machine-learning segmentation models that are trained to perform speaker segmentation, the segmentation result indicating talker-homogenous audio segments of the audio stream; perform a comparison of the plurality of user speech profiles to a first audio feature data set of a first plurality of audio feature data sets of a first talker-homogenous audio segment to determine whether the first audio feature data set matches any of the plurality of user speech profiles; and based on determining that the first audio feature data set does not match any of the plurality of user speech profiles: store the first audio feature data set in a first enrollment buffer associated with a first talker; store subsequent audio feature data sets corresponding to speech of the first talker in the first enrollment buffer until a stop condition is satisfied, wherein the first plurality of audio feature data sets of the first talker-homogenous audio segment includes the first audio feature data set and the subsequent audio feature data sets; generate a first user speech profile based on the first plurality of audio feature data sets; and add the first user speech profile to the plurality of user speech profiles.

2. The device of claim 1, wherein the first audio feature data set includes a first audio feature vector.

3. The device of claim 1, wherein the one or more processors are configured to analyze the audio feature data by applying a speaker segmentation neural network to the audio feature data.

4. The device of claim 1, wherein the one or more processors are configured to determine that the stop condition is satisfied in response to determining that longer than threshold silence is detected in the audio stream.

5. The device of claim 1, wherein the one or more processors are configured to add a particular audio feature data set to the first enrollment buffer based at least in part on determining that the particular audio feature data set corresponds to speech of a single talker, wherein the single talker includes the first talker.

6. The device of claim 1, wherein the one or more processors are configured to, based on determining that a count of the first plurality of audio feature data sets of the first talker-homogenous audio segment stored in the first enrollment buffer is greater than an enrollment threshold, generate the first user speech profile based on the first plurality of audio feature data sets.

7. The device of claim 1, wherein the one or more processors are configured to, based on determining that the first audio feature data set matches a particular user speech profile, update the particular user speech profile based on the first audio feature data set.

8. The device of claim 7, wherein the one or more processors are configured to, based at least in part on determining that the first audio feature data set corresponds to speech of a single talker, update the particular user speech profile based on the first audio feature data set.

9. The device of claim 1, wherein the one or more processors are configured to determine whether a second audio feature data set of a second plurality of audio feature data sets of a second talker-homogenous audio segment matches any of the plurality of user speech profiles.

10. The device of claim 9, wherein the one or more processors are configured to, based on determining that the second audio feature data set does not match any of the plurality of user speech profiles: generate a second user speech profile based on the second plurality of audio feature data sets; and add the second user speech profile to the plurality of user speech profiles.

11. The device of claim 9, wherein the one or more processors are configured to, based on determining that the second audio feature data set matches a particular user speech profile of the plurality of user speech profiles, update the particular user speech profile based on the second audio feature data set.

12. The device of claim 1, wherein the memory is configured to store profile update data, and wherein the one or more processors are configured to: in response to generating the first user speech profile, update the profile update data to indicate that the first user speech profile is updated; and based on determining that the profile update data indicates that a first count of the plurality of user speech profiles have been updated, output the first count as a count of talkers detected in the audio stream.

13. The device of claim 1, wherein the memory is configured to store user interaction data, and wherein the one or more processors are configured to: in response to generating the first user speech profile, update the user interaction data based on a speech duration of the first talker-homogenous audio segment to indicate that a first user associated with the first user speech profile interacted for the speech duration; and output at least the user interaction data.

14. The device of claim 1, wherein the first power mode is a lower power mode as compared to the second power mode.

15. The device of claim 1, wherein the one or more processors are configured to: determine, in the first power mode, audio information of the audio stream, the audio information including a count of talkers detected in the audio stream, voice activity detection (VAD) information, or both; activate one or more audio analysis applications in the second power mode; and provide the audio information to one or more audio analysis applications.

16. The device of claim 1, wherein the one or more processors are configured to, in response to determining that the segmentation result indicates that one or more second audio segments of the audio stream correspond to multiple talkers, refrain from updating the plurality of user speech profiles based on the one or more second audio segments.

17. A method of audio analysis comprising: determining, in a first power mode at a device, whether an audio stream corresponds to speech of at least two distinct talkers; based on determining that the audio stream corresponds to speech of at least two distinct talkers, analyzing, in a second power mode, audio feature data of the audio stream to generate a segmentation result using one or more machine-learning segmentation models that are trained to perform speaker segmentation, the segmentation result indicating talker-homogenous audio segments of the audio stream; performing, at the device, a comparison of a plurality of user speech profiles to a first audio feature data set of a first plurality of audio feature data sets of a first talker-homogenous audio segment to determine whether the first audio feature data set matches any of the plurality of user speech profiles; and based on determining that the first audio feature data set does not match any of the plurality of user speech profiles: storing the first audio feature data set in a first enrollment buffer associated with a first talker; storing subsequent audio feature data sets corresponding to speech of the first talker in the first enro…
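The enrollment-buffer behavior recited in claims 1, 4, 5, and 6 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the class name `EnrollmentBuffer`, both threshold values, and the averaging rule used to form a profile are hypothetical, and feature data sets are assumed to be plain vectors of floats.

```python
from dataclasses import dataclass, field

ENROLLMENT_THRESHOLD = 3   # assumed count that must be exceeded (claim 6)
SILENCE_THRESHOLD_S = 2.0  # assumed silence length that stops enrollment (claim 4)


@dataclass
class EnrollmentBuffer:
    """Collects feature sets for one not-yet-enrolled talker."""
    talker_id: str
    feature_sets: list = field(default_factory=list)
    stopped: bool = False

    def add(self, feature_set, single_talker=True):
        """Store a feature set unless enrollment has stopped or speech overlaps."""
        if self.stopped or not single_talker:
            return False
        self.feature_sets.append(feature_set)
        return True

    def note_silence(self, duration_s):
        """Set the stop condition once silence is longer than the threshold."""
        if duration_s > SILENCE_THRESHOLD_S:
            self.stopped = True

    def ready(self):
        """True when the stored count is greater than the enrollment threshold."""
        return len(self.feature_sets) > ENROLLMENT_THRESHOLD

    def to_profile(self):
        """Average the buffered vectors, or return None if too few were stored."""
        if not self.ready():
            return None
        count = len(self.feature_sets)
        return [sum(column) / count for column in zip(*self.feature_sets)]
```

In this sketch a profile is produced only after more than `ENROLLMENT_THRESHOLD` single-talker feature sets have been buffered, and a long silence closes the buffer to further additions. How the real device weighs these conditions against each other is not visible in the claim preview.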

Assignees

Qualcomm Inc

Inventors

Classifications

G10L15/07 (Primary) · to the speaker
G10L15/16 · using artificial neural networks
G10L17/04 · Training, enrolment or model building
G10L21/0272 · Voice signal separating
G10L17/00 (Primary) · Speaker identification or verification techniques

Patent family

Related publications grouped by family.

Patent family ID: 78303075

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11626104B2 cover? A device includes processors configured to determine, in a first power mode, whether an audio stream corresponds to speech of at least two talkers. The processors are configured to, based on determining that the audio stream corresponds to speech of at least two talkers, analyze, in a second power mode, audio feature data of the audio stream to generate a segmentation result. The processors are…
Who is the assignee on this patent? Qualcomm Inc
What technology area does this patent fall under? Primary CPC classification G10L15/07. Mapped technology areas include Physics.
When was this patent published? Publication date Apr 11, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Dynamic speech output configuration

Automated meeting minutes generation service

Voice-Controlled Management of User Profiles

Real-time class recognition for an audio stream

Voice profile management and speech signal generation
