What technology area does this patent fall under?

Primary CPC classification H04R3/005. Mapped technology areas include Electricity.

When was this patent published?

Publication date Jan 14, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Shared speech processing network for multiple speech applications

US12200450B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12200450-B2
Application number	US-202318324622-A
Country	US
Kind code	B2
Filing date	May 26, 2023
Priority date	Apr 9, 2020
Publication date	Jan 14, 2025
Grant date	Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
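The abstract describes one shared network whose single output is fed, unchanged, to several speech application modules. The Python below is a minimal sketch of that fan-out under loudly stated assumptions. The single tanh dense layer, the cosine-similarity speaker verifier, the energy-threshold voice-activation detector, and every function name are hypothetical stand-ins chosen for illustration. The patent does not disclose them, and this is not the assignee's implementation.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def shared_network(audio_frame: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical stand-in for the shared 'network layers':
    one dense layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, audio_frame))) for row in weights]


def speaker_verifier(features: Vector, enrolled: Vector, threshold: float = 0.8) -> bool:
    """Hypothetical speaker-verification module: cosine similarity
    between the shared features and an enrolled speaker embedding."""
    dot = sum(a * b for a, b in zip(features, enrolled))
    norm = math.sqrt(sum(a * a for a in features)) * math.sqrt(sum(b * b for b in enrolled))
    return norm > 0 and dot / norm >= threshold


def voice_activity_detector(features: Vector, threshold: float = 0.1) -> bool:
    """Hypothetical voice-activation module: mean energy of the shared features."""
    return sum(f * f for f in features) / len(features) >= threshold


def run_pipeline(
    audio_frame: Vector,
    weights: List[Vector],
    modules: Dict[str, Callable[[Vector], bool]],
) -> Dict[str, bool]:
    # The shared network runs once; its output is the common input to every module.
    shared = shared_network(audio_frame, weights)
    return {name: module(shared) for name, module in modules.items()}


if __name__ == "__main__":
    weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # hypothetical layer weights
    enrolled = [0.4, 0.7]  # hypothetical enrolled speaker embedding
    modules = {
        "speaker_verification": lambda f: speaker_verifier(f, enrolled),
        "voice_activation": voice_activity_detector,
    }
    print(run_pipeline([1.0, 0.5, -0.25], weights, modules))
    # {'speaker_verification': True, 'voice_activation': True}
```

The design point the sketch illustrates is that the (potentially expensive) shared computation happens once per frame, and each module only adds its own lightweight head on top of the common output.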

First claim

Claim text (preview; truncated partway through claim 28).

What is claimed is:
1. A device to process speech, comprising: a speech processing network comprising: an input configured to receive audio data; one or more network layers configured to process the audio data to generate a network output; and an output configured to be coupled to multiple speech application modules to enable the network output to be provided as a common input to each of the multiple speech application modules.
2. The device of claim 1, wherein a first speech application module corresponding to a speaker verifier is coupled to the output and configured to generate a speaker verification output based on the network output, and wherein the speaker verification output is indicative of whether the audio data corresponds to audio that exhibits voice characteristics matching a particular speaker.
3. The device of claim 2, wherein a second speech application module corresponding to a speech recognition network is coupled to the output and configured to generate a speech recognition output based on the network output, and wherein the speech recognition output includes text corresponding to detected speech.
4. The device of claim 1, wherein the multiple speech application modules include a voice activation detector coupled to the output.
5. The device of claim 1, wherein the multiple speech application modules include a speaker recognizer coupled to the output.
6. The device of claim 1, wherein the multiple speech application modules include a speech enhancer coupled to the output.
7. The device of claim 1, wherein the one or more network layers are trained based on at least a first performance metric associated with a first speech application module and a second performance metric associated with a second speech application module.
8. The device of claim 7, wherein the one or more network layers are trained responsive to a combined performance metric that corresponds to a combination of at least the first performance metric and the second performance metric.
9. The device of claim 1, wherein the speech processing network is implemented in an application-specific integrated circuit (ASIC), wherein at least one speech application module of the multiple speech application modules is external to the ASIC, and wherein the output of the speech processing network is coupled to a chip interface to enable the network output to be provided to the at least one speech application module that is external to the ASIC.
10. The device of claim 1, wherein the input is configured to synchronously receive multiple frames of the audio data, and wherein the speech processing network is configured to process the multiple frames to generate a single frame of the network output.
11. The device of claim 10, wherein the audio data corresponds to audio captured by multiple microphones, and wherein each of the multiple frames is from a respective microphone of the multiple microphones.
12. The device of claim 1, wherein the speech processing network is included in a vehicle.
13. The device of claim 1, wherein the speech processing network is implemented in an audio device, and wherein the audio device includes a wireless speaker and voice activated device with an integrated assistant application.
14. The device of claim 1, further comprising: an antenna; and a transceiver coupled to the antenna and configured to receive the audio data via wireless transmission.
15. The device of claim 14, wherein the speech processing network, the antenna, and the transceiver are integrated into a mobile device.
16. A method of speech processing, comprising: receiving audio data at an input of a speech processing network; processing, at the speech processing network, the audio data using one or more network layers to generate a network output; and providing the network output at an output of the speech processing network to enable the network output to be accessible as a common input to multiple speech application modules.
17. The method of claim 16, further comprising generating, at a first speech application module, a speaker verification output based on the network output and indicative of whether the audio data corresponds to audio that exhibits voice characteristics matching a particular speaker.
18. The method of claim 17, further comprising generating, at a second speech application module, a speech recognition output based on the network output and including text corresponding to detected speech.
19. The method of claim 16, further comprising providing the network output to a speech application module to generate at least one of: a voice activation output; a speaker recognition output; or an enhanced speech output.
20. The method of claim 16, further comprising training the speech processing network based on at least a first performance metric associated with a first speech application module and a second performance metric associated with a second speech application module.
21. The method of claim 20, wherein the speech processing network is trained responsive to a combined performance metric that corresponds to a combination of at least the first performance metric and the second performance metric.
22. The method of claim 16, wherein the network output is provided to a chip interface of an application-specific integrated circuit (ASIC) to enable the network output to be provided to at least one speech application module of the multiple speech application modules that is external to the ASIC.
23. The method of claim 16, wherein the audio data is received as multiple frames that are synchronously received, and wherein the speech processing network processes the multiple frames to generate a single frame of the network output.
24. The method of claim 23, wherein the audio data corresponds to audio captured by multiple microphones, and wherein each of the multiple frames is from a respective microphone of the multiple microphones.
25. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: receive audio data at an input of a speech processing network; process, at the speech processing network, the audio data using one or more network layers to generate a network output; and provide the network output at an output of the speech processing network to enable the network output to be accessible as a common input to multiple speech application modules.
26. The non-transitory computer-readable medium of claim 25, wherein execution of the instructions further causes the one or more processors to: provide the network output to a first speech application module to generate a speaker verification output based on the network output; and provide the network output to a second speech application module to generate a speech recognition output based on the network output.
27. The non-transitory computer-readable medium of claim 25, wherein execution of the instructions further causes the one or more processors to train the speech processing network based on at least a first performance metric associated with a first speech application module and a second performance metric associated with a second speech application module.
28. The non-transitory computer-readable medium of claim 27, wherein the speech processing network is trained responsive to a combined performance m…

Assignees

Qualcomm Inc

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title
G06V20/20
in augmented reality scenes · CPC title
G06V10/82
using neural networks · CPC title
G06F18/217
Validation; Performance evaluation; Active pattern learning techniques · CPC title

Patent family

Related publications grouped by family.

Family ID: 78006071

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12200450B2 cover?

A device to process speech includes a speech processing network that includes an input configured to receive audio data. The speech processing network also includes one or more network layers configured to process the audio data to generate a network output. The speech processing network includes an output configured to be coupled to multiple speech application modules to enable the network out…

Who is the assignee on this patent?

Qualcomm Inc