Electronic device control method and apparatus
US-2023410806-A1 · Dec 21, 2023 · US
US2024029743A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024029743-A1 |
| Application number | US-202318206231-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 6, 2023 |
| Priority date | Jun 29, 2021 |
| Publication date | Jan 25, 2024 |
| Grant date | — |
Official abstract text for this publication.
Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
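The data stream described in the abstract can be pictured as a small record that carries the first-pass lattice and the audio characteristics, but no audio samples. The following is a minimal sketch under stated assumptions, not the format used in the publication: the names `LatticeArc`, `IntermediateFrame`, `encode_frame`, and `decode_frame`, the field layout, and the choice of JSON as the wire format are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class LatticeArc:
    """One word or sub-word arc from a first-pass lattice (hypothetical layout)."""
    start: int    # index of the lattice node where the arc begins
    end: int      # index of the lattice node where the arc ends
    token: str    # word or sub-word label
    score: float  # first-pass log score for this arc


@dataclass
class IntermediateFrame:
    """One chunk of the stream sent to the second device in place of raw audio."""
    arcs: List[LatticeArc]
    is_whisper: bool = False               # whisper-detection flag
    speaker_id: Optional[str] = None       # speaker identifier, if one was determined
    media_signature: Optional[str] = None  # identifier of media detected in the environment


def encode_frame(frame: IntermediateFrame) -> bytes:
    """Serialize a frame; JSON is an assumption made for readability."""
    return json.dumps(asdict(frame)).encode("utf-8")


def decode_frame(payload: bytes) -> IntermediateFrame:
    """Rebuild a frame on the receiving side."""
    raw = json.loads(payload.decode("utf-8"))
    arcs = [LatticeArc(**arc) for arc in raw.pop("arcs")]
    return IntermediateFrame(arcs=arcs, **raw)
```

In this sketch the receiving side only ever sees what `decode_frame` returns, which mirrors the abstract's point that the second device can work from the data stream without the audio data.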
Opening claim text (preview).
1.-20. (canceled)
21. A computer-implemented method, comprising: receiving first audio data representing a first portion of an utterance; performing first automatic speech recognition (ASR) processing on the first audio data using a first ASR model of a first device to generate first encoded data representing a possible transcription of the first portion of the utterance; sending the first encoded data to a second device; performing second ASR processing on the first encoded data using a second ASR model of the second device to determine a first ASR hypothesis corresponding to the first portion of the utterance, wherein the second ASR model is different from the first ASR model; and based at least in part on the first ASR hypothesis, causing an action to be performed responsive to the utterance.
22. The computer-implemented method of claim 21, wherein the first encoded data represents lattice data corresponding to the first ASR processing.
23. The computer-implemented method of claim 21, wherein the second ASR model corresponds to at least one command executable by the first device.
24. The computer-implemented method of claim 21, further comprising: processing the first audio data to identify one or more characteristics of the first audio data; and sending second data representing the one or more characteristics to the second device.
25. The computer-implemented method of claim 24, wherein the second ASR processing is based at least in part on the second data.
26. The computer-implemented method of claim 24, wherein the second data represents an identifier corresponding to a speaker of the utterance.
27. The computer-implemented method of claim 24, wherein the second data represents an identifier corresponding to media detected being output in an environment corresponding to the utterance.
28. The computer-implemented method of claim 21, further comprising: processing the first encoded data by the second device to determine output data corresponding to whether the utterance was device directed.
29. The computer-implemented method of claim 21, further comprising: determining a directive corresponding to the first ASR hypothesis, wherein causing the action to be performed comprises sending the directive to the first device.
30. The computer-implemented method of claim 21, wherein the second device performs the second ASR processing without receiving the first audio data.
31. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive first audio data representing a first portion of an utterance; perform first automatic speech recognition (ASR) processing on the first audio data using a first ASR model of a first device to generate first encoded data representing a possible transcription of the first portion of the utterance; send the first encoded data to a second device; perform second ASR processing on the first encoded data using a second ASR model of the second device to determine a first ASR hypothesis corresponding to the first portion of the utterance, wherein the second ASR model is different from the first ASR model; and based at least in part on the first ASR hypothesis, cause an action to be performed responsive to the utterance.
32. The system of claim 31, wherein the first encoded data represents lattice data corresponding to the first ASR processing.
33. The system of claim 31, wherein the second ASR model corresponds to at least one command executable by the first device.
34. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the first audio data to identify one or more characteristics of the first audio data; and send second data representing the one or more characteristics to the second device.
35. The system of claim 34, wherein the second ASR processing is based at least in part on the second data.
36. The system of claim 34, wherein the second data represents an identifier corresponding to a speaker of the utterance.
37. The system of claim 34, wherein the second data represents an identifier corresponding to media detected being output in an environment corresponding to the utterance.
38. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the first encoded data by the second device to determine output data corresponding to whether the utterance was device directed.
39. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine a directive corresponding to the first ASR hypothesis, wherein the instructions that cause the action to be performed comprise instructions that, when executed by the at least one processor, cause the system to send the directive to the first device.
40. The system of claim 31, wherein the second device performs the second ASR processing without receiving the first audio data.
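The flow recited in claims 21, 29, and 30 (first-pass ASR on the first device, second-pass ASR on the encoded data alone, then a directive sent back) can be illustrated with a toy example. This is a sketch under loud assumptions rather than the claimed implementation: the candidate table, the word priors, the per-slot rescoring rule, and the names `first_pass_asr`, `second_pass_asr`, and `to_directive` are all hypothetical, and a real lattice would be a graph rather than a list of independent slots.

```python
from typing import Dict, List, Tuple

# Per position: candidate tokens with first-pass log scores (simplified lattice).
Lattice = List[List[Tuple[str, float]]]

# Hypothetical stand-in for the first ASR model: candidates per audio frame.
ACOUSTIC_CANDIDATES: Dict[str, List[Tuple[str, float]]] = {
    "frame-a": [("play", -0.4), ("pray", -0.3)],
    "frame-b": [("jazz", -0.2), ("jars", -0.9)],
}

# Hypothetical stand-in for the second ASR model: log priors over words.
LANGUAGE_PRIOR: Dict[str, float] = {
    "play": -0.1, "pray": -2.0, "jazz": -0.5, "jars": -1.5,
}


def first_pass_asr(audio_frames: List[str]) -> Lattice:
    """First device: turn audio frames into encoded data (candidate lists)."""
    return [ACOUSTIC_CANDIDATES[frame] for frame in audio_frames]


def second_pass_asr(lattice: Lattice) -> List[str]:
    """Second device: rescore the encoded data; it never receives the audio."""
    hypothesis = []
    for slot in lattice:
        best_token, _ = max(
            slot, key=lambda cand: cand[1] + LANGUAGE_PRIOR.get(cand[0], -5.0)
        )
        hypothesis.append(best_token)
    return hypothesis


def to_directive(hypothesis: List[str]) -> Dict[str, str]:
    """Map the ASR hypothesis to a directive for the first device to act on."""
    if hypothesis and hypothesis[0] == "play":
        return {"action": "PlayMusic", "query": " ".join(hypothesis[1:])}
    return {"action": "Unknown", "query": " ".join(hypothesis)}


lattice = first_pass_asr(["frame-a", "frame-b"])  # runs on the first device
hypothesis = second_pass_asr(lattice)             # runs on the second device
directive = to_directive(hypothesis)              # returned to the first device
```

With these made-up scores, the first pass alone would prefer "pray" for the first frame, while the second pass selects "play" once the prior is added, which is the kind of correction a different second model is meant to supply.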
Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices · CPC title
using context dependencies, e.g. language models · CPC title
Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title