Intermediate data for inter-device speech processing

US2024029743A1 · US · A1

Patent metadata
Publication number: US-2024029743-A1
Application number: US-202318206231-A
Country: US
Kind code: A1
Filing date: Jun 6, 2023
Priority date: Jun 29, 2021
Publication date: Jan 25, 2024
Grant date: none (A1 is a pre-grant application publication)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Some speech processing systems may handle some commands on-device rather than sending the audio data to a second device or system for processing. The first device may have limited speech processing capabilities sufficient for handling common language and/or commands, while the second device (e.g., an edge device and/or a remote system) may call on additional language models, entity libraries, skill components, etc. to perform additional tasks. An intermediate data generator may facilitate dividing speech processing operations between devices by generating a stream of data that includes a first-pass ASR output (e.g., a word or sub-word lattice) and other characteristics of the audio data such as whisper detection, speaker identification, media signatures, etc. The second device can perform the additional processing using the data stream; e.g., without using the audio data. Thus, privacy may be enhanced by processing the audio data locally without sending it to other devices/systems.
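The abstract describes an intermediate data generator that emits a stream of data instead of audio: a first-pass ASR output such as a word or sub-word lattice, plus characteristics such as whisper detection, speaker identification and media signatures. The sketch below is one hypothetical way to model a chunk of that stream. The class names `LatticeArc` and `IntermediateFrame`, every field name, and the JSON wire format are illustrative assumptions, not details from the patent. The point it shows is structural: the record carries the lattice and the audio characteristics, and it has no field for raw audio.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class LatticeArc:
    """One arc of a first-pass word or sub-word lattice (hypothetical layout)."""

    start: int    # source node index
    end: int      # destination node index
    token: str    # word or sub-word unit
    score: float  # log-probability assigned by the first-pass ASR model


@dataclass
class IntermediateFrame:
    """One chunk of the intermediate data stream (field names are hypothetical).

    Carries the first-pass ASR output plus audio characteristics; it
    deliberately has no field for raw audio samples.
    """

    utterance_id: str
    sequence: int  # position of this chunk within the utterance's stream
    lattice: List[LatticeArc] = field(default_factory=list)
    whisper_detected: bool = False
    speaker_id: Optional[str] = None       # result of speaker identification
    media_signature: Optional[str] = None  # e.g. a fingerprint of detected playback media

    def to_json(self) -> str:
        # asdict() recursively converts the nested LatticeArc objects to dicts
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "IntermediateFrame":
        raw = json.loads(text)
        raw["lattice"] = [LatticeArc(**arc) for arc in raw["lattice"]]
        return IntermediateFrame(**raw)
```

JSON is used here only for readability; the abstract does not specify an encoding, and a real stream could just as plausibly use a compact binary format.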

First claim

1.-20. (canceled)

21. A computer-implemented method, comprising: receiving first audio data representing a first portion of an utterance; performing first automatic speech recognition (ASR) processing on the first audio data using a first ASR model of a first device to generate first encoded data representing a possible transcription of the first portion of the utterance; sending the first encoded data to a second device; performing second ASR processing on the first encoded data using a second ASR model of the second device to determine a first ASR hypothesis corresponding to the first portion of the utterance, wherein the second ASR model is different from the first ASR model; and based at least in part on the first ASR hypothesis, causing an action to be performed responsive to the utterance.

22. The computer-implemented method of claim 21, wherein the first encoded data represents lattice data corresponding to the first ASR processing.

23. The computer-implemented method of claim 21, wherein the second ASR model corresponds to at least one command executable by the first device.

24. The computer-implemented method of claim 21, further comprising: processing the first audio data to identify one or more characteristics of the first audio data; and sending second data representing the one or more characteristics to the second device.

25. The computer-implemented method of claim 24, wherein the second ASR processing is based at least in part on the second data.

26. The computer-implemented method of claim 24, wherein the second data represents an identifier corresponding to a speaker of the utterance.

27. The computer-implemented method of claim 24, wherein the second data represents an identifier corresponding to media detected being output in an environment corresponding to the utterance.

28. The computer-implemented method of claim 21, further comprising: processing the first encoded data by the second device to determine output data corresponding to whether the utterance was device directed.

29. The computer-implemented method of claim 21, further comprising: determining a directive corresponding to the first ASR hypothesis, wherein causing the action to be performed comprises sending the directive to the first device.

30. The computer-implemented method of claim 21, wherein the second device performs the second ASR processing without receiving the first audio data.

31. A system comprising: at least one processor; and at least one memory comprising instructions that, when executed by the at least one processor, cause the system to: receive first audio data representing a first portion of an utterance; perform first automatic speech recognition (ASR) processing on the first audio data using a first ASR model of a first device to generate first encoded data representing a possible transcription of the first portion of the utterance; send the first encoded data to a second device; perform second ASR processing on the first encoded data using a second ASR model of the second device to determine a first ASR hypothesis corresponding to the first portion of the utterance, wherein the second ASR model is different from the first ASR model; and based at least in part on the first ASR hypothesis, cause an action to be performed responsive to the utterance.

32. The system of claim 31, wherein the first encoded data represents lattice data corresponding to the first ASR processing.

33. The system of claim 31, wherein the second ASR model corresponds to at least one command executable by the first device.

34. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the first audio data to identify one or more characteristics of the first audio data; and send second data representing the one or more characteristics to the second device.

35. The system of claim 34, wherein the second ASR processing is based at least in part on the second data.

36. The system of claim 34, wherein the second data represents an identifier corresponding to a speaker of the utterance.

37. The system of claim 34, wherein the second data represents an identifier corresponding to media detected being output in an environment corresponding to the utterance.

38. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: process the first encoded data by the second device to determine output data corresponding to whether the utterance was device directed.

39. The system of claim 31, wherein the at least one memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine a directive corresponding to the first ASR hypothesis, wherein the instructions that cause the action to be performed comprise instructions that, when executed by the at least one processor, cause the system to send the directive to the first device.

40. The system of claim 31, wherein the second device performs the second ASR processing without receiving the first audio data.
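Claim 21 splits recognition into a first ASR pass on the first device and a second, different ASR pass on the second device that consumes only the encoded first-pass output (claims 22 and 30), after which a directive can be sent back to the first device (claim 29). The claims do not say what the second model is. The sketch below assumes, purely for illustration, that the encoded data is a lattice of scored arcs and that the second model is a bigram language model used to rescore lattice paths; that rescoring approach is a swapped-in technique, not the patent's stated method. The first device's ASR is not simulated: the lattice is given as input. The names `second_pass_asr`, `to_directive` and the directive dictionary format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (start_node, end_node, token, first-pass log-prob): a hypothetical encoding
# of the "first encoded data" as lattice arcs (cf. claim 22).
Arc = Tuple[int, int, str, float]


def second_pass_asr(
    arcs: List[Arc],
    final_node: int,
    bigram_logprob: Dict[Tuple[str, str], float],
    lm_weight: float = 1.0,
    unk_logprob: float = -10.0,
) -> Tuple[List[str], float]:
    """Choose the best path through a first-pass lattice.

    Stand-in for the "second ASR model": first-pass scores are combined with
    a bigram language model. Only lattice arcs are consumed; no audio is
    passed in (cf. claim 30). Assumes node 0 is the start node and every arc
    satisfies start < end, so sorting by start node is a topological order.
    """
    # states[node][previous_token] = (best score so far, token path)
    states: Dict[int, Dict[str, Tuple[float, List[str]]]] = defaultdict(dict)
    states[0]["<s>"] = (0.0, [])
    for start, end, token, first_pass_score in sorted(arcs):
        for prev, (score, path) in list(states[start].items()):
            lm = bigram_logprob.get((prev, token), unk_logprob)
            candidate = score + first_pass_score + lm_weight * lm
            current = states[end].get(token)
            if current is None or candidate > current[0]:
                states[end][token] = (candidate, path + [token])
    if not states[final_node]:
        raise ValueError("final node is unreachable from node 0")
    # Close each surviving path with an end-of-sentence bigram; keep the best.
    best_score, best_path = max(
        (score + lm_weight * bigram_logprob.get((prev, "</s>"), unk_logprob), path)
        for prev, (score, path) in states[final_node].items()
    )
    return best_path, best_score


def to_directive(hypothesis: List[str]) -> Dict[str, str]:
    """Hypothetical mapping from an ASR hypothesis to a directive (cf. claim 29)."""
    if len(hypothesis) >= 2 and hypothesis[0] == "play":
        return {"action": "play", "target": " ".join(hypothesis[1:])}
    return {"action": "none", "text": " ".join(hypothesis)}
```

For example, with arcs `(0, 1, "play", -0.1)`, `(1, 2, "jars", -0.4)` and `(1, 2, "jazz", -0.6)`, the first-pass scores alone favor "play jars". A bigram table that gives `("play", "jazz")` a log-probability of -1.0 and `("play", "jars")` -6.0 tips the second pass to `["play", "jazz"]`, which `to_directive` turns into `{"action": "play", "target": "jazz"}`. Because the second device only ever sees lattice arcs, the audio stays on the first device.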

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G10L17/26 (primary)

    Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices · CPC title

  • using context dependencies, e.g. language models · CPC title

  • Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title

  • G10L15/22 (primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024029743A1 cover?
It covers splitting speech processing between devices. A first device runs first-pass ASR on the audio and sends encoded data (e.g., a word or sub-word lattice), optionally with characteristics such as speaker identity or detected media, to a second device. The second device completes ASR with a different model, without receiving the audio data, and causes an action to be performed in response to the utterance.
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/26. Mapped technology areas include Physics.
When was this patent published?
It was published on January 25, 2024, as an A1 publication. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).