On-device learning in a hybrid speech processing system

US11087739B1 · US · B1

Patent metadata
Publication number: US-11087739-B1
Application number: US-201816189303-A
Country: US
Kind code: B1
Filing date: Nov 13, 2018
Priority date: Nov 13, 2018
Publication date: Aug 10, 2021
Grant date: Aug 10, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised on-device learning based on the remote NLU data. The device may determine differences between the updated speech processing model and an original speech processing model received from the remote system and may send data indicating these differences to the remote system. The remote system may aggregate data received from a plurality of devices and may generate an improved speech processing model.
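
The loop described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the model is a flat dictionary of weights feeding a linear intent scorer, the on-device update is a simple delta-rule step that treats the remote NLU result as the label, and the remote system aggregates by averaging the per-device differences. The names `local_update`, `model_delta`, and `aggregate` are hypothetical.

```python
from typing import Dict, List

# Hypothetical flat representation of a speech processing model.
Weights = Dict[str, float]


def local_update(weights: Weights, features: Dict[str, float],
                 remote_label: float, lr: float = 0.1) -> Weights:
    """One supervised step on the device.

    The local score stands in for the local NLU data; the remote NLU
    result is used as the training label (an illustrative delta rule,
    not the patent's method).
    """
    local_score = sum(weights.get(n, 0.0) * v for n, v in features.items())
    error = remote_label - local_score
    updated = dict(weights)
    for name, value in features.items():
        updated[name] = updated.get(name, 0.0) + lr * error * value
    return updated


def model_delta(original: Weights, updated: Weights) -> Weights:
    """Differences between the updated model and the original model."""
    return {
        name: value - original.get(name, 0.0)
        for name, value in updated.items()
        if value != original.get(name, 0.0)
    }


def aggregate(original: Weights, deltas: List[Weights]) -> Weights:
    """Remote side: average the deltas from many devices (assumed rule)."""
    improved = dict(original)
    if not deltas:
        return improved
    for name in set().union(*deltas):
        mean = sum(d.get(name, 0.0) for d in deltas) / len(deltas)
        improved[name] = improved.get(name, 0.0) + mean
    return improved
```

In this sketch only the deltas leave the device, which matches the abstract's statement that the device sends data indicating the differences rather than the updated model itself.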

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, the method comprising, by a first device: receiving first audio data corresponding to a first utterance; processing the first audio data using at least one first model to generate first natural language understanding (NLU) data; sending the first audio data to a remote speech processing system; receiving, from the remote speech processing system, second NLU data corresponding to the first audio data; generating at least one second model configured to be used to operate on audio data and generate NLU data, wherein the generating uses the second NLU data and the at least one first model; generating first data representing one or more differences between the at least one first model and the at least one second model; and sending the first data to the remote speech processing system.

2. The computer-implemented method of claim 1, wherein: processing the first audio data further comprises: determining, during a first period of time, that the first device is not communicatively coupled to the remote speech processing system, processing the first audio data using the at least one first model to generate the first NLU data, performing an action corresponding to the first NLU data, and storing the first audio data and the first NLU data; and sending the first audio data to the remote speech processing system further comprises: determining, during a second period of time after the first period of time, that the first device is communicatively coupled to the remote speech processing system, and sending the first audio data to the remote speech processing system.

3. A computer-implemented method, the method comprising, by a first device: receiving first audio data corresponding to a first utterance; processing the first audio data using at least one first model to generate first natural language understanding (NLU) data that indicates a determined intent of the first utterance; sending the first audio data to a remote speech processing system; receiving, from the remote speech processing system, second NLU data corresponding to the first audio data; and generating at least one second model configured to be used to operate on audio data and generate NLU data, wherein the generating comprises: determining a difference between the first NLU data and the second NLU data, identifying, based on the difference, a first weight value associated with the at least one first model, and generating the at least one second model at least in part by replacing the first weight value with a second weight value in the at least one second model.

4. The computer-implemented method of claim 3, further comprising: generating first data representing one or more differences between the at least one first model and the at least one second model; and sending the first data to the remote speech processing system.

5. The computer-implemented method of claim 1, further comprising: receiving, from the remote speech processing system, at least one third model based on the first data, the at least one third model configured to be used to operate on audio data and generate NLU data; and storing the at least one third model.

6. The computer-implemented method of claim 1, further comprising: receiving, from the remote speech processing system, at least one third model, the at least one third model configured to be used to operate on audio data and generate NLU data; and generating at least one fourth model configured to be used to operate on audio data and generate NLU data, wherein the generating uses the at least one third model and the first data.

7. The computer-implemented method of claim 1, further comprising: generating second data representing differences between a first plurality of weight values associated with the at least one first model and a second plurality of weight values associated with the at least one second model; and generating, based on the second data, the first data to correspond to a portion of the differences.

8. The computer-implemented method of claim 1, further comprising: determining a first difference value between a first weight value associated with the at least one first model and a second weight value associated with the at least one second model; determining a second difference value between a third weight value associated with the at least one first model and a fourth weight value associated with the at least one second model; determining that the first difference value is above a threshold value; determining that the second difference value is below the threshold value; and including the first difference value, but not the second difference value, in the first data.

9. A system comprising: at least one processor; and memory including instructions operable to be executed by the at least one processor to cause the system to: store, by a first device, at least one first spoken language understanding (SLU) model; generate at least one second SLU model based on first response data and the at least one first SLU model, the at least one second SLU model configured to process audio data to generate natural language understanding (NLU) data; generate first data representing differences between the at least one first SLU model and the at least one second SLU model; send the first data to a remote speech processing system; and receive, from the remote speech processing system, at least one third SLU model based on the first data.

10. The system of claim 9, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate second data representing differences between a first plurality of weight values associated with the at least one first SLU model and a second plurality of weight values associated with the at least one second SLU model; and generate, based on the second data, the first data, wherein the first data corresponds to a portion of the differences.

11. The system of claim 10, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine a first difference value between a first weight value associated with the at least one first SLU model and a second weight value associated with the at least one second SLU model; determine a second difference value between a third weight value associated with the at least one first SLU model and a fourth weight value associated with the at least one second SLU model; determine that the first difference value is above a threshold value; determine that the second difference value is below the threshold value; and generate the first data by including the first difference value, but not the second difference value, in the first data.

12. The system of claim 9, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine a first plurality of weight values associated with the at least one first SLU model; determine a second plurality of weight values associated with the at least one second SLU model; generate the first data by determining differences between the first plurality of weight values and the second plurality of weight values; and generate at least one fourth SLU model configured to be used to operate on audio data and generate NLU data, wherein the generating modifies the at least one third SLU model based on the differences between the first plurality of weight values and the second plurality of weight values.

13. The system of claim 9, wherein the m
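
The weight-difference thresholding recited in claims 8 and 11, and the step in claim 12 of modifying a received third model based on locally computed differences, can be sketched as follows. This is an illustrative reading only and not the claimed implementation. It assumes models are flat name-to-weight dictionaries, that "above a threshold" is judged on the magnitude of the difference, and that differences are applied additively. The function names `sparse_delta` and `apply_delta` are hypothetical.

```python
from typing import Dict

# Hypothetical flat representation of a model's weight values.
Weights = Dict[str, float]


def sparse_delta(first: Weights, second: Weights, threshold: float) -> Weights:
    """Build the 'first data': keep only differences above the threshold.

    Assumption: the comparison uses the absolute difference, so large
    negative changes are also kept.
    """
    delta: Weights = {}
    for name, new_value in second.items():
        diff = new_value - first.get(name, 0.0)
        if abs(diff) > threshold:
            delta[name] = diff
    return delta


def apply_delta(third: Weights, delta: Weights) -> Weights:
    """Derive a 'fourth model' by modifying a third model with the delta.

    Assumption: differences are added to the matching weights.
    """
    fourth = dict(third)
    for name, diff in delta.items():
        fourth[name] = fourth.get(name, 0.0) + diff
    return fourth


if __name__ == "__main__":
    first_model = {"a": 1.0, "b": 2.0}
    second_model = {"a": 1.5, "b": 2.0625}
    first_data = sparse_delta(first_model, second_model, threshold=0.25)
    print(first_data)  # only "a" changed by more than the threshold
    print(apply_delta({"a": 2.0, "b": 3.0}, first_data))
```

Sending only the differences above a threshold reduces the amount of data uploaded per device, at the cost of discarding small adjustments; the patent text shown here does not say how the threshold is chosen.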

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Named entity recognition · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Morphological analysis · CPC title

  • using statistical methods · CPC title

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11087739B1 cover?
A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as remote NLU data. The response data from the remote speech processing system may be compared to local NLU data to improve a speech processing model on the device. Thus, the device may perform supervised …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/1822. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 10, 2021 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).