Spoken language understanding models

US11574637B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11574637-B1
Application number  US-202017014042-A
Country             US
Kind code           B1
Filing date         Sep 8, 2020
Priority date       Sep 8, 2020
Publication date    Feb 7, 2023
Grant date          Feb 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for using a federated learning framework to update machine learning models for a spoken language understanding (SLU) system are described. The system determines which labeled data is needed to update the models based on the models generating an undesired response to an input. The system identifies users to solicit labeled data from, and sends a request to a user device to speak an input. The device generates labeled data using the spoken input, and updates the on-device models using the spoken input and the labeled data. The updated model data is provided to the system to enable the system to update the system-level (global) models.
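
The update loop summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes a toy linear model with a single gradient step per device, and the function names (local_update, model_delta, apply_average_delta) are hypothetical. The only part taken from the claims is the idea that the device sends the difference between its model data before and after the local update, rather than the spoken input itself.

```python
from typing import List

Weights = List[float]


def local_update(weights: Weights, features: List[float], label: float,
                 lr: float = 0.1) -> Weights:
    """One gradient step on a toy linear model (assumption: squared error loss)."""
    prediction = sum(w * x for w, x in zip(weights, features))
    error = prediction - label
    return [w - lr * error * x for w, x in zip(weights, features)]


def model_delta(before: Weights, after: Weights) -> Weights:
    """Difference between the updated and the original on-device model data."""
    return [a - b for a, b in zip(after, before)]


def apply_average_delta(global_weights: Weights, deltas: List[Weights]) -> Weights:
    """Server side: average the deltas reported by devices and apply them."""
    count = len(deltas)
    averaged = [sum(d[i] for d in deltas) / count
                for i in range(len(global_weights))]
    return [w + a for w, a in zip(global_weights, averaged)]


# Two devices start from the same global model and each learn from one
# locally labeled example; only the deltas leave the devices.
global_model = [0.0, 0.0]
device_a = local_update(global_model, [1.0, 0.0], 1.0)
device_b = local_update(global_model, [0.0, 1.0], 2.0)
deltas = [model_delta(global_model, device_a),
          model_delta(global_model, device_b)]
new_global_model = apply_average_delta(global_model, deltas)
```

Averaging the deltas is one common aggregation choice in federated learning; the abstract does not say how the system combines the updated model data.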

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: determining a set of past user inputs associated with a first profile identifier that were used to previously train spoken language understanding (SLU) models; determining first data based on the set of past user inputs, the first data representing a likelihood of receiving an input, associated with the first profile identifier, to a system-generated request; determining a time cost metric representing a length of time between when a past system-generated request for a first utterance was output and when a past response to the past system-generated request was received; identifying a first profile attribute corresponding to the first profile identifier; selecting, from among a plurality of profile identifiers, the first profile identifier based on the first data, the time cost metric and the first profile attribute; determining request data indicating a request to speak a second utterance associated with intent data and slot data; performing text-to-speech (TTS) processing on the request data to determine output audio data; associating the output audio data with a session identifier; sending the output audio data to a device associated with the first profile identifier; receiving, from the device, input audio data representing the second utterance; associating the input audio data with the session identifier; processing the input audio data to determine a portion of the second utterance representing the slot data; determining labeled data by associating: the intent data with the input audio data, and the slot data with the portion of the second utterance; determining a first SLU model associated with first model data stored by the device; determining second model data by updating the first SLU model using the labeled data; determining gradient data representing a difference between the first model data and the second model data; and sending, to a remote system, the gradient data to enable the remote system to update a second SLU model stored at the remote system.

2. The computer-implemented method of claim 1, further comprising: determining a confidence score associated with an output responsive to a past user input, the confidence score being generated by the first SLU model in processing audio data representing the past user input, the confidence score representing a likelihood of the output being a desired response to the past user input; determining negative feedback data associated with the output, the negative feedback data representing a user interrupting presentation of the output; determining a number of times the past user input was received; and selecting, from among a plurality of past user inputs, the past user input based on the confidence score, the negative feedback data and the number of times, wherein determining the request data comprises determining the request data using the past user input.

3. The computer-implemented method of claim 2, further comprising: receiving the negative feedback data from a set of profile identifiers; determining a set of profile attributes corresponding to the set of profile identifiers; identifying the first profile attribute included in the set of profile attributes; and determining that the first profile identifier is associated with the first profile attribute.

4. The computer-implemented method of claim 1, further comprising: receiving image data, captured by a second device, corresponding to an output responsive to a past user input, the image data representing a user sentiment in response to the output; using the image data, determining negative user feedback data associated with the past user input; determining intent data associated with the past user input; determining slot data associated with the past user input; determining request template data to be used to generate the request to speak the first utterance, the request template data representing a natural language output corresponding to the intent data and the slot data; and populating the request template data with the slot data to determine the request data.

5. A computer-implemented method comprising: determining, using past input data associated with a first profile identifier, first data indicating a likelihood of receiving a user input, associated with the first profile identifier, to a system-generated request; selecting the first profile identifier from a plurality of profile identifiers based on the first data; determining request data representing a system-generated request to prompt a first utterance associated with a first intent and at least a first slot type; sending, to a device associated with the first profile identifier, the request data for output by the device; receiving, from the device, first audio data representing the first utterance; determining second data representing an association of the first audio data with the first intent and the at least first slot type; and generating a second machine learning model configured to process spoken inputs, the second machine learning model being generated by updating a first machine learning model using the second data.

6. The computer-implemented method of claim 5, further comprising: determining the past input data associated with the first profile identifier, the past input data associated with a second intent and a second slot value; receiving stored data representing at least one natural language input associated with a third intent and at least a third slot value; determining user reliability data associated with the first profile identifier based at least in part on processing the past input data with respect to the stored data; and selecting the first profile identifier based on the user reliability data.

7. The computer-implemented method of claim 5, further comprising: determining a first response time associated with the first profile identifier, the first response time representing a first length of time between when a first past system-generated request was sent to the first profile identifier and when a first past response to the first past system-generated request was received from the first profile identifier; determining a number of inputs associated with the first profile identifier, the inputs being responsive to a plurality of system-generated requests; and selecting the first profile identifier based on the first response time and the number of inputs.

8. The computer-implemented method of claim 5, further comprising: determining a third utterance previously processed by the first machine learning model and resulting in an undesired response, the third utterance corresponding to the first intent and the first slot type; determining at least a second profile identifier that provided the third utterance; identifying at least a profile attribute corresponding to the second profile identifier; and selecting the first profile identifier based on the first profile identifier corresponding to the profile attribute.

9. The computer-implemented method of claim 5, further comprising: determining a confidence score associated with the first utterance, the confidence score being generated by the first machine learning model in processing second audio data representing the first utterance; determining feedback data associated with an output responsive to the first utterance, the feedback data representing feedback received in response to presenting the output; determining a number of times the first utterance is received; and selecting the first utterance based on the confidence score, the feedback data and the number of times.

10. The computer-implemented method of claim 5
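
Claim 1 selects a profile using three signals (a response likelihood, a time cost metric and a profile attribute) and then builds labeled data by associating intent and slot data with the received audio. The sketch below shows one way those steps could fit together. The claims do not specify a scoring formula, so the linear score, the time_weight parameter, the attribute values and all class and function names here are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Profile:
    profile_id: str
    response_likelihood: float   # the claim's "first data", assumed in [0, 1]
    avg_response_seconds: float  # the claim's "time cost metric"
    attribute: str               # the claim's "profile attribute" (hypothetical values)


def select_profile(profiles: List[Profile], wanted_attribute: str,
                   time_weight: float = 0.01) -> Optional[Profile]:
    """Pick the matching profile with the best likelihood-minus-time-cost score.

    The linear score is an assumption; the claim only says that selection
    is based on the three signals.
    """
    candidates = [p for p in profiles if p.attribute == wanted_attribute]
    if not candidates:
        return None
    return max(candidates,
               key=lambda p: p.response_likelihood
               - time_weight * p.avg_response_seconds)


@dataclass
class LabeledExample:
    session_id: str
    audio: bytes     # input audio data for the requested utterance
    intent: str      # intent associated with the whole utterance
    slot_type: str
    slot_text: str   # portion of the utterance that fills the slot


profiles = [
    Profile("p1", response_likelihood=0.9, avg_response_seconds=60.0, attribute="x"),
    Profile("p2", response_likelihood=0.7, avg_response_seconds=10.0, attribute="x"),
    Profile("p3", response_likelihood=0.99, avg_response_seconds=1.0, attribute="y"),
]
chosen = select_profile(profiles, wanted_attribute="x")

# The request and the reply share a session identifier, so the known intent
# and slot of the request can be attached to the audio that comes back.
example = LabeledExample(session_id="s-1", audio=b"\x00\x01",
                         intent="PlayMusic", slot_type="artist",
                         slot_text="some artist")
```

Because the system chose the utterance it asked for, the labels come from the request itself and no manual annotation of the reply is needed in this sketch.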

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Discourse or dialogue representation · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Parsing for meaning understanding · CPC title

  • Ensemble learning · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11574637B1 cover?
Techniques for using a federated learning framework to update machine learning models for a spoken language understanding (SLU) system are described. The system determines which labeled data is needed to update the models based on the models generating an undesired response to an input. The system identifies users to solicit labeled data from, and sends a request to a user device to speak an inpu…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 7, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).