What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Published Jul 26, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Intent-specific automatic speech recognition result generation

US11398236B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11398236-B2
Application number	US-202015929796-A
Country	US
Kind code	B2
Filing date	May 21, 2020
Priority date	Dec 20, 2013
Publication date	Jul 26, 2022
Grant date	Jul 26, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Features are disclosed for generating intent-specific results in an automatic speech recognition system. The results can be generated by utilizing a decoding graph containing tags that identify portions of the graph corresponding to a given intent. The tags can also identify high-information content slots and low-information carrier phrases for a given intent. The automatic speech recognition system may utilize these tags to provide a semantic representation based on a plurality of different tokens for the content slot portions and low information for the carrier portions. A user can be presented with a user interface containing top intent results with corresponding intent-specific top content slot values.
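As an illustration of the tagging idea in the abstract, here is a minimal Python sketch, not the patented implementation. It assumes each arc on a decoded path carries an intent tag and a role tag ("carrier" or "slot"); the `Arc` class, the `semantic_representation` function, and the `PlayMusic` intent and `artist` slot names are all hypothetical. The sketch keeps the tokens from content-slot arcs and reduces carrier-phrase arcs to the intent label alone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Arc:
    """One arc on a decoded path (hypothetical structure for illustration)."""
    token: str                  # recognized word
    intent: str                 # tag: intent this arc belongs to
    role: str                   # tag: "carrier" (low information) or "slot"
    slot: Optional[str] = None  # slot name, set only when role == "slot"


def semantic_representation(path: List[Arc]) -> Dict[str, object]:
    """Collapse a tagged path into an intent plus its content slot values."""
    if not path:
        raise ValueError("empty path")
    intent = path[0].intent
    slots: Dict[str, List[str]] = {}
    for arc in path:
        if arc.intent != intent:
            raise ValueError("path mixes intents")
        if arc.role == "slot":
            # Slot arcs carry the high-information tokens, so keep them.
            slots.setdefault(arc.slot, []).append(arc.token)
        # Carrier arcs contribute nothing beyond the intent label.
    return {
        "intent": intent,
        "slots": {name: " ".join(tokens) for name, tokens in slots.items()},
    }


# Assumed example utterance: "play music by frank sinatra"
path = [
    Arc("play", "PlayMusic", "carrier"),
    Arc("music", "PlayMusic", "carrier"),
    Arc("by", "PlayMusic", "carrier"),
    Arc("frank", "PlayMusic", "slot", "artist"),
    Arc("sinatra", "PlayMusic", "slot", "artist"),
]
result = semantic_representation(path)
print(result)
```

In this sketch the carrier words "play music by" survive only as the intent label, while the artist slot retains its full token sequence.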

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a computer-readable memory storing an automatic speech recognition (“ASR”) decoding graph comprising semantic metadata, wherein a first portion of the semantic metadata identifies a first arc of the ASR decoding graph as associated with a first intent, wherein a second portion of the semantic metadata identifies a second arc of the ASR decoding graph as associated with a second intent, and wherein a third portion of the semantic metadata identifies a plurality of tokens as associated with the first intent; and one or more processors in communication with the computer-readable memory and programmed by executable instructions to at least: receive audio data regarding a user request; generate, using the ASR decoding graph and the audio data, a user interface comprising: a first user interface element associated with a selected intent, wherein activation of the first user interface element causes change of the selected intent from the first intent to the second intent; and a second user interface element associated with a selected content slot value, wherein activation of the second user interface element causes change of the selected content slot value from a first token of the plurality of tokens to a second token of the plurality of tokens; and generate a response to the user request based at least partly on the selected intent and the selected content slot value.

2. The system of claim 1, wherein the ASR decoding graph comprises a finite state transducer.

3. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to generate speech recognition results using the audio data and the ASR decoding graph, wherein the user interface is generated based at least partly on the speech recognition results.

4. The system of claim 3, wherein to generate the speech recognition results, the one or more processors are programmed by further executable instructions to: determine a first score using a value associated with the first arc, wherein the first score indicates a probability that the audio data is associated with the first intent; and determine a second score using a value associated with the second arc, wherein the second score indicates a probability that the audio data is associated with the second intent, and wherein the first score is greater than the second score.

5. The system of claim 3, wherein to generate the speech recognition results, the one or more processors are programmed by further executable instructions to: determine a first score using a value associated with a third arc of the ASR decoding graph, wherein the first score indicates a probability that the audio data is associated a content slot value corresponding to the first token; and determine a second score using a value associated with a fourth arc of the ASR decoding graph, wherein the second score indicates a probability that the audio data is associated a content slot value corresponding to the second token, and wherein the first score is greater than the second score.

6. The system of claim 3, wherein the one or more processors are programmed by further executable instructions to: select a subset of tokens from the speech recognition results, wherein each token of the subset of tokens is associated with a same content slot of the first intent; rank the subset of tokens to generate a ranked subset of tokens; and generate a list of options selectable using the second user interface element, wherein the list of options is based on the ranked subset of tokens.

7. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to generate a semantic representation of the user request using the ASR decoding graph, wherein the semantic representation comprises a carrier phrase portion associated with the first intent and a content slot portion associated with the first token, and wherein the user interface comprises a textual representation of the semantic representation.

8. The system of claim 1, wherein to generate the response, the one or more processors are programmed by further executable instructions to generate the response using the selected intent and a multi-domain natural language understanding (“NLU”) subsystem, wherein the multi-domain NLU subsystem comprises a plurality of NLU models, and wherein the selected intent is associated with an NLU model of the plurality of NLU models.

9. The system of claim 1, wherein the first token is a word, a phoneme, or a phoneme in context.

10. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: cause display of the user interface; and receive user interaction data representing selection of one of the first user interface element or the second user interface element.

11. A computer-implemented method comprising: under control of a computing system comprising one or more computing devices configured with specific computer-executable instructions, loading, into memory of the computing system, an automatic speech recognition (“ASR”) decoding graph comprising semantic metadata, wherein a first portion of the semantic metadata identifies a first arc of the ASR decoding graph as associated with a first intent, wherein a second portion of the semantic metadata identifies a second arc of the ASR decoding graph as associated with a second intent, and wherein a third portion of the semantic metadata identifies a plurality of tokens as associated with the first intent; receiving audio data regarding a user request; generating, using the ASR decoding graph and the audio data, a user interface comprising: a first user interface element associated with a selected intent, wherein activation of the first user interface element causes change of the selected intent from the first intent to the second intent; and a second user interface element associated with a selected content slot value, wherein activation of the second user interface element causes change of the selected content slot value from a first token of the plurality of tokens to a second token of the plurality of tokens; and generating a response to the user request based at least partly on the selected intent and the selected content slot value.

12. The computer-implemented method of claim 11, wherein loading the ASR decoding graph comprises loading a finite state transducer.

13. The computer-implemented method of claim 11, further comprising generating speech recognition results using the audio data and the ASR decoding graph, wherein the user interface is generated based at least partly on the speech recognition results.

14. The computer-implemented method of claim 13, wherein generating the speech recognition results comprises: determining a first score using a value associated with the first arc, wherein the first score indicates a probability that the audio data is associated with the first intent; and determining a second score using a value associated with the second arc, wherein the second score indicates a probability that the audio data is associated with the second intent, and wherein the first score is greater than the second score.

15. The computer-implemented method of claim 13, wherein generating the speech recognition results comprises: determining a first score using a value associated with a third arc of the ASR decoding graph, wherein the first score indicates a probability that the audio data is associated a content slot value corresponding to the first token; and determining a second score using a valu
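The interaction described in claims 1 and 4 to 6 (scored intents, ranked content slot values, and two user interface elements that switch between them) can be pictured with a short Python sketch. This is an illustration under stated assumptions, not the claimed system: it assumes recognition results arrive as scored hypotheses where a higher score means more probable, and the names `Hypothesis`, `build_ui_state`, `select_intent`, and `select_slot_value`, along with the `PlayMusic` and `ShopMusic` intents, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hypothesis:
    """One scored recognition result (hypothetical structure)."""
    intent: str
    slot_value: str
    score: float  # assumed: higher means more probable


def build_ui_state(hypotheses: List[Hypothesis]) -> Dict[str, object]:
    """Rank intents, rank slot values per intent, and select the top of each."""
    if not hypotheses:
        raise ValueError("no hypotheses")
    # Score each intent by its best hypothesis, then rank intents.
    best: Dict[str, float] = {}
    for h in hypotheses:
        best[h.intent] = max(best.get(h.intent, float("-inf")), h.score)
    intents = sorted(best, key=lambda name: best[name], reverse=True)
    # For each intent, rank its distinct slot values by score.
    options: Dict[str, List[str]] = {}
    for intent in intents:
        ranked = sorted((h for h in hypotheses if h.intent == intent),
                        key=lambda h: h.score, reverse=True)
        values: List[str] = []
        for h in ranked:
            if h.slot_value not in values:
                values.append(h.slot_value)
        options[intent] = values
    top = intents[0]
    return {"selected_intent": top, "selected_slot_value": options[top][0],
            "intent_options": intents, "slot_options": options}


def select_intent(state: Dict[str, object], intent: str) -> Dict[str, object]:
    """First UI element: switch intent and reset to that intent's top slot value."""
    if intent not in state["slot_options"]:
        raise ValueError(f"unknown intent: {intent}")
    return {**state, "selected_intent": intent,
            "selected_slot_value": state["slot_options"][intent][0]}


def select_slot_value(state: Dict[str, object], value: str) -> Dict[str, object]:
    """Second UI element: switch slot value within the selected intent."""
    if value not in state["slot_options"][state["selected_intent"]]:
        raise ValueError(f"value not offered for this intent: {value}")
    return {**state, "selected_slot_value": value}


hypotheses = [
    Hypothesis("PlayMusic", "frank sinatra", 0.60),
    Hypothesis("PlayMusic", "frank zappa", 0.20),
    Hypothesis("ShopMusic", "frank sinatra", 0.15),
    Hypothesis("PlayMusic", "frank sinatra", 0.05),
]
state = build_ui_state(hypotheses)
switched = select_intent(state, "ShopMusic")
corrected = select_slot_value(state, "frank zappa")
print(state["selected_intent"], state["selected_slot_value"])
```

Scoring an intent by its single best hypothesis is one simple choice made for this sketch; the claims only require that scores be derived from values associated with arcs of the decoding graph.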

Assignees

Amazon Tech Inc

Inventors

Classifications

G10L2015/221
Announcement of recognition results · CPC title
G10L15/1822
Parsing for meaning understanding · CPC title
G10L15/22 (Primary)
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L15/26 (Primary)
Speech to text systems (G10L15/08 takes precedence) · CPC title

Patent family

Related publications grouped by family.

View patent family 72838605

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11398236B2 cover?

Features are disclosed for generating intent-specific results in an automatic speech recognition system. The results can be generated by utilizing a decoding graph containing tags that identify portions of the graph corresponding to a given intent. The tags can also identify high-information content slots and low-information carrier phrases for a given intent. The automatic speech recognition s…

Who is the assignee on this patent?

Amazon Tech Inc

Related patents

Systems and methods for integrating third party services with a digital assistant

Intelligent automated assistant

Knowledge Source Personalization To Improve Language Models

Providing contextual data for selected link units

Factor graph for semantic parsing
