Multi-assistant natural language input processing to determine a voice model for synthesized speech

US11393477B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11393477-B2
Application number  US-201916580721-A
Country             US
Kind code           B2
Filing date         Sep 24, 2019
Priority date       Sep 24, 2019
Publication date    Jul 19, 2022
Grant date          Jul 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for a natural language processing (NLP) system to implement more than one assistant are described. The NLP system may receive a natural language input from a device. The NLP system may also receive one or more signals representing one or more assistants to be implemented with respect to the natural language input. The NLP system may intelligently select an assistant to be invoked with respect to the natural language input. Once the assistant is selected, the NLP system may cause content, output to a user, to have characteristics specific to the assistant.
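
As a rough illustration of the flow the abstract describes, the following Python sketch picks an assistant from incoming signals and attaches that assistant's voice to the output. Every name in it (`ASSISTANT_VOICES`, `select_assistant`, `respond`, and the assistant and voice identifiers) is hypothetical and does not come from the patent. The selection rule is a deliberately simple stand-in, not the method the patent claims.

```python
from dataclasses import dataclass

# Hypothetical assistant identifiers mapped to hypothetical voice model names.
ASSISTANT_VOICES = {
    "assistant_a": "voice_model_a",
    "assistant_b": "voice_model_b",
}
DEFAULT_ASSISTANT = "assistant_a"  # assumption: one assistant acts as a fallback


@dataclass
class Output:
    assistant_id: str
    voice_model: str
    text: str


def select_assistant(signals):
    """Return the first signalled assistant that is known, else the default."""
    for candidate in signals:
        if candidate in ASSISTANT_VOICES:
            return candidate
    return DEFAULT_ASSISTANT


def respond(text, signals):
    """Attach the selected assistant's voice model to the response text."""
    assistant_id = select_assistant(signals)
    return Output(assistant_id, ASSISTANT_VOICES[assistant_id], text)
```

Keeping the voice lookup keyed on the assistant identifier means any text produced for that assistant is rendered with the same voice, which is the property the abstract emphasizes.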

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving natural language understanding (NLU) results data representing a natural language input; receiving an indication of a spoken wakeword received by a first device, the spoken wakeword causing the first device to switch from a low power state to a high power state to capture the natural language input; determining, based at least in part on the NLU results data and the indication, a first natural language processing (NLP) system assistant identifier from among a plurality of NLP system assistant identifiers; determining a first assistant skill system corresponding to the first NLP system assistant identifier; sending, to the first assistant skill system, first data representing the NLU results data; receiving, from the first assistant skill system, first response data including: a first portion representing first text data to be output prior to a result of processing to be performed by a first skill system, and a second portion representing the processing to be performed by the first skill system; determining a voice model associated with the first NLP system assistant identifier, the voice model representing speech characteristics specific to the first NLP system assistant identifier and unique from other NLP system assistant identifiers of the plurality of NLP system assistant identifiers; performing, using the voice model, text-to-speech (TTS) processing on the first text data to generate first audio data corresponding to a voice specific to the first NLP system assistant identifier and unique from the other NLP system assistant identifiers; sending the first audio data to the first device for output; sending, to the first skill system, the NLU results data; receiving from the first skill system, second text data corresponding to a first response to the natural language input; performing, using the voice model, TTS processing on the second text data to generate second audio data; and sending the second audio data to the first device for output.

2. The method of claim 1, further comprising: determining the first NLP system assistant identifier further based at least in part on a natural language name associated with the first NLP system assistant identifier and included in the natural language input; determining a second NLP system assistant identifier associated with a device type corresponding to the first device; determining a first weight associated with the natural language name; determining a second weight associated with the device type; and sending the first data to the first assistant skill system based at least in part on: the natural language name being included in the natural language input, the first weight, the device type corresponding to the first device, and the second weight.

3. The method of claim 1, further comprising: storing second data associating the first NLP system assistant identifier and a dialog identifier corresponding to the natural language input, the dialog identifier associated with a plurality of related natural language inputs and NLP system outputs occurring via the first device over a period of time; determining the voice model based at least in part on the first NLP system assistant identifier being associated with the dialog identifier; receiving, from the first device, third data representing the second audio data has been output; and causing, after receiving the third data, the voice of the first NLP system assistant identifier to no longer be used to output data corresponding to the dialog identifier.

4. The method of claim 3, further comprising: determining a second NLP system assistant identifier; determining a second assistant skill system corresponding to the second NLP system assistant identifier; receiving, from the second assistant skill system, fourth data including: a first portion representing third text data to be output prior to a result of processing to be performed by a second skill system, and a second portion representing the processing to be performed by the second skill system; and storing fifth data associating the second NLP system assistant identifier and the dialog identifier, the fifth data causing a second voice of the second NLP system assistant identifier to be used to output data responsive to natural language inputs corresponding to the dialog identifier.

5. A system comprising: a first component that outputs natural language understanding (NLU) results data corresponding to a natural language input; a second component that: receives first data representing at least one trigger; determines the at least one trigger corresponds to a first natural language processing (NLP) system assistant identifier of a plurality of NLP system assistant identifiers; outputs the first NLP system assistant identifier; sends the NLU results data to a first assistant skill system corresponding to the first NLP system assistant identifier; receives, from the first assistant skill system: first output data to be output prior to a result of processing to be performed by a first skill system, and second data representing the processing to be performed by the first skill system; sends, to the first skill system in response to receiving the second data, the NLU results data; and receives, from the first skill system, second output data corresponding to a first response to the NLU results data; and a speech synthesis component that: receives the first output data; receives the first NLP system assistant identifier; determines a first voice model associated with the first NLP system assistant identifier from among a plurality of voice models comprising the first voice model and at least a second voice model associated with a second NLP system assistant identifier, the first voice model corresponding to a voice specific to the first NLP system assistant identifier and different from other NLP system assistant identifiers of the plurality of NLP system assistant identifiers; generates, using the first voice model and the first output data, first synthesized speech in the voice specific to the first NLP system assistant identifier; receives the second output data; and generates, using the first voice model and the second output data, second synthesized speech.

6. The system of claim 5, wherein the first voice model corresponds to a first lexicon different from a second lexicon corresponding to the second voice model.

7. The system of claim 5, wherein the second component: determines a first NLP system assistant trigger in the first data; determines a second NLP system assistant trigger in the first data; determines a first weight associated with the first NLP system assistant trigger; determines a second weight associated with the second NLP system assistant trigger; and outputs the first NLP system assistant identifier based at least in part on the first NLP system assistant trigger, the first weight, the second NLP system assistant trigger, and the second weight.

8. The system of claim 5, wherein: the first data indicates a device type associated with the first NLP system assistant identifier and representing a first device that captured a natural language input corresponding to the NLU results data.

9. The system of claim 5, wherein the first NLP system assistant identifier corresponds to a default NLP system assistant.

10. The system of claim 5, wherein the second component: causes the first synthesized speech and the second synthesized speech to be output; and causes, after causing the first synthesized speech and the second synthesized speech to be output, the first NLP system assistant identifier to no longer be an a
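
Claims 2 and 7 describe choosing an assistant from several triggers (for example a spoken assistant name and a device type), each with an associated weight. The claims only say the choice is "based at least in part on" the triggers and weights; they do not specify how the weights are combined. The Python sketch below assumes one plausible rule, summing weights per assistant and taking the largest total. The function name `pick_assistant`, the identifiers, and the weight values are all hypothetical.

```python
from collections import defaultdict


def pick_assistant(triggers):
    """Pick the assistant whose triggers carry the largest total weight.

    `triggers` is a list of (assistant_id, weight) pairs, one per detected
    trigger. Summing weights is an assumption, not the patent's stated rule.
    """
    if not triggers:
        raise ValueError("at least one trigger is required")
    totals = defaultdict(float)
    for assistant_id, weight in triggers:
        totals[assistant_id] += weight
    # On a tie, max() returns the assistant whose trigger was seen first.
    return max(totals, key=totals.get)


# Hypothetical case: the user names assistant "b" on a device tied to "a".
triggers = [
    ("assistant_b", 0.7),  # natural language name in the input (assumed weight)
    ("assistant_a", 0.4),  # device type of the capturing device (assumed weight)
]
chosen = pick_assistant(triggers)
```

With these assumed weights the spoken name outweighs the device type, so `chosen` is `"assistant_b"`. A different combination rule (for example taking the single highest-weight trigger) would fit the claim language equally well.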

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G10L13/033 (Primary)

    Voice editing, e.g. manipulating the voice of the synthesiser · CPC title

  • G10L15/30 (Primary)

    Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Parsing for meaning understanding · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11393477B2 cover?
Techniques for a natural language processing (NLP) system to implement more than one assistant are described. The NLP system may receive a natural language input from a device. The NLP system may also receive one or more signals representing one or more assistants to be implemented with respect to the natural language input. The NLP system may intelligently select an assistant to be invoked wit…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L13/033. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 19, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).