What technology area does this patent fall under?

Primary CPC classification G10L13/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jun 6, 2023 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page, drawn from citations in our corpus or from other publications sharing the same primary CPC.

Speech processing techniques

US11670285B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11670285-B1
Application number	US-202017102910-A
Country	US
Kind code	B1
Filing date	Nov 24, 2020
Priority date	Nov 24, 2020
Publication date	Jun 6, 2023
Grant date	Jun 6, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for an interactive turn-based reading experience are described. A system may take turns reading content, such as a book, with a user. The system may process audio data representing a user reading a portion of the content, determine reading evaluation data, and determine how to proceed for the next turn based on the reading evaluation data. For example, based on the reading evaluation data, the system may read a portion of the content by outputting synthesized speech representing the content, may ask the user [to] re-read a portion of the content, or may ask the user to read a different, smaller portion of the content.
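The abstract names three possible next turns driven by reading evaluation data. A minimal Python sketch of that decision, assuming the evaluation data can be reduced to a single score between 0 and 1: the score, both thresholds, and the names `choose_next_turn` and `shrink_portion` are hypothetical, since the patent text shown here does not disclose how the decision is actually made.

```python
from dataclasses import dataclass
from enum import Enum


class NextTurn(Enum):
    SYSTEM_READS_NEXT = "system_reads_next"    # output TTS for the next portion
    USER_REREADS = "user_rereads"              # ask the user to re-read the same portion
    USER_READS_SMALLER = "user_reads_smaller"  # ask the user to read a shorter portion


@dataclass
class ReadingEvaluation:
    # Hypothetical summary score in [0.0, 1.0]; the abstract does not say
    # how reading evaluation data is represented.
    score: float


def choose_next_turn(evaluation: ReadingEvaluation,
                     good_threshold: float = 0.8,
                     retry_threshold: float = 0.5) -> NextTurn:
    """Map reading evaluation data to the next turn (illustrative thresholds only)."""
    if evaluation.score >= good_threshold:
        return NextTurn.SYSTEM_READS_NEXT
    if evaluation.score >= retry_threshold:
        return NextTurn.USER_REREADS
    return NextTurn.USER_READS_SMALLER


def shrink_portion(words: list[str]) -> list[str]:
    """Hypothetical shortening rule: keep the first half of the words (at least one)."""
    return words[: max(1, len(words) // 2)]
```

With these illustrative thresholds, a score of 0.6 produces a re-read request and a score of 0.2 produces a request to read a shorter portion, loosely mirroring the "smaller portion" behavior in the abstract.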

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, from a device, first input audio data representing a spoken natural language input including a request to read a book, the first input audio data associated with a session identifier; processing the first input audio data to determine a book identifier associated with the book; receiving first text data associated with the book identifier, the first text data representing a first portion of the book; receiving, from the device, second input audio data including speech corresponding to the first portion of the book, the second input audio data associated with the session identifier; determining that the second input audio data corresponds to an entirety of the first portion of the book; determining, using a trained machine learning (ML) model, first reading evaluation data based on the second input audio data and the first text data, the first reading evaluation data associated with the session identifier; based on the first reading evaluation data, determining to output a second portion of the book; receiving second text data associated with the book identifier, the second text data representing the second portion of the book; performing text-to-speech (TTS) processing on the second text data to generate first output audio data representing the second portion of the book; and sending the first output audio data to the device.

2. The computer-implemented method of claim 1, further comprising: after the first output audio data is sent, enabling a listening mode to capture speech; receiving, from the device, third input audio data corresponding to a third portion of the book, the third input audio data associated with the session identifier; determining that the third input audio data corresponds to an entirety of the third portion of the book; enabling the listening mode; determining second reading evaluation data based on the third input audio data and third text data representing the third portion of the book, the second reading evaluation data associated with the session identifier; based on the second reading evaluation data, sending, to the device, second output audio data representing a request to read the third portion of the book; enabling the listening mode; and receiving, from the device, fourth input audio data corresponding to the third portion of the book.

3. The computer-implemented method of claim 1, further comprising: after the first output audio data is presented, enabling a listening mode to capture speech; receiving, from the device, third input audio data corresponding to a third portion of the book, the third input audio data associated with the session identifier, wherein the third portion of the book corresponds to a page in the book; determining that the third input audio data corresponds to an entirety of the third portion of the book; disabling the listening mode; determining second reading evaluation data based on the third input audio data and third text data representing the third portion of the book, the second reading evaluation data associated with the session identifier; based on the second reading evaluation data, determining to decrease amount of book to be read; and sending, to the device, second output audio data representing a request to read a fourth portion of the book, wherein the fourth portion of the book corresponds to a paragraph on the page.

4. The computer-implemented method of claim 1, wherein determining the first reading evaluation data comprises: performing automatic speech recognition (ASR) processing using the second input audio data to determine ASR output data; processing the ASR output data with respect to the first text data to determine first data representing a reading accuracy, the first data based at least on one of: deletion of a word in the ASR output data with respect to the first text data, insertion of a word in the ASR output data with respect to the first text data, and substitution of a word in the ASR output data with respect to the first text data; processing the second input audio data using the trained ML model to determine second data representing a pronunciation accuracy, the trained ML model configured to perform phoneme alignment; and determining the first reading evaluation data using the first data and the second data.

5. A computer-implemented method comprising: receiving first input audio data corresponding to speech representing a first portion of content; determining that the first input audio data corresponds to an entirety of the first portion of the content; determining, using a first trained machine learning (ML) model, first reading evaluation data based on the first input audio data and the first portion of the content; based on the first reading evaluation data, determining to output a second portion of the content; performing text-to-speech (TTS) processing to generate first output audio data including synthesized speech corresponding to the second portion of the content; and outputting the first output audio data.

6. The computer-implemented method of claim 5, further comprising: prior to receiving the first input audio data, receiving second input audio data representing a request to read content; receiving data representing the content; enabling a listening mode to capture speech; and disabling the listening mode after determining that the first input audio data corresponds to the entirety of the first portion of the content.

7. The computer-implemented method of claim 5, further comprising: receiving second input audio data corresponding to reading of a third portion of content; determining that the second input audio data corresponds to an entirety of the third portion of the content; determining second reading evaluation data based on the second input audio data and the third portion of the content; based on the second reading evaluation data, outputting second output audio data representing a request to read the third portion of the content; and receiving third input audio data corresponding to the third portion of the content.

8. The computer-implemented method of claim 5, further comprising: receiving second input audio data corresponding to reading of a third portion of content including a first number of words; determining that the second input audio data corresponds to an entirety of the third portion of the content; determining second reading evaluation data based on the second input audio data and the third portion of the content; and based on the second reading evaluation data, outputting second output audio data representing a request to read a fourth portion of the content including a second plurality of words, wherein the second plurality of words is less than the first number of words.

9. The computer-implemented method of claim 5, wherein determining the first reading evaluation data comprises: performing automatic speech recognition (ASR) processing using the first input audio data to determine ASR output data; processing the ASR output data with respect to the first portion of the content to determine first data representing a reading accuracy; processing the first input audio data [using] the first trained ML model to determine second data representing a pronunciation accuracy, the first trained ML model configured to perform phoneme alignment; and determining the first reading evaluation data using the first data and the second data.

10. The computer-implemented method of claim 5, further comprising: prior to receiving the first input audio data, receiving second input audio data requesting to read a book; receiving image data representing a b
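Claims 4 and 9 derive a reading accuracy by comparing ASR output with the reference text in terms of word deletions, insertions, and substitutions, then combine it with a pronunciation accuracy from a phoneme-alignment model. A minimal Python sketch of the first part, assuming a standard word-level Levenshtein alignment (the usual way word error rate is computed). The tokenizer, the "1 minus WER" accuracy, and the equal-weight combination in `reading_evaluation` are assumptions rather than the patented method, and the pronunciation score is simply passed in because the claims do not describe the phoneme-alignment model.

```python
import re
from dataclasses import dataclass


@dataclass
class WordErrors:
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def reading_accuracy(self) -> float:
        """Assumed metric: 1 - word error rate, clipped at 0 (insertions can push WER above 1)."""
        if self.reference_length == 0:
            return 1.0 if self.insertions == 0 else 0.0
        errors = self.substitutions + self.deletions + self.insertions
        return max(0.0, 1.0 - errors / self.reference_length)


def tokenize(text: str) -> list[str]:
    """Hypothetical normalizer: lowercase and keep only word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def align_words(reference: list[str], hypothesis: list[str]) -> WordErrors:
    """Count substitutions, deletions and insertions via word-level Levenshtein alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, subs, dels, ins) for reference[:i] aligned to hypothesis[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # every reference word so far was skipped (deleted)
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # every recognized word so far was extra (inserted)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == hypothesis[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # exact match costs nothing
                continue
            t, s, d, k = cost[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, k)
            t, s, d, k = cost[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cost[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            # Tuples compare by total edits first, so min() keeps the cheapest path.
            cost[i][j] = min(substitution, deletion, insertion)
    _, s, d, k = cost[n][m]
    return WordErrors(s, d, k, n)


def reading_evaluation(reference_text: str, asr_text: str,
                       pronunciation_accuracy: float, weight: float = 0.5) -> float:
    """Hypothetical combination of reading accuracy and an externally supplied
    pronunciation accuracy (e.g. from a phoneme-alignment model, not shown)."""
    errors = align_words(tokenize(reference_text), tokenize(asr_text))
    return weight * errors.reading_accuracy + (1.0 - weight) * pronunciation_accuracy
```

For example, reading "The cat sat on the mat." as "the cat sat on mat" yields one deletion and no substitutions or insertions, for a reading accuracy of 5/6 under this assumed metric.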

Assignees

Amazon Tech Inc

Classifications (CPC)

G10L13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
G10L13/08 (primary): Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
G10L2015/025: Phonemes, fenemes or fenones being the recognition units
G10L15/187: Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
G10L13/00: Speech synthesis; Text to speech systems

Patent family

Related publications grouped by family.

Patent family ID: 86609352
