What technology area does this patent fall under?

Primary CPC classification G10L13/047. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 16, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Text-to-speech synthesis system and method

US2023368775A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023368775-A1
Application number	US-202318346694-A
Country	US
Kind code	A1
Filing date	Jul 3, 2023
Priority date	Mar 28, 2018
Publication date	Nov 16, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data may be extracted. A speech gap filling model may be generated based on, at least in part, the at least one feature extracted. A speech output may be generated based on, at least in part, the speech gap filling model.
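The steps in the abstract (compare, extract difference features, build a gap filling model, apply it) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the method disclosed in the patent: speech is represented as already-aligned lists of parameter frames, the "feature" is a plain per-frame difference, and the gap filling model is a per-dimension least-squares line standing in for whatever learned model an implementation would use. All function names are hypothetical.

```python
from statistics import fmean


def extract_difference_features(synthetic, reference):
    """Per-frame, per-dimension difference (reference minus synthetic)."""
    if len(synthetic) != len(reference):
        raise ValueError("sequences must be aligned to the same length first")
    return [[r - s for s, r in zip(sf, rf)]
            for sf, rf in zip(synthetic, reference)]


def fit_gap_filling_model(synthetic, differences):
    """Fit difference ~ slope * synthetic + intercept for each dimension.

    Assumption: a simple linear fit replaces the trained model here.
    """
    model = []
    for d in range(len(synthetic[0])):
        xs = [frame[d] for frame in synthetic]
        ys = [frame[d] for frame in differences]
        mx, my = fmean(xs), fmean(ys)
        var = sum((x - mx) ** 2 for x in xs)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = cov / var if var else 0.0
        model.append((slope, my - slope * mx))
    return model


def apply_gap_filling_model(interim, model):
    """Turn interim parameters into final ones by adding the predicted gap."""
    return [[x + slope * x + intercept
             for x, (slope, intercept) in zip(frame, model)]
            for frame in interim]
```

Calling `fit_gap_filling_model` on the synthetic frames and their differences, then passing new interim frames to `apply_gap_filling_model`, mirrors the interim-to-final parameter step described in the dependent claims.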

First claim

Opening claim text (preview).

What is claimed is:
1. A computing system including one or more processors and one or more memories configured to perform operations comprising: comparing synthetic speech data for an input text to recorded reference speech data corresponding to the input text; extracting at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data; generating a speech gap filling model based on, at least in part, the at least one feature extracted; generating a speech output based on, at least in part, the speech gap filling model; comparing the speech output generated for a second input text to recorded reference speech data corresponding to the second input text; and extracting an updated at least one feature indicative of at least one difference between the speech output generated for the second input text and the recorded reference speech data corresponding to the second input text based on, at least in part, the comparison of the speech output for the second input text to the recorded reference speech data corresponding to the second input text.
2. The computing system of claim 1, wherein generating the speech output comprises: generating an interim set of parameters; processing the interim set of parameters based on, at least in part, the speech gap filling model to generate a final set of parameters; and generating the speech output based on, at least in part, the final set of parameters.
3. The computing system of claim 1, wherein the synthetic speech data generated is based on, at least in part, at least one of a parametric acoustic model and a linguistic model pre-configured for a speaker.
4. The computing system of claim 1, wherein the synthetic speech data generated is further based on, at least in part, the recorded reference speech data pre-recorded by a speaker.
5. The computing system of claim 1 further comprising aligning the synthetic speech data and the recorded reference speech data preceding the comparison.
6. The computing system of claim 5, wherein aligning the synthetic speech data and the recorded reference speech data comprises implementing one or more of pitch shifting, time normalization, and time alignment between the synthetic speech data and the recorded reference speech data.
7. The computing system of claim 1 further comprising training a neural network based on, at least in part, the at least one feature to generate the speech gap filling model.
8. The computing system of claim 1 further comprising updating the speech gap filling model based on, at least in part, the updated at least one feature.
9. A computer-implemented method, comprising: comparing synthetic speech data for an input text to recorded reference speech data corresponding to the input text; extracting at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data; generating a speech gap filling model based on, at least in part, the at least one feature extracted; and generating a speech output based on, at least in part, the speech gap filling model; comparing the speech output generated for a second input text to recorded reference speech data corresponding to the second input text; and extracting an updated at least one feature indicative of at least one difference between the speech output generated for the second input text and the recorded reference speech data corresponding to the second input text based on, at least in part, the comparison of the speech output for the second input text to the recorded reference speech data corresponding to the second input text.
10. The computer-implemented method of claim 9, wherein generating the speech output comprises: generating an interim set of parameters; processing the interim set of parameters based on, at least in part, the speech gap filling model to generate a final set of parameters; and generating the speech output based on, at least in part, the final set of parameters.
11. The computer-implemented method of claim 9, wherein the synthetic speech data generated is based on, at least in part, at least one of a parametric acoustic model and a linguistic model pre-configured for a speaker.
12. The computer-implemented method of claim 9, wherein the synthetic speech data generated is further based on, at least in part, the recorded reference speech data pre-recorded by a speaker.
13. The computer-implemented method of claim 9 further comprising aligning the synthetic speech data and the recorded reference speech data preceding the comparison.
14. The computer-implemented method of claim 13, wherein aligning the synthetic speech data and the recorded reference speech data comprises implementing one or more of pitch shifting, time normalization, and time alignment between the synthetic speech data and the recorded reference speech data.
15. The computer-implemented method of claim 9 further comprising training a neural network based on, at least in part, the at least one feature to generate the speech gap filling model.
16. The computer-implemented method of claim 9 further comprising updating the speech gap filling model based on, at least in part, the updated at least one feature.
17. A computer program product residing on a computer readable storage medium having a plurality of instructions stored thereon which, when executed across one or more processors, causes at least a portion of the one or more processors to perform operations comprising: comparing synthetic speech data for an input text to recorded reference speech data corresponding to the input text; extracting at least one feature indicative of at least one difference between the synthetic speech data and the recorded reference speech data based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data; generating a speech gap filling model based on, at least in part, the at least one feature extracted; generating a speech output based on, at least in part, the speech gap filling model; comparing the speech output generated for a second input text to recorded reference speech data corresponding to the second input text; and extracting an updated at least one feature indicative of at least one difference between the speech output generated for the second input text and the recorded reference speech data corresponding to the second input text based on, at least in part, the comparison of the speech output for the second input text to the recorded reference speech data corresponding to the second input text.
18. The computer program product of claim 17, wherein generating the speech output comprises: generating an interim set of parameters; processing the interim set of parameters based on, at least in part, the speech gap filling model to generate a final set of parameters; and generating the speech output based on, at least in part, the final set of parameters.
19. The computer program product of claim 17, wherein the synthetic speech data generated is based on, at least in part, at least one of a parametric acoustic model and a linguistic model pre-configured for a speaker.
20. The computer program product of claim 17
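The alignment claims list pitch shifting, time normalization, and time alignment as options but do not name an algorithm. As one possible illustration of the time alignment option only, the sketch below uses classic dynamic time warping (DTW), which is an assumption and is not stated in the claims. The function names `dtw_align` and `warp_to_reference` are hypothetical, and frames are plain lists of floats.

```python
import math


def dtw_align(synthetic, reference):
    """Return (i, j) index pairs on a minimum-cost DTW warping path."""
    n, m = len(synthetic), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] is the cheapest cumulative distance aligning the first
    # i synthetic frames with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(synthetic[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def warp_to_reference(synthetic, path):
    """Pick one synthetic frame per reference frame (last pairing wins)."""
    chosen = {j: i for i, j in path}
    return [synthetic[chosen[j]] for j in sorted(chosen)]
```

After warping, the synthetic sequence has the same length as the reference, so a frame-by-frame comparison becomes possible.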

Assignees

Telepathy Labs Inc

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title
G10L13/047 · Primary
Architecture of speech synthesisers · CPC title
G06N3/08
Learning methods · CPC title
G10L13/08
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title

Patent family

Related publications grouped by family.

View patent family 68058486

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023368775A1 cover?: A method, computer program product, and computer system for text-to-speech synthesis is disclosed. Synthetic speech data for an input text may be generated. The synthetic speech data may be compared to recorded reference speech data corresponding to the input text. Based on, at least in part, the comparison of the synthetic speech data to the recorded reference speech data, at least one feature…
Who is the assignee on this patent?: Telepathy Labs Inc