Who is the assignee on this patent?

Sorin Alexander, Shechtman Slava, Pollet Vincent, and 1 more

What technology area does this patent fall under?

Primary CPC classification G10L25/48. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 1, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for automatic prediction of speech suitability for statistical modeling

US9484045B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9484045-B2
Application number	US-201213606618-A
Country	US
Kind code	B2
Filing date	Sep 7, 2012
Priority date	Sep 7, 2012
Publication date	Nov 1, 2016
Grant date	Nov 1, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
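The two uses named in the abstract, speaker selection and the template-versus-model decision, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the patent: the function names `select_speaker` and `choose_representation`, the convention that a higher score means speech is more favorable for statistical modeling, the use of a simple mean per speaker, and the `template_budget` parameter standing in for the system footprint.

```python
from statistics import mean


def select_speaker(scores_by_speaker):
    """Return the speaker whose sample segments have the highest mean score.

    scores_by_speaker maps a speaker id to a list of modelability scores
    computed on a small amount of recorded speech (hypothetical input format).
    """
    return max(scores_by_speaker, key=lambda s: mean(scores_by_speaker[s]))


def choose_representation(segment_scores, template_budget):
    """Label each segment 'template' or 'model'.

    The least modelable segments are kept as templates until the budget is
    used up; all other segments are represented by the statistical model.
    A larger budget (larger footprint) therefore yields more templates.
    """
    ranked = sorted(segment_scores, key=segment_scores.get)  # lowest score first
    as_template = set(ranked[:template_budget])
    return {
        seg: ("template" if seg in as_template else "model")
        for seg in segment_scores
    }


if __name__ == "__main__":
    print(select_speaker({"spk_a": [0.2, 0.4], "spk_b": [0.7, 0.9]}))
    print(choose_representation({"s1": 0.9, "s2": 0.1, "s3": 0.5}, 1))
```

Ranking against a budget, as opposed to applying a fixed threshold, is one way to make the number of templates change gradually with footprint; the patent text on this page does not say which approach is used.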

First claim

Opening claim text (preview).

What is claimed is:

1. A computer server system for automatically determining suitability of at least a portion of a speech signal, comprising voice data, for statistical modeling, the system comprising: a memory storing computer code instructions thereon; and a processor, the memory, with the computer code instructions, and the processor being configured to cause the computer server system to implement: a modelability estimator configured to: determine a statistical modelability score of the at least a portion of the speech signal comprising voice data, the statistical modelability score indicating favorability of the at least a portion of the speech signal for statistical modeling in terms of human perception and based at least in part on determining a temporal stationarity of the at least a portion of the speech signal comprising voice data; and forward the statistical modelability score to a speech synthesis system executed by the processor, wherein the speech synthesis system is configured to utilize the modelability score in converting text to speech; and a decision maker configured to determine a preferred speaker selection for use by the speech synthesis system in building a statistical text-to-speech system based on the statistical modelability score determined for speech provided by each of a plurality of speakers.

2. The computer server system according to claim 1, wherein the modelability estimator is further configured to determine the temporal stationarity based on variability of an instantaneous spectrum of the at least a portion of the speech signal.

3. The computer server system according to claim 2, wherein the modelability estimator is still further configured to determine the variability of the instantaneous spectrum based on (i) a first moment of an instantaneous spectrum component distribution and (ii) a second moment of the instantaneous spectrum component distribution.

4. The computer server system according to claim 1, wherein the decision maker is further configured to: determine a segment representation type to be used by the speech synthesis system in a multi-form segment speech synthesis based on the statistical modelability score.

5. The computer server system according to claim 4, wherein the modelability estimator is further configured to determine the statistical modelability score for at least one segment comprising at least a portion of an output speech signal being synthesized, and wherein the decision maker is further configured to determine the segment representation type, for the at least one segment, based on at least the statistical modelability score for the at least one segment.

6. The computer server system according to claim 4, wherein the modelability estimator is further configured to determine for at least one segment comprising at least a portion of an output speech signal being synthesized, the statistical modelability score for a segment cluster that includes the at least one segment, and wherein the decision maker is further configured to determine the segment representation type, for the at least one segment, based on at least the statistical modelability score of the segment cluster that includes the at least one segment.

7. The computer server system according to claim 4, further comprising a templates pruner configured to remove from a voice dataset at least one segment relative to its statistical modelability score.

8. The computer server system according to claim 4, wherein the statistical modelability score is further based at least in part on a loudness score.

9. A computerized method of automatically determining, by a server, suitability of at least a portion of a speech signal, comprising voice data, for statistical modeling, the computerized method comprising: determining a statistical modelability score of the at least a portion of the speech signal comprising voice data, the statistical modelability score indicating favorability of the at least a portion of the speech signal for statistical modeling in terms of human perception and based at least in part on a temporal stationarity of the at least a portion of the speech signal comprising voice data; forwarding the statistical modelability score to a speech synthesis system implemented by the server, wherein the speech synthesis system is configured to utilize the modelability score in converting text to speech; and determining a preferred speaker selection for use by the speech synthesis system in building a statistical text-to-speech system based on the statistical modelability score determined for speech provided by each of a plurality of speakers.

10. The computerized method according to claim 9, wherein the temporal stationarity is determined based on variability of an instantaneous spectrum of the at least a portion of the speech signal.

11. The computerized method according to claim 10, wherein the variability of the instantaneous spectrum is determined based on (i) a first moment of an instantaneous spectrum component distribution and (ii) a second moment of the instantaneous spectrum component distribution.

12. The computerized method according to claim 9, wherein the method comprises determining a segment representation type to be used by the speech synthesis system in a multi-form segment speech synthesis system based on the statistical modelability score.

13. The computerized method according to claim 12, further comprising: determining the statistical modelability score for at least one segment comprising at least a portion of an output speech signal being synthesized; and determining the segment representation type, for the at least one segment, based on at least the statistical modelability score for the at least one segment.

14. The computerized method according to claim 12, further comprising: determining, for at least one segment comprising at least a portion of an output speech signal being synthesized, the statistical modelability score for a segment cluster that includes the at least one segment; and determining the segment representation type, for the at least one segment based on at least the statistical modelability score of the segment cluster that includes the at least one segment.

15. The computerized method according to claim 14, further comprising removing from a voice dataset at least one segment relative to its statistical modelability score.

16. The computerized method according to claim 12, further comprising determining the statistical modelability score based at least in part on a loudness score.

17. A non-transitory computer-readable storage medium having computer-readable code stored thereon, which, when executed by a computer processor, causes the computer processor to automatically determine suitability of at least a portion of a speech signal, comprising voice data, for statistical modeling, by causing the processor to: determine a statistical modelability score of the at least a portion of the speech signal comprising voice data, the statistical modelability score indicating favorability of the at least a portion of the speech signal for statistical modeling in terms of human perception and the statistical modelability score being based at least in part on a temporal stationarity of the at least a portion of the speech signal comprising voice data; forward the statistical modelability score to a speech synthesis system executed by the processor, wherein the speech synthesis system is configured to utilize the modelability score in converting text to speech; and determine a preferred speaker selection
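The claims tie temporal stationarity to the variability of the instantaneous spectrum, measured through a first and a second moment of the spectrum component distribution, but the text on this page gives no formula. The sketch below shows one plausible reading under stated assumptions: the input is a list of per-frame magnitude spectra, the moments are taken per frequency bin across frames, and variability is summarized as a coefficient of variation mapped into the range (0, 1]. The function name `stationarity_score` and this particular mapping are hypothetical and are not the patent's method.

```python
from statistics import mean, pstdev


def stationarity_score(frames):
    """Illustrative stationarity measure for one speech segment.

    frames: list of magnitude spectra, one per analysis frame, each an
    equal-length list of non-negative floats (assumed input format).
    Returns a value in (0, 1]; 1.0 means every bin is constant over time.
    """
    n_bins = len(frames[0])
    per_bin = []
    for k in range(n_bins):
        track = [frame[k] for frame in frames]  # bin k followed over time
        m1 = mean(track)      # first moment of the bin's distribution
        sd = pstdev(track)    # square root of the second central moment
        cv = sd / m1 if m1 > 0 else 0.0  # relative variability of the bin
        per_bin.append(1.0 / (1.0 + cv))
    return mean(per_bin)


if __name__ == "__main__":
    steady = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
    changing = [[1.0, 2.0], [3.0, 2.0]]
    print(stationarity_score(steady))
    print(stationarity_score(changing))
```

A segment whose spectrum barely changes from frame to frame scores near 1.0, while a rapidly changing segment scores lower. How such a value is combined with other terms, such as the loudness score mentioned in claims 8 and 16, is not specified in this preview.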

Classifications

G10L13/04
Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title
G10L25/48 · Primary
specially adapted for particular use · CPC title
G10L25/18
the extracted parameters being spectral information of each sub-band · CPC title

Patent family

Related publications grouped by family.

Patent family 50234198
