System and method for unit selection text-to-speech using a modified Viterbi approach

US10079011B2 · US · B2

Patent metadata
Publication number: US-10079011-B2
Application number: US-201414282040-A
Country: US
Kind code: B2
Filing date: May 20, 2014
Priority date: Jun 18, 2010
Publication date: Sep 18, 2018
Grant date: Sep 18, 2018

Abstract

Official abstract text for this publication.

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
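The flow in the abstract (pitch-ordered lists, a per-unit sublist drawn from the next list, a cost analysis, and a lowest cost path) can be sketched as follows. This is a minimal illustration and not the patented implementation: the `Unit` fields, the `target_cost` values, the absolute-pitch-difference join cost, and the fixed `threshold` window are all assumptions introduced for the example.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str           # hypothetical identifier for a recorded speech unit
    start_f0: float     # pitch in Hz at the leading edge (assumed field)
    end_f0: float       # pitch in Hz at the trailing edge (assumed field)
    target_cost: float  # assumed cost of using this unit at its position


def join_cost(prev: Unit, nxt: Unit) -> float:
    """Assumed join cost: pitch mismatch at the concatenation boundary."""
    return abs(prev.end_f0 - nxt.start_f0)


def sublist(prev: Unit, ordered_next: list[Unit], keys: list[float],
            threshold: float) -> list[Unit]:
    """Units in the next pitch-ordered list whose leading-edge pitch lies
    within `threshold` Hz of the trailing-edge pitch of `prev`.
    Because the list is sorted, two binary searches find the slice."""
    lo = bisect_left(keys, prev.end_f0 - threshold)
    hi = bisect_right(keys, prev.end_f0 + threshold)
    return ordered_next[lo:hi]


def lowest_cost_path(lists: list[list[Unit]], threshold: float):
    """Viterbi-style pass that only scores transitions into each sublist."""
    # Order every candidate list by leading-edge pitch.
    ordered = [sorted(units, key=lambda u: u.start_f0) for units in lists]
    # best[u] = (cost, path) of the cheapest path found so far ending at u.
    best = {u: (u.target_cost, [u]) for u in ordered[0]}
    for nxt_list in ordered[1:]:
        keys = [u.start_f0 for u in nxt_list]
        new_best = {}
        for prev, (cost, path) in best.items():
            for nxt in sublist(prev, nxt_list, keys, threshold):
                total = cost + join_cost(prev, nxt) + nxt.target_cost
                if nxt not in new_best or total < new_best[nxt][0]:
                    new_best[nxt] = (total, path + [nxt])
        if not new_best:
            raise ValueError("no unit within the pitch threshold")
        best = new_best
    return min(best.values(), key=lambda cost_path: cost_path[0])


if __name__ == "__main__":
    candidates = [
        [Unit("a1", 100.0, 110.0, 1.0), Unit("a2", 100.0, 200.0, 0.5)],
        [Unit("b1", 112.0, 120.0, 1.0), Unit("b2", 205.0, 210.0, 3.0),
         Unit("b3", 300.0, 310.0, 0.0)],
    ]
    cost, path = lowest_cost_path(candidates, threshold=10.0)
    print(cost, [u.name for u in path])
```

In this sketch the unit `b3` is never scored, because its pitch falls outside every window. Restricting transitions to the sublist is what reduces the work compared with a Viterbi search that scores every pair of adjacent candidates.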

First claim

Opening claim text (preview).

I claim:

1. A method comprising: selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit.

2. The method of claim 1, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.

3. The method of claim 1, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.

4. The method of claim 1, further comprising adjusting the threshold distance based on a number of candidate speech units selected.

5. The method of claim 4, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.

6. The method of claim 1, further comprising assigning a pitch to units which do not have an assigned pitch.

7. The method of claim 1, wherein respective fundamental frequency is a dominant one of multiple factors by which the ordered candidate speech units are ordered.

8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit.

9. The system of claim 8, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.

10. The system of claim 8, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.

11. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in operations comprising adjusting the threshold distance based on a number of candidate speech units selected.

12. The system of claim 11, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.

13. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in operations comprising assigning a pitch to units which do not have an assigned pitch.

14. The system of claim 8, wherein respective fundamental frequency is a dominant one of multiple factors by which the ordered candidate speech units are ordered.

15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit.

16. The computer-readable storage device of claim 15, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.

17. The computer-readable storage device of claim 15, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.

18. The computer-readable storage device of claim 15, having additional instructions stored which result in operations comprising adjusting the threshold distance based on a number of candidate speech units selected.

19. The computer-readable storage device of claim 18, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.

20. The computer-readable storage device of claim 15, having additional instructions stored which result in operations comprising assigning a pitch to units which do not have an assigned pitch.
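Two of the dependent claims lend themselves to a short illustration: adjusting the threshold distance with the number of candidates (claims 4 and 5) and assigning a pitch to units that lack one (claim 6). The claims do not state a formula for either step, so the sketch below makes its own assumptions: an inverse scaling around a hypothetical `reference_n`, clamping bounds `min_hz` and `max_hz`, and a median fill with a hypothetical `fallback_hz` default.

```python
from statistics import median
from typing import Optional


def adjust_threshold(base_hz: float, n_candidates: int, reference_n: int = 50,
                     min_hz: float = 2.0, max_hz: float = 40.0) -> float:
    """Narrow the pitch window when many candidates are available and
    widen it when few are. The inverse scaling and the clamping bounds
    are assumptions; the claims only give the direction of the change."""
    if n_candidates <= 0:
        raise ValueError("need at least one candidate")
    scaled = base_hz * reference_n / n_candidates
    return max(min_hz, min(max_hz, scaled))


def assign_missing_pitch(pitches: list[Optional[float]],
                         fallback_hz: float = 120.0) -> list[float]:
    """Give every unit a pitch so that it can be placed in the ordered list.
    Filling with the median of the known pitches is an assumption; the
    claims do not say which value is assigned."""
    known = [p for p in pitches if p is not None]
    fill = median(known) if known else fallback_hz
    return [fill if p is None else p for p in pitches]


if __name__ == "__main__":
    print(adjust_threshold(10.0, 100))                 # more candidates, narrower window
    print(adjust_threshold(10.0, 25))                  # fewer candidates, wider window
    print(assign_missing_pitch([100.0, None, 140.0]))  # unpitched unit gets a value
```

A narrower window for large candidate sets keeps each sublist short, while a wider window for small sets makes it less likely that a sublist comes back empty.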

Assignees

Nuance Communications Inc

Classifications

  • G10L13/02 · Primary

    Methods for producing synthetic speech; Speech synthesisers · CPC title

  • Details of speech synthesis systems, e.g. synthesiser structure or memory management · CPC title

  • Elementary speech units used in speech synthesisers; Concatenation rules · CPC title

  • G10L13/07 · Primary

    Concatenation rules · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10079011B2 cover?
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysi…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L13/02. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 18, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).