US10079011B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10079011-B2 |
| Application number | US-201414282040-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 20, 2014 |
| Priority date | Jun 18, 2010 |
| Publication date | Sep 18, 2018 |
| Grant date | Sep 18, 2018 |
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
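The abstract describes a pipeline with three stages. Each unit position has an ordered list of candidates. Each candidate gets a sublist of compatible units from the next list. A lowest-cost path is then found through the lists. The minimal Python sketch below illustrates that pipeline under stated assumptions and is not the patentee's implementation. The `Unit` fields, the per-unit target cost, the pitch-difference join cost and the Viterbi-style dynamic-programming search are all hypothetical choices. Each list is sorted by leading-edge pitch, so the compatible sublist for a unit can be found with two binary searches rather than a scan of the whole next list.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    f0_start: float     # hypothetical: leading-edge pitch in Hz
    f0_end: float       # hypothetical: trailing-edge pitch in Hz
    target_cost: float  # hypothetical: how well the unit fits its position


def lowest_cost_path(lists, threshold):
    """Viterbi-style search that only considers joins whose pitch gap is <= threshold."""
    # Order every candidate list by leading-edge pitch (assumed ordering key).
    lists = [sorted(lst, key=lambda u: u.f0_start) for lst in lists]
    keys = [[u.f0_start for u in lst] for lst in lists]

    best = {u: u.target_cost for u in lists[0]}  # cheapest path cost ending at u
    back = []  # back[k][v] = best predecessor (in lists[k]) of unit v in lists[k + 1]
    for i in range(1, len(lists)):
        cur, prev = {}, {}
        for u, cost in best.items():
            # Sublist of lists[i] whose leading-edge pitch is near u's trailing edge.
            lo = bisect_left(keys[i], u.f0_end - threshold)
            hi = bisect_right(keys[i], u.f0_end + threshold)
            for v in lists[i][lo:hi]:
                # Assumed join cost: absolute pitch mismatch at the boundary.
                c = cost + abs(u.f0_end - v.f0_start) + v.target_cost
                if v not in cur or c < cur[v]:
                    cur[v], prev[v] = c, u
        if not cur:
            return None, float("inf")  # the threshold pruned every path
        best = cur
        back.append(prev)

    # Pick the cheapest final unit and follow predecessors back to the start.
    end = min(best, key=best.get)
    path = [end]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path)), best[end]
```

For example, with `threshold=5` and candidate lists `[A(100→110), B(100→150)]` and `[C(112→120), D(148→130)]`, only the joins A→C and B→D survive the sublist step. The search then returns whichever of the two has the lower total cost. Narrowing the sublists in this way reduces the number of join costs that the path search must evaluate.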
Claims
I claim:
1. A method comprising: selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit.
2. The method of claim 1, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.
3. The method of claim 1, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.
4. The method of claim 1, further comprising adjusting the threshold distance based on a number of candidate speech units selected.
5. The method of claim 4, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.
6. The method of claim 1, further comprising assigning a pitch to units which do not have an assigned pitch.
7. The method of claim 1, wherein respective fundamental frequency is a dominant one of multiple factors by which the ordered candidate speech units are ordered.
8. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit.
9. The system of claim 8, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.
10. The system of claim 8, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.
11. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in operations comprising adjusting the threshold distance based on a number of candidate speech units selected.
12. The system of claim 11, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.
13. The system of claim 8, the computer-readable storage medium having additional instructions stored which result in operations comprising assigning a pitch to units which do not have an assigned pitch.
14. The system of claim 8, wherein respective fundamental frequency is a dominant one of multiple factors by which the ordered candidate speech units are ordered.
15. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: selecting candidate speech units for converting text to speech; ordering the candidate speech units according to a respective fundamental frequency of each candidate speech unit in the candidate speech units relative to all other fundamental frequencies in the candidate speech units, to yield a linear list of ordered candidate speech units; constructing a sublist of the ordered candidate speech units, wherein a respective fundamental frequency of each candidate speech unit in the sublist is within a threshold distance of a respective proximate fundamental frequency associated with at least one candidate speech unit in a next linear list of ordered candidate speech units; concatenating a proposed speech unit in the sublist with a chosen speech unit outside of the candidate speech units, to yield a concatenated speech unit; and synthesizing the speech using the concatenated speech unit.
16. The computer-readable storage device of claim 15, wherein the respective fundamental frequency of each candidate speech unit comprises a leading edge frequency of the each candidate speech unit that is within the threshold distance of a trailing edge frequency of the proximate speech unit.
17. The computer-readable storage device of claim 15, wherein the respective fundamental frequency of each candidate speech unit comprises a trailing edge frequency of the each candidate speech unit that is within the threshold distance of a leading edge frequency of the proximate speech unit.
18. The computer-readable storage device of claim 15, having additional instructions stored which result in operations comprising adjusting the threshold distance based on a number of candidate speech units selected.
19. The computer-readable storage device of claim 18, wherein the threshold distance is decreased when more candidate speech units are selected and increases when fewer candidate speech units are selected.
20. The computer-readable storage device of claim 15, having additional instructions stored which result in operations comprising assigning a pitch to units which do not have an assigned pitch.
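The dependent claims add three refinements:

- Claims 2 and 3 compare leading-edge and trailing-edge frequencies.
- Claims 4 and 5 shrink the threshold distance as more candidates are selected and widen it as fewer are selected.
- Claim 6 assigns a pitch to units that lack one.

The claims do not give formulas for any of these. The Python sketch below therefore uses hypothetical choices, labeled as such:

- an inverse-proportional threshold clamped to a fixed range;
- filling each missing pitch with the median of the known pitches.

It illustrates the claimed behavior and is not the patented method's actual arithmetic.

```python
from statistics import median
from typing import Optional


def adjusted_threshold(n_candidates: int, base_hz: float = 20.0,
                       reference_count: int = 50,
                       min_hz: float = 2.0, max_hz: float = 80.0) -> float:
    """Hypothetical rule: threshold shrinks as candidates grow, clamped to [min_hz, max_hz]."""
    if n_candidates <= 0:
        return max_hz  # nothing selected yet: be as permissive as allowed
    t = base_hz * reference_count / n_candidates
    return max(min_hz, min(max_hz, t))


def assign_missing_pitch(pitches: list[Optional[float]]) -> list[float]:
    """Hypothetical fill: give unpitched units (None) the median of the known pitches."""
    known = [p for p in pitches if p is not None]
    fill = median(known) if known else 0.0
    return [fill if p is None else p for p in pitches]


def edges_join(trailing_hz: float, leading_hz: float, threshold_hz: float) -> bool:
    """Edge test in the spirit of claims 2-3: one unit's leading edge is near the other's trailing edge."""
    return abs(leading_hz - trailing_hz) <= threshold_hz
```

With these defaults, 50 candidates give a 20 Hz threshold, 100 candidates give 10 Hz, and 25 candidates give 40 Hz. This matches the direction stated in claims 5, 12 and 19.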
CPC classification titles:
- Methods for producing synthetic speech; Speech synthesisers
- Details of speech synthesis systems, e.g. synthesiser structure or memory management
- Elementary speech units used in speech synthesisers; Concatenation rules
- Concatenation rules