US9293129B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9293129-B2 |
| Application number | US-201313785573-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 5, 2013 |
| Priority date | Mar 5, 2013 |
| Publication date | Mar 22, 2016 |
| Grant date | Mar 22, 2016 |
Official abstract text for this publication.
Pronunciation issues for synthesized speech are automatically detected using human recordings as a reference within a Speech Recognition Assisted Evaluation (SRAE) framework including a Text-To-Speech (TTS) flow and a Speech Recognition (SR) flow. A pronunciation issue detector evaluates results obtained at multiple levels of the TTS flow and the SR flow (e.g. phone, word, and signal level) by using the corresponding human recordings as the reference for the synthesized speech, and outputs possible pronunciation issues. A signal level may be used to determine similarities/differences between the recordings and the TTS output. A model level checker may provide results to the pronunciation issue detector to check the similarities of the TTS and the SR phone set including mapping relations. Results from a comparison of the SR output and the recordings may also be evaluated by the pronunciation issue detector. The pronunciation issue detector outputs a list of potential pronunciation issue candidates.
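The word-level comparison the abstract describes can be illustrated with a minimal sketch. It assumes the SR component has already produced a word sequence for the human recording and another for the synthesized speech; no recognizer is called here. The function name `mismatched_words` and the sample sentences are hypothetical, and Python's `difflib` alignment stands in for whatever alignment the patented detector actually uses.

```python
from difflib import SequenceMatcher


def mismatched_words(reference_words, synthesized_words):
    """List the spans where the SR transcript of the synthesized speech
    departs from the SR transcript of the reference recording.

    Each entry is (operation, reference_span, synthesized_span), where
    operation is 'replace', 'delete', or 'insert' as reported by difflib.
    """
    matcher = SequenceMatcher(a=reference_words, b=synthesized_words, autojunk=False)
    candidates = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            candidates.append((tag, reference_words[i1:i2], synthesized_words[j1:j2]))
    return candidates


# Hypothetical SR outputs for one sentence (illustrative data only).
recording_words = "please read the record aloud".split()
tts_words = "please red the record aloud".split()

for candidate in mismatched_words(recording_words, tts_words):
    print(candidate)
```

Each non-matching span is only a candidate: a mismatch may come from a recognition error rather than a TTS pronunciation error, which is presumably why the abstract combines several levels of evidence before listing issues.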
Opening claim text (preview).
What is claimed is:

1. A method for determining pronunciation issues, comprising: receiving text comprising sentences for a Text-To-Speech (TTS) component and a recording of the text that is used as a reference for the text; receiving synthesized speech generated by the TTS component using the text as input to the TTS component; evaluating results received by an evaluation performed at a text level by determining a similarity of the synthesized speech to the recording, wherein the evaluation at the text level comprises performing a similarity measurement of a phone sequence of a sentence in the text and a corresponding phone sequence of a sentence in the recording; evaluating results obtained from a Speech Recognition (SR) component related to different inputs to the SR component comprising the synthesized speech and the recording; and generating a list that includes a ranking of pronunciation issue candidates based on the evaluations.

2. The method of claim 1, further comprising evaluating results from a signal level evaluation of phone sequences of the text using a phone sequence determined from the TTS component and an SR phone sequence of the recording.

3. The method of claim 1, wherein the evaluation at the text level further comprises performing evaluations for a word sequence and a phone sequence of each sentence within the text.

4. The method of claim 1, further comprising performing a model level check for an acoustic model that determines a similarity of a TTS phone set and an SR phone set including determining a mapping relation between the TTS acoustic model and the SR acoustic model.

5. The method of claim 1, wherein the evaluation performed at the text level comprises determining a similarity using an equation as defined by: $s = 1 - \frac{C_{Sub} + C_{Ins}}{C_{Corr} + C_{Sub} + C_{Del}}$ where $s$ is a similarity score; $C_{Corr}$, $C_{Sub}$, $C_{Ins}$ and $C_{Del}$ denote counts of correct components, substitution errors, insertion errors, and deletion errors in a sentence.

6. The method of claim 1, wherein generating the list that includes the ranking of pronunciation issue candidates comprises filtering out mismatched words for judgment labels based on at least one of the evaluations using the synthesized speech and the recording.

7. The method of claim 1, wherein the results received by the evaluation performed at the text level and the results obtained from the SR component are received by a pronunciation issue detector that is configured to perform the evaluations and to generate the list.

8. A tangible computer-readable storage device storing computer-executable instructions for determining pronunciation issues, comprising: receiving text comprising sentences for a Text-To-Speech (TTS) component and a recording of the text that is used as a reference for the text; receiving synthesized speech generated by the TTS component using the text as input to the TTS component; evaluating results received by an evaluation performed at a text level by determining a similarity of the synthesized speech to the recording; evaluating results obtained from a Speech Recognition (SR) component related to different inputs to the SR component comprising the synthesized speech and the recording; evaluating results from a signal level evaluation of the text and the recording; and generating a list that includes a ranking of pronunciation issue candidates based on the evaluations.

9. The tangible computer-readable storage device of claim 8, wherein the signal level evaluation of the text comprises evaluating a similarity of the recording of phone sequences of the text using a phone sequence determined from the TTS component and an SR phone sequence of the recording.

10. The tangible computer-readable storage device of claim 8, wherein the evaluation at the text level comprises performing a similarity measurement of a phone sequence of each sentence in the text and a corresponding phone sequence of each sentence in the recording.

11. The tangible computer-readable storage device of claim 8, further comprising performing a model level check for an acoustic model that determines a similarity of a TTS phone set and an SR phone set including determining a mapping relation between the TTS acoustic model and the SR acoustic model.

12. The tangible computer-readable storage device of claim 8, wherein the evaluation performed at the text level comprises determining a similarity using an equation as defined by: $s = 1 - \frac{C_{Sub} + C_{Ins}}{C_{Corr} + C_{Sub} + C_{Del}}$ where $s$ is a similarity score; $C_{Corr}$, $C_{Sub}$, $C_{Ins}$ and $C_{Del}$ denote counts of correct components, substitution errors, insertion errors, and deletion errors in a sentence.

13. The tangible computer-readable storage device of claim 8, wherein generating the list that includes the ranking of pronunciation issue candidates comprises filtering out mismatched words for judgment labels based on at least one of the evaluations using the synthesized speech and the recording.

14. A system for determining pronunciation issues, comprising: a processor and memory; an operating environment executing using the processor; text comprising sentences and a recording that corresponds to the text; a Text-To-Speech (TTS) component configured to generate synthesized speech using the text; a Speech Recognition (SR) component configured to recognize speech; and a pronunciation issue detector that is configured to perform actions comprising: receiving the synthesized speech generated by the TTS component; evaluating results received by an evaluation performed at a text level by determining a similarity of the synthesized speech to the recording; evaluating results obtained from the SR component related to different inputs to the SR component comprising the synthesized speech and the recording; evaluating results from a signal level evaluation of the text and the recording; and generating a list that includes a ranking of pronunciation issue candidates based on the evaluations.
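The similarity score recited in claims 5 and 12 can be illustrated with a short sketch. It reads the claimed equation as s = 1 - (C_Sub + C_Ins) / (C_Corr + C_Sub + C_Del); that reading of the flattened equation text is an assumption. The counts are taken from a standard minimum edit-distance alignment, which the claims do not specify, and the names `alignment_counts` and `similarity`, the phone symbols, and the sentence ids are all hypothetical.

```python
def alignment_counts(reference, hypothesis):
    """Count correct, substituted, inserted, and deleted items in a
    minimum edit-distance alignment of hypothesis against reference."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] is the edit distance between reference[:i] and hypothesis[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to classify each alignment step.
    counts = {"corr": 0, "sub": 0, "ins": 0, "del": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1]):
            counts["corr" if reference[i - 1] == hypothesis[j - 1] else "sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["del"] += 1
            i -= 1
        else:
            counts["ins"] += 1
            j -= 1
    return counts


def similarity(counts):
    """Similarity score under the assumed reading of the claimed equation."""
    denominator = counts["corr"] + counts["sub"] + counts["del"]
    if denominator == 0:
        raise ValueError("reference sequence is empty")
    return 1 - (counts["sub"] + counts["ins"]) / denominator


# Hypothetical phone sequences: recording side is the reference, TTS side is scored.
sentences = {
    "s1": (["r", "eh", "k", "er", "d"], ["r", "ih", "k", "er", "d", "z"]),
    "s2": (["k", "ae", "t"], ["k", "ae", "t"]),
}
scores = {sid: similarity(alignment_counts(ref, hyp)) for sid, (ref, hyp) in sentences.items()}

# Lowest similarity first, so the most suspicious sentences head the list.
ranked = sorted(scores.items(), key=lambda item: item[1])
print(ranked)
```

Sorting by ascending score is one simple way to produce a ranked candidate list; the claims only require that a ranking be generated from the evaluations, not this particular ordering.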
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title
Detection of language · CPC title