US9583095B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9583095-B2 |
| Application number | US-201013383527-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 4, 2010 |
| Priority date | Jul 17, 2009 |
| Publication date | Feb 28, 2017 |
| Grant date | Feb 28, 2017 |
Abstract
A speech recognition unit (102) includes a phrase determination unit (103), which determines a phrase boundary by comparing the hypothetical word group generated by speech recognition with preset words representing phrase boundaries. In this speech processing device, the speech recognition unit (102) outputs a recognition result for each phrase, based on the phrase boundary determined by the phrase determination unit (103).
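The phrase-wise output described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it consumes a stream of already-recognized words rather than competing hypotheses, and the boundary-word list `BOUNDARY_WORDS` and the function name `emit_phrases` are hypothetical. Following claim 5, each preset boundary word (here a preposition or conjunction) is taken to open a new phrase, so the boundary falls immediately before it.

```python
from typing import AbstractSet, Iterable, Iterator, List

# Hypothetical preset boundary words; the patent does not enumerate a list.
BOUNDARY_WORDS = frozenset({"and", "but", "because", "in", "on", "at", "to"})


def emit_phrases(words: Iterable[str],
                 boundary_words: AbstractSet[str] = BOUNDARY_WORDS) -> Iterator[List[str]]:
    """Yield one phrase at a time, cutting immediately before each boundary word."""
    current: List[str] = []
    for word in words:
        # A boundary word starts a new phrase, so the phrase before it is complete.
        if word.lower() in boundary_words and current:
            yield current
            current = []
        current.append(word)
    # Flush whatever remains when the input ends.
    if current:
        yield current
```

For example, `list(emit_phrases("I went to the station and bought a ticket".split()))` returns `[['I', 'went'], ['to', 'the', 'station'], ['and', 'bought', 'a', 'ticket']]`. Writing it as a generator mirrors the idea of outputting a result for each phrase as soon as its boundary is known, instead of waiting for the end of the utterance.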
Claims
The invention claimed is:

1. A speech processing device comprising: an analysis unit that is configured to output a feature amount by performing speech detection/analysis of input speech; and a speech recognition unit that is configured to output a recognition result by performing speech recognition based on the feature amount, wherein: said speech recognition unit comprises a phrase determination unit that is configured to determine a phrase boundary based on comparison between a hypothetical word group generated by the speech recognition and a word representing phrase boundary set in advance, said speech recognition unit is configured to output the recognition result for each phrase up to the phrase boundary determined by said phrase determination unit, said phrase determination unit is configured to stand by until an occupation ratio of a number of the words representing the phrase boundaries in the hypothetical word group generated by the speech recognition unit to a number of all the words of the hypothetical word group exceeds a set threshold, and said phrase determination unit is configured to determine the phrase boundary based on a likelihood of the word representing the phrase boundary in the hypothetical word group when the occupation ratio exceeds the set threshold.

2. A speech processing device according to claim 1, wherein said phrase determination unit is configured to determine the phrase boundary when the word hypothesis representing the phrase boundary exhibits a maximum likelihood among overall word hypotheses and a likelihood difference from a word hypothesis exhibiting a second highest likelihood exceeds a set threshold.

3. A speech processing device according to claim 1, wherein said phrase determination unit further comprises a section designation unit that is configured to designate section information of input speech, and said phrase determination unit is configured to temporarily change the threshold within a set section for each section set by said section designation unit.

4. A speech processing device according to claim 1, wherein the word representing the phrase boundary comprises a word representing a phrase boundary appearing at a head or tail of a phrase.

5. A speech processing device according to claim 4, wherein the word representing the phrase boundary comprises a preposition or a conjunction, and a position immediately before the word is a phrase boundary.

6. A speech processing method comprising: an analysis step of outputting a feature amount by performing speech detection/analysis of input speech; and a speech recognition step, executed by a speech processor, of outputting a recognition result by performing speech recognition based on the feature amount, wherein the speech recognition step comprises the phrase determination step of determining a phrase boundary based on comparison between a hypothetical word group generated by the speech recognition and a word representing phrase boundary set in advance, the recognition result being output for each phrase up to the phrase boundary determined in the phrase determination step, wherein said phrase determination step comprises standing by until an occupation ratio of a number of the words representing the phrase boundaries in the hypothetical word group generated in the speech recognition step to a number of all the words of the hypothetical word group exceeds a set threshold, and in the phrase determination step, the phrase boundary is determined based on a likelihood of the word representing the phrase boundary in the hypothetical word group when the occupation ratio exceeds the set threshold.

7. A non-transitory computer-readable storage medium storing a program for causing a computer to execute: an analysis step of outputting a feature amount by performing speech detection/analysis of input speech; and a speech recognition step of outputting a recognition result by performing speech recognition based on the feature amount, wherein the speech recognition step comprises the phrase determination step of determining a phrase boundary based on comparison between a hypothetical word group generated by the speech recognition and a word representing phrase boundary set in advance, wherein said phrase determination step comprises standing by until an occupation ratio of a number of the words representing the phrase boundaries in the hypothetical word group generated in the speech recognition step to a number of all the words of the hypothetical word group exceeds a set threshold, and wherein the speech recognition step outputs the recognition result for each phrase up to the phrase boundary determined by the phrase determination step, and in the phrase determination step, the phrase boundary is determined based on a likelihood of the word representing the phrase boundary in the hypothetical word group when the occupation ratio exceeds the set threshold.

8. A speech processing device comprising: analysis means for outputting a feature amount by performing speech detection/analysis of input speech; and speech recognition means for outputting a recognition result by performing speech recognition based on the feature amount, wherein: said speech recognition means comprises phrase determination means for determining a phrase boundary based on comparison between a hypothetical word group generated by the speech recognition and a word representing phrase boundary set in advance, said speech recognition means outputs the recognition result for each phrase up to the phrase boundary determined by said phrase determination means, said phrase determination means stands by until an occupation ratio of a number of the words representing the phrase boundaries in the hypothetical word group generated by the speech recognition means to all words of the hypothetical word group exceeds a set threshold, and said phrase determination means determines the phrase boundary based on a likelihood of the word representing the phrase boundary in the hypothetical word group when the occupation ratio exceeds the set threshold.
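The decision rule in claims 1 to 3 can be sketched in Python as follows. This is a hedged illustration under assumptions the patent does not state: the hypothetical word group is modeled as a flat list of `WordHypothesis` objects carrying a log-likelihood score, designated sections (claim 3) are modeled as `(start_s, end_s, threshold)` time intervals, and all names (`WordHypothesis`, `occupation_ratio`, `ratio_threshold_at`, `is_phrase_boundary`) are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, Sequence, Tuple


@dataclass(frozen=True)
class WordHypothesis:
    word: str
    log_likelihood: float  # assumed recognizer score; higher means more likely


def occupation_ratio(hyps: Sequence[WordHypothesis],
                     boundary_words: AbstractSet[str]) -> float:
    """Fraction of hypotheses in the group whose word is a preset boundary word."""
    if not hyps:
        return 0.0
    hits = sum(1 for h in hyps if h.word in boundary_words)
    return hits / len(hyps)


def ratio_threshold_at(time_s: float, default: float,
                       sections: Sequence[Tuple[float, float, float]]) -> float:
    """Claim 3 (assumed form): inside a designated section, its threshold temporarily applies."""
    for start_s, end_s, threshold in sections:
        if start_s <= time_s < end_s:
            return threshold
    return default


def is_phrase_boundary(hyps: Sequence[WordHypothesis],
                       boundary_words: AbstractSet[str],
                       ratio_threshold: float,
                       margin_threshold: float) -> bool:
    """Return True if this hypothesis group is judged to mark a phrase boundary."""
    if not hyps:
        return False
    # Claim 1: keep standing by until the occupation ratio exceeds the set threshold.
    if occupation_ratio(hyps, boundary_words) <= ratio_threshold:
        return False
    ranked = sorted(hyps, key=lambda h: h.log_likelihood, reverse=True)
    best = ranked[0]
    # Claim 2: the most likely hypothesis must itself be a boundary word ...
    if best.word not in boundary_words:
        return False
    if len(ranked) == 1:
        return True
    # ... and must beat the second most likely hypothesis by more than the margin.
    return best.log_likelihood - ranked[1].log_likelihood > margin_threshold
```

For a group of `and` (-2.0), `but` (-5.0) and `sand` (-6.0) with boundary words `{"and", "but"}`, the occupation ratio is 2/3, so `is_phrase_boundary(group, {"and", "but"}, 0.5, 1.0)` returns `True`, while raising the ratio threshold to 0.7 makes it return `False` (the unit keeps standing by). A caller would obtain the ratio threshold from `ratio_threshold_at(current_time, default, sections)` before each check. Comparing log-likelihoods makes the claim 2 margin a likelihood ratio, which is one reasonable reading of "likelihood difference", not necessarily the patent's.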