System and method of automated evaluation of transcription quality

US10147418B2 · US · B2

Patent metadata
Publication number: US-10147418-B2
Application number: US-201715676306-A
Country: US
Kind code: B2
Filing date: Aug 14, 2017
Priority date: Jul 30, 2013
Publication date: Dec 4, 2018
Grant date: Dec 4, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods automatedly evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes risk decoder is applied to the at least one word lattice to create at least one confusion network. At least one conformity ratio is calculated from the at least one confusion network.
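
The segmentation step in the abstract can be illustrated with a minimal sketch. The patent text on this page does not say how the voice activity detector works, so the frame-energy threshold used here is a stand-in technique, and the function name `segment_utterances` and the parameters `frame_len`, `threshold`, and `min_silence_frames` are all hypothetical.

```python
from typing import List, Tuple


def segment_utterances(
    samples: List[float],
    frame_len: int = 160,          # assumed frame size in samples
    threshold: float = 0.01,       # assumed mean-energy threshold for "speech"
    min_silence_frames: int = 3,   # assumed silence length that ends an utterance
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges of speech separated by non-speech.

    Illustrative energy-threshold detector only; it is not the detector
    described in the patent. Samples after the last full frame are ignored.
    """
    utterances: List[Tuple[int, int]] = []
    start = None       # sample index where the current utterance began
    silence_run = 0    # consecutive low-energy frames inside an utterance
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i * frame_len
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # Close the utterance at the first silent frame of this run.
                end = (i - silence_run + 1) * frame_len
                utterances.append((start, end))
                start = None
                silence_run = 0

    if start is not None:
        # Audio ended mid-utterance; drop any trailing silent frames.
        utterances.append((start, (n_frames - silence_run) * frame_len))
    return utterances
```

Each returned range would then be passed to the speech recognizer as one utterance. A production detector would typically use a trained model rather than a fixed energy threshold.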

First claim

Opening claim text (preview).

What is claimed is:

1. A method of automated evaluation of a transcription quality, the method comprising: obtaining audio data; segmenting the audio data into a plurality of utterances with a voice activity detector operating on a computer processor, wherein each of the plurality of utterances is separated by non-speech segments in the audio data; transcribing the plurality of utterances into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor; applying, by the processor, a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins; and calculating, by the processor, at least one conformity ratio from the at least one confusion network, wherein the at least one conformity ratio is an automated indication of transcription quality.

2. The method of claim 1, wherein the audio data is streaming audio data.

3. The method of claim 1, wherein calculating the at least one conformity ratio for the at least one confusion network further comprises: identifying a probability value of a most probable word arc in each word bin; and calculating a joint probability for each ε-bin and a preceding word bin; wherein the at least one conformity ratio is an average of the calculated joint probabilities for the at least one confusion network.

4. The method of claim 1, further comprising calculating a transcription quality score from the at least one conformity ratio.

5. The method of claim 4, wherein the transcription quality score is a normalized value of the at least one conformity ratio.

6. The method of claim 4, further comprising producing an indication of the transcription quality score.

7. The method of claim 1, wherein each of the plurality of utterances is transcribed into a word lattice.

8. The method of claim 7, further comprising calculating an overall conformity ratio for a transcription of the audio data from the conformity ratios calculated from the confusion network of each of the utterances in the plurality of utterances.

9. The method of claim 7, wherein transcribing the plurality of utterances comprises applying at least one transcription model to each of the plurality of utterances and wherein the at least one conformity ratio is indicative of a conformity between the audio data and the at least one transcription model.

10. The method of claim 9, further comprising: selecting a new at least one transcription model based upon the at least one conformity ratio; and transcribing the plurality of utterances by applying the new at least one transcription model to each of the plurality of utterances.

11. A system for automated evaluation of transcription quality, the system comprising: an audio data source upon which a plurality of audio data files are stored; a processor that receives the plurality of audio data files, segments the audio data files into a plurality of utterances and applies at least one transcription model to the plurality of utterances to transcribe the plurality of utterances into at least one word lattice, wherein each of the plurality of utterances is separated by non-speech segments in the audio data; and a non-transient computer readable medium communicatively connected to the processor and programmed with computer readable code that when executed by the processor causes the processor to: apply a minimum Bayes risk decoder to the at least one word lattice to create at least one confusion network representing the at least one word lattice as a plurality of sequential word bins and ε-bins; and calculate at least one conformity ratio from the at least one confusion network, wherein the at least one conformity ratio is an automated indication of transcription quality.

12. The system of claim 11, wherein the audio data source is a streaming audio data source.

13. The system of claim 11, wherein the computer readable code to calculate the at least one conformity ratio for the at least one confusion network further comprises code that when executed by the processor causes the processor to: identify a probability value of a most probable word arc in each word bin; and calculate a joint probability for each ε-bin and a preceding word bin; wherein the at least one conformity ratio is an average of the calculated joint probabilities for the at least one confusion network.

14. The system of claim 11, wherein the computer readable code further comprises code that when executed by the processor causes the processor to: calculate a transcription quality score from the at least one conformity ratio.

15. The system of claim 14, wherein the transcription quality score is a normalized value of the at least one conformity ratio.

16. The system of claim 14, wherein the computer readable code further comprises code that when executed by the processor causes the processor to: produce an indication of the transcription quality score.

17. The system of claim 11, wherein each of the plurality of utterances is transcribed into a word lattice.

18. The system of claim 17, wherein the computer readable code further comprises code that when executed by the processor causes the processor to: calculate an overall conformity ratio for a transcription of the audio data from the conformity ratios calculated from the confusion network of each of the utterances in the plurality of utterances.

19. The system of claim 17, wherein the computer readable code further comprises code that when executed by the processor causes the processor to: apply at least one transcription model to each of the plurality of utterances and wherein the at least one conformity ratio is indicative of a conformity between the audio data and the at least one transcription model.

20. The system of claim 19, wherein the computer readable code further comprises code that when executed by the processor causes the processor to: select a new at least one transcription model based upon the at least one conformity ratio; and transcribe the plurality of utterances by applying the new at least one transcription model to each of the plurality of utterances.
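
The conformity-ratio calculation recited in claims 3, 5, and 8 can be sketched as follows. This is one possible reading, not the patented implementation: it assumes the joint probability of an ε-bin and its preceding word bin is the product of the most probable arc in each bin (treating the two as independent), that the overall ratio is a plain mean of per-utterance ratios, and that normalization is a linear rescale between assumed bounds. The `Bin` class, the `EPS` label, and all function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

EPS = "<eps>"  # hypothetical label for the empty (ε) arc


@dataclass
class Bin:
    arcs: Dict[str, float]  # arc label -> posterior probability
    is_epsilon_bin: bool    # True for an ε-bin, False for a word bin


def top_probability(b: Bin) -> float:
    """Probability of the most probable arc in a bin."""
    return max(b.arcs.values())


def conformity_ratio(network: List[Bin]) -> float:
    """Average joint probability over (word bin, following ε-bin) pairs.

    Assumption: joint probability = product of the top arc probabilities.
    """
    joints = [
        top_probability(prev) * top_probability(cur)
        for prev, cur in zip(network, network[1:])
        if cur.is_epsilon_bin and not prev.is_epsilon_bin
    ]
    if not joints:
        raise ValueError("network has no word bin followed by an ε-bin")
    return mean(joints)


def overall_conformity_ratio(per_utterance: List[float]) -> float:
    """Combine per-utterance ratios; a plain mean is assumed here."""
    return mean(per_utterance)


def quality_score(ratio: float, low: float = 0.0, high: float = 1.0) -> float:
    """Rescale a ratio to 0..100 between assumed bounds, clamping outliers."""
    clamped = min(max(ratio, low), high)
    return 100.0 * (clamped - low) / (high - low)


# Usage: one utterance with two word bins, each followed by an ε-bin.
network = [
    Bin({"hello": 0.8, "yellow": 0.2}, is_epsilon_bin=False),
    Bin({EPS: 0.9, "a": 0.1}, is_epsilon_bin=True),
    Bin({"world": 0.5, "word": 0.5}, is_epsilon_bin=False),
    Bin({EPS: 1.0}, is_epsilon_bin=True),
]
ratio = conformity_ratio(network)
```

A low ratio under this reading means the recognizer was uncertain about many bins, which is the condition claim 10 uses as a trigger for selecting a different transcription model.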

Assignees

Verint Systems Ltd

Inventors

Classifications

  • G10L15/01 · Primary

    Assessment or evaluation of speech recognition systems · CPC title

  • Segmentation; Word boundary detection · CPC title

  • using dynamic programming techniques, e.g. dynamic time warping [DTW] · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10147418B2 cover?
Systems and methods automatedly evaluate a transcription quality. Audio data is obtained. The audio data is segmented into a plurality of utterances with a voice activity detector operating on a computer processor. The plurality of utterances are transcribed into at least one word lattice with a large vocabulary continuous speech recognition system operating on the processor. A minimum Bayes ri…
Who is the assignee on this patent?
Verint Systems Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/01. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 4, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).