Processing of audio data

US2016133251A1 · US · A1

Patent metadata
Publication number: US-2016133251-A1
Application number: US-201314890538-A
Country: US
Kind code: A1
Filing date: May 31, 2013
Priority date: May 31, 2013
Publication date: May 12, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples of processing audio data are described. In certain examples, a transcript language model is based on text data representative of a transcript associated with the audio data. The audio data is processed to determine at least a set of confidence values for language elements in a text output of the processing, wherein the processing uses the transcript language model. The set of confidence values enable a determination to be made. The determination relates to whether the text data is associated with said audio data based on said set of confidence values.
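
As a rough illustration of the final determination step, the sketch below averages per-element confidence values and compares the result with a threshold. The function name `transcript_matches_audio`, the use of a mean, and the 0.5 threshold are assumptions made for this example; the abstract does not say how the confidence values are combined.

```python
from statistics import mean


def transcript_matches_audio(confidences, threshold=0.5):
    """Decide whether a transcript appears to belong to the audio.

    `confidences` holds one value in [0, 1] per language element (for
    example, per word) of the transcription output. Averaging and the
    default threshold are illustrative assumptions, not taken from the patent.
    """
    if not confidences:
        # No recognized language elements: treat as no match.
        return False
    return mean(confidences) >= threshold


# Hypothetical confidence values, written by hand for the example.
print(transcript_matches_audio([0.91, 0.87, 0.95, 0.78]))  # True
print(transcript_matches_audio([0.12, 0.30, 0.08, 0.22]))  # False
```

The intuition is that a language model built from the correct transcript should steer the transcription engine toward high-confidence output, while a model built from an unrelated transcript should not.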

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing audio data, comprising: generating a transcript language model based on text data representative of a transcript associated with said audio data; processing said audio data with a transcription engine to determine at least a set of confidence values for a plurality of language elements in a text output of the transcription engine, the transcription engine using said transcript language model; and determining whether the text data is associated with said audio data based on said set of confidence values.

2. The method of claim 1, wherein said audio data comprises a plurality of audio tracks for a media item, each audio track having an associated language and the method further comprises: accessing a plurality of transcripts, each transcript being associated with a particular language; wherein the step of generating a transcript language model comprises generating a transcript language model for each transcript in the plurality of transcripts; wherein the step of processing said audio data comprises processing at least one audio track with the transcription engine to determine confidence values associated with use of each transcription language model; and wherein the step of determining whether the text data is associated with at least a portion of said audio data comprises determining a match between at least one audio track and at least one transcript based on the determined confidence values.

3. The method of claim 1, wherein the step of processing said audio data comprises producing a text output with associated timing information and the method further comprises: responsive to a determination that the text data is associated with at least a portion of said audio data, reconciling the text output with the text data representative of said transcript so as to append the timing information to the transcript.

4. The method of claim 1, wherein processing said audio data comprises determining a matrix of confidence values.

5. The method of claim 1, wherein the transcript language model is a statistical N-gram model that is configured using said text data representative of said transcript.

6. The method of claim 1, wherein the transcription engine uses an acoustic model representative of phonemic sound patterns in a spoken language.

7. The method of claim 6, wherein the transcription language model embodies statistical data on at least occurrences of words within the spoken language and wherein the transcription engine uses a pronunciation dictionary to map words to phonemic sound patterns.

8. The method of claim 1, further comprising, prior to generating a transcript language model: normalizing the text data representative of said transcript.

9. The method of claim 1, wherein said audio data forms part of a media broadcast and the transcript comprises closed-caption data for said media broadcast.

10. A system processing media data, the media data comprising at least an audio portion, the system comprising: a first component to instruct configuration of a language model based on text data representative of audible language elements within said audio portion; and a second component to instruct conversion of the audio portion of the media data to a text equivalent based on said language model, said conversion outputting a set of confidence values for a plurality of language elements in the text equivalent, wherein the system determines whether the text data is associated with said audio data based on said set of confidence values.

11. The system of claim 10, further comprising: a third component to compare the text equivalent with the received text data so as to add said timing information to the received text data; and a fourth component to determine whether the text data is associated with at least a portion of said audio data based on said set of confidence values, wherein the third component is arranged to perform a comparison responsive to a positive determination from the fourth component.

12. The system of claim 10, comprising: a speech-to-text engine communicatively coupled to the second component to convert the audio portion of the media data to the text equivalent, the speech-to-text engine making use of the language model and a sound model, the sound model being representative of sound patterns in a spoken language and the language model being representative of word patterns in a written language.

13. The system of claim 10, further comprising: an interface to receive at least text data associated with the media data, wherein the interface is arranged to convert said received text data to a canonical form.

14. The system of claim 10, wherein: the media data comprises a plurality of audio portions, each audio portion being associated with a respective language; the text data comprises a plurality of text portions, each text portion being associated with a respective language; the first component instructs configuration of a plurality of language models, each language model being based on a respective text portion; the second component instructs conversion of at least one audio portion of the media data to a plurality of text equivalents, the conversion of a particular audio portion being repeated for each of the plurality of language models; and the system further comprises: a fourth component to receive probability variables for language elements within each text equivalent and to determine a language from the set of languages for a particular audio portion based on said probability variables.

15. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processors, cause the one or more processors to: generate a transcript language model based on text data representative of a transcript associated with said audio data; process said audio data with a transcription engine to determine at least a set of confidence values for a plurality of language elements in a text output of the transcription engine, the transcription engine using said transcript language model; and determine whether the text data is associated with said audio data based on said set of confidence values.
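
Several of the dependent claims can be pictured with a small sketch: normalizing transcript text (claim 8), configuring a statistical N-gram model from it (claim 5, here with N = 2), and matching audio tracks to transcripts from a matrix of confidence values (claims 2 and 4). Everything in the block is hypothetical: the names `normalize`, `bigram_model`, and `best_matches`, the unsmoothed counts, the 0.5 threshold, and the hand-written confidence numbers, which stand in for the output of a real transcription engine that is not modeled here.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bigram_model(tokens):
    """Estimate P(next | prev) from raw bigram counts (no smoothing)."""
    prev_counts = Counter(tokens[:-1])          # every token that has a successor
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {(a, b): n / prev_counts[a] for (a, b), n in pair_counts.items()}


def best_matches(matrix, threshold=0.5):
    """Pick, for each audio track, the transcript with the highest score.

    `matrix[track][transcript]` is an aggregate confidence obtained by
    transcribing `track` with the language model built from `transcript`.
    A track whose best score is below `threshold` maps to None.
    """
    result = {}
    for track, row in matrix.items():
        transcript, score = max(row.items(), key=lambda item: item[1])
        result[track] = transcript if score >= threshold else None
    return result


tokens = normalize("The cat sat. The cat ran!")
model = bigram_model(tokens)
print(model[("the", "cat")])  # 1.0
print(model[("cat", "sat")])  # 0.5

# Illustrative numbers only; a real system would obtain them from the engine.
confidence_matrix = {
    "track_en": {"transcript_en": 0.88, "transcript_fr": 0.21},
    "track_fr": {"transcript_en": 0.19, "transcript_fr": 0.84},
}
print(best_matches(confidence_matrix))
# {'track_en': 'transcript_en', 'track_fr': 'transcript_fr'}
```

A production language model would normally add smoothing so that word sequences absent from the transcript do not receive zero probability; that is omitted to keep the sketch short.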

Assignees

Longsand Ltd

Classifications

Primary CPC classification: G10L15/197

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016133251A1 cover?
It covers processing audio data with a transcript language model built from the text of a transcript associated with the audio. The processing produces a set of confidence values for language elements in its text output, and those values are used to determine whether the transcript text is associated with the audio data.
Who is the assignee on this patent?
Longsand Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/197. Mapped technology areas include Physics.
When was this patent published?
Publication date May 12, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.