What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 1, 2018 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and method for transcription of spoken words using multilingual mismatched crowd

US2018061417A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2018061417-A1
Application number	US-201715476505-A
Country	US
Kind code	A1
Filing date	Mar 31, 2017
Priority date	Aug 30, 2016
Publication date	Mar 1, 2018
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure generally relates to transcription of spoken words, and more particularly to a system and method for transcription of spoken words using multilingual mismatched words. The process comprises collection of multi-scripted noisy transcriptions of the spoken word obtained from workers of the multilingual mismatched crowd. The collected words are mapped to a phoneme sequence in the source language using script specific graphemes to phoneme model. Further, it builds a multi-scripted transcription script specific, worker specific and a global insertion-deletion-substitution (IDS) channel. Furthermore, the disclosure also determines reputation of workers to allocate the transcription task. Determination of reputation is based on word belief. The word belief is determined by taking ratio of likelihood probability of mapped phoneme sequence of transcriptions given the current estimate of word to the sum of likelihood probabilities of mapped phoneme sequences of the transcriptions given the phoneme sequence of each dictionary word.
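The word-belief ratio described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `word_belief`, the toy dictionary words, and the likelihood values are all hypothetical, and the per-word likelihoods are assumed to have been computed already by some channel model.

```python
def word_belief(likelihoods, current_estimate):
    """Belief in the current word estimate.

    likelihoods maps each dictionary word to the likelihood of the observed
    (mapped) phoneme sequences given that word's phoneme sequence.
    The belief is the estimate's likelihood divided by the sum over all words.
    """
    total = sum(likelihoods.values())
    if total == 0:
        return 0.0  # no word explains the transcriptions at all
    return likelihoods[current_estimate] / total


# Toy, made-up likelihood values for three hypothetical dictionary words.
likelihoods = {"pani": 0.25, "rani": 0.125, "nani": 0.125}

# The current estimate is taken to be the word with the highest likelihood.
estimate = max(likelihoods, key=likelihoods.get)
belief = word_belief(likelihoods, estimate)
print(estimate, belief)
```

A belief close to 1 means one dictionary word dominates; a low belief suggests the word may need more transcriptions before it is decoded.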

First claim

Opening claim text (preview).

What is claimed is:
1. A computer implemented method for transcribing one or more spoken word utterances of a source language using a multilingual mismatched crowd, the method comprises: collecting, at word transcription table, a plurality of multi-scripted noisy transcriptions of the spoken word obtained from plurality of workers of the multilingual mismatched crowd; mapping each of the collected plurality of multi-scripted transcriptions to a phoneme sequence in the source language using script specific graphemes to phoneme model; building worker specific insertion-deletion-substitution (IDS) channel model, multi-scripted transcription script specific IDS channel model and a global IDS channel model from the multi-scripted transcriptions; filtering out a set of workers of the plurality of workers based on the reputation of the workers, estimated by simulating IDS channel for worker specific on the dictionary words using worker reputation module; allocating the transcription tasks to the set of workers such that the required number of transcriptions per word are minimized; and decoding, at a transcription decoding module, the plurality of multi-scripted transcriptions are combined to decode the transcription in source script, wherein the decoding comprises steps of: finding likelihood probability of the mapped phoneme sequences of the multi-scripted mismatched crowd transcriptions with each of the predefined dictionary word's phoneme sequence using insertion-deletion-substitution channel parameters and voting the dictionary word that maximizes above likelihood; and determining word belief by taking ratio of likelihood probability of mapped phoneme sequences of transcriptions given the current estimate of word to the sum of likelihood probabilities of mapped phoneme sequences of the transcriptions given the phoneme sequence of each dictionary word.
2. The method of claim 1, wherein the building of the worker specific IDS channel model using the gold standard transcription test.
3. The method of claim 1, wherein the multilingual mismatched crowd includes the plurality of workers who are having different script for the same word, unfamiliar to the source language and transcribing the given words of the source language in their own language.
4. The method of claim 1, wherein the multi-scripted transcriptions of words, whose ground truth phoneme sequences are known in source language, are used to train worker script's grapheme to source language's phoneme mapping models using expectation maximization algorithm.
5. The method of claim 4, further wherein with the help of ground truth phoneme sequences, the worker specific, the transcription script specific and the global IDS channel models are trained using expectation maximization algorithm.
6. The method of claim 1, wherein the multi-script transcriptions of words whose ground truths are not known are first mapped to phoneme sequence using G2P model, and with the help of phoneme sequences of dictionary words the worker specific, the transcription script specific and the global IDS channel models are trained in iterative fashion using expectation maximization algorithm.
7. The method of claim 1, wherein the estimation of worker reputation is based on transcribed words whose ground truth is known and task allocation follows the estimated worker reputation.
8. The method of claim 1, wherein the task allocation utilizes the bipartite matching algorithm to allocate the tasks to worker such that average word belief is maximized.
9. The method of claim 1, wherein the likelihood probability of a mapped phoneme sequence of the plurality of worker's transcription of the spoken word given a phoneme sequence of dictionary word is obtained by aligning both phoneme sequences using the linearly combined IDS channel parameters of worker, his/her script and global.
10. The method of claim 1, wherein the likelihood probabilities of each mapped phoneme sequences of multi-scripted transcriptions of a word with a predefined dictionary word's phoneme sequences are obtained and multiplied so as to obtain the likelihood probability. Further wherein, the predefined dictionary word that provides maximum likelihood probability is considered as decoded word.
11. A system for transcribing one or more spoken word utterances of a source language using a multilingual mismatched crowd, the system comprising: a processor; a memory communicatively coupled to the processor and the memory contains instructions that are readable by the processor; a database parted within the memory, wherein the database comprises an audio chunk table and a word transcription table; a plurality of typing interfaces are configured according to script preference of each of the plurality of workers of mismatched crowd; a reputation module is configured to compute the worker reputation and filter out the spammer from the plurality of workers; a task allocation module is configured to compute word beliefs and to allocate the transcription tasks to the plurality of workers, wherein the reputation of the plurality of workers is estimated by simulating the worker specific IDS channel on dictionary words; and a transcription decoding module is configured to generate transcription in the source language from multi-transcriptions of the plurality of workers.
12. The system of claim 11, wherein the audio chunk table is configured to store one or more information of the plurality of workers, one or more spoken word segments of the each of the plurality of workers, number of responses given by each of the plurality of workers and transcription score of each of the plurality of workers;
13. The system of claim 11, wherein the word transcription table is configured to store transcription responses of the spoken word segments presented to the plurality of workers, the audio chunk id, the each of the plurality of workers id, and the workers transcription text;
14. The system of claim 11, wherein the transcription decoding module is configured to invoke a grapheme to phoneme model for mapping the received multi-transcriptions with the phoneme sequence of the source language.
15. The system of claim 11, wherein the transcription decoding module is configured to compute likelihood by aligning phoneme sequence of predefined dictionary with the phoneme sequence of multi-transcriptions of the plurality of workers.
16. The system of claim 11, wherein the transcription decoding module is configured to invoke insertion-deletion-substitution channel model for decoding the word transcription in the source language from multi-scripted transcriptions.
17. The system of claim 11, wherein the task allocation module is to decide the word that needs more transcriptions from its belief probability.
18. A non-transitory computer readable medium embodying a program executable in a computing device for transcribing one or more spoken word utterances of a source language using a multilingual mismatched crowd, the program comprising: a program code for collecting, at word transcription table, a plurality of multi-scripted noisy transcriptions of the spoken word obtained from plurality of workers of the multilingual mismatched crowd; mapping each of the collected plurality of multi-scripted transcriptions to a phoneme sequence in the source language using script specific graphemes to phoneme model; building worker specific insertion-deletion-substitution (IDS) channel model, multi-scripted transcription script specific IDS channel
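The decoding steps recited in claims 1, 9 and 10 (align each mapped phoneme sequence with each dictionary word under an IDS channel, multiply the per-transcription likelihoods, pick the word with the maximum product) can be sketched roughly as below. This is an illustrative simplification, not the claimed implementation: it assumes a context-independent channel with three scalar parameters, spreads insertions and substitutions uniformly over a hypothetical 40-phoneme inventory, and yields an unnormalized alignment score rather than a fully normalized probability. The function names, combination weights, toy dictionary and phoneme strings are all made up for the example.

```python
from math import prod


def combine(worker, script, global_, weights=(0.5, 0.3, 0.2)):
    """Linearly combine (p_ins, p_del, p_sub) triples; the weights are hypothetical."""
    return tuple(
        weights[0] * w + weights[1] * s + weights[2] * g
        for w, s, g in zip(worker, script, global_)
    )


def ids_likelihood(observed, reference, params, alphabet_size):
    """Score of `observed` given `reference`, summed over all IDS alignments."""
    p_ins, p_del, p_sub = params
    p_match = 1.0 - p_ins - p_del - p_sub
    n, m = len(reference), len(observed)
    # f[i][j]: score of producing observed[:j] from reference[:i]
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0:  # reference phoneme deleted
                total += f[i - 1][j] * p_del
            if j > 0:  # observed phoneme inserted
                total += f[i][j - 1] * p_ins / alphabet_size
            if i > 0 and j > 0:  # match or substitution
                if reference[i - 1] == observed[j - 1]:
                    total += f[i - 1][j - 1] * p_match
                else:
                    total += f[i - 1][j - 1] * p_sub / (alphabet_size - 1)
            f[i][j] = total
    return f[n][m]


def decode(transcriptions, dictionary, params, alphabet_size):
    """Return (best word, word belief) for a set of mapped phoneme sequences."""
    scores = {
        word: prod(ids_likelihood(t, phones, params, alphabet_size)
                   for t in transcriptions)
        for word, phones in dictionary.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


# Toy data: every word, phoneme and parameter value here is hypothetical.
dictionary = {
    "pani": ["p", "aa", "n", "ii"],
    "rani": ["r", "aa", "n", "ii"],
    "nana": ["n", "aa", "n", "aa"],
}
transcriptions = [
    ["p", "aa", "n", "ii"],  # clean transcription
    ["b", "aa", "n", "ii"],  # one substituted phoneme
    ["p", "aa", "n"],        # one deleted phoneme
]
params = combine((0.05, 0.05, 0.10), (0.04, 0.06, 0.12), (0.05, 0.05, 0.10))
best, belief = decode(transcriptions, dictionary, params, alphabet_size=40)
print(best, belief)
```

In this toy run the word whose phoneme sequence needs the fewest edits across all three transcriptions wins. For long words or many transcriptions the products become very small, so a practical version would likely work with log-likelihoods instead.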

Assignees

Tata Consultancy Services Ltd

Inventors

Classifications

G10L15/26 (Primary)
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/005
Language recognition · CPC title
G10L15/063
Training · CPC title
G06F16/24552
Database cache management · CPC title
G06F3/0238
Programmable keyboards (key guide holders G06F3/0224) · CPC title

Patent family

Related publications grouped by family.

View patent family 61243271

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018061417A1 cover?: The disclosure generally relates to transcription of spoken words, and more particularly to a system and method for transcription of spoken words using multilingual mismatched words. The process comprises collection of multi-scripted noisy transcriptions of the spoken word obtained from workers of the multilingual mismatched crowd. The collected words are mapped to a phoneme sequence in the sou…
Who is the assignee on this patent?: Tata Consultancy Services Ltd