Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G10L15/187. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Pronunciation learning through correction logs

US9589562B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9589562-B2
Application number	US-201414186476-A
Country	US
Kind code	B2
Filing date	Feb 21, 2014
Priority date	Feb 21, 2014
Publication date	Mar 7, 2017
Grant date	Mar 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
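
As a rough illustration of the generate-then-adjudicate idea in the abstract, the Python sketch below expands a word into spelling-constrained candidate pronunciations and then picks the candidate that matched the audio most often. The grapheme-to-phoneme table, the phoneme symbols, and the function names are hypothetical and are not taken from the patent. The offline recognition step is not implemented; it is represented only by a hand-written list of pronunciations assumed to have matched.

```python
from collections import Counter
from itertools import product

# Hypothetical grapheme-to-phoneme options (an assumption for this sketch);
# a real system would derive these from linguistic knowledge.
G2P_OPTIONS = {
    "g": ["g", "jh"],
    "i": ["ih", "ay"],
    "f": ["f"],
}


def hypothetical_pronunciations(word):
    """Return every phoneme sequence allowed by the per-letter options.

    Letters missing from G2P_OPTIONS raise KeyError in this sketch.
    """
    options = [G2P_OPTIONS[letter] for letter in word.lower()]
    return [" ".join(phones) for phones in product(*options)]


def adjudicate(matched):
    """Pick the most frequent matched pronunciation, or None if empty."""
    if not matched:
        return None
    return Counter(matched).most_common(1)[0][0]


candidates = hypothetical_pronunciations("gif")
# Stand-in for offline recognition results across several logged utterances.
matched = ["jh ih f", "g ih f", "jh ih f"]
new_pronunciation = adjudicate(matched)
```

Here `candidates` holds the four spelling-constrained options for "gif", and `adjudicate` applies a simple majority vote, which is only one possible reading of "aggregated and adjudicated".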

First claim

Opening claim text (preview).

What is claimed is:
1. A method for dynamically learning new pronunciations for speech recognition assisted by subsequent user inputs, the method comprising: determining that a task initiated by a spoken utterance was not completed successfully; determining that a recognition result for the spoken utterance includes a misrecognized word based on the determination that the task initiated by the recognition result was not completed successfully; determining that a subsequent task initiated by subsequent user inputs was completed successfully; associating the spoken utterance with the subsequent user inputs based on similarity between the spoken utterance and the subsequent user inputs; after associating the spoken utterance with the subsequent user inputs based on similarity between the spoken utterance and the subsequent user inputs, generating hypothetical pronunciations for the misrecognized word in the spoken utterance based on a predicted intended word derived from the associated subsequent user inputs; recognizing the spoken utterance using a language model containing the hypothetical pronunciations to find matching hypothetical pronunciations; and accepting a new pronunciation for the predicted intended word from the matching hypothetical pronunciations.
2. The method of claim 1 further comprising the act of determining that a spoken utterance was misrecognized based on a user confirmation.
3. The method of claim 1 further comprising the act of determining that subsequent user inputs correspond to spoken utterances.
4. The method of claim 3 wherein the act of determining that subsequent user inputs correspond to spoken utterances containing the misrecognized word further comprises the acts of: determining that tasks initiated by spoken utterances were not completed successfully; and determining that tasks initiated by subsequent user inputs were completed successfully; and pairing tasks initiated by spoken utterances that were not completed successfully with tasks initiated by subsequent user inputs that were completed successfully into successive input pairs.
5. The method of claim 4 wherein the act of determining that subsequent user inputs correspond to spoken utterances containing the misrecognized word further comprises the acts of determining that successive input pairs correspond to spoken utterances containing the misrecognized word and subsequent user inputs correcting the misrecognized word using at least one of session features, lexical features, and phonetic features.
6. The method of claim 4 wherein the act of determining that subsequent user inputs correspond to spoken utterances containing the misrecognized word further comprises the acts of determining that subsequent user inputs are closely related in time to spoken utterances containing the misrecognized word.
7. The method of claim 4 wherein the act of determining that subsequent user inputs correspond to spoken utterances containing the misrecognized word further comprises the acts of determining that subsequent user inputs are lexically similar to spoken utterances containing the misrecognized word.
8. The method of claim 4 wherein the act of determining that subsequent user inputs correspond to spoken utterances containing the misrecognized word further comprises the acts of determining that subsequent user inputs are phonetically similar to spoken utterances containing the misrecognized word.
9. The method of claim 3 wherein the act of accepting a new pronunciation for the predicted intended word from the matching hypothetical pronunciations further comprises the act of accepting the most frequent matching hypothetical pronunciation as the new pronunciation for the predicted intended word.
10. The method of claim 9 wherein the act of selecting a new pronunciation for the predicted intended word from the matching hypothetical pronunciations further comprises the acts of: determining how often phonemes occur in the matching hypothetical pronunciations for each grapheme; and accepting the most frequent matching hypothetical pronunciation as the new pronunciation if the most frequent phonemes for each common phone occur in the most frequent matching hypothetical pronunciation.
11. The method of claim 1 wherein the act of generating hypothetical pronunciations for misrecognized word in spoken utterances based on a predicted intended word from corresponding subsequent user inputs further comprises the acts of selecting possible phonemes corresponding to graphemes in the predicted intended word according to linguistic knowledge.
12. The method of claim 1 further comprising the act of collecting audio of recognized spoken utterances, subsequent user inputs following spoken utterances, and information indicating whether tasks initiated by the spoken utterances and subsequent user inputs were successful.
13. A speech recognition system for assisted dynamic learning of new pronunciations for user input, the speech recognition system comprising: at least one processor; a memory connected to the at least one processor; and a recognition event store encoded on the memory storing recognition event data for tasks initiated by spoken utterances, the recognition event data including audio data of the spoken utterances, recognition results obtained by decoding the spoken utterances, subsequent user inputs, and indicators of whether outcomes of tasks initiated based on the recognition results and the subsequent user inputs were accepted or rejected by users; wherein the memory contains computer executable instructions for: an event classifier configured to classify recognition results as misrecognized spoken utterances based on determining that tasks initiated by the recognition result were not completed successfully based on an indication that outcomes of the tasks were not accepted, determining that subsequent tasks initiated based on subsequent user inputs were completed successfully based on an indication that outcomes of the subsequent tasks being accepted, and determining that a subsequent user input and recognition result pair from a single source have significant similarity, and configured to identify misrecognized portions of the recognition results based on the subsequent user inputs; a pronunciation generator configured to generate hypothetical pronunciations for the identified misrecognized portions using the corresponding portions of the subsequent user inputs after the event classifier has classified recognition results as misrecognized spoken utterances based on determining that tasks initiated by the recognition result were not completed successfully based on an indication that outcomes of the tasks were not accepted, determining that subsequent tasks initiate based on subsequent user inputs were completed successfully based on an indication that outcomes of the subsequent tasks being accepted, and determining that a subsequent user input and recognition result pair from a single source have significant similarity; a speech recognizer configured to match hypothetical pronunciations with the audio data of spoken utterances that produced recognition results classified as misrecognized spoken utterances; and an aggregation adjudicator configured to select new pronunciations for the misrecognized words from the matching pronunciations.
14. The speech recognition system of claim 13 wherein the aggregation adjudicator is further configured to aggregate the matching pronunciations and select a new pronunciation based on the occurrence frequency of the matching pronunciations.
15. The speech recognition system of claim 14 wh
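
The pairing described in claims 4 to 7 (a failed spoken task followed by a successful, similar input) could be sketched roughly as below. This is an illustrative Python sketch only: the `Event` fields, the 30-second window, and the 0.6 similarity threshold are assumptions rather than values from the patent, and the standard-library `difflib.SequenceMatcher` stands in for whatever lexical similarity measure an actual system would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Event:
    text: str          # recognition result or typed input
    timestamp: float   # seconds since session start
    spoken: bool       # True if the input was a spoken utterance
    succeeded: bool    # True if the initiated task completed successfully


def lexical_similarity(a, b):
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def correction_pairs(events, max_gap=30.0, min_similarity=0.6):
    """Pair each failed spoken event with the next event if that event
    succeeded, followed closely in time, and is lexically similar."""
    pairs = []
    for first, second in zip(events, events[1:]):
        failed_speech = first.spoken and not first.succeeded
        close_in_time = 0 <= second.timestamp - first.timestamp <= max_gap
        similar = lexical_similarity(first.text, second.text) >= min_similarity
        if failed_speech and second.succeeded and close_in_time and similar:
            pairs.append((first, second))
    return pairs


log = [
    Event("call john smith", 0.0, True, False),   # misrecognized, task failed
    Event("call jon smyth", 8.0, False, True),    # typed correction, succeeded
    Event("play jazz", 60.0, True, True),         # unrelated successful task
]
pairs = correction_pairs(log)
```

Only the first two events form a pair here; the third follows a successful, non-spoken input and so is not treated as a correction. Phonetic similarity (claim 8) is not modeled in this sketch.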

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G10L15/01
Assessment or evaluation of speech recognition systems · CPC title
G10L2015/025
Phonemes, fenemes or fenones being the recognition units · CPC title
G10L15/075
supervised, i.e. under machine guidance · CPC title
G10L15/187 (Primary)
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
G10L2015/225
Feedback of the input speech · CPC title

Patent family

Related publications grouped by family.

View patent family 53882819
