Enhanced maximum entropy models

US9412365B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9412365-B2
Application number  US-201514667518-A
Country             US
Kind code           B2
Filing date         Mar 24, 2015
Priority date       Mar 24, 2014
Publication date    Aug 9, 2016
Grant date          Aug 9, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to enhanced maximum entropy models. In some implementations, data indicating a candidate transcription for an utterance and a particular context for the utterance are received. A maximum entropy language model is obtained. Feature values are determined for n-gram features and backoff features of the maximum entropy language model. The feature values are input to the maximum entropy language model, and an output is received from the maximum entropy language model. A transcription for the utterance is selected from among a plurality of candidate transcriptions based on the output from the maximum entropy language model. The selected transcription is provided to a client device.
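
The flow described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the bigram-only model, the feature scores, the `DEFAULT_BACKOFF` value, and every function name below are hypothetical choices made for illustration. Each candidate transcription is scored by summing the scores of the features that are active for it, and a backoff feature fires whenever a word pair has no n-gram feature of its own.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model; all scores are made up for illustration.
# N-gram features: one score per known bigram.
NGRAM_SCORES: Dict[Tuple[str, str], float] = {
    ("go", "to"): 1.2,
    ("to", "the"): 1.5,
    ("the", "store"): 0.9,
}
# Backoff features: one per context word, covering every bigram that starts
# with that word but has no n-gram feature in NGRAM_SCORES.
BACKOFF_SCORES: Dict[str, float] = {"go": -1.0, "to": -0.8, "the": -0.5}
# Assumed score for context words that have no backoff feature of their own.
DEFAULT_BACKOFF = -2.0


def active_features(context: List[str], candidate: List[str]) -> List[Tuple[str, object]]:
    """Return the features whose binary value is 1 for this candidate."""
    words = context + candidate
    # Score only candidate words that have a preceding word to condition on.
    start = max(len(context), 1)
    features: List[Tuple[str, object]] = []
    for i in range(start, len(words)):
        bigram = (words[i - 1], words[i])
        if bigram in NGRAM_SCORES:
            features.append(("ngram", bigram))
        else:
            features.append(("backoff", words[i - 1]))
    return features


def score(context: List[str], candidate: List[str]) -> float:
    """Unnormalized log-linear score: the sum of the active feature scores."""
    total = 0.0
    for kind, key in active_features(context, candidate):
        if kind == "ngram":
            total += NGRAM_SCORES[key]
        else:
            total += BACKOFF_SCORES.get(key, DEFAULT_BACKOFF)
    return total


def select_transcription(context: List[str], candidates: List[List[str]]) -> List[str]:
    """Pick the candidate transcription with the highest model score."""
    return max(candidates, key=lambda c: score(context, c))
```

With the context `["go"]` and the candidates `["to", "the", "store"]` and `["two", "the", "store"]`, the first candidate activates three n-gram features, while the second falls back to backoff features for "go two" and "two the", so `select_transcription` returns the first.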

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers, the method comprising: receiving, by the one or more computers, data indicating a candidate transcription for an utterance and a particular context for the utterance; obtaining, by the one or more computers, a maximum entropy language model that includes (i) scores for one or more n-gram features that each correspond to a respective n-gram and (ii) scores for one or more backoff features that each correspond to a set of n-grams for which there are no corresponding n-gram features in the maximum entropy language model; determining, by the one or more computers, based on the candidate transcription and the particular context, a feature value for (i) each of the one or more n-gram features of the maximum entropy language model and (ii) each of the one or more backoff features of the maximum entropy language model; inputting, by the one or more computers, the feature values for the n-gram features and the feature values for the backoff features to the maximum entropy language model; and receiving, by the one or more computers, from the maximum entropy language model, an output indicative of a likelihood of occurrence of the candidate transcription; selecting, by the one or more computers, based on the output of the maximum entropy language model, a transcription for the utterance from among a plurality of candidate transcriptions; and providing, by the one or more computers, the selected transcription to a client device.

2. The method of claim 1, wherein the maximum entropy language model is a log-linear model.

3. The method of claim 1, wherein the scores of the maximum entropy language model comprise, for each context in a set of different contexts that each comprise a different sequence of one or more words, scores for: multiple different n-gram features that each correspond to the occurrence of a respective word with the context, and a backoff feature that corresponds to the occurrence of any of multiple words with the context.

4. The method of claim 1, wherein the scores of the maximum entropy language model comprise, for each context in a set of different contexts that each comprise a different sequence of one or more words, scores for: multiple different n-gram features that each correspond to the occurrence of a respective word with the context, the respective words forming a set of words, and a backoff feature that corresponds to the occurrence, with the context, of any word that is not in the set of words.

5. The method of claim 4, wherein the multiple different n-gram features for a context each correspond to the occurrence of a respective word after the one or more words of the context, and the backoff feature for the context corresponds to the occurrence of any of multiple words after the one or more words of the context.

6. The method of claim 1, wherein the scores of the maximum entropy language model comprise, for each context in a set of different contexts that each comprise a different sequence of one or more words, respective scores for: multiple different n-gram features that each correspond to a respective language sequence that includes the one or more words of the context, each of the respective language sequences being formed of a same, particular number of words, and one or more backoff features that each correspond to a set of multiple language sequences, wherein, for each backoff feature, each language sequence in the set of language sequences (i) is formed of the particular number of words, (ii) comprises a particular subset of the one or more words of the context, and (iii) omits a particular sub-sequence of words within the context.

7. The method of claim 6, wherein the one or more backoff features for a context comprise multiple backoff features that correspond to different sets of language sequences that include different portions of the context.

8. The method of claim 1, wherein generating the feature values for the one or more backoff features based on the particular context comprises generating at least one feature value that indicates that at least a portion of the particular context does not correspond to any of the n-gram features.

9. The method of claim 1, wherein determining the feature values comprises: identifying a particular n-gram that includes words of the candidate transcription immediately following a sequence of words indicated by the particular context, the n-gram of words being formed of a particular number of words; determining that the maximum entropy language model does not have an n-gram feature corresponding to the particular n-gram; in response to determining that the maximum entropy language model does not have an n-gram feature corresponding to the particular n-gram: determining first feature values that indicate the non-occurrence of n-grams having the particular number of words that have corresponding n-gram features in the maximum entropy language model; determining, for at least one backoff feature, a second feature value that indicates the occurrence of a language sequence that (i) has the particular number of words and (ii) includes a specific portion of the particular context.

10. The method of claim 9, wherein determining the feature values comprises determining multiple second feature values that each indicate the occurrence of an n-gram having the particular number of words, the multiple second feature values corresponding to different backoff features and indicating that different portions of the particular context occurred within an n-gram having the particular number of words.

11. The method of claim 1, wherein the feature values for the n-gram features indicate whether a particular n-gram comprising the candidate transcription and the particular context matches n-grams associated with the respective n-gram features, and wherein the feature values for the backoff features indicate whether the portions of the particular n-gram are different from the n-grams associated with the respective n-gram features.

12. The method of claim 1, wherein at least one of the feature values for the backoff features indicates that a portion of the particular context having particular size is different from each of the contexts of the particular size that are associated with the n-gram features.

13. The method of claim 1, wherein each of the feature values is a binary value.

14. The method of claim 1, wherein scores for at least some of the backoff features of the maximum entropy language model indicate a non-zero probability of occurrence of sets of n-grams represented by the backoff features, and wherein the maximum entropy language model is normalized.

15. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by the one or more computers, data indicating a candidate transcription for an utterance and a particular context for the utterance; obtaining, by the one or more computers, a maximum entropy language model that includes (i) scores for one or more n-gram features that each correspond to a respective n-gram and (ii) scores for one or more backoff features that each correspond to a set of n-grams for which there are no corresponding n-gram features in the maximum entropy language model; determining, by the one or more computers, based on the candidate transcription and the particular context, a feature value for (i) each of the one or more n-gram features of the maximum ent
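
As an illustration of how claims 2, 4, 13, and 14 fit together, the following sketch builds a tiny normalized log-linear model for a single context. The vocabulary, the context word, the scores, and the function names are all assumptions chosen for the example and do not come from the patent. Each listed word has its own n-gram feature, one backoff feature covers every other word that follows the context, all feature values are binary, and normalizing over the vocabulary gives unseen words a non-zero probability.

```python
import math
from typing import Dict

# Hypothetical setup: one context word and a four-word vocabulary.
VOCAB = ["store", "park", "moon", "zebra"]
CONTEXT = "the"
# N-gram features for "the store" and "the park" (scores are made up).
NGRAM_SCORES: Dict[str, float] = {"store": 2.0, "park": 1.0}
# A single backoff feature for any other word following "the".
BACKOFF_SCORE = -1.0
BACKOFF_NAME = f"backoff:{CONTEXT} *"


def feature_values(word: str) -> Dict[str, int]:
    """Binary feature values for `word` occurring after CONTEXT."""
    values = {f"ngram:{CONTEXT} {w}": int(w == word) for w in NGRAM_SCORES}
    # The backoff feature is 1 only when no n-gram feature matches.
    values[BACKOFF_NAME] = int(word not in NGRAM_SCORES)
    return values


def probabilities() -> Dict[str, float]:
    """Normalized log-linear distribution over VOCAB given CONTEXT."""
    scores = {f"ngram:{CONTEXT} {w}": s for w, s in NGRAM_SCORES.items()}
    scores[BACKOFF_NAME] = BACKOFF_SCORE
    unnormalized = {}
    for word in VOCAB:
        values = feature_values(word)
        # exp of the dot product between feature scores and feature values.
        unnormalized[word] = math.exp(
            sum(scores[name] * value for name, value in values.items())
        )
    z = sum(unnormalized.values())  # normalization constant
    return {word: u / z for word, u in unnormalized.items()}
```

In this sketch "moon" and "zebra" share the backoff feature, so they receive the same small but non-zero probability, and the four probabilities sum to one.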

Assignees

  • Google Inc

Inventors

Classifications

  • G10L15/197 (Primary)

    Probabilistic grammars, e.g. word n-grams · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9412365B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to enhanced maximum entropy models. In some implementations, data indicating a candidate transcription for an utterance and a particular context for the utterance are received. A maximum entropy language model is obtained. Feature values are determined for n-gram features and backoff feat…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/197. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 9, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).