Multi-domain machine translation model adaptation

Patent metadata
Field	Value
Publication number	US-9235567-B2
Application number	US-201313740508-A
Country	US
Kind code	B2
Filing date	Jan 14, 2013
Priority date	Jan 14, 2013
Publication date	Jan 12, 2016
Grant date	Jan 12, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method adapted to multiple corpora includes training a statistical machine translation model which outputs a score for a candidate translation, in a target language, of a text string in a source language. The training includes learning a weight for each of a set of lexical coverage features that are aggregated in the statistical machine translation model. The lexical coverage features include a lexical coverage feature for each of a plurality of parallel corpora. Each of the lexical coverage features represents a relative number of words of the text string for which the respective parallel corpus contributed a biphrase to the candidate translation. The method may also include learning a weight for each of a plurality of language model features, the language model features comprising one language model feature for each of the domains.
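The abstract describes two pieces: a per-corpus lexical coverage feature and a learned weight for each feature, aggregated into one model score. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the patented implementation: the `Biphrase` representation, the function names, the corpus names, and the weight values are all hypothetical, and coverage is computed as a plain fraction of source words without the membership weighting described in the claims.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: each biphrase used in a candidate translation is
# (number of source words it covers, names of the corpora that contain it).
Biphrase = Tuple[int, Set[str]]


def lexical_coverage(biphrases: List[Biphrase], corpora: List[str]) -> Dict[str, float]:
    """Fraction of source words translated with a biphrase found in each corpus."""
    total_words = sum(length for length, _ in biphrases)
    coverage = {}
    for corpus in corpora:
        covered = sum(length for length, members in biphrases if corpus in members)
        coverage[corpus] = covered / total_words if total_words else 0.0
    return coverage


def model_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (assumed aggregation; weights are learned in training)."""
    return sum(weights[name] * value for name, value in features.items())


# Illustrative data only: a six-word source string covered by three biphrases.
used = [(2, {"news"}), (1, {"news", "tech"}), (3, {"tech"})]
features = lexical_coverage(used, ["news", "tech"])
score = model_score(features, {"news": 0.8, "tech": -0.2})  # made-up weights
```

In a real system these coverage features would sit alongside the usual translation and language model features, and the weights would be tuned on a development corpus rather than set by hand.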

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: training a statistical machine translation model which outputs a score for a candidate translation, in a target language, of a text string in a source language, the training comprising: learning a weight for each of a set of lexical coverage features that are aggregated in the statistical machine translation model, the lexical coverage features comprising a lexical coverage feature for each of a plurality of parallel corpora, each of the lexical coverage features representing a relative number of words contributed by a respective one of the parallel corpora to the translation of the text string, the lexical coverage features being computed based on membership statistics which represent the membership, in each of the plurality of parallel corpora, of each biphrase used in generating the candidate translation, each parallel corpus corresponding to a respective domain from a set of domains and comprising pairs of text strings, each pair comprising a source text string in the source language and a target text string in the target language; and using the trained model in a statistical machine translation system for translation of a new source text string in the source language, wherein the training is performed with a computer processor.

2. The method of claim 1, wherein in the model, the features are aggregated in a log-linear combination.

3. The method of claim 1, wherein the lexical coverage features each represent a count the number of words of the text string that are translated using a biphrase originating from the respective parallel corpus, the count being weighted based on the membership statistics for others of the parallel corpora.

4. The method of claim 1, wherein the learning weights further comprises computing lexical coverage features for candidate translations of each of a collection of source sentences in a development corpus.

5. The method of claim 1, wherein the membership in each parallel corpus is based on a presence or absence in that corpus.

6. The method of claim 5, wherein for at least some of the biphrases, the membership of the biphrase denotes a presence in at least two of the parallel corpora.

7. The method of claim 1, wherein the training comprises computing the lexical coverage features for each of a collection of source strings and corresponding candidate translations generated with biphrases from a collection of biphrases and selecting the weights for the lexical coverage features to optimize a probability that candidate translations that have higher scoring metric scores have higher translation model scores.

8. The method of claim 1, further comprising generating the membership statistics by determining, for each biphrase in a collection of biphrases, whether the biphrase is present in each of the parallel corpora and for each parallel corpus where the biphrase is present, storing a value in a bit vector corresponding to the presence.

9. The method of claim 8, wherein the lexical coverage features represent a contribution of each of the parallel corpora to the candidate translation which is based on contributions of the biphrases used in the generating the candidate translation that are based on the membership statistics for the biphrase.

10. The method of claim 9, wherein the contribution of each of the parallel corpora to each biphrase is based on a length, in words, of at least one of the target phrase and the source phrase in that biphrase.

11. The method of claim 1, wherein each of the lexical coverage features is computed according to the expression: $\log \phi_{LC}^{d}(e, f) = \sum_{\langle \tilde{e}, \tilde{f} \rangle} \log \phi_{LC}^{d}(\langle \tilde{e}, \tilde{f} \rangle)$ where $\log \phi_{LC}^{d}(\langle \tilde{e}, \tilde{f} \rangle) = \tilde{l} \cdot \frac{b_{d}^{\langle \tilde{e}, \tilde{f} \rangle}}{\sum_{j=1}^{D} b_{j}^{\langle \ldots}}$ …
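Claims 8 to 11 describe storing, for each biphrase, a bit vector of corpus membership and deriving each corpus's contribution from the phrase length and those bits. The Python sketch below shows one way this could look. It rests on assumptions that the page does not confirm: the claim 11 expression is cut off in the preview, so the sketch assumes the denominator sums the membership bits over all D corpora; it uses the source-side phrase length (claim 10 allows source or target); and all function names and example phrases are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Biphrase = Tuple[str, str]  # (source phrase, target phrase)


def membership_vector(biphrase: Biphrase, corpora: Sequence[Set[Biphrase]]) -> List[int]:
    """Bit vector with 1 for each corpus in which the biphrase is present."""
    return [1 if biphrase in corpus else 0 for corpus in corpora]


def biphrase_contribution(bits: List[int], length: int) -> List[float]:
    """Assumed form: length * b_d / sum_j b_j, so shared biphrases split their words."""
    total = sum(bits)
    if total == 0:
        return [0.0] * len(bits)
    return [length * b / total for b in bits]


def lexical_coverage_features(used: List[Biphrase],
                              corpora: Sequence[Set[Biphrase]]) -> List[float]:
    """Sum the per-biphrase contributions over all biphrases in a candidate translation."""
    totals = [0.0] * len(corpora)
    for biphrase in used:
        bits = membership_vector(biphrase, corpora)
        length = len(biphrase[0].split())  # source-side length in words (assumption)
        for d, value in enumerate(biphrase_contribution(bits, length)):
            totals[d] += value
    return totals


# Illustrative data only: two tiny "corpora" of extracted biphrases.
news = {("the cat", "le chat"), ("sat", "s'assit")}
tech = {("sat", "s'assit"), ("on the server", "sur le serveur")}
used = [("the cat", "le chat"), ("sat", "s'assit"), ("on the server", "sur le serveur")]
print(lexical_coverage_features(used, [news, tech]))  # [2.5, 3.5]
```

Under this assumed normalization, a biphrase found in both corpora splits its word count evenly between them, so the per-corpus totals add up to the number of source words covered.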

Assignees

Xerox Corp

Inventors

Classifications

G06F40/44 (Primary)
Statistical methods, e.g. probability models · CPC title
G06F40/51 (Primary)
Translation evaluation · CPC title
G06F17/2854 (Primary)
Physics · mapped topic
G06F17/2818
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 51165829

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9235567B2 cover?: A method adapted to multiple corpora includes training a statistical machine translation model which outputs a score for a candidate translation, in a target language, of a text string in a source language. The training includes learning a weight for each of a set of lexical coverage features that are aggregated in the statistical machine translation model. The lexical coverage features include…
Who is the assignee on this patent?: Xerox Corp
What technology area does this patent fall under?: Primary CPC classification G06F40/44. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 12, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).