What technology area does this patent fall under?

Primary CPC classification G06F40/58. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 2, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Encoding and adaptive, scalable accessing of distributed models

US10089304B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10089304-B2
Application number	US-201715480722-A
Country	US
Kind code	B2
Filing date	Apr 6, 2017
Priority date	Feb 17, 2006
Publication date	Oct 2, 2018
Grant date	Oct 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for translating a text, comprising: receiving the text in a source language; partitioning the text into a plurality of segments; obtaining, for each segment, one or more candidate translations in a target language; for each of a plurality of possible n-grams in each candidate translation: identifying a respective partition of a language model containing the n-gram, wherein each partition includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers; sending a lookup request to the server maintaining the respective partition containing the n-gram; obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
2. The method of claim 1, wherein the n-grams in a respective subset each have a plurality of common tokens at predetermined positions.
3. The method of claim 2, wherein n is greater than or equal to 3.
4. The method of claim 2, wherein the plurality of common tokens are the last two tokens in a sequence of tokens representing an n-gram.
5. The method of claim 2, wherein the statistical data for a respective n-gram comprises a relative frequency of occurrence of the respective n-gram in the target language.
6. The method of claim 2, further comprising: in response to obtaining statistical data for a respective n-gram from a server, storing the statistical data in a language model cache.
7. The method of claim 2, wherein obtaining, for each segment, one or more candidate translations in the target language, comprises: evaluating each segment against a translation model.
8. The method of claim 7, wherein the translation model comprises mapping information between the source language and the target language and scoring information associated with each mapping, the mapping information comprising a relation between (i) one or more tokens in the source language and (ii) one or more tokens in the target language.
9. The method of claim 8, wherein the translation model is stored on a plurality of translation model servers, each storing and operable to serve different partitions of the translation model.
10. A system comprising: a plurality of servers, wherein each server is configured to store a partition of a language model of a target language, wherein each respective partition of the language model includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, and wherein n is a positive integer; and one or more processors configured to perform operations comprising: receiving a text in a source language; partitioning the text into a plurality of segments; obtaining, for each segment, one or more candidate translations in the target language; for each of a plurality of possible n-grams in each candidate translation: identifying the respective partition of the language model containing the n-gram; sending a lookup request to the server maintaining the respective partition containing the n-gram; obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
11. The system of claim 10, wherein the n-grams in a respective subset each have a plurality of common tokens at predetermined positions.
12. The system of claim 11, wherein n is greater than or equal to 3.
13. The system of claim 11, wherein the plurality of common tokens are the last two tokens in a sequence of tokens representing an n-gram.
14. The system of claim 11, wherein the statistical data for a respective n-gram comprises a relative frequency of occurrence of the respective n-gram in the target language.
15. The system of claim 11, further comprising: a language model cache configured to store statistical data for a respective n-gram obtained from a server.
16. The system of claim 11, further comprising: a plurality of translation model servers, wherein each translation model server is configured to store a different partition of a translation model, and wherein the translation model servers are configured to perform operations comprising: evaluating each segment of the text against the translation model; and providing candidate translations for each segment.
17. The system of claim 16, wherein the translation model comprises mapping information between the source language and the target language and scoring information associated with each mapping, the mapping information comprising a relation between (i) one or more tokens in the source language and (ii) one or more tokens in the target language.
18. One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving a text in a source language; partitioning the text into a plurality of segments; obtaining, for each segment, one or more candidate translations in a target language; for each of a plurality of possible n-grams in each candidate translation: identifying a respective partition of the language model containing the n-gram, wherein each respective partition of the language model includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers; sending a lookup request to the server maintaining the respective partition containing the n-gram; obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.
19. The computer storage media of claim 18, wherein the n-grams in a subset each have a plurality of common tokens at predetermined positions.
20. The computer storage media of claim 19, wherein n is greater than or equal to 3.
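
Illustrative sketch (not part of the patent text). The claims describe routing each n-gram lookup to the server that holds its partition, optionally grouping n-grams by their last two tokens and caching results locally. The Python below is one minimal way to picture that flow, under stated assumptions: the servers are simulated as in-memory dictionaries, the CRC32 hash, the `floor` value for unseen n-grams, and the summed log-frequency score are all choices made for this example, and every name (`partition_for`, `ShardedLanguageModel`, `best_candidate`) is hypothetical. The claims only require that the best candidate be chosen "based on the obtained statistical data"; they do not prescribe this scoring rule.

```python
import math
import zlib

NUM_SERVERS = 4  # hypothetical partition count for this sketch


def partition_for(ngram, num_servers=NUM_SERVERS):
    """Pick a partition from the last two tokens, so n-grams that share
    those tokens land on the same (simulated) server."""
    key = " ".join(ngram[-2:])
    return zlib.crc32(key.encode("utf-8")) % num_servers


class ShardedLanguageModel:
    """Simulates a language model split across servers, plus a local cache."""

    def __init__(self, ngram_freqs, num_servers=NUM_SERVERS):
        self.num_servers = num_servers
        # One dict per simulated server; a real system would use RPCs.
        self.servers = [dict() for _ in range(num_servers)]
        for ngram, freq in ngram_freqs.items():
            self.servers[partition_for(ngram, num_servers)][ngram] = freq
        self.cache = {}
        self.remote_lookups = 0  # counts simulated server round trips

    def lookup(self, ngram, floor=1e-6):
        """Return the relative frequency for an n-gram, caching the result.
        Unseen n-grams get `floor` (an assumption of this sketch)."""
        if ngram in self.cache:
            return self.cache[ngram]
        self.remote_lookups += 1
        shard = self.servers[partition_for(ngram, self.num_servers)]
        freq = shard.get(ngram, floor)
        self.cache[ngram] = freq
        return freq


def ngrams(tokens, n):
    """All contiguous n-token windows of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def score(candidate, lm, n=3):
    """Sum of log relative frequencies over the candidate's n-grams."""
    return sum(math.log(lm.lookup(g)) for g in ngrams(candidate.split(), n))


def best_candidate(candidates, lm, n=3):
    """Choose the candidate translation with the highest score."""
    return max(candidates, key=lambda c: score(c, lm, n))


# Toy data: two trigrams with made-up relative frequencies.
freqs = {
    ("the", "cat", "sat"): 0.02,
    ("cat", "sat", "down"): 0.01,
}
lm = ShardedLanguageModel(freqs)
best = best_candidate(["the cat sat down", "the cat sit down"], lm)
print(best)
```

Because the score is an unnormalized sum, it favors candidates with fewer n-grams; a fuller sketch would normalize by length or combine the language model score with translation model scores.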

Assignees

Google LLC

Inventors

Classifications

G06F40/58 · Primary
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
G06F40/49
using very large corpora, e.g. the web · CPC title
G06F40/47
Machine-assisted translation, e.g. using translation memory · CPC title
G06F40/44 · Primary
Statistical methods, e.g. probability models · CPC title
G06F40/40 · Primary
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 38437899

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10089304B2 cover?: Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
Who is the assignee on this patent?: Google LLC