Encoding and adaptive, scalable accessing of distributed models

US10089304B2 · US · B2

Patent metadata
Publication number: US-10089304-B2
Application number: US-201715480722-A
Country: US
Kind code: B2
Filing date: Apr 6, 2017
Priority date: Feb 17, 2006
Publication date: Oct 2, 2018
Grant date: Oct 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for translating a text, comprising: receiving the text in a source language; partitioning the text into a plurality of segments; obtaining, for each segment, one or more candidate translations in a target language; for each of a plurality of possible n-grams in each candidate translation: identifying a respective partition of a language model containing the n-gram, wherein each partition includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers; sending a lookup request to the server maintaining the respective partition containing the n-gram; obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.

2. The method of claim 1, wherein the n-grams in a respective subset each have a plurality of common tokens at predetermined positions.

3. The method of claim 2, wherein n is greater than or equal to 3.

4. The method of claim 2, wherein the plurality of common tokens are the last two tokens in a sequence of tokens representing an n-gram.

5. The method of claim 2, wherein the statistical data for a respective n-gram comprises a relative frequency of occurrence of the respective n-gram in the target language.

6. The method of claim 2, further comprising: in response to obtaining statistical data for a respective n-gram from a server, storing the statistical data in a language model cache.

7. The method of claim 2, wherein obtaining, for each segment, one or more candidate translations in the target language, comprises: evaluating each segment against a translation model.

8. The method of claim 7, wherein the translation model comprises mapping information between the source language and the target language and scoring information associated with each mapping, the mapping information comprising a relation between (i) one or more tokens in the source language and (ii) one or more tokens in the target language.

9. The method of claim 8, wherein the translation model is stored on a plurality of translation model servers, each storing and operable to serve different partitions of the translation model.

10. A system comprising: a plurality of servers, wherein each server is configured to store a partition of a language model of a target language, wherein each respective partition of the language model includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, and wherein n is a positive integer; and one or more processors configured to perform operations comprising: receiving a text in a source language; partitioning the text into a plurality of segments; obtaining, for each segment, one or more candidate translations in the target language; for each of a plurality of possible n-grams in each candidate translation: identifying the respective partition of the language model containing the n-gram; sending a lookup request to the server maintaining the respective partition containing the n-gram; obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.

11. The system of claim 10, wherein the n-grams in a respective subset each have a plurality of common tokens at predetermined positions.

12. The system of claim 11, wherein n is greater than or equal to 3.

13. The system of claim 11, wherein the plurality of common tokens are the last two tokens in a sequence of tokens representing an n-gram.

14. The system of claim 11, wherein the statistical data for a respective n-gram comprises a relative frequency of occurrence of the respective n-gram in the target language.

15. The system of claim 11, further comprising: a language model cache configured to store statistical data for a respective n-gram obtained from a server.

16. The system of claim 11, further comprising: a plurality of translation model servers, wherein each translation model server is configured to store a different partition of a translation model, and wherein the translation model servers are configured to perform operations comprising: evaluating each segment of the text against the translation model; and providing candidate translations for each segment.

17. The system of claim 16, wherein the translation model comprises mapping information between the source language and the target language and scoring information associated with each mapping, the mapping information comprising a relation between (i) one or more tokens in the source language and (ii) one or more tokens in the target language.

18. One or more computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations comprising: receiving a text in a source language; partitioning the text into a plurality of segments; obtaining, for each segment, one or more candidate translations in a target language; for each of a plurality of possible n-grams in each candidate translation: identifying a respective partition of the language model containing the n-gram, wherein each respective partition of the language model includes a subset of all n-grams in the target language and statistical data for the same subset of n-grams, each n-gram being a sequence of n tokens in the target language, wherein n is a positive integer, and wherein each partition is maintained by a different server of a plurality of servers; sending a lookup request to the server maintaining the respective partition containing the n-gram; obtaining, from the server maintaining the respective partition containing the n-gram, statistical data for the n-gram; and determining, for each segment of the text, a best candidate translation of the one or more candidate translations based on the obtained statistical data.

19. The computer storage media of claim 18, wherein the n-grams in a subset each have a plurality of common tokens at predetermined positions.

20. The computer storage media of claim 19, wherein n is greater than or equal to 3.
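The lookup flow of claim 1, combined with the last-two-token partitioning of claim 4, the relative-frequency statistics of claim 5 and the cache of claim 6, can be sketched in Python as below. This is a minimal single-process illustration under stated assumptions, not the patented implementation: `LMServer`, `DistributedLM`, the CRC32 routing of partitions to servers, the `FLOOR` value for unseen n-grams, the sum-of-log-frequencies score and the period-based segmentation are all hypothetical choices made for the example. The claims do not specify how partitions are assigned to servers or how the statistics are combined into a score.

```python
import math
import zlib

FLOOR = 1e-6  # assumed stand-in probability for n-grams no partition contains


class LMServer:
    """Hypothetical in-process stand-in for one language model server."""

    def __init__(self):
        self.stats = {}   # n-gram tuple -> relative frequency (claim 5)
        self.lookups = 0  # counts lookup requests actually served

    def lookup(self, ngram):
        self.lookups += 1
        return self.stats.get(ngram)


def partition_key(ngram):
    # Claim 4: n-grams sharing their last two tokens belong to the same partition.
    return ngram[-2:]


class DistributedLM:
    def __init__(self, num_servers):
        self.servers = [LMServer() for _ in range(num_servers)]
        self.cache = {}  # client-side language model cache (claim 6)

    def _server_for(self, ngram):
        # Assumed routing: a stable CRC32 hash of the partition key picks the server.
        key = " ".join(partition_key(ngram)).encode("utf-8")
        return self.servers[zlib.crc32(key) % len(self.servers)]

    def add(self, ngram, freq):
        self._server_for(ngram).stats[ngram] = freq

    def get(self, ngram):
        # Send a lookup request only on a cache miss; cache the answer (even None).
        if ngram not in self.cache:
            self.cache[ngram] = self._server_for(ngram).lookup(ngram)
        return self.cache[ngram]


def trigrams(sentence):
    # n = 3 (claim 3); sentence-boundary padding is an assumption of this sketch.
    tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def score(lm, candidate):
    # Assumed scoring: sum of log relative frequencies, FLOOR for unseen n-grams.
    return sum(math.log(lm.get(g) or FLOOR) for g in trigrams(candidate))


def translate(text, lm, candidates_for):
    # Partition the text into segments (naively, on periods), then keep the
    # best-scoring candidate translation for each segment.
    segments = [s.strip() for s in text.split(".") if s.strip()]
    return [max(candidates_for(seg), key=lambda c: score(lm, c)) for seg in segments]


# Usage with a toy model; tm plays the role of the translation model (claim 7).
lm = DistributedLM(num_servers=4)
for g in trigrams("the house is small"):
    lm.add(g, 0.01)
tm = {"das haus ist klein": ["the house is small", "small is the house"]}
print(translate("das haus ist klein.", lm, lambda seg: tm[seg]))
# prints ['the house is small']
```

Because routing depends only on the last two tokens, every trigram ending in the same word pair is answered by the same server, and the cache ensures that repeated scoring of the same n-gram costs at most one lookup request.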

Assignees

Google LLC

Inventors

Classifications

  • G06F40/58 (primary)

    Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title

  • using very large corpora, e.g. the web · CPC title

  • Machine-assisted translation, e.g. using translation memory · CPC title

  • G06F40/44 (primary)

    Statistical methods, e.g. probability models · CPC title

  • G06F40/40 (primary)

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10089304B2 cover?
Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/58. Mapped technology areas include Physics.
When was this patent published?
Published Oct 2, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).