Automated efficient translation context delivery

US2016350108A1 · US · A1

Patent metadata
Publication number: US-2016350108-A1
Application number: US-201514725867-A
Country: US
Kind code: A1
Filing date: May 29, 2015
Priority date: May 29, 2015
Publication date: Dec 1, 2016
Grant date: none listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection: read this to see what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to automatically providing textual context for source strings in a source language that are to be translated by a human translator to target strings in a target language. The source strings are compared against a dictionary of reference strings in the source language. For each source string, one or more of the reference strings that are most relevant, similar, etc., are selected. When a human translator is to translate the source strings, the selected reference strings are presented; each source string has one or more similar/related strings displayable in association therewith. For a given source string, the human translator can use the associated reference strings as a form of context to help estimate the intended meaning of the given source string when translating the given source string to a target string in the target language.
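The abstract does not commit to a particular similarity measure (claim 7 mentions textual, semantic, or topical similarity). The Python below is a minimal sketch of the compare-and-select step, assuming plain bag-of-words cosine similarity as a stand-in measure. The names `tokenize`, `cosine`, `select_context`, and the sample strings are hypothetical and do not come from the patent.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts: a deliberately simple stand-in for the patent's text analysis."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors; 0.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_context(source_strings, reference_strings, k=2):
    """Map each source string to its k most similar reference strings (zero-score matches dropped)."""
    ref_vectors = [(ref, tokenize(ref)) for ref in reference_strings]
    result = {}
    for src in source_strings:
        vec = tokenize(src)
        scored = [(cosine(vec, ref_vec), ref) for ref, ref_vec in ref_vectors]
        # Stable sort: equally scored references keep their dictionary order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[src] = [ref for score, ref in scored[:k] if score > 0]
    return result


# Hypothetical strings, for illustration only.
sources = ["Open file", "Save changes?"]
references = ["Open the selected file", "Save all changes before closing", "Print document"]
context = select_context(sources, references, k=1)
```

With `k=1`, "Open file" is paired with "Open the selected file" and "Save changes?" with "Save all changes before closing". A semantic or topic-model scorer (claims 9 and 10) could replace `cosine` without changing the surrounding selection loop.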

Claims

Full claim text (claims 1 to 20).

1. A method comprising: accessing resource strings each comprising respective text in a source human language, the resource strings having been obtained from a resource file of a source code project, the source code project comprising references to the resource strings that are configured to be used to compile the source code project; accessing a corpus of strings each comprising respective text in the source human language; performing text analysis on the resource strings and the corpus of strings to select and associate a string for each of the respective resource text strings; and storing the selected strings in association with the respective resource strings.

2. A method according to claim 1, further comprising: providing a translation file to an editor, the translation file storing the selected strings in association with the resource strings; and receiving target resource strings entered by a human translator using the editor, the target resource strings comprising respective text in a target human language that is not the source human language.

3. A method according to claim 2, wherein the editor displays a selected string in association with a resource string.

4. A method according to claim 1, wherein the strings comprise contextually similar strings selected based on determinations of context similarity to the target resource strings.

5. A method according to claim 1, wherein the text strings in the corpus comprise strings previously used on other source code projects.

6. A method according to claim 1, wherein the text analysis is performed by an algorithm that scores each of the resource text strings with respect to a respective set of the strings, wherein the strings are selected based on the scores.

7. A method according to claim 6, wherein the scores correspond to determinations, with respect to the resource text strings, of textual, semantic, or topical similarity as determined by the algorithm.

8. A method comprising: displaying a graphical user interface on a display of a computing device, the graphical user interface displaying a resource text string, a historical resource text string, and an input element, wherein the historical resource text string was automatically selected from a corpus of historical resource text strings obtained from software projects that included the historical resource text strings as resources thereof, and wherein the historical resource text strings and the resource text string are comprised of one or more words in a first human language; receiving, via the input element, a target resource text string inputted by a user of the graphical user interface, wherein the target resource text string comprises one or more words in a second human language; and including the target resource text string in a resource file of a program project, the target resource text string referenced in a source code file of the program project.

9. A method according to claim 8, further comprising: training a machine learning model with the corpus of historical resource text strings, and applying the resource text string to the machine learning model to obtain the historical resource text string.

10. A method according to claim 9, wherein the machine learning model comprises a topic model that maps strings to vectors of abstract topics.

11. A method according to claim 8, the graphical user interface further displaying, in association with the historical resource text string, a second historical resource text string in the second human language.

12. A method according to claim 8, further comprising providing a resource file and historical text strings to a context identification module, the context identification module computing scores for the resource strings in the resource file and selecting historical resource text strings, including the historical resource text string.

13. A method according to claim 8, further comprising using the resource file to compile the program project.

14. One or more computing devices comprising: processing hardware; storage hardware storing instructions that when executed by the processing hardware cause the processing hardware to perform a process comprising: accessing resource text strings obtained from a resource file of a program project, wherein the resource text strings are referenced in a source code file of the program project, wherein the program project is configured to be compiled to produce a corresponding program, and wherein the resource text strings are in a first human language; accessing a dictionary of historical resource text strings, wherein the historical resource text strings have been obtained from resource files of respective other program projects, and wherein the historical resource text strings are in the first human language; for each resource text string, selecting one or more of the historical resource text strings based on the resource text strings; storing, in a file, the resource text strings in association with respective selected historical resource text strings; providing the file to a computing device operated by a human translator; and receiving translation strings, in a second human language, inputted by the human translator based on the file.

15. One or more computing devices according to claim 14, the process further comprising adding a new resource file to the program project, the new resource file comprised of the translation strings, the translation strings and the resource text resource text strings associated with and referenced by same respective identifiers in the source code file.

16. One or more computing devices according to claim 14, wherein the selecting comprises, for a given resource text string, computing distance scores of the given resource text string to respective historical resource text strings in a set of the historical resource text strings.

17. One or more computing devices according to claim 16, wherein the selecting further comprises: determining that the given resource test string is related to the set of the historical resource text strings; and selecting one or more of the historical resource text strings in the set based on the scores.

18. One or more computing devices according to claim 17, the process further comprising applying a topic model to generate sets of topically related sets of the historical resource strings, the sets including the set.

19. One or more computing devices according to claim 14, wherein the selecting is performed with a text similarity algorithm.

20. One or more computing devices according to claim 14, wherein the selecting is performed with a machine learning algorithm trained according to the historical resource text strings.
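Claims 14 to 16 describe a round trip: resource strings go out to the translator in a file alongside selected historical strings, and the translations come back as a new resource file keyed by the same identifiers. The Python below is a minimal sketch of that flow under stated assumptions: a JSON translation file, hypothetical `IDS_*` identifiers and sample strings, and a distance score of 1 minus `difflib.SequenceMatcher(...).ratio()`. The claims specify none of these, and the topic-model grouping of claims 17 and 18 is not shown.

```python
import difflib
import json


def distance(a: str, b: str) -> float:
    """Distance score in [0, 1]: 1 minus difflib's case-insensitive similarity ratio."""
    return 1.0 - difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_translation_file(resources: dict, historical: list, k: int = 1) -> str:
    """Pair each resource string with its k nearest historical strings (claims 14 and 16)."""
    entries = []
    for res_id, text in resources.items():
        # sorted() is stable, so equally distant strings keep their dictionary order.
        nearest = sorted(historical, key=lambda h: distance(text, h))[:k]
        entries.append({"id": res_id, "source": text, "context": nearest})
    return json.dumps(entries, ensure_ascii=False, indent=2)


def build_target_resources(translation_file: str, translations: dict) -> dict:
    """New resource table keyed by the same identifiers as the source resources (claim 15)."""
    entries = json.loads(translation_file)
    return {entry["id"]: translations[entry["id"]] for entry in entries}


# Hypothetical resource identifiers and strings, for illustration only.
resources = {"IDS_OPEN": "Open file", "IDS_SAVE": "Save changes?"}
historical = ["Open the selected file", "Save all changes before closing", "Print document"]

translation_file = build_translation_file(resources, historical)
# The translator works from translation_file and returns one string per identifier.
target = build_target_resources(
    translation_file, {"IDS_OPEN": "Datei öffnen", "IDS_SAVE": "Änderungen speichern?"}
)
```

Here `IDS_OPEN` receives "Open the selected file" as context and `IDS_SAVE` receives "Save all changes before closing". Both translations land under their original identifiers, which is what lets the source code file keep referencing them unchanged (claim 15).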

Assignees

Microsoft Technology Licensing, LLC

Inventors

Not listed on this page.

Classifications

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

  • Machine-assisted translation, e.g. using translation memory · CPC title

  • Multi-language systems; Localisation; Internationalisation · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016350108A1 cover?
It covers automatically supplying context to human translators: each source string to be translated is compared against a dictionary of reference strings in the same source language, the most relevant or similar reference strings are selected, and those strings are displayed alongside the source string while it is translated into the target language.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F8/71. Mapped technology areas include Physics.
When was this patent published?
Published December 1, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).