Who is the assignee on this patent?

Aue Anthony, Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06F40/58. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 26, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Language segmentation of multilingual texts

US9400787B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9400787-B2
Application number	US-201314073036-A
Country	US
Kind code	B2
Filing date	Nov 6, 2013
Priority date	Feb 8, 2011
Publication date	Jul 26, 2016
Grant date	Jul 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
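The decoding step described in the abstract can be illustrated with a small Viterbi sketch. This is not the patent's implementation: the function name `viterbi`, the dictionary layout, and the example probabilities are all hypothetical, and using the per-sentence detector distribution directly as the per-sentence score is a simplifying assumption.

```python
import math


def viterbi(sentence_dists, transitions, languages):
    """Return the highest-probability language label for each sentence.

    sentence_dists: one dict per sentence, language -> probability
                    (assumed to come from an initial language detector).
    transitions:    dict mapping (previous_language, next_language) -> probability.
    languages:      list of candidate language codes.
    """
    if not sentence_dists:
        return []

    def log(p):
        # Work in log space so long documents do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # best[lang] = log-score of the best label sequence ending in lang.
    best = {lang: log(sentence_dists[0][lang]) for lang in languages}
    backpointers = []

    for dist in sentence_dists[1:]:
        scores, pointers = {}, {}
        for cur in languages:
            prev = max(languages, key=lambda p: best[p] + log(transitions[(p, cur)]))
            scores[cur] = best[prev] + log(transitions[(prev, cur)]) + log(dist[cur])
            pointers[cur] = prev
        best = scores
        backpointers.append(pointers)

    # Trace the best path back from the final sentence.
    path = [max(languages, key=lambda lang: best[lang])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical example: the middle sentence is ambiguous on its own.
languages = ["en", "fr"]
sticky = {("en", "en"): 0.9, ("en", "fr"): 0.1,
          ("fr", "en"): 0.1, ("fr", "fr"): 0.9}
dists = [{"en": 0.8, "fr": 0.2},
         {"en": 0.45, "fr": 0.55},
         {"en": 0.9, "fr": 0.1}]
labels = viterbi(dists, sticky, languages)
```

With these made-up numbers, the middle sentence would be labeled "fr" if taken in isolation, but the transition probabilities favor staying in one language, so the decoded sequence labels all three sentences "en".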

First claim

Opening claim text (preview).

What is claimed is:
1. A method of segmenting a multi-language text, comprising: determining, using a processing unit, an initial probability distribution for sentences in a web document in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages; learning, using the processing unit, a probability of language transitions across sentences based on the initial probability distribution; determining, using the processing unit, a highest probability language sequence of sentences in the multi-language text based on a combination of the probability of language transitions and a prior probability distribution provided by an initial model; and annotating web documents at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
2. The method recited in claim 1, comprising using an automatic language detector to determine the sentences in the multi-language text.
3. The method recited in claim 1, wherein learning the probability of language transitions comprises using a hidden Markov model.
4. The method recited in claim 1, wherein learning the probability of language transitions comprises using a forward backward algorithm.
5. The method recited in claim 1, wherein determining a highest probability language sequence comprises using a Viterbi Algorithm.
6. The method recited in claim 1, comprising segmenting, using the processing unit, the multi-language text into a plurality of monolingual texts based on the highest probability language sequence.
7. The method recited in claim 1, wherein learning the probability of language transitions comprises using a second order Markov model.
8. A system for segmenting a multi-language text, the system comprising: a processing unit; and a system memory, wherein the system memory comprises code configured to direct the processing unit to: determine an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages; learn a probability of language transitions across sentences based on the initial probability distribution; determine, using the processing unit, a highest probability language sequence of sentences in the multi-language text based on a combination of the probability of language transitions and a prior probability distribution provided by an initial model; and annotate web documents at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
9. The system recited in claim 8, comprising using an automatic language detector to determine the sentences in the multi-language text.
10. The system recited in claim 8, wherein learning the probability of language transitions comprises using a hidden Markov model.
11. The system recited in claim 8, wherein learning the probability of language transitions comprises using a forward backward algorithm.
12. The system recited in claim 8, wherein determining a highest probability language sequence comprises using a Viterbi Algorithm.
13. The system recited in claim 8, comprising segmenting the multi-language text into a plurality of monolingual texts based on the highest probability language sequence.
14. The system recited in claim 8, wherein learning the probability of language transitions comprises using a second order Markov model.
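Claims 6 and 13 describe splitting the text into monolingual texts once each sentence has a language label. One possible reading, sketched below, groups consecutive sentences that share a label into runs. The function name `split_monolingual` and the sample sentences are hypothetical, and the claims do not specify whether segments are contiguous runs or per-language collections.

```python
from itertools import groupby


def split_monolingual(sentences, labels):
    """Group consecutive sentences with the same language label.

    Returns a list of (language, [sentences]) pairs in document order.
    """
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    pairs = zip(labels, sentences)
    # groupby only merges adjacent items, which preserves document order.
    return [(lang, [sentence for _, sentence in run])
            for lang, run in groupby(pairs, key=lambda pair: pair[0])]


# Hypothetical labeled document.
segments = split_monolingual(
    ["Hello.", "Bonjour.", "Salut.", "Goodbye."],
    ["en", "fr", "fr", "en"],
)
```

Here `segments` holds three runs: one English sentence, two French sentences, then one English sentence.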

Assignees

Inventors

Aue Anthony

Classifications

G06F40/58 · Primary
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
G06F40/263 · Primary
Language identification · CPC title
G06F17/289 · Primary
Physics · mapped topic
G06F17/275
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 46601269
