Speech translation processing apparatus
US-2024370669-A1 · Nov 7, 2024 · US
US9400787B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9400787-B2 |
| Application number | US-201314073036-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 6, 2013 |
| Priority date | Feb 8, 2011 |
| Publication date | Jul 26, 2016 |
| Grant date | Jul 26, 2016 |
Abstract
The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
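The decoding step described in the abstract combines the transition probabilities with the initial model's per-sentence distribution to find the highest-probability language sequence; claim 5 names the Viterbi algorithm for this step. The following is a minimal sketch under stated assumptions, not the patentee's implementation. The function name `viterbi`, the uniform start distribution, the hand-set transition matrix, and the detector scores are all hypothetical. The per-sentence distribution is also used directly as the emission score, which assumes a uniform prior over languages.

```python
import numpy as np


def viterbi(emission: np.ndarray, transition: np.ndarray, start: np.ndarray) -> list[int]:
    """Return the most probable language index for each sentence (log-space Viterbi).

    emission[t, k]   : initial-model probability that sentence t is in language k
                       (assumption: used directly as the emission score)
    transition[j, k] : probability that a language-j sentence is followed by language k
    start[k]         : probability that the first sentence is in language k
    """
    eps = 1e-12  # guard against log(0)
    log_e = np.log(emission + eps)
    log_a = np.log(transition + eps)
    n = len(emission)
    score = np.log(start + eps) + log_e[0]
    backptr = np.zeros(emission.shape, dtype=int)
    for t in range(1, n):
        # cand[j, k]: best path ending in language j at t-1, then moving to k.
        cand = score[:, None] + log_a
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_e[t]
    # Trace the best path backwards from the best final language.
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


langs = ["en", "fr"]
# Hypothetical detector output for six sentences; the third one leans slightly to "fr".
emission = np.array([[0.9, 0.1], [0.8, 0.2], [0.45, 0.55],
                     [0.85, 0.15], [0.05, 0.95], [0.15, 0.85]])
transition = np.array([[0.9, 0.1], [0.1, 0.9]])  # assumed "sticky" language switches
start = np.array([0.5, 0.5])
labels = [langs[i] for i in viterbi(emission, transition, start)]
print(labels)  # ['en', 'en', 'en', 'en', 'fr', 'fr']
```

Labelling each sentence by its own argmax would tag the third sentence as `fr`. The transition term outweighs that weak evidence, so the decoded sequence keeps it in `en`. This is the sentence-level annotation the abstract describes.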
Claims
What is claimed is:
1. A method of segmenting a multi-language text, comprising: determining, using a processing unit, an initial probability distribution for sentences in a web document in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages; learning, using the processing unit, a probability of language transitions across sentences based on the initial probability distribution; determining, using the processing unit, a highest probability language sequence of sentences in the multi-language text based on a combination of the probability of language transitions and a prior probability distribution provided by an initial model; and annotating web documents at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
2. The method recited in claim 1, comprising using an automatic language detector to determine the sentences in the multi-language text.
3. The method recited in claim 1, wherein learning the probability of language transitions comprises using a hidden Markov model.
4. The method recited in claim 1, wherein learning the probability of language transitions comprises using a forward backward algorithm.
5. The method recited in claim 1, wherein determining a highest probability language sequence comprises using a Viterbi Algorithm.
6. The method recited in claim 1, comprising segmenting, using the processing unit, the multi-language text into a plurality of monolingual texts based on the highest probability language sequence.
7. The method recited in claim 1, wherein learning the probability of language transitions comprises using a second order Markov model.
8. A system for segmenting a multi-language text, the system comprising: a processing unit; and a system memory, wherein the system memory comprises code configured to direct the processing unit to: determine an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages; learn a probability of language transitions across sentences based on the initial probability distribution; determine, using the processing unit, a highest probability language sequence of sentences in the multi-language text based on a combination of the probability of language transitions and a prior probability distribution provided by an initial model; and annotate web documents at a sentence by sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined.
9. The system recited in claim 8, comprising using an automatic language detector to determine the sentences in the multi-language text.
10. The system recited in claim 8, wherein learning the probability of language transitions comprises using a hidden Markov model.
11. The system recited in claim 8, wherein learning the probability of language transitions comprises using a forward backward algorithm.
12. The system recited in claim 8, wherein determining a highest probability language sequence comprises using a Viterbi Algorithm.
13. The system recited in claim 8, comprising segmenting the multi-language text into a plurality of monolingual texts based on the highest probability language sequence.
14. The system recited in claim 8, wherein learning the probability of language transitions comprises using a second order Markov model.
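Two further steps from the claims can be sketched briefly. The first is the transition learning of claims 3, 4, 10 and 11: a hidden Markov model trained with the forward-backward algorithm. The second is the segmentation into monolingual texts of claims 6 and 13. The sketch below rests on several assumptions. The initial model's per-sentence distribution is held fixed as the emission score. Only the transition matrix and start distribution are re-estimated, using Baum-Welch restricted to transitions. The names `learn_transitions` and `segment` are illustrative and do not come from the patent.

```python
from itertools import groupby

import numpy as np


def learn_transitions(emission: np.ndarray, n_iter: int = 20) -> np.ndarray:
    """Re-estimate a language-transition matrix with scaled forward-backward (EM).

    emission[t, k] is the initial model's probability that sentence t is in
    language k; assumption: it is held fixed and only transitions are learned.
    """
    n, k = emission.shape
    trans = np.full((k, k), 1.0 / k)  # uninformative starting point
    start = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # Scaled forward pass: each alpha[t] sums to 1, scale[t] keeps the normaliser.
        alpha = np.zeros((n, k))
        scale = np.zeros(n)
        alpha[0] = start * emission[0]
        scale[0] = alpha[0].sum()
        alpha[0] /= scale[0]
        for t in range(1, n):
            alpha[t] = (alpha[t - 1] @ trans) * emission[t]
            scale[t] = alpha[t].sum()
            alpha[t] /= scale[t]
        # Scaled backward pass, reusing the forward normalisers.
        beta = np.ones((n, k))
        for t in range(n - 2, -1, -1):
            beta[t] = trans @ (emission[t + 1] * beta[t + 1]) / scale[t + 1]
        # Expected transition counts: sum over t of P(lang_t = i, lang_t+1 = j | text).
        counts = np.zeros((k, k))
        for t in range(n - 1):
            counts += (alpha[t][:, None] * trans
                       * (emission[t + 1] * beta[t + 1])[None, :] / scale[t + 1])
        gamma0 = alpha[0] * beta[0]  # posterior over the first sentence's language
        start = gamma0 / gamma0.sum()
        counts += 1e-6  # tiny pseudo-count so no row is all zeros
        trans = counts / counts.sum(axis=1, keepdims=True)
    return trans


def segment(sentences, labels):
    """Group consecutive sentences that share a decoded language label."""
    return [(lang, [s for s, _ in run])
            for lang, run in groupby(zip(sentences, labels), key=lambda p: p[1])]


# Hypothetical detector scores: five English-looking sentences, then five French-looking ones.
emission = np.array([[0.9, 0.1]] * 5 + [[0.1, 0.9]] * 5)
print(np.round(learn_transitions(emission), 2))
print(segment(["s1", "s2", "s3"], ["en", "en", "fr"]))  # [('en', ['s1', 's2']), ('fr', ['s3'])]
```

The learned matrix would then feed a Viterbi decoder, and `segment` turns the decoded labels into monolingual runs. Claims 7 and 14 mention a second-order Markov model. One common way to obtain one is to treat each pair of consecutive languages as a single hidden state (k² states) and reuse the same recursions. This sketch stays first-order.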
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Language identification · CPC title
Physics · mapped topic