What technology area does this patent fall under?

Primary CPC classification G06F40/263. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for language detection

US9535896B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9535896-B2
Application number	US-201615161913-A
Country	US
Kind code	B2
Filing date	May 23, 2016
Priority date	Oct 17, 2014
Publication date	Jan 3, 2017
Grant date	Jan 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations of the present disclosure are directed to a method, a system, and a computer program storage device for detecting a language in a text message. A plurality of different language detection tests are performed on a message associated with a user. Each language detection test determines a set of scores representing a likelihood that the message is in one of a plurality of different languages. One or more combinations of the score sets are provided as input to one or more distinct classifiers. Output from each of the classifiers includes a respective indication that the message is in one of the different languages. The language in the message may be identified as being the indicated language from one of the classifiers, based on a confidence score and/or an identified linguistic domain.
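The flow the abstract describes (several detection tests, score sets combined and fed to distinct classifiers, final pick by confidence) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the two toy tests, the tiny word lists, the fixed interpolation weights, and all function names are assumptions made up for the example. The patent describes classifiers that were trained, whereas the weights here are simply hard-coded.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]  # language code -> likelihood-style score


def dictionary_test(text: str) -> Scores:
    """Toy dictionary-based test: fraction of words found in each word list."""
    words = text.lower().split()
    # Hypothetical miniature lexicons, for illustration only.
    lexicons = {"en": {"the", "is", "hello"}, "es": {"el", "es", "hola"}}
    return {lang: sum(w in lex for w in words) / max(len(words), 1)
            for lang, lex in lexicons.items()}


def alphabet_test(text: str) -> Scores:
    """Toy alphabet-based test: Spanish-specific characters raise the 'es' score."""
    hits = sum(ch in "ñáéíóú¿¡" for ch in text.lower())
    es = min(1.0, hits / max(len(text), 1) * 10)
    return {"en": 1.0 - es, "es": es}


def interpolate(score_sets: List[Scores], weights: List[float]) -> Tuple[str, float]:
    """Weighted-average stand-in for a classifier; returns (language, confidence)."""
    langs = score_sets[0].keys()
    combined = {lang: sum(w * s[lang] for w, s in zip(weights, score_sets))
                for lang in langs}
    total = sum(combined.values()) or 1.0  # avoid division by zero
    best = max(combined, key=combined.get)
    return best, combined[best] / total


def identify_language(text: str) -> Tuple[str, float]:
    d, a = dictionary_test(text), alphabet_test(text)
    outputs = [
        interpolate([d], [1.0]),          # classifier 1: dictionary scores only
        interpolate([d, a], [0.5, 0.5]),  # classifier 2: dictionary + alphabet
    ]
    # Keep the indication with the highest confidence score.
    return max(outputs, key=lambda out: out[1])


print(identify_language("hola el niño"))
```

Each classifier here sees a different combination of score sets, which mirrors the claim language about a first and a second classifier built on different combinations of tests. Selecting by linguistic domain, which the abstract also mentions, is not shown.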

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer-implemented method of identifying a language of a message, the method comprising: performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages; providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests; obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and identifying the language of the message based on one of the confidence scores.
2. The method of claim 1 wherein a particular output is respective scores each representing a likelihood that a respective message is in one of a plurality of different languages.
3. The method of claim 1, wherein a particular classifier is a supervised learning model, a partially supervised learning model, an unsupervised learning model, or an interpolation.
4. The method of claim 1, wherein identifying the language of the message comprises selecting the confidence score based on an expected language detection accuracy.
5. The method of claim 1, wherein identifying the language of the message comprises selecting the confidence score based on the linguistic domain of the message.
6. The method of claim 1, wherein the message comprises two or more of the following: a letter, a number, a symbol, and an emoticon.
7. The method of claim 1, wherein a particular language detection test is a byte n-gram method, a dictionary-based method, an alphabet-based method, or a script-based method.
8. The method of claim 1, wherein the one or more combinations comprise score sets from a byte n-gram method and a dictionary-based method.
9. The method of claim 1, wherein the one or more combinations further comprise score sets from at least one of a script-based method and an alphabet-based method.
10. A system comprising: one or more computers programmed to perform operations comprising: performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages; providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests; obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and identifying the language of the message based on one of the confidence scores.
11. The system of claim 10 wherein a particular output is respective scores each representing a likelihood that a respective message is in one of a plurality of different languages.
12. The system of claim 10, wherein a particular classifier is a supervised learning model, a partially supervised learning model, an unsupervised learning model, or an interpolation.
13. The system of claim 10, wherein identifying the language of the message comprises selecting the confidence score based on an expected language detection accuracy.
14. The system of claim 10, wherein identifying the language of the message comprises selecting the confidence score based on the linguistic domain of the message.
15. The system of claim 10, wherein the message comprises two or more of the following: a letter, a number, a symbol, and an emoticon.
16. The system of claim 10, wherein a particular language detection test is a byte n-gram system, a dictionary-based system, an alphabet-based system, or a script-based system.
17. The system of claim 10, wherein the one or more combinations comprise score sets from a byte n-gram system and a dictionary-based system.
18. The system of claim 10, wherein the one or more combinations further comprise score sets from at least one of a script-based system and an alphabet-based system.
19. An article comprising a non-transitory computer-readable medium having instructions stored thereon that, when executed by a computer, perform operations comprising: performing a plurality of language detection tests on text, each language detection test determining a respective set of scores, each score in the set of scores representing a likelihood that the message is in a respective language of a plurality of different languages; providing one or more combinations of the score sets as input to one or more distinct classifiers including a first classifier and a second classifier, wherein the first classifier was trained using outputs from a first combination of the language detection tests and the second classifier was trained using outputs from a different second combination of the language detection tests; obtaining as output from each of the one or more classifiers a respective indication that the message is in one of the plurality of different languages, the indication comprising a confidence score; and identifying the language of the message based on one of the confidence scores.
20. The article of claim 19 wherein a particular output is respective scores each representing a likelihood that a respective message is in one of a plurality of different languages.
21. The article of claim 19, wherein a particular classifier is a supervised learning model, a partially supervised learning model, an unsupervised learning model, or an interpolation.
22. The article of claim 19, wherein identifying the language of the message comprises selecting the confidence score based on an expected language detection accuracy.
23. The article of claim 19, wherein identifying the language of the message comprises selecting the confidence score based on the linguistic domain of the message.
24. The article of claim 19, wherein the message comprises two or more of the following: a letter, a number, a symbol, and an emoticon.
25. The article of claim 19, wherein a particular language detection test is a byte n-gram system, a dictionary-based system, an alphabet-based system, or a script-based system.
26. The article of claim 19, wherein the one or more combinations comprise score sets from a byte n-gram system and a dictionary-based system.
27. The article of claim 19, wherein the one or more combinations further comprise score sets from at least one of a script-based system and an alphabet-based system.
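Claims 7, 16, and 25 name a byte n-gram method as one possible language detection test. A rough sketch of what such a test could look like is given below, assuming byte trigrams and two very short reference sentences as stand-in training data. The sample texts, the overlap-based scoring, and the function names are illustrative assumptions and are not taken from the patent.

```python
from collections import Counter
from typing import Dict


def byte_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping n-grams over the UTF-8 bytes of the text."""
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Hypothetical reference samples; a real profile would use a large corpus.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
}
PROFILES = {lang: byte_ngrams(sample) for lang, sample in SAMPLES.items()}


def byte_ngram_test(message: str) -> Dict[str, float]:
    """Return one score per language, normalised so the scores sum to 1."""
    grams = byte_ngrams(message)
    total = sum(grams.values())
    raw = {}
    for lang, profile in PROFILES.items():
        # Share of the message's n-grams that also occur in the profile.
        matched = sum(count for gram, count in grams.items() if gram in profile)
        raw[lang] = matched / total if total else 0.0
    norm = sum(raw.values())
    if norm == 0:
        # No evidence either way: fall back to a uniform score set.
        return {lang: 1.0 / len(raw) for lang in raw}
    return {lang: value / norm for lang, value in raw.items()}


print(byte_ngram_test("the dog"))
```

The returned dictionary is one "set of scores" in the sense of claim 1, and could be combined with score sets from other tests before being passed to a classifier.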

Assignees

Machine Zone Inc

Inventors

Classifications

G06F40/263 · Primary
Language identification · CPC title
G06F17/275 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 55749215

External sources


Related patents

Rules-based language detection

Language Detection Based Upon a Social Graph

Cluster-Based Language Detection

Runtime data language selection in object instance

Techniques for providing a user interface having bi-directional writing tools

Automatically Creating Training Data For Language Identifiers
