Who is the assignee on this patent?

Gao Jianfeng, Hwang Mei Yuh, Huang Xuedong D, and 3 more

What technology area does this patent fall under?

Primary CPC classification G06F40/44. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Jan 08 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Statistical machine translation based search query spelling correction

Patent metadata
Field	Value
Publication number	US-10176168-B2
Application number	US-201113296640-A
Country	US
Kind code	B2
Filing date	Nov 15, 2011
Priority date	Nov 15, 2011
Publication date	Jan 8, 2019
Grant date	Jan 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Statistical Machine Translation (SMT) based search query spelling correction techniques are described herein. In one or more implementations, search data regarding searches performed by clients may be logged. The logged data includes query correction pairs that may be used to ascertain error patterns indicating how misspelled substrings may be translated to corrected substrings. The error patterns may be used to determine suggestions for an input query and to develop query correction models used to translate the input query to a corrected query. In one or more implementations, probabilistic features from multiple query correction models are combined to score different correction candidates. One or more top scoring correction candidates may then be exposed as suggestions for selection by a user and/or provided to a search engine to conduct a corresponding search using the corrected query version(s).

First claim

Opening claim text (preview).

What is claimed is: 1. A method implemented by one or more computing devices, the method comprising: receiving, at the one or more computing devices via a communications network, a plurality of search queries performed by a plurality of client devices; logging search data regarding the plurality of search queries performed by the plurality of client devices by storing the search data in a data storage device accessible to the one or more computing devices; ascertaining, at the one or more computing devices, error patterns based on query correction pairs described by the logged search data for both words and substrings that are contained by a word; developing, at the one or more computing devices based at least in part upon the ascertained error patterns, multiple query correction models reflecting how frequently the ascertained error patterns occur in the logged search data, including: at least a first query correction model that encodes probabilities for translations of substrings that are contained by a word based on character alignment, each probability for translation determined based on a frequency of a particular input substring being changed to a particular correct substring, and at least a second query correction model that encodes probabilities for n-gram models for both words and characters based on preceding words and characters; receiving, at the one or more computing devices via the communications network, an input search query performed by a subject client device; translating, at the one or more computing devices, the input search query performed by the subject client device to a corrected search query by selecting a top ranking candidate from a group of candidates, each candidate ranked using a combination of probabilities from each of the multiple query correction models; providing the corrected search query to a search engine to generate a set of search results based, at least in part, on the corrected search query; and providing the set of search results to the subject client device via the communications network. 2. A method as described in claim 1 , wherein the combination of probabilities includes multiple probabilities from one or more of the multiple query correction models. 3. A method as described in claim 1 , wherein the combination of probabilities further includes a feature descriptive of translations used for a given candidate. 4. A method as described in claim 1 , wherein the at least a second query correction model comprises a trigram word-based language model configured to encode probabilities for a next word in a sequence given two preceding words in the sequence. 5. A method as described in claim 1 , wherein the at least a second query correction model includes a nine-gram character-based language model configured to encode probabilities for a next character in a sequence given eight preceding characters in the sequence. 6. A method as described in claim 1 , wherein each candidate is ranked according to a weighted log-linear model that combines the probabilities from each of the multiple query correction models. 7. A method as described in claim 6 , wherein the log-linear model has the form of: P ⁡ ( C | Q ) = 1 Z ⁡ ( Q , C ) ⁢ exp ⁢ ∑ i ⁢ λ i ⁢ h i ⁡ ( Q , C ) ⁢ ⁢ where ⁢ ⁢ 1 Z ⁡ ( Q , C ) is a normalization factor, h i (Q,C) represents the probabilities that are combined and λ i represents weight factors applied to the probabilities. 8. A method as described in claim 6 , wherein the top ranking candidate is the candidate that maximizes the weighted log-linear model. 9. One or more computer-readable storage media storing instructions that, when executed via the one or more processors of one or more computing devices, implement a correction module to perform operations comprising: receiving, at the one or more computing devices via a communications network, a plurality of search queries performed by a plurality of client devices; logging search data regarding the plurality of search queries performed by the plurality of client devices by storing the search data in a data storage device accessible to the one or more computing devices; ascertaining, at the one or more computing devices, error patterns based on query correction pairs described by the logged search data for both words and substrings that are contained by a word; developing, at the one or more computing devices based at least in part upon the ascertained error patterns, multiple query correction models reflecting how frequently the ascertained error patterns occur in the logged search data, including: at least a first query correction model that encodes probabilities for translations of substrings that are contained by a word based on character alignment, each probability for translation determined based on a frequency of a particular input substring being changed to a particular correct substring, and at least a second query correction model that encodes probabilities for n-gram models for both words and characters based on preceding words and characters; receiving, at the one or more computing devices via the communications network, an input search query performed by a subject client device; parsing the input search query to obtain individual search terms and substrings contained in the individual search terms; generating a group of candidates corresponding to the input search query using the ascertained error patterns; ranking the group of candidates one to another in accordance with scores computed using the multiple query correction models developed based on the ascertained error patterns; tra

Assignees

Inventors

Classifications

G06F40/44Primary
Statistical methods, e.g. probability models · CPC title
G06F40/232Primary
Orthographic correction, e.g. spell checking or vowelisation · CPC title
G06F16/951
Indexing; Web crawling techniques · CPC title
G06F16/242
Query formulation · CPC title
G06F17/273
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 48281611

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10176168B2 cover?: Statistical Machine Translation (SMT) based search query spelling correction techniques are described herein. In one or more implementations, search data regarding searches performed by clients may be logged. The logged data includes query correction pairs that may be used to ascertain error patterns indicating how misspelled substrings may be translated to corrected substrings. The error patte…
Who is the assignee on this patent?: Gao Jianfeng, Hwang Mei Yuh, Huang Xuedong D, and 3 more
What technology area does this patent fall under?: Primary CPC classification G06F40/44. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Jan 08 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).