Training a search query intent classifier using wiki article titles and a search click log

US9465864B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9465864-B2
Application number  US-201013384589-A
Country             US
Kind code           B2
Filing date         Sep 29, 2010
Priority date       Sep 29, 2010
Publication date    Oct 11, 2016
Grant date          Oct 11, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described herein for training a search query intent classifier using wiki article titles and a search click log. Titles of wiki articles that correspond to links that are associated with a specified wiki article and/or titles of wiki articles that are included in a category that includes the specified wiki article are extracted and included with the title of the specified wiki article in an initial set. Each title in the initial set is correlated with respective clicked URI(s) using a search click log. The initial set is expanded to include search terms that are correlated to the clicked URIs based on the search click log to provide an expanded set. The search query intent classifier is trained to classify search queries with respect to a query intent that is associated with the title of the specified wiki article based on the expanded set.
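
The flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the dictionary-based wiki structures, the (query, clicked URI) pair format for the click log, and the toy data are all hypothetical, and titles are matched to logged queries by exact lowercase string equality.

```python
from collections import defaultdict


def build_initial_set(title, links, categories, category_members):
    """Seed set: the article's own title, the titles of the articles it
    links to, and the titles of articles that share a category with it."""
    seeds = {title}
    seeds.update(links.get(title, []))
    for category in categories.get(title, []):
        seeds.update(category_members.get(category, []))
    return seeds


def expand_with_click_log(seeds, click_log):
    """Expand the seed set with every logged query that led to a click on a
    URI that was also clicked for at least one seed title."""
    query_to_uris = defaultdict(set)
    uri_to_queries = defaultdict(set)
    for query, uri in click_log:
        query_to_uris[query].add(uri)
        uri_to_queries[uri].add(query)

    # URIs clicked for any seed title (one side of the query/URI graph).
    clicked_uris = set()
    for seed in seeds:
        clicked_uris |= query_to_uris.get(seed, set())

    # Walk back from those URIs to all queries that reached them.
    expanded = set(seeds)
    for uri in clicked_uris:
        expanded |= uri_to_queries[uri]
    return expanded


# Toy data (hypothetical): a "travel" article, its links, and one category.
links = {"travel": ["hotel", "airline"]}
categories = {"travel": ["tourism"]}
category_members = {"tourism": ["travel", "tourist attraction"]}
click_log = [
    ("hotel", "hotels.example/deals"),
    ("cheap hotel rooms", "hotels.example/deals"),
    ("airline", "air.example/booking"),
    ("flight tickets", "air.example/booking"),
    ("python tutorial", "docs.example/python"),
]

seeds = build_initial_set("travel", links, categories, category_members)
expanded = expand_with_click_log(seeds, click_log)
print(sorted(expanded - seeds))
```

In this sketch the expanded set would then serve as positive training examples for the "travel" intent. A real system would likely normalize queries and weight edges by click counts, which this example leaves out.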

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a designated query intent from a search query; in response to obtaining the designated query intent from the search query, obtaining a second wiki article corresponding to the designated query intent, wherein the second wiki article includes a second title, a second body, and a plurality of respective links; obtaining a plurality of first wiki articles that correspond to the plurality of respective links included in the second wiki article, wherein each of the first wiki articles includes a different title and a different body; extracting a plurality of first titles of the plurality of first wiki articles; extracting the second title of the second wiki article; generating an initial key term set including the plurality of first titles and the second title; correlating each of the plurality of first titles and the second title with at least one respective clicked uniform resource identifier based on a search click log, wherein the search click log comprises a record of historical search queries provided by users and documents clicked by the users from search results retrieved in response to the historical search queries, and wherein each of the documents has a respective uniform resource identifier; generating an expanded key term set to include the initial key term set expanded to further include search terms correlated to the at least one respective clicked uniform resource identifier; training a search query intent classifier to classify search queries with respect to the designated query intent based on the expanded key term set; obtaining the designated query intent from a subsequent search query; and in response to obtaining the subsequent search query corresponding to the designated query intent, controlling the trained search query intent classifier to select advertisement data based on the expanded key term set.

2. The method of claim 1, further comprising: determining a category to which the second wiki article is assigned; and extracting a plurality of third titles of a plurality of respective third wiki articles that are assigned to the category to be included in the initial key term set; wherein correlating each of the plurality of first titles and the second title with at least one respective clicked uniform resource identifier comprises: correlating each of the plurality of first titles, the second title, and the plurality of third titles with at least one respective clicked uniform resource identifier based on the search click log; and wherein generating the expanded key term set comprises: expanding the initial key term set to include search terms, in addition to the plurality of first titles, the second title, and the plurality of third titles, that are correlated to the at least one respective clicked uniform resource identifier based on the search click log to provide the expanded key term set.

3. The method of claim 2, further comprising: extracting a plurality of fourth titles of a plurality of respective fourth wiki articles that corresponds to a plurality of respective second links to be included in the initial key term set, each of the plurality of second links being associated with at least one of the plurality of third wiki articles; wherein correlating each of the plurality of first titles and the second title with at least one respective clicked uniform resource identifier comprises: correlating each of the plurality of first titles, the second title, the plurality of third titles, and the plurality of fourth titles with at least one respective clicked uniform resource identifier based on the search click log; and wherein generating the expanded key term set comprises: expanding the initial key term set to include search terms, in addition to the plurality of first titles, the second title, the plurality of third titles, and the plurality of fourth titles, that are correlated to the at least one respective clicked uniform resource identifier based on the search click log to provide the expanded key term set.

4. The method of claim 1, further comprising: extracting a plurality of third titles of a plurality of respective third wiki articles that corresponds to a plurality of respective second links to be included in the initial key term set, each of the plurality of second links being associated with at least one of the plurality of first wiki articles; wherein correlating each of the plurality of first titles and the second title with at least one respective clicked uniform resource identifier comprises: correlating each of the plurality of first titles, the second title, and the plurality of third titles with at least one respective clicked uniform resource identifier based on the search click log; and wherein generating the expanded key term set comprises: expanding the initial key term set to include search terms, in addition to the plurality of first titles, the second title, and the plurality of third titles, that are correlated to the at least one respective clicked uniform resource identifier based on the search click log to provide the expanded key term set.

5. The method of claim 1, wherein correlating each of the plurality of first titles and the second title with at least one respective clicked uniform resource identifier using a search click log comprises: generating a bipartite graph that correlates each of the plurality of first titles and the second title to the at least one respective clicked uniform resource identifier based on the search click log.

6. The method of claim 1, further comprising: assigning a first probability to the plurality of first titles and to the second title; and assigning a respective second probability to each of the search terms, each second probability being less than the first probability, wherein generating the expanded key term set comprises: expanding the initial key term set to include a first subset of the search terms and to not include a second subset of the search terms to provide the expanded key term set, the first subset including search terms to which respective second probabilities that are greater than a threshold probability are assigned, the second subset including search terms to which respective second probabilities that are less than the threshold probability are assigned.

7. The method of claim 1, further comprising: randomly selecting second search terms that are not included in the expanded key term set to provide a negative sample set, wherein training the search query intent classifier comprises: training the search query intent classifier to classify the search queries with respect to the designated query intent based on the expanded key term set and the negative sample set.

8. The method of claim 1, wherein training the search query intent classifier comprises: training a maxentropy classifier to classify the search queries with respect to the designated query intent based on the expanded key term set.

9. The method of claim 1, wherein training the search query intent classifier comprises: training the search query intent classifier to classify the search queries with respect to the designated query intent based on the expanded key term set and further based on text that is included in the second wiki article.

10. The method of claim 1, wherein extracting the plurality of first titles comprises: extracting the plurality of first titles of the plurality of respective first wiki articles that corresponds to a plurality of respective article links that are associated with the second wiki article to provide the initial key term set.

11. The method of claim 1, wherein extracting the
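
Claims 6 and 7 describe filtering candidate search terms by probability and drawing random negative samples. The Python sketch below illustrates one way this could look. The claims do not state how the second probability is computed, so the click-share formula, the 0.9 scaling factor, the threshold value, and all function names and toy data here are assumptions made for illustration only.

```python
import random

FIRST_PROBABILITY = 1.0  # assumed value assigned to wiki titles
THRESHOLD = 0.5          # assumed cut-off for keeping a search term


def score_terms(candidate_clicks, seed_uris):
    """Assign each candidate term a 'second probability'.

    Assumption: the score is the share of the term's clicks that land on
    URIs clicked for seed titles, scaled by 0.9 so that it always stays
    below FIRST_PROBABILITY, as claim 6 requires.
    """
    scores = {}
    for term, clicks in candidate_clicks.items():
        total = sum(clicks.values())
        on_seed = sum(n for uri, n in clicks.items() if uri in seed_uris)
        scores[term] = 0.9 * FIRST_PROBABILITY * on_seed / total if total else 0.0
    return scores


def expand(initial_set, scores, threshold=THRESHOLD):
    """Keep only the candidate terms whose score exceeds the threshold."""
    kept = {term for term, p in scores.items() if p > threshold}
    return set(initial_set) | kept


def sample_negatives(all_queries, expanded_set, k, seed=0):
    """Randomly pick up to k queries that are not in the expanded set."""
    pool = sorted(set(all_queries) - expanded_set)
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return rng.sample(pool, min(k, len(pool)))


# Toy data (hypothetical): click counts per candidate term and URI.
initial_set = {"travel", "hotel"}
seed_uris = {"hotels.example/deals"}
candidate_clicks = {
    "cheap hotel rooms": {"hotels.example/deals": 8, "other.example": 2},
    "hotel california lyrics": {"hotels.example/deals": 1, "lyrics.example": 9},
}
all_queries = [
    "travel", "hotel", "cheap hotel rooms",
    "hotel california lyrics", "python tutorial", "weather today",
]

scores = score_terms(candidate_clicks, seed_uris)
expanded_set = expand(initial_set, scores)
negatives = sample_negatives(all_queries, expanded_set, k=2)
print(sorted(expanded_set), negatives)
```

The expanded set and the negative samples would then be the positive and negative training examples for the classifier named in claims 7 and 8. Training itself is omitted from this sketch.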

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • based on the proximity to a decision surface, e.g. support vector machines · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9465864B2 cover?
Techniques are described herein for training a search query intent classifier using wiki article titles and a search click log. Titles of wiki articles that correspond to links that are associated with a specified wiki article and/or titles of wiki articles that are included in a category that includes the specified wiki article are extracted and included with the title of the specified wiki ar…
Who is the assignee on this patent?
Hu Jian, Zheng Hao, Excalibur IP LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/30705. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 11, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).