Generating query refinements using query components

US9703871B1 · US · B1

Patent metadata
FieldValue
Publication numberUS-9703871-B1
Application numberUS-84751010-A
CountryUS
Kind codeB1
Filing dateJul 30, 2010
Priority dateJul 30, 2010
Publication dateJul 11, 2017
Grant dateJul 11, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer program products, for generating query refinements using query components. In general, one aspect features a method that includes the acts of receiving a query comprising a plurality of terms; identifying first and second components of the query, wherein each component comprises one or more of the terms of the query and the components do not share a term from the query, and wherein the first component appears before the second component in the query; determining, for each component, a plurality of different respective component refinements; and combining the component refinements to create a plurality of query refinements for the query, including combining a first component refinement for the first component with a second component refinement for the second component to create a query refinement so that the first component refinement appears before the second component refinement in the query refinement.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: receiving a query comprising a first plurality of terms in a first order; dividing the query into a plurality of combinations of n-grams, wherein each n-gram is a subset of terms of the first plurality of terms, the subset of terms being in a same order as in the first plurality of terms; for each combination of n-grams: determining, for each n-gram of the combination, a probability that the n-gram is a related phrase, wherein an n-gram is a related phrase when the terms of the n-gram are found together in training data more frequently than those terms would be found together if they were not associated with each other; summing the probabilities of each n-gram of the combination; identifying a particular combination of each combination of n-grams based on the summed probability of each combination, wherein the particular combination has a largest summed probability of the combinations; identifying first and second components of the query based on the identified particular combination, wherein each component comprises one or more of the terms of the query in order and the two components do not share a term from the query, and wherein the terms of the first component appear in the first order before the terms of the second component in the query; determining, for each of the first and second components, a plurality of different respective component refinements; and combining the component refinements to create a plurality of candidate query refinements for the query, including, for each candidate query refinement: combining a first component refinement for the first component with a second component refinement for the second component to create the candidate query refinement for the query, the candidate query refinement comprising a second plurality of terms in a second order and different from the first plurality of terms in the first order, wherein the first component refinement appears in the second order before the second component refinement in the query refinement; filtering the candidate query refinements of the query to create a subset of the candidate query refinements for the query, including: determining for each candidate query refinement a score based on a click-through rate for the candidate query refinement, wherein the click-through rate is a total number of clicks on a plurality of documents presented in response to the candidate query refinement divided by a total number of impressions for the plurality of documents presented in response to the candidate query refinement; and removing from the plurality of candidate query refinements any candidate query refinements having a score not satisfying a threshold score; and providing a plurality of the subset of the candidate query refinements in response to receiving the query. 2. The method of claim 1 , wherein identifying the particular combination further comprises determining that the largest summed probability exceeds a threshold. 3. The method of claim 1 , wherein determining the probability that the n-gram is a related phrase is based on a function of the n-gram's relative frequency in training data. 4. The method of claim 1 , wherein an initial score for the candidate query refinement is based on a number of times a user who submitted the query has searched for the candidate query refinement over a period of time. 5. The method of claim 1 , further comprising ranking the plurality of candidate query refinements by commonality with highest inverse document frequency components of the query. 6. The method of claim 1 , wherein filtering the plurality of candidate query refinements further comprises filtering the plurality of candidate query refinements based on syntactic similarity with the components of the query, including, for each candidate query refinement: determining a syntactic similarity score for the candidate query refinement based on an edit distance between the candidate query refinement and the query, the edit distance being the number of edits to change the candidate query refinement into the query; and removing the candidate query refinement from consideration if the syntactic similarity score does not meet a threshold. 7. A system comprising: one or more processors configured to interact with a computer storage medium in order to perform operations comprising: receiving a query comprising a first plurality of terms in a first order; dividing the query into a plurality of combinations of n-grams, wherein each n-gram is a subset of terms of the first plurality of terms, the subset of terms being in a same order as in the first plurality of terms; for each combination of n-grams: determining, for each n-gram of the combination, a probability that the n-gram is a related phrase, wherein an n-gram is a related phrase when the terms of the n-gram are found together in training data more frequently than those terms would be found together if they were not associated with each other; summing the probabilities of each n-gram of the combination; identifying a particular combination of each combination of n-grams based on the summed probability of each combination, wherein the particular combination has a largest summed probability of the combinations; identifying first and second components of the query based on the identified particular combination, wherein each component comprises one or more of the terms of the query in order and the two components do not share a term from the query, and wherein the terms of the first component appear in the first order before the terms of the second component in the query; determining, for each of the first and second components, a plurality of different respective component refinements; and combining the component refinements to create a plurality of candidate query refinements for the query, including, for each candidate query refinement: combining a first component refinement for the first component with a second component refinement for the second component to create the candidate query refinement for the query, the candidate query refinement comprising a second plurality of terms in a second order and different from the first plurality of terms in the first order, wherein the first component refinement appears in the second order before the second component refinement in the query refinement; filtering the candidate query refinements of the query to create a subset of the candidate query refinements for the query, including: determining for each candidate query refinement a score based on a click-through rate for the candidate query refinement, wherein the click-through rate is a total number of clicks on a plurality of documents presented in response to the candidate query refinement divided by a total number of impressions for the plurality of documents presented in response to the candidate query refinement; and removing from the plurality of candidate query refinements any candidate query refinements having a score not satisfying a threshold score; and providing a plurality of the subset of the candidate query refinements in response to receiving the query. 8. The system of claim 7 , wherein identifying the particular combination further comprises determining that the sum for the first possible combination exceeds a threshold sum. 9. The system of claim 7 , wherein determining the probability that the n-gram is a related phrase is based on a function of the n-gram's relative frequency in training data. 10. The system of claim 7 , wherein an initial score for the candidate query refinement is based on a number of times a user who submitted the query has searched for the candidate query refinemen

Assignees

Inventors

Classifications

  • Search customisation based on user profiles and personalisation · CPC title

  • using relevance feedback from the user, e.g. relevance feedback on documents, documents sets, document terms or passages · CPC title

  • Physics · mapped topic

  • Physics · mapped topic

  • Query expansion · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9703871B1 cover?
Methods, systems, and apparatus, including computer program products, for generating query refinements using query components. In general, one aspect features a method that includes the acts of receiving a query comprising a plurality of terms; identifying first and second components of the query, wherein each component comprises one or more of the terms of the query and the components do not s…
Who is the assignee on this patent?
Das Anwis, Das Abhinandan S, Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/9535. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jul 11 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).