Extracting query dimensions from search results

US9785704B2 · US · B2

Patent metadata
Publication number: US-9785704-B2
Application number: US-201213343621-A
Country: US
Kind code: B2
Filing date: Jan 4, 2012
Priority date: Jan 4, 2012
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described for automatically mining query dimensions from web pages resulting from execution of a search query. Lists of items such as words, terms, or phrases are extracted from the web pages based on the recognition of free text, metadata tag, or repeated region patterns within the web page text. Extracted item lists are weighted according to document matching and/or inverse document frequency, and item lists are clustered based on shared or similar items within the lists to generate query dimensions. The generated query dimensions, and the items within each query dimension, are ranked according to quality, and high-quality query dimensions are provided for display alongside top search results.
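The abstract names three list-extraction patterns: free text, metadata tags, and repeated regions. Below is a minimal Python sketch of the first two. It rests on several assumptions that do not come from the patent: the cue phrases "such as" and "including", the limit of one or two words per item, and the choice of `<ul>`, `<ol>`, and `<select>` as list containers. The names `extract_item_lists` and `TagListParser` are also hypothetical. Repeated-region detection is left out.

```python
import re
from html.parser import HTMLParser

# Hypothetical free-text pattern: a cue phrase followed by items in parallel
# positions, e.g. "... such as red, dark blue, and green". An item is 1-2 words.
_W = r"[\w-]+(?: [\w-]+)?"
FREE_TEXT = re.compile(rf"(?:such as|including)\s+({_W}(?:, {_W})*,? (?:and|or) {_W})")
SPLIT = re.compile(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


class TagListParser(HTMLParser):
    """Hypothetical metadata-tag pattern: sibling <li>/<option> texts under <ul>/<ol>/<select>."""

    CONTAINERS = {"ul", "ol", "select"}
    ITEMS = {"li", "option"}

    def __init__(self):
        super().__init__()
        self.lists = []       # completed item lists (2+ items each)
        self.text_parts = []  # all text nodes, reused for the free-text pattern
        self._stack = []      # one open item list per open container tag
        self._buf = None      # text of the currently open item, if any

    def _flush(self):
        # Close the current item; this also covers omitted </li> end tags.
        if self._buf is not None and self._stack:
            text = " ".join("".join(self._buf).split()).lower()
            if text:
                self._stack[-1].append(text)
        self._buf = None

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self._flush()
            self._stack.append([])
        elif tag in self.ITEMS and self._stack:
            self._flush()
            self._buf = []

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.ITEMS:
            self._flush()
        elif tag in self.CONTAINERS and self._stack:
            self._flush()
            items = self._stack.pop()
            if len(items) >= 2:
                self.lists.append(items)


def extract_item_lists(html_text):
    """Return lowercased item lists found by the tag pattern and the free-text pattern."""
    parser = TagListParser()
    parser.feed(html_text)
    parser.close()
    text = " ".join(" ".join(parser.text_parts).split()).lower()
    free = []
    for match in FREE_TEXT.finditer(text):
        items = [s.strip() for s in SPLIT.split(match.group(1)) if s.strip()]
        if len(items) >= 2:
            free.append(items)
    return parser.lists + free
```

On a page fragment such as `<p>Shirts come in colors such as red, dark blue, and green.</p><ul><li>Small</li><li>Medium</li><li>Large</li></ul>`, the sketch returns a size list from the `<ul>` tags and a color list from the sentence. A real extractor would need much broader patterns than these two regular expressions.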

Claims

Full claim text (claims 1 to 20).

What is claimed is:

1. A method comprising: identifying a plurality of web pages resulting from a search based on a search query; employing at least one processor to automatically extract one or more item lists from one or more web pages of the plurality of web pages based at least on one or more list pattern identification techniques, the one or more list pattern identification techniques including identifying terms that are in parallel positions in a sentence or continuous lines according to a text pattern in a respective web page of the one or more web pages as a respective item list of the one or more item lists, the respective item list of the one or more item lists including two or more related words or phrases included in the respective web page of the one or more web pages, wherein at least one of the two or more related words or phrases is related to the search query and does not include a keyword for the search query; weighting the one or more item lists; clustering the one or more weighted item lists based on a determination that at least some of the item lists include one or more similar items; and generating, based at least in part on the clustering, one or more query dimensions that summarize an aspect of the search query, at least one of the one or more query dimensions comprising a plurality of items from a cluster of weighted item lists.

2. The method of claim 1, further comprising: weighting the one or more query dimensions; and identifying one or more high-quality query dimensions as the one or more query dimensions that have a weight higher than a quality threshold value.

3. The method of claim 1, wherein the text pattern includes one or more free text patterns.

4. The method of claim 1, wherein the one or more list pattern identification techniques further comprise identifying one or more metadata tag patterns within the one or more web pages.

5. The method of claim 1, wherein the one or more list pattern identification techniques further comprise identifying one or more repeated region patterns within the one or more web pages.

6. The method of claim 1, wherein weighting the one or more item lists is based on a document matching weight.

7. The method of claim 1, wherein weighting the one or more item lists is based on an inverse document frequency weight.

8. The method of claim 1, wherein weighting the one or more item lists is based on a combination of a document matching weight and an inverse document frequency weight.

9. The method of claim 1, further comprising ranking one or more items within each of the one or more query dimensions based on a frequency of the one or more items within the one or more item lists.

10. A system comprising: at least one processor; and one or more memories storing computer readable instructions that, when executed by the at least one processor, direct the system to perform acts comprising: extracting one or more item lists from one or more webpages of a plurality of web pages resulting from execution of a search query based at least on one or more list pattern identification techniques, the one or more list pattern identification techniques including identifying terms that are in parallel positions in a sentence or continuous lines according to a text pattern in a respective web page of the one or more web pages as a respective item list of the one or more item lists, the respective item list of the one or more item lists including two or more related words or phrases included in the respective web page of the one or more web pages, wherein at least one of the two or more related words or phrases is related to the search query and does not include a keyword for the search query; weighting each of the one or more item lists; clustering the weighted one or more item lists based on a determination that at least some of the item lists include one or more similar items; and generating, based at least in part on the clustering, one or more query dimensions that summarize a characteristic of the search query, at least one of the one or more query dimensions comprising a plurality of items from a cluster of weighted item lists.

11. The system of claim 10, wherein the acts further comprise receiving the search query and executing a search based on the search query.

12. The system of claim 10, wherein the acts further comprise ranking one or more items within each of the one or more query dimensions based on a frequency of the one or more items within the one or more item lists.

13. The system of claim 10, wherein the acts further comprise ranking the one or more query dimensions.

14. The system of claim 13, wherein the ranking includes identifying one or more high-quality query dimensions as a predetermined number of highest ranked query dimensions.

15. The system of claim 10, wherein the one or more list pattern identification techniques further comprise at least one of: one or more metadata tag patterns; and one or more repeated region patterns.

16. A method comprising: ranking a plurality of web pages resulting from a search based on a search query; employing at least one processor to extract one or more item lists from the plurality of web pages based at least on one or more list pattern identification techniques, a respective item list of the one or more item lists including two or more related words or phrases included in a respective web page of the plurality of web pages; weighting each of the one or more item lists, the weighting including assigning a first weight to a first item list from a first web page and a second weight to a second item list from a second web page, the first weight being greater than a second weight, in response to determining that a ranking of the first web page is higher than a ranking of the second web page in the plurality of web pages; and clustering the one or more weighted item lists to generate one or more query dimensions to be displayed alongside a list of the plurality of web pages and that summarize a characteristic of a search query, at least one of the one or more query dimensions comprising a plurality of similar items from a cluster of weighted item lists.

17. The method of claim 16, further comprising identifying at least one of the one or more dimensions to be displayed within an application, the identifying based at least in part on the application.

18. The method of claim 16, wherein the clustering is based on a determination that at least some of the item lists include one or more similar items.

19. The method of claim 16, further comprising ranking one or more items within each of the one or more dimensions based on a frequency of the one or more items within the one or more item lists.

20. The method of claim 16, wherein weighting the one or more item lists is further based on at least one of a document matching weight and an inverse document frequency weight.
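Claims 2, 9, 16, and 20 together describe four steps: weighting item lists so that lists from higher-ranked pages count more, clustering lists that share items, keeping dimensions above a quality threshold, and ordering items by frequency. The Python sketch below chains these steps under stated assumptions. All of the following are hypothetical choices, not the patent's definitions:

- the rank weight 1/log2(rank + 1);
- the IDF formula and the corpus statistics `doc_freq`/`num_docs`;
- the `min_shared` overlap rule;
- the names `ItemList`, `weight_lists`, `cluster_lists`, and `build_dimensions`.

The single-pass clustering is a simple stand-in, not the clustering method the patent uses. No document-matching weight is modeled.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ItemList:
    """Hypothetical container: one extracted list plus the rank of its source page."""
    items: list[str]
    page_rank: int        # 1 = top search result
    weight: float = 0.0


def weight_lists(lists, doc_freq, num_docs):
    """Assumed weighting: rank weight times mean IDF of the list's items.

    A higher-ranked page (smaller page_rank) gives a larger rank weight, in the
    spirit of claim 16; common items (high document frequency) lower the IDF part.
    """
    for lst in lists:
        if not lst.items:
            lst.weight = 0.0
            continue
        rank_w = 1.0 / math.log2(lst.page_rank + 1)
        idf = [math.log((num_docs + 1) / (doc_freq.get(it, 0) + 1)) for it in lst.items]
        lst.weight = rank_w * sum(idf) / len(idf)
    return lists


def cluster_lists(lists, min_shared=2):
    """Stand-in single-pass clustering: a list joins the first cluster that has a
    member sharing at least `min_shared` items with it, else starts a new cluster."""
    clusters = []
    for lst in sorted(lists, key=lambda x: x.weight, reverse=True):
        items = set(lst.items)
        for cluster in clusters:
            if any(len(items & set(m.items)) >= min_shared for m in cluster):
                cluster.append(lst)
                break
        else:
            clusters.append([lst])
    return clusters


def build_dimensions(clusters, quality_threshold):
    """Keep clusters whose summed weight exceeds the threshold (cf. claim 2) and
    order each dimension's items by how many lists contain them (cf. claim 9)."""
    dims = []
    for cluster in clusters:
        dim_weight = sum(m.weight for m in cluster)
        if dim_weight <= quality_threshold:
            continue
        counts = Counter(it for m in cluster for it in dict.fromkeys(m.items))
        ranked = sorted(counts, key=lambda it: -counts[it])  # stable: ties keep first-seen order
        dims.append((dim_weight, ranked))
    dims.sort(key=lambda d: d[0], reverse=True)
    return dims


if __name__ == "__main__":
    lists = [
        ItemList(["red", "blue", "green"], page_rank=1),
        ItemList(["red", "blue", "black"], page_rank=2),
        ItemList(["small", "medium", "large"], page_rank=3),
        ItemList(["home", "about", "contact"], page_rank=4),
    ]
    # Hypothetical corpus statistics: navigation words appear in most documents.
    doc_freq = {"home": 900, "about": 900, "contact": 900}
    weight_lists(lists, doc_freq, num_docs=1000)
    for weight, items in build_dimensions(cluster_lists(lists), quality_threshold=1.0):
        print(f"{weight:.2f}", items)
```

In the demo, the two color lists share "red" and "blue", so they merge into one dimension ordered red, blue, green, black. The size list forms a second dimension. The home/about/contact list is dropped because its high document frequency pushes its weight below the threshold. Because the clustering is greedy, its result depends on the order in which lists are visited, which is one reason a production system would likely use a more careful clustering method.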


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9785704B2 cover?
Techniques are described for automatically mining query dimensions from web pages resulting from execution of a search query. Lists of items such as words, terms, or phrases are extracted from the web pages based on the recognition of free text, metadata tag, or repeated region patterns within the web page text. Extracted item lists are weighted according to document matching and/or inverse doc…
Who is the assignee on this patent?
Dou Zhicheng, Song Ruihua, Wen Ji-Rong, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page: either a citation in our corpus or another publication sharing the same primary CPC.