Clustering web pages on a search engine results page

US9842158B2 · US · B2

Patent metadata
Publication number: US-9842158-B2
Application number: US-201514700914-A
Country: US
Kind code: B2
Filing date: Apr 30, 2015
Priority date: Aug 9, 2011
Publication date: Dec 12, 2017
Grant date: Dec 12, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and media are provided for delivering clustered search results for recent and non-recent events by maintaining the identification (ID) numbers of the respective clustered documents beyond the “fresh” life span of the clustered documents. When clusters are formed according to similar content, an ID number and associated attributes are assigned to each of the clusters. This provides a mechanism to track and retrieve the respective clusters for subsequent delivery of search results. The respective ID numbers of the clusters are maintained, even after the documents are no longer considered “fresh.” These similar-content clusters are further subdivided according to publication date. This provides individual subdivided clusters for similar content events that occurred at different time spans, which are delivered along with individual non-clustered search results in a SERP.
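The flow the abstract describes (cluster by content, assign an ID to each cluster, then subdivide by publication date) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract does not name a similarity measure or a date granularity, so the Jaccard token overlap, the 0.5 threshold, the greedy single-pass grouping, the month-level buckets, and all class and function names here are assumptions chosen for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from itertools import count


@dataclass
class Document:
    url: str
    text: str
    published: date


@dataclass
class Cluster:
    cluster_id: int  # assigned once and kept for as long as the cluster is stored
    documents: list = field(default_factory=list)


def jaccard(a: set, b: set) -> float:
    """Token-set overlap; a stand-in for any content-similarity measure (assumption)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster_by_content(docs, threshold=0.5):
    """Greedy pass: a document joins the first cluster whose first document is similar enough."""
    ids = count(1)
    clusters = []
    for doc in docs:
        tokens = set(doc.text.lower().split())
        for cluster in clusters:
            seed = set(cluster.documents[0].text.lower().split())
            if jaccard(tokens, seed) >= threshold:
                cluster.documents.append(doc)
                break
        else:
            # No similar cluster found: start a new one with a new ID.
            clusters.append(Cluster(next(ids), [doc]))
    return clusters


def subdivide_by_date(cluster):
    """Split one content cluster into sub-clusters keyed by (year, month) of publication."""
    buckets = {}
    for doc in cluster.documents:
        key = (doc.published.year, doc.published.month)
        buckets.setdefault(key, []).append(doc)
    return buckets


docs = [
    Document("https://example.com/a", "earthquake strikes coastal city", date(2011, 3, 11)),
    Document("https://example.org/b", "earthquake strikes coastal city again", date(2011, 3, 12)),
    Document("https://example.net/c", "earthquake strikes coastal city", date(2014, 4, 1)),
    Document("https://example.com/d", "team wins championship final", date(2014, 4, 2)),
]
clusters = cluster_by_content(docs)
subdivided = {c.cluster_id: subdivide_by_date(c) for c in clusters}
```

In this toy data the three earthquake stories share one cluster ID, but the 2011 and 2014 reports land in separate sub-clusters, which mirrors the abstract's point about similar events that occurred at different times.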

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of clustering documents for search results, the method comprising: accessing a database that is associated with a search engine, wherein the database includes a plurality of stored documents retrievable by the search engine; clustering some of the stored documents into one or more clusters based on content similarity; subdividing each of the one or more clusters into one or more subdivided clusters according to publication date; assigning an identifier to each of the clusters of the stored documents, wherein the identifier is assigned during a life span of each of the clustered stored documents, and wherein the identifier of each of the clusters remains persistent throughout the life span of each of the clustered stored documents; responsive to a search query, generating search results for presentation on a search results page, the search results comprising one or more of the subdivided clusters; and presenting the search results on the search result page, wherein each subdivided cluster presented on the search results page includes a synopsis of the subdivided cluster and links to documents contained within the subdivided cluster.

2. The method of claim 1, wherein some of the one or more clusters comprise retrieved documents that are fresh documents.

3. The method of claim 1, wherein some of the one or more clusters comprise retrieved documents that are not fresh documents.

4. The method of claim 1, wherein the one or more clusters comprise one or more grouped Uniform Resource Locators (URLs).

5. The method of claim 1, further comprising: providing a thumbnail synopsis for each of the one or more clusters.

6. The method of claim 5, wherein the thumbnail synopsis comprises one or more of: a number of documents, a host domain, or one or more dominant features for each of the one or more clusters.

7. A system for generating search results comprising clustered documents, comprising: one or more memory storage devices configured to store a database that includes a plurality of stored documents; one or more computing devices configured to: access the database that includes the plurality of stored documents, cluster some of the stored documents into one or more clusters based on content similarity, assign an identifier to each of the clusters of the stored documents, wherein the identifier of each of the clusters is assigned during a life span of each of the clustered stored documents, and remains persistent throughout the life span of each of the clustered stored documents, subdividing each of the one or more clusters into one or more subdivided clusters according to publication date, and responsive to a search query, generating search results for presentation on a search results page, wherein the search results are organized into one or more subdivided clusters.

8. The system of claim 7, wherein some of the one or more clusters comprise retrieved documents that are fresh documents.

9. The system of claim 7, wherein some of the one or more clusters comprise retrieved documents that are not fresh documents.

10. The system of claim 7, wherein the one or more clusters comprise one or more grouped Uniform Resource Locators (URLs).

11. The system of claim 7, further comprising the one or more computing devices configured to provide a thumbnail synopsis for each of the one or more subdivided clusters in the search results.

12. The system of claim 11, wherein the thumbnail synopsis comprises one or more of: a number of documents, a host domain, or one or more dominant features for each of the one or more clusters.

13. A computer-implemented method of generating search results comprising clustered documents using a computing device having processor, memory, and data storage subsystems, the computer-implemented method comprising: grouping a plurality of documents stored in a database based on page content similarity to form one or more clusters; assigning an identifier and one or more respective related attributes to each of the one or more clusters; maintaining the assigned identifiers and the respective related attributes for each of the one or more clusters, wherein the identifier of each of the clusters remains persistent throughout an entire life span of each of the clustered stored documents; subdividing each of the one or more clusters into one or more subdivided clusters according to publication date; and responsive to a search query, generating search results for presentation on a search results page, the search results comprising one or more of the clusters.

14. The computer-implemented method of claim 13, wherein grouping a plurality of documents comprises grouping a plurality of fresh documents.

15. The computer-implemented method of claim 13, wherein grouping a plurality of documents comprises grouping a plurality of non-recent event documents.

16. The computer-implemented method of claim 13, wherein the assigned identifiers remain persistent throughout a lifetime of each respective document's life.

17. The computer-implemented method of claim 13, wherein each document of the plurality of documents is considered to be a fresh document for approximately a one-month life span.

18. The computer-implemented method of claim 13, further comprising: displaying the one or more subdivided clusters by publication date for one of the one or more clusters to a user interface of the computing device in response to a user search query.

19. The computer-implemented method of claim 18, wherein displaying each of the one or more subdivided clusters comprises displaying a respective one or more of: a dominant title, a dominant image, or a dominant news summary.

20. The computer-implemented method of claim 13, wherein the one or more subdivided clusters comprise grouped Uniform Resource Locators (URLs) according to respective ID numbers of the one or more subdivided clusters.
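To make the synopsis and freshness language in the claims more concrete, here is a small sketch of how a results-page entry for one subdivided cluster might be assembled. It is illustrative only: the claims list a number of documents, a host domain, and a dominant title as possible synopsis elements and describe a fresh life span of approximately one month, but they do not say how "dominant" is computed. Taking the most frequent value, using a 30-day cutoff, and the dictionary layout and function names below are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta
from urllib.parse import urlparse

# Assumption: "approximately a one-month life span" is modelled as 30 days.
FRESH_LIFE_SPAN = timedelta(days=30)


def is_fresh(published: date, today: date) -> bool:
    """A document counts as fresh while its age is within the life span."""
    return today - published <= FRESH_LIFE_SPAN


def synopsis(cluster_id, docs):
    """Summarise a subdivided cluster: document count, most common host, most common title."""
    hosts = Counter(urlparse(d["url"]).netloc for d in docs)
    titles = Counter(d["title"] for d in docs)
    return {
        "cluster_id": cluster_id,  # same ID whether or not the documents are still fresh
        "num_documents": len(docs),
        "host_domain": hosts.most_common(1)[0][0],
        "dominant_title": titles.most_common(1)[0][0],
        "links": [d["url"] for d in docs],
    }


docs = [
    {"url": "https://news.example.com/quake-1", "title": "Quake hits coast", "published": date(2011, 3, 11)},
    {"url": "https://news.example.com/quake-2", "title": "Quake hits coast", "published": date(2011, 3, 12)},
    {"url": "https://other.example.org/quake", "title": "Coastal quake aftermath", "published": date(2011, 3, 13)},
]
today = date(2015, 4, 30)
entry = synopsis(7, docs)
fresh_flags = [is_fresh(d["published"], today) for d in docs]
```

None of these documents is fresh on the chosen date, yet the entry still carries its cluster ID, so the cluster stays retrievable. That is the persistence property the independent claims emphasize.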

Classifications

  • G06F16/355 (Primary)

    Creation or modification of classes or clusters · CPC title

  • Presentation of query results · CPC title

  • Indexing; Web crawling techniques · CPC title

  • G06F16/285 (Primary)

    Clustering or classification · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9842158B2 cover?
Methods, systems, and media are provided for delivering clustered search results for recent and non-recent events by maintaining the identification (ID) numbers of the respective clustered documents beyond the “fresh” life span of the clustered documents. When clusters are formed according to similar content, an ID number and associated attributes are assigned to each of the clusters. This prov…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/355. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 12, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).