Presenting search results for gallery web pages

US8938441B2 · US · B2

Patent metadata
Field: Value
Publication number: US-8938441-B2
Application number: US-201113283878-A
Country: US
Kind code: B2
Filing date: Oct 28, 2011
Priority date: Apr 28, 2011
Publication date: Jan 20, 2015
Grant date: Jan 20, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying web pages as gallery web pages, and for presenting search results for gallery web pages. In one aspect, a method includes receiving a web page that includes text and one or more images, evaluating one or more characteristics of the web page against predefined criteria, generating a score for the web page based on evaluating the characteristics of the web page against the predefined criteria, and classifying the web page as a gallery web page or as not a gallery web page when the score meets or does not meet a predefined threshold, respectively.
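The evaluate, score, and threshold flow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how criteria are weighted or combined, so the `Criterion` type, the additive weighting, and the example page fields (`image_count`, `text_chars`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class Criterion:
    """One predefined criterion (hypothetical structure, not from the patent)."""
    name: str
    check: Callable[[Mapping[str, float]], bool]  # True when the page satisfies it
    weight: float = 1.0


def score_page(page: Mapping[str, float], criteria: Sequence[Criterion]) -> float:
    # Assumption: the score is the sum of the weights of the satisfied criteria.
    return sum(c.weight for c in criteria if c.check(page))


def is_gallery(page: Mapping[str, float],
               criteria: Sequence[Criterion],
               threshold: float) -> bool:
    # The page is classified as a gallery page when the score meets the threshold.
    return score_page(page, criteria) >= threshold


# Illustrative criteria; the numeric limits are placeholders.
CRITERIA = [
    Criterion("enough images", lambda p: p["image_count"] >= 8),
    Criterion("little text", lambda p: p["text_chars"] <= 500),
]
```

With these placeholder criteria and a threshold of 2, a page must have both many images and little text to be classified as a gallery page.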

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving a web page that includes text and images; selecting a first subset of the images that are not excluded content-type images, wherein an excluded content-type image is an image that is boilerplate content or that is advertising content; determining, for each of the images in the first subset, (I) whether the image has a size ratio that is within a predetermined size ratio range, (II) whether the image has greater than a predetermined quantity of pixels, or (III) whether the image is located between a defined minimum altitude and a defined maximum altitude on the web page; selecting a second subset of the images in the first subset based on the determinations for the images in the first subset; determining (i) a quantity of images in the second subset, and (ii) a ratio of the area of the web page that is covered by the images of the second subset to the total area of the web page; generating a score for the web page based at least on (i) the quantity of the images in the second subset, and (ii) the ratio of the area of the web page that is covered by the images to the total area of the web page; classifying the web page as a gallery web page based on the score for the web page meeting a predefined threshold; and based on classifying the web page as a gallery web page, formatting a search result that references the web page, among a set of search results that each reference a different web page, using a search result format that is designated for web pages that are classified as gallery web pages.

2. The method of claim 1, wherein generating a score for the web page comprises generating a score for the web page based on evaluating an amount of text that is not included in a boilerplate section of the web page, against a maximum value.

3. The method of claim 1, wherein generating a score for the web page comprises generating a score for the web page based on evaluating the quantity of images in the second subset against a minimum value.

4. The method of claim 1, wherein generating a score for the web page comprises generating a score for the web page based on evaluating a quantity of the images in the web page that share a same Document Object Model (DOM) path, against a minimum value.

5. The method of claim 1, further comprising determining that the score for the web page meets a predefined threshold; wherein the web page is classified as a gallery web page in response to determining that the score for the web page meets the predefined threshold.

6. The method of claim 1, comprising: labeling the web page that is classified as a gallery web page, as a gallery web page.

7. The method of claim 1, wherein a gallery web page is a web page in which its principal content is images.

8. The method of claim 1, wherein selecting the first subset of the images comprises selecting images that are not included in a boilerplate section of the web page and that are not included in an advertising section of the web page.

9. The method of claim 8, wherein generating a score for the web page comprises generating a score for the web page based on evaluating an amount of text that is not included in the boilerplate section of the web page and that is not included in the advertising section of the web page, against a maximum value.

10. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a web page that includes text and images; selecting a first subset of the images that are not excluded content-type images, wherein an excluded content-type image is an image that is boilerplate content or that is advertising content; determining, for each of the images in the first subset, (I) whether the image has a size ratio that is within a predetermined size ratio range, (II) whether the image has greater than a predetermined quantity of pixels, or (III) whether the image is located between a defined minimum altitude and a defined maximum altitude on the web page; selecting a second subset of the images in the first subset based on the determinations for the images in the first subset; determining (i) a quantity of images in the second subset, and (ii) a ratio of the area of the web page that is covered by the images of the second subset to the total area of the web page; generating a score for the web page based at least on (i) the quantity of the images in the second subset, and (ii) the ratio of the area of the web page that is covered by the images to the total area of the web page; classifying the web page as a gallery web page based on the score for the web page meeting a predefined threshold; and based on classifying the web page as a gallery web page, formatting a search result that references the web page, among a set of search results that each reference a different web page, using a search result format that is designated for web pages that are classified as gallery web pages.

11. The system of claim 10, wherein generating a score for the web page comprises generating a score for the web page based on evaluating an amount of text that is not included in a boilerplate section of the web page, against a maximum value.

12. The system of claim 10, wherein generating a score for the web page comprises generating a score for the web page based on evaluating the quantity of images in the second subset against a minimum value.

13. The system of claim 10, wherein generating a score for the web page comprises generating a score for the web page based on evaluating a quantity of the images in the web page that share a same Document Object Model (DOM) path, against a minimum value.

14. The system of claim 10, wherein the operations further comprise determining that the score for the web page meets a predefined threshold; and wherein the web page is classified as a gallery web page in response to determining that the score for the web page meets the predefined threshold.

15. The system of claim 10, wherein the operations comprise: labeling the web page that is classified as a gallery web page, as a gallery web page.

16. The system of claim 10, wherein a gallery web page is a web page in which its principal content is images.

17. The system of claim 10, wherein selecting the first subset of the images comprises selecting images that are not included in a boilerplate section of the web page and that are not included in an advertising section of the web page.

18. The system of claim 17, wherein generating a score for the web page comprises generating a score for the web page based on evaluating an amount of text that is not included in the boilerplate section of the web page and that is not included in the advertising section of the web page, against a maximum value.

19. A computer storage medium encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a web page that includes text and images; selecting a first subset of the images that are not excluded content-type images, wherein an excluded content-type image is an image that is boilerplate content or that is advertising content; determining, for each of the images in the first subset, (I) whether the image has a size ratio that is within a predetermined size ratio range, (II) whether the image has gr
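The steps recited in claim 1 (filter out excluded images, apply per-image tests, count the survivors, measure page coverage, score, and pick a result format) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Image` fields, all numeric limits, the choice to require all three per-image tests, the one-point-per-signal scoring, and reading "altitude" as the vertical offset from the top of the page are hypothetical. Coverage is approximated by summing image areas, which ignores overlap. The last helper relates to the DOM-path signal of claims 4 and 13.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Image:
    width: int
    height: int
    top: int                 # assumed meaning of "altitude": pixels from page top
    excluded: bool = False   # True for boilerplate or advertising images
    dom_path: str = ""


def first_subset(images: Sequence[Image]) -> list:
    # Drop excluded content-type images (boilerplate, advertising).
    return [im for im in images if not im.excluded]


def passes_tests(im: Image, ratio_range=(0.5, 2.0), min_pixels=10_000,
                 altitude_range=(0, 5_000)) -> bool:
    # Assumption: an image must pass all three tests; the claim leaves the
    # combination open. All limits are placeholder values.
    ratio = im.width / im.height if im.height else 0.0
    return (ratio_range[0] <= ratio <= ratio_range[1]
            and im.width * im.height > min_pixels
            and altitude_range[0] <= im.top <= altitude_range[1])


def gallery_score(images: Sequence[Image], page_width: int, page_height: int,
                  min_images: int = 6,
                  min_coverage: float = 0.3) -> Tuple[int, int, float]:
    second = [im for im in first_subset(images) if passes_tests(im)]
    count = len(second)
    coverage = sum(im.width * im.height for im in second) / (page_width * page_height)
    # Assumption: one point per signal that meets its minimum value.
    score = int(count >= min_images) + int(coverage >= min_coverage)
    return score, count, coverage


def result_format(score: int, threshold: int = 2) -> str:
    # Gallery pages get a dedicated search result format.
    return "gallery" if score >= threshold else "standard"


def largest_dom_path_group(images: Sequence[Image]) -> int:
    # Size of the largest group of images sharing the same DOM path.
    counts = Counter(im.dom_path for im in images if im.dom_path)
    return max(counts.values(), default=0)
```

A caller would compute `gallery_score(...)` once per crawled page and pass the resulting score to `result_format` when rendering that page's entry among the search results.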

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8938441B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for classifying web pages as gallery web pages, and for presenting search results for gallery web pages. In one aspect, a method includes receiving a web page that includes text and one or more images, evaluating one or more characteristics of the web page against predefined criteria, generating a…
Who is the assignee on this patent?
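Google Inc. The page data also lists Liao Yuguo and Wang Ning in this field; they are individuals and appear to be the inventors rather than assignees.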
What technology area does this patent fall under?
Primary CPC classification G06F17/30864. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 20, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).