Search spam analysis and detection

US8972401B2 · US · B2

Patent metadata
Field               Value
Publication number  US-8972401-B2
Application number  US-96612010-A
Country             US
Kind code           B2
Filing date         Dec 13, 2010
Priority date       May 31, 2007
Publication date    Mar 3, 2015
Grant date          Mar 3, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Defeating click-through cloaking includes retrieving a search results page to set a browser variable, inserting a link to a page into the search results page and clicking through to the page using the inserted link. Investigating cloaking includes providing script associated with a suspected spam URL, modifying the script to de-obfuscate the script and executing the modified script to reveal cloaking logic associated with the script.
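
The "modifying the script to de-obfuscate the script" step in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that the obfuscated script hides its logic inside an `eval(...)` call and that swapping `eval` for a logging call is enough to expose the decoded text; the abstract does not say which modification is used. The function name `neutralize_eval` and the `console.log` sink are hypothetical. The sketch covers only the rewrite, not the execution of the modified script.

```python
import re


def neutralize_eval(script: str, sink: str = "console.log") -> str:
    """Rewrite bare eval(...) calls so the decoded payload is logged, not run.

    Assumption: the cloaking logic is wrapped in eval(). Names that merely end
    in "eval" (myeval) or member calls (window.eval) are left untouched.
    """
    return re.sub(r"(?<![\w.$])eval\s*\(", f"{sink}(", script)


# Hypothetical obfuscated snippet: the payload is percent-encoded.
obfuscated = 'eval(unescape("%61%6c%65%72%74"));'
print(neutralize_eval(obfuscated))
```

Running the rewritten script in a sandboxed browser or JavaScript engine would then print the decoded source instead of executing it, which is one plausible way to read the cloaking logic.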

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: performing a search using a search engine to generate search results, the search results including a uniform resource locator (URL); performing server-side cloaking based at least in part on the URL, the server-side cloaking comprising: accessing, by a processor, a first web page from the URL by clicking the URL included in the search results, the first web page containing spam content; accessing, by the processor, a second web page from the URL without clicking the URL included in the search results, the second web page containing non-spam content; comparing, by the processor, the first web page and the second web page; and identifying, based at least on the comparing, the URL as a suspected spam URL based on a difference between the first web page and the second web page.

2. The computer-implemented method of claim 1 further comprising reducing false positives of the URL identified as the suspected spam URL.

3. The computer-implemented method of claim 1 further comprising, responsive to the URL being identified as a suspected spam URL, manually investigating the URL to determine that the URL is serving spam.

4. The computer-implemented method of claim 1 further comprising adding a domain of the URL to a set of known spam domains for exclusion from search results provided by a search engine.

5. A method comprising: receiving at least one URL at a computing device; conducting, by a processor, a first scan of the at least one URL by directly visiting the at least one URL; recording a first scan result based at least on the first scan; conducting, by the processor, a second scan of the at least one URL by clicking-through the at least one URL from a search results page; recording a second scan result based at least on the second scan; comparing, by the processor, the first scan result and the second scan result; and determining, based at least on the comparing, that the at least one URL is a suspect spam URL when the second scan result shows a vector associated with a redirection domain that is different from a vector associated with a redirection domain from the first scan result.

6. The method of claim 5, wherein conducting the second scan further comprises modifying a referrer field in an HTTP header to indicate a referrer URL from a search engine web site.

7. The method of claim 5, wherein conducting the first scan further comprises recording URL redirections as an XML file and conducting the second scan further comprises recording URL redirections as an XML file.

8. The method of claim 7 further comprising applying a redirection analysis to the XML file to classify a URL that redirected to a known spammer redirection domain as spam.

9. The method of claim 5 further comprising, responsive to the at least one URL being determined to be the suspect spam URL, manually investigating the at least one URL to determine that the at least one URL is serving spam.

10. The method of claim 5 further comprising adding a domain of the at least one URL to a set of known spam domains for exclusion from search results provided by a search engine.

11. The method of claim 5 further comprising reducing false positives of the at least one URL determined to be the suspect spam URL.

12. The method of claim 5, wherein directly visiting the at least one URL comprises entering the URL in an input window and not clicking the at least one URL.

13. A computing device comprising: computer storage media; and one or more processors in communication with the computer storage media, wherein the computer storage media stores computer-readable instructions executable by the one or more processors to perform operations comprising: performing a search using a search engine to generate search results, the search results including a uniform resource locator (URL); performing server-side cloaking based at least in part on the URL, the server-side cloaking comprising: accessing, by a processor, a first web page associated with the URL in the search results by clicking the URL included in the search results, the first web page containing spam content; accessing, by the processor, a second web page associated with the URL directly, wherein directly accessing the second web page comprises accessing the second web page associated with the URL independent of clicking the URL included in the search results, the second web page containing non-spam content; comparing, by the processor, the first web page and the second web page; identifying, based at least on the comparing, the URL as a suspected spam URL based on a difference between the first web page and the second web page.

14. The computing device of claim 13, wherein the operations further comprise reducing false positives that the URL is the suspected spam URL.

15. The computing device of claim 14, wherein directly accessing the second web page associated with the URL comprises entering the URL in an input window and not clicking the at least one URL.

16. The computing device of claim 13, wherein the difference between the first web page and the second web page indicates that the URL is the suspected spam URL is determined by comparison of vectors for redirection domains determined from accessing the first web page and accessing the second web page.

17. The computing device of claim 13, wherein the operations further comprise adding a domain of the URL to a set of known spam domains for exclusion from search results provided by a search engine.
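
The two-scan comparison in claims 5 to 7 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the "vector" of redirection domains is modeled as a plain set of host names, the click-through is simulated only by setting the `Referer` header (claim 6), and the XML layout, the `SEARCH_REFERRER` URL, and every function name are hypothetical. No network traffic is generated; the redirect lists would come from whatever crawler records them.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import Request

# Hypothetical search-results URL used as the referrer for the second scan.
SEARCH_REFERRER = "https://search.example.com/results?q=example"


def build_scan_request(url: str, click_through: bool) -> Request:
    """Build a request for a direct visit or a simulated click-through."""
    headers = {"User-Agent": "cloaking-scan-sketch/0.1"}
    if click_through:
        # Claim 6: make the visit look like it came from a search engine.
        headers["Referer"] = SEARCH_REFERRER
    return Request(url, headers=headers)


def redirection_domains(redirect_urls: list[str]) -> set[str]:
    """Collect the host names seen in a scan (assumed form of the 'vector')."""
    domains = set()
    for redirect_url in redirect_urls:
        host = urlparse(redirect_url).hostname
        if host:
            domains.add(host)
    return domains


def scan_to_xml(url: str, redirect_urls: list[str]) -> str:
    """Record one scan's redirections as XML (claim 7); the schema is made up."""
    root = ET.Element("scan", {"url": url})
    for redirect_url in redirect_urls:
        ET.SubElement(root, "redirect").text = redirect_url
    return ET.tostring(root, encoding="unicode")


def is_suspect(direct_redirects: list[str], click_redirects: list[str]) -> bool:
    """Flag the URL when the click-through scan reaches domains the direct scan did not."""
    extra = redirection_domains(click_redirects) - redirection_domains(direct_redirects)
    return bool(extra)
```

For example, if the direct visit stays on `example.test` while the click-through visit is also sent to a second, unrelated host, `is_suspect` returns `True`. A URL flagged this way is only a suspect; claims 2, 3, 9, and 11 add false-positive reduction and manual investigation before a domain is added to the known spam set.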

Assignees

Inventors

Classifications

  • Authenticating web pages, e.g. with suspicious links · CPC title

  • Computer malware detection or handling, e.g. anti-virus arrangements · CPC title

  • Electricity · mapped topic

  • Filtering policies (mail message filtering H04L51/212) · CPC title

  • Event detection, e.g. attack signature detection · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8972401B2 cover?
Defeating click-through cloaking includes retrieving a search results page to set a browser variable, inserting a link to a page into the search results page and clicking through to the page using the inserted link. Investigating cloaking includes providing script associated with a suspected spam URL, modifying the script to de-obfuscate the script and executing the modified script to reveal cl…
Who is the assignee on this patent?
Wang Yi-Min, Ma Ming, Microsoft Corp
What technology area does this patent fall under?
Primary CPC classification H04L63/1416. Mapped technology areas include Electricity.
When was this patent published?
Publication date Mar 3, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).