Methods and systems of classifying spam URL

US9378465B2 · US · B2

Patent metadata
Publication number: US-9378465-B2
Application number: US-201313872811-A
Country: US
Kind code: B2
Filing date: Apr 29, 2013
Priority date: Apr 29, 2013
Publication date: Jun 28, 2016
Grant date: Jun 28, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of operation of a URL spam detection system includes: identifying a feature dimension of a user action on a social networking system to detect anomalies; extracting URL chunks from a content associated with the user action; aggregating a non-content feature of the user action along the feature dimension into a URL distribution store to produce a feature distribution for each of the URL chunks; determining whether the feature distribution of a particular URL chunk within the URL chunks exceeds an expectation threshold for the feature dimension; and classifying the particular URL chunk as an illegitimate URL when the feature distribution exceeds the expectation threshold to restrict access to a particular URL chunk on a social networking system.
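The pipeline in the abstract can be sketched in Python. This is a minimal, hypothetical sketch and not the patentee's implementation. The chunking scheme reads claim 11's "subsets of the embedded URL delimited by one or more punctuations" in one specific way: host labels are stripped one at a time, and `/`-delimited path prefixes are added. The scheme does not follow redirects. The only feature dimension is an assumed binary non-content feature, such as "the sharing account is less than a day old", modeled as the binomial distribution of claim 8. `expected_rate` and `min_actions` are illustrative stand-ins for the expectation threshold. The names `extract_url_chunks`, `UrlDistributionStore`, and `classify` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit


def extract_url_chunks(url: str) -> list[str]:
    """Split a URL into punctuation-delimited chunks (hypothetical scheme).

    Host chunks drop leading labels one at a time; path chunks extend a naive
    base domain (last two labels, ignoring suffixes like co.uk) one "/"
    segment at a time. Redirects are not followed in this sketch.
    """
    parts = urlsplit(url if "//" in url else "//" + url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    labels = host.split(".")
    # Stop before the last label so a bare TLD such as "com" is never a chunk.
    chunks = [".".join(labels[i:]) for i in range(len(labels) - 1)]
    prefix = ".".join(labels[-2:])
    for segment in parts.path.split("/"):
        if segment:
            prefix = f"{prefix}/{segment}"
            chunks.append(prefix)
    return chunks


@dataclass
class BinomialDistribution:
    """Per-chunk counts for one binary non-content feature (claim 8)."""
    positives: int = 0  # actions where the feature was present
    total: int = 0      # all actions whose content contained the chunk

    @property
    def rate(self) -> float:
        return self.positives / self.total if self.total else 0.0


class UrlDistributionStore:
    """Maps each unique URL chunk to its aggregated feature distribution."""

    def __init__(self) -> None:
        self.distributions: dict[str, BinomialDistribution] = defaultdict(
            BinomialDistribution
        )

    def aggregate(self, url: str, feature_present: bool) -> None:
        # Every chunk of the shared URL receives the same observation.
        for chunk in extract_url_chunks(url):
            dist = self.distributions[chunk]
            dist.total += 1
            dist.positives += int(feature_present)


def classify(store: UrlDistributionStore, chunk: str,
             expected_rate: float, min_actions: int = 20) -> str:
    """Flag a chunk whose feature rate exceeds the expectation threshold."""
    dist = store.distributions.get(chunk)  # .get() does not insert a default
    if dist is None or dist.total < min_actions:
        return "unknown"  # too little evidence to judge either way
    return "illegitimate" if dist.rate > expected_rate else "legitimate"


# Demo: 27 of 30 shares of the first URL come from brand-new accounts,
# versus 2 of 30 shares of the second URL.
store = UrlDistributionStore()
for i in range(30):
    store.aggregate("http://promo.spam.example.com/win", feature_present=i < 27)
    store.aggregate("https://news.example.org/story", feature_present=i < 2)

for chunk in ("promo.spam.example.com", "news.example.org"):
    print(chunk, classify(store, chunk, expected_rate=0.2))
```

Every chunk of a URL receives the same update, so a parent chunk such as `example.com` accumulates evidence from all of its subdomains and paths. This loosely mirrors the parent/sibling relationship that claim 12 relies on. The sketch omits the moving time window of claim 5 and the machine-learned thresholds of claims 6 and 7.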

Claims

Full claim text (claims 1 to 21).

What is claimed is:

1. A method, comprising: identifying a feature dimension on a social networking system to detect anomalies, the feature dimension being a non-content feature dimension; extracting URL chunks from content associated with a user action, wherein the user action records an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system; maintaining a plurality of feature distributions respectively corresponding to a plurality of unique URL chunks identified in content of a plurality of user actions occurring on the social networking system, wherein each of the feature distributions represents an aggregation of non-content features along the identified feature dimension across the plurality of user actions for a unique URL chunk of the plurality of unique URL chunks; aggregating a non-content feature of the user action along the identified feature dimension into a subset of the plurality of feature distributions respectively corresponding to the extracted URL chunks; determining whether a feature distribution of a particular URL chunk from the plurality of feature distributions of the URL chunks exceeds an expectation threshold for the feature dimension, wherein the expectation threshold corresponds to a characterization of an expected distribution along the identified feature dimension; and classifying the particular URL chunk as an illegitimate URL when the feature distribution exceeds the expectation threshold to restrict access to the particular URL chunk on a social networking system.

2. The method of claim 1, wherein identifying the feature dimension includes identifying the feature dimension of one or more content sharing actions to disseminate content in the social networking system.

3. The method of claim 1, wherein identifying the feature dimension includes identifying the feature dimension of one or more association actions of one or more user accounts to associate with content in the social networking system.

4. The method of claim 1, wherein identifying the feature dimension includes identifying the feature dimension of one or more indirect association actions of one or more user accounts to associate with a social object affiliated with content in the social networking system.

5. The method of claim 1, wherein aggregating the non-content feature includes aggregating within a time window wherein the feature distribution is a moving distribution along the feature dimension.

6. The method of claim 1, further comprising determining the expectation threshold by machine learning against known reliable URL chunks and known spam URL chunks.

7. The method of claim 1, further comprising determining the expectation threshold by machine learning against known spammer user accounts and known reliable user accounts.

8. The method of claim 1, wherein the feature distribution is a binomial distribution of whether the non-content feature exists for the user action.

9. The method of claim 1, wherein the feature distribution is a discrete distribution of enumerated states along the feature dimension.

10. The method of claim 1, wherein the feature distribution is a continuous distribution along the feature dimension.

11. The method of claim 1, wherein extracting the URL chunks includes extracting the URL chunks from an embedded URL and one or more redirects of the embedded URL, the URL chunks being one or more subsets of the embedded URL delimited by one or more punctuations.

12. The method of claim 11, wherein classifying the particular URL chunk is based on classification of a related URL chunk in a sibling family tree of the particular URL chunk, the sibling family tree and the particular URL chunk sharing a parent domain URL chunk.

13. A method, comprising: identifying a feature dimension on a social networking system to detect anomalies; extracting URL chunks from content associated with a user action, wherein the user action is an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system; aggregating a sender feature of the user action along the identified feature dimension into a plurality of feature distributions respectively corresponding the extracted URL chunks; detecting an anomaly in a feature distribution of a particular URL chunk, the feature distribution from the plurality of feature distributions of the extracted URL chunks, wherein said detecting includes comparing the feature distribution to an expected distribution along the feature dimension; and raising a suspicion level of the particular URL chunk when the anomaly is detected.

14. The method of claim 13, wherein the expected distribution is a superset feature distribution of a superset URL chunk containing the particular URL chunk.

15. The method of claim 13, wherein the expected distribution is a white list feature distribution of known reliable URL chunks.

16. The method of claim 13, wherein raising the suspicion level includes raising the suspicion level when a pre-defined number of anomalies are detected along multiple feature dimensions.

17. The method of claim 13, wherein raising the suspicion level includes classifying the particular URL chunk under a specific type of illegitimate sharing channel.

18. The method of claim 13, wherein raising the suspicion level includes storing the suspicion level associated with the particular URL chunk in a classification table for a filter module restricting execution of the user action.

19. The method of claim 13, further comprising: tracking the feature distribution to determine whether the anomaly of the feature distribution subsides within an acceptable threshold range of the expected distribution; and lowering the suspicion level when the anomaly subsides.

20. A processor-based system, comprising: a feature collector module stored on a non-transitory memory, when executed by a processor is configured to: identify a feature dimension on a social networking system, the feature dimension being a non-content feature dimension; extract URL chunks from content associated with a user action, wherein the user action is an interaction between a user account and a content object and wherein the user action is captured by an action logger of the social networking system; aggregate a sender feature of the user action along the feature dimension into a plurality of feature distributions respectively corresponding to the extracted URL chunks, the plurality of feature distributions stored in a URL distribution store; and a URL classifier module stored on a non-transitory memory, when executed by a processor is coupled to the feature collection module via the URL distribution store and configured to: detect an anomaly in a feature distribution of a particular URL chunk, the feature distribution from the plurality of feature distributions of the extracted URL chunks, by comparing the feature distribution to an expected distribution; and raise a suspicion level of the particular URL chunk when the anomaly is detected.

21. The method of claim 1, wherein the expectation threshold corresponds to an expected range, expected mean, expected median, an expected mode, an expected variance, or any combination thereof, of the feature distribution.
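Claims 13 to 21 add a suspicion level that rises and falls with detected anomalies. The Python sketch below is hypothetical and is not the patentee's implementation. For each sender-feature dimension, it compares the observed mean against an expected range. A range is one of the characterizations listed in claim 21; here it is imagined as derived from a white list of reliable URL chunks, as in claim 15. A chunk's suspicion level rises only when at least `min_anomalies` dimensions are anomalous (claim 16). It falls again once every dimension is back in range (claim 19). Several details are assumptions rather than details from the patent: the dimension names, the ranges, the class names `ExpectedRange` and `UrlClassifier`, and the choice to hold the level steady when only some dimensions are anomalous.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class ExpectedRange:
    """Acceptable band for the mean of one feature dimension (claim 21).

    Hypothetically derived offline from a white list of known reliable
    URL chunks (claim 15).
    """
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass
class UrlClassifier:
    expected: dict[str, ExpectedRange]
    min_anomalies: int = 2  # claim 16: anomalies required before raising
    # Stand-in for the classification table a filter module reads (claim 18).
    suspicion: dict[str, int] = field(default_factory=dict)

    def anomalous_dimensions(self, samples: dict[str, list[float]]) -> list[str]:
        """Dimensions whose observed mean falls outside the expected range."""
        return [
            dim
            for dim, values in samples.items()
            if dim in self.expected
            and values
            and not self.expected[dim].contains(fmean(values))
        ]

    def update(self, chunk: str, samples: dict[str, list[float]]) -> int:
        anomalies = self.anomalous_dimensions(samples)
        level = self.suspicion.get(chunk, 0)
        if len(anomalies) >= self.min_anomalies:
            level += 1  # enough anomalies: raise suspicion (claims 13, 16)
        elif not anomalies and level > 0:
            level -= 1  # anomaly has subsided: lower suspicion (claim 19)
        # With some but too few anomalies, the level is left unchanged.
        self.suspicion[chunk] = level
        return level


# Hypothetical sender-feature dimensions and their expected bands.
EXPECTED = {
    "sender_account_age_days": ExpectedRange(30, 3000),
    "sender_friend_count": ExpectedRange(20, 2000),
}
# One time window of sharers dominated by new, friendless accounts.
BURST = {"sender_account_age_days": [1, 2, 1, 3], "sender_friend_count": [0, 1, 2, 0]}
# One time window of sharers that look like ordinary accounts.
CALM = {"sender_account_age_days": [400, 900, 120], "sender_friend_count": [150, 300, 90]}

clf = UrlClassifier(EXPECTED)
for window in (BURST, BURST, CALM, CALM):
    print(clf.update("spam.example.com", window))
```

A plain dict stands in for the classification table of claim 18. A filter module could restrict a user action once the level crosses a cutoff of its choosing, but the claims do not specify that policy.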

Frequently asked questions

What does patent US9378465B2 cover?
It covers a URL spam detection method for social networking systems. URLs in content attached to user actions are split into URL chunks. A non-content feature of each action is aggregated into a per-chunk feature distribution. A chunk is classified as an illegitimate URL when its distribution exceeds an expectation threshold, so that access to it can be restricted. The full summary is in the Abstract section.
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
It was published on Jun 28, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).