Detecting content items in violation of an online system policy using semantic vectors

US11195099B2 · US · B2

Patent metadata
Publication number: US-11195099-B2
Application number: US-201715694321-A
Country: US
Kind code: B2
Filing date: Sep 1, 2017
Priority date: Sep 1, 2017
Publication date: Dec 7, 2021
Grant date: Dec 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A content review system for an online system automatically determines if received content items to be displayed to users violate any policies of the online system. The content review system generates a semantic vector representing the semantic features of a content item, for example, using a neural network. By comparing the semantic vector for the content item with semantic vectors of content items previously determined to violate one or more policies, the content review system determines whether the content item also violates one or more policies. The content review system may also maintain templates corresponding to portions of semantic vectors shared by multiple content items. An analysis of historical content items that conform to the template is performed to determine a probability that received content items that conform to the template violate a policy.
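The template analysis described at the end of the abstract can be illustrated with a minimal sketch. It assumes binary semantic vectors and represents a template as a mapping from vector positions to required values; that representation, the names `conforms` and `violation_probability`, and the sample history are all hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a template fixes the values at some
# positions of a semantic vector (position -> required bit).
Template = Dict[int, int]


def conforms(vector: Sequence[int], template: Template) -> bool:
    """Return True if the vector matches the template at every fixed position."""
    return all(vector[pos] == bit for pos, bit in template.items())


def violation_probability(
    template: Template,
    history: List[Tuple[Sequence[int], bool]],
) -> float:
    """Estimate the share of conforming historical items that violated a policy.

    `history` pairs each past semantic vector with its review outcome.
    Returning 0.0 when nothing conforms is an arbitrary choice for this sketch.
    """
    outcomes = [violated for vec, violated in history if conforms(vec, template)]
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Made-up historical data: (semantic vector, was it found to violate a policy?)
history = [
    ([1, 0, 1, 1], True),
    ([1, 0, 0, 1], True),
    ([1, 0, 1, 0], False),
    ([0, 1, 1, 1], False),
]
template = {0: 1, 1: 0}  # the first two positions are shared by several items
p = violation_probability(template, history)
print(f"estimated violation probability: {p:.2f}")
```

In this made-up history, three items conform to the template and two of them violated a policy, so the estimate is two thirds. A real system would presumably need far more history per template before trusting such an estimate.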

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by an online system associated with a plurality of policies, a request to determine whether an input content item violates any of the plurality of policies, each policy indicating a type of content item considered suitable or unsuitable for presenting to users of the online system; accessing one or more data stores storing, for each of the plurality of policies, a respective set of semantic vectors corresponding to content items previously determined to violate the policy; determining a semantic vector of the input content item, by: receiving the input content item as input at a deep neural network having a plurality of layers; determining the semantic vector representing the input content item based on the output of a hidden layer of the deep neural network; determining a plurality of distance metric values, each corresponding to a distance between the semantic vector of the input content item and one or more semantic vectors of the set of semantic vectors corresponding to a respective policy of the plurality of policies; and responsive to determining that a distance metric value of the plurality of distance metric values associated with a particular policy of the plurality of policies is below a first threshold value indicating a distance at which a likelihood that the input content item violates the particular policy is at least a first threshold probability, automatically withholding the content item from users of the online system.

2. The method of claim 1, wherein the semantic vector representing the input content item is obtained by applying a hash function to the output of the hidden layer of the neural network.

3. The method of claim 1, wherein the neural network is configured to output one or more scores indicating a probability that the input content item contains one or more corresponding semantic features.

4. The method of claim 1, further comprising: receiving a request to determine whether a second input content item violates any of the plurality of policies; determining a second plurality of distance metric values, each corresponding to a distance between a semantic vector of the second input content item and one or more semantic vectors of the set of semantic vectors corresponding to a respective policy of the plurality of policies; determining that a distance metric value of the second plurality distance metric values associated with the particular policy is above a second threshold value, determining that the second input content item does not violate the particular policy; and responsive to determining that the second content item does not violate any of the policies of the plurality of policies, sending the second content item to the users of the online system.

5. The method of claim 4, wherein the second threshold value is greater than the first threshold value, and indicates a distance at which a likelihood that the second input content item violates the particular is at least a second threshold probability that is lower than the first threshold probability.

6. The method of claim 1, further comprising: responsive to determining that the distance metric value is above the first threshold value but below a second threshold value, flagging the content item for review.

7. The method of claim 1, further comprising: responsive to determining that the distance metric value is below the first threshold value, storing the semantic vector of the input content item as part of the set of semantic vectors.

8. The method of claim 1, wherein the semantic vector corresponds to a binary hash value, and wherein the distance metric value corresponds to a hamming distance.

9. A method comprising: receiving, by an online system associated with a plurality of policies, a request to determine whether an input content item violates any of the plurality of policies, each policy indicating a type of content item considered suitable or unsuitable for presenting to users of the online system; accessing one or more data stores storing, for each of the plurality of policies, a respective set of semantic vectors corresponding to content items previously determined to violate the policy; determining a semantic vector of the input content item using a deep neural network having a plurality of layers; determining a plurality of distance metric values, each corresponding to a distance between the semantic vector of the input content item and one or more semantic vectors of the set of semantic vectors corresponding to a respective policy of the plurality of policies; and responsive to determining that a distance metric value of the plurality of distance metric values associated with a particular policy of the plurality of policies is below a first threshold value indicating a distance at which a likelihood that the input content item violates the particular policy is at least a first threshold probability, automatically withholding the content item from users of the online system.

10. The method of claim 9, further comprising: receiving a request to determine whether a second input content item violates any of the plurality of policies; determining a second plurality of distance metric values, each corresponding to a distance between a semantic vector of the second input content item and one or more semantic vectors of the set of semantic vectors corresponding to a respective policy of the plurality of policies; determining that a distance metric value of the second plurality distance metric values associated with the particular policy is above a second threshold value, determining that the second input content item does not violate the particular policy; and responsive to determining that the second content item does not violate any of the policies of the plurality of policies, sending the second content item to the users of the online system.

11. The method of claim 9, further comprising: responsive to determining that the distance metric value is above the first threshold value but below a second threshold value, flagging the content item for review.

12. The method of claim 9, further comprising: responsive to determining that the distance metric value is below the first threshold value, storing the semantic vector of the input content item as part of the set of semantic vectors.

13. The method of claim 9, wherein the semantic vector corresponds to an embedding associated with the input content item obtained using the deep neural network.

14. The method of claim 9, wherein the semantic vector corresponds to a binary hash value, and wherein the distance metric value corresponds to a hamming distance.

15. The method of claim 9, wherein the input content item comprises an image.

16. A computer readable non-transitory storage medium, storing instructions for: receiving, by an online system associated with a plurality of policies, a request to determine whether an input content item violates any of the plurality of policies, each policy indicating a type of content item considered suitable or unsuitable for presenting to users of the online system; accessing one or more data stores storing, for each of the plurality of policies, a respective set of semantic vectors corresponding to content items previously determined to violate the policy; determining a semantic vector of the input content item using a deep neural network having a plurality of layers; determining a plurality of distance metric values, each corresponding to a distance between the semantic vector of the input content item and one or more semantic
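The threshold logic recited in the claims (withhold below a first threshold, flag for review between the two thresholds, release above the second) can be sketched roughly as follows, using the binary hash and Hamming distance variant. Several details here are assumptions made for illustration and are not stated in the claims: sign thresholding stands in for the unspecified hash function, the per-policy distance is taken as the minimum over stored vectors, distances exactly equal to a threshold are sent to review, and all function names and sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def binarize(hidden_output: Sequence[float]) -> List[int]:
    """Assumed stand-in hash: map hidden-layer activations to bits by sign."""
    return [1 if x > 0 else 0 for x in hidden_output]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def review(
    vector: Sequence[int],
    policy_vectors: Dict[str, List[List[int]]],
    first_threshold: int,
    second_threshold: int,
) -> str:
    """Return 'withhold', 'flag_for_review', or 'allow' for one content item."""
    # One distance per policy: the closest previously violating vector
    # (taking the minimum is an assumption of this sketch).
    distances = {
        policy: min(hamming(vector, stored) for stored in vectors)
        for policy, vectors in policy_vectors.items()
        if vectors
    }
    violated = [p for p, d in distances.items() if d < first_threshold]
    if violated:
        # Remember the new violating vector for future comparisons.
        for policy in violated:
            policy_vectors[policy].append(list(vector))
        return "withhold"
    if all(d > second_threshold for d in distances.values()):
        return "allow"
    # Everything else, including exact ties with a threshold, goes to review.
    return "flag_for_review"


# Made-up per-policy stores of previously violating semantic vectors.
store = {
    "spam": [[1, 1, 0, 0, 1, 0, 1, 1]],
    "violence": [[0, 0, 0, 0, 1, 1, 1, 1]],
}
item = binarize([0.9, 0.2, -0.3, -0.1, 0.5, -0.7, 0.4, -0.2])
decision = review(item, store, first_threshold=2, second_threshold=4)
print(decision)
```

In this example the item's hash differs from the stored "spam" vector in a single bit, which is below the first threshold of 2, so the sketch withholds the item and adds its vector to the "spam" set. In practice the thresholds would be calibrated so that each distance corresponds to the intended violation probability; the values 2 and 4 are placeholders.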

Assignees

Facebook Inc

Inventors

Classifications

  • Business processes related to social networking or social networking services

  • Knowledge-based neural networks; Logical representations of neural networks

  • Combinations of networks

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11195099B2 cover?
A content review system for an online system automatically determines if received content items to be displayed to users violate any policies of the online system. The content review system generates a semantic vector representing the semantic features of a content item, for example, using a neural network. By comparing the semantic vector for the content item with semantic vectors of content i…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6218. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 7, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).