Filtering automated selection of keywords for computer modeling

US10353963B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10353963-B2
Application number | US-201414577945-A
Country | US
Kind code | B2
Filing date | Dec 19, 2014
Priority date | Dec 19, 2014
Publication date | Jul 16, 2019
Grant date | Jul 16, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A social networking system receives messages from users that include links to webpages that designate keywords of the webpage. The social networking system identifies webpages linked by users to generate computer models that predict whether a webpage or message should be associated with particular keywords. The social networking system generates computer models that are trained on example webpages and related keywords linked by users in messages. Prior to generating computer models, the social networking system applies one or more filters to exclude webpages and keywords from consideration. The filters may exclude webpages that have low-reliability, are associated with an excessive number of keywords, or keywords that appear on an insufficient number of domains. After training the computer models, messages composed by users may be analyzed and a keyword predicted for the message, which may be suggested to the user to categorize the message.
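The training and prediction steps in the abstract can be illustrated with a minimal sketch. The abstract does not say which model type is used, so the naive-Bayes-style word scorer below is only a stand-in, and every name in it (`KeywordClassifier`, `train_classifiers`, `suggest_keywords`, the `(text, keywords)` example shape) is hypothetical rather than taken from the patent. It assumes the eligible keywords have already been chosen by the filters.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; keeps alphanumeric tokens only."""
    return [t for t in text.lower().split() if t.isalnum()]


class KeywordClassifier:
    """One binary classifier per keyword (illustrative stand-in model)."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.pos = Counter()  # token counts in texts tagged with the keyword
        self.neg = Counter()  # token counts in all other texts

    def fit(self, examples):
        # examples: list of (text, set_of_keywords) pairs
        for text, keywords in examples:
            target = self.pos if self.keyword in keywords else self.neg
            target.update(tokenize(text))
        return self

    def score(self, text):
        # Sum of Laplace-smoothed log-likelihood ratios over known tokens.
        pos_total = sum(self.pos.values())
        neg_total = sum(self.neg.values())
        vocab = len(set(self.pos) | set(self.neg)) or 1
        total = 0.0
        for tok in tokenize(text):
            if tok not in self.pos and tok not in self.neg:
                continue  # unseen tokens carry no evidence either way
            p = (self.pos[tok] + 1) / (pos_total + vocab)
            n = (self.neg[tok] + 1) / (neg_total + vocab)
            total += math.log(p / n)
        return total

    def predict(self, text):
        # Classifier output: should this text be associated with the keyword?
        return self.score(text) > 0


def train_classifiers(examples, eligible_keywords):
    """Train one classifier for each eligible keyword."""
    return {kw: KeywordClassifier(kw).fit(examples) for kw in eligible_keywords}


def suggest_keywords(classifiers, message):
    """Apply every classifier to the message and collect the positive keywords."""
    return sorted(kw for kw, clf in classifiers.items() if clf.predict(message))


# Made-up training data: page text paired with the keywords the page designates.
examples = [
    ("team wins football match", {"sports"}),
    ("football league final score", {"sports"}),
    ("election vote results parliament", {"politics"}),
    ("parliament passes budget vote", {"politics"}),
]
classifiers = train_classifiers(examples, {"sports", "politics"})
print(suggest_keywords(classifiers, "football match tonight"))
```

Keeping one independent classifier per keyword mirrors the structure described in the abstract and claims: a message can receive zero, one, or several keywords, because each classifier makes its own yes-or-no decision.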

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying a set of webpages linked by a set of messages in a social networking system; identifying a plurality of keywords designated by the set of web pages; applying one or more exclusionary filters to identify a set of eligible keywords by excluding keywords in the plurality of keywords that meet at least one of the exclusionary filters; training a plurality of keyword classifiers, each keyword classifier corresponding to a single eligible keyword in the set of eligible keywords, the keyword classifier providing a classifier output indicating whether a message should be associated with the eligible keyword corresponding to the keyword classifier; and identifying one or more keywords for a subject message based on the classifier outputs of the plurality of keyword classifiers applied to the subject message.

2. The method of claim 1, wherein the one or more exclusionary filters include a frequency filter that excludes a keyword from the set of eligible keywords based on a percentage that the keyword is used by a set of webpages in a domain.

3. The method of claim 2, wherein the frequency filter excludes the keyword from the set of eligible keywords when the frequency that the keyword is used by the set of webpages in the domain is higher than a frequency threshold.

4. The method of claim 3, wherein the frequency threshold is seventy percent.

5. The method of claim 1, wherein the one or more exclusionary filters include a popularity filter that excludes a keyword from the set of eligible keywords based on a number of domains for the set of webpages that include the keyword.

6. The method of claim 5, wherein the popularity filter excludes the keyword from the set of eligible keywords when the number of domains for the set of webpages that include the keyword is less than 10.

7. The method of claim 1, wherein the keywords are predicted based on a webpage linked in the subject message.

8. The method of claim 1, wherein the subject message includes a link to a subject webpage, and the plurality of keyword classifiers are configured to provide the classifier output based on a webpage linked by the subject webpage.

9. The method of claim 1, wherein the subject message does not include a link to a webpage.

10. The method of claim 1, wherein the subject message is received from a composing user, and further comprising suggesting to the composing user that the composing user include the one or more identified keywords in the subject message.

11. The method of claim 1, further comprising identifying one or more topics for the subject message based on the one or more identified keywords for the subject message.

12. The method of claim 1, further comprising identifying one or more social networking objects for the subject message based on the one or more identified keywords for the subject message.

13. A non-transitory computer-readable medium comprising instructions for execution by a processor, the instructions causing the processor to perform steps of: identifying a set of webpages linked by a set of messages in a social networking system; identifying a plurality of keywords designated by the set of web pages; applying one or more exclusionary filters to identify a set of eligible keywords by excluding keywords in the plurality of keywords that meet at least one of the exclusionary filters; training a plurality of keyword classifiers, each keyword classifier corresponding to a single eligible keyword in the set of eligible keywords, the keyword classifier providing a classifier output indicating whether a message should be associated with the eligible keyword corresponding to the keyword classifier; and identifying one or more keywords for a subject message based on the classifier outputs of the plurality of keyword classifiers applied to the subject message.

14. The non-transitory computer-readable medium of claim 13, wherein the one or more exclusionary filters include a frequency filter that excludes a keyword from the set of eligible keywords based on a percentage that the keyword is used by a set of webpages in a domain.

15. The non-transitory computer-readable medium of claim 14, wherein the frequency filter excludes the keyword from the set of eligible keywords when the frequency that the keyword is used by the set of webpages in the domain is higher than a frequency threshold.

16. The non-transitory computer-readable medium of claim 15, wherein the frequency threshold is seventy percent.

17. The non-transitory computer-readable medium of claim 13, wherein the one or more exclusionary filters include a popularity filter that excludes a keyword from the set of eligible keywords based on a number of domains for the set of webpages that include the keyword.

18. The non-transitory computer-readable medium of claim 17, wherein the popularity filter excludes the keyword from the set of eligible keywords when the number of domains for the set of webpages that include the keyword is less than 10.

19. The non-transitory computer-readable medium of claim 13, wherein the keywords are predicted based on a webpage linked in the subject message.

20. The non-transitory computer-readable medium of claim 13, wherein the subject message includes a link to a subject webpage, and the plurality of keyword classifiers are configured to provide the classifier output based on a webpage linked by the subject webpage.

21. The non-transitory computer-readable medium of claim 13, wherein the subject message does not include a link to a webpage.

22. The non-transitory computer-readable medium of claim 13, wherein the subject message is received from a composing user, and the steps further comprising suggesting to the composing user that the composing user include the one or more identified keywords in the subject message.

23. The non-transitory computer-readable medium of claim 13, the steps further comprising identifying one or more topics for the subject message based on the one or more identified keywords for the subject message.

24. The non-transitory computer-readable medium of claim 13, the steps further comprising identifying one or more social networking objects for the subject message based on the one or more identified keywords for the subject message.
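The frequency filter (claims 2 to 4) and the popularity filter (claims 5 and 6) can be sketched as follows. This is one possible reading, not the patented implementation. The function name `eligible_keywords` and the `(url, keywords)` input shape are hypothetical. The sketch also assumes that a domain is the URL's host and that a keyword is dropped if it exceeds the frequency threshold in any single domain; the claims do not spell out either point.

```python
from collections import defaultdict
from urllib.parse import urlparse

FREQUENCY_THRESHOLD = 0.70  # claim 4: seventy percent
MIN_DOMAINS = 10            # claim 6: fewer than 10 domains is excluded


def eligible_keywords(pages,
                      frequency_threshold=FREQUENCY_THRESHOLD,
                      min_domains=MIN_DOMAINS):
    """Return keywords that pass both exclusionary filters.

    pages: iterable of (url, keywords) pairs, where keywords is the set of
    keywords the page designates (hypothetical input shape).
    """
    pages_per_domain = defaultdict(int)
    # keyword -> domain -> number of pages in that domain using the keyword
    uses = defaultdict(lambda: defaultdict(int))

    for url, keywords in pages:
        domain = urlparse(url).netloc  # assumption: domain == URL host
        pages_per_domain[domain] += 1
        for kw in set(keywords):
            uses[kw][domain] += 1

    eligible = set()
    for kw, by_domain in uses.items():
        # Frequency filter: keyword appears on too large a share of one
        # domain's pages (likely site-wide boilerplate, not a topic).
        too_frequent = any(
            count / pages_per_domain[domain] > frequency_threshold
            for domain, count in by_domain.items()
        )
        # Popularity filter: keyword appears on too few distinct domains.
        too_rare = len(by_domain) < min_domains
        if not (too_frequent or too_rare):
            eligible.add(kw)
    return eligible
```

With this reading, a keyword that one site attaches to nearly all of its pages is excluded by the frequency filter, and a keyword seen on only a handful of sites is excluded by the popularity filter. Both thresholds are parameters so that values other than those in claims 4 and 6 can be tried.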

Assignees

Facebook Inc

Inventors

Classifications

  • Machine learning · CPC title

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10353963B2 cover?
A social networking system receives messages from users that include links to webpages that designate keywords of the webpage. The social networking system identifies webpages linked by users to generate computer models that predict whether a webpage or message should be associated with particular keywords. The social networking system generates computer models that are trained on example webpa…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 16, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).