Training image-recognition systems based on search queries on online social networks

US10083379B2 · US · B2

Patent metadata
Publication number: US-10083379-B2
Application number: US-201615277950-A
Country: US
Kind code: B2
Filing date: Sep 27, 2016
Priority date: Sep 27, 2016
Publication date: Sep 25, 2018
Grant date: Sep 25, 2018

Abstract

Official abstract text for this publication.

In one embodiment, a method includes receiving a plurality of search queries comprising n-grams; identifying a subset of the plurality of search queries as being queries for visual-media items based on one or more n-grams of the search query being associated with visual-media content; calculating, for each of the n-grams of the search queries of the subset, a popularity-score based on a count of the search queries in the subset that include the n-gram; determining popular n-grams, wherein each of the popular n-grams is an n-gram of the search queries of the subset of search queries having a popularity-score greater than a threshold popularity-score; and selecting one or more of the popular n-grams for training a visual-concept recognition system, wherein each of the popular n-grams is selected based on whether it is associated with a visual concept.
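The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the `VISUAL_NGRAMS` vocabulary, the whitespace tokenizer, the `max_n` limit, and all function names are assumptions made for the example. The final step (selecting only those popular n-grams that are associated with a visual concept) is not implemented here.

```python
from collections import Counter

# Hypothetical vocabulary of n-grams associated with visual-media content.
VISUAL_NGRAMS = {"photos", "pictures", "videos"}


def ngrams(query, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) found in a query."""
    words = query.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def is_visual_query(query):
    """Treat a query as a visual-media query if any of its n-grams is in the vocabulary."""
    return bool(ngrams(query) & VISUAL_NGRAMS)


def popular_ngrams(queries, threshold):
    """Count, per n-gram, the visual-media queries that include it; keep counts above threshold."""
    subset = [q for q in queries if is_visual_query(q)]
    counts = Counter()
    for q in subset:
        # ngrams() returns a set, so each query contributes at most 1 per n-gram.
        counts.update(ngrams(q))
    return {gram: count for gram, count in counts.items() if count > threshold}


queries = [
    "sunset photos",
    "sunset pictures",
    "beach photos",
    "weather tomorrow",  # no visual-media n-gram, so excluded from the subset
    "sunset videos",
]
print(popular_ngrams(queries, threshold=1))
```

With these sample queries, "sunset" (3 queries) and "photos" (2 queries) exceed the threshold of 1. Note that "photos" signals visual-media intent rather than a visual concept, which is why the abstract adds a separate selection step before training.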

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising, by one or more computing systems: receiving, from a plurality of client systems of a plurality of users, a plurality of search queries, each search query comprising one or more n-grams; identifying a subset of search queries from the plurality of search queries as being queries for visual-media items, each of the search queries in the subset of search queries being identified based on one or more n-grams of the search query being associated with visual-media content; calculating, for each of the n-grams of the search queries of the subset of search queries, a popularity-score based on a count of the search queries in the subset of search queries that include the n-gram, wherein, for each of one or more of the n-grams of the search queries of the subset of search queries, the count of the search queries including the n-gram is a weighted count that weights an occurrence of each search query based on a degree of confidence with which the search query is identified as being a query for visual-media items; determining one or more popular n-grams, wherein each of the popular n-grams is an n-gram of the search queries of the subset of search queries having a popularity-score greater than a threshold popularity-score; and selecting one or more of the popular n-grams for training a visual-concept recognition system, wherein each of the popular n-grams is selected based on whether it is associated with one or more visual concepts.

2. The method of claim 1, further comprising: accessing a social graph comprising a plurality of nodes and a plurality of edges connecting the nodes, each of the edges between two of the nodes representing a single degree of separation between them, the nodes comprising: a first node corresponding to a user associated with an online social network; and a plurality of second nodes that each correspond to a visual-media item or a visual concept associated with the online social network.

3. The method of claim 1, wherein each of one or more of the search queries in the subset of search queries is further identified based on a number of visual-media items that match the search query.

4. The method of claim 1, wherein each of one or more of the search queries in the subset of search queries is further identified based on a number of times that prior searches including one or more n-grams of the search query resulted in a user requesting to access a visual-media item.

5. The method of claim 1, wherein each of one or more of the search queries in the subset of search queries is further identified based on whether it is a bounded search query that specifically filters for search results having visual-media content.

6. The method of claim 1, wherein each of one or more of the search queries in the subset of search queries is further identified based on a search context from which the search query is submitted.

7. The method of claim 1, wherein the weighted count further weights the occurrence of each search query based on information associated with a user of a client system from which the search query was received.

8. The method of claim 7, wherein the information associated with the user comprises demographic information of the user.

9. The method of claim 7, wherein the information associated with the user comprises geo-location information of the user, the geo-location information being determined based on a geo-location of the client system from which the search query is received.

10. The method of claim 7, wherein the information associated with the user comprises a search history of the user.

11. The method of claim 1, further comprising determining whether each of the popular n-grams is associated with a particular visual concept based on an analysis of a joint-embedding model, the analysis comprising, for each of the popular n-grams: determining if an embedding for the popular n-gram is within a threshold area of embeddings for one or more visual-media items that include the particular visual concept.

12. The method of claim 1, further comprising selecting one or more visual concepts for training the visual-concept recognition system, wherein the selecting comprises: accessing distribution data that classifies visual-media items in a sample set as including one or more categories of visual concepts; estimating, based on the distribution data, projected frequencies for each of the one or more categories of visual concepts in a larger set of visual-media items, wherein each projected frequency describes a frequency of occurrence of the respective category of visual concepts in the larger set that are predicted to include one or more visual concepts of the respective category of visual concepts; and determining, based on the projected frequencies, whether there exists a representative number of n-gram associations with one or more visual concepts of each category of visual concepts.

13. The method of claim 12, further comprising receiving inputs from one or more human evaluators specifying whether or not each of one or more of the visual concepts is capable of being described by n-grams.

14. The method of claim 1, further comprising using a supervised training process to train the popular n-grams selected for training the visual-concept recognition system, wherein the supervised training process comprises receiving inputs from one or more human evaluators associating the popular n-grams with one or more visual concepts.

15. The method of claim 1, further comprising: receiving, from a client system of a particular user, a search query comprising one or more n-grams that are associated with a particular visual concept; sending, to the client system of the particular user, one or more visual-media items that include the particular visual concept; and determining whether the client system of the particular user subsequently requests to access one or more of the visual-media items that include the particular visual concept.

16. The method of claim 1, further comprising, for one or more n-grams, periodically updating the respective associations with visual concepts.

17. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive, from a plurality of client systems of a plurality of users, a plurality of search queries, each search query comprising one or more n-grams; identify a subset of search queries from the plurality of search queries as being queries for visual-media items, each of the search queries in the subset of search queries being identified based on one or more n-grams of the search query being associated with visual-media content; calculate, for each of the n-grams of the search queries of the subset of search queries, a popularity-score based on a count of the search queries in the subset of search queries that include the n-gram, wherein, for each of one or more of the n-grams of the search queries of the subset of search queries, the count of the search queries including the n-gram is a weighted count that weights an occurrence of each search query based on a degree of confidence with which the search query is identified as being a query for visual-media items; determine one or more popular n-gram, wherein each of the popular n-grams is an n-gram of the search queries of the subset of search queries having a popularity-score greater than a threshold popularity-score; and select one or more of the popular n-grams for training a visual-concept recognition system, wherein each of the popular n-grams is select
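Claim 1 differs from the abstract in that the count is weighted by the confidence with which each query was identified as a visual-media query. A rough sketch of that weighted count follows. The confidence values, the `(query, confidence)` input shape, the whitespace tokenizer, and the function names are all assumptions for illustration; the claim text does not specify how confidence is computed or represented.

```python
from collections import defaultdict


def weighted_popularity(scored_queries, max_n=2):
    """Sum, per n-gram, the confidences of the queries that include it.

    scored_queries: iterable of (query, confidence) pairs, where confidence
    is a hypothetical value in [0, 1] expressing how sure the system is that
    the query is a query for visual-media items.
    """
    scores = defaultdict(float)
    for query, confidence in scored_queries:
        words = query.lower().split()
        grams = {
            " ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)
        }
        for gram in grams:
            # Each query occurrence contributes its confidence instead of 1.
            scores[gram] += confidence
    return dict(scores)


def popular(scores, threshold):
    """Return the n-grams whose popularity-score is greater than the threshold."""
    return {gram for gram, score in scores.items() if score > threshold}


scored = [
    ("sunset photos", 0.9),  # explicit media term: high confidence (assumed)
    ("sunset", 0.3),         # ambiguous intent: low confidence (assumed)
    ("cat videos", 0.8),
]
scores = weighted_popularity(scored)
print(popular(scores, threshold=1.0))
```

Here "sunset" accumulates roughly 0.9 + 0.3 = 1.2 and is the only n-gram above the threshold of 1.0. Further weighting by user information (claims 7 to 10) could be modeled as an extra multiplicative factor on `confidence`, though the claims do not say how the weights combine.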

Assignees

Inventors

Classifications

  • using information manually generated, e.g. tags, keywords, comments, manually generated location and time information · CPC title

  • Query formulation, e.g. graphical querying · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

  • Learning methods · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10083379B2 cover?
In one embodiment, a method includes receiving a plurality of search queries comprising n-grams; identifying a subset of the plurality of search queries as being queries for visual-media items based on one or more n-grams of the search query being associated with visual-media content; calculating, for each of the n-grams of the search queries of the subset, a popularity-score based on a count o…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/5866. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 25, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).