Content type embeddings

US12099566B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12099566-B2
Application number  US-201916675671-A
Country             US
Kind code           B2
Filing date         Nov 6, 2019
Priority date       Sep 23, 2019
Publication date    Sep 24, 2024
Grant date          Sep 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for learning and using content type embeddings. The content type embeddings have the useful property that a distance in an embedding space between two content type embeddings corresponds to a semantic similarity between the two content types represented by the two content type embeddings. The closer the distance in the space, the more the two content types are semantically similar. The farther the distance in the space, the less the two content types are semantically similar. The learned content type embeddings can be used in a content suggestion system as machine learning features to improve content suggestions to end-users.
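
The distance property described in the abstract can be illustrated with a minimal sketch. The abstract does not name a distance metric, so cosine distance is an assumption here, and the three-dimensional vectors and the names `EMBEDDINGS`, `cosine_distance`, and `nearest_types` are hypothetical. Real embeddings would be learned, not written by hand.

```python
import math

# Hypothetical 3-dimensional embeddings keyed by content type.
# In practice these values would come from a learning step.
EMBEDDINGS = {
    "docx": [0.9, 0.1, 0.0],
    "pdf": [0.8, 0.2, 0.1],
    "mp4": [0.0, 0.1, 0.9],
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_types(query, embeddings):
    """List the other content types, most similar to `query` first."""
    others = [t for t in embeddings if t != query]
    return sorted(
        others,
        key=lambda t: cosine_distance(embeddings[query], embeddings[t]),
    )


print(nearest_types("docx", EMBEDDINGS))
```

With these made-up vectors, "pdf" sits closer to "docx" than "mp4" does, which is the kind of ordering a content suggestion model could consume as a feature.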

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising:
    generating a corpus of entity sequences comprising a plurality of content items in a content management system that are associated with a plurality of content types and that satisfy a recency heuristic indicating a recency of access within a period of time;
    filtering the corpus of entity sequences to generate filtered entity sequences comprising a subset of the plurality of content items by removing non-content item entities from the corpus of entity sequences;
    determining a plurality of pairwise co-occurrence instance counts for the plurality of content types reflecting pairwise co-occurrence instances among the plurality of content types in the filtered entity sequences;
    generating a plurality of content type embeddings for the filtered entity sequences based on the plurality of pairwise co-occurrence instance counts;
    counting, as one of the plurality of pairwise co-occurrence instance counts, a co-occurrence instance count for a first instance of a pair of different content types co-occurring in a same content item sharing operation facilitated by the content management system;
    counting, as one of the plurality of pairwise co-occurrence instance counts, a co-occurrence instance count for a second instance of a pair of different content types co-occurring in a folder of a centrally-hosted network filesystem of the content management system based on the pair of different content types being contained in the folder of the centrally-hosted network filesystem for an amount of time;
    weighing, in the plurality of pairwise co-occurrence instance counts, the co-occurrence instance count for first instance and the co-occurrence instance count for second instance numerically differently based on a type of co-occurrence instance count for the first instance and a type of co-occurrence instance count for the second instance;
    generating magnitudes of distances in a multi-dimensional embedding space between the plurality of content type embeddings corresponding to a semantic similarity between the filtered entity sequences; and
    providing, for display on a client device and based on the magnitudes of distances between the plurality of content type embeddings, a content item suggestion that references at least one particular content item.

2. The computer-implemented method of claim 1, further comprising: counting, as one of the plurality of pairwise co-occurrence instance counts, an instance of a pair of different content types co-occurring in a folder of a centrally-hosted network filesystem of the content management system based on the pair of different content types being contained in the folder of the centrally-hosted network filesystem for a threshold amount of time, wherein the centrally-hosted network filesystem is hierarchical with multiple folders located hierarchically with a parent folder and a child folder and the pair of different content types are contained in the folder of the centrally-hosted network filesystem for a threshold amount of time.

3. The computer-implemented method of claim 1, further comprising counting, as one of the plurality of pairwise co-occurrence instance counts, an instance of a pair of different content types that are not functionally integrated co-occurring in a same content item upload session or synchronization session associated with the content management system based on being uploaded or synchronized within a threshold amount of time.

4. The computer-implemented method of claim 1, further comprising counting, as one of the plurality of pairwise co-occurrence instance counts, an instance of a pair of different content types co-occurring in a same filtered translated entity sequence generated from a content management system graph, or a sub-graph thereof.

5. The computer-implemented method of claim 1, further comprising learning the plurality of content type embeddings based on a generative model according to a generative process.

6. The computer-implemented method of claim 1, further comprising using a particular content type embedding of the plurality of content type embeddings as an input machine learning feature to a machine learning content suggestion model to make a prediction about a particular content item, the particular content item being a particular content type, the particular content type represented by the particular content type embedding.

7. The computer-implemented method of claim 1, wherein determining the plurality of pairwise co-occurrence instance counts for the plurality of content types is based on a random sampling of co-occurrence data collected during a period of time.

8. One or more non-transitory computer-readable media storing one or more computer programs having instructions which, when executed by a computing system having one or more processors, cause the computing system to:
    generate filtered entity sequences comprising a subset of a plurality of content items by filtering a corpus of entity sequences to satisfy a recency heuristic indicating a recency of access within a period of time and filtering by removing non-content item entities from the corpus of entity sequences, wherein the corpus of entity sequences comprises a plurality of content items in a content management system that are associated with a plurality of content types;
    generate a plurality of content type embeddings for the filtered entity sequences based on pairwise co-occurrence instances among the plurality of content types in the filtered entity sequences;
    count, as one of the pairwise co-occurrence instances, a co-occurrence instance for a first instance of a pair of different content types co-occurring in a same content item sharing operation facilitated by the content management system;
    count, as one of the pairwise co-occurrence instances, a co-occurrence instance count for a second instance of a pair of different content types co-occurring in a folder of a centrally-hosted network filesystem of the content management system based on the pair of different content types being contained in the folder of the centrally-hosted network filesystem for an amount of time;
    weigh, in the pairwise co-occurrence instances, the co-occurrence instance for first instance and the co-occurrence instance for second instance numerically differently based on a type of co-occurrence instance for the first instance and a type of co-occurrence instance for the second instance;
    train a machine learning content suggestion model based on a set of training examples, the set of training examples having the plurality of content type embeddings as machine learning features of the set of training examples, wherein a magnitude of a distance in a multi-dimensional embedding space between a pair of content type embeddings, of the plurality of content type embeddings, corresponds to a semantic similarity between a pair of content types, of the plurality of content types, represented by the pair of content type embeddings; and
    use a particular content type embedding of the plurality of content type embeddings as an input machine learning feature to a machine learning content suggestion model to make a prediction about a particular content item, the particular content item being a particular content type, the particular content type represented by the particular content type embedding.

9. The one or more non-transitory computer-readable media of claim 8, further comprising instructions, that when executed by the computing system, cause the computing system to: store a plurality of pairwise co-occurrence instance counts for a plurality of content types reflecting pairwise co-occurrence instances in a content management system among the plurality of content types
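
The counting and weighing steps recited in the claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the event format, the weight values in `WEIGHTS`, and the function name `weighted_cooccurrence` are all hypothetical. The claims require only that sharing-operation and folder co-occurrences be weighted numerically differently. The step that turns counts into embeddings is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical weights per co-occurrence kind; the only requirement
# taken from the claims is that the two kinds differ numerically.
WEIGHTS = {"share": 2.0, "folder": 1.0}


def weighted_cooccurrence(events, weights=WEIGHTS):
    """Accumulate weighted counts for unordered pairs of different content types.

    `events` is an iterable of (kind, content_types) tuples, where `kind`
    is a key of `weights` and `content_types` lists the types seen
    together in one sharing operation or one folder.
    """
    counts = Counter()
    for kind, content_types in events:
        # Deduplicate and sort so each pair has one canonical key.
        distinct = sorted(set(content_types))
        for a, b in combinations(distinct, 2):
            counts[(a, b)] += weights[kind]
    return counts


# Made-up events for illustration only.
events = [
    ("share", ["docx", "pdf"]),
    ("folder", ["docx", "pdf", "png"]),
    ("folder", ["png", "jpg"]),
]
counts = weighted_cooccurrence(events)
print(counts[("docx", "pdf")])
```

Sorting the deduplicated types means a pair is only ever stored under one key, and two items of the same type in one event never count as a pair of different content types.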

Assignees

Dropbox Inc

Inventors

Classifications

  • using classification, e.g. of video objects · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Distances to cluster centroïds · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12099566B2 cover?
Techniques for learning and using content type embeddings. The content type embeddings have the useful property that a distance in an embedding space between two content type embeddings corresponds to a semantic similarity between the two content types represented by the two content type embeddings. The closer the distance in the space, the more the two content types are semantically similar. T…
Who is the assignee on this patent?
Dropbox Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/9577. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 24, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).