What technology area does this patent fall under?

Primary CPC classification G06F16/958. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page: citations found in our corpus, or other publications sharing the same primary CPC.

Utilizing machine-learning models to generate identifier embeddings and determine digital connections between digital content items

US11568018B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11568018-B2
Application number	US-202017131488-A
Country	US
Kind code	B2
Filing date	Dec 22, 2020
Priority date	Dec 22, 2020
Publication date	Jan 31, 2023
Grant date	Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize machine learning models to generate identifier embeddings from digital content identifiers and then leverage these identifier embeddings to determine digital connections between digital content items. In particular, the disclosed systems can utilize an embedding machine-learning model that comprises a character-level embedding machine-learning model and a word-level embedding machine-learning model. For example, the disclosed systems can combine a character embedding from the character-level embedding machine-learning model and a token embedding from the word-level embedding machine-learning model. The disclosed systems can determine digital connections between the plurality of digital content items by processing these identifier embeddings for a plurality of digital content items utilizing a content management model. Based on the digital connections, the disclosed systems can surface one or more digital content suggestions to a user interface of a client device.
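The abstract describes combining a character embedding and a token embedding into one identifier embedding, then comparing identifier embeddings to find connected content items. The Python sketch below illustrates only that data flow, under loudly stated assumptions: the learned character-level and word-level models are replaced by mean-pooled, deterministic pseudo-random vectors (`lookup`), the two branches are combined by concatenation, and the content management model is replaced by a cosine-similarity threshold. The casing and delimiter rules in `tokenize` are hypothetical. All names (`tokenize`, `lookup`, `identifier_embedding`, `connections`, `DIM`) are illustrative and do not come from the patent.

```python
import math
import random
import re

DIM = 16  # hypothetical width of each branch; the identifier embedding has 2 * DIM values


def tokenize(identifier: str) -> list[str]:
    """Hypothetical lexical rules: split on delimiters, then on case changes."""
    tokens = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        # Matches runs of capitals (acronyms), capitalised/lowercase words, or digit runs.
        tokens += [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)]
    return tokens


def lookup(key: str) -> list[float]:
    """Assumption: a deterministic pseudo-random vector stands in for a learned embedding row."""
    rng = random.Random(key)  # str seeds are hashed deterministically
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a list of vectors; mean pooling stands in for the claimed recurrent encoders."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def identifier_embedding(identifier: str) -> list[float]:
    char_emb = mean_pool([lookup("char:" + ch) for ch in identifier.lower()])
    token_emb = mean_pool([lookup("token:" + t) for t in tokenize(identifier)])
    return char_emb + token_emb  # combine the two branches by concatenation


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def connections(identifiers: list[str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Assumption: a cosine threshold stands in for the content management model."""
    embs = [identifier_embedding(name) for name in identifiers]
    found = []
    for i in range(len(identifiers)):
        for j in range(i + 1, len(identifiers)):
            score = cosine(embs[i], embs[j])
            if score >= threshold:
                found.append((identifiers[i], identifiers[j], score))
    return found


if __name__ == "__main__":
    names = ["Budget-2023.XLSX", "budget_2023.xlsx", "holiday_photo.jpg"]
    for a, b, score in connections(names):
        print(f"{a} <-> {b}: {score:.2f}")
```

Concatenation keeps the character-level and word-level evidence in separate coordinates, so a downstream model could weight them differently; summing or averaging the two vectors would be an equally plausible reading of "combining".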

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: identify a plurality of identifiers associated with a plurality of digital content items of a content management system; generate a plurality of identifier embeddings by, for each identifier of the plurality of identifiers: generating one or more tokens, each token comprising multiple characters within the identifier, generating a token embedding based on the one or more tokens for the identifier, generating a character embedding based on individual characters within the identifier, and combining the token embedding and the character embedding to generate an identifier embedding; and determine a digital connection between a subset of digital content items of the plurality of digital content items by processing the plurality of identifier embeddings utilizing a content management model.

2. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate the one or more tokens by utilizing lexical rules based on character casing and delimiters to group a subset of the individual characters within the identifier into one or more words.

3. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: detect user activity with respect to a first digital content item of the plurality of digital content items; generate the plurality of identifier embeddings by generating a first identifier embedding for the first digital content item and a second identifier embedding for a second digital content item; determine, utilizing the content management model, a digital connection between the first identifier embedding and the second identifier embedding; and based on the digital connection, generate one or more suggestions related to at least one of the first digital content item or the second digital content item.

4. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate, via the content management model and based on the digital connection between the subset of digital content items of the plurality of digital content items, one or more suggestions comprising at least one of a suggested team workspace, a suggested digital content item; or a suggested access privilege.

5. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the token embedding by processing the one or more tokens utilizing a word-level embedding machine-learning model; and generate the character embedding by processing the individual characters utilizing a character-level embedding machine-learning model.

6. The system of claim 5, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the token embedding by processing the one or more tokens utilizing a first embedding layer and a first recurrent neural network of the word-level embedding machine-learning model; and generate the character embedding by processing the individual characters utilizing a second embedding layer and a second recurrent neural network of the character-level embedding machine-learning model.

7. The system of claim 5, further comprising instructions that, when executed by the at least one processor, cause the system to generate a plurality of training identifier embeddings by: generating a plurality of training character embeddings; generating a plurality of training token embeddings; and combining the plurality of training character embeddings and the plurality of training token embeddings.

8. The system of claim 7, further comprising instructions that, when executed by the at least one processor, cause the system to train the character-level embedding machine-learning model and the word-level embedding machine-learning model by: generating digital similarity predictions between a plurality of training digital content items by processing the plurality of training identifier embeddings utilizing a trained machine-learning model; and learning parameters of the character-level embedding machine-learning model and the word-level embedding machine-learning model by comparing the digital similarity predictions with ground truth similarity metrics.

9. A system comprising: at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: identify a plurality of identifiers associated with a plurality of digital content items of a content management system; generate a plurality of identifier embeddings by, for each identifier of the plurality of identifiers: generating one or more tokens, each token comprising multiple characters within the identifier, and generating an identifier embedding by processing individual characters within the identifier and the one or more tokens utilizing one or more embedding machine-learning models; generate digital similarity predictions between the plurality of digital content items by processing the plurality of identifier embeddings utilizing a trained machine-learning model; and learn parameters of the one or more embedding machine-learning models by comparing the digital similarity predictions with ground truth similarity metrics.

10. The system of claim 9, further comprising instructions that, when executed by the at least one processor, cause the system to learn the parameters for the one or more embedding machine-learning models by: generating a first identifier embedding by combining a first token embedding and a first character embedding corresponding to a first identifier; generating a second identifier embedding by combining a second token embedding and a second character embedding corresponding to a second identifier; and generating a combined identifier embedding for the trained machine-learning model by combining the first identifier embedding and the second identifier embedding.

11. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to learn the parameters for the one or more embedding machine-learning models by: generating a digital similarity prediction between the first identifier and the second identifier by processing the combined identifier embedding utilizing the trained machine-learning model; and determining a loss by comparing the digital similarity prediction and a ground truth similarity metric utilizing a loss function.

12. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to generate the digital similarity prediction by utilizing the trained machine-learning model to generate a file relation prediction between the first identifier and the second identifier, the file relation prediction comprising at least one of a parent-child file relation prediction or a sibling file relation prediction.

13. The system of claim 9, further comprising instructions that, when executed by the at least one processor, cause the system to: detect user activity with respect to a first digital content item; generate, utilizing the one or more embedding machine-learning models, a first identifier embedding for the first digital content item and a second identifier embedding for a second digital content item; and
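Claims 8 through 11 describe learning the embedding models' parameters by predicting similarity between pairs of identifiers and comparing those predictions with ground-truth similarity through a loss function. The sketch below shows that training loop in plain Python under stated assumptions: trainable embedding tables with mean pooling stand in for the claimed embedding layers and recurrent networks, the "trained machine-learning model" is replaced by a dot product of the two identifier embeddings plus a bias passed through a sigmoid, the loss is binary cross-entropy, and the labelled pairs are invented examples. `PairTrainer`, `tokenize`, and `DIM` are hypothetical names.

```python
import math
import random
import re

DIM = 8  # hypothetical width of the character branch and of the token branch


def tokenize(identifier):
    """Hypothetical lexical rules: split on delimiters, then on case changes."""
    tokens = []
    for chunk in re.split(r"[^A-Za-z0-9]+", identifier):
        tokens += [t.lower() for t in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", chunk)]
    return tokens


class PairTrainer:
    """Learns character and token embedding tables from labelled identifier pairs."""

    def __init__(self, seed=0, lr=0.5):
        self.rng = random.Random(seed)
        self.lr = lr
        self.table = {}  # "c:<char>" or "t:<token>" -> trainable vector
        self.bias = 0.0

    def _mean(self, keys):
        # Assumption: mean pooling of table rows stands in for the claimed RNN encoders.
        for k in keys:
            if k not in self.table:
                self.table[k] = [self.rng.gauss(0.0, 0.1) for _ in range(DIM)]
        if not keys:
            return [0.0] * DIM
        return [sum(self.table[k][i] for k in keys) / len(keys) for i in range(DIM)]

    def _forward(self, a, b):
        keys, embs = [], []
        for ident in (a, b):
            chars = ["c:" + ch for ch in ident.lower()]
            toks = ["t:" + t for t in tokenize(ident)]
            keys.append((chars, toks))
            embs.append(self._mean(chars) + self._mean(toks))  # concatenated identifier embedding
        # Assumption: a dot product plus bias stands in for the similarity model.
        z = sum(x * y for x, y in zip(embs[0], embs[1])) + self.bias
        return z, embs, keys

    def predict(self, a, b):
        """Similarity prediction in (0, 1) for two identifiers."""
        return 1.0 / (1.0 + math.exp(-self._forward(a, b)[0]))

    def step(self, pairs):
        """One full-batch gradient-descent step on binary cross-entropy; returns the pre-step mean loss."""
        grads, grad_bias, loss = {}, 0.0, 0.0
        for a, b, label in pairs:
            z, (ea, eb), ((ca, ta), (cb, tb)) = self._forward(a, b)
            p = 1.0 / (1.0 + math.exp(-z))
            # Numerically stable form of -[y log p + (1 - y) log(1 - p)].
            loss += math.log1p(math.exp(-z)) if label else math.log1p(math.exp(z))
            g = p - label  # dLoss/dz for a sigmoid output with cross-entropy loss
            grad_bias += g
            # z = ea . eb, so each row's gradient is g * (other identifier's branch) / branch length.
            for ks, other in ((ca, eb[:DIM]), (ta, eb[DIM:]), (cb, ea[:DIM]), (tb, ea[DIM:])):
                for k in ks:
                    acc = grads.setdefault(k, [0.0] * DIM)
                    for i in range(DIM):
                        acc[i] += g * other[i] / len(ks)
        n = len(pairs)
        for k, gvec in grads.items():
            for i in range(DIM):
                self.table[k][i] -= self.lr * gvec[i] / n
        self.bias -= self.lr * grad_bias / n
        return loss / n


if __name__ == "__main__":
    pairs = [  # invented ground truth: 1 = related, 0 = unrelated
        ("budget_2023.xlsx", "budget_2024.xlsx", 1),
        ("TeamRoster.docx", "team-roster-v2.docx", 1),
        ("budget_2023.xlsx", "holiday_photo.jpg", 0),
        ("TeamRoster.docx", "budget_2024.xlsx", 0),
    ]
    trainer = PairTrainer()
    losses = [trainer.step(pairs) for _ in range(200)]
    print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the gradient flows through the pooled character and token vectors, the embedding tables themselves are updated by the pairwise loss, which is the point claims 8 and 9 make; a real system would swap in the recited recurrent encoders and a learned similarity head.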

Assignees

Dropbox Inc

Inventors

Classifications

G06F16/958 (Primary)
Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title
G06N20/00
Machine learning · CPC title
G06F40/284 (Primary)
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06N3/0454
Physics · mapped topic
G06N3/09
Supervised learning · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 82021307


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11568018B2 cover?: The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize machine learning models to generate identifier embeddings from digital content identifiers and then leverage these identifier embeddings to determine digital connections between digital content items. In particular, the disclosed systems can utilize an embedding machine-learning model tha…
Who is the assignee on this patent?: Dropbox Inc

Related patents

Resource selection determination using natural language processing based tiered clustering

Generating vector representations of documents

Embedded learning for response prediction in content item relevance

Dictionary DGA detector model

Determining collaboration recommendations from file path information

Content recommendation system using a neural network language model
