Utilizing machine-learning models to generate identifier embeddings and determine digital connections between digital content items

US12008065B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12008065-B2
Application number  US-202318153960-A
Country             US
Kind code           B2
Filing date         Jan 12, 2023
Priority date       Dec 22, 2020
Publication date    Jun 11, 2024
Grant date          Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize machine learning models to generate identifier embeddings from digital content identifiers and then leverage these identifier embeddings to determine digital connections between digital content items. In particular, the disclosed systems can utilize an embedding machine-learning model that comprises a character-level embedding machine-learning model and a word-level embedding machine-learning model. For example, the disclosed systems can combine a character embedding from the character-level embedding machine-learning model and a token embedding from the word-level embedding machine-learning model. The disclosed systems can determine digital connections between the plurality of digital content items by processing these identifier embeddings for a plurality of digital content items utilizing a content management model. Based on the digital connections, the disclosed systems can surface one or more digital content suggestions to a user interface of a client device.
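As a rough illustration of the idea in the abstract, the sketch below builds a word-level vector and a character-level vector for a filename, concatenates them into one identifier embedding, and compares two identifiers by cosine similarity. Everything named here is an assumption made for illustration: the patent describes learned embedding machine-learning models, whereas this sketch swaps in a simple hashing trick, and `DIM`, `word_tokens`, `char_trigrams`, `identifier_embedding`, and `cosine_similarity` are hypothetical names that do not come from the patent.

```python
import hashlib
import math
import re

DIM = 16  # hypothetical width of each sub-embedding (not from the patent)


def _hashed_vector(units, dim=DIM):
    """Count each unit into a hashed bucket, then L2-normalize the counts.

    This is a stand-in for a learned embedding layer, not the patented model.
    """
    vec = [0.0] * dim
    for unit in units:
        digest = hashlib.md5(unit.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def word_tokens(identifier):
    """Split an identifier into lowercase word-like tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", identifier.lower()) if t]


def char_trigrams(identifier):
    """Return overlapping three-character windows of the identifier."""
    text = identifier.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]


def identifier_embedding(identifier):
    """Concatenate the word-level and character-level vectors."""
    word_part = _hashed_vector(word_tokens(identifier))
    char_part = _hashed_vector(char_trigrams(identifier))
    return word_part + char_part


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Example usage with made-up filenames.
first = identifier_embedding("q3_report_final.docx")
second = identifier_embedding("q3_report_draft.docx")
print(round(cosine_similarity(first, second), 3))
```

Concatenation is only one way to combine the two levels; the abstract says the embeddings are combined but this page does not show how, so averaging or a learned projection would be equally plausible readings.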

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: identify a plurality of identifiers corresponding to a plurality of digital content items; generate a plurality of identifier embeddings corresponding to the plurality of identifiers by utilizing one or more embedding machine-learning models; generate digital similarity predictions between the plurality of digital content items by processing the plurality of identifier embeddings utilizing a content management model; and determine a digital connection between a subset of digital content items of the plurality of digital content items based on the digital similarity predictions.

2. The system of claim 1, wherein generating the plurality of identifier embeddings comprises generating at least one token representing a subset of individual characters within a given identifier of the plurality of identifiers.

3. The system of claim 2, wherein the subset of individual characters represents a word within the given identifier.

4. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: determine, utilizing the content management model, a digital connection between a first identifier embedding from the plurality of identifier embeddings associated with a first digital content item and a second identifier embedding from the plurality of identifier embeddings associated with a second digital content item; and based on the digital connection, generate a suggestion related to at least one of the first digital content item or the second digital content item.

5. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate, based on the digital connection between the subset of digital content items of the plurality of digital content items, a suggestion comprising at least one of a suggested team workspace, a suggested digital content item, or a suggested access privilege.

6. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate the plurality of identifier embeddings by utilizing a word-level embedding machine-learning model and a character-level embedding machine-learning model.

7. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate the plurality of identifier embeddings by: generating a word-level embedding utilizing a first embedding layer of a first neural network; and generating a character-level embedding utilizing a second embedding layer of a second neural network.

8. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate a storage organization relationship suggestion for one or more digital content items from the subset of digital content items based on the digital connection between the subset of digital content items.

9. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to: identify a plurality of identifiers corresponding to a plurality of digital content items, wherein the plurality of identifiers comprise a plurality of filenames associated with the plurality of digital content items; generate a plurality of identifier embeddings corresponding to the plurality of identifiers by utilizing one or more embedding machine-learning models; generate digital similarity predictions between the plurality of digital content items by processing the plurality of identifier embeddings utilizing a content management model; and determine a digital connection between a first digital content item and a second digital content item from the plurality of digital content items based on the digital similarity predictions.

10. The non-transitory computer readable medium as recited in claim 9, wherein generating the plurality of identifier embeddings comprises generating a first token representing a word within a first filename associated with the first digital content item.

11. The non-transitory computer readable medium as recited in claim 9, further comprising instructions that, when executed by the at least one processor, cause the computing device to: generate a suggestion related to the first digital content item or the second digital content item based on the digital connection between the first digital content item and the second digital content item; and provide the suggestion to a client device having access to the plurality of digital content items.

12. The non-transitory computer readable medium as recited in claim 9, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate, based on the digital connection between the first digital content item and the second digital content item, a suggestion comprising at least one of a suggested team workspace, a suggested digital content item, or a suggested access privilege.

13. The non-transitory computer readable medium as recited in claim 9, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the plurality of identifier embeddings by utilizing a word-level embedding machine-learning model to process words within the plurality of filenames and a character-level embedding machine-learning model to process characters within the plurality of filenames.

14. The non-transitory computer readable medium as recited in claim 9, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate a suggestion for the first digital content item based on at least one of: a file extension embedding corresponding to the first digital content item; or a user activity embedding corresponding to user activity with respect to the first digital content item.

15. A computer-implemented method comprising: identifying a plurality of identifiers corresponding to a plurality of digital content items; generating a plurality of identifier embeddings corresponding to the plurality of identifiers by utilizing one or more embedding machine-learning models; generating digital similarity predictions between the plurality of digital content items by processing the plurality of identifier embeddings utilizing a content management model; and determining a digital connection between a subset of digital content items of the plurality of digital content items based on the digital similarity predictions.

16. The computer-implemented method of claim 15, wherein generating the plurality of identifier embeddings comprises generating a token representing a subset of individual characters within a given identifier of the plurality of identifiers.

17. The computer-implemented method of claim 15, further comprising providing, for display on a client device and based on the digital connection between the subset of digital content items of the plurality of digital content items, a suggestion comprising at least one of a suggested team workspace, a suggested digital content item, or a suggested access privilege.

18. The computer-implemented method of claim 15, further comprising generating the plurality of identifier embeddings by utilizing a word-level embedding machine
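
The last two steps of the independent claims (similarity predictions, then a digital connection between a subset of items) can be pictured with a small sketch. It assumes the similarity predictions are already available as scores between 0 and 1, treats a pair as connected when its score meets a hypothetical `threshold`, and groups connected items with a union-find pass so that a group could back a suggestion such as a shared folder. The threshold value, the union-find grouping, the function names, and the sample filenames and scores are all illustrative assumptions; the claims do not specify how the content management model makes this determination.

```python
def determine_connections(similarity, threshold=0.8):
    """Return the item pairs whose predicted similarity meets the threshold.

    `similarity` maps (item_a, item_b) tuples to a score in [0, 1].
    The 0.8 default is an arbitrary illustrative value.
    """
    return sorted(pair for pair, score in similarity.items() if score >= threshold)


def group_connected(items, connections):
    """Group items that are linked, directly or transitively, by a connection."""
    parent = {item: item for item in items}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    # Only groups with more than one member represent a connection.
    return [members for members in groups.values() if len(members) > 1]


# Made-up items and scores, for illustration only.
items = ["budget_2023.xlsx", "budget_2024.xlsx", "logo.png", "budget_notes.docx"]
scores = {
    ("budget_2023.xlsx", "budget_2024.xlsx"): 0.93,
    ("budget_2023.xlsx", "budget_notes.docx"): 0.85,
    ("budget_2024.xlsx", "logo.png"): 0.12,
}
connections = determine_connections(scores)
groups = group_connected(items, connections)
print(groups)
```

In this sample the three budget files end up in one group and `logo.png` is left out, which is the kind of subset a storage organization suggestion (claim 8) might be built on.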

Assignees

Dropbox Inc

Inventors

Classifications

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Details of searching files based on file metadata · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12008065B2 cover?
The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize machine learning models to generate identifier embeddings from digital content identifiers and then leverage these identifier embeddings to determine digital connections between digital content items. In particular, the disclosed systems can utilize an embedding machine-learning model tha…
Who is the assignee on this patent?
Dropbox Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 11, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).