Resource selection determination using natural language processing based tiered clustering
US-11442976-B1 · Sep 13, 2022 · US
US11568018B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11568018-B2 |
| Application number | US-202017131488-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 22, 2020 |
| Priority date | Dec 22, 2020 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
The present disclosure relates to systems, methods, and non-transitory computer-readable media that utilize machine learning models to generate identifier embeddings from digital content identifiers and then leverage these identifier embeddings to determine digital connections between digital content items. In particular, the disclosed systems can utilize an embedding machine-learning model that comprises a character-level embedding machine-learning model and a word-level embedding machine-learning model. For example, the disclosed systems can combine a character embedding from the character-level embedding machine-learning model and a token embedding from the word-level embedding machine-learning model. The disclosed systems can determine digital connections between the plurality of digital content items by processing these identifier embeddings for a plurality of digital content items utilizing a content management model. Based on the digital connections, the disclosed systems can surface one or more digital content suggestions to a user interface of a client device.
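The combination step in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: `tokenize`, `toy_vector`, `mean_pool`, `identifier_embedding`, `cosine`, and the width `DIM`. The sketch swaps the learned character-level and word-level models for hash-derived vectors with mean pooling, and uses cosine similarity as a stand-in for the content management model, so it shows only the data flow (tokens and characters in, one concatenated identifier embedding out).

```python
import hashlib
import math
import re

DIM = 8  # hypothetical width of each per-level embedding


def tokenize(identifier: str) -> list[str]:
    """Group characters into words using casing and delimiters (illustrative rules)."""
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [t.lower() for t in re.findall(pattern, identifier)]
    # tokenize("Q3_budgetReport-FINAL.xlsx")
    #   -> ["q", "3", "budget", "report", "final", "xlsx"]


def toy_vector(symbol: str, salt: str) -> list[float]:
    """Deterministic pseudo-embedding from a hash; an assumption standing in
    for a learned embedding layer."""
    digest = hashlib.sha256(f"{salt}:{symbol}".encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:DIM]]


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Average a list of vectors; a simple substitute for a recurrent network."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def identifier_embedding(identifier: str) -> list[float]:
    """Concatenate a token-level and a character-level embedding."""
    token_emb = mean_pool([toy_vector(t, "tok") for t in tokenize(identifier)])
    char_emb = mean_pool([toy_vector(c, "chr") for c in identifier.lower()])
    return token_emb + char_emb  # list concatenation, length 2 * DIM


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; a placeholder for the content management model."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Calling `cosine(identifier_embedding(a), identifier_embedding(b))` on two file names gives a rough score for how connected they look by name alone. Concatenation is one possible way to combine the two embeddings; the abstract does not say which combination the disclosed systems use.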
Opening claim text (preview).
What is claimed is:

1. A system comprising: at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: identify a plurality of identifiers associated with a plurality of digital content items of a content management system; generate a plurality of identifier embeddings by, for each identifier of the plurality of identifiers: generating one or more tokens, each token comprising multiple characters within the identifier, generating a token embedding based on the one or more tokens for the identifier, generating a character embedding based on individual characters within the identifier, and combining the token embedding and the character embedding to generate an identifier embedding; and determine a digital connection between a subset of digital content items of the plurality of digital content items by processing the plurality of identifier embeddings utilizing a content management model.

2. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate the one or more tokens by utilizing lexical rules based on character casing and delimiters to group a subset of the individual characters within the identifier into one or more words.

3. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: detect user activity with respect to a first digital content item of the plurality of digital content items; generate the plurality of identifier embeddings by generating a first identifier embedding for the first digital content item and a second identifier embedding for a second digital content item; determine, utilizing the content management model, a digital connection between the first identifier embedding and the second identifier embedding; and based on the digital connection, generate one or more suggestions related to at least one of the first digital content item or the second digital content item.

4. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to generate, via the content management model and based on the digital connection between the subset of digital content items of the plurality of digital content items, one or more suggestions comprising at least one of a suggested team workspace, a suggested digital content item; or a suggested access privilege.

5. The system of claim 1, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the token embedding by processing the one or more tokens utilizing a word-level embedding machine-learning model; and generate the character embedding by processing the individual characters utilizing a character-level embedding machine-learning model.

6. The system of claim 5, further comprising instructions that, when executed by the at least one processor, cause the system to: generate the token embedding by processing the one or more tokens utilizing a first embedding layer and a first recurrent neural network of the word-level embedding machine-learning model; and generate the character embedding by processing the individual characters utilizing a second embedding layer and a second recurrent neural network of the character-level embedding machine-learning model.

7. The system of claim 5, further comprising instructions that, when executed by the at least one processor, cause the system to generate a plurality of training identifier embeddings by: generating a plurality of training character embeddings; generating a plurality of training token embeddings; and combining the plurality of training character embeddings and the plurality of training token embeddings.

8. The system of claim 7, further comprising instructions that, when executed by the at least one processor, cause the system to train the character-level embedding machine-learning model and the word-level embedding machine-learning model by: generating digital similarity predictions between a plurality of training digital content items by processing the plurality of training identifier embeddings utilizing a trained machine-learning model; and learning parameters of the character-level embedding machine-learning model and the word-level embedding machine-learning model by comparing the digital similarity predictions with ground truth similarity metrics.

9. A system comprising: at least one processor; and at least one non-transitory computer-readable storage medium storing instructions that, when executed by the at least one processor, cause the system to: identify a plurality of identifiers associated with a plurality of digital content items of a content management system; generate a plurality of identifier embeddings by, for each identifier of the plurality of identifiers: generating one or more tokens, each token comprising multiple characters within the identifier, and generating an identifier embedding by processing individual characters within the identifier and the one or more tokens utilizing one or more embedding machine-learning models; generate digital similarity predictions between the plurality of digital content items by processing the plurality of identifier embeddings utilizing a trained machine-learning model; and learn parameters of the one or more embedding machine-learning models by comparing the digital similarity predictions with ground truth similarity metrics.

10. The system of claim 9, further comprising instructions that, when executed by the at least one processor, cause the system to learn the parameters for the one or more embedding machine-learning models by: generating a first identifier embedding by combining a first token embedding and a first character embedding corresponding to a first identifier; generating a second identifier embedding by combining a second token embedding and a second character embedding corresponding to a second identifier; and generating a combined identifier embedding for the trained machine-learning model by combining the first identifier embedding and the second identifier embedding.

11. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to learn the parameters for the one or more embedding machine-learning models by: generating a digital similarity prediction between the first identifier and the second identifier by processing the combined identifier embedding utilizing the trained machine-learning model; and determining a loss by comparing the digital similarity prediction and a ground truth similarity metric utilizing a loss function.

12. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to generate the digital similarity prediction by utilizing the trained machine-learning model to generate a file relation prediction between the first identifier and the second identifier, the file relation prediction comprising at least one of a parent-child file relation prediction or a sibling file relation prediction.

13. The system of claim 9, further comprising instructions that, when executed by the at least one processor, cause the system to: detect user activity with respect to a first digital content item; generate, utilizing the one or more embedding machine-learning models, a first identifier embedding for the first digital content item and a second identifier embedding for a second digital content item; and
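
The training signal described in claims 9 to 11 (a combined pair embedding, a similarity prediction, and a loss against a ground truth metric) can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed method: `embed`, `combine`, `predict`, `bce`, `train`, and the `PAIRS` data are all hypothetical. The stand-in embedding is a fixed character-count vector, the pair is combined by element-wise absolute difference, the loss is binary cross-entropy, and only a logistic predictor is updated. The claims instead describe learning the parameters of the embedding models themselves.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def embed(identifier: str) -> list[float]:
    """Stand-in identifier embedding: L2-normalised character counts (assumption)."""
    counts = [0.0] * len(ALPHABET)
    for ch in identifier.lower():
        idx = ALPHABET.find(ch)
        if idx >= 0:
            counts[idx] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def combine(a: list[float], b: list[float]) -> list[float]:
    """Combined pair embedding: element-wise absolute difference (assumption)."""
    return [abs(x - y) for x, y in zip(a, b)]


def predict(weights: list[float], bias: float, features: list[float]) -> float:
    """Logistic similarity prediction in (0, 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def bce(p: float, y: float) -> float:
    """Binary cross-entropy between a prediction and a ground truth label."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))


# Hypothetical labelled pairs: 1.0 = related identifiers, 0.0 = unrelated.
PAIRS = [
    ("budget_2020.xlsx", "budget_2021.xlsx", 1.0),
    ("TripPhoto01.jpg", "TripPhoto02.jpg", 1.0),
    ("budget_2020.xlsx", "TripPhoto01.jpg", 0.0),
    ("meeting_notes.txt", "IMG4471.png", 0.0),
]


def mean_loss(weights, bias, data):
    return sum(bce(predict(weights, bias, f), y) for f, y in data) / len(data)


def train(pairs, steps: int = 200, lr: float = 0.5):
    """Full-batch gradient descent on the predictor; returns loss before and after."""
    data = [(combine(embed(a), embed(b)), y) for a, b, y in pairs]
    weights, bias = [0.0] * len(ALPHABET), 0.0
    initial = mean_loss(weights, bias, data)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, y in data:
            err = predict(weights, bias, features) - y  # d(loss)/d(logit)
            grad_b += err
            for i, f in enumerate(features):
                grad_w[i] += err * f
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias, initial, mean_loss(weights, bias, data)
```

Running `train(PAIRS)` returns the learned weights and bias along with the mean loss before and after training, which makes the comparison against ground truth visible. In a setup closer to the claims, the gradient would also flow back into the character-level and word-level embedding models rather than stopping at the predictor.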
Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking · CPC title
Machine learning · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Physics · mapped topic
Supervised learning · CPC title