Processing text using neural networks
US-2019258713-A1 · Aug 22, 2019 · US
US12045279B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12045279-B2 |
| Application number | US-202117538880-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 30, 2021 |
| Priority date | Nov 30, 2021 |
| Publication date | Jul 23, 2024 |
| Grant date | Jul 23, 2024 |
Official abstract text for this publication.
A system and method for retrieving one or more visual assets includes receiving a search query for the one or more visual assets, the search query including textual data, encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model, transmitting the one or more text embedding representations to a matching and selection unit, providing visual embedding representations of one or more visual assets to the matching and selection unit, comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results, and providing the one or more visual asset search results for display.
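The query-time flow described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `encode_text` is a toy stand-in for the trained text representation ML model, `match_and_select` stands in for the matching and selection unit, and cosine similarity is one plausible comparison measure; the abstract does not say which measure is used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def encode_text(query: str) -> List[float]:
    """Hypothetical stand-in for the trained text model: keyword counts over a toy vocabulary."""
    vocabulary = ("cat", "dog", "beach")
    tokens = query.lower().split()
    return [float(tokens.count(word)) for word in vocabulary]


def match_and_select(
    text_embedding: Sequence[float],
    visual_index: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every indexed visual embedding against the query and keep the best top_k."""
    scored = [
        (asset_id, cosine_similarity(text_embedding, embedding))
        for asset_id, embedding in visual_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy precomputed visual embeddings; a real index would be filled by the visual model.
visual_index = {
    "cat_photo.png": [0.9, 0.1, 0.0],
    "dog_icon.svg": [0.1, 0.9, 0.0],
    "beach_video.mp4": [0.0, 0.1, 0.9],
}

results = match_and_select(encode_text("a cat photo"), visual_index)
for asset_id, score in results:
    print(f"{asset_id}: {score:.3f}")
```

The sketch scans the whole index for each query, which is only reasonable for a small library; a larger one would more likely rely on an approximate nearest-neighbor structure, which the abstract leaves open.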
Opening claim text (preview).
What is claimed is:
1. A data processing system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of: receiving a search query for one or more visual assets, the search query including textual data; encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model; transmitting the one or more text embedding representations to a matching and selection unit; providing visual embedding representations of one or more visual assets to the matching and selection unit; comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results; and providing the one or more visual asset search results for display, wherein: the one or more visual assets are stored in a visual asset library, and new visual assets are added to the visual asset library by: receiving the new visual assets; providing the new visual assets to a trained visual asset representation ML model; receiving new visual embedding representations for the new visual assets from the visual asset representation ML model; storing the new visual embedding representations in a visual asset index associated with the visual asset library, the trained text representation ML model and the trained visual asset representation ML model are trained in conjunction with each other using a training dataset which includes pairs of visual assets and their textual descriptions, their textual descriptions including one or more keywords for each visual asset in the pairs of visual assets and their textual descriptions, and the trained text representation ML model and the trained visual asset representation ML model are trained using one or more pre-trained mechanisms.
2. The data processing system of claim 1, wherein the visual assets include at least one of an image, a video, an icon, a GIF, an illustration, and an emoticon.
3. The data processing system of claim 1, wherein the visual embedding representations are stored in a visual asset index.
4. The data processing system of claim 1, wherein the search query is received via a user interface of an application that provides text searching to perform a search of visual content.
5. The data processing system of claim 1, wherein the training dataset is updated, and the updated training dataset is used to update at least one of the trained text representation ML model and the trained visual asset representation ML model.
6. The data processing system of claim 1, wherein the trained visual asset representation ML model is trained to encode generic knowledge of at least one of semantic concepts, patterns or objects that appear in visual assets.
7. A computer implemented method for retrieving one or more visual assets comprising: receiving a search query for the one or more visual assets, the search query including textual data; encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model; transmitting the one or more text embedding representations to a matching and selection unit; providing visual embedding representations of one or more visual assets to the matching and selection unit; comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results; and providing the one or more visual asset search results for display, wherein: the one or more visual assets are stored in a visual asset library, and new visual assets are added to the visual asset library by: receiving the new visual assets; providing the new visual assets to a trained visual asset representation ML model; receiving new visual embedding representations for the new visual assets from the visual asset representation ML model; storing the new visual embedding representations in a visual asset index associated with the visual asset library, the trained text representation ML model and the trained visual asset representation ML model are trained in conjunction with each other using a training dataset which includes pairs of visual assets and their textual descriptions, their textual descriptions including one or more keywords for each visual asset in the pairs of visual assets and their textual descriptions, and the trained text representation ML model and the trained visual asset representation ML model are trained using one or more pre-trained mechanisms.
8. The computer implemented method of claim 7, wherein the visual assets include at least one of an image, a video, an icon, a GIF, an illustration, and an emoticon.
9. The computer implemented method of claim 7, wherein the visual embedding representations are stored in a visual asset index.
10. A non-transitory computer readable medium on which are stored instructions that, when executed, cause a programmable device to perform functions of: receiving a search query for one or more visual assets, the search query including textual data; encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model; transmitting the one or more text embedding representations to a matching and selection unit; providing visual embedding representations of one or more visual assets to the matching and selection unit; comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results; and providing the one or more visual asset search results for display, wherein: the one or more visual assets are stored in a visual asset library, and new visual assets are added to the visual asset library by: receiving the new visual assets; providing the new visual assets to a trained visual asset representation ML model; receiving new visual embedding representations for the new visual assets from the visual asset representation ML model; storing the new visual embedding representations in a visual asset index associated with the visual asset library, the trained text representation ML model and the trained visual asset representation ML model are trained in conjunction with each other using a training dataset which includes pairs of visual assets and their textual descriptions, their textual descriptions including one or more keywords for each visual asset in the pairs of visual assets and their textual descriptions, and the trained text representation ML model and the trained visual asset representation ML model are trained using a pre-trained mechanism.
11. The non-transitory computer readable medium of claim 10, wherein the visual assets include at least one of an image, a video, an icon, a GIF, an illustration, and an emoticon.
12. The non-transitory computer readable medium of claim 10, wherein the visual embedding representations are stored in a visual asset index.
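The ingestion steps recited in the independent claims (receive new visual assets, pass them to the visual asset representation model, receive embeddings, store them in an index associated with the library) might look roughly like the following sketch. The class `VisualAssetLibrary`, its fields, and `toy_visual_encoder` are hypothetical names introduced only for illustration; the toy encoder is a placeholder and not the trained model the claims refer to.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Embedding = List[float]


@dataclass
class VisualAssetLibrary:
    """Hypothetical library that keeps raw assets alongside an embedding index."""

    # Stand-in for the trained visual asset representation ML model.
    encode_visual: Callable[[bytes], Embedding]
    assets: Dict[str, bytes] = field(default_factory=dict)
    index: Dict[str, Embedding] = field(default_factory=dict)

    def add_assets(self, new_assets: Dict[str, bytes]) -> None:
        """Encode each new asset and store its embedding in the index."""
        for asset_id, content in new_assets.items():
            embedding = self.encode_visual(content)  # provide asset, receive embedding
            self.assets[asset_id] = content          # keep the asset in the library
            self.index[asset_id] = embedding         # store embedding in the index


def toy_visual_encoder(content: bytes) -> Embedding:
    """Placeholder encoder (assumption): byte length and mean byte value."""
    length = len(content)
    mean = sum(content) / length if length else 0.0
    return [float(length), mean]


library = VisualAssetLibrary(encode_visual=toy_visual_encoder)
library.add_assets({"icon_1": b"\x01\x02\x03", "gif_7": b"\x10\x20"})
print(library.index)
```

Because embeddings are computed once at ingestion, a later text query only needs to be encoded and compared against the stored index entries, with no need to re-encode the assets themselves.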
having vectorial format · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Machine learning · CPC title
Presentation of query results · CPC title
Learning methods · CPC title