Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06F16/5866. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 23, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system of content retrieval for visual data

US12045279B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12045279-B2
Application number	US-202117538880-A
Country	US
Kind code	B2
Filing date	Nov 30, 2021
Priority date	Nov 30, 2021
Publication date	Jul 23, 2024
Grant date	Jul 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method and for retrieving one or more visual assets includes receiving a search query for the one or more visual assets, the search query including textual data, encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model, transmitting the one or more text embedding representations to a matching and selection unit, providing visual embedding representations of one or more visual assets to the matching and selection unit, comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results, and providing the one or more visual asset search results for display.
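
As a rough illustration of the flow the abstract describes, the Python sketch below encodes a text query, compares it against stored visual embeddings, and returns ranked results. The patent does not name a similarity measure or a model architecture; cosine similarity, the toy bag-of-words `encode_text` function, the hard-coded `VISUAL_INDEX`, and every other name here are assumptions made for illustration, not the patented implementation.

```python
import math

# Hypothetical shared vocabulary; a real system would use learned embeddings.
VOCAB = ["cat", "dog", "beach", "sunset"]


def encode_text(query: str) -> list[float]:
    """Stand-in for the trained text representation model (bag-of-words counts)."""
    words = query.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Precomputed visual embeddings keyed by asset ID (hypothetical values that
# play the role of the visual asset index).
VISUAL_INDEX = {
    "img_cat.png": [1.0, 0.0, 0.0, 0.0],
    "img_dog_beach.png": [0.0, 1.0, 1.0, 0.0],
    "img_sunset_beach.png": [0.0, 0.0, 1.0, 1.0],
}


def search(query: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Matching and selection step: score every asset, keep the top_k."""
    q = encode_text(query)
    scored = [(asset_id, cosine(q, emb)) for asset_id, emb in VISUAL_INDEX.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = search("dog on the beach")
```

Here `results` ranks `img_dog_beach.png` first and `img_sunset_beach.png` second, because the query shares two vocabulary terms with the former and one with the latter. Scoring every asset is only workable for small libraries; a large index would more plausibly use an approximate nearest-neighbor structure.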

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor, cause the data processing system to perform functions of: receiving a search query for one or more visual assets, the search query including textual data; encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model; transmitting the one or more text embedding representations to a matching and selection unit; providing visual embedding representations of one or more visual assets to the matching and selection unit; comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results; and providing the one or more visual asset search results for display, wherein: the one or more visual assets are stored in a visual asset library, and new visual assets are added to the visual asset library by: receiving the new visual assets; providing the new visual assets to a trained visual asset representation ML model; receiving new visual embedding representations for the new visual assets from the visual asset representation ML model; storing the new visual embedding representations in a visual asset index associated with the visual asset library, the trained text representation ML model and the trained visual asset representation ML model are trained in conjunction with each other using a training dataset which includes pairs of visual assets and their textual descriptions, their textual descriptions including one or more keywords for each visual asset in the pairs of visual assets and their textual descriptions, and the trained text representation ML model and the trained visual asset representation ML model are trained using one or more pre-trained mechanisms.

2. The data processing system of claim 1, wherein the visual assets include at least one of an image, a video, an icon, a GIF, an illustration, and an emoticon.

3. The data processing system of claim 1, wherein the visual embedding representations are stored in a visual asset index.

4. The data processing system of claim 1, wherein the search query is received via a user interface of an application that provides text searching to perform a search of visual content.

5. The data processing system of claim 1, wherein the training dataset is updated, and the updated training dataset is used to update at least one of the trained text representation ML model and the trained visual asset representation ML model.

6. The data processing system of claim 1, wherein the trained visual asset representation ML model is trained to encode generic knowledge of at least one of semantic concepts, patterns or objects that appear in visual assets.

7. A computer implemented method for retrieving one or more visual assets comprising: receiving a search query for the one or more visual assets, the search query including textual data; encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model; transmitting the one or more text embedding representations to a matching and selection unit; providing visual embedding representations of one or more visual assets to the matching and selection unit; comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results; and providing the one or more visual asset search results for display, wherein: the one or more visual assets are stored in a visual asset library, and new visual assets are added to the visual asset library by: receiving the new visual assets; providing the new visual assets to a trained visual asset representation ML model; receiving new visual embedding representations for the new visual assets from the visual asset representation ML model; storing the new visual embedding representations in a visual asset index associated with the visual asset library, the trained text representation ML model and the trained visual asset representation ML model are trained in conjunction with each other using a training dataset which includes pairs of visual assets and their textual descriptions, their textual descriptions including one or more keywords for each visual asset in the pairs of visual assets and their textual descriptions, and the trained text representation ML model and the trained visual asset representation ML model are trained using one or more pre-trained mechanisms.

8. The computer implemented method of claim 7, wherein the visual assets include at least one of an image, a video, an icon, a GIF, an illustration, and an emoticon.

9. The computer implemented method of claim 7, wherein the visual embedding representations are stored in a visual asset index.

10. A non-transitory computer readable medium on which are stored instructions that, when executed, cause a programmable device to perform functions of: receiving a search query for one or more visual assets, the search query including textual data; encoding the textual data into one or more text embedding representations via a trained text representation machine-learning (ML) model; transmitting the one or more text embedding representations to a matching and selection unit; providing visual embedding representations of one or more visual assets to the matching and selection unit; comparing, by the matching and selection unit, the one or more text embedding representations to the visual embedding representations to identify one or more visual asset search results; and providing the one or more visual asset search results for display, wherein: the one or more visual assets are stored in a visual asset library, and new visual assets are added to the visual asset library by: receiving the new visual assets; providing the new visual assets to a trained visual asset representation ML model; receiving new visual embedding representations for the new visual assets from the visual asset representation ML model; storing the new visual embedding representations in a visual asset index associated with the visual asset library, the trained text representation ML model and the trained visual asset representation ML model are trained in conjunction with each other using a training dataset which includes pairs of visual assets and their textual descriptions, their textual descriptions including one or more keywords for each visual asset in the pairs of visual assets and their textual descriptions, and the trained text representation ML model and the trained visual asset representation ML model are trained using a pre-trained mechanism.

11. The non-transitory computer readable medium of claim 10, wherein the visual assets include at least one of an image, a video, an icon, a GIF, an illustration, and an emoticon.

12. The non-transitory computer readable medium of claim 10, wherein the visual embedding representations are stored in a visual asset index.
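
The library-update steps recited in claim 1 (receive new assets, pass them to the visual model, store the returned embeddings in an index associated with the library) could be sketched as follows. `VisualAssetLibrary`, `encode_visual`, and the byte-statistics stand-in encoder are hypothetical; the claim does not specify data structures or how the model computes embeddings.

```python
from dataclasses import dataclass, field
from typing import Callable

Embedding = list[float]


def encode_visual(asset_bytes: bytes) -> Embedding:
    """Stand-in for the trained visual asset representation model.

    NOT a real image encoder: it derives a two-element vector from simple
    byte statistics so the example runs without any ML dependency.
    """
    size = len(asset_bytes)
    mean = sum(asset_bytes) / size if size else 0.0
    return [float(size), mean]


@dataclass
class VisualAssetLibrary:
    """Holds raw assets plus an embedding index keyed by the same asset IDs."""

    encoder: Callable[[bytes], Embedding]
    assets: dict[str, bytes] = field(default_factory=dict)
    index: dict[str, Embedding] = field(default_factory=dict)

    def add_assets(self, new_assets: dict[str, bytes]) -> None:
        for asset_id, data in new_assets.items():
            self.assets[asset_id] = data               # receive the new asset
            self.index[asset_id] = self.encoder(data)  # embed it and store in the index


library = VisualAssetLibrary(encoder=encode_visual)
library.add_assets({"icon_a": b"\x01\x02\x03", "icon_b": b"\x10\x20"})
```

Keeping the embeddings in a separate index means a query only needs the precomputed vectors at search time, so the visual model runs once per asset at ingestion rather than once per search.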

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06F16/56
having vectorial format · CPC title
G06F16/51
Indexing; Data structures therefor; Storage structures · CPC title
G06N20/00
Machine learning · CPC title
G06F16/538
Presentation of query results · CPC title
G06N3/08
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 86500058
