Extracting attributes from arbitrary digital images utilizing a multi-attribute contrastive classification neural network
US-2022383037-A1 · Dec 1, 2022 · US
US12555352B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12555352-B2 |
| Application number | US-202217889752-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 17, 2022 |
| Priority date | Aug 18, 2021 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
A method for displaying images similar to a selected image includes receiving, from a user, a selection of an anchor image, generating, using a machine learning model, an anchor embeddings set for the anchor image and respective candidate embeddings sets for a plurality of candidate images. The method also includes calculating a distance between the anchor embeddings set and each of the plurality of candidate embeddings sets and displaying at least one of the plurality of candidate images based on the calculated distance.
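The retrieval step in the abstract can be illustrated with a minimal sketch. The publication does not name a distance metric in the text shown here, so the Hamming distance below is an assumption, chosen because the claims describe embeddings sets as indicative of binary attribute values. The function names `hamming_distance` and `rank_candidates`, the image identifiers, and the sample vectors are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the attribute positions where two binary vectors differ.

    Assumption: the distance metric is not specified in the source text.
    """
    if len(a) != len(b):
        raise ValueError("embeddings sets must cover the same attributes")
    return sum(x != y for x, y in zip(a, b))


def rank_candidates(
    anchor: Sequence[int],
    candidates: Dict[str, Sequence[int]],
    top_k: int = 1,
) -> List[Tuple[str, int]]:
    """Return the top_k candidate ids closest to the anchor, nearest first."""
    scored = [
        (hamming_distance(anchor, embedding), image_id)
        for image_id, embedding in candidates.items()
    ]
    scored.sort()  # ascending distance; ties broken by image id
    return [(image_id, distance) for distance, image_id in scored[:top_k]]


# Hypothetical embeddings sets: one binary value per visual attribute.
anchor = [1, 0, 1, 1, 0]
candidates = {
    "img_a": [1, 0, 1, 0, 0],
    "img_b": [0, 1, 0, 0, 1],
    "img_c": [1, 0, 1, 1, 0],
}

nearest = rank_candidates(anchor, candidates, top_k=2)
print(nearest)
```

In a deployed system the candidate embeddings sets would likely be precomputed and indexed rather than scored one by one, but that choice is outside what the abstract states.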
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving, by a server system from a user, a selection of an anchor image; generating, by the server system using a machine learning model, an anchor embeddings set for the anchor image, wherein the machine learning model is trained to receive an input image and output binary values for a plurality of visual attributes of the input image, the binary value for each visual attribute is indicative of an image associated with the binary value having the visual attribute, the visual attribute includes a consolidation of at least one standardized label derived from the image and at least one label class derived from the standardized labels, and the anchor embeddings set is indicative of binary values associated with the anchor image for a set of visual attributes; generating, by the server system, using the machine learning model, respective candidate embeddings sets for a plurality of candidate images; calculating a distance between the anchor embeddings set and each of the plurality of candidate embeddings sets; and displaying at least one of the plurality of candidate images based on the calculated distance.
2. The method of claim 1, wherein receiving the selection of the anchor image comprises: receiving, by the server system from the user, a selection of an anchor document; and deriving the anchor image from the anchor document by: retrieving a set of images associated with the anchor document; determining an alignment of each image in the set of images; receiving an indication of a standard alignment; and selecting, as the anchor image, the image in the set of images having the standard alignment.
3. The method of claim 2, wherein the indication of the standard alignment is based on a document type of the anchor document.
4. The method of claim 2, wherein deriving the anchor image further comprises: determining that two or more images of the set of images have the standard alignment; determining a confidence score associated with the alignment determination for each of the two or more images; and selecting the anchor image as the image of the two or more images with a highest confidence score.
5. The method of claim 1, wherein the displayed at least one candidate image corresponds to the candidate embeddings set that is closest to the anchor embeddings set.
6. The method of claim 1, wherein each candidate embeddings set is indicative of binary values associated with each candidate image for the set of visual attributes.
7. The method of claim 6, wherein the binary values represent a motif for each candidate image, the motif comprising a combination of visual attributes.
8. A method comprising: training a machine learning model to receive an input image and to output binary values for a plurality of visual attributes of the input image, wherein the binary value for each visual attribute is indicative of an image associated with the binary value having the visual attribute, and the visual attribute includes a consolidation of at least one standardized label derived from the image and at least one label class derived from the standardized labels; receiving, by a server system from a user, a selection of an anchor image; generating, by the server system using the machine learning model, an anchor embeddings set for the anchor image; generating, by the machine learning model, respective candidate embeddings sets for a plurality of candidate images, each candidate embeddings set associated with a candidate image; calculating a distance between the anchor embeddings set and each of the plurality of candidate embeddings sets; and displaying at least one of the plurality of candidate images based on the calculated distance.
9. The method of claim 8, wherein the machine learning model generates embeddings for an image with a first layer of the model that, during training, is input to a second layer that outputs the binary values for the plurality of visual attributes of the image.
10. The method of claim 8, wherein training the machine learning model comprises minimizing a cross-entropy loss function, wherein a loss of the loss function comprises a sum of losses respective of the binary values for the plurality of visual attributes.
11. The method of claim 8, wherein the plurality of visual attributes are generated by: defining a set of images, each image associated with a document; deriving, from each image of the set of images, one or more pre-determined labels; standardizing the one or more pre-determined labels; generating label classes based on the standardized labels; and flattening the label classes with the pre-determined labels to generate the set of visual attributes.
12. The method of claim 8, wherein receiving the selection of the anchor image comprises: receiving, by the server system from the user, a selection of an anchor document; and deriving the anchor image from the anchor document by: retrieving a set of images associated with the anchor document; determining an alignment of each image in the set of images; receiving an indication of a standard alignment; and selecting, as the anchor image, the image in the set of images having the standard alignment.
13. The method of claim 8, wherein the displayed at least one candidate image corresponds to the candidate embeddings set closest to the anchor embeddings set.
14. The method of claim 8, wherein displaying the at least one candidate image comprises displaying a candidate document associated with each of the at least one candidate image.
15. A method comprising: deriving a set of images from a set of documents; deriving, from each image of the set of images, one or more pre-determined labels; standardizing the one or more pre-determined labels; generating label classes based on the standardized labels; flattening the label classes with the standardized labels to generate a defined set comprising a plurality of visual attributes, wherein flattening comprises consolidating the standardized labels and the label classes into the set of visual attributes, and each visual attribute comprises at least one label class and at least one standardized label; training a machine learning model to receive an input image and to output binary values for the plurality of visual attributes of the input image, wherein the binary value for each visual attribute is indicative of an image associated with the binary value having the visual attribute; receiving, by a server system from a user, a selection of an anchor image; generating, by the server system using a machine learning model, an anchor embeddings set for the anchor image, the anchor embeddings set indicative of binary values associated with the anchor image for the set of visual attributes; generating, by server system using the machine learning model, respective candidate embeddings sets for a plurality of candidate images, each candidate embeddings set indicative of binary values associated with each candidate image for the set of visual attributes; calculating a distance between the anchor embeddings set and each of the plurality of candidate embeddings sets; and displaying at least one of the plurality of candidate images based on the calculated distance.
16. The method of claim 15, wherein receiving the selection of the anchor image comprises: receiving, by the server system from the user, a selection of an anchor document; and deriving the anchor image from the anchor document by: retrieving a set of images associated with the an
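The label pipeline in claims 11 and 15 (standardize labels, derive label classes, flatten both into visual attributes) and the summed cross-entropy loss in claim 10 can be sketched roughly as follows. This is one possible reading, not the patented implementation: the `SYNONYMS` and `LABEL_CLASS` tables, the `class:label` attribute naming, and every function name are hypothetical, and the claims do not say how standardization or class assignment is actually performed.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical lookup tables; the claims do not specify their contents.
SYNONYMS: Dict[str, str] = {"stripes": "stripe", "striped": "stripe", "dots": "dot"}
LABEL_CLASS: Dict[str, str] = {
    "stripe": "pattern", "dot": "pattern", "red": "color", "blue": "color",
}


def standardize(raw_label: str) -> str:
    """Assumed standardization: normalize case/whitespace, then map synonyms."""
    text = " ".join(raw_label.lower().split())
    return SYNONYMS.get(text, text)


def attribute_name(raw_label: str) -> str:
    """Flatten a label class and a standardized label into one attribute."""
    label = standardize(raw_label)
    return f"{LABEL_CLASS[label]}:{label}"


def build_attributes(labels_per_image: Sequence[Sequence[str]]) -> List[str]:
    """Collect the defined set of visual attributes across all images."""
    return sorted({attribute_name(raw) for labels in labels_per_image for raw in labels})


def encode(raw_labels: Sequence[str], attributes: Sequence[str]) -> List[int]:
    """Binary target vector: 1 where the image has the attribute, else 0."""
    present = {attribute_name(raw) for raw in raw_labels}
    return [1 if attribute in present else 0 for attribute in attributes]


def summed_bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-12) -> float:
    """Sum of per-attribute binary cross-entropy losses (one reading of claim 10)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


# Hypothetical pre-determined labels derived from three images.
images = [["Stripes", " Red "], ["dots", "blue"], ["striped", "Blue"]]
attributes = build_attributes(images)
target = encode(images[0], attributes)
loss = summed_bce([0.1, 0.9, 0.2, 0.8], target)
print(attributes, target, round(loss, 4))
```

Treating each attribute as an independent binary output is what makes the total loss a plain sum; a model head with one sigmoid per attribute would be a natural fit, though the claims do not state the activation used.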
Classification of content, e.g. text, photographs or tables · CPC title
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
User interactive design; Environments; Toolboxes · CPC title
Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries · CPC title