Systems and methods for face annotation
US-2023316803-A1 · Oct 5, 2023 · US
US12300007B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12300007-B1 |
| Application number | US-202217957360-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 30, 2022 |
| Priority date | Sep 30, 2022 |
| Publication date | May 13, 2025 |
| Grant date | May 13, 2025 |
Official abstract text for this publication.
Techniques are generally described for cropped image evaluation. In various examples, first image data representing a first image and second image data representing a cropped version of the first image may be received. An image captioning model may be used to generate first text data describing the first image data and second text data describing the second image data. A first encoder may be used to generate first data representing the first text data and second data representing the second text data. In various examples, a third data representing a degree of similarity between the first data and the second data may be generated. In some cases, first computer-executable instructions configured to cause the second image data to be displayed on a display may be generated based at least in part on the third data.
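The comparison step in the abstract (encode the two captions, then score their similarity) can be illustrated with a minimal sketch. The patent does not specify the encoder or a decision rule here, so the following are assumptions made only for illustration: `bag_of_words` is a toy stand-in for the "first encoder", the captions are supplied as plain strings rather than produced by an image captioning model, and `should_display_crop` with its `threshold` of 0.8 is a hypothetical way of acting on the score.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for the encoder (assumption): lowercase token counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty caption shares nothing with any other caption
    return dot / (norm_a * norm_b)


def should_display_crop(full_caption: str, crop_caption: str,
                        threshold: float = 0.8):
    """Hypothetical decision rule: accept the crop if the captions are similar.

    Returns (accepted, score). The threshold value is an assumption,
    not something taken from the patent.
    """
    score = cosine_similarity(bag_of_words(full_caption),
                              bag_of_words(crop_caption))
    return score >= threshold, score
```

A real system would likely replace `bag_of_words` with a learned text encoder; the cosine computation itself would stay the same for dense vectors.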
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: receiving first image data representing a first image; generating second image data representing a portion of the first image that is generated by cropping the first image data according to a first aspect ratio of a first display of a target device; generating, by inputting the first image data into an image captioning model, first text data representing a first description of first content of the first image data; generating, using a first encoder, a first vector representation of the first text data; generating, by inputting the second image data into the image captioning model, second text data representing a second description of second content of the second image data; generating, using the first encoder, a second vector representation of the second text data; determining a first cosine similarity score between the first vector representation and the second vector representation; and generating, based at least in part on the first cosine similarity score, first computer-executable instructions to cause the second image data to be displayed on the first display of the target device.
2. The computer-implemented method of claim 1, further comprising: generating first graph data representing the first text data using a semantic propositional image caption evaluation (SPICE) model, wherein a node in the first graph data represents a word of the first text data; and generating second graph data representing the second text data using SPICE.
3. The computer-implemented method of claim 2, further comprising: generating, by the first encoder, the first vector representation at least in part by generating a first embedding for a first node in the first graph data, the first node representing a first word; generating, by the first encoder, the second vector representation at least in part by generating a second embedding for a second node in the second graph data, the second node representing a second word; and determining the first cosine similarity score based at least in part by determining a cosine similarity between the first embedding and the second embedding.
4. A method comprising: receiving first image data representing a first image; receiving second image data representing a second image comprising a first subset of pixels of the first image data; generating, using an image captioning model executed by at least one computing device, first text data describing the first image; generating, using the image captioning model, second text data describing the second image; generating first data representing the first text data; generating second data representing the second text data; determining a third data representing a degree of similarity between the first data and the second data; and generating first computer-executable instructions configured to cause the second image data to be displayed on a display based at least in part on the third data.
5. The method of claim 4, further comprising: generating first graph data representing the first text data, wherein a first node in the first graph data represents a first word of the first text data, wherein the first data comprises the first graph data; and generating second graph data representing the second text data, wherein the second data comprises the second graph data.
6. The method of claim 5, further comprising: determining, using the first graph data and the second graph data, a first set of words present in the first graph data that are also present in the second graph data; and generating the third data based at least in part on the first set of words.
7. The method of claim 5, further comprising: generating a first vector representing at least a first word in the first graph data, wherein the first data comprises the first vector; generating a second vector representing at least a second word in the second graph data, wherein the second data comprises the second vector; and determining the third data based at least in part on one of a cosine similarity or Euclidean distance between the first vector and the second vector.
8. The method of claim 4, further comprising: determining first keypoints in the first image data using a keypoint detection model; determining second keypoints in the second image data using the keypoint detection model; determining a ratio of a number of the second keypoints to a number of the first keypoints; and determining the third data based at least in part on the ratio.
9. The method of claim 4, further comprising: receiving third image data representing a third image comprising a second subset of pixels of the first image data different from the first subset; generating, using the image captioning model, third text data describing the third image data; generating, using a first encoder and the third text data, fourth data representing the third text data; determining fifth data representing a degree of similarity between the first data and the fourth data; and selecting the second image data for output based on a comparison of the third data and the fifth data.
10. The method of claim 4, wherein the first data and the second data are generated using a first encoder, the method further comprising: generating, using a second encoder executed by the at least one computing device, fourth data representing the first text data; generating, using the second encoder and the second text data, fifth data representing the second text data; determining sixth data representing a degree of similarity between the fourth data and the fifth data; inputting the third data and the sixth data into a neural network; and generating, by the neural network, output data indicating a semantic similarity between the first image data and the second image data.
11. The method of claim 4, further comprising: receiving a first input describing a first aspect ratio of an output display; generating a plurality of cropped images from the first image data by iterating a window of the first aspect ratio over a plurality of positions overlaying the first image data, wherein each position of the plurality of positions corresponds to one of the plurality of cropped images; generating, for a first cropped image of the plurality of cropped images, a first score representing a first degree of similarity between the first cropped image and the first image; generating, for a second cropped image of the plurality of cropped images, a second score representing a second degree of similarity between the second cropped image and the first image; and selecting the first cropped image from among the plurality of cropped images based on the first score and the second score.
12. The method of claim 4, wherein the third data represents a similarity between first content of the first image and second content of the second image.
13. A system comprising: at least one processor; and at least one non-transitory computer-readable memory storing instructions that, when executed by the at least one processor, are effective to program the at least one processor to: receive first image data representing a first image; receive second image data representing a second image comprising a first subset of pixels of the first image; generate, using an image captioning model executed by at least one computing device, first text data describing the first image; generate, using the image captioning model, second text data describing the second image; generate first data representing the first text data; generate second data representing the second text data; determine a third data
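Claim 11 describes generating candidate crops by iterating a window of the target aspect ratio over positions on the image, scoring each crop against the original, and selecting one. A minimal sketch of that loop follows. Several details are assumptions not stated in the claims: the window is taken to be the largest one of the given aspect ratio that fits the image, `stride` is a hypothetical step size in pixels, crops are represented only by their bounding boxes, and `score_fn` is a caller-supplied placeholder for whatever similarity measure is used.

```python
from typing import Callable, Iterable, Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def crop_windows(img_w: int, img_h: int,
                 aspect_w: int, aspect_h: int,
                 stride: int = 1) -> Iterator[Box]:
    """Yield windows of aspect ratio aspect_w:aspect_h slid over the image.

    Assumption: the window is the largest one of that ratio that fits.
    Integer arithmetic avoids floating-point rounding of the window size.
    """
    if img_w * aspect_h >= img_h * aspect_w:
        # Image is relatively wider than the target: height is the limit.
        win_h = img_h
        win_w = img_h * aspect_w // aspect_h
    else:
        # Image is relatively taller than the target: width is the limit.
        win_w = img_w
        win_h = img_w * aspect_h // aspect_w
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield (left, top, left + win_w, top + win_h)


def select_best_crop(boxes: Iterable[Box],
                     score_fn: Callable[[Box], float]) -> Tuple[Box, float]:
    """Score every candidate and return (best_box, best_score).

    Raises ValueError if no candidate boxes are supplied.
    """
    scored = [(score_fn(box), box) for box in boxes]
    best_score, best_box = max(scored, key=lambda pair: pair[0])
    return best_box, best_score
```

In use, `score_fn` would crop the image to the box, caption the crop, and compare that caption with the caption of the full image; any of the similarity measures recited in the dependent claims could fill that role.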
using neural networks · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title