What technology area does this patent fall under?

Primary CPC classification G06V20/70. Mapped technology areas include Physics.

When was this patent published?

Publication date May 13, 2025 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page: publications cited in our corpus, or other publications sharing the same primary CPC classification.

Automatic image cropping

US12300007B1 · US · B1

Patent metadata
Field	Value
Publication number	US-12300007-B1
Application number	US-202217957360-A
Country	US
Kind code	B1
Filing date	Sep 30, 2022
Priority date	Sep 30, 2022
Publication date	May 13, 2025
Grant date	May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection; read this to see what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are generally described for cropped image evaluation. In various examples, first image data representing a first image and second image data representing a cropped version of the first image may be received. An image captioning model may be used to generate first text data describing the first image data and second text data describing the second image data. A first encoder may be used to generate first data representing the first text data and second data representing the second text data. In various examples, a third data representing a degree of similarity between the first data and the second data may be generated. In some cases, first computer-executable instructions configured to cause the second image data to be displayed on a display may be generated based at least in part on the third data.
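The scoring step in the abstract can be illustrated with a minimal sketch. It assumes the two captions have already been produced by some image captioning model (not shown), and it substitutes a simple bag-of-words count vector for the unspecified "first encoder"; a real system would more plausibly use a learned text embedding. Cosine similarity is the measure named in claim 1, while the function names and the 0.5 `threshold` are hypothetical.

```python
import math
from collections import Counter


def encode(text: str) -> Counter:
    """Hypothetical stand-in for the 'first encoder': a bag-of-words count vector.

    The patent does not specify this encoder; a learned sentence
    embedding would be the more likely choice in practice.
    """
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_display_crop(full_caption: str, crop_caption: str,
                        threshold: float = 0.5) -> bool:
    """Decide whether the crop's caption stays close enough to the original's.

    `threshold` is a hypothetical tuning parameter, not taken from the patent.
    """
    score = cosine_similarity(encode(full_caption), encode(crop_caption))
    return score >= threshold


if __name__ == "__main__":
    # In the described technique these captions would come from an
    # image captioning model; they are hard-coded here for illustration.
    full = "a dog playing with a ball on the beach"
    crop = "a dog playing with a ball"
    print(round(cosine_similarity(encode(full), encode(crop)), 3))
    print(should_display_crop(full, crop))
```

Because the comparison happens in text space, a crop that drops the main subject tends to yield a caption with little overlap, which pushes the score toward zero.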

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving first image data representing a first image; generating second image data representing a portion of the first image that is generated by cropping the first image data according to a first aspect ratio of a first display of a target device; generating, by inputting the first image data into an image captioning model, first text data representing a first description of first content of the first image data; generating, using a first encoder, a first vector representation of the first text data; generating, by inputting the second image data into the image captioning model, second text data representing a second description of second content of the second image data; generating, using the first encoder, a second vector representation of the second text data; determining a first cosine similarity score between the first vector representation and the second vector representation; and generating, based at least in part on the first cosine similarity score, first computer-executable instructions to cause the second image data to be displayed on the first display of the target device.

2. The computer-implemented method of claim 1, further comprising: generating first graph data representing the first text data using a semantic propositional image caption evaluation (SPICE) model, wherein a node in the first graph data represents a word of the first text data; and generating second graph data representing the second text data using SPICE.

3. The computer-implemented method of claim 2, further comprising: generating, by the first encoder, the first vector representation at least in part by generating a first embedding for a first node in the first graph data, the first node representing a first word; generating, by the first encoder, the second vector representation at least in part by generating a second embedding for a second node in the second graph data, the second node representing a second word; and determining the first cosine similarity score based at least in part by determining a cosine similarity between the first embedding and the second embedding.

4. A method comprising: receiving first image data representing a first image; receiving second image data representing a second image comprising a first subset of pixels of the first image data; generating, using an image captioning model executed by at least one computing device, first text data describing the first image; generating, using the image captioning model, second text data describing the second image; generating first data representing the first text data; generating second data representing the second text data; determining a third data representing a degree of similarity between the first data and the second data; and generating first computer-executable instructions configured to cause the second image data to be displayed on a display based at least in part on the third data.

5. The method of claim 4, further comprising: generating first graph data representing the first text data, wherein a first node in the first graph data represents a first word of the first text data, wherein the first data comprises the first graph data; and generating second graph data representing the second text data, wherein the second data comprises the second graph data.

6. The method of claim 5, further comprising: determining, using the first graph data and the second graph data, a first set of words present in the first graph data that are also present in the second graph data; and generating the third data based at least in part on the first set of words.

7. The method of claim 5, further comprising: generating a first vector representing at least a first word in the first graph data, wherein the first data comprises the first vector; generating a second vector representing at least a second word in the second graph data, wherein the second data comprises the second vector; and determining the third data based at least in part on one of a cosine similarity or Euclidean distance between the first vector and the second vector.

8. The method of claim 4, further comprising: determining first keypoints in the first image data using a keypoint detection model; determining second keypoints in the second image data using the keypoint detection model; determining a ratio of a number of the second keypoints to a number of the first keypoints; and determining the third data based at least in part on the ratio.

9. The method of claim 4, further comprising: receiving third image data representing a third image comprising a second subset of pixels of the first image data different from the first subset; generating, using the image captioning model, third text data describing the third image data; generating, using a first encoder and the third text data, fourth data representing the third text data; determining fifth data representing a degree of similarity between the first data and the fourth data; and selecting the second image data for output based on a comparison of the third data and the fifth data.

10. The method of claim 4, wherein the first data and the second data are generated using a first encoder, the method further comprising: generating, using a second encoder executed by the at least one computing device, fourth data representing the first text data; generating, using the second encoder and the second text data, fifth data representing the second text data; determining sixth data representing a degree of similarity between the fourth data and the fifth data; inputting the third data and the sixth data into a neural network; and generating, by the neural network, output data indicating a semantic similarity between the first image data and the second image data.

11. The method of claim 4, further comprising: receiving a first input describing a first aspect ratio of an output display; generating a plurality of cropped images from the first image data by iterating a window of the first aspect ratio over a plurality of positions overlaying the first image data, wherein each position of the plurality of positions corresponds to one of the plurality of cropped images; generating, for a first cropped image of the plurality of cropped images, a first score representing a first degree of similarity between the first cropped image and the first image; generating, for a second cropped image of the plurality of cropped images, a second score representing a second degree of similarity between the second cropped image and the first image; and selecting the first cropped image from among the plurality of cropped images based on the first score and the second score.

12. The method of claim 4, wherein the third data represents a similarity between first content of the first image and second content of the second image.

13. A system comprising: at least one processor; and at least one non-transitory computer-readable memory storing instructions that, when executed by the at least one processor, are effective to program the at least one processor to: receive first image data representing a first image; receive second image data representing a second image comprising a first subset of pixels of the first image; generate, using an image captioning model executed by at least one computing device, first text data describing the first image; generate, using the image captioning model, second text data describing the second image; generate first data representing the first text data; generate second data representing the second text data; determine a third data
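Claims 8 and 11 together suggest a simple crop search: slide a window with the display's aspect ratio across the image and keep the highest-scoring position. The sketch below assumes keypoints have already been produced by some keypoint detector (not shown), and it approximates claim 8's ratio by counting the original image's keypoints that fall inside each window rather than re-running detection on the crop. The names `Box`, `sliding_windows`, `keypoint_ratio`, and `select_crop`, and the `stride` parameter, are hypothetical; in the claimed method the per-crop score could instead be the caption-similarity score of claim 4.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical crop window in pixel coordinates."""
    left: int
    top: int
    width: int
    height: int

    def contains(self, p: Point) -> bool:
        x, y = p
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def sliding_windows(img_w: int, img_h: int, aspect: float,
                    stride: int) -> List[Box]:
    """Largest window with aspect ratio `aspect` (width / height) that fits
    the image, placed at every `stride`-pixel step (claim 11 style)."""
    win_w = min(img_w, int(img_h * aspect))
    win_h = min(img_h, int(win_w / aspect))
    boxes = []
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            boxes.append(Box(left, top, win_w, win_h))
    return boxes


def keypoint_ratio(box: Box, keypoints: Sequence[Point]) -> float:
    """Approximation of claim 8: share of the original keypoints kept by the crop."""
    if not keypoints:
        return 0.0
    kept = sum(1 for p in keypoints if box.contains(p))
    return kept / len(keypoints)


def select_crop(boxes: Iterable[Box], score: Callable[[Box], float]) -> Box:
    """Return the highest-scoring candidate (raises ValueError if empty)."""
    return max(boxes, key=score)


if __name__ == "__main__":
    # Hypothetical 1920x1080 image whose detected keypoints sit on the right.
    kps = [(1700, 500), (1800, 600), (1500, 540)]
    candidates = sliding_windows(1920, 1080, aspect=1.0, stride=420)
    best = select_crop(candidates, lambda b: keypoint_ratio(b, kps))
    print(best, keypoint_ratio(best, kps))
```

Any function mapping a `Box` to a float can be passed to `select_crop`, so the keypoint ratio could be replaced by, or combined with, a caption-similarity score.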

Assignees

Amazon Tech Inc

Inventors

Classifications

G06V10/82
using neural networks · CPC title
G06V10/761
Proximity, similarity or dissimilarity measures · CPC title
G06V20/70 · Primary
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
G06V10/26 · Primary
Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 95659024




Related patents

Systems and methods for face annotation

Localizing relevant objects in multi-object images

Text analyzing method and device, server and computer-readable storage medium

Image captioning utilizing semantic text modeling and adversarial learning
