System and method for extracting object information from digital images to evaluate for realism

US12488566B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12488566-B2
Application number  US-202318478265-A
Country             US
Kind code           B2
Filing date         Sep 29, 2023
Priority date       Jun 30, 2023
Publication date    Dec 2, 2025
Grant date          Dec 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems, methods, devices, and other techniques for comprehensive and automated evaluation of digital images generated from artificial intelligence (AI) models in order to promote accurate representations of real-world content. Prompts are received at the system that are then passed to both a search engine and a generative AI model. Synthesized digital images are obtained from the generative AI model. The top-matching image from the search engine is used as a verification of the ground truth of the synthesized digital images. A realism score is generated for each synthesized digital image that characterizes the accuracy of the synthesized digital image with reference to the verification image. The realism score can be used to assist and expedite the image selection process, as well as serve as input to fine-tune the performance of generative models.
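The scoring-and-ranking step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `ScoredImage`, `rank_by_realism`, and `inverse_l1_score` are hypothetical, images are assumed to be already reduced to numeric feature vectors, and the scoring function is a stand-in for whatever comparison the system actually uses. The optional `min_score` cut-off mirrors the threshold exclusion recited in claim 6.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class ScoredImage:
    image_id: str   # hypothetical identifier for a synthesized image
    realism: float  # higher means closer to the reference image


def inverse_l1_score(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Stand-in similarity (an assumption): 1 / (1 + L1 distance), in (0, 1]."""
    distance = sum(abs(r - c) for r, c in zip(reference, candidate))
    return 1.0 / (1.0 + distance)


def rank_by_realism(
    reference: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    score_fn: Callable[[Sequence[float], Sequence[float]], float] = inverse_l1_score,
    min_score: Optional[float] = None,
) -> List[ScoredImage]:
    """Score every candidate against the reference and sort best-first."""
    scored = [
        ScoredImage(image_id, score_fn(reference, features))
        for image_id, features in candidates.items()
    ]
    if min_score is not None:
        # Drop images whose score falls below the threshold before presenting.
        scored = [s for s in scored if s.realism >= min_score]
    return sorted(scored, key=lambda s: s.realism, reverse=True)


if __name__ == "__main__":
    reference_features = [1.0, 2.0]
    synthetic = {"a": [1.0, 2.5], "b": [1.0, 2.0], "c": [5.0, 9.0]}
    for item in rank_by_realism(reference_features, synthetic, min_score=0.2):
        print(item.image_id, round(item.realism, 3))
```

Keeping the scoring function as a parameter lets the same ranking logic work with any comparison method, which seems a reasonable fit for a pipeline whose scores may also feed back into model fine-tuning.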

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method for extracting object information from digital images, the method comprising:
    receiving, at a realism assessment system, a user prompt involving real-world content;
    submitting, from the realism assessment system and to a web-based search engine, the user prompt;
    receiving, at the realism assessment system, a reference digital image retrieved by the search engine in response to the user prompt;
    submitting, from the realism assessment system and to a first generative artificial intelligence (AI) model, the user prompt;
    receiving, at the realism assessment system, a plurality of synthetic digital images including a first synthetic digital image and a second synthetic digital image, the plurality of synthetic digital images generated by the first generative AI model in response to the user prompt;
    automatically classifying, via a deep learning-based instance segmentation model of the realism assessment system, a first set of pixels in the reference digital image as corresponding to a first object, a second set of pixels in the first synthetic digital image as corresponding to a second object, and a third set of pixels in the second synthetic digital image as corresponding to a third object;
    extracting one or more Hu Moments for each of the first object, the second object, and the third object;
    generating, at the realism assessment system, a first realism score based on a comparison of characteristics of the first object with the second object by comparing the one or more Hu Moments for the first object and the second object;
    generating, at the realism assessment system, a second realism score based on a comparison of characteristics of the first object with the third object by comparing the one or more Hu Moments for the first object and the third object, the second realism score being greater than the first realism score;
    determining, at the realism assessment system, the second synthetic digital image has a greater likelihood of accurately representing the real-world content than the first synthetic digital image based on the second realism score being greater than the first realism score; and
    ranking each synthetic digital image of the plurality of synthetic digital images based on their computed realism score and presenting, at a computing device, the synthetic digital images in order of their ranking.

2. The method of claim 1, wherein the computed Hu Moments are used to characterize the shape of a main object in each of the reference digital image, the first image, and the second image, the main object being identified by a large language (LLM)-based classifier.

3. The method of claim 1, further comprising:
    detecting, via the deep learning-based instance segmentation model, a first set of objects in the reference digital image including at least the first object and a fourth object;
    generating, via the deep learning-based instance segmentation model, a label for each object in the first set of objects;
    passing both the user prompt and the labels for the objects in the first set of objects to a large language model (LLM) classifier; and
    identifying, via the LLM classifier, a first main object from the first set of objects, wherein the first main object corresponds to the first object.

4. The method of claim 3, further comprising:
    detecting, via the instance segmentation model, a second set of objects in the first synthetic digital image including at least the second object and a fifth object;
    generating, via the instance segmentation model, a label for each object in the second set of objects;
    passing both the user prompt and the labels for the objects in the second set of objects to the LLM classifier; and
    identifying, via the LLM classifier, a second main object from the second set of objects, wherein the second main object corresponds to the second object.

5. The method of claim 4, further comprising filtering out characteristics of the fourth object and the fifth object by the LLM classifier before generating the first realism score.

6. The method of claim 1, further comprising excluding from presentation at the computing device the synthetic digital images of the plurality of synthetic digital images that are assigned a realism score below a predesignated first threshold score.

7. The method of claim 1, further comprising:
    receiving, at the realism assessment system, a first image and a second image both retrieved by the search engine in response to the user prompt;
    converting, via an image captioning AI generator, the first image to a first text description;
    converting, via the image captioning AI generator, the second image to a second text description;
    determining the first text description is more similar to the user prompt than the second text description; and
    selecting the first image as the reference digital image in response to the first text description being more similar to the user prompt than the second text description.

8. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to extract object information from digital images by:
    receiving, at a realism assessment system, a user prompt involving real-world content;
    submitting, from the realism assessment system and to a web-based search engine, the user prompt;
    receiving, at the realism assessment system, a reference digital image retrieved by the search engine in response to the user prompt;
    submitting, from the realism assessment system and to a first generative AI model, the user prompt;
    receiving, at the realism assessment system, a plurality of synthetic digital images including a first synthetic digital image and a second synthetic digital image, the plurality of synthetic digital images generated by the first generative AI model in response to the user prompt;
    automatically classifying, via a deep learning-based instance segmentation model of the realism assessment system, a first set of pixels in the reference digital image as corresponding to a first object, a second set of pixels in the first synthetic digital image as corresponding to a second object, and a third set of pixels in the second synthetic digital image as corresponding to a third object;
    extracting one or more Hu Moments for each of the first object, the second object, and the third object;
    generating, at the realism assessment system, a first realism score based on a comparison of characteristics of the first object with the second object by comparing the one or more Hu Moments for the first object and the second object;
    generating, at the realism assessment system, a second realism score based on a comparison of characteristics of the first object with the third object by comparing the one or more Hu Moments for the first object and the third object, the second realism score being greater than the first realism score;
    determining, at the realism assessment system, the second synthetic digital image has a greater likelihood of accurately representing the real-world content than the first synthetic digital image based on the second realism score being greater than the first realism score; and
    ranking each synthetic digital image of the plurality of synthetic digital images based on their computed realism score and presenting, at a computing device, the synthetic digital images in order of their ranking.

9. The non-transitory computer-readable medium of claim 8, wherein the computed Hu Moments are used to characterize the shape of a main object in each of the reference digital image, the first image, and the second image, the main object being identified by a l
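The Hu Moment comparison recited in claims 1 and 8 can be illustrated with a small sketch. It assumes each segmented object is available as a binary mask (rows of 0/1 values) and computes only the first four of the seven standard Hu invariants, which is consistent with the claim language "one or more Hu Moments". The function names, the log-scale distance, and the mapping of distance to a score in (0, 1] are assumptions made for illustration; the claim text shown here does not say how the moments are compared.

```python
import math


def hu_moments(mask):
    """Return the first four Hu invariants of a binary mask (rows of 0/1)."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("mask contains no object pixels")
    m00 = float(len(pixels))  # area of the object
    cx = sum(x for x, _ in pixels) / m00  # centroid x
    cy = sum(y for _, y in pixels) / m00  # centroid y

    def eta(p, q):
        # Central moment, normalized by area so the value is scale-invariant.
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ]


def realism_score(ref_mask, cand_mask, eps=1e-12):
    """Hypothetical score: 1 / (1 + sum of log10-scale moment differences)."""
    distance = 0.0
    for a, b in zip(hu_moments(ref_mask), hu_moments(cand_mask)):
        # These four invariants are non-negative; skip any that vanish
        # (for example, for symmetric shapes) to avoid taking log10 of zero.
        if a < eps or b < eps:
            continue
        distance += abs(math.log10(a) - math.log10(b))
    return 1.0 / (1.0 + distance)


def rectangle(width, height, left, top, canvas=12):
    """Helper for the demo: a filled rectangle on a square canvas."""
    return [
        [1 if left <= x < left + width and top <= y < top + height else 0
         for x in range(canvas)]
        for y in range(canvas)
    ]


if __name__ == "__main__":
    reference = rectangle(6, 2, left=1, top=1)
    first_synthetic = rectangle(4, 4, left=2, top=2)   # different shape
    second_synthetic = rectangle(6, 2, left=4, top=7)  # same shape, moved
    print(realism_score(reference, first_synthetic))
    print(realism_score(reference, second_synthetic))
```

In the demo, the second synthetic mask has the same shape as the reference and differs only in position, so it scores higher than the first, matching the ordering the claim describes. A production system would more likely take masks from an instance segmentation model and use an established library routine for the moments.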

Assignees

Accenture Global Solutions Ltd

Inventors

Classifications

  • Using local operators

  • Target detection

  • Proximity, similarity or dissimilarity measures

  • G06V20/70 (Primary): Labelling scene content, e.g. deriving syntactic or semantic representations

  • Depth or shape recovery

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12488566B2 cover?
It covers systems and methods for automatically evaluating AI-generated digital images for realism. A prompt is passed to both a search engine and a generative AI model; the top-matching search image serves as the ground-truth reference, and each synthesized image receives a realism score that characterizes its accuracy relative to that reference.
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06V20/70. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 2, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).