Qualifying labels automatically attributed to content in images

US12530913B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12530913-B2
Application number: US-202318171256-A
Country: US
Kind code: B2
Filing date: Feb 17, 2023
Priority date: Feb 17, 2023
Publication date: Jan 20, 2026
Grant date: Jan 20, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for image generation. The method including identifying a plurality of features of an image. The method including classifying each of the plurality of features using an artificial intelligence (AI) model trained to identify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, wherein the image is provided as input to the AI model. The method including receiving feedback for a label, wherein the feedback is associated with a user. The method including modifying a label based on the feedback. The method including updating the plurality of labels with the label that is modified. The method including providing as input the plurality of labels that is updated into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.
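The steps in the abstract can be sketched as a small label-editing loop. This is a minimal illustration only, not the patented implementation: the `Label` and `Feedback` types, their field names, and `labels_to_conditioning` are hypothetical, the labels are hard-coded where a trained classifier would supply them, and the latent diffusion generator is left out entirely (the final string only stands in for whatever input such a system would accept).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Label:
    """One classified feature of the image (field names are illustrative)."""
    object_id: str   # object in the scene that the label describes
    attribute: str   # e.g. "color", "body"
    value: str       # e.g. "red", "sedan"


@dataclass(frozen=True)
class Feedback:
    """User feedback aimed at a single label (hypothetical structure)."""
    object_id: str
    attribute: str
    new_value: str


def apply_feedback(labels: list[Label], fb: Feedback) -> list[Label]:
    """Return a new label list in which the targeted label is modified."""
    updated = []
    for label in labels:
        if (label.object_id, label.attribute) == (fb.object_id, fb.attribute):
            label = replace(label, value=fb.new_value)  # modify the label
        updated.append(label)
    return updated


def labels_to_conditioning(labels: list[Label]) -> str:
    """Flatten labels into text that a generator could be conditioned on."""
    return ", ".join(f"{l.object_id} {l.attribute}={l.value}" for l in labels)


# Labels as a classifier might have produced them for the input image.
labels = [
    Label("car", "color", "red"),
    Label("car", "body", "sedan"),
    Label("sky", "weather", "sunny"),
]

updated = apply_feedback(labels, Feedback("car", "color", "blue"))
conditioning = labels_to_conditioning(updated)
print(conditioning)
```

The original label list is left untouched, so the labels before and after the feedback can be compared or the change undone.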

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: identifying a plurality of features of an image; classifying each of the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, and wherein the image is provided as input to the AI model; determining, based on a commentary, an object within the image to which the commentary applies; determining one or more labels of the object to which the commentary applies; translating the commentary into feedback for the one or more labels; modifying the one or more labels based on the feedback; updating the plurality of labels with the one or more labels that is modified; and providing, as input, the plurality of labels that is updated into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.

2. The method of claim 1, further comprising: generating the image using the image generation artificial intelligence system.

3. The method of claim 1, further comprising: receiving the feedback via a user interface, wherein the feedback is formatted in text.

4. The method of claim 1, further comprising: receiving the feedback as audio, wherein the feedback is presented in natural language; converting the audio to text; and presenting the text via a user interface.

5. The method of claim 1, further comprising: receiving identification of an object within a scene that is presented on a display; presenting one or more labels of the object in a user interface via the display, wherein the one or more labels of the object includes the label; and receiving identification of the label by a user via the user interface.

6. The method of claim 5, wherein the receiving identification of the object includes: determining that the user is pointing to a location in physical space corresponding to a location of the object within the scene in virtual space, wherein the scene is presented on the display of a head mounted display worn by the user; and determining that the user is pointing to the object within the scene based on the pointing.

7. The method of claim 5, wherein the receiving identification of the object includes: determining that the user selects the object in the scene using a controller.

8. The method of claim 1, further comprising: presenting a plurality of labels of a plurality of objects of a scene of the image in a user interface on a display, wherein the plurality of objects are presented in the user interface as a hierarchical file system of objects; receiving selection of an object via the hierarchical file system; presenting one or more labels of the object in the user interface via the display, wherein the one or more labels of the object includes the label; and receiving identification of the label by the user via the user interface.

9. The method of claim 1, further comprising: highlighting the object that is presented on a display.

10. The method of claim 1, further comprising: determining that a user is pointing to a location in physical space corresponding to a location of an object within a scene in virtual space, wherein the scene is presented on a display of a head mounted display worn by the user; highlighting the object in the scene; determining that the user is selecting the object based on the pointing; receiving commentary to modify the object from the user, wherein the commentary is presented in natural language; determining one or more labels of the object, wherein the one or more labels of the object includes the label; determining that the commentary applies to the label; and translating the commentary into the feedback for the label.

11. The method of claim 1, further comprising; determining a context based on the feedback for the label; and modifying one or more of the plurality of labels based on the context, wherein the plurality of labels that is updated includes one or more of the plurality of labels that have been modified.

12. The method of claim 1, further comprising: adding a new object corresponding to the label based on the feedback; determining a context based on the new object; and modifying one or more of the plurality of labels based on the context, wherein the plurality of labels that is updated includes one or more of the plurality of labels that have been modified.

13. The method of claim 1, further comprising: removing an object corresponding to the label based on the feedback; and removing the label from the plurality of labels when performing the modifying the label and when performing the updating the plurality of labels.

14. A non-transitory computer-readable medium having instructions stored thereon, which, when executed by a processor, cause the processor to perform a method comprising: identifying a plurality of features of an image; classifying each of the plurality of features using an artificial intelligence (AI) model trained to classify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, wherein the image is provided as input to the AI model; determining, based on a commentary, an object within the image to which the commentary applies; determining one or more labels of the object to which the commentary applies; translating the commentary into feedback for the one or more labels; modifying the one or more labels a label based on the feedback; updating the plurality of labels with the one or more labels that is modified; and providing as input the plurality of labels that is updated into an image generation artificial intelligence system configured for implementing latent diffusion to generate an updated image.

15. The non-transitory computer-readable medium of claim 14, further comprising instructions that, when executed, cause the processor to perform the method comprising: receiving identification of an object within a scene that is presented on a display; presenting one or more labels of the object in a user interface via the display, wherein the one or more labels of the object includes the label; and receiving identification of the label by a user via the user interface.

16. The non-transitory computer-readable medium of claim 14, further comprising instructions that, when executed, cause the processor to perform the method comprising: presenting a plurality of labels of a plurality of objects of a scene of the image in a user interface on a display, wherein the plurality of objects are presented in the user interface as a hierarchical file system of objects; receiving selection of an object via the hierarchical file system; presenting one or more labels of the object in the user interface via the display, wherein the one or more labels of the object includes the label; and receiving identification of the label by the user via the user interface.

17. The non-transitory computer-readable medium of claim 14, further comprising instructions that, when executed, cause the processor to perform the method comprising: highlighting the object that is presented on a display.

18. A computer system comprising: a processor; and memory coupled to the processor and having stored therein instructions that, if executed by the computer system, cause the computer system to execute a method for implementing a graphics pipeline, comprising: identifying a plurali
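The commentary-handling steps recited in claim 1 (find the object the commentary applies to, find which of its labels it applies to, translate it into feedback) can be illustrated with a toy keyword matcher. This is a sketch under stated assumptions and not the claimed implementation: `SCENE`, `VOCAB`, and both function names are hypothetical, and simple word lookup stands in for whatever natural-language processing a real system would use.

```python
# Hypothetical scene labels: object name -> {attribute: value}.
SCENE = {
    "car": {"color": "red", "body": "sedan"},
    "tree": {"color": "green", "season": "summer"},
}

# Hypothetical vocabulary: a word in the commentary -> the attribute it describes.
VOCAB = {
    "blue": "color", "red": "color", "yellow": "color",
    "autumn": "season", "winter": "season",
}


def commentary_to_feedback(commentary: str, scene: dict) -> list[tuple[str, str, str]]:
    """Return (object, attribute, new_value) triples implied by the commentary."""
    words = commentary.lower().replace(",", " ").replace(".", " ").split()
    # Step 1: which object in the image does the commentary apply to?
    targets = [obj for obj in scene if obj in words]
    feedback = []
    for obj in targets:
        # Step 2: which of that object's labels does the commentary apply to?
        for word in words:
            attribute = VOCAB.get(word)
            if attribute in scene[obj]:
                # Step 3: express the commentary as feedback for that label.
                feedback.append((obj, attribute, word))
    return feedback


def apply_feedback(scene: dict, feedback: list[tuple[str, str, str]]) -> dict:
    """Return a copy of the scene labels with the feedback applied."""
    updated = {obj: dict(attrs) for obj, attrs in scene.items()}
    for obj, attribute, value in feedback:
        updated[obj][attribute] = value
    return updated


feedback = commentary_to_feedback("Make the car blue.", SCENE)
updated_scene = apply_feedback(SCENE, feedback)
print(feedback)
print(updated_scene["car"])
```

Only the labels of the named object change; the tree's `color` label is untouched even though it shares an attribute name with the car.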

Assignees

Sony Interactive Entertainment Inc

Inventors

Classifications

  • Three-dimensional [3D] modelling for computer graphics · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Interaction with lists of selectable items, e.g. menus · CPC title

  • Target detection · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12530913B2 cover?
A method for image generation. The method including identifying a plurality of features of an image. The method including classifying each of the plurality of features using an artificial intelligence (AI) model trained to identify features in a plurality of images, wherein the plurality of features is classified as a plurality of labels, wherein the image is provided as input to the AI model. …
Who is the assignee on this patent?
Sony Interactive Entertainment Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/70. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 20, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).