Processing image data

US12475671B2 · US · B2

Patent metadata
Publication number: US-12475671-B2
Application number: US-202117542239-A
Country: US
Kind code: B2
Filing date: Dec 3, 2021
Priority date: Oct 7, 2021
Publication date: Nov 18, 2025
Grant date: Nov 18, 2025

Abstract

Official abstract text for this publication.

A method of processing image data is provided. Pixel data for a first image is preprocessed to identify a subset of the pixel data corresponding to a region of interest depicting a scene element. The subset of the pixel data is processed at a first encoder to generate a first data structure representative of the region of interest, the first data structure identifying the scene element depicted in the region of interest. The subset of pixel data is also processed at a second encoder to generate a second data structure representative of the region of interest, the second data structure comprising values for visual characteristics associated with the scene element. The first and second data structures are outputted for use by a decoder to generate a second image approximating the region of interest of the first image.
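
The data flow in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: every name here (`crop_roi`, `first_encoder`, `second_encoder`, `decoder`, `CHARACTERISTICS_BY_ELEMENT`) is hypothetical, the image is a small grayscale grid, and the two "encoders" are trivial placeholders where the patent describes trained networks. The one structural point it tries to show is that two separate data structures are produced from the same region of interest, and that (per claim 1) the characteristics measured by the second encoder depend on the identity found by the first.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale intensities; an assumption for brevity


@dataclass(frozen=True)
class IdentityCode:
    """First data structure: identifies the scene element."""
    element_id: str


@dataclass(frozen=True)
class AppearanceCode:
    """Second data structure: values for visual characteristics."""
    values: Dict[str, float]


def crop_roi(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Preprocessing stand-in: keep pixels inside (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def first_encoder(roi: Image) -> IdentityCode:
    """Placeholder identity encoder; always returns the same hypothetical id."""
    return IdentityCode(element_id="element-0")


# Hypothetical lookup: which characteristics to report for each identity.
CHARACTERISTICS_BY_ELEMENT = {"element-0": ("mean_brightness", "contrast")}


def second_encoder(roi: Image, identity: IdentityCode) -> AppearanceCode:
    """Placeholder appearance encoder; reports only what the identity calls for."""
    pixels = [p for row in roi for p in row]
    measured = {
        "mean_brightness": sum(pixels) / len(pixels),
        "contrast": float(max(pixels) - min(pixels)),
    }
    wanted = CHARACTERISTICS_BY_ELEMENT[identity.element_id]
    return AppearanceCode(values={name: measured[name] for name in wanted})


def decoder(identity: IdentityCode, appearance: AppearanceCode,
            shape: Tuple[int, int]) -> Image:
    """Placeholder decoder: a flat image at the encoded mean brightness."""
    rows, cols = shape
    level = round(appearance.values["mean_brightness"])
    return [[level] * cols for _ in range(rows)]


image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
roi = crop_roi(image, (0, 0, 2, 2))
identity = first_encoder(roi)
appearance = second_encoder(roi, identity)
approximation = decoder(identity, appearance, (len(roi), len(roi[0])))
```

Only `identity` and `appearance` would need to reach the decoder in this sketch; the pixel data itself stays on the encoder side, and the decoder's output is an approximation of the region of interest, not a copy.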

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of processing image data, the method comprising: receiving pixel data of a first image; preprocessing the received pixel data to identify a subset of the pixel data of the first image, the subset of the pixel data corresponding to a region of interest of the first image depicting at least one scene element; first processing the subset of the pixel data of the first image at a first encoder to generate a first data structure representative of the region of interest of the first image, the first data structure comprising a scene element identifier identifying the at least one scene element depicted in the region of interest of the first image, wherein the scene element identifier is invariant to changes in a configuration of the at least one scene element between different images depicting the at least one scene element; second processing the subset of the pixel data of the first image at a second encoder to generate a second data structure representative of the region of interest of the first image, the second data structure comprising values for one or more visual characteristics associated with the at least one scene element depicted in the region of interest of the first image; and outputting the first data structure and the second data structure for use by a decoder to generate a second image approximating the region of interest of the first image, wherein the one or more visual characteristics, the values of which are to be included in the second data structure, are determined by the second encoder based on the identity of the at least one scene element as determined by the first encoder.

2. The computer-implemented method of claim 1 wherein the second encoder is configured to determine the one or more visual characteristics by identifying features of the region of interest which are visually salient.

3. The computer-implemented method of claim 1, wherein the first encoder comprises a convolutional neural network that uses a differentiable loss function.

4. The computer-implemented method of claim 3, wherein the differentiable loss function comprises a triplet loss function.

5. The computer-implemented method of claim 1, wherein the first encoder is configured to distinguish between the at least one scene element that is depicted in the region of interest and at least one second scene element, the at least one scene element and the at least one second scene element being of a common scene element type.

6. The computer-implemented method of claim 1, wherein the scene element identifier is indicative of generic structural characteristics of content of the region of interest in comparison to other regions of the image and/or other images.

7. The computer-implemented method of claim 1, wherein the second encoder comprises a convolutional neural network configured to output a vector comprising the values of the one or more visual characteristics.

8. The computer-implemented method of claim 1, wherein the second encoder is configured to determine visual details of the region of interest to which the subset of the pixel data corresponds that are not captured by the first processing at the first encoder.

9. The computer-implemented method of claim 1, wherein the second encoder is configured to locate one or more landmarks in the region of interest to which the subset of the pixel data corresponds, wherein the one or more visual characteristics comprise coordinates of the one or more landmarks in the region of interest.

10. The computer-implemented method of claim 1, wherein the one or more visual characteristics relate to one or more of: lighting, orientation, movement, and perspective in the region of interest.

11. The computer-implemented method of claim 1, comprising generating, using an image generator module, the second image using the scene element identifier and the values of the one or more visual characteristics.

12. The computer-implemented method of claim 11, wherein the first encoder and/or the second encoder are trained using back-propagation of errors based on a comparison between the region of interest of the first image and the second image generated by the image generator module.

13. The computer-implemented method of claim 11, wherein the first encoder and/or the second encoder are trained using a discriminator function configured to determine whether the second image generated by the image generator module is a real image or a synthesized image, the discriminator function being configured to produce a composite set of loss functions that can be minimized using stochastic gradient descent and backpropagation through the first encoder and/or the second encoder.

14. The computer-implemented method of claim 13, wherein the composite set of loss functions are calculated in a latent space of a neural network that takes as inputs the subset of the pixel data corresponding to the region of interest of the first image and the second image generated by the image generator module.

15. The computer-implemented method of claim 11, wherein the first encoder and/or the second encoder are trained using one or more optimizing functions configured to score a loss of fidelity between the region of interest of the first image and the second image generated by the image generator module based on one or more of mean absolute error, mean squared error, and/or structural similarity index metrics that can be minimized using stochastic gradient descent and backpropagation through the first encoder and/or the second encoder.

16. The computer-implemented method of claim 1, wherein the second image comprises a photorealistic rendering of the region of interest to which the subset of the pixel data corresponds.

17. A computer-implemented method of generating an image at a decoder, the method comprising: receiving a first data structure representative of a region of interest of a first image, the first data structure generated by a first encoder and comprising a scene element identifier identifying at least one scene element depicted in the region of interest of the first image, wherein the scene element identifier is invariant to changes in a configuration of the at least one scene element between different images depicting the at least one scene element; receiving a second data structure representative of the region of interest of the first image, the second data structure comprising values for one or more visual characteristics associated with the at least one scene element depicted in the region of interest of the first image; and generating for display, using the first data structure and the second data structure, a second image approximating the region of interest of the first image, wherein the one or more visual characteristics, the values of which are to be included in the second data structure, are determined by a second encoder based on the identity of the at least one scene element.

18. A computing device comprising: a processor; and a memory, wherein the computing device is arranged to perform, using the processor, a method of processing image data, the method comprising: receiving pixel data of a first image; preprocessing the received pixel data to identify a subset of the pixel data corresponding to a region of interest of the first image depicting at least one scene element; first processing the subset of the pixel data of the first image at a first encoder to generate a first data structure representative of the region of interest of
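
Claims 4 and 15 name specific training losses: a triplet loss for the first encoder, and mean absolute error or mean squared error as fidelity scores between the region of interest and the generated image. The sketch below shows the textbook form of each in plain Python. It assumes embeddings and images are already flattened into lists of floats, uses Euclidean distance and a margin of 1.0 for the triplet loss (some formulations use squared distances instead), and all function names are illustrative. The patent text shown here does not specify these details, and the structural similarity index also named in claim 15 is omitted for brevity.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Zero once the negative is at least `margin` farther from the anchor
    than the positive is; positive otherwise."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def mean_absolute_error(original: Sequence[float],
                        generated: Sequence[float]) -> float:
    """Average absolute per-pixel difference."""
    return sum(abs(o - g) for o, g in zip(original, generated)) / len(original)


def mean_squared_error(original: Sequence[float],
                       generated: Sequence[float]) -> float:
    """Average squared per-pixel difference."""
    return sum((o - g) ** 2 for o, g in zip(original, generated)) / len(original)


# Two embeddings of the same scene element (anchor, positive) and one of a
# different element (negative); the values are made up for illustration.
anchor, positive = [0.0, 0.0], [0.0, 1.0]
well_separated = triplet_loss(anchor, positive, [3.0, 4.0])
too_close = triplet_loss(anchor, positive, [0.0, 1.5])

mae = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Minimizing a triplet loss pulls embeddings of the same scene element together and pushes different elements apart, which is one way an identifier could stay stable when the element's configuration changes between images, as claim 1 requires of the scene element identifier.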

Assignees

Sony Interactive Entertainment Europe Ltd

Inventors

Classifications

  • Backpropagation, e.g. using gradient descent

  • Combinations of networks

  • Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

  • using neural networks

  • Generative networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12475671B2 cover?
A method of processing image data is provided. Pixel data for a first image is preprocessed to identify a subset of the pixel data corresponding to a region of interest depicting a scene element. The subset of the pixel data is processed at a first encoder to generate a first data structure representative of the region of interest, the first data structure identifying the scene element depicted…
Who is the assignee on this patent?
Sony Interactive Entertainment Europe Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/25. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 18, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).