Image processing apparatus, image processing method, and storage medium
US-2023396748-A1 · Dec 7, 2023 · US
US12277271B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12277271-B2 |
| Application number | US-202418734497-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 5, 2024 |
| Priority date | Aug 18, 2023 |
| Publication date | Apr 15, 2025 |
| Grant date | Apr 15, 2025 |
Official abstract text for this publication.
A method and a system for rendering video images in virtual reality (VR) scenes are provided. The method includes providing a video image at a current time point, dividing the video image at the current time point into a plurality of sub-regions, inputting image feature information of the sub-regions and acquired user viewpoint feature information into a trained attention model for processing to obtain attention coefficients of the sub-regions indicating probability values at which user viewpoints at a next time point fall into the sub-regions, rendering the sub-regions based on the attention coefficients of the sub-regions to obtain a rendered video image at the current time point, inputting the attention coefficients of the sub-regions and the image feature information of the sub-regions into a trained user eyes trajectory prediction model for processing, obtaining user eyes trajectory information in a current time period, dividing, for video images at subsequent time points within the current time period, the video images at the subsequent time points into a plurality of sub-regions, calculating attention coefficients of the sub-regions in a video image at each of the subsequent time points within the current time period respectively based on the user eyes trajectory information in the current time period, and rendering the corresponding sub-regions based on the attention coefficients of the sub-regions to obtain a rendered video image at each of the subsequent time points.
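The first steps of the abstract (dividing a frame into sub-regions and assigning each one an attention coefficient that reads as a probability) can be sketched as follows. This is only an illustration under stated assumptions: the uniform grid stands in for the trained division model, the softmax normalisation stands in for the trained attention model, and the names `split_into_grid` and `attention_coefficients` are hypothetical and do not come from the patent.

```python
import math
from typing import List, Tuple

# (x, y, width, height) of a sub-region in pixels
Region = Tuple[int, int, int, int]


def split_into_grid(width: int, height: int, cols: int, rows: int) -> List[Region]:
    """Divide a frame into cols x rows rectangles.

    Assumption: a uniform grid. The patent obtains sub-regions from a
    trained division model instead.
    """
    regions: List[Region] = []
    for r in range(rows):
        for c in range(cols):
            x0 = c * width // cols
            y0 = r * height // rows
            x1 = (c + 1) * width // cols
            y1 = (r + 1) * height // rows
            regions.append((x0, y0, x1 - x0, y1 - y0))
    return regions


def attention_coefficients(scores: List[float]) -> List[float]:
    """Turn raw per-region scores into coefficients that sum to 1.

    Assumption: a softmax over placeholder scores. In the patent the
    coefficients come from a trained attention model.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    regions = split_into_grid(1920, 1080, cols=4, rows=3)
    coeffs = attention_coefficients([0.1 * i for i in range(len(regions))])
    for region, coeff in zip(regions, coeffs):
        print(region, round(coeff, 3))
```

Integer division on the cell boundaries keeps the rectangles non-overlapping and covering the whole frame even when the frame size is not a multiple of the grid size.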
Opening claim text (preview).
What is claimed is:
1. A method for rendering video images in virtual reality (VR) scenes, the method comprising: providing a video image at a current time point; dividing the video image at the current time point into a plurality of sub-regions; inputting image feature information of the sub-regions and acquired user viewpoint feature information into a trained attention model for processing to obtain attention coefficients of the sub-regions indicating probability values at which user viewpoints at a next time point fall into the sub-regions; rendering the sub-regions based on the attention coefficients of the sub-regions to obtain a rendered video image at the current time point; inputting the attention coefficients of the sub-regions and the image feature information of the sub-regions into a trained user eyes trajectory prediction model for processing; obtaining user eyes trajectory information in a current time period; dividing, for video images at subsequent time points within the current time period, the video images at the subsequent time points into a plurality of sub-regions, calculating attention coefficients of the sub-regions in a video image at each of the subsequent time points within the current time period respectively based on the user eyes trajectory information in the current time period; and rendering the corresponding sub-regions based on the attention coefficients of the sub-regions to obtain a rendered video image at each of the subsequent time points.
2. The method according to claim 1, the method further comprising: releasing the rendered video images at the time points within the time period chronologically; collecting user viewpoint information at the corresponding time points; and forming, when the user viewpoint information falls into a sub-region in the rendered video images, the sub-region rendered into a VR scene for presentation.
3. The method according to claim 1, wherein the acquired user viewpoint feature information comprises visual behavior factor information and context factor information, wherein the visual behavior factor information comprises texture information textures of the sub-regions, mesh information meshes of the sub-regions, and position information of the sub-regions, and wherein the context factor information comprises user intention expression data, text data, voice conversation data, system guidance data, and Task directivity data.
4. The method according to claim 1, wherein a manner of dividing the video image at the current time point into the plurality of sub-regions and a manner of dividing the video images at the subsequent time points into the plurality of sub-regions are the same, and wherein the dividing the video image at the current time point and the dividing the video images at the subsequent time points comprises: mapping the video image into a two-dimensional video image, the video image being a VR scene within a user eyes range defined by a sum of a field of view (FOV) of the user eyes and a set angle α; and inputting image feature information of the two-dimensional video image into a trained division model to obtain a plurality of sub-regions divided and corresponding user viewpoint feature information.
5. The method according to claim 1, wherein before obtaining the attention coefficients of the sub-regions, the method further comprises: processing the sub-regions based on a foveal principle to obtain the attention coefficients of the sub-regions.
6. The method according to claim 1, wherein a training process of the user eyes trajectory prediction model comprises: inputting the attention coefficients of the sub-regions in the video image at the current time point and the image feature information of the sub-regions into a user trajectory prediction model established based on user visual habit information for training, and outputting user eyes trajectory probability values of the sub-regions; determining a ground truth (GT) of the user eyes trajectory prediction model by using a user eyes trajectory of a user gazing from the sub-regions to adjacent sub-regions; and adjusting the user trajectory prediction model based on the user visual habit information in the training process until the training is completed.
7. The method according to claim 1, wherein the obtaining the user eyes trajectory information in the current time period further comprises: determining, based on real eyes trajectory information of the user within the current time period, whether the user eyes trajectory information in the current time period directly obtained by processing through the user eyes trajectory prediction model is accurate; if the user eyes trajectory information is accurate, taking the user eyes trajectory information in the current time period directly obtained by processing through the user eyes trajectory prediction model as the obtained user eyes trajectory information in the current time period; and if the user eyes trajectory information is not accurate, the real eyes trajectory information of the user within the current time period as the obtained user eyes trajectory information in the current time period, and optimally training the user eyes trajectory prediction model based on the real eyes trajectory information of the user within the current time period.
8. The method according to claim 1, wherein the calculating of the attention coefficients of the sub-regions in the video image at each of the subsequent time points within the current time period respectively based on the user eyes trajectory information in the current time period comprises: determining, for a sub-region in the video image at each of the subsequent time points within the current time period, whether the user eyes fall into the sub-region based on the user eyes trajectory information in the current time period, enhancing, if yes, the attention coefficient of the sub-region according to a set amplitude on the basis of the attention coefficient at a corresponding previous time point, and decreasing, if no, the attention coefficient of the sub-region according to the set amplitude on the basis of the attention coefficient at the corresponding previous time point.
9. The method according to claim 1, wherein the rendering the corresponding sub-regions based on the attention coefficients of the sub-regions comprises: setting an attention coefficient threshold, determining whether the attention coefficients of the sub-regions exceed the attention coefficient threshold set, rendering, if yes, the sub-regions using a set high-level rendering mode, and rendering, if no, the sub-regions using a set low-level rendering mode.
10. An electronic device, comprising: memory storing one or more computer programs; and one or more processors communicatively coupled to the memory, wherein the one or more computer programs include computer-executable instructions that, when executed by the one or more processors, cause the electronic device to: provide a video image at a current time point, divide the video image at the current time point into a plurality of sub-regions, inputting image feature information of the sub-regions and acquired user viewpoint feature information into a trained attention model for processing to obtain attention coefficients of the sub-regions indicating probability values at which user viewpoints at a next time point fall into the sub-regions, render the sub-regions based on the attention coefficients of the sub-regions to obtain a rendered video image at the current time point, input the attention coefficients of the sub-regions and the image feature information of the sub-regions into a trained user eyes trajectory prediction
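Claims 8 and 9 describe two simple rules: at each subsequent time point a sub-region's attention coefficient is raised or lowered by a set amplitude depending on whether the predicted eye trajectory falls into it, and a threshold then selects a high-level or low-level rendering mode. The sketch below is one possible reading, not the patented implementation. It assumes an additive amplitude and clamps coefficients to the range 0 to 1 (the claims specify neither), and the names `update_coefficients` and `choose_render_mode` are hypothetical.

```python
from typing import Dict, Set


def update_coefficients(
    previous: Dict[int, float],
    gazed_regions: Set[int],
    amplitude: float,
) -> Dict[int, float]:
    """Apply the claim 8 rule to every sub-region.

    previous:      coefficient of each sub-region at the previous time point
    gazed_regions: ids of sub-regions the predicted trajectory falls into
    amplitude:     the set amplitude (assumed additive in this sketch)
    """
    updated: Dict[int, float] = {}
    for region_id, coeff in previous.items():
        delta = amplitude if region_id in gazed_regions else -amplitude
        # Clamping to [0, 1] is an assumption so the value stays probability-like.
        updated[region_id] = min(1.0, max(0.0, coeff + delta))
    return updated


def choose_render_mode(coeff: float, threshold: float) -> str:
    """Apply the claim 9 rule: above the threshold gets the high-level mode."""
    return "high" if coeff > threshold else "low"


if __name__ == "__main__":
    coeffs = {0: 0.5, 1: 0.5, 2: 0.05}
    coeffs = update_coefficients(coeffs, gazed_regions={0}, amplitude=0.1)
    for region_id, coeff in coeffs.items():
        print(region_id, round(coeff, 2), choose_render_mode(coeff, threshold=0.5))
```

Because the update only needs the previous coefficients and the predicted trajectory, it avoids running the attention model again for every frame in the period, which appears to be the point of the two-stage scheme in claim 1.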
involving special video data, e.g. 3D video · CPC title
Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV programme (methods or arrangements for recognising human body or animal bodies or body parts G06V40/10; methods or arrangements for acquiring or recognising human faces, facial parts, facial sketches, facial expressions G06V40/16; methods or arrangements for recognising movements or behaviour G06V40/20; arrangements for identifying users in broadcast systems H04H60/45) · CPC title
involving rendering scenes according to scene graphs, e.g. MPEG-4 scene graphs · CPC title
Protocols for games, networked simulations or virtual reality · CPC title
Recognising the driver's state or behaviour, e.g. attention or drowsiness · CPC title