What technology area does this patent fall under?

Primary CPC classification G02B30/52. Mapped technology areas include Physics.

When was this patent published?

Published May 9, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Processing stereo images with a machine-learning model

US11644685B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11644685-B2
Application number	US-202016993788-A
Country	US
Kind code	B2
Filing date	Aug 14, 2020
Priority date	Aug 14, 2020
Publication date	May 9, 2023
Grant date	May 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes accessing a pair of stereo images for a scene, where each image of the pair of stereo images has incomplete pixel information and k channels, stacking the pair of stereo images to form a stacked input image with 2k channels, processing the stacked input image using a machine-learning model to generate a stacked output image with 2k channels, and separating the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, where each image of the pair of reconstructed stereo images has complete pixel information and k channels.
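The stack, process, and separate steps in the abstract can be illustrated with a minimal sketch. It assumes images are plain nested lists (rows x columns x k channels) and that a missing sample is marked with None. The patent does not specify either representation. The function `toy_model` is a hypothetical stand-in for the machine-learning model: it only copies a missing sample from the same channel of the other view, which is not the patented reconstruction model.

```python
from typing import Callable, List, Optional, Tuple

Pixel = List[Optional[float]]  # one value per channel; None marks a missing sample (assumption)
Image = List[List[Pixel]]      # rows x columns x channels


def stack_pair(left: Image, right: Image) -> Image:
    """Concatenate channels pixel by pixel: two k-channel images -> one 2k-channel image."""
    return [[lp + rp for lp, rp in zip(lrow, rrow)] for lrow, rrow in zip(left, right)]


def split_pair(stacked: Image, k: int) -> Tuple[Image, Image]:
    """Separate a 2k-channel image back into two k-channel images."""
    left = [[px[:k] for px in row] for row in stacked]
    right = [[px[k:] for px in row] for row in stacked]
    return left, right


def toy_model(stacked: Image, k: int) -> Image:
    """Hypothetical placeholder for the model: fill each missing sample from the
    same channel of the other view, falling back to 0.0 if both are missing."""
    out: Image = []
    for row in stacked:
        out_row = []
        for px in row:
            filled = []
            for i, value in enumerate(px):
                other = px[(i + k) % (2 * k)]  # same channel in the other view
                if value is None:
                    value = other if other is not None else 0.0
                filled.append(value)
            out_row.append(filled)
        out.append(out_row)
    return out


def reconstruct(left: Image, right: Image, k: int,
                model: Callable[[Image, int], Image] = toy_model) -> Tuple[Image, Image]:
    """Stack (k + k -> 2k channels), run the model, then separate (2k -> k + k)."""
    stacked_in = stack_pair(left, right)
    stacked_out = model(stacked_in, k)
    return split_pair(stacked_out, k)


if __name__ == "__main__":
    # A 1 x 2 RGB pair (k = 3) with missing samples in both views.
    left = [[[0.1, None, 0.3], [None, None, None]]]
    right = [[[None, 0.2, None], [0.4, 0.5, 0.6]]]
    print(reconstruct(left, right, k=3))
```

Because both views travel through the model as one 2k-channel image, a model of this shape can in principle use samples from one view to fill gaps in the other. The sketch makes that visible only in the crudest way; any real model would be trained rather than hand-written.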

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by a computing device: accessing a pair of stereo images for a scene, wherein each image of the pair of stereo images has incomplete pixel information and k channels; stacking the pair of stereo images to form a stacked input image with 2k channels by: calculating an importance score associated with each area among a plurality of areas in the scene; identifying an area with a highest importance score among the plurality of areas in the scene; and stacking the channels of both images by aligning the identified area between the pair of stereo images; processing the stacked input image using a machine-learning model to generate a stacked output image with 2k channels; and separating the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, wherein each image of the pair of reconstructed stereo images has complete pixel information and k channels.

2. The method of claim 1, wherein the pair of stereo images is used to provide a stereoscopic view of the scene to a user.

3. The method of claim 1, wherein an object captured in one of the pair of stereo images is shifted from the other image, wherein a degree of the shift is associated with a distance of the object from a viewpoint of a user.

4. The method of claim 1, wherein stacking the pair of stereo images to form the stacked input image with 2k channels comprises stacking the channels of both images by aligning pixel coordinates between the pair of stereo images.

5. The method of claim 1, wherein calculating the importance score associated with each area is based on a relative distance of the area from a vergence location of a user such that a higher importance score is assigned to a first area with a smaller distance to the vergence location of the user than a second area with a larger distance to the vergence location of the user.

6. The method of claim 1, wherein calculating the importance score associated with each area is based on content associated each area such that a higher importance score is assigned to a first area that is associated with an important content than a second area that is not associated with an important content.

7. The method of claim 1, wherein the k channels comprise RGB channels.

8. The method of claim 1, wherein the k channels comprise RGB channels and an alpha channel, wherein the alpha channel indicates a transparency level of each pixel.

9. The method of claim 1, wherein the pair of stereo images is associated with a frame in a video stream.

10. The method of claim 1, wherein the machine-learning model is an image reconstruction model that reconstructs restores sampled, noisy or damaged images.

11. The method of claim 1, wherein the machine-learning model is trained with a loss function that measures differences between each image of the pair of reconstructed stereo images and a corresponding image of a pair of ground truth stereo images.

12. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: access a pair of stereo images for a scene, wherein each image of the pair of stereo images has incomplete pixel information and k channels; stack the pair of stereo images to form a stacked input image with 2k channels by: calculating an importance score associated with each area among a plurality of areas in the scene; identifying an area with a highest importance score among the plurality of areas in the scene; and stacking the channels of both images by aligning the identified area between the pair of stereo images; process the stacked input image using a machine-learning model to generate a stacked output image with 2k channels; and separate the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, wherein each image of the pair of reconstructed stereo images has complete pixel information and k channels.

13. The media of claim 12, wherein the pair of stereo images is used to provide a stereoscopic view of the scene to a user.

14. The media of claim 12, wherein an object captured in one of the pair of stereo images is shifted from the other image, wherein a degree of the shift is associated with a distance of the object from a viewpoint of a user.

15. The media of claim 12, wherein stacking the pair of stereo images to form the stacked input image with 2k channels comprises stacking the channels of both images by aligning pixel coordinates between the pair of stereo images.

16. The media of claim 12, wherein calculating the importance score associated with each area is based on a relative distance of the area from a vergence location of a user such that a higher importance score is assigned to a first area with a smaller distance to the vergence location of the user than a second area with a larger distance to the vergence location of the user.

17. The media of claim 12, wherein calculating the importance score associated with each area is based on content associated each area such that a higher importance score is assigned to a first area that is associated with an important content than a second area that is not associated with an important content.

18. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: access a pair of stereo images for a scene, wherein each image of the pair of stereo images has incomplete pixel information and k channels; stack the pair of stereo images to form a stacked input image with 2k channels by: calculating an importance score associated with each area among a plurality of areas in the scene; identifying an area with a highest importance score among the plurality of areas in the scene; and stacking the channels of both images by aligning the identified area between the pair of stereo images; process the stacked input image using a machine-learning model to generate a stacked output image with 2k channels; and separate the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, wherein each image of the pair of reconstructed stereo images has complete pixel information and k channels.
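The importance-based stacking recited in the independent claims, with the vergence-distance scoring of claims 5 and 16, might be sketched as follows. Everything concrete here is an assumption made for illustration: the `Area` record, the score formula 1 / (1 + distance), measuring distance in left-image pixel coordinates, treating the offset between views as purely horizontal, and padding shifted-in columns with None. The claims only require that a closer area receives a higher score and that the highest-scoring area is aligned before the channels are stacked.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple

Pixel = List[Optional[float]]  # None marks a missing sample (assumption)
Image = List[List[Pixel]]      # rows x columns x channels


@dataclass
class Area:
    """Hypothetical record: where one scene area appears in each view, as (column, row)."""
    name: str
    left_xy: Tuple[int, int]
    right_xy: Tuple[int, int]


def importance(area: Area, vergence_xy: Tuple[int, int]) -> float:
    """Assumed score: a smaller distance to the vergence location gives a higher score."""
    dx = area.left_xy[0] - vergence_xy[0]
    dy = area.left_xy[1] - vergence_xy[1]
    return 1.0 / (1.0 + hypot(dx, dy))


def most_important(areas: List[Area], vergence_xy: Tuple[int, int]) -> Area:
    """Identify the area with the highest importance score."""
    return max(areas, key=lambda a: importance(a, vergence_xy))


def shift_columns(image: Image, dx: int, k: int) -> Image:
    """Shift every row by dx columns (positive = right); vacated pixels become missing."""
    width = len(image[0])
    shifted: Image = []
    for row in image:
        new_row: List[Pixel] = [[None] * k for _ in range(width)]
        for x, px in enumerate(row):
            if 0 <= x + dx < width:
                new_row[x + dx] = list(px)
        shifted.append(new_row)
    return shifted


def stack_aligned(left: Image, right: Image, areas: List[Area],
                  vergence_xy: Tuple[int, int], k: int) -> Image:
    """Align the highest-scoring area between the views, then stack to 2k channels."""
    target = most_important(areas, vergence_xy)
    dx = target.left_xy[0] - target.right_xy[0]  # horizontal offset of that area
    right_aligned = shift_columns(right, dx, k)
    return [[lp + rp for lp, rp in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right_aligned)]


if __name__ == "__main__":
    areas = [Area("near", (2, 0), (1, 0)), Area("far", (0, 0), (0, 0))]
    left = [[[7.0], [8.0], [9.0]]]   # 1 x 3 image, k = 1
    right = [[[1.0], [2.0], [3.0]]]
    print(stack_aligned(left, right, areas, vergence_xy=(2, 0), k=1))
```

Aligning on the chosen area means that, at that area, each stacked pixel holds samples of the same scene point from both views. Claims 4 and 15 describe the simpler alternative of aligning pixel coordinates, which corresponds to a shift of zero in this sketch.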

Assignees

Meta Platforms Tech LLC

Inventors

Classifications

Code	CPC title
H04N19/186	the unit being a colour or a chrominance component
G06T3/4053	based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
G02B27/0172	characterised by optical features
G06N3/084	Backpropagation, e.g. using gradient descent
G06N20/00	Machine learning

Patent family

Related publications grouped by family.

Patent family ID: 80224180

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11644685B2 cover?

In one embodiment, a method includes accessing a pair of stereo images for a scene, where each image of the pair of stereo images has incomplete pixel information and k channels, stacking the pair of stereo images to form a stacked input image with 2k channels, processing the stacked input image using a machine-learning model to generate a stacked output image with 2k channels, and separating t…

Who is the assignee on this patent?

Meta Platforms Tech LLC


Related patents

Neural Super-sampling for Real-time Rendering

Image processing for reducing artifacts caused by removal of scene elements from images

Systems and methods for providing depth map information

Virtual reality head-mounted devices having reduced numbers of cameras, and methods of operating the same
