Processing stereo images with a machine-learning model

US11644685B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11644685-B2
Application number  US-202016993788-A
Country             US
Kind code           B2
Filing date         Aug 14, 2020
Priority date       Aug 14, 2020
Publication date    May 9, 2023
Grant date          May 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes accessing a pair of stereo images for a scene, where each image of the pair of stereo images has incomplete pixel information and k channels, stacking the pair of stereo images to form a stacked input image with 2k channels, processing the stacked input image using a machine-learning model to generate a stacked output image with 2k channels, and separating the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, where each image of the pair of reconstructed stereo images has complete pixel information and k channels.
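
The stack, process, and separate flow in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the patented implementation: images are plain nested lists, `None` marks a missing sample, and `placeholder_model` is a hypothetical stand-in that fills gaps with a per-channel mean where the patent would use a trained machine-learning model.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[Optional[float], ...]  # one value per channel; None = missing sample
Image = List[List[Pixel]]            # rows of pixels


def stack_pair(left: Image, right: Image) -> Image:
    """Concatenate channels at each pixel coordinate: k + k -> 2k channels."""
    if len(left) != len(right) or any(len(a) != len(b) for a, b in zip(left, right)):
        raise ValueError("stereo images must share the same height and width")
    return [[lp + rp for lp, rp in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]


def placeholder_model(stacked: Image) -> Image:
    """Hypothetical stand-in for the model: fill each missing value with the
    mean of the known values in the same channel (0.0 if none are known)."""
    n_channels = len(stacked[0][0])
    means = []
    for c in range(n_channels):
        known = [p[c] for row in stacked for p in row if p[c] is not None]
        means.append(sum(known) / len(known) if known else 0.0)
    return [[tuple(means[c] if v is None else v for c, v in enumerate(p))
             for p in row] for row in stacked]


def separate_pair(stacked: Image, k: int) -> Tuple[Image, Image]:
    """Split a 2k-channel image back into two k-channel images."""
    left = [[p[:k] for p in row] for row in stacked]
    right = [[p[k:] for p in row] for row in stacked]
    return left, right


# A 1x2 stereo pair with k = 3 channels and some missing samples.
left_in = [[(1.0, None, 3.0), (None, 2.0, 3.0)]]
right_in = [[(None, 5.0, 6.0), (4.0, None, 6.0)]]

stacked_in = stack_pair(left_in, right_in)       # 2k = 6 channels per pixel
stacked_out = placeholder_model(stacked_in)      # still 6 channels, no gaps
left_out, right_out = separate_pair(stacked_out, k=3)
```

Because both views pass through the model as one 2k-channel image, a real model could use information from one eye's image when reconstructing the other; the placeholder above does not attempt that.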

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by a computing device: accessing a pair of stereo images for a scene, wherein each image of the pair of stereo images has incomplete pixel information and k channels; stacking the pair of stereo images to form a stacked input image with 2k channels by: calculating an importance score associated with each area among a plurality of areas in the scene; identifying an area with a highest importance score among the plurality of areas in the scene; and stacking the channels of both images by aligning the identified area between the pair of stereo images; processing the stacked input image using a machine-learning model to generate a stacked output image with 2k channels; and separating the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, wherein each image of the pair of reconstructed stereo images has complete pixel information and k channels.

2. The method of claim 1, wherein the pair of stereo images is used to provide a stereoscopic view of the scene to a user.

3. The method of claim 1, wherein an object captured in one of the pair of stereo images is shifted from the other image, wherein a degree of the shift is associated with a distance of the object from a viewpoint of a user.

4. The method of claim 1, wherein stacking the pair of stereo images to form the stacked input image with 2k channels comprises stacking the channels of both images by aligning pixel coordinates between the pair of stereo images.

5. The method of claim 1, wherein calculating the importance score associated with each area is based on a relative distance of the area from a vergence location of a user such that a higher importance score is assigned to a first area with a smaller distance to the vergence location of the user than a second area with a larger distance to the vergence location of the user.

6. The method of claim 1, wherein calculating the importance score associated with each area is based on content associated each area such that a higher importance score is assigned to a first area that is associated with an important content than a second area that is not associated with an important content.

7. The method of claim 1, wherein the k channels comprise RGB channels.

8. The method of claim 1, wherein the k channels comprise RGB channels and an alpha channel, wherein the alpha channel indicates a transparency level of each pixel.

9. The method of claim 1, wherein the pair of stereo images is associated with a frame in a video stream.

10. The method of claim 1, wherein the machine-learning model is an image reconstruction model that reconstructs restores sampled, noisy or damaged images.

11. The method of claim 1, wherein the machine-learning model is trained with a loss function that measures differences between each image of the pair of reconstructed stereo images and a corresponding image of a pair of ground truth stereo images.

12. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: access a pair of stereo images for a scene, wherein each image of the pair of stereo images has incomplete pixel information and k channels; stack the pair of stereo images to form a stacked input image with 2k channels by: calculating an importance score associated with each area among a plurality of areas in the scene; identifying an area with a highest importance score among the plurality of areas in the scene; and stacking the channels of both images by aligning the identified area between the pair of stereo images; process the stacked input image using a machine-learning model to generate a stacked output image with 2k channels; and separate the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, wherein each image of the pair of reconstructed stereo images has complete pixel information and k channels.

13. The media of claim 12, wherein the pair of stereo images is used to provide a stereoscopic view of the scene to a user.

14. The media of claim 12, wherein an object captured in one of the pair of stereo images is shifted from the other image, wherein a degree of the shift is associated with a distance of the object from a viewpoint of a user.

15. The media of claim 12, wherein stacking the pair of stereo images to form the stacked input image with 2k channels comprises stacking the channels of both images by aligning pixel coordinates between the pair of stereo images.

16. The media of claim 12, wherein calculating the importance score associated with each area is based on a relative distance of the area from a vergence location of a user such that a higher importance score is assigned to a first area with a smaller distance to the vergence location of the user than a second area with a larger distance to the vergence location of the user.

17. The media of claim 12, wherein calculating the importance score associated with each area is based on content associated each area such that a higher importance score is assigned to a first area that is associated with an important content than a second area that is not associated with an important content.

18. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: access a pair of stereo images for a scene, wherein each image of the pair of stereo images has incomplete pixel information and k channels; stack the pair of stereo images to form a stacked input image with 2k channels by: calculating an importance score associated with each area among a plurality of areas in the scene; identifying an area with a highest importance score among the plurality of areas in the scene; and stacking the channels of both images by aligning the identified area between the pair of stereo images; process the stacked input image using a machine-learning model to generate a stacked output image with 2k channels; and separate the stacked output image with 2k channels into a pair of reconstructed stereo images for the scene, wherein each image of the pair of reconstructed stereo images has complete pixel information and k channels.
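
The stacking step recited in claim 1, combined with the vergence-based score of claim 5, might look roughly like the Python sketch below. The claims do not specify a scoring formula or an alignment mechanism, so both are assumptions here: `importance_score` is a hypothetical inverse-distance score, and alignment is done by shifting the whole right image horizontally so that the chosen area lands on the same columns as in the left image. The area dictionaries and their keys are also invented for the example.

```python
import math
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[Optional[float], ...]  # one value per channel; None = missing
Image = List[List[Pixel]]


def importance_score(area_center: Tuple[int, int],
                     vergence: Tuple[int, int]) -> float:
    """Hypothetical score: higher when the area is closer to the vergence point."""
    return 1.0 / (1.0 + math.dist(area_center, vergence))


def most_important_area(areas: List[Dict], vergence: Tuple[int, int]) -> Dict:
    """Pick the area with the highest importance score."""
    return max(areas, key=lambda a: importance_score(a["left_center"], vergence))


def shift_columns(image: Image, dx: int, k: int) -> Image:
    """Shift every row by dx columns (right if positive, left if negative).
    Pixels that have no source after the shift are marked missing."""
    width = len(image[0])
    empty = (None,) * k
    shifted = []
    for row in image:
        new_row = [empty] * width
        for x, pixel in enumerate(row):
            if 0 <= x + dx < width:
                new_row[x + dx] = pixel
        shifted.append(new_row)
    return shifted


def stack_aligned(left: Image, right: Image, area: Dict, k: int) -> Image:
    """Align the chosen area between the two views, then stack channels (2k)."""
    dx = area["left_center"][0] - area["right_center"][0]  # horizontal disparity
    right_shifted = shift_columns(right, dx, k)
    return [[lp + rp for lp, rp in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right_shifted)]


# 1x4 single-channel views: a bright object sits at x=1 (left) and x=0 (right).
left = [[(0.0,), (9.0,), (0.0,), (0.0,)]]
right = [[(9.0,), (0.0,), (0.0,), (0.0,)]]
areas = [
    {"name": "object", "left_center": (1, 0), "right_center": (0, 0)},
    {"name": "background", "left_center": (3, 0), "right_center": (3, 0)},
]
vergence = (1, 0)  # the user is assumed to be looking at the object

best = most_important_area(areas, vergence)
stacked = stack_aligned(left, right, best, k=1)
```

After alignment, the object occupies the same pixel in both halves of the stacked image, so its two views share a spatial location in the 2k-channel input. Areas at other depths stay misaligned, which is why a single highest-scoring area has to be chosen.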

Assignees

Meta Platforms Tech Llc

Inventors

Classifications

  • the unit being a colour or a chrominance component · CPC title

  • based on super-resolution, i.e. the output image resolution being higher than the sensor resolution · CPC title

  • characterised by optical features · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11644685B2 cover?
In one embodiment, a method includes accessing a pair of stereo images for a scene, where each image of the pair of stereo images has incomplete pixel information and k channels, stacking the pair of stereo images to form a stacked input image with 2k channels, processing the stacked input image using a machine-learning model to generate a stacked output image with 2k channels, and separating t…
Who is the assignee on this patent?
Meta Platforms Tech Llc
What technology area does this patent fall under?
Primary CPC classification G02B30/52. Mapped technology areas include Physics.
When was this patent published?
Publication date May 9, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).