Analysis and manipulation of images and video for generation of surround views
US-2015130799-A1 · May 14, 2015 · US
US11488380B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11488380-B2 |
| Application number | US-202117338217-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 3, 2021 |
| Priority date | Apr 26, 2018 |
| Publication date | Nov 1, 2022 |
| Grant date | Nov 1, 2022 |
Official abstract text for this publication.
A multi-view interactive digital media representation (MVIDMR) of an object can be generated from live images of the object captured from a camera. Selectable tags can be placed at locations on the object in the MVIDMR. When a selectable tag is selected, media content can be output which shows details of the object at the location where the selectable tag is placed. A machine learning algorithm can be used to automatically recognize landmarks on the object in the frames of the MVIDMR, and a structure from motion calculation can be used to determine 3-D positions associated with the landmarks. A 3-D skeleton associated with the object can be assembled from the 3-D positions and projected into the frames associated with the MVIDMR. The 3-D skeleton can be used to determine the selectable tag locations in the frames of the MVIDMR of the object.
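The projection step described in the abstract can be illustrated with a short sketch. It assumes a simple pinhole camera model with a known pose for each frame; the abstract does not specify the camera model, and the `Camera` class, the function names, the joint names, and all numeric values here are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Hypothetical per-frame pinhole camera (an assumption, not from the patent)."""
    R: tuple   # 3x3 world-to-camera rotation, as nested tuples (row-major)
    t: tuple   # world-to-camera translation (3 values)
    fx: float  # focal length in pixels, x direction
    fy: float  # focal length in pixels, y direction
    cx: float  # principal point, x
    cy: float  # principal point, y


def project_joint(cam, X):
    """Project a 3-D joint X (world coordinates) to 2-D pixel coordinates.

    Returns None when the point lies at or behind the camera plane.
    """
    # World -> camera coordinates: Xc = R * X + t
    xc = [sum(cam.R[i][j] * X[j] for j in range(3)) + cam.t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Perspective divide followed by the intrinsic mapping to pixels
    u = cam.fx * xc[0] / xc[2] + cam.cx
    v = cam.fy * xc[1] / xc[2] + cam.cy
    return (u, v)


def tag_locations(cam, skeleton):
    """Map each named skeleton joint to its pixel location in one frame."""
    return {name: project_joint(cam, X) for name, X in skeleton.items()}


# Illustrative values only: identity rotation, camera 5 units from the origin.
cam = Camera(R=((1, 0, 0), (0, 1, 0), (0, 0, 1)), t=(0.0, 0.0, 5.0),
             fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
skeleton = {"roof": (0.0, -1.0, 0.0), "left_mirror": (1.0, 0.0, 0.0)}
print(tag_locations(cam, skeleton))
# {'roof': (640.0, 160.0), 'left_mirror': (840.0, 360.0)}
```

Repeating `tag_locations` with each frame's own camera would give a per-frame pixel position at which a selectable tag could be drawn, so the tag follows its landmark as the view changes.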
Opening claim text (preview).
What is claimed is:

1. A method comprising: processing a recording, generated using a recording device, of a first plurality of frames captured by a camera of the recording device, the first plurality of frames comprising different views of a car; generating a multi-view interactive digital media representation (MVIDMR) of the car including a second plurality of frames from the first plurality of frames wherein the different views of the car are included in each of the second plurality of frames; determining, based on heatmaps and part affinity fields generated using a machine learning algorithm on the second plurality of frames, a skeleton for the car, a plurality of landmarks forming joints of the skeleton; and rendering a first selectable tag into the second plurality of frames to form a third plurality of frames associated with a tagged MVIDMR wherein the first selectable tag is associated with a first landmark positioned at a first joint within the skeleton and wherein the first selectable tag is rendered into the second plurality of frames relative to 2-D pixel locations corresponding to the first joint in the second plurality of frames, a first frame from the third plurality of frames of the tagged MVIDMR that includes the first selectable tag being displayable on a display device of a user.
2. The method of claim 1, further comprising: receiving, from the display device, input indicating selection of the first selectable tag by the user; and in response to receiving the input, causing display of, on the display device of the user, media content associated with the first selectable tag.
3. The method of claim 2, wherein the landmarks are selected from the group consisting of a location on a roof of the car, a location on a side mirror on the car, a location on a tail light of the car, a location on tires of the car and a location on headlights of the car.
4. The method of claim 2, wherein the first selectable tag is associated with a damaged location on the car and wherein the media content shows one or more close-up views of the damaged location.
5. The method of claim 4, wherein the first selectable tag is associated with a damaged location on the car and wherein the media content further shows an assessment of severity of damage to the damaged location.
6. The method of claim 4, wherein the first selectable tag is associated with a damaged location on the car and wherein the media content further shows an estimate of cost of repairing damage to the damaged location.
7. The method of claim 4, wherein the first selectable tag is associated with a component or a region of the car and wherein the media content shows one or more close-up views of the component or the region of the car.
8. The method of claim 4, wherein the MVIDMR shows an interior of the car.
9. The method of claim 1, wherein the displayed tagged MVIDMR comprises a 360 degree view of the car associated with an advertisement to sell the car.
10. The method of claim 1, wherein the skeleton is a 3-D skeleton.
11. The method of claim 10, further comprising, based upon a structure from motion calculation, determining 3-D positions of the joints of the 3-D skeleton.
12. The method of claim 11, wherein the structure from motion calculation includes a bundle adjustment calculation.
13. The method of claim 11, further comprising determining the 2-D pixel locations associated with the 3-D positions of the joints of the 3-D skeleton.
14. The method of claim 12, further comprising rendering the 3-D skeleton into a first frame from among the second plurality of frames associated with the MVIDMR.
15. The method of claim 13, wherein the 3-D skeleton is rendered over the car.
16. The method of claim 13, wherein the 3-D skeleton is rendered off-set from the car.
17. The method of claim 13, wherein the rendering includes projecting 3-D positions associated with joints of the 3-D skeleton into a 2-D pixel coordinate system associated with the first frame.
18. The method of claim 13, wherein a first portion of the joints of the 3-D skeleton rendered into the first frame are associated with first landmarks visible on the car in the first frame and wherein a second portion of the joints of the 3-D skeleton rendered into the first frame are associated with second landmarks occluded on the car in the first frame.
19. The method of claim 1, wherein the machine learning algorithm includes a neural net.
20. The method of claim 1, wherein the first plurality of frames are captured from a video stream as the recording device moves along a trajectory.
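Claim 1 recites heatmaps generated by a machine learning algorithm as an input for determining the skeleton's landmarks. One common way to turn a per-landmark heatmap into a 2-D pixel location is to take its strongest response; the sketch below shows that idea only. It is not the claimed method: it ignores part affinity fields, assumes at most one instance of each landmark per frame, and the function names, the `min_confidence` threshold, and the sample values are all hypothetical.

```python
def heatmap_peak(heatmap, min_confidence=0.5):
    """Return (row, col, score) of the strongest response in a 2-D heatmap.

    Returns None if the heatmap is empty or the peak is below min_confidence.
    """
    best = None
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if best is None or score > best[2]:
                best = (r, c, score)
    if best is None or best[2] < min_confidence:
        return None
    return best


def detect_landmarks(heatmaps, min_confidence=0.5):
    """Convert {landmark name: heatmap} into {landmark name: (x, y) pixel}.

    Landmarks whose peak response is too weak are omitted, which is one
    simple way to treat a landmark as not detected in the frame.
    """
    found = {}
    for name, heatmap in heatmaps.items():
        peak = heatmap_peak(heatmap, min_confidence)
        if peak is not None:
            row, col, _ = peak
            found[name] = (col, row)  # x is the column index, y is the row index
    return found


# Tiny illustrative heatmaps; real ones would match the frame resolution.
heatmaps = {
    "roof": [[0.0, 0.1, 0.0],
             [0.2, 0.9, 0.1]],
    "tail_light": [[0.1, 0.2],
                   [0.0, 0.3]],
}
print(detect_landmarks(heatmaps))
# {'roof': (1, 1)}
```

In this example the `tail_light` heatmap never exceeds the threshold, so no 2-D location is reported for it in this frame.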
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
for representing the structure of the pattern or shape of an object therefor · CPC title
Three-dimensional [3D] objects · CPC title
Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries · CPC title
Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title