Who is the assignee on this patent?

Imperial College Innovations Ltd

What technology area does this patent fall under?

Primary CPC classification G06V20/64. Mapped technology areas include Physics.

When was this patent published?

Published Feb 9, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Detecting objects in video data

US10915731B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10915731-B2
Application number	US-201816228517-A
Country	US
Kind code	B2
Filing date	Dec 20, 2018
Priority date	Jun 24, 2016
Publication date	Feb 9, 2021
Grant date	Feb 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Certain examples described herein enable semantically-labelled representations of a three-dimensional (3D) space to be generated from video data. In described examples, a 3D representation is a surface element or ‘surfel’ representation, where the geometry of the space is modelled using a plurality of surfaces that are defined within a 3D co-ordinate system. Object-label probability values for spatial elements of frames of video data may be determined using a two-dimensional image classifier. Surface elements that correspond to the spatial elements are identified based on a projection of the surface element representation using an estimated pose for a frame. Object-label probability values for the surface elements are then updated based on the object-label probability values for corresponding spatial elements. This results in a semantically-labelled 3D surface element representation of objects present in the video data. This data enables computer vision and/or robotic applications to make better use of the 3D representation.
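The update step described in the abstract can be illustrated with a minimal sketch. The abstract only says that surfel probabilities are "updated based on" the classifier output, so the multiplicative (recursive Bayesian) rule below is an assumption, as are the names `fuse_label_probabilities` and `update_surfels` and the dictionary layout. It is not taken from the patent.

```python
from typing import Dict, List


def fuse_label_probabilities(surfel_probs: List[float],
                             pixel_probs: List[float]) -> List[float]:
    """Fuse a surfel's stored label distribution with one new 2D observation.

    Assumed rule: element-wise product followed by renormalisation.
    """
    if len(surfel_probs) != len(pixel_probs):
        raise ValueError("distributions must cover the same label set")
    product = [s * p for s, p in zip(surfel_probs, pixel_probs)]
    total = sum(product)
    if total == 0.0:
        # The observation fully contradicts the stored belief; keep the old one.
        return list(surfel_probs)
    return [value / total for value in product]


def update_surfels(surfel_labels: Dict[int, List[float]],
                   correspondences: Dict[int, int],
                   frame_probs: Dict[int, List[float]]) -> None:
    """Update surfel label distributions in place for one frame.

    correspondences maps a pixel index to the id of the surfel it observes
    (hypothetical layout); frame_probs holds the classifier output per pixel.
    """
    for pixel, surfel_id in correspondences.items():
        surfel_labels[surfel_id] = fuse_label_probabilities(
            surfel_labels[surfel_id], frame_probs[pixel])


# Two labels, e.g. "chair" and "table"; the surfel starts with a uniform belief.
labels = {7: [0.5, 0.5]}
update_surfels(labels, {0: 7}, {0: [0.8, 0.2]})
update_surfels(labels, {0: 7}, {0: [0.8, 0.2]})
```

Under this assumed rule, repeated consistent observations from different frames sharpen the surfel's distribution, which is one way to read the "fused" multi-view predictions the claims mention.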

First claim

Opening claim text (preview).

What is claimed is:
1. A method for detecting objects in video data, comprising: determining object-label probability values for spatial elements of frames of video data using a two-dimensional image classifier, wherein the object-label probability value for each of the spatial elements indicates a probability that the respective spatial element is an observation of a particular object; identifying surface elements in a three-dimensional surface element representation of a space observed in the frames of video data that correspond to the spatial elements, wherein a correspondence between a spatial element and a surface element is determined based on a projection of the surface element representation using an estimated pose for a frame; and updating object-label probability values for the surface elements based on the object-label probability values for corresponding spatial elements to provide a semantically-labelled three-dimensional surface element representation of objects present in the video data, wherein the object-label probability value for each of the surface elements indicates a probability that the respective surface element represents the particular object.
2. The method of claim 1, wherein, during processing of said video data, the method comprises: detecting a loop closure event and applying a spatial deformation to the surface element representation, the spatial deformation modifying three-dimensional positions of surface elements in the surface element representation, wherein the spatial deformation modifies the correspondence between spatial elements and surface elements of the surface element representation such that, after the spatial deformation, object-label probability values for a first surface element are updated using object-label probability values for spatial elements that previously corresponded to a second surface element.
3. The method of claim 1, comprising: processing the frames of video data without a pose graph to generate the three-dimensional surface element representation, including, on a frame-by-frame basis: comparing a rendered frame generated using the three-dimensional surface element representation with a video data frame from the frames of video data to determine a pose of a capture device for the video data frame; and updating the three-dimensional surface element representation using the pose and image data from the video data frame.
4. The method of claim 3, wherein: a subset of the frames of video data used to generate the three-dimensional surface element representation are input to the two-dimensional image classifier.
5. The method of claim 1, wherein the frames of video data comprise at least one of colour data, depth data and normal data; and wherein the two-dimensional image classifier is configured to compute object-label probability values based on at least one of colour data, depth data and normal data for a frame.
6. The method of claim 1, wherein the two-dimensional image classifier comprises a convolutional neural network.
7. The method of claim 6, wherein the convolutional neural network is configured to output the object-label probability values as a set of pixel maps for each frame of video data, each pixel map in the set corresponding to a different object label in a set of available object labels.
8. The method of claim 6, wherein the two-dimensional image classifier comprises a deconvolutional neural network communicatively coupled to the output of the convolutional neural network.
9. The method of claim 1, comprising, after the updating of the object-label probability values for the surface elements: regularising the object-label probability values for the surface elements.
10. The method of claim 9, wherein regularising comprises: applying a conditional random field to the object-label probability values for surface elements in the surface element representation.
11. The method of claim 9, wherein regularising the object-label probability values comprises: regularising the object-label probability values assigned to surface elements based on one or more of: surface element positions, surface element colours, and surface element normals.
12. The method of claim 1, comprising: replacing a set of one or more surface elements with a three-dimensional object definition based on the object-label probability values assigned to said surface elements.
13. The method of claim 1, comprising: annotating surface elements of a three-dimensional surface element representation of a space with object-labels to provide an annotated representation; generating annotated frames of video data from the annotated representation based on a projection of the annotated representation, the projection using an estimated pose for each annotated frame, each annotated frame comprising spatial elements with assigned object-labels; and training the two-dimensional image classifier using the annotated frames of video data.
14. The method of claim 1, comprising: obtaining a first frame of video data corresponding to an observation of a first portion of an object; generating an image map for the first frame of video data using the two-dimensional image classifier, said image map indicating the presence of the first portion of the object in an area of the first frame; and determining that a surface element does not project onto the area in the first frame and as such not updating object-label probability values for the surface element based image map values in said area; wherein following detection of a loop closure event the method comprises: modifying a three-dimensional position of the surface element; obtaining a second frame of video data corresponding to a repeated observation of the first portion of the object; generating an image map for the second frame of video data using the two-dimensional image classifier, said image map indicating the presence of the first portion of the object in an area of the second frame; determining that the modified first surface element does project onto the area of the second frame following the loop closure event; and updating object-label probability values for the surface element based on the image map for the second frame of video data, wherein the object-label probability values for the surface element include fused object predictions for the surface element from multiple viewpoints.
15. Apparatus for detecting objects in video data comprising: an image-classifier interface to receive two-dimensional object-label probability distributions for spatial elements of individual frames of video data, wherein the object-label probability distribution for each of the spatial elements includes a set of object-label probability values, and each of the set of object-label probability values indicates a probability that the respective spatial element is an observation of a different respective object; a correspondence interface to receive data indicating, for a given frame of video data, a correspondence between spatial elements within the given frame and surface elements in a three-dimensional surface element representation, said correspondence being determined based on a projection of the surface element representation using an estimated pose for the given frame; and a semantic augmenter to iteratively update object-label probability values assigned to individual surface elements in the three-dimensional surface element representation, wherein the semantic augmenter is configured to use, for a given frame of video data, the data received by the correspondence interface to apply the two-dimensional obj
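The correspondence step in claim 1 (matching pixels to surface elements "based on a projection of the surface element representation using an estimated pose") could be sketched as follows. The pinhole camera model, the intrinsics `fx`, `fy`, `cx`, `cy`, the world-to-camera pose convention, the nearest-surfel occlusion rule, and all function names are assumptions made for illustration; the claims do not specify any of them.

```python
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def project_surfel(position: Vec3, rotation: Mat3, translation: Vec3,
                   fx: float, fy: float, cx: float, cy: float,
                   width: int, height: int) -> Optional[Tuple[int, int, float]]:
    """Project a world-space surfel centre into the image.

    Returns (u, v, depth), or None if the surfel is behind the camera or
    falls outside the image. Assumes the pose maps world to camera:
    p_cam = R * p_world + t.
    """
    x, y, z = (
        sum(rotation[r][c] * position[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z <= 0.0:
        return None
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    if 0 <= u < width and 0 <= v < height:
        return (u, v, z)
    return None


def find_correspondences(surfels: Dict[int, Vec3], rotation: Mat3,
                         translation: Vec3, fx: float, fy: float,
                         cx: float, cy: float, width: int,
                         height: int) -> Dict[Tuple[int, int], int]:
    """Map each covered pixel to the id of the nearest surfel projecting onto it."""
    nearest: Dict[Tuple[int, int], Tuple[float, int]] = {}
    for surfel_id, position in surfels.items():
        hit = project_surfel(position, rotation, translation,
                             fx, fy, cx, cy, width, height)
        if hit is None:
            continue
        u, v, depth = hit
        # Assumed occlusion handling: the closest surfel wins the pixel.
        if (u, v) not in nearest or depth < nearest[(u, v)][0]:
            nearest[(u, v)] = (depth, surfel_id)
    return {pixel: surfel_id for pixel, (_, surfel_id) in nearest.items()}


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
surfels = {
    1: (0.0, 0.0, 2.0),   # on the optical axis, farther away
    2: (0.0, 0.0, 1.0),   # same pixel as surfel 1, but closer
    3: (0.0, 0.0, -1.0),  # behind the camera
    4: (1.0, 0.0, 1.0),   # projects outside a 100x100 image
    5: (0.2, 0.1, 1.0),   # visible, off-centre
}
matches = find_correspondences(surfels, IDENTITY, (0.0, 0.0, 0.0),
                               100.0, 100.0, 50.0, 50.0, 100, 100)
```

A surfel that does not project onto a given image area in one frame simply produces no correspondence there, which matches the situation claim 14 describes before a loop closure moves the surfel.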

Assignees

Imperial College Innovations Ltd

Classifications

G06V10/7715
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
G06V20/64 (Primary)
Three-dimensional [3D] objects
G06V20/647 (Primary)
by matching two-dimensional images to three-dimensional objects
G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade
G06F18/2415
based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Patent family

Related publications grouped by family.

View patent family 56891621
