Augmenting layer-based object detection with deep convolutional neural networks

US9542626B2 · US · B2

Patent metadata
Publication number: US-9542626-B2
Application number: US-201615048757-A
Country: US
Kind code: B2
Filing date: Feb 19, 2016
Priority date: Sep 6, 2013
Publication date: Jan 10, 2017
Grant date: Jan 10, 2017

Abstract

Official abstract text for this publication.

By way of example, the technology disclosed by this document receives image data; extracts a depth image and a color image from the image data; creates a mask image by segmenting the depth image; determines a first likelihood score from the depth image and the mask image using a layered classifier; determines a second likelihood score from the color image and the mask image using a deep convolutional neural network; and determines a class of at least a portion of the image data based on the first likelihood score and the second likelihood score. Further, the technology can pre-filter the mask image using the layered classifier and then use the pre-filtered mask image and the color image to calculate a second likelihood score using the deep convolutional neural network to speed up processing.
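
The steps in the abstract can be read as a small data flow. The sketch below is illustrative only and is not the patented implementation. `segment_depth` uses a simple depth-band threshold as a stand-in for the segmentation step, which the abstract does not specify. The `layered` and `cnn` arguments are hypothetical callables that return likelihood scores, and `prefilter_min` is an assumed cutoff. The pre-filter here treats the whole mask as one candidate, which is a simplification. Images are plain nested lists so the example stays self-contained.

```python
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int, int]
DepthImage = List[List[float]]
ColorImage = List[List[Pixel]]
Mask = List[List[bool]]


def segment_depth(depth: DepthImage, near: float, far: float) -> Mask:
    """Assumed segmentation: keep pixels whose depth lies in [near, far]."""
    return [[near <= d <= far for d in row] for row in depth]


def copy_masked_pixels(color: ColorImage, mask: Mask,
                       background: Pixel = (0, 0, 0)) -> ColorImage:
    """Build an object image by copying color pixels where the mask is set."""
    return [
        [px if keep else background for px, keep in zip(c_row, m_row)]
        for c_row, m_row in zip(color, mask)
    ]


def score_candidate(
    depth: DepthImage,
    color: ColorImage,
    layered: Callable[[DepthImage, Mask], float],   # hypothetical layered classifier
    cnn: Callable[[ColorImage], float],             # hypothetical deep CNN scorer
    near: float,
    far: float,
    prefilter_min: float = 0.1,                     # assumed pre-filter cutoff
) -> Tuple[float, Optional[float]]:
    """Return (first score, second score); second is None if pre-filtered out."""
    mask = segment_depth(depth, near, far)
    first = layered(depth, mask)
    if first < prefilter_min:
        # Pre-filter: skip the more expensive CNN for unlikely candidates.
        return first, None
    second = cnn(copy_masked_pixels(color, mask))
    return first, second
```

Skipping the CNN for low-scoring candidates reflects the abstract's stated motivation for pre-filtering, which is to speed up processing.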

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for performing object recognition comprising: receiving image data; extracting a depth image and a color image from the image data; creating a mask image by segmenting the image data into a plurality of components; identifying objects within the plurality of components of the mask image; determining a first likelihood score from the depth image and the mask image using a layered classifier; determining a second likelihood score from the color image and the mask image by generating an object image by copying pixels from a first image of the components in the mask image and classifying the object image using a deep convolutional neural network (CNN); and determining a class for at least a portion of the image data based on the first likelihood score and the second likelihood score.

2. A computer-implemented method for performing object recognition comprising: receiving image data; creating a mask image by segmenting the image data into a plurality of components; determining a first likelihood score from the image data and the mask image using a layered classifier; determining a second likelihood score from the image data and the mask image using a deep convolutional neural network (CNN); and determining a class for at least a portion of the image data based on the first likelihood score and the second likelihood score.

3. The computer-implemented method of claim 2, wherein the determining the second likelihood score from the image data and the mask image using the deep CNN includes: extracting a first image from the image data; generating an object image by copying pixels from the first image of the components in the mask image; classifying the object image using the deep CNN; generating classification likelihood scores indicating probabilities of the object image belonging to different classes of the deep CNN; and generating the second likelihood score based on the classification likelihood scores.

4. The computer-implemented method of claim 3, wherein the first image is one of a color image, a depth image, and a combination of a color image and a depth image.

5. The computer-implemented method of claim 2, wherein determining the class of at least the portion of the image data includes: fusing the first likelihood score and the second likelihood score into an overall likelihood score; and responsive to satisfying a predetermined threshold with the overall likelihood score, classifying the at least the portion of the image data as representing a person using the overall likelihood score.

6. The computer-implemented method of claim 2, further comprising: extracting a depth image and a color image from the image data, wherein determining the first likelihood score from the image data and the mask image using the layered classifier includes determining the first likelihood score from the depth image and the mask image using the layered classifier, and determining the second likelihood score from the image data and the mask image using the deep CNN includes determining the second likelihood score from the color image and the mask image using the deep CNN.

7. The computer-implemented method of claim 2, wherein the deep CNN has a soft max layer as a final layer to generate the second likelihood score that the at least the portion of the image data represents a person.

8. The computer-implemented method of claim 2, further comprising: converting the first likelihood score and the second likelihood score into a first log likelihood value and a second log likelihood value; and calculating a combined likelihood score by using a weighted summation of the first log likelihood value and the second log likelihood value.

9. The computer-implemented method of claim 2, wherein the class is a person.

10. The computer-implemented method of claim 2, wherein determining the second likelihood score further comprises: determining the second likelihood score using the image data and the first likelihood score from the layered classifier.

11. A system for performing object recognition comprising: a processor; and a memory storing instructions that, when executed, cause the system to: create a mask image by segmenting image data into a plurality of components; determine a first likelihood score from the image data and the mask image using a layered classifier; determine a second likelihood score from the image data and the mask image using a deep convolutional neural network (CNN); and determine a class for at least a portion of the image data based on the first likelihood score and the second likelihood score.

12. The system of claim 11, wherein the instructions that cause the system to determine the second likelihood score from the image data and the mask image using the deep CNN further cause the system to: extract a first image from the image data; generate an object image by copying pixels from the first image of the components in the mask image; classify the object image using the deep CNN; generate classification likelihood scores indicating probabilities of the object image belonging to different classes of the deep CNN; and generate the second likelihood score based on the classification likelihood scores.

13. The system of claim 12, wherein the first image is one of a color image, a depth image, and a combination of a color image and a depth image.

14. The system of claim 11, wherein the instructions that cause the system to determine the class of at least the portion of the image data further cause the system to: fuse the first likelihood score and the second likelihood score into an overall likelihood score; and responsive to satisfying a predetermined threshold with the overall likelihood score, classify the at least the portion of the image data as representing a person using the overall likelihood score.

15. The system of claim 11, wherein the memory stores further instructions that cause the system to: extract a depth image and a color image from the image data, wherein determining the first likelihood score from the image data and the mask image using the layered classifier includes determining the first likelihood score from the depth image and the mask image using the layered classifier, and determining the second likelihood score from the image data and the mask image using the deep CNN includes determining the second likelihood score from the color image and the mask image using the deep CNN.

16. The system of claim 11, wherein the deep CNN has a soft max layer as a final layer to generate the second likelihood score that the at least the portion of the image data represents a person.

17. The system of claim 11, wherein the memory stores further instructions that cause the system to: convert the first likelihood score and the second likelihood score into a first log likelihood value and a second log likelihood value; and calculate a combined likelihood score by using a weighted summation of the first log likelihood value and the second log likelihood value.

18. The system of claim 11, wherein the class is a person.

19. The system of claim 11, wherein the instructions that cause the system to determine the second likelihood score further cause the system to: pre-filter the mask image using the layered classifier; and determine the second likelihood score using the image data and the pre-filtered mask image.

20. The system of claim 11, wherein the layered classifier determines the first like
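
The claims describe a soft max final layer (claims 7 and 16), a weighted summation of log likelihood values (claims 8 and 17), and a threshold test on the fused score (claims 5 and 14). The following is a minimal sketch of how those pieces could fit together, under stated assumptions. The function names, the equal default weights, the `eps` floor, and the default threshold are all hypothetical and do not come from the patent. Reading "satisfying a predetermined threshold" as "greater than or equal to" is also an assumption.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_log_likelihoods(first: float, second: float,
                         w_first: float = 0.5, w_second: float = 0.5,
                         eps: float = 1e-12) -> float:
    """Weighted sum of the log of each likelihood score.

    The weights are assumed values; eps guards against log(0).
    """
    return (w_first * math.log(max(first, eps))
            + w_second * math.log(max(second, eps)))


def is_person(first: float, second: float,
              threshold: float = math.log(0.5),  # assumed threshold
              w_first: float = 0.5, w_second: float = 0.5) -> bool:
    """Classify as a person when the fused score meets the threshold."""
    fused = fuse_log_likelihoods(first, second, w_first, w_second)
    return fused >= threshold
```

For example, `is_person(0.9, 0.8)` evaluates to `True` with these defaults, while `is_person(0.1, 0.2)` evaluates to `False`. In practice the weights and threshold would have to be tuned on labeled data.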

Assignees

Toyota Motor Co Ltd

Inventors

Classifications

  • using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks

  • G06V10/82 (Primary): using neural networks

  • using classification, e.g. of video objects

  • Multiple classes

  • Probabilistic graphical models, e.g. probabilistic networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9542626B2 cover?
By way of example, the technology disclosed by this document receives image data; extracts a depth image and a color image from the image data; creates a mask image by segmenting the depth image; determines a first likelihood score from the depth image and the mask image using a layered classifier; determines a second likelihood score from the color image and the mask image using a deep convolu…
Who is the assignee on this patent?
Toyota Motor Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).