Object detection using deep neural networks

US9275308B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9275308-B2
Application number: US-201414288194-A
Country: US
Kind code: B2
Filing date: May 27, 2014
Priority date: May 31, 2013
Publication date: Mar 1, 2016
Grant date: Mar 1, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting objects in images. One of the methods includes receiving an input image. A full object mask is generated by providing the input image to a first deep neural network object detector that produces a full object mask for an object of a particular object type depicted in the input image. A partial object mask is generated by providing the input image to a second deep neural network object detector that produces a partial object mask for a portion of the object of the particular object type depicted in the input image. A bounding box is determined for the object in the image using the full object mask and the partial object mask.
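As a rough illustration of the last step of the abstract, the sketch below picks a bounding box from a set of candidates using one full object mask and one partial (bottom-half) mask. It is not the patented implementation. The soft intersection-over-union overlap measure, the `(top, left, bottom, right)` box layout, the list-of-lists mask format, and the names `soft_iou`, `bottom_half`, and `best_box` are all assumptions made for this example, and the masks stand in for the outputs of the two deep neural network detectors.

```python
def box_sum(mask, box):
    """Sum mask values inside a half-open box (top, left, bottom, right)."""
    top, left, bottom, right = box
    return sum(sum(row[left:right]) for row in mask[top:bottom])


def soft_iou(mask, box):
    """Assumed overlap measure: soft IoU between a box and a mask in [0, 1]."""
    top, left, bottom, right = box
    inside = box_sum(mask, box)
    area = (bottom - top) * (right - left)
    total = sum(sum(row) for row in mask)
    union = area + total - inside
    return inside / union if union > 0 else 0.0


def bottom_half(box):
    """Partial bounding box covering the lower half of a box."""
    top, left, bottom, right = box
    return ((top + bottom) // 2, left, bottom, right)


def best_box(full_mask, bottom_mask, candidates):
    """Return the candidate whose full box fits the full mask and whose
    bottom half fits the bottom-half mask best (sum of the two overlaps)."""
    def score(box):
        return soft_iou(full_mask, box) + soft_iou(bottom_mask, bottom_half(box))
    return max(candidates, key=score)
```

Summing the two overlaps is one simple way to combine the masks; the partial mask helps separate candidates that cover the object equally well as a whole but place its lower edge differently.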

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: receiving, by one or more computers, an input image; generating, by one or more computers, a full object mask by providing the input image to a first deep neural network object detector that produces a full object mask for an object of a particular object type depicted in the input image, wherein the full object mask identifies regions of the input image that correspond to the object and regions of the input image that do not correspond to the object; generating, by one or more computers, a partial object mask by providing the input image to a second deep neural network object detector that produces a partial object mask for a portion of the object of the particular object type depicted in the input image; and determining, by one or more computers, a bounding box for the object in the image using the full object mask and the partial object mask.

2. The method of claim 1, wherein the portion of the object corresponds to the bottom portion, the top portion, the left portion, or the right portion of the object.

3. The method of claim 1, wherein the bounding box has a partial bounding box corresponding to the partial object mask, and wherein determining the bounding box for the object in the image using the full object mask and the partial object mask comprises determining a bounding box that has a best fit, among a plurality of candidate bounding boxes, of the full bounding box with the full object mask and the partial bounding box with the partial object mask.

4. The method of claim 1, wherein determining a bounding box for the object in the image using the full object mask and the partial object mask comprises: computing a score for each of a plurality of candidate bounding boxes based on a first measure of overlap between the bounding box and the full object mask and a second measure of overlap between the bounding box and the partial object mask; and determining a bounding box having a highest score.

5. The method of claim 4, wherein the score for a candidate bounding box bb is given by:

$$S(bb) = \sum_{h} \left( S(bb(h), m^{h}) - S(bb(\bar{h}), m^{\bar{h}}) \right)$$

wherein $bb(h)$ is a partial bounding box for a corresponding partial object mask $m^{h}$, wherein $bb(\bar{h})$ is an opposite partial bounding box for a corresponding opposite partial object mask $m^{\bar{h}}$, and $S(bb(h), m^{h})$ is a measure of overlap between the partial bounding box and the partial object mask.

6. The method of claim 4, further comprising: generating a second full object mask by providing a portion of the image corresponding to the bounding box to the first deep neural network object detector; generating a second partial object mask by providing the portion of the image corresponding to the bounding box to the second deep neural network object detector; computing a score for each of a second plurality of candidate bounding boxes based on a third measure of overlap between each bounding box and the second full object mask and a fourth measure of overlap between each bounding box and the second partial object mask; and determining a refined bounding box of the second plurality of candidate bounding boxes having a highest score.

7. The method of claim 1, further comprising: determining full and partial object masks for subwindows of each of multiple subwindow scales; and merging object masks determined at same values of the subwindow scales.

8. The method of claim 7, wherein merging the object masks determined at the multiple values of the scale s comprises averaging the object masks.

9. The method of claim 1, wherein determining full and partial object masks for subwindows of each of multiple subwindow scales comprises determining full and partial object masks for no more than 50 subwindows.

10. The method of claim 1, further comprising: generating a predicted object type by providing the input image to a third deep neural network classifier that produces a predicted object type for a portion of the image corresponding to the bounding box; determining that the predicted object type does not match the particular object type; and removing the bounding box from consideration.

11. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input image; generating a full object mask by providing the input image to a first deep neural network object detector that produces a full object mask for an object of a particular object type depicted in the input image, wherein the full object mask identifies regions of the input image that correspond to the object and regions of the input image that do not correspond to the object; generating a partial object mask by providing the input image to a second deep neural network object detector that produces a partial object mask for a portion of the object of the particular object type depicted in the input image; and determining a bounding box for the object in the image using the full object mask and the partial object mask.

12. The system of claim 11, wherein the portion of the object corresponds to the bottom portion, the top portion, the left portion, or the right portion of the object.

13. The system of claim 11, wherein the bounding box has a partial bounding box corresponding to the partial object mask, and wherein determining t
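Claims 7 and 8 above describe computing masks on subwindows at several scales and merging the masks obtained at the same scale by averaging. A minimal sketch of that merging step follows. It assumes each subwindow mask has already been resized to its subwindow's pixel size and lies fully inside the image; the names `merge_masks` and `merge_by_scale`, the `(top, left)` offset convention, and the choice to leave uncovered pixels at 0 are all hypothetical and are not taken from the patent.

```python
def merge_masks(height, width, window_masks):
    """Average overlapping subwindow masks into one image-sized mask.

    window_masks: iterable of ((top, left), mask) pairs, where mask is a
    list of rows of floats placed with its corner at (top, left).
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (top, left), mask in window_masks:
        for r, row in enumerate(mask):
            for c, value in enumerate(row):
                total[top + r][left + c] += value
                count[top + r][left + c] += 1
    # Pixels covered by no subwindow are left at 0.0 (an assumption).
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


def merge_by_scale(height, width, masks_by_scale):
    """Merge masks separately for each subwindow scale."""
    return {
        scale: merge_masks(height, width, window_masks)
        for scale, window_masks in masks_by_scale.items()
    }
```

Keeping one merged mask per scale mirrors the wording of claim 7; how the per-scale masks are then used to propose boxes is outside this sketch.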

Assignees

Google Inc

Inventors

Classifications

  • G06V10/454 (Primary)

    Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • References adjustable by an adaptive method, e.g. learning · CPC title

  • G06K9/66 (Primary)

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9275308B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for detecting objects in images. One of the methods includes receiving an input image. A full object mask is generated by providing the input image to a first deep neural network object detector that produces a full object mask for an object of a particular object type depicted in the input image. A …
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/454. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 1, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).