System and method for fast object detection

US11113507B2 · US · B2

Patent metadata
Field · Value
Publication number · US-11113507-B2
Application number · US-201815986689-A
Country · US
Kind code · B2
Filing date · May 22, 2018
Priority date · May 22, 2018
Publication date · Sep 7, 2021
Grant date · Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method comprising identifying a salient part of an object in an input image based on processing of a region of interest (RoI) in the input image at an electronic device. The method further comprises determining an estimated full appearance of the object in the input image based on the salient part and a relationship between the salient part and the object. The electronic device is operated based on the estimated full appearance of the object.
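As a rough illustration of the second step in the abstract, the sketch below estimates a full-object box from a single detected part. It assumes the "relationship between the salient part and the object" is stored as a per-part template of offsets and scale factors; the `Box` type, the `TEMPLATES` table, its numeric values, and `estimate_full_box` are hypothetical names chosen for this example and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge, in pixels
    y: float  # top edge, in pixels
    w: float  # width, in pixels
    h: float  # height, in pixels


# Hypothetical templates (illustrative values only). For each part label:
# (dx, dy, sw, sh), where dx/dy shift the top-left corner in units of the
# part's width/height, and sw/sh scale the part's width/height.
TEMPLATES = {
    "eye": (-1.0, -1.0, 4.0, 5.0),
    "nose": (-1.5, -2.0, 4.0, 4.0),
}


def estimate_full_box(part: str, part_box: Box) -> Box:
    """Infer a full-object box from one part box and that part's template."""
    dx, dy, sw, sh = TEMPLATES[part]
    return Box(
        x=part_box.x + dx * part_box.w,
        y=part_box.y + dy * part_box.h,
        w=sw * part_box.w,
        h=sh * part_box.h,
    )


if __name__ == "__main__":
    # An eye detected at (100, 80) with size 20x10 implies a larger face box.
    print(estimate_full_box("eye", Box(100, 80, 20, 10)))
```

The point of the template is that an object can still be located when only one of its parts is visible or confidently classified.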

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying one or more salient parts of an object in an input image by classifying, at an electronic device, a set of input patches of the input image utilizing a multi-label classification network, wherein the multi-label classification network is trained to capture one or more global characteristics of the object and one or more local characteristics of the one or more salient parts of the object based on input patches cropped from a set of training images, the multi-label classification network classifies at least one input patch of the set of input patches as a first object classification representing a background, and the multi-label classification network classifies one or more other input patches of the set of input patches as one or more additional object classifications representing the one or more salient parts of the object; determining an estimated full appearance of the object in the input image based on the one or more salient parts and a relationship between the one or more salient parts and the object as defined by one or more bounding box templates for the one or more salient parts; and invoking an action on the electronic device based on the estimated full appearance of the object.

2. The method of claim 1, wherein the identifying the one or more salient parts of the object in the input image further comprises: generating a set of feature maps based on a sparse image pyramid; and determining a region of interest (RoI) in the input image based on the set of feature maps.

3. The method of claim 2, the method further comprising: resizing the input image by generating the sparse image pyramid comprising one or more pyramid levels, wherein each pyramid level corresponds to different scales of the input image.

4. The method of claim 3, wherein the generating the set of feature maps comprises: for each input patch of each pyramid level of the sparse image pyramid, classifying the input patch as one of a plurality of object classifications utilizing the multi-label classification network.

5. The method of claim 1, wherein the object is a face, and the one or more salient parts is one or more facial parts of the face.

6. The method of claim 4, wherein the plurality of object classifications comprise the first object classification and the one or more additional object classifications, and at least one of the plurality of object classifications represents at least one of: the background, whole face, eye, nose, whole mouth, left corner of mouth, right corner of mouth, or ear.

7. The method of claim 4, further comprising: in response to the multi-label classification network capturing a global characteristic of the object in an input patch, determining a location of the object in the input image based on a location of the input patch.

8. The method of claim 4, further comprising: in response to the multi-label classification network capturing a local characteristic of the one or more salient parts of the object in an input patch, inferring a location of the object in the input image based on a location of the input patch and the relationship between the one or more salient parts and the object as defined by the one or more bounding box templates for the one or more salient parts.

9. The method of claim 1, wherein the action comprises: in response to a request to capture a picture via a camera coupled to the electronic device: controlling the capture of the picture based on detecting presence of one or more expected features in the input image utilizing the multi-label classification network, wherein the input image is a camera view of the camera.

10. The method of claim 1, wherein the action comprises: in response to a request to capture a picture via a camera coupled to the electronic device: determining a current composition of the picture by detecting a location and a size of each object in the input image, wherein the input image is a camera view of the camera; and providing one or more suggestions to alter the current composition of the picture based on the current composition.

11. A system comprising: at least one processor; and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations including: identifying one or more salient parts of an object in an input image by classifying, at an electronic device, a set of input patches of the input image utilizing a multi-label classification network, wherein the multi-label classification network is trained to capture one or more global characteristics of the object and one or more local characteristics of the one or more salient parts of the object based on input patches cropped from a set of training images, the multi-label classification network classifies at least one input patch of the set of input patches as a first object classification representing a background, and the multi-label classification network classifies one or more other input patches of the set of input patches as one or more additional object classifications representing the one or more salient parts of the object; determining an estimated full appearance of the object in the input image based on the one or more salient parts and a relationship between the one or more salient parts and the object as defined by one or more bounding box templates for the one or more salient parts; and invoking an action on the electronic device based on the estimated full appearance of the object.

12. The system of claim 11, wherein the identifying the one or more salient parts of the object in the input image further comprises: generating a set of feature maps based on a sparse image pyramid; and determining a region of interest (RoI) in the input image based on the set of feature maps.

13. The system of claim 12, wherein the generating the set of feature maps comprises: for each input patch of each pyramid level of the sparse image pyramid, classifying the input patch as one of a plurality of object classifications utilizing the multi-label classification network.

14. The system of claim 11, wherein the object is a face, and the one or more salient parts is one or more facial parts of the face.

15. The system of claim 13, wherein the plurality of object classifications comprise the first object classification and the one or more additional object classifications, and at least one of the plurality of object classifications represents at least one of: the background, whole face, eye, nose, whole mouth, left corner of mouth, right corner of mouth, or ear.

16. The system of claim 13, wherein the operations further comprise: in response to the multi-label classification network capturing a global characteristic of the object in an input patch, determining a location of the object in the input image based on a location of the input patch; and in response to the multi-label classification network capturing a local characteristic of the one or more salient parts of the object in the input patch, inferring the location of the object in the input image based on the location of the input patch and the relationship between the one or more salient parts and the object as defined by the one or more bounding box templates for the one or more salient parts.

17. A non-transitory computer readable storage medium including instructions to perform a method comprising: identifying one or more salient parts of an object in an input image by classifying, at an electronic device,
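The feature-map generation recited in claims 2 to 4 (and 12 to 13) can be sketched roughly as follows: resize the image to a few scales, slide a fixed-size patch over each level, and label every patch. This is a minimal sketch under stated assumptions, not the patented implementation. The scale list, patch size, stride, and all function names are hypothetical, and the `classify` callback is a stand-in for the multi-label classification network; it receives patch coordinates instead of pixels so the example runs without an image library.

```python
from typing import Callable, Dict, List, Tuple

# (label, x, y, size), with x, y, size mapped back to original-image pixels
Detection = Tuple[str, float, float, float]


def pyramid_levels(width: int, height: int,
                   scales: List[float]) -> List[Tuple[float, int, int]]:
    """Return (scale, level_width, level_height) for each pyramid level."""
    return [(s, int(width * s), int(height * s)) for s in scales]


def patch_grid(level_w: int, level_h: int,
               patch: int, stride: int) -> List[Tuple[int, int]]:
    """Top-left corners of all patches that fit fully inside one level."""
    return [(x, y)
            for y in range(0, level_h - patch + 1, stride)
            for x in range(0, level_w - patch + 1, stride)]


def feature_maps(width: int, height: int, scales: List[float],
                 patch: int, stride: int,
                 classify: Callable[[float, int, int], str],
                 ) -> Dict[float, List[Detection]]:
    """Label every patch at every level; keep one list of labels per level."""
    maps: Dict[float, List[Detection]] = {}
    for scale, lw, lh in pyramid_levels(width, height, scales):
        maps[scale] = [
            (classify(scale, x, y), x / scale, y / scale, patch / scale)
            for x, y in patch_grid(lw, lh, patch, stride)
        ]
    return maps


def stub_classifier(scale: float, x: int, y: int) -> str:
    """Assumed stand-in: reports an 'eye' at one patch, background elsewhere."""
    return "eye" if (scale, x, y) == (0.5, 8, 8) else "background"


if __name__ == "__main__":
    maps = feature_maps(64, 48, [1.0, 0.5], patch=16, stride=8,
                        classify=stub_classifier)
    hits = [d for level in maps.values() for d in level if d[0] != "background"]
    print(hits)
```

Patches labeled as something other than background mark candidate regions of interest. Because the patch size is fixed, a patch at a smaller scale covers a larger area of the original image, which is why a small number of levels can still cover a range of object sizes.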

Assignees

  • Samsung Electronics Co Ltd

Inventors

Classifications

  • Sparse representations · CPC title

  • G06V10/82 · Primary

    using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • using facial parts and geometric relationships · CPC title

  • where the recognised objects include parts of the human body · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11113507B2 cover?
One embodiment provides a method comprising identifying a salient part of an object in an input image based on processing of a region of interest (RoI) in the input image at an electronic device. The method further comprises determining an estimated full appearance of the object in the input image based on the salient part and a relationship between the salient part and the object. The electron…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 7, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).