Coarse-to-fine hand detection method using deep neural network

US10817716B2 · US · B2

Patent metadata
Publication number: US-10817716-B2
Application number: US-201816228436-A
Country: US
Kind code: B2
Filing date: Dec 20, 2018
Priority date: Jun 6, 2017
Publication date: Oct 27, 2020
Grant date: Oct 27, 2020

Abstract

Embodiments provide a process to identify one or more areas containing a hand or hands of one or more subjects in an image. The detection process can start with coarsely locating one or more segments in the image that contain portions of the hand(s) of the subject(s) in the image using a coarse CNN. The detection process can then combine these segments to obtain the one or more areas capturing the hand(s) of the subject(s) in the image. The combined area(s) can then be fed to a grid-based deep neural network to finely detect area(s) in the image that contain only the hand(s) of the subject(s) captured.
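The abstract does not say how the coarse segments are combined. Below is a minimal Python sketch, assuming each segment from the coarse CNN is an axis-aligned box (x1, y1, x2, y2) in pixels and that segments which overlap, directly or through a chain of overlaps, belong to the same hand (compare claim 3, where two segments of one hand overlap at least in part). Each resulting area is the smallest box enclosing its group. The names `overlaps` and `combine_segments` are hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 < x2 and y1 < y2


def overlaps(a: Box, b: Box) -> bool:
    """Return True if boxes a and b share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def combine_segments(segments: List[Box]) -> List[Box]:
    """Group transitively overlapping segments; return one enclosing box per group."""
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of overlapping segments (O(n^2) is fine for a handful of boxes).
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if overlaps(segments[i], segments[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(segments):
        groups.setdefault(find(i), []).append(box)

    # Each area is the smallest axis-aligned box enclosing its group.
    return [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]
```

For example, the segments (10, 10, 30, 30) and (25, 20, 45, 40) overlap and merge into the area (10, 10, 45, 40), while a separate segment (100, 100, 120, 130) stays its own area. Union-find keeps the grouping transitive, so a chain of partially overlapping segments along one hand yields a single area.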

Claims

What is claimed is:

1. A method for detecting a hand of a subject in an image, the method being executed by a processor configured to execute machine-readable instructions, the method comprising: receiving image data for an image, the image capturing one or more hands of one or more subjects; processing the image data using a first location network to obtain segments in the image, each of the segments containing the portion of the hand of the subject; combining the segments into a first image area; expanding the size of the first image area by a predetermined margin; and processing the first image area using a grid-based detection network to obtain a second image area, the second image area capturing a hand of the subject, wherein expanding the size of the first image area by the predetermined margin comprises: dividing the image into n by n grids, wherein the predetermined margin is the size of an individual grid cell; expanding the first image area by the predetermined margin; and aligning the border of the first image area to the grids.

2. The method of claim 1, wherein the first location network includes a convolution neural network (CNN) having two sub stages connected in a series.

3. The method of claim 1, wherein the segments include a first segment and a second segment, the first segment containing a first portion of the hand of the subject, and the second segment containing a second portion of the at least one hand of the subject, wherein the first portion overlaps with the second portion at least in part.

4. The method of claim 1, wherein the grid-based detection network comprises a deep CNN that includes multiple layers configured to process the grid cells of the first image area.

5. The method of claim 4, wherein the grid-based detection network includes more than three layers.

6. The method of claim 1, further comprising training the first location network with training image data having markings of positions and sizes of hands of subject using a Batch Gradient Descent method.

7. The method of claim 6, further comprising processing the training image data using the first location network to obtain image segments containing portions of the hands of the subjects, combining and expanding the image segments to obtain image areas capturing the hands of the subjects, and training the grid-based detection network with image areas.

8. A system for detecting a hand of a subject in an image, the system comprising a processor configured to execute machine-readable instructions such that when the machine-readable instructions are executed, the system is caused to perform operations including: receiving image data for an image, the image capturing one or more hands of one or more subjects; processing the image data using a first location network to obtain segments in the image, each of the segments containing the portion of the hand of the subject; combining the segments into a first image area; expanding the size of the first image area by a predetermined margin; and processing the first image area using a grid-based detection network to obtain a second image area, the second image area capturing a hand of the subject, wherein expanding the size of the first image area by the predetermined margin comprises: dividing the image into n by n grids, wherein the predetermined margin is the size of an individual grid cell; expanding the first image area by the predetermined margin; and aligning the border of the first image area to the grids.

9. The system of claim 8, wherein the first location network includes a convolution neural network (CNN) having two sub stages connected in a series.

10. The system of claim 8, wherein the segments include a first segment and a second segment, the first segment containing a first portion of the hand of the subject, and the second segment containing a second portion of the at least one hand of the subject, wherein the first portion overlaps with the second portion at least in part.

11. The system of claim 8, wherein the grid-based detection network comprises a deep CNN that includes multiple layers configured to process the grid cells of the first image area.

12. The system of claim 11, wherein the grid-based detection network includes more than three layers.

13. The system of claim 11, wherein the processor is further caused to perform processing the training image data using the first location network to obtain image segments containing portions of the hands of the subjects, combining and expanding the image segments to obtain image areas capturing the hands of the subjects, and training the grid-based detection network with image areas.

14. The system of claim 8, wherein the processor is further caused to perform training image data having markings of positions and sizes of hands of subject using a Batch Gradient Descent method.

15. A non-transitory computer readable storage medium storing a plurality of machine-readable instructions that, when executed by a processor of a computer system for detecting a hand of a subject in an image, cause the computer system to perform operations including: receiving image data for an image, the image capturing one or more hands of one or more subjects; processing the image data using a first location network to obtain segments in the image, each of the segments containing the portion of the hand of the subject; combining the segments into a first image area; expanding the size of the first image area by a predetermined margin; and processing the first image area using a grid-based detection network to obtain a second image area, the second image area capturing a hand of the subject, wherein expanding the size of the first image area by the predetermined margin comprises: dividing the image into n by n grids, wherein the predetermined margin is the size of an individual grid cell; expanding the first image area by the predetermined margin; and aligning the border of the first image area to the grids.

16. The non-transitory computer readable storage medium of claim 15, wherein the segments include a first segment and a second segment, the first segment containing a first portion of the hand of the subject, and the second segment containing a second portion of the at least one hand of the subject, wherein the first portion overlaps with the second portion at least in part.

17. The non-transitory computer readable storage medium of claim 15, wherein the grid-based detection network comprises a deep convolution neural network (CNN) that includes multiple layers configured to process the grid cells of the first image area.

18. The non-transitory computer readable storage medium of claim 15, further comprising instructions that when executed cause the computer system to train the first location network with training image data having markings of positions and sizes of hands of subject using a Batch Gradient Descent method.

19. The non-transitory computer readable storage medium of claim 18, further comprising instructions that when executed cause the computer system to process the training image data using the first location network to obtain image segments containing portions of the hands of the subjects, combine and expand the image segments to obtain image areas capturing the hands of the subjects, and training the grid-based detection network with image areas.
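The wherein clause shared by claims 1, 8 and 15 (divide the image into n by n grids, expand the first image area by one grid cell, align its border to the grids) can be sketched in Python as below. The claims leave several details open, so this sketch assumes: boxes are (x1, y1, x2, y2) pixel coordinates, the one-cell margin is applied on every side, "aligning" snaps each border outward to the nearest grid line, and the result is clipped to the image. `expand_and_align` is a hypothetical name, not an API from the patent.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def expand_and_align(area: Box, width: int, height: int, n: int) -> Box:
    """Expand `area` by one grid cell per side, then snap it outward to an n-by-n grid.

    Hypothetical helper illustrating the claimed margin-and-alignment step.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    cell_w = width / n   # horizontal margin = width of one grid cell
    cell_h = height / n  # vertical margin = height of one grid cell
    x1, y1, x2, y2 = area

    # Expand by the predetermined margin (one cell) on every side.
    x1, x2 = x1 - cell_w, x2 + cell_w
    y1, y2 = y1 - cell_h, y2 + cell_h

    # Align: move each border outward to the nearest grid line,
    # clipping to the image so grid-line indices stay within 0..n.
    col0 = max(0, math.floor(x1 / cell_w))
    row0 = max(0, math.floor(y1 / cell_h))
    col1 = min(n, math.ceil(x2 / cell_w))
    row1 = min(n, math.ceil(y2 / cell_h))

    # Convert grid-line indices back to pixel coordinates.
    return (col0 * cell_w, row0 * cell_h, col1 * cell_w, row1 * cell_h)


if __name__ == "__main__":
    print(expand_and_align((50, 60, 130, 150), 640, 480, 16))
```

With a 640-by-480 image and n = 16, each cell is 40 by 30 pixels, so the call above returns (0.0, 30.0, 200.0, 180.0): a block of 5 by 5 whole cells that could then be cropped and passed to the grid-based detection network. Snapping outward rather than to the nearest line is a design choice that guarantees the aligned area never cuts off part of the expanded area.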

Assignees

  • Midea Group Co Ltd

  • Seetatech Beijing Tech Co Ltd

Classifications

  • Classification techniques (CPC title)

  • G06V10/82, primary: using neural networks (CPC title)

  • G06V40/113, primary: Recognition of static hand signs (CPC title)

  • Distances to prototypes (CPC title)

  • Graphical models, e.g. Bayesian networks (CPC title)

Frequently asked questions

What does patent US10817716B2 cover?
Embodiments provide a process to identify one or more areas containing a hand or hands of one or more subjects in an image. The detection process can start with coarsely locating one or more segments in the image that contain portions of the hand(s) of the subject(s) in the image using a coarse CNN. The detection process can then combine these segments to obtain the one or more areas capturing …
Who is the assignee on this patent?
Midea Group Co Ltd, Seetatech Beijing Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 27, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).