Leveraging multi cues for fine-grained object classification

US10424072B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10424072-B2
Application number: US-201715418614-A
Country: US
Kind code: B2
Filing date: Jan 27, 2017
Priority date: Mar 1, 2016
Publication date: Sep 24, 2019
Grant date: Sep 24, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method comprising estimating a camera pose of an input image and aligning the input image to a desired camera pose based on a feature database. The input image comprises an image of a fine-grained object. The method further comprises classifying the object based on the alignment.

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: estimating a camera pose of an input image, wherein the input image comprises an image of a fine-grained object; aligning the input image to a desired camera pose by projecting a first multi-dimensional space onto the input image from a second multi-dimensional space based on a feature database comprising a set of multi-dimensional points, wherein a resulting aligned input image comprises the object inside the first multi-dimensional space, and the first multi-dimensional space has fewer dimensions than the second multi-dimensional space and the set of multi-dimensional points; and classifying the object based on the first multi-dimensional space of the aligned input image.
2. The method of claim 1, wherein the set of multi-dimensional points comprises a set of sparse multi-dimensional points representing sparse geometry of a shape of the object.
3. The method of claim 2, wherein the set of sparse multi-dimensional points is based on a set of images including the object, and the set of images are captured from different camera poses to illustrate different illumination changes and backgrounds of the object.
4. The method of claim 3, wherein a portion of the object in each image of the set of images triangulates to a same multi-dimensional point of the feature database.
5. The method of claim 4, wherein each multi-dimensional point of the feature database is associated with a corresponding set of local multi-dimensional feature descriptors indicative of a visual appearance of the object about the multi-dimensional point.
6. The method of claim 1, wherein the classifying the object comprises utilizing a single-layer feature extraction scheme that provides both low-level feature representation and high-level feature representation of the object.
7. The method of claim 1, wherein the projecting the first multi-dimensional space onto the input image from the second multi-dimensional space comprises: projecting a second multi-dimensional surface onto the input image, wherein the second multi-dimensional space has a same amount of dimensions as the set of multi-dimensional points and the second multi-dimensional surface; and transforming the projected second multi-dimensional surface to a first multi-dimensional surface, wherein the first multi-dimensional surface has a same amount of dimensions as the first multi-dimensional space, and the first multi-dimensional surface comprises a portion of the input image that includes the object.
8. The method of claim 7, wherein the projecting the first multi-dimensional space onto the input image from the second multi-dimensional space comprises applying a manifold learning algorithm.
9. The method of claim 1, wherein the input image is decomposed as a set of sparse feature maps convolved with one or more learned convolutional kernels.
10. A system, comprising: at least one processor; and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations including: estimating a camera pose of an input image, wherein the input image comprises an image of a fine-grained object; aligning the input image to a desired camera pose by projecting a first multi-dimensional space onto the input image from a second multi-dimensional space based on a feature database comprising a set of multi-dimensional points, wherein a resulting aligned input image comprises the object inside the first multi-dimensional space, and the first multi-dimensional space has fewer dimensions than the second multi-dimensional space and the set of multi-dimensional points; and classifying the object based on the first multi-dimensional space of the aligned input image.
11. The system of claim 10, wherein the set of multi-dimensional points comprises a set of sparse multi-dimensional points representing sparse geometry of a shape of the object.
12. The system of claim 11, wherein the set of sparse multi-dimensional points is based on a set of images including the object, and the set of images are captured from different camera poses to illustrate different illumination changes and backgrounds of the object.
13. The system of claim 12, wherein a portion of the object in each image of the set of images triangulates to a same multi-dimensional point of the feature database.
14. The system of claim 13, wherein each multi-dimensional point of the feature database is associated with a corresponding set of local multi-dimensional feature descriptors indicative of a visual appearance of the object about the multi-dimensional point.
15. The system of claim 10, wherein the classifying the object comprises utilizing a single-layer feature extraction scheme that provides both low-level feature representation and high-level feature representation of the object.
16. The system of claim 15, wherein the projecting the first multi-dimensional space onto the input image from the second multi-dimensional space comprises: projecting a second multi-dimensional surface onto the input image, wherein the second multi-dimensional space has a same amount of dimensions as the set of multi-dimensional points and the second multi-dimensional surface; and transforming the projected second multi-dimensional surface to a first multi-dimensional surface, wherein the first multi-dimensional surface has a same amount of dimensions as the first multi-dimensional space, and the first multi-dimensional surface comprises a portion of the input image that includes the object.
17. The system of claim 16, wherein the projecting the first multi-dimensional space onto the input image from the second multi-dimensional space comprises applying a manifold learning algorithm.
18. The system of claim 10, wherein the input image is decomposed as a set of sparse feature maps convolved with one or more learned convolutional kernels.
19. A non-transitory computer readable storage medium including instructions to perform a method comprising: estimating a camera pose of an input image, wherein the input image comprises an image of a fine-grained object; aligning the input image to a desired camera pose by projecting a first multi-dimensional space onto the input image from a second multi-dimensional space based on a feature database comprising a set of multi-dimensional points, wherein a resulting aligned input image comprises the object inside the first multi-dimensional space, and the first multi-dimensional space has fewer dimensions than the second multi-dimensional space and the set of multi-dimensional points; and classifying the object based on the first multi-dimensional space of the aligned input image.
20. The computer readable storage medium of claim 19, wherein the set of multi-dimensional points comprises a set of sparse multi-dimensional points representing sparse geometry of a shape of the object.
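
Illustrative sketch (not part of the patent text). The projection step recited in claims 1 and 7 can be pictured with a minimal example, under the assumption that the "second multi-dimensional space" is 3D and the "first multi-dimensional space" is 2D; the claims themselves do not fix these dimensions. The sketch projects the corners of a 3D box into the image with a simple pinhole camera model and takes the enclosing 2D rectangle as the image region containing the object. All function names, the pinhole model, and the numeric values are hypothetical choices for illustration, not the patented implementation.

```python
from math import cos, sin, radians


def rotation_y(deg):
    """3x3 rotation matrix about the camera's vertical (y) axis."""
    a = radians(deg)
    return [[cos(a), 0.0, sin(a)],
            [0.0, 1.0, 0.0],
            [-sin(a), 0.0, cos(a)]]


def project(point, R, t, focal, cx, cy):
    """Pinhole projection of one 3D point to pixel coordinates (u, v)."""
    # Move the point into the camera frame: Xc = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective divide, then shift by the principal point (cx, cy)
    return (focal * xc[0] / xc[2] + cx, focal * xc[1] / xc[2] + cy)


def box_corners(half_extents):
    """Eight corners of an origin-centred 3D box (assumed object volume)."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]


def projected_region(corners, R, t, focal, cx, cy):
    """Axis-aligned 2D rectangle enclosing the projected 3D corners."""
    pts = [project(c, R, t, focal, cx, cy) for c in corners]
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return (min(us), min(vs), max(us), max(vs))


# Hypothetical pose: camera 5 units in front of the object, no rotation.
region = projected_region(box_corners((1, 1, 1)), rotation_y(0),
                          (0, 0, 5), focal=500, cx=320, cy=240)
print(region)  # (195.0, 115.0, 445.0, 365.0)
```

In a fuller pipeline, the rotation and translation would come from the estimated camera pose, and the pixels inside the returned rectangle would be what the classifier sees. Here they are fixed by hand to keep the example self-contained.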

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • G06T7/337 (Primary)

    involving reference images or patches · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • enforcing sparsity or involving a domain transformation · CPC title

  • Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title

  • involving reference images or patches · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10424072B2 cover?
One embodiment provides a method comprising estimating a camera pose of an input image and aligning the input image to a desired camera pose based on a feature database. The input image comprises an image of a fine-grained object. The method further comprises classifying the object based on the alignment.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T7/337. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 24, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).