What technology area does this patent fall under?

Primary CPC classification G06V10/462. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 16, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

End-to-end visual recognition system and methods

US9418317B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9418317-B2
Application number	US-201414245159-A
Country	US
Kind code	B2
Filing date	Apr 4, 2014
Priority date	Jul 8, 2010
Publication date	Aug 16, 2016
Grant date	Aug 16, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

We describe an end-to-end visual recognition system, where “end-to-end” refers to the ability of the system of performing all aspects of the system, from the construction of “maps” of scenes, or “models” of objects from training data, to the determination of the class, identity, location and other inferred parameters from test data. Our visual recognition system is capable of operating on a mobile hand-held device, such as a mobile phone, tablet or other portable device equipped with sensing and computing power. Our system employs a video based feature descriptor, and we characterize its invariance and discriminative properties. Feature selection and tracking are performed in real-time, and used to train a template-based classifier during a capture phase prompted by the user. During normal operation, the system scores objects in the field of view based on their ranking.
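The capture-then-recognize flow in the abstract can be illustrated with a toy template-based classifier. This is only a minimal sketch under stated assumptions: the class name `TemplateClassifier`, the Euclidean distance, and the nearest-template scoring rule are all hypothetical choices made for illustration. The abstract does not specify how templates are stored or how scores are computed.

```python
import numpy as np


class TemplateClassifier:
    """Toy nearest-template classifier (hypothetical; not the patented method)."""

    def __init__(self):
        # Each entry is (label, descriptor) captured during a training phase.
        self.templates = []

    def train(self, label, descriptor):
        """Store one descriptor as a template for the given label."""
        self.templates.append((label, np.asarray(descriptor, dtype=float)))

    def rank(self, query):
        """Return (label, distance) pairs sorted from best to worst match."""
        query = np.asarray(query, dtype=float)
        best = {}
        for label, template in self.templates:
            # Assumption: Euclidean distance between descriptors.
            dist = float(np.linalg.norm(query - template))
            if label not in best or dist < best[label]:
                best[label] = dist
        return sorted(best.items(), key=lambda item: item[1])


clf = TemplateClassifier()
clf.train("mug", [1.0, 0.0, 0.0])
clf.train("book", [0.0, 1.0, 0.0])
ranking = clf.rank([0.9, 0.1, 0.0])
print(ranking[0][0])
```

In this sketch each label keeps only its closest template distance, so adding more templates per object during capture can only improve that object's score.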

First claim

Opening claim text (preview).

What is claimed is:

1. A visual recognition apparatus for identifying objects captured in a video stream having a captured time period, the apparatus comprising: a hardware processor; and programming in a non-transitory computer readable medium and executable on the hardware processor for: capturing a video stream on an electronic device having an image sensor, said video stream comprising a plurality of temporally adjacent images; enabling a user of the electronic device to select, from said video stream, a target object or scene for training; associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points; and temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor; wherein said programming ranks image features according to their structural stability margin; and wherein said structural stability margin comprises a maximum norm of the nuisance that does not cause a singularity in the detection mechanism.

2. The apparatus recited in claim 1, wherein said temporal aggregating of statistics is performed by computing a mean, or median, or mode, or sample histogram of a contrast-invariant function of the image in said frames.

3. The apparatus recited in claim 1, wherein said programming performs steps comprising: spatially aggregating such statistics into a representation that is insensitive to nuisance factor and distinctive; exploiting such a representation within a classification scheme to enable the detection, localization, recognition and categorization of objects and scenes in video; and displaying the result of the classification scheme by overlaying information on the live video stream, optionally localized and overlaid on the object of interest.

4. The apparatus recited in claim 1, wherein said programming performs steps comprising: selecting a plurality of features corresponding to translational, similarity, affine or more general reference frames from the video stream for objects in a field of view of the video stream; and performing such a selection at a plurality of scales, and using topological consistency across scale as a criterion for propagating said general reference frames across different scales.

5. The apparatus recited in claim 4, wherein said plurality of features comprises a plurality of feature points.

6. The apparatus recited in claim 1, wherein said programming includes a canonization mechanism which does not rely on a co-variant detector.

7. The apparatus recited in claim 1, wherein said programming canonizes rotation in response to a gravity sensor signal.

8. The apparatus recited in claim 4, wherein said programming performs steps comprising: computing a co-variant region that is proximate to a feature point of said feature; computing a contrast invariant feature; and performing a temporal aggregation operation of a number of statistics computed on each image associated with the plurality of video frames over a time period.

9. The apparatus recited in claim 8, wherein the temporal aggregation operation comprises aggregating the contrast invariant feature at each video frame during the time period at the corresponding scale of a feature point of the feature.

10. A visual recognition method for identifying objects captured in a video stream having a captured time period, the method comprising: capturing a video stream on an electronic device having an image sensor, said video stream comprising a plurality of temporally adjacent image; enabling a user of the electronic device to select, from said video stream, a target object or scene for training; associating each frame in an image with a corresponding frame in temporally adjacent images, or in images taken from nearby vantage points; temporally aggregating statistics computed at one or more collections of temporally corresponding frames, into a descriptor; and ranking image features an according to their structural stability margin; wherein said structural stability margin comprises a maximum norm of the nuisance that does not cause a singularity in the detection mechanism; and wherein said method is performed by executing programming on at least one hardware processor, said programming residing on a non-transitory medium readable by the hardware processor.

11. The method recited in claim 10, wherein said aggregation is performed by computing a mean, or median, or mode, or sample histogram of a contrast-invariant function of the image in said frames.

12. The method recited in claim 10, further comprising: spatially aggregating such statistics into a representation that is insensitive to nuisance factor and distinctive; exploiting such a representation within a classification scheme to enable the detection, localization, recognition and categorization of objects and scenes in video; and displaying the result of the classification scheme by overlaying information on the live video stream, optionally localized and overlaid on the object of interest.

13. The method recited in claim 10, further comprising: selecting a plurality of features corresponding to translational, similarity, affine or more general reference frames from the video stream for objects in a field of view of the video stream; and performing such a selection at a plurality of scales, and using topological consistency across scale as a criterion for propagating said general reference frames across different scales.
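Claims 2 and 11 describe temporal aggregation as a mean, median, mode, or sample histogram of a contrast-invariant function of the image. The sketch below illustrates the idea only and is not the patented implementation. It assumes gradient orientation as the contrast-invariant function (the claims do not name one), and it assumes the patches have already been put in correspondence across frames. The function names `orientation_histogram` and `temporal_descriptor` are hypothetical.

```python
import numpy as np


def orientation_histogram(patch, bins=8):
    """Normalized histogram of gradient orientations for one patch.

    Gradient direction does not change under I -> a*I + b with a > 0,
    which is the sense of "contrast-invariant" assumed here.
    """
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    angles = np.arctan2(gy, gx)          # values in [-pi, pi]
    valid = np.hypot(gx, gy) > 0         # orientation is undefined on flat pixels
    hist, _ = np.histogram(angles[valid], bins=bins, range=(-np.pi, np.pi))
    total = hist.sum()
    return hist / total if total > 0 else np.zeros(bins)


def temporal_descriptor(patches, how="mean", bins=8):
    """Aggregate per-frame histograms over time into one descriptor."""
    per_frame = np.stack([orientation_histogram(p, bins) for p in patches])
    if how == "mean":
        return per_frame.mean(axis=0)
    if how == "median":
        return np.median(per_frame, axis=0)
    raise ValueError(f"unsupported aggregation: {how!r}")


# Usage: five corresponding 16x16 patches from temporally adjacent frames.
rng = np.random.default_rng(0)
patches = [rng.integers(0, 256, size=(16, 16)).astype(float) for _ in range(5)]
descriptor = temporal_descriptor(patches, how="mean")
print(descriptor.shape)
```

Because the per-frame statistic depends only on gradient direction, applying the same gain and offset to every frame leaves the aggregated descriptor unchanged in this sketch.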

Assignees

Univ California

Inventors

Classifications

G06V10/772
Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries · CPC title
G06F18/24765
Rule-based classification · CPC title
G06V10/462 · Primary
Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title
G06F18/28
Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title
G06T7/2073
Physics · mapped topic
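The CPC symbols listed above follow a fixed layout: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. Section G is titled "Physics" in the CPC scheme, which is consistent with the mapped topic shown here. The sketch below splits a symbol into those parts. The function name `parse_cpc` and the regular expression are illustrative assumptions, and the pattern is a simplification that has not been validated against the full CPC scheme.

```python
import re

# CPC section titles (the Y title is abbreviated here).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Simplified pattern: section, class, subclass, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,})$")


def parse_cpc(symbol):
    """Split a CPC symbol such as 'G06V10/462' into its components."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


parsed = parse_cpc("G06V10/462")
print(parsed["subclass"], parsed["section_title"])
```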

Patent family

Related publications grouped by family.

View patent family 45441849

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9418317B2 cover?: We describe an end-to-end visual recognition system, where “end-to-end” refers to the ability of the system of performing all aspects of the system, from the construction of “maps” of scenes, or “models” of objects from training data, to the determination of the class, identity, location and other inferred parameters from test data. Our visual recognition system is capable of operating on a mob…
Who is the assignee on this patent?: Univ California