System and method for detection of objects of interest in imagery

US9740949B1 · US · B1

Patent metadata
Publication number: US-9740949-B1
Application number: US-201314054584-A
Country: US
Kind code: B1
Filing date: Oct 15, 2013
Priority date: Jun 14, 2007
Publication date: Aug 22, 2017
Grant date: Aug 22, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described is a system for detecting objects of interest in imagery. The system is configured to receive an input video and generate an attention map. The attention map represents features found in the input video that represent potential objects-of-interest (OI). An eye-fixation map is generated based on a subject's eye fixations. The eye-fixation map also represents features found in the input video that are potential OI. A brain-enhanced synergistic attention map is generated by fusing the attention map with the eye-fixation map. The potential OI in the brain-enhanced synergistic attention map are scored, with scores that cross a predetermined threshold being used to designate potential OI as actual or final OI.
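The fusion, scoring, and thresholding steps in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two maps are fused or how scores are computed, so everything here is an assumption for illustration: maps are small 2D grids of values in [0, 1], fusion is an element-wise maximum, the fused value itself stands in for the score, and "crossing" the threshold means strictly exceeding it. The names `fuse_maps` and `designate_final` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Map2D = List[List[float]]


def fuse_maps(attention: Map2D, fixation: Map2D) -> Map2D:
    """Fuse two same-shaped maps with an element-wise maximum.

    The max rule is an assumed stand-in; the abstract only says the
    maps are fused, not how.
    """
    same_rows = len(attention) == len(fixation)
    if not same_rows or any(len(a) != len(f) for a, f in zip(attention, fixation)):
        raise ValueError("attention and fixation maps must have the same shape")
    return [
        [max(a, f) for a, f in zip(row_a, row_f)]
        for row_a, row_f in zip(attention, fixation)
    ]


def designate_final(fused: Map2D, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) cells whose value strictly exceeds the threshold.

    Using the fused value as the score is an assumption made only to
    keep the sketch short.
    """
    return [
        (r, c)
        for r, row in enumerate(fused)
        for c, value in enumerate(row)
        if value > threshold
    ]


if __name__ == "__main__":
    attention = [[0.9, 0.1], [0.2, 0.0]]  # candidates from the attention map
    fixation = [[0.0, 0.0], [0.8, 0.3]]   # candidates from eye fixations
    fused = fuse_maps(attention, fixation)
    print(designate_final(fused, 0.5))    # [(0, 0), (1, 0)]
```

With the maximum rule, a candidate found by either source survives into the fused map, which matches the claim language that the fused map holds potential objects-of-interest from both maps. A weighted sum or another rule could be substituted without changing the surrounding steps.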

First claim

Opening claim text (preview).

What is claimed is:

1. A system for detecting objects of interest in imagery, the system comprising: one or more processors and a memory, the memory having instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations of: receiving an input video; generating an attention map, the attention map representing features found in the input video that represent potential objects-of-interest; generating, in real-time, an eye-fixation map, the eye-fixation map representing features found in the input video that, based on a subject's eye fixations in real-time, are potential objects-of-interest; generating a brain-enhanced synergistic attention map by fusing the attention map with the eye-fixation map, the brain-enhanced synergistic map having a collection of potential objects-of-interest from both the attention map and eye-fixation map; scoring the potential objects-of-interest in the brain-enhanced synergistic attention map; and designating the potential objects-of-interest as final objects-of-interest for scores that cross a predetermined threshold.

2. The system as set forth in claim 1, wherein the memory further includes instructions for causing the one or more processors to perform operations of: generating a masked map that masks the potential objects-of-interest in the attention map; combining the masked map with the input video to generate a masked video having unmasked regions and masked regions, where the masked regions mask the potential objects-of-interest as generated by the attention map; presenting the masked video to a subject; collecting data regarding the subject's eye fixations on the masked video; and generating the eye-fixation map based on the subject's eye fixations.

3. The system as set forth in claim 2, wherein in collecting data regarding the subject's eye fixations on the masked video, a fixation includes the data points, within a temporal window, having an agreement in spatial position that exceeds a threshold.

4. The system as set forth in claim 3, wherein generating an attention map further comprises operations of: receiving a series of consecutive frames representing a scene as provided for in the input video, the frames having at least a current frame and a previous frame; generating a surprise map based on features found in the current frame and the previous frame, the surprise map having a plurality of values corresponding to spatial locations within the scene; and determining a surprise in the scene based on a value in the surprise map exceeding a predetermined threshold, the surprise being a potential object-of-interest in the attention map.

5. The system as set forth in claim 4, wherein combining the masked map with the input video to generate a masked video further comprises an operation of masking each frame independently of each other frame such that there is no temporal continuity of the masking across frames.

6. The system as set forth in claim 5, wherein masking each frame independently further comprises an operation of blacking out the masked regions while maintaining original pixel values in the unmasked regions.

7. The system as set forth in claim 5, wherein masking each frame independently further comprises an operation blurring the masked region by convolving the masked region with a Gaussian smoothing kernel.

8. The system as set forth in claim 4, wherein combining the masked map with the input video to generate a masked video further comprises operations of: determining if a potential object-of-interest in the masked map is in M out of N frames, where both M and N are greater than one, and if so, then designating a region associated with the potential object-of-interest as a masked region for all of the N frames; and blurring the masked region by convolving the masked region with Gaussian smoothing kernels of different sizes.

9. A computer implemented method for detecting objects of interest in imagery, the method comprising an act of: causing one or more processors to execute instructions encoded upon a memory, that upon execution of the instructions, the one or more processors perform operations of: receiving an input video; generating an attention map, the attention map representing features found in the input video that represent potential objects-of-interest; generating, in real-time, an eye-fixation map, the eye-fixation map representing features found in the input video that, based on a subject's eye fixations in real-time, are potential objects-of-interest; generating a brain-enhanced synergistic attention map by fusing the attention map with the eye-fixation map, the brain-enhanced synergistic map having a collection of potential objects-of-interest from both the attention map and eye-fixation map; scoring the potential objects-of-interest in the brain-enhanced synergistic attention map; and designating the potential objects-of-interest as final objects-of-interest for scores that cross a predetermined threshold.

10. The computer implemented method as set forth in claim 9, further comprising an act of causing the one or more processors to perform operations of: generating a masked map that masks the potential objects-of-interest in the attention map; combining the masked map with the input video to generate a masked video having unmasked regions and masked regions, where the masked regions mask the potential objects-of-interest as generated by the attention map; presenting the masked video to a subject; collecting data regarding the subject's eye fixations on the masked video; and generating the eye-fixation map based on the subject's eye fixations.

11. The computer implemented method as set forth in claim 10, wherein in collecting data regarding the subject's eye fixations on the masked video, a fixation includes the data points, within a temporal window, having an agreement in spatial position that exceeds a threshold.

12. The computer implemented method as set forth in claim 11, wherein generating an attention map further comprises operations of: receiving a series of consecutive frames representing a scene as provided for in the input video, the frames having at least a current frame and a previous frame; generating a surprise map based on features found in the current frame and the previous frame, the surprise map having a plurality of values corresponding to spatial locations within the scene; and determining a surprise in the scene based on a value in the surprise map exceeding a predetermined threshold, the surprise being a potential object-of-interest in the attention map.

13. The computer implemented method as set forth in claim 12, wherein combining the masked map with the input video to generate a masked video further comprises an operation of masking each frame independently of each other frame such that there is no temporal continuity of the masking across frames.

14. The computer implemented method as set forth in claim 13, wherein masking each frame independently further comprises an operation of blacking out the masked regions while maintaining original pixel values in the unmasked regions.

15. The computer implemented method as set forth in claim 13, wherein masking each frame independently further comprises an operation blurring the masked region by convolving the masked region with a Gaussian smoothing kernel.

16. The computer implemented method as set forth in claim 12, wherein combining the masked map with the input video to generate a masked video further comprises operations of: determining if a potential obj
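
Claims 3 and 11 define a fixation as the gaze data points within a temporal window whose spatial positions agree beyond a threshold. The sketch below is one possible reading of that definition, not the patent's implementation. It assumes gaze samples arrive as (x, y) pixel coordinates at a fixed rate, uses non-overlapping windows of a fixed number of samples as the temporal window, and treats "agreement" as low dispersion (horizontal extent plus vertical extent). The function name `detect_fixations` and its parameters are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def detect_fixations(
    samples: List[Point], window: int, max_dispersion: float
) -> List[Point]:
    """Return the centroid of each window whose samples stay close together.

    Assumed rule: a window counts as a fixation when the horizontal extent
    plus the vertical extent of its samples is at most max_dispersion.
    Trailing samples that do not fill a whole window are ignored.
    """
    if window < 1:
        raise ValueError("window must be at least 1 sample")
    fixations: List[Point] = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        dispersion = (max(xs) - min(xs)) + (max(ys) - min(ys))
        if dispersion <= max_dispersion:
            fixations.append((sum(xs) / window, sum(ys) / window))
    return fixations


if __name__ == "__main__":
    gaze = [
        (10, 10), (11, 10), (10, 11), (11, 11),  # tight cluster: a fixation
        (50, 10), (90, 40), (20, 80), (60, 60),  # scattered: not a fixation
    ]
    print(detect_fixations(gaze, window=4, max_dispersion=3))  # [(10.5, 10.5)]
```

Each returned centroid could then be written into an eye-fixation map at the corresponding location. A sliding or growing window would follow the same idea with finer temporal resolution.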

Assignees

Inventors

Classifications

  • of extracted features · CPC title

  • G06V20/52 (Primary)

    Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title

  • Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title

  • of extracted features · CPC title

  • with interaction between the filter responses, e.g. cortical complex cells · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9740949B1 cover?
Described is a system for detecting objects of interest in imagery. The system is configured to receive an input video and generate an attention map. The attention map represents features found in the input video that represent potential objects-of-interest (OI). An eye-fixation map is generated based on a subject's eye fixations. The eye-fixation map also represents features found in the input…
Who is the assignee on this patent?
HRL Laboratories, LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/52. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 22, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).