US9740949B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9740949-B1 |
| Application number | US-201314054584-A |
| Country | US |
| Kind code | B1 |
| Filing date | Oct 15, 2013 |
| Priority date | Jun 14, 2007 |
| Publication date | Aug 22, 2017 |
| Grant date | Aug 22, 2017 |
Official abstract text for this publication.
Described is a system for detecting objects of interest in imagery. The system is configured to receive an input video and generate an attention map. The attention map represents features found in the input video that represent potential objects-of-interest (OI). An eye-fixation map is generated based on a subject's eye fixations. The eye-fixation map also represents features found in the input video that are potential OI. A brain-enhanced synergistic attention map is generated by fusing the attention map with the eye-fixation map. The potential OI in the brain-enhanced synergistic attention map are scored, with scores that cross a predetermined threshold being used to designate potential OI as actual or final OI.
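The fuse, score, and threshold steps of the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the fusion rule or the scoring method, so the sketch uses an element-wise maximum as a union-like fusion and treats the fused value itself as the score. The names `fuse_maps` and `final_objects` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Map2D = List[List[float]]  # per-cell likelihood of a potential object-of-interest


def fuse_maps(attention: Map2D, fixation: Map2D) -> Map2D:
    """Fuse two maps with an element-wise maximum (assumed rule).

    A cell stays a candidate if either the attention map or the
    eye-fixation map flags it, so the result collects candidates from both.
    """
    same_rows = len(attention) == len(fixation)
    if not same_rows or any(len(a) != len(f) for a, f in zip(attention, fixation)):
        raise ValueError("maps must have the same shape")
    return [
        [max(a, f) for a, f in zip(row_a, row_f)]
        for row_a, row_f in zip(attention, fixation)
    ]


def final_objects(fused: Map2D, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) cells whose score exceeds the threshold.

    The score here is simply the fused value (an assumption).
    """
    return [
        (r, c)
        for r, row in enumerate(fused)
        for c, score in enumerate(row)
        if score > threshold
    ]


# Toy 2x3 maps: the attention map flags the top-left cell,
# the eye-fixation map flags the bottom-right cell.
attention = [[0.9, 0.1, 0.0],
             [0.0, 0.2, 0.0]]
fixation = [[0.0, 0.0, 0.0],
            [0.0, 0.3, 0.8]]

fused = fuse_maps(attention, fixation)
finals = final_objects(fused, threshold=0.5)
print(finals)
```

With these toy values, one candidate from each map survives the threshold, which mirrors the idea that the fused map holds potential objects-of-interest from both sources.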
Opening claim text (preview).
What is claimed is:
1. A system for detecting objects of interest in imagery, the system comprising: one or more processors and a memory, the memory having instructions encoded thereon, such that upon execution of the instructions, the one or more processors perform operations of: receiving an input video; generating an attention map, the attention map representing features found in the input video that represent potential objects-of-interest; generating, in real-time, an eye-fixation map, the eye-fixation map representing features found in the input video that, based on a subject's eye fixations in real-time, are potential objects-of-interest; generating a brain-enhanced synergistic attention map by fusing the attention map with the eye-fixation map, the brain-enhanced synergistic map having a collection of potential objects-of-interest from both the attention map and eye-fixation map; scoring the potential objects-of-interest in the brain-enhanced synergistic attention map; and designating the potential objects-of-interest as final objects-of-interest for scores that cross a predetermined threshold.
2. The system as set forth in claim 1, wherein the memory further includes instructions for causing the one or more processors to perform operations of: generating a masked map that masks the potential objects-of-interest in the attention map; combining the masked map with the input video to generate a masked video having unmasked regions and masked regions, where the masked regions mask the potential objects-of-interest as generated by the attention map; presenting the masked video to a subject; collecting data regarding the subject's eye fixations on the masked video; and generating the eye-fixation map based on the subject's eye fixations.
3. The system as set forth in claim 2, wherein in collecting data regarding the subject's eye fixations on the masked video, a fixation includes the data points, within a temporal window, having an agreement in spatial position that exceeds a threshold.
4. The system as set forth in claim 3, wherein generating an attention map further comprises operations of: receiving a series of consecutive frames representing a scene as provided for in the input video, the frames having at least a current frame and a previous frame; generating a surprise map based on features found in the current frame and the previous frame, the surprise map having a plurality of values corresponding to spatial locations within the scene; and determining a surprise in the scene based on a value in the surprise map exceeding a predetermined threshold, the surprise being a potential object-of-interest in the attention map.
5. The system as set forth in claim 4, wherein combining the masked map with the input video to generate a masked video further comprises an operation of masking each frame independently of each other frame such that there is no temporal continuity of the masking across frames.
6. The system as set forth in claim 5, wherein masking each frame independently further comprises an operation of blacking out the masked regions while maintaining original pixel values in the unmasked regions.
7. The system as set forth in claim 5, wherein masking each frame independently further comprises an operation blurring the masked region by convolving the masked region with a Gaussian smoothing kernel.
8. The system as set forth in claim 4, wherein combining the masked map with the input video to generate a masked video further comprises operations of: determining if a potential object-of-interest in the masked map is in M out of N frames, where both M and N are greater than one, and if so, then designating a region associated with the potential object-of-interest as a masked region for all of the N frames; and blurring the masked region by convolving the masked region with Gaussian smoothing kernels of different sizes.
9. A computer implemented method for detecting objects of interest in imagery, the method comprising an act of: causing one or more processors to execute instructions encoded upon a memory, that upon execution of the instructions, the one or more processors perform operations of: receiving an input video; generating an attention map, the attention map representing features found in the input video that represent potential objects-of-interest; generating, in real-time, an eye-fixation map, the eye-fixation map representing features found in the input video that, based on a subject's eye fixations in real-time, are potential objects-of-interest; generating a brain-enhanced synergistic attention map by fusing the attention map with the eye-fixation map, the brain-enhanced synergistic map having a collection of potential objects-of-interest from both the attention map and eye-fixation map; scoring the potential objects-of-interest in the brain-enhanced synergistic attention map; and designating the potential objects-of-interest as final objects-of-interest for scores that cross a predetermined threshold.
10. The computer implemented method as set forth in claim 9, further comprising an act of causing the one or more processors to perform operations of: generating a masked map that masks the potential objects-of-interest in the attention map; combining the masked map with the input video to generate a masked video having unmasked regions and masked regions, where the masked regions mask the potential objects-of-interest as generated by the attention map; presenting the masked video to a subject; collecting data regarding the subject's eye fixations on the masked video; and generating the eye-fixation map based on the subject's eye fixations.
11. The computer implemented method as set forth in claim 10, wherein in collecting data regarding the subject's eye fixations on the masked video, a fixation includes the data points, within a temporal window, having an agreement in spatial position that exceeds a threshold.
12. The computer implemented method as set forth in claim 11, wherein generating an attention map further comprises operations of: receiving a series of consecutive frames representing a scene as provided for in the input video, the frames having at least a current frame and a previous frame; generating a surprise map based on features found in the current frame and the previous frame, the surprise map having a plurality of values corresponding to spatial locations within the scene; and determining a surprise in the scene based on a value in the surprise map exceeding a predetermined threshold, the surprise being a potential object-of-interest in the attention map.
13. The computer implemented method as set forth in claim 12, wherein combining the masked map with the input video to generate a masked video further comprises an operation of masking each frame independently of each other frame such that there is no temporal continuity of the masking across frames.
14. The computer implemented method as set forth in claim 13, wherein masking each frame independently further comprises an operation of blacking out the masked regions while maintaining original pixel values in the unmasked regions.
15. The computer implemented method as set forth in claim 13, wherein masking each frame independently further comprises an operation blurring the masked region by convolving the masked region with a Gaussian smoothing kernel.
16. The computer implemented method as set forth in claim 12, wherein combining the masked map with the input video to generate a masked video further comprises operations of: determining if a potential obj
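The two masking strategies in the claims (per-frame masking with blackout in claims 5 and 6, and the M-out-of-N rule in claim 8) can be contrasted with a small sketch. It rests on several assumptions: frames are grayscale grids of integers, the masked map is a boolean grid per frame, and the M-out-of-N test is applied per cell rather than per object region. For brevity the sketch applies blackout in both cases, whereas claim 8 recites Gaussian blurring. The function names are hypothetical.

```python
from typing import List

Frame = List[List[int]]   # grayscale pixel values
Mask = List[List[bool]]   # True where a potential object-of-interest was flagged


def blackout(frame: Frame, mask: Mask) -> Frame:
    """Set masked pixels to 0 and keep original values in unmasked regions."""
    return [
        [0 if m else p for p, m in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(frame, mask)
    ]


def mask_independently(frames: List[Frame], masks: List[Mask]) -> List[Frame]:
    """Mask each frame with its own mask, with no temporal continuity."""
    return [blackout(f, m) for f, m in zip(frames, masks)]


def m_of_n_mask(masks: List[Mask], m: int) -> Mask:
    """Build one mask for a window of N frames.

    A cell is masked for the whole window if it was flagged in at least
    m of the N per-frame masks (per-cell test is an assumption).
    """
    rows, cols = len(masks[0]), len(masks[0][0])
    return [
        [sum(mask[r][c] for mask in masks) >= m for c in range(cols)]
        for r in range(rows)
    ]


# Three 2x2 frames; the top-left cell is flagged in 2 of 3 frames,
# the bottom-right cell in only 1 of 3.
frames = [[[10, 20], [30, 40]],
          [[11, 21], [31, 41]],
          [[12, 22], [32, 42]]]
masks = [[[True, False], [False, False]],
         [[True, False], [False, True]],
         [[False, False], [False, False]]]

independent = mask_independently(frames, masks)
persistent = m_of_n_mask(masks, m=2)
windowed = [blackout(f, persistent) for f in frames]
```

In the per-frame variant the masked region can appear and disappear from frame to frame, while the M-out-of-N variant keeps the top-left cell masked across all three frames and drops the one-off detection in the bottom-right cell.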
of extracted features · CPC title
Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title
Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title
with interaction between the filter responses, e.g. cortical complex cells · CPC title