Focusing regions of interest using dynamic object detection for textual information retrieval

US11657627B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11657627-B2
Application number: US-202117365179-A
Country: US
Kind code: B2
Filing date: Jul 1, 2021
Priority date: Aug 1, 2019
Publication date: May 23, 2023
Grant date: May 23, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, frames of a video may include a first visual object that may appear relative to a second visual object within a region of the frames. Once a relationship between the first visual object and the region is known, one or more operations may be performed on the relative region. For example, optical character recognition may be performed on the relative region where the relative region is known to contain textual information. As a result, the identification of the first visual object may serve as an anchor for determining the location of the relative region including the second visual object—thereby increasing accuracy and efficiency of the system while reducing run-time.
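The anchoring idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Box` and `Offset` types, the `locate_region` and `crop` helpers, and the character-grid "frame" are all hypothetical names chosen for illustration, and joining the cropped characters only stands in for a real optical character recognition step.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in image space: top-left corner plus size, in pixels."""
    x: int
    y: int
    w: int
    h: int


@dataclass(frozen=True)
class Offset:
    """Hypothetical spatial relationship: region position relative to the anchor."""
    dx: int
    dy: int
    w: int
    h: int


def locate_region(anchor: Box, rel: Offset, frame_w: int, frame_h: int) -> Box:
    """Place the related region relative to the anchor, clipped to the frame."""
    x0 = max(0, min(frame_w, anchor.x + rel.dx))
    y0 = max(0, min(frame_h, anchor.y + rel.dy))
    x1 = max(x0, min(frame_w, anchor.x + rel.dx + rel.w))
    y1 = max(y0, min(frame_h, anchor.y + rel.dy + rel.h))
    return Box(x0, y0, x1 - x0, y1 - y0)


def crop(frame: list[str], box: Box) -> list[str]:
    """Cut the region out of a frame stored as rows of characters."""
    return [row[box.x:box.x + box.w] for row in frame[box.y:box.y + box.h]]


# Toy "frame": '#' marks the anchor icon, the digits to its right are the text.
frame = [
    "..........",
    "..#.42....",
    "..........",
]
anchor = Box(x=2, y=1, w=1, h=1)      # assumed output of an object detector
rel = Offset(dx=2, dy=0, w=2, h=1)    # assumed: text sits 2 px right of the icon
region = locate_region(anchor, rel, frame_w=10, frame_h=3)
text = "".join(crop(frame, region))   # stand-in for running OCR on the crop
```

Because only the small cropped region is passed to the recognition step, the rest of the frame never needs to be scanned for text, which is the efficiency gain the abstract describes.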

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: processing circuitry to: analyze a frame to determine a respective location of a dynamic object within the frame; determine, based at least in part on a spatial relationship between the dynamic object and a related object region, a location of the related object region within the frame; determine, based at least in part on the location, textual information within the related object region; and perform, based at least in part on the textual information, one or more operations to augment an output of an application corresponding to the frame.

2. The processor of claim 1, wherein the frame is one of a plurality of frames of a video, and the dynamic object is located at a different location from the respective location in at least one other frame of the plurality of frames.

3. The processor of claim 1, wherein an appearance of the dynamic object is fixed, and an appearance of one or more related objects within the related object region is dynamic.

4. The processor of claim 1, wherein the spatial relationship is determined based at least in part on analyzing a plurality of frames that include the dynamic object and one or more related objects positioned relative to the dynamic object.

5. The processor of claim 1, wherein the determination of the location of the related object region includes determining, in image space, one or more pixel locations corresponding to the related object region based at least in part on one or more pixel locations corresponding to the respective location of the dynamic object.

6. The processor of claim 1, wherein the analysis of the frame includes executing at least one of a computer vision algorithm, an object detection algorithm, or a neural network to identify the dynamic object in the frame.

7. The processor of claim 1, wherein the determination of the textual information includes executing a character recognition operation within the related object region.

8. The processor of claim 1, wherein the textual information within the related object region associated with the frame is different from other textual information associated with another frame of a video that includes the frame.

9. The processor of claim 1, wherein the application corresponds to an instance of a cloud streaming application.

10. The processor of claim 1, wherein the one or more operations includes at least one of generating a snapshot, generating a highlight, generating a recording, or updating an achievement or award.

11. A system comprising: one or more processing units; and one or more memory devices storing instructions that, when executed using the one or more processing units, cause the one or more processing units to execute: determining a respective location of a dynamic object within a current frame; determining, based at least in part on a known relationship between the dynamic object and a textual region, a location of the textual region within the current frame; identifying textual information within the textual region based at least in part on the location; and performing, based at least in part on the textual information, one or more operations to augment an output of an application corresponding to the current frame.

12. The system of claim 11, wherein the known relationship is determined based at least in part on applying a plurality of frames to a machine learning model trained to detect a presence of objects and textual regions relative to the objects.

13. The system of claim 11, wherein the known relationship is defined by a pixel distance between the dynamic object and the textual region.

14. The system of claim 13, wherein the textual region is defined by at least one of an anchor pixel and a pixel dimension or pixel locations of one or more vertices of the textual region.

15. The system of claim 11, wherein an appearance of the dynamic object is fixed and respective textual information corresponding to at least one other frame is different from the textual information corresponding to the current frame.

16. The system of claim 11, wherein the current frame is comprised in a stream or a recording of one or more cloud gaming sessions, and the textual information includes information corresponding to one or more instances of a game from the one or more cloud gaming sessions.

17. The system of claim 11, wherein the identifying the textual information includes executing an optical character recognition (OCR) algorithm within the textual region.

18. A method comprising: determining a location of a dynamic object within a frame; identifying an associated location of a textual region within the frame relative to the location of the dynamic object based at least in part on a spatial relationship between the dynamic object and the textual region; determining, based at least in part on the associated location, textual information from within the textual region; and performing, based at least in part on the textual information, one or more operations to augment an output of an application corresponding to the frame.

19. The method of claim 18, wherein the frame is one of a plurality of frames of a stream, and the dynamic object is located at a different location from the location in at least one other frame of the plurality of frames.

20. The method of claim 18, wherein an appearance of the dynamic object is fixed across two or more frames, and the textual information is different from respective textual information corresponding to at least one other frame.
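Claims 4, 13, and 14 describe deriving the spatial relationship from several frames and expressing the textual region by a pixel distance plus either an anchor pixel with dimensions or a set of vertices. The sketch below shows one possible reading, under stated assumptions: taking the median offset across frames is a choice made here for illustration (the claims do not name an estimator), and `estimate_offset`, `region_vertices`, and the sample coordinates are hypothetical.

```python
from statistics import median


def estimate_offset(samples):
    """Median (dx, dy) from anchor corner to text-region corner over several frames.

    Each sample is ((anchor_x, anchor_y), (region_x, region_y)) in pixels.
    The median is an assumed estimator; it tolerates a few noisy observations.
    """
    dxs = [rx - ax for (ax, _), (rx, _) in samples]
    dys = [ry - ay for (_, ay), (_, ry) in samples]
    return int(median(dxs)), int(median(dys))


def region_vertices(anchor_xy, offset, size):
    """Return the four corner pixels of the text region, starting at the top-left.

    Order: top-left, top-right, bottom-right, bottom-left (y grows downward).
    """
    x = anchor_xy[0] + offset[0]
    y = anchor_xy[1] + offset[1]
    w, h = size
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


# Made-up observations: the anchor moves between frames, the offset barely does.
samples = [
    ((100, 50), (140, 52)),
    ((300, 200), (340, 202)),
    ((20, 400), (61, 402)),   # slightly noisy observation
]
offset = estimate_offset(samples)

# Apply the learned offset to a new anchor location in the current frame.
vertices = region_vertices((500, 300), offset, size=(80, 24))
```

The first vertex together with `size` is the "anchor pixel and pixel dimension" form of claim 14, while the full list is the vertex form; either can be recovered from the other for an axis-aligned rectangle.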

Assignees

Nvidia Corp

Inventors

Classifications

  • Character recognition · CPC title

  • Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream · CPC title

  • by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition · CPC title

  • Processing of additional data, e.g. scrambling of additional data or processing content descriptors · CPC title

  • involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream (arrangements characterised by components specially adapted for monitoring, identification or recognition of video in broadcast systems H04H60/59) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11657627B2 cover?
In various examples, frames of a video may include a first visual object that may appear relative to a second visual object within a region of the frames. Once a relationship between the first visual object and the region is known, one or more operations may be performed on the relative region. For example, optical character recognition may be performed on the relative region where the relative…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06V20/62. Mapped technology areas include Physics.
When was this patent published?
Publication date May 23, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).