Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06V20/41. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 17, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Object data generation for remote image processing

US12333807B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12333807-B2
Application number	US-202117328592-A
Country	US
Kind code	B2
Filing date	May 24, 2021
Priority date	May 24, 2021
Publication date	Jun 17, 2025
Grant date	Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a system including a processor and memory, the memory includes instructions that, when executed by the processor, cause the processor to control the system to perform receiving a video stream capturing objects; identifying, based on the received video stream, object areas corresponding to the objects, respectively; tracking the object areas in the received video stream; generating, based on the tracking of the object areas, visual data sets at a plurality of times, wherein each visual data set is generated at a different time and includes visual data representing each object area; determining a priority of each visual data in each visual data set; selecting, based on the determined priority of each visual data, a group of the visual data to be transmitted to a remote system; and transmitting, to the remote system, the selected group of the visual data.
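The selection step in the abstract (rank each visual data item by priority, then send only a chosen group) can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the `VisualData` record, the numeric `priority` field, and the fixed `max_items` cutoff do not come from the patent, which leaves the selection rule open.

```python
from dataclasses import dataclass


@dataclass
class VisualData:
    object_id: int   # hypothetical tracker-assigned id of the object area
    priority: float  # hypothetical score; a higher value means "send sooner"
    payload: bytes   # e.g. a cropped image of the object area


def select_group(visual_data_set, max_items):
    """Return the max_items highest-priority entries.

    sorted() is stable, so entries with equal priority keep their input order.
    """
    ranked = sorted(visual_data_set, key=lambda v: v.priority, reverse=True)
    return ranked[:max_items]


# One visual data set generated at a single point in time.
data_set = [
    VisualData(1, 0.2, b"a"),
    VisualData(2, 0.9, b"b"),
    VisualData(3, 0.5, b"c"),
]
selected = select_group(data_set, max_items=2)
print([v.object_id for v in selected])  # prints [2, 3]
```

In this sketch only the two highest-priority items would be handed to the transmitter; the remaining item waits for a later round.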

First claim

Opening claim text (preview).

What is claimed is:

1. A system for reducing an amount of data transmitted to a remote system for remote image processing to identify a plurality of objects in a scene, the system comprising: a processor; and a non-transitory computer-readable medium in communication with the processor, the computer-readable medium comprising instructions that, when executed by the processor, cause the processor to control the system to perform functions of: receiving a video stream capturing the scene including the plurality of objects that are independently movable; identifying, within the scene captured in the received video stream, a plurality of object areas respectively corresponding to the plurality of objects, each object area capturing a visual feature of the corresponding object; tracking the plurality of object areas within the scene captured in the received video stream over a time; generating, based on the tracking of the plurality of object areas, a plurality of visual data sets respectively representing visual characteristics of the plurality of object areas in the scene captured, wherein generating the plurality of visual data sets is repeated at a plurality of different times such that the plurality of visual data sets is newly generated at each different time to respectively represent the visual characteristics of the plurality of object areas in the scene captured at each different time; and in response to the plurality of visual data sets being newly generated at each different time, performing functions of: determining a transmission priority of each newly generated visual data set based on at least one of: a confidence value of each visual data set previously transmitted to the remote system via a communication network, the confidence value determined by the remote system and indicating a confidence level of an identity of the object corresponding to the object area represented by each newly generated visual data set; a most recent time that the visual data set representing each object area has been transmitted to the remote system for the remote image processing; and an occurrence of a new object area due to a new object appearing in the scene captured in the received video stream; determining, based on the transmission priority of each newly generated visual data set, whether each newly generated visual data set needs to be included in a subset for transmission to the remote system for the remote image processing, the subset including less than all of the plurality of newly generated visual data sets; and transmitting, to the remote system via the communication network, only the subset for the remote image processing to remotely identify, at the remote system, the object corresponding to each visual data set included in the transmitted subset.

2. The system of claim 1, wherein: the received video stream is an uncompressed video stream; and the plurality of visual data sets is extracted from the uncompressed video stream.

3. The system of claim 1, wherein the plurality of objects comprises a plurality of persons, and each object area captures a facial features respectively of each person.

4. The system of claim 1, wherein the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: determining, based on the received video stream, a position of each object; and transmitting, to the remote system via the communication network, the determined position of each object along with the visual data set corresponding to each object.

5. The system of claim 1, wherein the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: receiving an audio stream capturing sounds from the plurality of objects; determining, based on the received audio stream, a position of each object; and transmitting, to the remote system via the communication network, the determined position of each object along with the visual data set corresponding to each object.

6. The system of claim 1, wherein, for determining whether each newly generated visual data set needs to be included in the subset for transmission to the remote system, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of determining whether each newly generated visual data set needs to be included in the subset further based on bandwidth information of the communication network.

7. The system of claim 1, wherein: the remote system comprises a videoconferencing host server, and the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: compressing the received video stream; and transmitting, to the remote system via the communication network, the compressed video stream along with the subset.

8. The system of claim 1, wherein, for generating the plurality of visual data sets, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of generating the plurality of visual data sets respectively representing the plurality of object areas in the scene captured at a same time.

9. The system of claim 1, wherein: the plurality of object areas comprises first and second object areas respectively corresponding to first and second objects, the plurality of visual data sets previously transmitted to the remote system comprises first and second previously transmitted visual data sets respectively corresponding to the first and second object areas, the first previously transmitted visual data set having the confidence value higher than that of the second previously visual data sets, the plurality of newly generated visual data sets comprises first and second newly generated visual data sets respectively corresponding to the first and second object areas, and for determining the transmission priority of each newly generated visual data set, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of setting the transmission priority of the first newly generated visual data to be lower than that of the second newly generated object area.

10. The system of claim 1, wherein: the plurality of object areas comprises first and second object areas respectively corresponding to first and second objects, the plurality of visual data sets previously transmitted to the remote system comprises first and second previously transmitted visual data sets respectively corresponding to the first and second object areas, the first previously transmitted visual data set being transmitted prior to transmitting the second previously visual data sets, the plurality of newly generated visual data sets comprises first and second newly generated visual data sets respectively corresponding to the first and second object areas, and for determining the transmission priority of each newly generated visual data set, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of setting the transmission priority of the first newly generated visual data to be higher than that of the second newly generated object area.

11. A method of reducing an amount of data transmitted to a remote system for remote image processing to identify a plurality of objects in a scene, the method comprising: receiving a video stream capturing the scene including the plurality of objects that are independently movable; identifying, within the scene captured in the received video stream, a plurality of object areas respectively corres
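The priority factors recited in claims 1, 6, 9, and 10 can be illustrated with a small Python sketch. It is one possible reading, not the patented implementation: the `TrackState` record, the equal 0.5 weights, the `max_age` window, and the byte budget standing in for "bandwidth information" are all hypothetical choices made for this example. The sketch also does not itself force the subset to be smaller than the full set, which claim 1 requires; here that only happens when the budget is tight.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackState:
    object_id: int
    size_bytes: int                   # size of the newly generated visual data set
    last_confidence: Optional[float]  # confidence returned by the remote system, if any
    last_sent_at: Optional[float]     # time of the most recent transmission, in seconds


def transmission_priority(t, now, max_age=10.0):
    """Hypothetical score: a higher value means the data set should be sent sooner."""
    if t.last_sent_at is None:
        return 2.0  # new object area (claim 1): always ranked first in this sketch
    # Older transmissions rank higher (claim 10); capped at 1.0.
    staleness = min((now - t.last_sent_at) / max_age, 1.0)
    # Lower remote confidence ranks higher (claim 9).
    confidence = t.last_confidence if t.last_confidence is not None else 0.0
    return 0.5 * (1.0 - confidence) + 0.5 * staleness


def choose_subset(tracks, now, budget_bytes):
    """Greedily fill a byte budget in priority order (bandwidth idea of claim 6)."""
    ranked = sorted(tracks, key=lambda t: transmission_priority(t, now), reverse=True)
    subset, used = [], 0
    for t in ranked:
        if used + t.size_bytes <= budget_bytes:
            subset.append(t)
            used += t.size_bytes
    return subset


tracks = [
    TrackState(1, 100, last_confidence=0.95, last_sent_at=9.0),  # confident, recent
    TrackState(2, 100, last_confidence=0.40, last_sent_at=9.0),  # uncertain, recent
    TrackState(3, 100, last_confidence=0.95, last_sent_at=2.0),  # confident, stale
    TrackState(4, 100, last_confidence=None, last_sent_at=None), # newly appeared
]
subset = choose_subset(tracks, now=10.0, budget_bytes=250)
print([t.object_id for t in subset])  # prints [4, 3]
```

With a 250-byte budget only two of the four data sets fit, so the new object and the stalest track are sent and the other two are deferred to a later round.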

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06V40/161: Detection; Localisation; Normalisation
G06V10/22: by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
G10L25/57: for processing of video signals
G06T2207/10016: Video; Image sequence
G06T7/70: Determining position or orientation of objects or cameras (camera calibration G06T7/80)

Patent family

Related publications grouped by family.

View patent family 84103958
