Machine learning techniques for enhancing video conferencing applications
US-2022237735-A1 · Jul 28, 2022 · US
US12333807B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12333807-B2 |
| Application number | US-202117328592-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 24, 2021 |
| Priority date | May 24, 2021 |
| Publication date | Jun 17, 2025 |
| Grant date | Jun 17, 2025 |
Official abstract text for this publication.
In a system including a processor and memory, the memory includes instructions that, when executed by the processor, cause the processor to control the system to perform receiving a video stream capturing objects; identifying, based on the received video stream, object areas corresponding to the objects, respectively; tracking the object areas in the received video stream; generating, based on the tracking of the object areas, visual data sets at a plurality of times, wherein each visual data set is generated at a different time and includes visual data representing each object area; determining a priority of each visual data in each visual data set; selecting, based on the determined priority of each visual data, a group of the visual data to be transmitted to a remote system; and transmitting, to the remote system, the selected group of the visual data.
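The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each item's priority is already available as a number and that a fixed group size limits what is sent; `VisualData`, `select_group`, and `max_items` are hypothetical names chosen for illustration, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualData:
    """Visual data for one tracked object area at one capture time (hypothetical type)."""
    object_id: int        # identifier of the tracked object area
    timestamp: float      # time at which the visual data set was generated
    priority: float       # assumed to be precomputed; higher means send sooner
    payload: bytes = b""  # e.g. cropped pixels of the object area


def select_group(visual_data_set, max_items):
    """Return the highest-priority items of one visual data set, at most max_items."""
    ranked = sorted(visual_data_set, key=lambda v: v.priority, reverse=True)
    return ranked[:max_items]


# One visual data set generated at time t = 1.0, covering three object areas.
data_set = [
    VisualData(object_id=1, timestamp=1.0, priority=0.2),
    VisualData(object_id=2, timestamp=1.0, priority=0.9),
    VisualData(object_id=3, timestamp=1.0, priority=0.5),
]
selected = select_group(data_set, max_items=2)
selected_ids = [v.object_id for v in selected]  # only these would be transmitted
```

In this sketch the same call would run once per generated visual data set, so the selected group can change from one capture time to the next as priorities change.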
Opening claim text (preview).
What is claimed is:
1. A system for reducing an amount of data transmitted to a remote system for remote image processing to identify a plurality of objects in a scene, the system comprising: a processor; and a non-transitory computer-readable medium in communication with the processor, the computer-readable medium comprising instructions that, when executed by the processor, cause the processor to control the system to perform functions of: receiving a video stream capturing the scene including the plurality of objects that are independently movable; identifying, within the scene captured in the received video stream, a plurality of object areas respectively corresponding to the plurality of objects, each object area capturing a visual feature of the corresponding object; tracking the plurality of object areas within the scene captured in the received video stream over a time; generating, based on the tracking of the plurality of object areas, a plurality of visual data sets respectively representing visual characteristics of the plurality of object areas in the scene captured, wherein generating the plurality of visual data sets is repeated at a plurality of different times such that the plurality of visual data sets is newly generated at each different time to respectively represent the visual characteristics of the plurality of object areas in the scene captured at each different time; and in response to the plurality of visual data sets being newly generated at each different time, performing functions of: determining a transmission priority of each newly generated visual data set based on at least one of: a confidence value of each visual data set previously transmitted to the remote system via a communication network, the confidence value determined by the remote system and indicating a confidence level of an identity of the object corresponding to the object area represented by each newly generated visual data set; a most recent time that the visual data set representing each object area has been transmitted to the remote system for the remote image processing; and an occurrence of a new object area due to a new object appearing in the scene captured in the received video stream; determining, based on the transmission priority of each newly generated visual data set, whether each newly generated visual data set needs to be included in a subset for transmission to the remote system for the remote image processing, the subset including less than all of the plurality of newly generated visual data sets; and transmitting, to the remote system via the communication network, only the subset for the remote image processing to remotely identify, at the remote system, the object corresponding to each visual data set included in the transmitted subset.
2. The system of claim 1, wherein: the received video stream is an uncompressed video stream; and the plurality of visual data sets is extracted from the uncompressed video stream.
3. The system of claim 1, wherein the plurality of objects comprises a plurality of persons, and each object area captures a facial features respectively of each person.
4. The system of claim 1, wherein the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: determining, based on the received video stream, a position of each object; and transmitting, to the remote system via the communication network, the determined position of each object along with the visual data set corresponding to each object.
5. The system of claim 1, wherein the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: receiving an audio stream capturing sounds from the plurality of objects; determining, based on the received audio stream, a position of each object; and transmitting, to the remote system via the communication network, the determined position of each object along with the visual data set corresponding to each object.
6. The system of claim 1, wherein, for determining whether each newly generated visual data set needs to be included in the subset for transmission to the remote system, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of determining whether each newly generated visual data set needs to be included in the subset further based on bandwidth information of the communication network.
7. The system of claim 1, wherein: the remote system comprises a videoconferencing host server, and the instructions, when executed by the processor, further cause the processor to control the system to perform functions of: compressing the received video stream; and transmitting, to the remote system via the communication network, the compressed video stream along with the subset.
8. The system of claim 1, wherein, for generating the plurality of visual data sets, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of generating the plurality of visual data sets respectively representing the plurality of object areas in the scene captured at a same time.
9. The system of claim 1, wherein: the plurality of object areas comprises first and second object areas respectively corresponding to first and second objects, the plurality of visual data sets previously transmitted to the remote system comprises first and second previously transmitted visual data sets respectively corresponding to the first and second object areas, the first previously transmitted visual data set having the confidence value higher than that of the second previously visual data sets, the plurality of newly generated visual data sets comprises first and second newly generated visual data sets respectively corresponding to the first and second object areas, and for determining the transmission priority of each newly generated visual data set, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of setting the transmission priority of the first newly generated visual data to be lower than that of the second newly generated object area.
10. The system of claim 1, wherein: the plurality of object areas comprises first and second object areas respectively corresponding to first and second objects, the plurality of visual data sets previously transmitted to the remote system comprises first and second previously transmitted visual data sets respectively corresponding to the first and second object areas, the first previously transmitted visual data set being transmitted prior to transmitting the second previously visual data sets, the plurality of newly generated visual data sets comprises first and second newly generated visual data sets respectively corresponding to the first and second object areas, and for determining the transmission priority of each newly generated visual data set, the instructions, when executed by the processor, further cause the processor to control the system to perform a function of setting the transmission priority of the first newly generated visual data to be higher than that of the second newly generated object area.
11. A method of reducing an amount of data transmitted to a remote system for remote image processing to identify a plurality of objects in a scene, the method comprising: receiving a video stream capturing the scene including the plurality of objects that are independently movable; identifying, within the scene captured in the received video stream, a plurality of object areas respectively corres
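The three priority factors recited in claim 1, the ordering rules of claims 9 and 10, and the bandwidth condition of claim 6 can be sketched as follows. The claims do not give a formula, so the additive score, the equal weights, the 10-second staleness cap, the byte budget standing in for bandwidth information, and every name used here (`TrackState`, `transmission_priority`, `select_subset`, `byte_budget`) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrackState:
    """Per-object-area bookkeeping kept by the sending system (hypothetical)."""
    object_id: int
    size_bytes: int                   # size of the newly generated visual data set
    last_confidence: Optional[float]  # confidence reported by the remote system; None if never sent
    last_sent_at: Optional[float]     # time of the most recent transmission; None if never sent


def transmission_priority(track, now, staleness_cap=10.0):
    """Score a newly generated visual data set; higher means send sooner.

    Assumed scoring: a new object area scores highest, lower remote confidence
    raises the score, and a longer time since the last transmission raises it.
    """
    if track.last_sent_at is None:
        return 2.0  # new object area: above any previously sent track (max 1.0)
    confidence = track.last_confidence if track.last_confidence is not None else 0.0
    staleness = min((now - track.last_sent_at) / staleness_cap, 1.0)
    return 0.5 * (1.0 - confidence) + 0.5 * staleness


def select_subset(tracks, now, byte_budget):
    """Walk tracks in priority order, adding each one that still fits in the byte budget."""
    ranked = sorted(tracks, key=lambda t: transmission_priority(t, now), reverse=True)
    subset, used = [], 0
    for track in ranked:
        if used + track.size_bytes <= byte_budget:
            subset.append(track.object_id)
            used += track.size_bytes
    return subset


tracks = [
    TrackState(1, 4000, last_confidence=0.95, last_sent_at=9.0),   # confident, sent recently
    TrackState(2, 4000, last_confidence=0.40, last_sent_at=9.0),   # low confidence
    TrackState(3, 4000, last_confidence=0.95, last_sent_at=2.0),   # confident, but stale
    TrackState(4, 4000, last_confidence=None, last_sent_at=None),  # newly appeared object
]
subset = select_subset(tracks, now=10.0, byte_budget=8000)
```

With these example values, track 1 scores below track 2 (higher prior confidence, as in claim 9) and below track 3 (sent more recently, as in claim 10), and the byte budget keeps the subset smaller than the full set of newly generated visual data sets.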
Detection; Localisation; Normalisation · CPC title
by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition · CPC title
for processing of video signals · CPC title
Video; Image sequence · CPC title
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title