Remotely adjusting audio capture during video conferences

US11729354B2 · US · B2

Patent metadata
Publication number: US-11729354-B2
Application number: US-202117514818-A
Country: US
Kind code: B2
Filing date: Oct 29, 2021
Priority date: Oct 29, 2021
Publication date: Aug 15, 2023
Grant date: Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One example method includes joining, by a first client device, a videoconferencing meeting hosted by a video conference provider, the videoconference meeting including a plurality of participants; providing an audio stream and a video stream to a video conference provider; receiving, from a second client device, an audio focus area associated with a video stream provided the first client device; determining, based on the audio focus area, a bounding region within an environment shown in the video stream; directing a microphone array to capture audio from the bounding region; and providing the captured audio as an audio stream to the video conference provider.
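The abstract's central step turns a remote participant's audio focus area into a bounding region in the room. That step can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the focus area arrives as a rectangle in normalized frame coordinates and that the camera is a pinhole camera with a known position, yaw and horizontal field of view inside a rectangular room, in the spirit of claim 5. It also simplifies the bounding region to an azimuth wedge plus the distance to the wall along the wedge's centre line. All class and function names (`FocusArea`, `CameraPose`, `BoundingRegion`, `focus_to_bounding_region`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusArea:
    """Hypothetical focus-area message: a rectangle in normalized frame
    coordinates (0.0 = left/top edge, 1.0 = right/bottom edge)."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class CameraPose:
    """Assumed camera model: pinhole camera at (x, y) metres inside the room,
    optical axis at yaw_deg counterclockwise from the room's +x axis."""
    x: float
    y: float
    yaw_deg: float
    hfov_deg: float


@dataclass(frozen=True)
class BoundingRegion:
    """Azimuth wedge (degrees, counterclockwise from +x) plus the distance
    to the wall along the wedge's centre line."""
    az_min_deg: float
    az_max_deg: float
    max_range_m: float


def column_to_azimuth(x_norm: float, cam: CameraPose) -> float:
    # Map a normalized image column to a room azimuth with a pinhole model.
    # Image right is clockwise from the optical axis, hence the subtraction.
    u = 2.0 * x_norm - 1.0  # -1 at the left edge, +1 at the right edge
    half_fov = math.radians(cam.hfov_deg) / 2.0
    offset = math.degrees(math.atan(u * math.tan(half_fov)))
    return cam.yaw_deg - offset


def distance_to_wall(cam: CameraPose, az_deg: float,
                     width: float, depth: float) -> float:
    # Ray/rectangle intersection from a point inside [0, width] x [0, depth].
    dx = math.cos(math.radians(az_deg))
    dy = math.sin(math.radians(az_deg))
    hits = []
    if dx > 1e-9:
        hits.append((width - cam.x) / dx)
    elif dx < -1e-9:
        hits.append(-cam.x / dx)
    if dy > 1e-9:
        hits.append((depth - cam.y) / dy)
    elif dy < -1e-9:
        hits.append(-cam.y / dy)
    return min(hits)


def focus_to_bounding_region(area: FocusArea, cam: CameraPose,
                             room_w: float, room_d: float) -> BoundingRegion:
    # Only the horizontal extent is used; y0/y1 are ignored in this sketch.
    # The left edge of the focus area maps to the larger (counterclockwise) azimuth.
    az_left = column_to_azimuth(area.x0, cam)
    az_right = column_to_azimuth(area.x1, cam)
    centre = (az_left + az_right) / 2.0
    return BoundingRegion(
        az_min_deg=min(az_left, az_right),
        az_max_deg=max(az_left, az_right),
        max_range_m=distance_to_wall(cam, centre, room_w, room_d),
    )
```

For example, consider a camera centred on the front wall of a 6 m by 5 m room (`CameraPose(3.0, 0.0, 90.0, 90.0)`) and a focus area covering the middle fifth of the frame. Under this model, the sketch yields a wedge of roughly 78.7 to 101.3 degrees that reaches the back wall 5 m away. A real system would also need the microphone array's own pose to turn this wedge into array steering parameters.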

Claims

Full claim text (claims 1 to 20; claims 1, 9, and 15 are independent).

That which is claimed is:

1. A method comprising: joining, by a first client device, a videoconferencing meeting hosted by a video conference provider, the videoconference meeting including a plurality of participants; providing, by the first client device, an audio stream and a video stream to the video conference provider; receiving, from a second client device, an audio focus area associated with the video stream provided by the first client device; determining, by the first client device based on the audio focus area, a bounding region within an environment shown in the video stream; directing, by the first client device, a microphone array to capture audio from the bounding region; and providing, by the first client device, the captured audio as an audio stream to the video conference provider.

2. The method of claim 1, wherein the audio focus area identifies a portion of a video frame received from the first client device.

3. The method of claim 1, wherein the audio focus area identifies a person in a video frame received from the first client device.

4. The method of claim 1, further comprising: determining an audio focus zone within the bounding region; and wherein directing the microphone array comprises directing the microphone array to capture audio from the audio focus zone.

5. The method of claim 1, wherein determining the bounding region within the environment is based on dimensions of a room and a location and orientation of a camera providing the video stream.

6. The method of claim 1, wherein directing the microphone array comprises changing a position or orientation of the microphone array or one or more microphones in the microphone array.

7. The method of claim 1, wherein directing the microphone array comprises changing one or more beamforming parameters of the microphone array.

8. The method of claim 1, wherein the microphone array is a first microphone array, and further comprising: receiving, from a third client device, a second audio focus area associated with the video stream provided the first client device; determining, based on the second audio focus area, a second bounding region within the environment shown in the video stream; directing a second microphone array to capture second audio from the second bounding region; and providing the captured second audio as a second audio stream to the video conference provider.

9. A client device comprising: a communications interface; a non-transitory computer-readable medium; and one or more processors communicatively coupled to the communications interface and the non-transitory computer-readable medium, the one or more processors configured to execute processor-executable instructions stored in the non-transitory computer-readable medium to cause the one or more processors to: join a videoconferencing meeting hosted by a video conference provider, the videoconference meeting including a plurality of participants; provide an audio stream and a video stream to a video conference provider; receive, from a client device, an audio focus area associated with a video stream provided the client device; determine, based on the audio focus area, a bounding region within an environment shown in the video stream; direct a microphone array to capture audio from the bounding region; and provide the captured audio as an audio stream to the video conference provider.

10. The client device of claim 9, wherein the audio focus area identifies a portion of a video frame provided the client device.

11. The client device of claim 9, wherein the audio focus area identifies a previously received audio focus area.

12. The client device of claim 9, wherein the one or more processors are configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium to cause the one or more processors to: determine an existing bounding region similar to the bounding region, and, determine the existing bounding region as the bounding region.

13. The client device of claim 9, wherein the one or more processors are configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium to cause the one or more processors to change one or more beamforming parameters of the microphone array.

14. The client device of claim 9, wherein the microphone array is a first microphone array, and wherein the one or more processors are configured to execute further processor-executable instructions stored in the non-transitory computer-readable medium to cause the one or more processors to: receive, from a second client device, a second audio focus area associated with the video stream provided the client device; determining, based on the second audio focus area, a second bounding region within the environment shown in the video stream; directing a second microphone array to capture second audio from the second bounding region; and providing the captured second audio as a second audio stream to the video conference provider.

15. A non-transitory computer-readable medium comprising processor-executable instructions configured to cause one or more processors to: join, by a client device, a videoconferencing meeting hosted by a video conference provider, the videoconference meeting including a plurality of participants; provide an audio stream and a video stream to a video conference provider; receive, from a second client device, an audio focus area associated with a video stream provided the client device; determine, based on the audio focus area, a bounding region within an environment shown in the video stream; direct a microphone array to capture audio from the bounding region; and provide the captured audio as an audio stream to the video conference provider.

16. The non-transitory computer-readable medium of claim 15, wherein the audio focus area identifies a plurality of portions of a video frame provided by the client device.

17. The non-transitory computer-readable medium of claim 15, further comprising processor-executable instructions configured to cause the one or more processors to determining the region within the environment based on dimensions of a room and a location and orientation of a camera providing the video stream.

18. The non-transitory computer-readable medium of claim 15, further comprising processor-executable instructions configured to cause the one or more processors to: determine an existing bounding region similar to the bounding region, and, determine the existing bounding region as the bounding region.

19. The non-transitory computer-readable medium of claim 15, further comprising processor-executable instructions configured to cause the one or more processors to change one or more beamforming parameters of the microphone array.

20. The non-transitory computer-readable medium of claim 15, wherein the microphone array is a first microphone array, and further comprising processor-executable instructions configured to cause the one or more processors to: receive, from a third client device, a second audio focus area associated with the video stream provided the client device; determining, based on the second audio focus area, a second bounding region within the environment shown in the video stream; directing a second microphone array to capture second audio from the second bounding region; and providing the captured second audio as a second audio stream to the video conference provider.
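Claims 7, 13, and 19 cover directing the microphone array by changing its beamforming parameters. The claims do not say which beamformer is used. The sketch below illustrates one common choice, a delay-and-sum beamformer for a uniform linear array, under stated assumptions: a far-field source, a speed of sound of 343 m/s, and delays rounded to whole samples. The names `steering_delays` and `delay_and_sum` are hypothetical, and this is not presented as the patentee's method.

```python
import math
from typing import List

SPEED_OF_SOUND_M_S = 343.0  # assumed value, roughly air at 20 degrees C


def steering_delays(num_mics: int, spacing_m: float, steer_deg: float,
                    fs_hz: float, c: float = SPEED_OF_SOUND_M_S) -> List[int]:
    """Per-microphone delays (whole samples) that steer a uniform linear
    array toward a far-field source at steer_deg, measured from the array
    axis (0 deg = endfire toward the +x end, 90 deg = broadside)."""
    centre = (num_mics - 1) / 2.0
    positions = [(m - centre) * spacing_m for m in range(num_mics)]
    cos_t = math.cos(math.radians(steer_deg))
    # Microphones closer to the source hear the wavefront earlier, so they
    # get a larger compensating delay; subtracting the minimum keeps every
    # delay non-negative.
    lead = [p * cos_t / c for p in positions]
    base = min(lead)
    return [round((t - base) * fs_hz) for t in lead]


def delay_and_sum(channels: List[List[float]], delays: List[int]) -> List[float]:
    """Delay each channel by its steering delay and average the channels."""
    out = []
    for n in range(len(channels[0])):
        acc = 0.0
        for ch, d in zip(channels, delays):
            k = n - d
            if 0 <= k < len(ch):
                acc += ch[k]
        out.append(acc / len(channels))
    return out
```

In this sketch, the "beamforming parameters" are simply the per-channel delays. Re-aiming the array at a new bounding region means recomputing them from the region's centre direction, expressed relative to the array axis. Signals from the steered direction add coherently, and signals from other directions are smeared and attenuated. Practical systems commonly use fractional delays or frequency-domain weights rather than whole-sample rounding.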

Assignees

Zoom Video Communications Inc

Inventors

Classifications

  • H04N7/15 · Primary

    Conference systems · CPC title

  • based on user input or interaction · CPC title

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • H04R1/406 · Primary

    microphones · CPC title

  • for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11729354B2 cover?
One example method includes joining, by a first client device, a videoconferencing meeting hosted by a video conference provider, the videoconference meeting including a plurality of participants; providing an audio stream and a video stream to a video conference provider; receiving, from a second client device, an audio focus area associated with a video stream provided the first client device…
Who is the assignee on this patent?
Zoom Video Communications Inc
What technology area does this patent fall under?
Primary CPC classification H04N7/15. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Aug 15, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (either a citation found in our corpus or another publication sharing the same primary CPC).