Audio and video management for extended reality video conferencing

US11847825B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11847825-B2
Application number  US-202217685816-A
Country             US
Kind code           B2
Filing date         Mar 3, 2022
Priority date       Mar 4, 2021
Publication date    Dec 19, 2023
Grant date          Dec 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments of the present inventive concept provide for improved telepresence and other virtual sessions using localized projection of audible noises and/or dynamic adjustment of audio and/or video qualities based on spatial relationships between users. An XR telepresence platform can allow for immersive multi-user video conferencing from within a web browser or other medium. The platform can support spatial audio and/or user video. The platform can scale to hundreds or thousands of users concurrently in a single or multiple virtual environments. Disclosed herein are quality-of-service techniques for dynamically selecting or modifying audio and/or video traffic.
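
To illustrate adjusting audio based on spatial relationships between users, here is a minimal Python sketch of distance-based volume control. The abstract does not specify an attenuation model, so the inverse-distance rolloff, the `ref_distance` and `max_distance` parameters, and all names (`Avatar`, `gain_for_distance`, `mix_gains`, `mix_frame`) are assumptions made for illustration, not the patented implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Avatar:
    """Hypothetical avatar record: a name and a position in the 3D scene."""
    name: str
    x: float
    y: float
    z: float


def distance(a: Avatar, b: Avatar) -> float:
    """Euclidean distance between two avatars in the virtual environment."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def gain_for_distance(d: float, ref_distance: float = 1.0,
                      max_distance: float = 20.0) -> float:
    """Assumed inverse-distance rolloff.

    Full volume within ref_distance, silence at or beyond max_distance,
    and ref_distance / d in between.
    """
    if d >= max_distance:
        return 0.0
    return ref_distance / max(d, ref_distance)


def mix_gains(listener: Avatar, others: list[Avatar]) -> dict[str, float]:
    """Per-source gain for each other avatar, relative to the listener."""
    return {o.name: gain_for_distance(distance(listener, o)) for o in others}


def mix_frame(samples: dict[str, float], gains: dict[str, float]) -> float:
    """Combine one sample per source into a single composite sample."""
    return sum(gains[name] * value for name, value in samples.items())


listener = Avatar("me", 0.0, 0.0, 0.0)
others = [
    Avatar("near", 2.0, 0.0, 0.0),
    Avatar("far", 4.0, 0.0, 0.0),
    Avatar("distant", 30.0, 0.0, 0.0),
]
gains = mix_gains(listener, others)
composite = mix_frame({"near": 1.0, "far": 1.0, "distant": 1.0}, gains)
```

In this sketch each source is scaled independently, so a closer avatar always contributes more to the composite sample than a farther one, and sources beyond `max_distance` drop out entirely.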

First claim

Opening claim text (preview).

What is claimed is:
1. A method of dynamically managing 2D video and audio streams in a 3D extended reality (XR) environment during a telepresence session, the telepresence session allowing real-time audiovisual interactions, the method comprising: determining spatial proximities of a collection of virtual avatars in relation to a first virtual avatar within the 3D XR environment; identifying a set of virtual avatars of the collection of virtual avatars that are within in a field-of-view of the first virtual avatar; dynamically generating a composite audio stream for the first virtual avatar, wherein the composite audio stream comprises a plurality of audio sources, wherein each audio source corresponds to a different virtual avatar of the collection of virtual avatars, and wherein said dynamically generating the composite audio stream comprises adjusting volume levels of the plurality of audio sources based on respective spatial proximities to the first virtual avatar; dynamically generating a composite 3D video stream of the 3D XR environment from a perspective of the first virtual avatar, wherein the composite 3D video stream comprises a 2D virtual representation of each virtual avatars of the set of virtual avatar in the field-of-view of the first virtual avatar, and wherein said dynamically generating the composite 3D video stream comprises varying a video quality of the 2D virtual representations based on respective spatial proximities to the first virtual avatar; and communicating the composite audio stream and composite 3D video stream to a first client connection during the telepresence session, wherein the first client connection is associated with the first virtual avatar, wherein an audio device produces audio associated with the composite audio stream, and wherein a display device displays a video image associated with the composite 3D video stream.
2. The method of claim 1, wherein said dynamically generating the composite audio stream comprises associating a higher audio volume with a second virtual avatar than a third virtual avatar based at least in part on a determination that the first virtual avatar is closer to the second virtual avatar than the third virtual avatar.
3. The method of claim 1, wherein said dynamically generating the composite audio stream comprises associating a lower audio volume with a second virtual avatar than a third virtual avatar based at least in part on a determination that the first virtual avatar is further from the second virtual avatar than the third virtual avatar.
4. The method of claim 1, wherein said dynamically generating the composite audio stream comprises independently varying the plurality of audio sources based on a distance between the first virtual avatar and a respective virtual avatar of the collection of virtual avatars.
5. The method of claim 1, wherein said dynamically generating the composite 3D video stream comprises associating a higher resolution video with a second virtual avatar than a third virtual avatar based at least in part on a determination that the first virtual avatar is closer to the second virtual avatar than the third virtual avatar.
6. The method of claim 1, wherein said dynamically generating the composite 3D video stream comprises associating a lower resolution video with a second virtual avatar than a third virtual avatar based at least in part on a determination that the first virtual avatar is closer to the third virtual avatar than the second virtual avatar.
7. The method of claim 1, wherein said dynamically generating the composite 3D video stream comprises independently varying a video quality associated with a particular virtual avatar based on a distance between the first virtual avatar and the particular virtual avatar.
8. The method of claim 1, wherein said dynamically generating the composite 3D video stream comprises at least one of discarding or ignoring video data associated with the 3D XR environment that is not part of the field-of-view.
9. The method of claim 8, wherein varying the video quality comprises varying at least in one of a bitrate or a resolution.
10. The method of claim 1, wherein the composite 3D video stream only includes portions of the field-of-view, wherein a video quality of a particular virtual avatar in the field-of-view improves as a distance between the first virtual avatar and the particular virtual avatar decreases.
11. The method of claim 10, wherein the composite audio stream includes audio corresponding to portions outside of the field-of-view, and wherein a particular audio volume associated with a particular virtual avatar of a plurality of other virtual avatars increases as a distance between the first virtual avatar and the particular virtual avatar decreases.
12. The method of claim 1, wherein said determining the spatial proximities comprises determining a distance, in the 3D XR environment, between the first virtual avatar and at least one other virtual avatar of the collection of virtual avatars.
13. The method of claim 1, further comprising determining at least one conversation cluster based at least in part on the spatial proximities, wherein each conversation cluster of the at least one conversation cluster comprises a group of virtual avatars including the first virtual avatar, wherein virtual avatars associated with client connections part of the same conversation cluster are enabled to interact with each other.
14. The method of claim 13, wherein virtual avatars that are not associated with client connections part of the same conversation cluster are not enabled to interact with each other.
15. The method of claim 1, wherein the first virtual avatar is a virtual human avatar corresponding to a user wearing an extended reality head-mounted display.
16. The method of claim 15, further comprising: receiving at least one of head pose or body pose data of the user; wherein said identifying the set of virtual avatars is based at least in part on the at least one of head pose or body pose data.
17. A computing system of a telepresence management system, the computing system comprising: memory; and one or more processors coupled to the memory and configured to: determine spatial proximities between a collection of virtual avatars in relation to a first virtual avatar within an immersive 3D extended reality (XR) environment allowing real-time audiovisual interactions between two or more virtual avatars of the collection of virtual avatars; identify a set of virtual avatars of the collection of virtual avatars that are within in a field-of-view of the first virtual avatar; dynamically generate a composite audio stream for the first virtual avatar by adjusting volume levels of a plurality of audio sources based on respective spatial proximities to the first virtual avatar, wherein the composite audio stream comprises the plurality of audio sources, and wherein each audio source corresponds to a different virtual avatar of the collection of virtual avatars; dynamically generate a composite 3D video stream of the 3D XR environment from a perspective of the first virtual avatar by varying a video quality of 2D virtual representations of each virtual avatar of the set of virtual avatars based on respective spatial proximities to the first virtual avatar, and wherein the composite 3D video stream comprises the 2D virtual representations of each virtual avatar of the set of virtual avatars; and communicate the composite audio stream and composite 3D video stream to a first client connection, wherein the first client connection is associa
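
The field-of-view and video-quality steps recited in the claims can be sketched in Python as follows. The claims do not give a field-of-view angle, distance thresholds, or quality labels, so the 90-degree cone, the `QUALITY_TIERS` table, and the function names below are hypothetical choices for illustration only.

```python
import math

# Hypothetical tiers: (maximum distance, quality label). Not from the patent.
QUALITY_TIERS = [(3.0, "high"), (10.0, "medium")]


def in_field_of_view(viewer_pos, forward, target_pos, fov_degrees=90.0):
    """Return True if target_pos lies inside the viewer's viewing cone.

    forward is the viewer's facing direction (it need not be normalized).
    """
    to_target = [t - v for t, v in zip(target_pos, viewer_pos)]
    dist = math.hypot(*to_target)
    if dist == 0.0:
        return True  # co-located avatars are treated as visible
    cos_angle = sum(f * t for f, t in zip(forward, to_target)) / (
        math.hypot(*forward) * dist
    )
    return cos_angle >= math.cos(math.radians(fov_degrees / 2))


def video_quality(dist):
    """Map a distance to a quality label; closer avatars get better video."""
    for max_dist, label in QUALITY_TIERS:
        if dist <= max_dist:
            return label
    return "low"


def plan_video(viewer_pos, forward, avatars, fov_degrees=90.0):
    """Choose a quality tier per visible avatar; skip avatars out of view."""
    plan = {}
    for name, pos in avatars.items():
        if in_field_of_view(viewer_pos, forward, pos, fov_degrees):
            plan[name] = video_quality(math.dist(viewer_pos, pos))
    return plan


avatars = {
    "a": (0.0, 0.0, 2.0),    # close, straight ahead
    "b": (1.0, 0.0, 8.0),    # mid-range, slightly off-axis
    "c": (0.0, 0.0, -2.0),   # directly behind the viewer
    "d": (0.0, 0.0, 50.0),   # far away, straight ahead
}
plan = plan_video((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), avatars)
```

Here avatar `c` is omitted from the plan because it is behind the viewer, which corresponds to discarding or ignoring video data outside the field-of-view, while the remaining avatars receive progressively lower tiers as distance grows. A real system would presumably map each tier to a concrete bitrate or resolution.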

Assignees

Inventors

Classifications

  • G06V20/20 · Primary

    in augmented reality scenes · CPC title

  • Geometric effects · CPC title

  • Mixed reality (object pose determination, tracking or camera calibration for mixed reality G06T7/00) · CPC title

  • Network arrangements for conference optimisation or adaptation · CPC title

  • Arrangements for multi-party communication, e.g. for conferences (data switching systems for conference H04L12/18; arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities H04M3/56; television conferencing systems H04N7/15) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11847825B2 cover?
Some embodiments of the present inventive concept provide for improved telepresence and other virtual sessions using localized projection of audible noises and/or dynamic adjustment of audio and/or video qualities based on spatial relationships between users. An XR telepresence platform can allow for immersive multi-user video conferencing from within a web browser or other medium. The platform…
Who is the assignee on this patent?
Univ Carnegie Mellon
What technology area does this patent fall under?
Primary CPC classification G06V20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 19, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).