Optimizing audio delivery for virtual reality applications

US12411648B2 · US · B2

Patent metadata
Publication number: US-12411648-B2
Application number: US-202217734461-A
Country: US
Kind code: B2
Filing date: May 2, 2022
Priority date: Oct 12, 2017
Publication date: Sep 9, 2025
Grant date: Sep 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There are disclosed techniques, systems, methods and instructions for a virtual reality, VR, augmented reality, AR, mixed reality, MR, or 360-degree video environment. In one example, the system includes at least one media video decoder configured to decode video signals from video streams for the representation of VR, AR, MR or 360-degree video environment scenes to a user. The system includes at least one audio decoder configured to decode audio signals from at least one audio stream. The system is configured to request at least one audio stream and/or one audio element of an audio stream and/or one adaptation set to a server on the basis of at least the user's current viewport and/or head orientation and/or movement data and/or interaction metadata and/or virtual positional data.
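The request behaviour summarized in the abstract can be pictured with a minimal sketch. It assumes each adaptation set carries a virtual position and an audible radius, and that the client requests only the sets whose sources are within range of the user's virtual position. The names `AdaptationSet`, `audible_radius`, and `select_adaptation_sets` are hypothetical and do not come from the patent; the patent also lists viewport, head orientation, and interaction metadata as possible inputs, which this sketch leaves out.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationSet:
    """Hypothetical description of one requestable audio adaptation set."""
    set_id: str
    position: tuple[float, float]  # assumed virtual position of the audio source
    audible_radius: float          # assumed distance within which it is audible


def select_adaptation_sets(user_pos: tuple[float, float],
                           sets: list[AdaptationSet]) -> list[str]:
    """Return the ids of the sets whose source lies within its audible radius."""
    return [s.set_id for s in sets
            if math.dist(user_pos, s.position) <= s.audible_radius]


available = [
    AdaptationSet("door", (0.0, 0.0), 5.0),
    AdaptationSet("band", (20.0, 0.0), 8.0),
]
near_door = select_adaptation_sets((1.0, 1.0), available)
near_band = select_adaptation_sets((15.0, 0.0), available)
```

In this sketch a user standing near the origin would request only the `door` set, while a user at `(15.0, 0.0)` would request only the `band` set.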

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for receiving audio streams to be reproduced, wherein the system comprises at least one audio decoder to decode audio signals from representations of adaptation sets, wherein the system is configured to receive at least user's movement data and in a current audio scene, to request and receive, on a basis of at least the user's movement data: first representations of first adaptation sets; second representations of second adaptation sets; and metadata information of the current audio scene, wherein first audio elements in the first representations of the first adaptation sets belong only to the current audio scene, the first representations of the first adaptation sets being current main audio streams, and wherein the metadata information indicates that second audio elements in the second representations of the second adaptation sets belong to both the current audio scene and a further audio scene, the second representations of the second adaptation sets being current auxiliary audio streams of the current audio scene, wherein the system is further configured to request and receive, in the further audio scene, on a basis of at least the user's movement data: third representations of third adaptation sets, and metadata information of the further audio scene, wherein third audio elements in the third representations of the third adaptation sets belong only to the further audio scene, the third representations of the third adaptation sets being further main audio streams, and wherein the system is further configured, in the further audio scene, to access, as indicated by the metadata information of the further audio scene and on a basis of at least the user's movement data, the second representations of the second adaptation sets already requested and received in the current audio scene, the second representations of the second adaptation sets being further auxiliary audio streams of the further audio scene.

2. The system of claim 1, wherein ones of the first representations of the first adaptation sets, the second representations of the second adaptation sets, and the third representations of the third adaptation sets are requested and received irrespective of at least the user's movement data, while other ones of the first representations of the first adaptation sets, the second representations of the second adaptation sets, and the third representations of the third adaptation sets are requested and received based on at least the user's movement data.

3. The system of claim 1, wherein each audio element is associated to a position or area in a virtual environment, so that different representations of the adaptation sets are provided for different user's movement data.

4. The system of claim 1, configured to predictively decide whether at least one audio element of one of the adaptation sets among the first, the second, and the third adaptation sets will become relevant or audible based on at least the user's movement data, wherein the system is configured to request and to receive the at least one audio element or the one of the adaptation sets at a particular user's movement based on relevance or audibility of the at least one audio element or the one of the adaptation sets, and wherein the system is configured to reproduce the at least one audio element or the one of the adaptation sets, when received, after the particular user's movement or interaction in the current audio scene or the further audio scene.

5. The system of claim 4, further configured to predictively decide the relevance and/or audibility of the at least one audio element of the adaptation set by using a plurality of predetermined thresholds.

6. The system of claim 1, further configured to retain access to both the metadata information of the current audio scene and the second representations of the second adaptation sets, before and after a user's interaction, the interaction resulting from movement data in the current audio scene initiating the further audio scene.

7. The system of claim 1, wherein at the beginning of a transition from the current audio scene to the further audio scene, the system receives and requests: versions of the first representations, at a higher bitrate than versions of the second representations, of the first adaptation sets associated to the current audio scene, and the versions of the second representations of the second adaptation sets associated to both the current audio scene and the further audio scene; and at the end of the transition from the current audio scene to the further audio scene, the system receives and requests: versions of the third representations, at the higher bitrate, of the third adaptation sets associated to the further audio scene; and while already having available versions of the second representations of the second adaptation sets associated to both the current audio scene and the further audio scene.

8. The system of claim 1, wherein the system requests and acquires versions of the first representations of the first adaptation sets associated only to the current audio scene and versions of the second representations of the second adaptation sets associated to both the current audio scene and the further audio scene at a higher bitrate than versions of the third representations and the versions of the third representations of the third adaptation sets associated to the further audio scene at a lower bitrate than the versions of the first representations and the versions of the second representations, in case the user's position in the current audio scene is neighboring or adjacent to the further audio scene.

9. The system of claim 1, wherein a plurality of N audio elements are defined in the current audio scene and provided in at least one representation at a high bitrate and/or quality level, and in case the user's movement data require the reproduction of the plurality of N audio elements, the plurality of N audio elements are processed to acquire a smaller number M of audio elements, wherein the M audio elements are associated to a position or area close to a position or area of the N audio elements and are provided in one or more representations at bitrates and/or quality levels lower than the high bitrate and/or quality level.

10. The system of claim 9, wherein: in case a user's distance from a position or area of the N audio elements is larger than a predetermined threshold, the one or more representations associated with the M audio elements are obtained; and in case the user's distance from the position or area of the N audio elements is smaller than the predetermined threshold, the at least one representation associated with the N audio elements is obtained.

11. The system of claim 9, wherein the processing of the N audio elements comprises: addition of at least one of the N audio elements with at least another of the N audio elements to obtain at least one of the M audio elements provided in at least one representation at a bitrate and/or quality level which is lower or higher than each one associated with the N audio elements depending on the relevance and/or audibility of the N audio elements; and/or active downmix of each of the N audio elements to obtain a single M audio element provided in at least one representation at a bitrate and/or quality level which is lower or higher than each one associated with the N audio elements depending on the relevance and/or audibility of the N audio elements.

12. The system of claim 9, wherein: the processing of the N audio elements comprises rendering of the N audio elements using the old positions of the N audio elements t
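As a reading aid for claim 1, the following minimal sketch shows one way a client might reuse auxiliary streams across a scene change. It assumes scene metadata lists the adaptation sets that belong only to one scene (main) and those shared with another scene (auxiliary), and that the client skips any set it has already received. The class and field names (`SceneMetadata`, `StreamClient`, `enter_scene`) are hypothetical, the network request is replaced by a log entry, and the dependence on the user's movement data is omitted; this is an illustration, not the patented implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SceneMetadata:
    """Hypothetical per-scene metadata, loosely following claim 1."""
    main_sets: list[str]       # sets whose audio elements belong only to this scene
    auxiliary_sets: list[str]  # sets whose audio elements are shared with another scene


@dataclass
class StreamClient:
    received: set[str] = field(default_factory=set)
    request_log: list[str] = field(default_factory=list)

    def enter_scene(self, meta: SceneMetadata) -> list[str]:
        """Request every set the scene needs that has not been received yet."""
        newly_requested = []
        for set_id in meta.main_sets + meta.auxiliary_sets:
            if set_id not in self.received:
                self.request_log.append(set_id)  # stands in for a real network request
                self.received.add(set_id)
                newly_requested.append(set_id)
        return newly_requested


current_scene = SceneMetadata(main_sets=["A1"], auxiliary_sets=["B1"])
further_scene = SceneMetadata(main_sets=["C1"], auxiliary_sets=["B1"])

client = StreamClient()
first_requests = client.enter_scene(current_scene)
second_requests = client.enter_scene(further_scene)
```

Here the shared set `B1` is requested once, on entering the current scene, and is only accessed again, not re-requested, when the further scene starts.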

Assignees

Inventors

Classifications

  • Media network packetisation · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Head tracking input arrangements · CPC title

  • slaved to motion of at least a part of the body of the user, e.g. head, eye · CPC title

  • Display position adjusting means not related to the information to be displayed · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12411648B2 cover?
There are disclosed techniques, systems, methods and instructions for a virtual reality, VR, augmented reality, AR, mixed reality, MR, or 360-degree video environment. In one example, the system includes at least one media video decoder configured to decode video signals from video streams for the representation of VR, AR, MR or 360-degree video environment scenes to a user. The system in…
Who is the assignee on this patent?
Fraunhofer Ges Forschung
What technology area does this patent fall under?
Primary CPC classification H04N21/439. Mapped technology areas include Electricity.
When was this patent published?
Publication date Sep 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).