US12411648B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12411648-B2 |
| Application number | US-202217734461-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 2, 2022 |
| Priority date | Oct 12, 2017 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
There are disclosed techniques, systems, methods and instructions for a virtual reality, VR, augmented reality, AR, mixed reality, MR, or 360-degree video environment. In one example, the system includes at least one media video decoder configured to decode video signals from video streams for the representation of VR, AR, MR or 360-degree video environment scenes to a user. The system includes at least one audio decoder configured to decode audio signals from at least one audio stream. The system is configured to request at least one audio stream and/or one audio element of an audio stream and/or one adaptation set to a server on the basis of at least the user's current viewport and/or head orientation and/or movement data and/or interaction metadata and/or virtual positional data.
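The abstract's idea of requesting audio on the basis of virtual positional data can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `AdaptationSet` fields, the `audible_radius` rule, and the `select_adaptation_sets` helper are hypothetical and are not taken from the patent, which does not prescribe a particular selection rule.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AdaptationSet:
    """Hypothetical description of one audio adaptation set offered by a server."""
    set_id: str
    position: Tuple[float, float]  # assumed location of its audio elements in the scene
    audible_radius: float          # assumed distance within which the set is relevant


def select_adaptation_sets(user_position: Tuple[float, float],
                           adaptation_sets: List[AdaptationSet]) -> List[str]:
    """Return the IDs of the adaptation sets to request for a virtual position."""
    selected = []
    for aset in adaptation_sets:
        distance = math.dist(user_position, aset.position)
        # Assumed rule: request a set only if the user is inside its audible radius.
        if distance <= aset.audible_radius:
            selected.append(aset.set_id)
    return selected


catalog = [
    AdaptationSet("room_ambience", (0.0, 0.0), 10.0),
    AdaptationSet("piano", (3.0, 4.0), 6.0),
    AdaptationSet("street_noise", (50.0, 0.0), 15.0),
]

requested = select_adaptation_sets((0.0, 0.0), catalog)
print(requested)  # ['room_ambience', 'piano']
```

A real client would likely combine several of the inputs named in the abstract (viewport, head orientation, movement data, interaction metadata) rather than distance alone.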
Opening claim text (preview).
The invention claimed is:
1. A system for receiving audio streams to be reproduced, wherein the system comprises at least one audio decoder to decode audio signals from representations of adaptation sets, wherein the system is configured to receive at least user's movement data and in a current audio scene, to request and receive, on a basis of at least the user's movement data: first representations of first adaptation sets; second representations of second adaptation sets; and metadata information of the current audio scene, wherein first audio elements in the first representations of the first adaptation sets belong only to the current audio scene, the first representations of the first adaptation sets being current main audio streams, and wherein the metadata information indicates that second audio elements in the second representations of the second adaptation sets belong to both the current audio scene and a further audio scene, the second representations of the second adaptation sets being current auxiliary audio streams of the current audio scene, wherein the system is further configured to request and receive, in the further audio scene, on a basis of at least the user's movement data: third representations of third adaptation sets, and metadata information of the further audio scene, wherein third audio elements in the third representations of the third adaptation sets belong only to the further audio scene, the third representations of the third adaptation sets being further main audio streams, and wherein the system is further configured, in the further audio scene, to access, as indicated by the metadata information of the further audio scene and on a basis of at least the user's movement data, the second representations of the second adaptation sets already requested and received in the current audio scene, the second representations of the second adaptation sets being further auxiliary audio streams of the further audio scene.
2. The system of claim 1, wherein ones of the first representations of the first adaptation sets, the second representations of the second adaptation sets, and the third representations of the third adaptation sets are requested and received irrespective of at least the user's movement data, while other ones of the first representations of the first adaptation sets, the second representations of the second adaptation sets, and the third representations of the third adaptation sets are requested and received based on at least the user's movement data.
3. The system of claim 1, wherein each audio element is associated to a position or area in a virtual environment, so that different representations of the adaptation sets are provided for different user's movement data.
4. The system of claim 1, configured to predictively decide whether at least one audio element of one of the adaptation sets among the first, the second, and the third adaptation sets will become relevant or audible based on at least the user's movement data, wherein the system is configured to request and to receive the at least one audio element or the one of the adaptation sets at a particular user's movement based on relevance or audibility of the at least one audio element or the one of the adaptation sets, and wherein the system is configured to reproduce the at least one audio element or the one of the adaptation sets, when received, after the particular user's movement or interaction in the current audio scene or the further audio scene.
5. The system of claim 4, further configured to predictively decide the relevance and/or audibility of the at least one audio element of the adaptation set by using a plurality of predetermined thresholds.
6. The system of claim 1, further configured to retain access to both the metadata information of the current audio scene and the second representations of the second adaptation sets, before and after a user's interaction, the interaction resulting from movement data in the current audio scene initiating the further audio scene.
7. The system of claim 1, wherein at the beginning of a transition from the current audio scene to the further audio scene, the system receives and requests: versions of the first representations, at a higher bitrate than versions of the second representations, of the first adaptation sets associated to the current audio scene, and the versions of the second representations of the second adaptation sets associated to both the current audio scene and the further audio scene; and at the end of the transition from the current audio scene to the further audio scene, the system receives and requests: versions of the third representations, at the higher bitrate, of the third adaptation sets associated to the further audio scene; and while already having available versions of the second representations of the second adaptation sets associated to both the current audio scene and the further audio scene.
8. The system of claim 1, wherein the system requests and acquires versions of the first representations of the first adaptation sets associated only to the current audio scene and versions of the second representations of the second adaptation sets associated to both the current audio scene and the further audio scene at a higher bitrate than versions of the third representations and the versions of the third representations of the third adaptation sets associated to the further audio scene at a lower bitrate than the versions of the first representations and the versions of the second representations, in case the user's position in the current audio scene is neighboring or adjacent to the further audio scene.
9. The system of claim 1, wherein a plurality of N audio elements are defined in the current audio scene and provided in at least one representation at a high bitrate and/or quality level, and in case the user's movement data require the reproduction of the plurality of N audio elements, the plurality of N audio elements are processed to acquire a smaller number M of audio elements, wherein the M audio elements are associated to a position or area close to a position or area of the N audio elements and are provided in one or more representations at bitrates and/or quality levels lower than the high bitrate and/or quality level.
10. The system of claim 9, wherein: in case a user's distance from a position or area of the N audio elements is larger than a predetermined threshold, the one or more representations associated with the M audio elements are obtained; and in case the user's distance from the position or area of the N audio elements is smaller than the predetermined threshold, the at least one representation associated with the N audio elements is obtained.
11. The system of claim 9, wherein the processing of the N audio elements comprises: addition of at least one of the N audio elements with at least another of the N audio elements to obtain at least one of the M audio elements provided in at least one representation at a bitrate and/or quality level which is lower or higher than each one associated with the N audio elements depending on the relevance and/or audibility of the N audio elements; and/or active downmix of each of the N audio elements to obtain a single M audio element provided in at least one representation at a bitrate and/or quality level which is lower or higher than each one associated with the N audio elements depending on the relevance and/or audibility of the N audio elements.
12. The system of claim 9, wherein: the processing of the N audio elements comprises rendering of the N audio elements using the old positions of the N audio elements t
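The reuse of auxiliary streams across scenes described in claim 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed system: the `SceneClient` class, the `fetch` callable, and the dictionary layout of the scene metadata (`"main"` and `"auxiliary"` keys) are hypothetical names introduced here.

```python
class SceneClient:
    """Hypothetical client that keeps shared (auxiliary) streams across scene changes."""

    def __init__(self, fetch):
        self._fetch = fetch      # assumed callable: stream ID -> stream payload
        self._main = {}          # streams that belong only to the active scene
        self._auxiliary = {}     # streams the metadata marks as shared between scenes
        self.requests = []       # log of the stream IDs actually requested

    def _request(self, stream_id):
        self.requests.append(stream_id)
        return self._fetch(stream_id)

    def enter_scene(self, metadata):
        """Switch to a scene described by its (hypothetical) metadata dictionary."""
        # Main streams belong to a single scene, so they are always requested anew.
        self._main = {sid: self._request(sid) for sid in metadata["main"]}
        # Auxiliary streams already received in the previous scene are reused.
        kept = {}
        for sid in metadata["auxiliary"]:
            if sid in self._auxiliary:
                kept[sid] = self._auxiliary[sid]
            else:
                kept[sid] = self._request(sid)
        self._auxiliary = kept

    def active_streams(self):
        return sorted(list(self._main) + list(self._auxiliary))


scene_a = {"main": ["a_dialogue"], "auxiliary": ["shared_ambience"]}
scene_b = {"main": ["b_dialogue"], "auxiliary": ["shared_ambience"]}

client = SceneClient(fetch=lambda sid: f"<data for {sid}>")
client.enter_scene(scene_a)
client.enter_scene(scene_b)

print(client.requests)          # ['a_dialogue', 'shared_ambience', 'b_dialogue']
print(client.active_streams())  # ['b_dialogue', 'shared_ambience']
```

In this sketch the shared stream is requested once, in the first scene, and is only accessed in the second; the movement-data trigger for the scene change is left out for brevity.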
Media network packetisation · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Head tracking input arrangements · CPC title
slaved to motion of at least a part of the body of the user, e.g. head, eye · CPC title
Display position adjusting means not related to the information to be displayed · CPC title