Mixing audio based on a pose of a user

US11395089B2 · US · B2

Patent metadata
Publication number: US-11395089-B2
Application number: US-201915733512-A
Country: US
Kind code: B2
Filing date: May 6, 2019
Priority date: May 8, 2018
Publication date: Jul 19, 2022
Grant date: Jul 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, apparatus, and method are disclosed for utilizing a sensed pose of a user to dynamically control the mixing of audio tracks to provide a user with a more realistic, informative, and/or immersive audio experience with a virtual environment, such as a video.

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving media content containing unmixed audio tracks, the unmixed audio tracks including a first audio track of a first object in a virtual environment and a second audio track of a second object in the virtual environment; receiving information from at least one sensor corresponding to a user and a display; determining a pose of the user based on the information from the at least one sensor, the pose including a relative distance between the user and the display; determining a first position of the first object in the virtual environment and a second position of the second object in the virtual environment; adjusting an audio mixer based on (i) the relative distance between the user and the display and (ii) the first position of the first object and the second position of the second object in the virtual environment, so that a difference between amplitudes of the first audio track and the second audio track is larger for a shorter relative distance between the user and the display and is smaller for a larger relative distance between the user and the display; applying the unmixed audio tracks to the adjusted audio mixer to create mixed audio for the media content; and presenting the media content to the user, the media content including the mixed audio.

2. The method according to claim 1, wherein the media content is a video.

3. The method according to claim 1, wherein the audio mixer includes an audio channel for each unmixed audio track, each audio channel of the mixer adjustable to control at least one characteristic of the applied unmixed audio track.

4. The method according to claim 3, wherein the at least one characteristic includes a volume or a spectral profile of the applied audio track.

5. The method according to claim 1, wherein the information from the at least one sensor includes at least one image of the user.

6. The method according to claim 1, wherein the pose of the user further includes a relative orientation between the user and the display.

7. The method according to claim 1, wherein the pose of the user includes an expression or a movement of the user.

8. The method according to claim 1, further comprising: repeating the determining, the adjusting, the applying, and the presenting so that the mixed audio of the media content responds to changes in the pose of the user as the media content is played.

9. A system for mixing audio, comprising: a display configured to display media content including a first object and a second object in a virtual environment, the first object having a first audio track and the second object having a second audio track; at least one sensor configured to receive information corresponding to a user and the display; and a processor that is communicatively coupled to the at least one sensor, the processor configured to: receive unmixed audio tracks associated with the media content, the unmixed audio tracks including the first audio track and the second audio track, determine a pose from the information corresponding to the user, the pose including a relative distance between the user and the display, determine a first position of the first object in the virtual environment and a second position of the second object in the virtual environment, adjust an audio mixer based on the relative distance between the user and the display and the first position of the first object and the second position of the second object in the virtual environment so that a difference between amplitudes of the first audio track and the second audio track is larger for a shorter relative distance between the user and the display and is smaller for a larger relative distance between the user and the display, and apply the unmixed audio tracks to the adjusted audio mixer to create mixed audio for the media content.

10. The system according to claim 9, wherein the processor is further configured to transmit the media content with the mixed audio to the display and a sound device of the system.

11. The system according to claim 10, wherein the sound device is a headset.

12. The system according to claim 10, wherein the pose of the user further includes a relative orientation between the user and the display.

13. The system according to claim 9, wherein the at least one sensor includes a camera of a mobile device.

14. The system according to claim 9, wherein the at least one sensor includes a camera of a home security system or a camera of a smart home system.

15. The system according to claim 9, wherein the at least one sensor includes a camera of smart glasses worn by the user.

16. The system according to claim 9, wherein the at least one sensor includes a depth sensor.

17. A computing device comprising: an audio interface coupled to a sound system; a display configured to display media content including a first object and a second object in a virtual environment, the first object having a first audio track and the second object having a second audio track; a camera configured to capture at least one image of a user; and a processor that is communicatively coupled to the audio interface, the display, and the camera, the processor configured to: receive unmixed audio tracks associated with the media content, the unmixed audio tracks including the first audio track and the second audio track, determine a pose of the user from the at least one image of the user, the pose including a relative distance between the user and the display, determine a first position of the first object in the virtual environment and a second position of the second object in the virtual environment, adjust an audio mixer based on the relative distance between the user and the display and the first position of the first object and the second position of the second object in the virtual environment so that a difference between amplitudes of the first audio track and the second audio track is larger for a shorter relative distance between the user and the display and is smaller for a larger relative distance between the user and the display, apply the unmixed audio tracks to the adjusted audio mixer to create mixed audio for the media content, and transmit the media content to the display and the mixed audio to the sound system.

18. The computing device according to claim 17, wherein to determine a pose of a user from the at least one image of the user includes determining a position of a gaze of the user with respect to a position of an object within the media content.
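
The adjusting and applying steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Track`, `closeness`, `channel_gains`, and `mix`, the `focus` point, the 0.5 m and 3.0 m distance limits, and the 12 dB maximum cut are all hypothetical choices made for illustration. The sketch assumes that objects farther from a focus point in the virtual environment are attenuated more as the user moves closer to the display, which is one way to make the amplitude difference between tracks grow at short distances and shrink at long ones.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Track:
    samples: list[float]            # unmixed audio for one object
    position: tuple[float, float]   # object position in the virtual scene (hypothetical units)


def closeness(distance_m: float, near_m: float = 0.5, far_m: float = 3.0) -> float:
    """Map user-to-display distance to 1.0 (near) .. 0.0 (far), clamped."""
    t = (far_m - distance_m) / (far_m - near_m)
    return max(0.0, min(1.0, t))


def channel_gains(tracks, distance_m, focus=(0.0, 0.0), max_cut_db=12.0):
    """Per-track linear gains; objects far from `focus` are cut more as the user gets closer."""
    c = closeness(distance_m)
    # Distance of each object from the assumed focus point in the virtual environment.
    offsets = [hypot(t.position[0] - focus[0], t.position[1] - focus[1]) for t in tracks]
    widest = max(offsets) or 1.0    # avoid dividing by zero when every object sits at the focus
    # Attenuation in dB scales with both closeness and normalized offset, then converts to linear.
    return [10 ** (-(c * max_cut_db * off / widest) / 20) for off in offsets]


def mix(tracks, distance_m):
    """Apply the adjusted gains to the unmixed tracks and sum them sample by sample."""
    gains = channel_gains(tracks, distance_m)
    n = min(len(t.samples) for t in tracks)
    return [sum(g * t.samples[i] for g, t in zip(gains, tracks)) for i in range(n)]
```

Calling `channel_gains` with the same two tracks at 0.5 m and at 3.0 m gives gains that differ at the short distance and are equal at the long one. Re-running `mix` whenever a new distance estimate arrives would correspond to the repeating step of claim 8.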

Assignees

Google LLC

Inventors

Classifications

  • Aspects of volume control, not necessarily automatic, in stereophonic sound systems · CPC title

  • H04S7/304 · Primary

    For headphones · CPC title

  • involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream (arrangements characterised by components specially adapted for monitoring, identification or recognition of video in broadcast systems H04H60/59) · CPC title

  • involving reformatting operations of audio signals (details of audio signal transcoding G10L19/173) · CPC title

  • Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV programme (methods or arrangements for recognising human body or animal bodies or body parts G06V40/10; methods or arrangements for acquiring or recognising human faces, facial parts, facial sketches, facial expressions G06V40/16; methods or arrangements for recognising movements or behaviour G06V40/20; arrangements for identifying users in broadcast systems H04H60/45) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11395089B2 cover?
A system, apparatus, and method are disclosed for utilizing a sensed pose of a user to dynamically control the mixing of audio tracks to provide a user with a more realistic, informative, and/or immersive audio experience with a virtual environment, such as a video.

Who is the assignee on this patent?
Google LLC

What technology area does this patent fall under?
Primary CPC classification: H04S7/304. Mapped technology areas include Electricity.

When was this patent published?
Publication date: Jul 19, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).