Audiovisual presence transitions in a collaborative reality environment
US-2021350604-A1 · Nov 11, 2021 · US
US12154205B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12154205-B2 |
| Application number | US-202217933624-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 20, 2022 |
| Priority date | Sep 30, 2021 |
| Publication date | Nov 26, 2024 |
| Grant date | Nov 26, 2024 |
Official abstract text for this publication.
The embodiments relate to a method comprising establishing a three-dimensional conversational interaction with one or more receivers; generating a pointcloud relating to a user and capturing audio from one or more audio sources; generating a conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object refers to one or more objects specific to the three-dimensional conversational interaction, wherein said one or more objects comprise at least data relating to a transformable pointcloud, audio obtained from said one or more audio sources, and input obtained from one or more connected devices controlling at least the pointcloud, wherein said objects are linked to each other for seamless manipulation; applying the conversational scene description into metadata; and transmitting the metadata with the respective audio in real time to said one or more receivers.
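The abstract describes a container-style scene description that is packed into metadata on the sender side and unpacked on the receiver side. A minimal sketch of that round trip follows. It is not the patent's format: the class names, field names, the `stream://` references, and the choice of JSON as the metadata encoding are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SceneObject:
    """One object inside the container (hypothetical field names)."""
    object_id: str
    data_type: str          # e.g. "pointcloud", "audio", "input"
    payload_ref: str        # reference to the actual media stream, not the data itself
    linked_to: list[str] = field(default_factory=list)  # ids of linked objects


@dataclass
class DynamicObject:
    """Container describing a space for the conversational interaction."""
    space: str              # "virtual" or "physical"
    objects: list[SceneObject] = field(default_factory=list)


def build_metadata(scene: DynamicObject) -> str:
    # Assumption: metadata is JSON text; the source does not name an encoding.
    return json.dumps({"conversational_scene_description": asdict(scene)})


def unpack_metadata(metadata: str) -> DynamicObject:
    # Receiver side: rebuild the container from the metadata.
    data = json.loads(metadata)["conversational_scene_description"]
    return DynamicObject(
        space=data["space"],
        objects=[SceneObject(**obj) for obj in data["objects"]],
    )


scene = DynamicObject(
    space="virtual",
    objects=[
        SceneObject("pc-1", "pointcloud", "stream://pc", linked_to=["aud-1", "in-1"]),
        SceneObject("aud-1", "audio", "stream://audio", linked_to=["pc-1"]),
        SceneObject("in-1", "input", "stream://controller", linked_to=["pc-1"]),
    ],
)
metadata = build_metadata(scene)
restored = unpack_metadata(metadata)
```

In this sketch the links between objects are plain id lists, which is one simple way to keep the point cloud, audio, and controller input addressable together; the source only states that the objects are linked, not how.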
Opening claim text (preview).
The invention claimed is:
1. An apparatus comprising: at least one processor; at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: establish a three-dimensional conversational interaction with one or more receivers; wherein the three-dimensional conversational interaction is established by indicating animation capability modes; wherein the indicating of the animation capability modes used to establish the three-dimensional conversational interaction is performed during a negotiation session with a server or with the one or more receivers; wherein the animation capability modes comprise animation capability modes of the apparatus, and the apparatus comprises a sender device or a sender device comprises the apparatus; generate a transformable point cloud relating to a user and obtain real or virtual audio from one or more real or virtual audio sources; generate a conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object is represented as a container that includes one or more objects specific to the three-dimensional conversational interaction, wherein said one or more objects comprise at least data relating to the transformable point cloud, the real or virtual audio obtained from the one or more real or virtual audio sources, or an input obtained from one or more connected devices controlling at least the point cloud, and wherein said objects are linked to each other for seamless manipulation; apply the conversational scene description into metadata; and transmit the metadata with the respective audio to said one or more receivers.
2. The apparatus according to claim 1, wherein the point cloud represents a three-dimensional avatar humanoid with or without skeletal key points.
3. The apparatus according to claim 1, wherein the audio is segmented into audio sources.
4. The apparatus according to claim 1, wherein objects being referred from the conversational scene description are connected to one another by a same geometrical coordinate system relative to one global origin.
5. The apparatus according to claim 1, wherein the animation capability modes comprise a skeletal animation or a point cloud animation.
6. The apparatus of claim 1, wherein the container includes a respective data type of the one or more objects.
7. The apparatus of claim 1, wherein the apparatus is caused to: classify the real or virtual audio into audio segments based on a respective type of the real or virtual audio; wherein the real or virtual audio is classified into the audio segments using a neural network or machine learning model.
8. The apparatus of claim 1, wherein the apparatus is caused to: store a sequence of at least one audio segment object representing at least one audio segment of the real or virtual audio as a child object of an audio object used to represent the real or virtual audio.
9. The apparatus of claim 1, wherein the apparatus is caused to: represent a real or virtual audio source of the one or more real or virtual audio sources as an audio site object, wherein an audio object included within the container that represents the first dynamic object describing the virtual space for the three-dimensional conversational interaction includes a sequence of audio site objects within a container used to represent the audio object, wherein the sequence of audio site objects include the audio site object.
10. The apparatus of claim 1, wherein the apparatus is caused to: store a sequence of listenpoint objects within a container that represents an audio object that is included within the container that represents the first dynamic object describing the virtual space for the three-dimensional conversational interaction; wherein a listenpoint object of the listenpoint objects represents a point where a subject of the three-dimensional conversational interaction listens to the real or virtual audio emanating from the one or more real or virtual audio sources.
11. The apparatus of claim 1, wherein the conversational scene description comprises at least a second dynamic object describing a physical space for the three-dimensional conversational interaction, wherein the second dynamic object is represented as another container that includes one or more other objects specific to the three-dimensional conversational interaction.
12. The apparatus of claim 1, wherein the apparatus is further caused to: store a sequence of at least one audio site object as a child object of an audio segment object, wherein the at least one audio site object defines where to place the real or virtual audio for hearing at a location defined with an audio segment represented with the audio segment object, wherein the audio segment defines a type of the real or virtual audio.
13. An apparatus comprising: at least one processor; at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: establish a three-dimensional conversational interaction by indicating animation capability modes; wherein the three-dimensional conversational interaction is established by indicating animation capability modes; wherein the indicating of the animation capability modes used to establish the three-dimensional conversational interaction is performed during a negotiation session with a server or with a sender; wherein the animation capability modes comprise animation capability modes of the apparatus, and the apparatus comprises a receiver device or a receiver device comprises the apparatus; receive metadata with respective audio from the sender; unpack a conversational scene description from the metadata, the conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object is represented as a container that includes one or more objects specific to the three-dimensional conversational interaction, wherein said one or more objects comprise at least data relating to a transformable point cloud, audio obtained from one or more real or virtual audio sources, or input obtained from one or more connected devices controlling at least the point cloud, and wherein said objects are linked to each other for seamless manipulation; compose a conversational scene described with the conversational scene description based on the objects and the respective audio; and render the conversational scene to a display.
14. The apparatus according to claim 13, wherein the point cloud represents a three-dimensional avatar humanoid with or without skeletal key points.
15. The apparatus according to claim 13, wherein the audio is segmented into audio sources.
16. The apparatus according to claim 13, wherein objects being referred from the conversational scene description are connected to one another by a same geometrical coordinate system relative to one global origin.
17. The apparatus according to claim 13, wherein the animation capability modes comprise a skeletal animation or a point cloud animation.
18. A method, comprising: establishing a three-dimensional conversational interaction with one or more receivers; wherein the three-dimensional conversational interaction is established by indicating animation capability modes; wherein the indicating of the animation capability modes used to establish the three-dimens
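Claims 4, 8 to 10, and 12 describe how audio is nested inside the container: an audio object holds audio segment objects, audio site objects, and listenpoint objects, all placed in one coordinate system relative to a single global origin. The sketch below is one possible reading of that hierarchy, not the patent's data model. All class and field names, the segment type labels, and the `distances_from` helper are hypothetical; the distance calculation is only there to show why a shared origin is useful.

```python
import math
from dataclasses import dataclass, field

# Assumption: positions are (x, y, z) tuples relative to the one global origin.
Vec3 = tuple[float, float, float]


@dataclass
class AudioSite:
    """Where an audio source is placed in the scene."""
    site_id: str
    position: Vec3


@dataclass
class AudioSegment:
    """A typed slice of the audio; may own audio sites as children (claim 12)."""
    segment_type: str  # hypothetical labels such as "speech" or "music"
    sites: list[AudioSite] = field(default_factory=list)


@dataclass
class ListenPoint:
    """Point where a subject of the interaction listens to the audio (claim 10)."""
    subject_id: str
    position: Vec3


@dataclass
class AudioObject:
    """Audio container holding segments, sites, and listenpoints (claims 8 to 10)."""
    segments: list[AudioSegment] = field(default_factory=list)
    sites: list[AudioSite] = field(default_factory=list)
    listenpoints: list[ListenPoint] = field(default_factory=list)


def distances_from(listenpoint: ListenPoint, audio: AudioObject) -> dict[str, float]:
    """Distance from a listenpoint to every audio site, in shared coordinates."""
    all_sites = list(audio.sites)
    for segment in audio.segments:
        all_sites.extend(segment.sites)
    return {s.site_id: math.dist(listenpoint.position, s.position) for s in all_sites}


audio = AudioObject(
    segments=[AudioSegment("speech", sites=[AudioSite("talker", (3.0, 4.0, 0.0))])],
    sites=[AudioSite("ambient", (0.0, 0.0, 2.0))],
    listenpoints=[ListenPoint("user-1", (0.0, 0.0, 0.0))],
)
result = distances_from(audio.listenpoints[0], audio)
```

Because every position is expressed against the same origin, the helper can compare sites owned by the audio object and sites owned by a segment without any coordinate conversion.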
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title
Sound input; Sound output (speech processing G10L) · CPC title
specially adapted for multi-view video sequence encoding · CPC title
with scene description coding, e.g. binary format for scenes [BIFS] compression · CPC title