What technology area does this patent fall under?

Primary CPC classification G06T13/40. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 26, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

This page lists 6 related publications: citations found in our corpus, or other publications that share the same primary CPC classification.

Method, an apparatus and a computer program product for video encoding and video decoding

US12154205B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12154205-B2
Application number	US-202217933624-A
Country	US
Kind code	B2
Filing date	Sep 20, 2022
Priority date	Sep 30, 2021
Publication date	Nov 26, 2024
Grant date	Nov 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The embodiments relate to a method comprising establishing a three-dimensional conversational interaction with one or more receivers; generating a pointcloud relating to a user and capturing audio from one or more audio source; generating conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object refers to one or more objects specific to the three-dimensional conversational interaction, wherein said one or more objects comprises at least data relating to transformable pointcloud; audio obtained from said one or more audio source and input obtained from one or more connected devices controlling at least the pointcloud, wherein said objects are linked to each other for seamless manipulation; applying the conversational scene description into a metadata, and transmitting the metadata with the respective audio in realtime to said one or more receivers.
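
The abstract describes a sender-side flow: build a conversational scene description whose dynamic object groups linked point cloud, audio and input objects, convert it into metadata, and transmit that metadata with the audio. The Python sketch below illustrates that flow. It is a hypothetical example. The class names `SceneObject` and `DynamicObject`, the `data_type` labels, the JSON encoding and the timestamp pairing are all assumptions and are not the format the patent specifies.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SceneObject:
    object_id: str
    data_type: str  # e.g. "pointcloud", "audio", "input" (assumed labels)
    links: List[str] = field(default_factory=list)  # ids of linked objects


@dataclass
class DynamicObject:
    """Container for the objects of one 3D conversational interaction."""
    object_id: str
    children: List[SceneObject] = field(default_factory=list)


def build_scene(user_id: str) -> DynamicObject:
    point_cloud = SceneObject(f"{user_id}/pointcloud", "pointcloud")
    audio = SceneObject(f"{user_id}/audio", "audio")
    control = SceneObject(f"{user_id}/input", "input")
    # Cross-link the objects so a change to one can be propagated to the others.
    point_cloud.links = [audio.object_id, control.object_id]
    audio.links = [point_cloud.object_id]
    control.links = [point_cloud.object_id]
    return DynamicObject("virtual_space", [point_cloud, audio, control])


def to_metadata(scene: DynamicObject) -> bytes:
    # JSON is an assumed encoding; the patent only says "metadata".
    return json.dumps(asdict(scene), sort_keys=True).encode("utf-8")


def package(scene: DynamicObject, audio_frame: bytes, timestamp_ms: int) -> dict:
    # Bundle the metadata with the audio it accompanies, keyed by one timestamp.
    return {"timestamp_ms": timestamp_ms, "metadata": to_metadata(scene), "audio": audio_frame}


# Usage
packet = package(build_scene("alice"), b"\x00" * 960, timestamp_ms=40)
decoded = json.loads(packet["metadata"])
print([child["data_type"] for child in decoded["children"]])  # ['pointcloud', 'audio', 'input']
```

Linking objects by id rather than by nesting keeps each object independently addressable. This is one possible reading of the phrase "linked to each other for seamless manipulation", not necessarily the one the patent intends.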

First claim

Opening claim text (preview).

The invention claimed is:
1. An apparatus comprising: at least one processor; at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: establish a three-dimensional conversational interaction with one or more receivers; wherein the three-dimensional conversational interaction is established by indicating animation capability modes; wherein the indicating of the animation capability modes used to establish the three-dimensional conversational interaction is performed during a negotiation session with a server or with the one or more receivers; wherein the animation capability modes comprise animation capability modes of the apparatus, and the apparatus comprises a sender device or a sender device comprises the apparatus; generate a transformable point cloud relating to a user and obtain real or virtual audio from one or more real or virtual audio sources; generate a conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object is represented as a container that includes one or more objects specific to the three-dimensional conversational interaction, wherein said one or more objects comprise at least data relating to the transformable point cloud, the real or virtual audio obtained from the one or more real or virtual audio sources, or an input obtained from one or more connected devices controlling at least the point cloud, and wherein said objects are linked to each other for seamless manipulation; apply the conversational scene description into metadata; and transmit the metadata with the respective audio to said one or more receivers.
2. The apparatus according to claim 1, wherein the point cloud represents a three-dimensional avatar humanoid with or without skeletal key points.
3. The apparatus according to claim 1, wherein the audio is segmented into audio sources.
4. The apparatus according to claim 1, wherein objects being referred from the conversational scene description are connected to one another by a same geometrical coordinate system relative to one global origin.
5. The apparatus according to claim 1, wherein the animation capability modes comprise a skeletal animation or a point cloud animation.
6. The apparatus of claim 1, wherein the container includes a respective data type of the one or more objects.
7. The apparatus of claim 1, wherein the apparatus is caused to: classify the real or virtual audio into audio segments based on a respective type of the real or virtual audio; wherein the real or virtual audio is classified into the audio segments using a neural network or machine learning model.
8. The apparatus of claim 1, wherein the apparatus is caused to: store a sequence of at least one audio segment object representing at least one audio segment of the real or virtual audio as a child object of an audio object used to represent the real or virtual audio.
9. The apparatus of claim 1, wherein the apparatus is caused to: represent a real or virtual audio source of the one or more real or virtual audio sources as an audio site object, wherein an audio object included within the container that represents the first dynamic object describing the virtual space for the three-dimensional conversational interaction includes a sequence of audio site objects within a container used to represent the audio object, wherein the sequence of audio site objects include the audio site object.
10. The apparatus of claim 1, wherein the apparatus is caused to: store a sequence of listenpoint objects within a container that represents an audio object that is included within the container that represents the first dynamic object describing the virtual space for the three-dimensional conversational interaction; wherein a listenpoint object of the listenpoint objects represents a point where a subject of the three-dimensional conversational interaction listens to the real or virtual audio emanating from the one or more real or virtual audio sources.
11. The apparatus of claim 1, wherein the conversational scene description comprises at least a second dynamic object describing a physical space for the three-dimensional conversational interaction, wherein the second dynamic object is represented as another container that includes one or more other objects specific to the three-dimensional conversational interaction.
12. The apparatus of claim 1, wherein the apparatus is further caused to: store a sequence of at least one audio site object as a child object of an audio segment object, wherein the at least one audio site object defines where to place the real or virtual audio for hearing at a location defined with an audio segment represented with the audio segment object, wherein the audio segment defines a type of the real or virtual audio.
13. An apparatus comprising: at least one processor; at least one memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to: establish a three-dimensional conversational interaction by indicating animation capability modes; wherein the three-dimensional conversational interaction is established by indicating animation capability modes; wherein the indicating of the animation capability modes used to establish the three-dimensional conversational interaction is performed during a negotiation session with a server or with a sender; wherein the animation capability modes comprise animation capability modes of the apparatus, and the apparatus comprises a receiver device or a receiver device comprises the apparatus; receive metadata with respective audio from the sender; unpack a conversational scene description from the metadata, the conversational scene description comprising at least a first dynamic object describing a virtual space for the three-dimensional conversational interaction, wherein the first dynamic object is represented as a container that includes one or more objects specific to the three-dimensional conversational interaction, wherein said one or more objects comprise at least data relating to a transformable point cloud, audio obtained from one or more real or virtual audio sources, or input obtained from one or more connected devices controlling at least the point cloud, and wherein said objects are linked to each other for seamless manipulation; compose a conversational scene described with the conversational scene description based on the objects and the respective audio; and render the conversational scene to a display.
14. The apparatus according to claim 13, wherein the point cloud represents a three-dimensional avatar humanoid with or without skeletal key points.
15. The apparatus according to claim 13, wherein the audio is segmented into audio sources.
16. The apparatus according to claim 13, wherein objects being referred from the conversational scene description are connected to one another by a same geometrical coordinate system relative to one global origin.
17. The apparatus according to claim 13, wherein the animation capability modes comprise a skeletal animation or a point cloud animation.
18. A method, comprising: establishing a three-dimensional conversational interaction with one or more receivers; wherein the three-dimensional conversational interaction is established by indicating animation capability modes; wherein the indicating of the animation capability modes used to establish the three-dimens…
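
Claims 1, 5 and 17 tie session setup to a negotiation of animation capability modes (skeletal or point cloud). Claims 4, 8, 10 and 12 describe an audio hierarchy in which audio segment objects carry audio site children and listenpoint objects mark where a participant listens, with all positions in one coordinate system relative to a global origin. The Python sketch below illustrates both ideas under stated assumptions. The "first common mode in sender preference order" policy, the class names and the `source_distances` helper are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]  # position relative to the shared global origin

# Animation capability modes named in claims 5 and 17.
SKELETAL = "skeletal"
POINT_CLOUD = "point_cloud"


def negotiate_mode(sender: Sequence[str], receiver: Sequence[str]) -> Optional[str]:
    """Return the first sender-preferred mode the receiver also supports.

    Assumed policy: the claims only say the modes are indicated during a
    negotiation session with a server or with the other endpoint.
    """
    for mode in sender:
        if mode in receiver:
            return mode
    return None  # no common mode, so no 3D interaction can be set up


@dataclass
class AudioSite:
    position: Vec3  # where this audio is placed for hearing (claim 12)


@dataclass
class AudioSegment:
    segment_type: str  # type of audio, e.g. "speech" (assumed label)
    sites: List[AudioSite] = field(default_factory=list)  # child audio site objects


@dataclass
class ListenPoint:
    position: Vec3  # where a participant listens (claim 10)


@dataclass
class AudioObject:
    segments: List[AudioSegment] = field(default_factory=list)  # claim 8
    listenpoints: List[ListenPoint] = field(default_factory=list)  # claim 10


def source_distances(audio: AudioObject) -> List[List[float]]:
    """Euclidean distance from each listenpoint to each audio site.

    Comparing positions is only meaningful because every object shares one
    coordinate system with a single global origin (claim 4).
    """
    sites = [site.position for seg in audio.segments for site in seg.sites]
    return [[math.dist(lp.position, p) for p in sites] for lp in audio.listenpoints]


# Usage
mode = negotiate_mode([SKELETAL, POINT_CLOUD], [POINT_CLOUD])
print(mode)  # point_cloud
audio = AudioObject(
    segments=[AudioSegment("speech", [AudioSite((0.0, 1.6, 0.0))])],
    listenpoints=[ListenPoint((3.0, 1.6, 4.0))],
)
print(source_distances(audio))  # [[5.0]]
```

Picking the first mode in the sender's preference order is only one simple policy. A server-mediated negotiation could instead rank modes by bandwidth or device capability.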

Assignees

Nokia Technologies Oy

Inventors

Classifications

G06T19/20
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
G06T17/00
Three-dimensional [3D] modelling for computer graphics · CPC title
G06F3/16
Sound input; Sound output (speech processing G10L) · CPC title
H04N19/597
specially adapted for multi-view video sequence encoding · CPC title
H04N19/25
with scene description coding, e.g. binary format for scenes [BIFS] compression · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 83598382

External sources

Related patents

Audiovisual presence transitions in a collaborative reality environment

Signaling of Scene Description For Multimedia Conferencing

Scene construction using object-based immersive media

Using gltf2 extensions to support video and audio data

Control of virtual objects based on gesture changes of users

Utilizing totems for augmented or virtual reality systems
