Text-to-3D avatars
US-12340480-B2 · Jun 24, 2025 · US
US12561878B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12561878-B2 |
| Application number | US-202418440889-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 13, 2024 |
| Priority date | Feb 13, 2023 |
| Publication date | Feb 24, 2026 |
| Grant date | Feb 24, 2026 |
Official abstract text for this publication.
The present disclosure provides a text-driven motion recommendation and neural mesh stylization system and a method producing human mesh animation using the same. The system comprises at least one instruction stored in a memory, and a processor that executes the at least one instruction, wherein the at least one instruction, when executed by the processor, causes the processor to find raw action labels matching a query given as a text prompt in a human motion dataset stored in a database, encode the raw action labels and the query for vectorizing the raw action labels and the query, and measure similarity between the raw action labels and the query based on the vectorized vectors.
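The retrieval step described in the abstract (encode the action labels and the query, then compare the vectors) can be illustrated with a minimal sketch. This is not the patented implementation: the text encoder is not specified in the material shown here, so a bag-of-words count vector and cosine similarity are swapped in as assumptions. The function names `encode`, `cosine`, and `top_k_labels` and the sample labels are hypothetical.

```python
import math
from collections import Counter


def encode(text):
    """Stand-in encoder (assumption): bag-of-words token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_labels(query, labels, k=3):
    """Score every label against the query and keep the k best.

    Returns (index, label, score) tuples, highest score first.
    Ties are broken by the lower label index.
    """
    q = encode(query)
    scored = [(cosine(q, encode(label)), i) for i, label in enumerate(labels)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, labels[i], score) for score, i in scored[:k]]


# Hypothetical raw action labels standing in for a motion dataset.
labels = ["walk forward", "jump in place", "run forward fast", "sit down"]
ranked = top_k_labels("run forward", labels, k=2)

# The highest-scored label is taken as the final match.
best_index, best_label, best_score = ranked[0]
```

In this sketch the top-k list and the final pick come from a single scoring pass. The claims describe a top-k filter followed by a second vectorization of the query and the top-k labels, which a real system would presumably implement with a learned text encoder.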
Opening claim text (preview).
What is claimed is:
1. A text-driven motion recommendation and neural mesh stylization system comprising: at least one instruction stored in a memory; and a processor that executes the at least one instruction, wherein the at least one instruction, when executed by the processor, causes the processor to: find raw action labels matching a query given as a text prompt in a human motion dataset stored in a database; encode the raw action labels and the query for vectorizing the raw action labels and the query; measure similarity between the raw action labels and the query based on vectorized vectors to obtain content meshes; obtain style attributes comprising color and displacement from a decoupled neural style field (DNSF) network that takes a template human mesh and learn text-driven style attributes; and apply the style attributes to the content meshes to obtain a human mesh sequence in motion.
2. The system of claim 1, wherein the at least one instruction, when executed by the processor, further causes the processor to: select a plurality of indices of the raw action labels based on the measured similarity; and retrieve top-k action labels corresponding the plurality of indices by a top-k filter from encoded motion datasets with the raw action labels.
3. The system of claim 2, wherein the at least one instruction, when executed by the processor, further causes the processor to: vectorize the query and the top-k action labels; and retrieve a highest-scored raw action label as a final matched result for the input text prompt.
4. The system of claim 3, wherein the at least one instruction, when executed by the processor, further causes the processor to: find a best semantically matched motion sequence from a motion database based on the highest-scored raw action label; and sample the content meshes in multi-modal context corresponding to the best semantically matched motion sequence.
5. The system of claim 1, wherein the at least one instruction, when executed by the processor, further causes the processor to: map the style attributes from the template human mesh and merge the style attributes mapped from the template human mesh with the content meshes by the DNSF network.
6. The system of claim 5, wherein the at least one instruction, when executed by the processor, further causes the processor to: achieve a same mesh stylization as a basic neural style field while decoupling a style from a content mesh.
7. The system of claim 1, wherein the at least one instruction, when executed by the processor, further causes the processor to: detailize and texturize the human mesh sequence by optimizing the DNSF network in a temporally-consistent and pose-agnostic manner.
8. The system of claim 7, wherein the at least one instruction, when executed by the processor, further causes the processor to: compute a semantic loss between the text prompt and a text obtained by encoding the detailized and texturized human mesh sequence for optimizing the DNSF network.
9. A method for producing human mesh animation performed by a processor, the method comprising: finding raw action labels matching a query given as a text prompt in a human motion dataset stored in a database; encoding the raw action labels and the query for vectorizing the raw action labels and the query; measuring similarity between the raw action labels and the query based on vectorized vectors to obtain content meshes; obtaining style attributes comprising color and displacement from a decoupled neural style field (DNSF) network that takes a template human mesh and learn text-driven style attributes; and applying the style attributes to the content meshes to obtain a human mesh sequence in motion.
10. The method of claim 9, further comprising: selecting a plurality of indices of the raw action labels based on the measured similarity; and retrieving top-k action labels corresponding the plurality of indices by a top-k filter from encoded motion datasets with the raw action labels.
11. The method of claim 10, further comprising: vectorizing the query and the top-k action labels; and retrieving a highest-scored raw action label as a final matched result for the text prompt.
12. The method of claim 11, further comprising: finding a best semantically matched motion sequence from a motion database based on the highest-scored raw action label; and sampling the content meshes in multi-modal context corresponding to the best semantically matched motion sequence.
13. The method of claim 9, further comprising: mapping the style attributes from the template human mesh and merge the style attributes mapped from the template human mesh with the content meshes by the DNSF network.
14. The method of claim 13, further comprising: achieving a same mesh stylization as a basic neural style field while decoupling a style from a content mesh.
15. The method of claim 9, further comprising: detailizing and texturizing the human mesh sequence by optimizing the DNSF network in a temporally-consistent and pose-agnostic manner.
16. The method of claim 15, further comprising: computing a semantic loss between the text prompt and a text obtained by encoding the detailized and texturized human mesh sequence for optimizing the DNSF network.
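The final step of claims 1 and 9, applying color and displacement style attributes to the content meshes, might look roughly like the sketch below. Several things here are assumptions and not taken from the claims: the displacement is treated as a per-vertex scalar applied along the vertex normal, the normals are supplied as input, and every frame is assumed to share the template's vertex ordering so that one set of style attributes can be reused across the sequence. The names `apply_style` and `stylize_sequence` are hypothetical, and the DNSF network that would produce the attributes is not modeled.

```python
def apply_style(vertices, normals, displacements, colors):
    """Offset each vertex along its normal and attach a per-vertex color.

    Assumption: displacement is one scalar per vertex, applied along
    the unit normal of that vertex.
    """
    styled = []
    for (x, y, z), (nx, ny, nz), d in zip(vertices, normals, displacements):
        styled.append((x + d * nx, y + d * ny, z + d * nz))
    return styled, list(colors)


def stylize_sequence(frames, displacements, colors):
    """Apply the same style attributes to every frame of a motion sequence.

    Each frame is a (vertices, normals) pair; the style attributes are
    indexed by vertex, so they do not depend on the pose of the frame.
    """
    return [apply_style(v, n, displacements, colors) for v, n in frames]


# Two toy frames of a two-vertex "mesh" in different poses.
frames = [
    ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]),
    ([(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]),
]
displacements = [0.5, 0.5]                      # hypothetical style output
colors = [(255, 0, 0), (0, 0, 255)]             # hypothetical style output

styled_frames = stylize_sequence(frames, displacements, colors)
```

Keeping the style attributes indexed by vertex, independent of any single frame, is one way to read the "decoupled" and "pose-agnostic" wording in claims 6 and 7, though the patent's full description would be the authority on how the decoupling is actually achieved.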
Texture mapping · CPC title
Style variation · CPC title
Semantic analysis · CPC title
Colour editing, changing, or manipulating; Use of colour codes · CPC title
Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title