Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification H04N21/2187. Mapped technology areas include Electricity.

When was this patent published?

Publication date: Dec 30, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Video rendering method for live broadcast scene, electronic device and storage medium

US12513348B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12513348-B2
Application number	US-202418747082-A
Country	US
Kind code	B2
Filing date	Jun 18, 2024
Priority date	Dec 12, 2023
Publication date	Dec 30, 2025
Grant date	Dec 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a video rendering method for a live broadcast scene, relating to the field of live broadcast and the field of large model. The method includes: recording a live broadcast of an anchor to obtain a first video stream; performing speech recognition on live speech in the first video stream to obtain first text information; determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information; determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition; rendering virtual characters based on the reply text information to obtain a second video stream; and generating a third video stream of the anchor chatting with the virtual characters based on the first video stream and the second video stream.
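The control flow described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the patented implementation: every function name and parameter is hypothetical, streams are stood in for by plain strings, and the "first set condition" is left as a caller-supplied predicate because the abstract does not say what it is. Passing the first stream through unchanged when the condition is not met is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    first_text: str            # speech-recognition output for the anchor
    popularity: float          # topic popularity score
    reply_text: Optional[str]  # None when no reply was generated
    third_stream: str          # stream that would be shown to viewers


def render_live_scene(
    first_stream: str,
    audience_responses: List[str],
    recognize_speech: Callable[[str], str],
    score_popularity: Callable[[List[str], str], float],
    meets_first_condition: Callable[[float], bool],
    generate_reply: Callable[[str], str],
    render_characters: Callable[[str], str],
    mix_streams: Callable[[str, str], str],
) -> Result:
    # Speech recognition on the recorded live stream.
    first_text = recognize_speech(first_stream)
    # Popularity depends on audience responses and the recognized text.
    popularity = score_popularity(audience_responses, first_text)
    if not meets_first_condition(popularity):
        # Assumption: with the condition unmet, the anchor's stream is
        # passed through and no virtual character is rendered.
        return Result(first_text, popularity, None, first_stream)
    # Reply text drives the rendering of the virtual characters.
    reply_text = generate_reply(first_text)
    second_stream = render_characters(reply_text)
    # Combine the anchor's stream and the rendered characters.
    third_stream = mix_streams(first_stream, second_stream)
    return Result(first_text, popularity, reply_text, third_stream)


def demo(responses: List[str]) -> Result:
    # Toy stand-ins for every stage; the threshold of 2 is arbitrary.
    return render_live_scene(
        "anchor-video",
        responses,
        recognize_speech=lambda s: "which phone should I buy",
        score_popularity=lambda r, t: float(len(r)),
        meets_first_condition=lambda p: p < 2,
        generate_reply=lambda t: "reply to: " + t,
        render_characters=lambda t: "avatar-video[" + t + "]",
        mix_streams=lambda a, b: a + "+" + b,
    )
```

Keeping each stage as an injected callable means the gating logic can be exercised without any real speech, language, or rendering component.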

First claim

Opening claim text (preview).

What is claimed is:

1. A video rendering method for a live broadcast scene, comprising: recording a live broadcast of an anchor to obtain a first video stream; performing speech recognition on live speech in the first video stream to obtain first text information; determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information; determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition; rendering virtual characters based on the reply text information to obtain a second video stream; and generating a third video stream of the anchor chatting with the virtual characters based on the first video stream and the second video stream.

2. The method of claim 1, wherein the determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information, comprises: extracting at least one first text segment from the first text information; for each first text segment, searching the audience response information for a second text segment that can form a key-value pair with the first text segment, and counting the number of key-value pairs; and determining the topic popularity of the live broadcast based on the number of key-value pairs.

3. The method of claim 1, wherein the determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition, comprises: extracting keywords from the first text information to obtain a plurality of keywords when the topic popularity of the live broadcast meets the first set condition; performing topic classification on the plurality of keywords to obtain at least one topic set; determining a topic repetition degree of the live broadcast based on the number of topic sets; and determining the corresponding reply text information based on the first text information when the topic repetition degree of the live broadcast meets a second set condition.

4. The method of claim 1, wherein the virtual characters comprise N virtual characters, N is a positive integer greater than 1, and the determining the corresponding reply text information based on the first text information comprises: determining a corresponding target text generation model among M text generation models based on a style of a first virtual character among the N virtual characters, wherein M is a positive integer greater than 1; inputting the first text information into the target text generation model corresponding to the first virtual character to obtain reply text information of the first virtual character; for an i-th virtual character among the N virtual characters, performing operations of: determining a corresponding target text generation model among the M text generation models based on a style of the i-th virtual character, wherein i is a positive integer greater than 1; and inputting the first text information and reply text information of the first virtual character to an (i−1)-th virtual character into the target text generation model corresponding to the i-th virtual character to obtain reply text information of the i-th virtual character.

5. The method of claim 4, wherein the rendering the virtual characters based on the reply text information to obtain a second video stream, comprises: rendering each virtual character based on the reply text information of each virtual character to obtain a second video stream of each virtual character.

6. The method of claim 5, wherein the generating a third video stream of the anchor chatting with the virtual characters based on the first video stream and the second video stream, comprises: mixing the first video stream with the second video stream of each virtual character based on a generation order of the reply text information of the virtual characters, to obtain a third video stream of the anchor chatting with each virtual character.

7. The method of claim 1, wherein the determining the corresponding reply text information based on the first text information, comprises: processing the first text information based on styles of the virtual characters to obtain second text information; and processing the second text information based on a text generation model to obtain the reply text information of the virtual characters.

8. An electronic device, comprising: at least one processor; and a memory connected in communication with the at least one processor, wherein the memory stores an instruction executable by the at least one processor, and the instruction, when executed by the at least one processor, enables the at least one processor to execute: recording a live broadcast of an anchor to obtain a first video stream; performing speech recognition on live speech in the first video stream to obtain first text information; determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information; determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition; rendering virtual characters based on the reply text information to obtain a second video stream; and generating a third video stream of the anchor chatting with the virtual characters based on the first video stream and the second video stream.

9. A non-transitory computer-readable storage medium storing a computer instruction thereon, wherein the computer instruction is used to cause a computer to execute: recording a live broadcast of an anchor to obtain a first video stream; performing speech recognition on live speech in the first video stream to obtain first text information; determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information; determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition; rendering virtual characters based on the reply text information to obtain a second video stream; and generating a third video stream of the anchor chatting with the virtual characters based on the first video stream and the second video stream.

10. The electronic device of claim 8, wherein the determining topic popularity of the live broadcast based on audience response information in a process of recording the live broadcast and the first text information, comprises: extracting at least one first text segment from the first text information; for each first text segment, searching the audience response information for a second text segment that can form a key-value pair with the first text segment, and counting the number of key-value pairs; and determining the topic popularity of the live broadcast based on the number of key-value pairs.

11. The electronic device of claim 8, wherein the determining corresponding reply text information based on the first text information when the topic popularity of the live broadcast meets a first set condition, comprises: extracting keywords from the first text information to obtain a plurality of keywords when the topic popularity of the live broadcast meets the first set condition; performing topic classification on the plurality of keywords to obtain at least one topic set; determining a topic repetition degree o
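Two of the dependent claims describe steps that are easy to picture in code: the key-value pair count of claim 2 and the chained reply generation of claim 4. The sketch below is a loose illustration under stated assumptions, not the claimed method. The claims do not say what makes two text segments "form a key-value pair", so a shared word of at least four characters is used here as a placeholder rule, and every match is counted. The text generation models are represented by hypothetical callables keyed by character style.

```python
import re
from typing import Callable, Dict, List, Sequence, Set, Tuple

# A "model" here is any callable taking the anchor's text and the
# replies produced so far (hypothetical interface, not from the patent).
Model = Callable[[str, List[str]], str]


def split_segments(text: str) -> List[str]:
    # Assumption: first text segments are sentences.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_key_value_pairs(
    first_text: str, audience_responses: Sequence[str], min_len: int = 4
) -> List[Tuple[str, str]]:
    """Pair each anchor segment (key) with matching responses (values)."""
    pairs = []
    for segment in split_segments(first_text):
        keys = {w for w in _words(segment) if len(w) >= min_len}
        for response in audience_responses:
            # Placeholder pairing rule: at least one shared longer word.
            if keys & _words(response):
                pairs.append((segment, response))
    return pairs


def topic_popularity(first_text: str, audience_responses: Sequence[str]) -> int:
    # Assumption: popularity is simply the number of pairs found.
    return len(find_key_value_pairs(first_text, audience_responses))


def chained_replies(
    first_text: str, styles: Sequence[str], models: Dict[str, Model]
) -> List[str]:
    """Generate one reply per character, in order.

    The first character sees only the anchor's text; the i-th character
    also sees the replies of characters 1 to i-1.
    """
    replies: List[str] = []
    for style in styles:
        model = models[style]  # pick the model matching the style
        replies.append(model(first_text, list(replies)))
    return replies
```

For example, with the transcript "Do you like this phone? The camera is great." and the responses "love the camera", "hello everyone", and "which phone is it", the placeholder rule finds two pairs. The order of the list returned by `chained_replies` is the generation order that claim 6 refers to when mixing streams.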

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Wang Chenfei

Classifications

G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G06T17/00
Three-dimensional [3D] modelling for computer graphics · CPC title
G10L13/00
Speech synthesis; Text to speech systems · CPC title
H04L51/10
Multimedia information · CPC title
H04L51/52
for supporting social networking services · CPC title

Patent family

Related publications grouped by family.

View patent family 90421018
