Who is the assignee on this patent?

Beijing Zitiao Network Technology Co Ltd

What technology area does this patent fall under?

Primary CPC classification H04N9/8715. Mapped technology areas include Electricity.

When was this patent published?

Publication date Nov 25, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Interactive information processing method, device and medium

US12483683B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12483683-B2
Application number	US-202418415223-A
Country	US
Kind code	B2
Filing date	Jan 17, 2024
Priority date	Sep 29, 2020
Publication date	Nov 25, 2025
Grant date	Nov 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are an interactive information processing method, an electronic device and a storage medium. The method includes establishing a correspondence between a multimedia data stream and a display text generated based on the multimedia data stream; presenting the multimedia data stream and the display text based on the correspondence; and in response to detecting a triggering operation triggering a display content in the display text, adjusting, based on a timestamp corresponding to the display content and the correspondence, the multimedia data stream to navigate to a playback position corresponding to the display content; the display content comprises a text corresponding to speech in the multimedia data stream; and the display text and the multimedia data stream are displayed on different display areas of a page respectively, and a display area occupied by the display text is not superimposed on a display area occupied by the multimedia data stream.
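The correspondence described in the abstract can be pictured as a list of text segments, each carrying the timestamp of the speech it was generated from. The following is a minimal sketch under that assumption, not the patented implementation: `Segment`, `Player`, `Transcript`, `on_segment_triggered` and `segment_at` are hypothetical names chosen for illustration, and `Player` is a stand-in that only records a playback position.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One piece of display text tied to a position in the stream (hypothetical shape)."""
    start_s: float  # timestamp of the corresponding speech, in seconds
    speaker: str
    text: str


class Player:
    """Stand-in for a media player; it only tracks the playback position."""

    def __init__(self) -> None:
        self.position_s = 0.0

    def seek(self, position_s: float) -> None:
        self.position_s = position_s


class Transcript:
    """Holds the text-to-timestamp correspondence for one stream."""

    def __init__(self, segments: List[Segment]) -> None:
        # Keep segments ordered by timestamp so lookups can use binary search.
        self.segments = sorted(segments, key=lambda s: s.start_s)
        self._starts = [s.start_s for s in self.segments]

    def on_segment_triggered(self, index: int, player: Player) -> None:
        """Text to stream: move playback to the triggered segment's timestamp."""
        player.seek(self.segments[index].start_s)

    def segment_at(self, position_s: float) -> Optional[int]:
        """Stream to text: index of the segment being spoken at position_s."""
        i = bisect_right(self._starts, position_s) - 1
        return i if i >= 0 else None
```

In a real page the text panel and the video panel would be separate, non-overlapping regions; a click handler on the text panel would call something like `on_segment_triggered`, while `segment_at` could drive highlighting of the current line during playback.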

First claim

Opening claim text (preview).

What is claimed is:

1. An interactive information processing method, comprising: establishing a correspondence between a multimedia data stream and a display text generated based on the multimedia data stream; presenting the multimedia data stream and the display text based on the correspondence; and in response to detecting a triggering operation triggering a first display content in the display text, adjusting, based on a timestamp corresponding to the first display content and the correspondence, the multimedia data stream to navigate to a playback position corresponding to the first display content; wherein the first display content comprises a text corresponding to speech in the multimedia data stream; and wherein the display text and the multimedia data stream are displayed on different display areas of a page respectively, and a display area occupied by the display text is not superimposed on a display area occupied by the multimedia data stream, wherein the interactive information processing method further comprises: acquiring an audio-video frame of the multimedia data stream, and determining a user identity of a speaking user corresponding to the audio-video frame; generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame; acquiring a search content edited in a search content editing control, and acquiring a target content corresponding to the search content from the display text, each target content is the same as the search content; displaying the target content differentially in the display text, and marking the audio-video frame corresponding to the target content in a controlling control corresponding to the multimedia data stream; and displaying the display text and the multimedia data stream on a target page, and wherein displaying the display text and the multimedia data stream on the target page comprises: displaying a first display text and a third display text in the display text and a recording screen video in preset display regions on the target page, respectively, wherein content displayed in the first display text are characters generated based on an audio frame comprised in the audio-video frame, the third display text comprises at least one keyword or at least one key sentence, determining a content corresponding to the target content from the first display text in response to detecting that the target content in the third display text is triggered, and displaying the content differentially.

2. The method of claim 1, wherein acquiring the audio-video frame of the multimedia data stream, and determining the user identity of the speaking user corresponding to the audio-video frame comprise at least one of: determining the user identity of the speaking user by performing a voiceprint recognition on the audio frame; or determining a client identity of a client to which the audio frame belongs, and determining the user identity of the speaking user based on the client identity.

3. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining a literal expression corresponding to the audio frame by performing a speech-to-text processing on the audio frame, and generating a first display text in the display text based on the literal expression and the user identity.

4. The method of claim 3, wherein obtaining the literal expression corresponding to the audio frame by performing the speech-to-text processing on the audio frame, and generating the first display text in the display text based on the literal expression and the user identity comprise: determining the literal expression corresponding to the audio frame, a timestamp currently corresponding to the audio frame and a user identity of a speaking user to which the audio frame belongs; and generating a second display content in the display text based on the user identity, the timestamp and the literal expression; wherein the second display content comprises at least one paragraph; and obtaining the literal expression corresponding to the audio frame by performing the speech-to-text processing on the audio frame, and generating the first display text in the display text based on the literal expression and the user identity comprise: in a process of performing the speech-to-text processing based on the audio frame, in response to detecting that an interval duration between adjacent audio frames is greater than or equal to a preset interval duration threshold and a user identity of a latter audio frame of the adjacent audio frames is not changed, generating a next paragraph in the second display content based on the latter audio frame.

5. The method of claim 3, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: determining, based on the audio-video frame, the third display text in the display text to determine the content corresponding to the target content from the first display text in response to detecting that the target content in the third display text is triggered, and display the content differentially.

6. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining characters in the video frame by performing an image-text recognition on the video frame, and generating a second display text in the display text based on the characters and the user identity.

7. The method of claim 6, wherein obtaining the second display text in the display text by performing the image-text recognition on the video frame comprises at least one of: in response to determining that the video frame comprises at least one uniform resource locator (URL) address, generating a third display content in the second display text based on the at least one URL address; or in response to determining that the video frame comprises a character, determining a fourth display content in the second display text based on the character.

8. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining an original language type corresponding to audio information in the audio-video frame; and generating the display text corresponding to the multimedia data stream based on the user identity, the audio-video frame and the original language type corresponding to the audio-video frame.

9. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: determining a target language type, and converting a text obtained by performing speech recognition on the audio-video frame, which corresponds to an original language type, to a literal expression corresponding to the target language type; and generating the display text based on the literal expression corresponding to the target language type and the user identity.

10. The method of claim 9, wherein determining the target language type comprises: acquiring a historical language type used by a current client, and determining the target language type based on the historical language type; wherein the historical language type comprises at least one language type; and determining the target language type based on the historical language type comprises at least one of: determining the target language type from the at least one historical language type based on a use freq
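The paragraph rule in claim 4 (start a new paragraph when the pause between adjacent audio frames reaches a preset threshold while the speaker stays the same) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `Utterance` and `Paragraph` shapes, the `build_paragraphs` name, the 2-second default threshold, and measuring the interval as the latter frame's start minus the former frame's end. Starting a new paragraph on a speaker change is also an assumption; the claim preview above does not state it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Utterance:
    """Speech-to-text result for one audio frame (hypothetical shape)."""
    speaker: str
    start_s: float
    end_s: float
    text: str


@dataclass
class Paragraph:
    speaker: str
    start_s: float  # timestamp shown next to the paragraph
    sentences: List[str] = field(default_factory=list)


def build_paragraphs(
    utterances: List[Utterance], gap_threshold_s: float = 2.0
) -> List[Paragraph]:
    """Group utterances (ordered by time) into speaker-labelled paragraphs."""
    paragraphs: List[Paragraph] = []
    previous: Optional[Utterance] = None
    for u in utterances:
        same_speaker = previous is not None and u.speaker == previous.speaker
        # Claim 4 condition: interval >= threshold with the speaker unchanged.
        long_pause = same_speaker and (u.start_s - previous.end_s) >= gap_threshold_s
        # Assumption: the first utterance and any speaker change also open a paragraph.
        if not same_speaker or long_pause:
            paragraphs.append(Paragraph(u.speaker, u.start_s))
        paragraphs[-1].sentences.append(u.text)
        previous = u
    return paragraphs
```

Each resulting `Paragraph` keeps the start timestamp of its first utterance, which is the kind of value a page could use to navigate playback when the paragraph is triggered.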

Assignees

Beijing Zitiao Network Technology Co Ltd

Classifications

G10L25/57 · for processing of video signals
G10L17/06 · Decision making techniques; Pattern matching strategies
G06F3/165 · Management of the audio stream, e.g. setting of volume, audio stream path
G06V20/40 · in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44)
G06V30/19 · Recognition using electronic means

Patent family

Related publications grouped by family.

View patent family 74119748
