Interactive information processing method, device and medium
US-11917344-B2 · Feb 27, 2024 · US
US12483683B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12483683-B2 |
| Application number | US-202418415223-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 17, 2024 |
| Priority date | Sep 29, 2020 |
| Publication date | Nov 25, 2025 |
| Grant date | Nov 25, 2025 |
Official abstract text for this publication.
Disclosed are an interactive information processing method, an electronic device and a storage medium. The method includes establishing a correspondence between a multimedia data stream and a display text generated based on the multimedia data stream; presenting the multimedia data stream and the display text based on the correspondence; and in response to detecting a triggering operation triggering a display content in the display text, adjusting, based on a timestamp corresponding to the display content and the correspondence, the multimedia data stream to navigate to a playback position corresponding to the display content; the display content comprises a text corresponding to speech in the multimedia data stream; and the display text and the multimedia data stream are displayed on different display areas of a page respectively, and a display area occupied by the display text is not superimposed on a display area occupied by the multimedia data stream.
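The correspondence described in the abstract can be pictured as a list of text segments, each carrying the timestamp of the speech it was generated from. The following is a minimal sketch under stated assumptions, not the patented implementation: `Segment`, `Player`, `Transcript`, and `on_click` are hypothetical names, timestamps are assumed to be seconds from the start of the stream, and `Player` is a stand-in that only records the playback position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One piece of display content tied to a position in the stream."""
    start: float   # timestamp in seconds (assumed unit)
    speaker: str   # user identity of the speaking user
    text: str      # text corresponding to the speech


class Player:
    """Hypothetical stand-in for a media player; it only tracks position."""

    def __init__(self) -> None:
        self.position = 0.0

    def seek(self, seconds: float) -> None:
        self.position = seconds


class Transcript:
    """Holds the correspondence between display text and the stream."""

    def __init__(self, segments):
        # Keep segments in playback order so indices match the displayed text.
        self.segments = sorted(segments, key=lambda s: s.start)

    def on_click(self, index: int, player: Player) -> None:
        """Triggering a display content moves playback to its timestamp."""
        player.seek(self.segments[index].start)


transcript = Transcript([
    Segment(12.5, "user-b", "Let's review the schedule."),
    Segment(0.0, "user-a", "Welcome, everyone."),
])
player = Player()
transcript.on_click(1, player)  # user triggers the second displayed segment
print(player.position)
```

In a real page the text and the player would occupy separate, non-overlapping display areas, as the abstract requires; the sketch covers only the timestamp lookup.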
Opening claim text (preview).
What is claimed is:
1. An interactive information processing method, comprising: establishing a correspondence between a multimedia data stream and a display text generated based on the multimedia data stream; presenting the multimedia data stream and the display text based on the correspondence; and in response to detecting a triggering operation triggering a first display content in the display text, adjusting, based on a timestamp corresponding to the first display content and the correspondence, the multimedia data stream to navigate to a playback position corresponding to the first display content; wherein the first display content comprises a text corresponding to speech in the multimedia data stream; and wherein the display text and the multimedia data stream are displayed on different display areas of a page respectively, and a display area occupied by the display text is not superimposed on a display area occupied by the multimedia data stream, wherein the interactive information processing method further comprises: acquiring an audio-video frame of the multimedia data stream, and determining a user identity of a speaking user corresponding to the audio-video frame; generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame; acquiring a search content edited in a search content editing control, and acquiring a target content corresponding to the search content from the display text, each target content is the same as the search content; displaying the target content differentially in the display text, and marking the audio-video frame corresponding to the target content in a controlling control corresponding to the multimedia data stream; and displaying the display text and the multimedia data stream on a target page, and wherein displaying the display text and the multimedia data stream on the target page comprises: displaying a first display text and a third display text in the display text and a recording screen video in preset display regions on the target page, respectively, wherein content displayed in the first display text are characters generated based on an audio frame comprised in the audio-video frame, the third display text comprises at least one keyword or at least one key sentence, determining a content corresponding to the target content from the first display text in response to detecting that the target content in the third display text is triggered, and displaying the content differentially.
2. The method of claim 1, wherein acquiring the audio-video frame of the multimedia data stream, and determining the user identity of the speaking user corresponding to the audio-video frame comprise at least one of: determining the user identity of the speaking user by performing a voiceprint recognition on the audio frame; or determining a client identity of a client to which the audio frame belongs, and determining the user identity of the speaking user based on the client identity.
3. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining a literal expression corresponding to the audio frame by performing a speech-to-text processing on the audio frame, and generating a first display text in the display text based on the literal expression and the user identity.
4. The method of claim 3, wherein obtaining the literal expression corresponding to the audio frame by performing the speech-to-text processing on the audio frame, and generating the first display text in the display text based on the literal expression and the user identity comprise: determining the literal expression corresponding to the audio frame, a timestamp currently corresponding to the audio frame and a user identity of a speaking user to which the audio frame belongs; and generating a second display content in the display text based on the user identity, the timestamp and the literal expression; wherein the second display content comprises at least one paragraph; and obtaining the literal expression corresponding to the audio frame by performing the speech-to-text processing on the audio frame, and generating the first display text in the display text based on the literal expression and the user identity comprise: in a process of performing the speech-to-text processing based on the audio frame, in response to detecting that an interval duration between adjacent audio frames is greater than or equal to a preset interval duration threshold and a user identity of a latter audio frame of the adjacent audio frames is not changed, generating a next paragraph in the second display content based on the latter audio frame.
5. The method of claim 3, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: determining, based on the audio-video frame, the third display text in the display text to determine the content corresponding to the target content from the first display text in response to detecting that the target content in the third display text is triggered, and display the content differentially.
6. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining characters in the video frame by performing an image-text recognition on the video frame, and generating a second display text in the display text based on the characters and the user identity.
7. The method of claim 6, wherein obtaining the second display text in the display text by performing the image-text recognition on the video frame comprises at least one of: in response to determining that the video frame comprises at least one uniform resource locator (URL) address, generating a third display content in the second display text based on the at least one URL address; or in response to determining that the video frame comprises a character, determining a fourth display content in the second display text based on the character.
8. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: obtaining an original language type corresponding to audio information in the audio-video frame; and generating the display text corresponding to the multimedia data stream based on the user identity, the audio-video frame and the original language type corresponding to the audio-video frame.
9. The method of claim 1, wherein generating the display text corresponding to the multimedia data stream based on the user identity and the audio-video frame comprises: determining a target language type, and converting a text obtained by performing speech recognition on the audio-video frame, which corresponds to an original language type, to a literal expression corresponding to the target language type; and generating the display text based on the literal expression corresponding to the target language type and the user identity.
10. The method of claim 9, wherein determining the target language type comprises: acquiring a historical language type used by a current client, and determining the target language type based on the historical language type; wherein the historical language type comprises at least one language type; and determining the target language type based on the historical language type comprises at least one of: determining the target language type from the at least one historical language type based on a use freq
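The paragraph rule in claim 4 (start a next paragraph when the pause between adjacent audio frames reaches a preset threshold and the speaker is unchanged) can be sketched as follows. This is an illustrative sketch only. `AudioFrame`, `Paragraph`, `build_paragraphs`, and the 2-second default threshold are hypothetical; the interval is assumed to be measured from the end of one frame to the start of the next; and starting a new paragraph on a speaker change is an added assumption that the claim text shown here does not state.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFrame:
    start: float   # seconds (assumed unit)
    end: float
    speaker: str   # user identity of the speaking user
    text: str      # literal expression from speech-to-text


@dataclass
class Paragraph:
    speaker: str
    timestamp: float                      # start time of the first frame
    sentences: list = field(default_factory=list)


def build_paragraphs(frames, gap_threshold=2.0):
    """Group transcribed frames into paragraphs.

    gap_threshold is a hypothetical preset interval duration threshold.
    """
    paragraphs = []
    prev = None
    for frame in frames:
        # Assumption: a change of speaker always opens a new paragraph.
        speaker_changed = prev is None or frame.speaker != prev.speaker
        # Claim 4 rule: same speaker, pause >= threshold -> next paragraph.
        long_pause = prev is not None and frame.start - prev.end >= gap_threshold
        if speaker_changed or long_pause:
            paragraphs.append(Paragraph(frame.speaker, frame.start))
        paragraphs[-1].sentences.append(frame.text)
        prev = frame
    return paragraphs


frames = [
    AudioFrame(0.0, 1.0, "user-a", "Good morning."),
    AudioFrame(1.5, 2.0, "user-a", "Let's begin."),
    AudioFrame(5.0, 6.0, "user-a", "First item."),
    AudioFrame(6.2, 7.0, "user-b", "I have a question."),
]
for p in build_paragraphs(frames):
    print(p.speaker, p.timestamp, " ".join(p.sentences))
```

Each paragraph keeps the timestamp of its first frame, so it can also serve as a navigation target in the manner described in claim 1.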
for processing of video signals · CPC title
Decision making techniques; Pattern matching strategies · CPC title
Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title
Recognition using electronic means · CPC title