Real-time video conference chat filtering using machine learning models

US12488792B2 · US · B2

Patent metadata
Publication number: US-12488792-B2
Application number: US-202318339138-A
Country: US
Kind code: B2
Filing date: Jun 21, 2023
Priority date: Oct 30, 2020
Publication date: Dec 2, 2025
Grant date: Dec 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, as a user is speaking or presenting content during an online video conference, the data stream may be processed to generate a textual representation (e.g., transcript) of the audio and/or information relating to the video. The textual representation and/or video related information may then be processed to determine a context or one or more topic(s) of discussion. Based on the determined context/topic(s), a corresponding neural network(s) may be selected. Once a neural network has been selected, comments may be retrieved from a chat feature of the application and applied to the neural network. The neural network may then output data to indicate the relevance of the comments to the determined discussion topic. Based on the relevance of the comment, the comment may be allowed, prioritized, deleted, de-emphasized, or otherwise filtered in the chat feature.
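
The abstract describes a pipeline: transcribe the stream, infer the topic of discussion, select a topic-specific neural network, score each chat comment for relevance, and then allow, prioritize, de-emphasize, or delete the comment. The following is a minimal Python sketch of that control flow, assuming the transcript is already available as text. Simple keyword-overlap scoring stands in for the neural networks. The topic vocabularies, the function names (`determine_topic`, `select_relevance_model`, `filter_action`, `filter_chat`), and the score thresholds are hypothetical illustrations, not anything disclosed in the patent.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical topic vocabularies standing in for trained topic/context models.
TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "gpu_architecture": {"gpu", "cuda", "kernel", "memory", "warp"},
    "quarterly_budget": {"budget", "revenue", "cost", "forecast", "quarter"},
}


def tokenize(text: str) -> List[str]:
    # Lowercase, strip simple punctuation, and drop empty tokens.
    return [w for w in (t.strip(".,!?;:").lower() for t in text.split()) if w]


def determine_topic(transcript: str) -> str:
    # Choose the topic whose vocabulary overlaps the transcript the most.
    words = set(tokenize(transcript))
    return max(TOPIC_KEYWORDS, key=lambda topic: len(words & TOPIC_KEYWORDS[topic]))


def select_relevance_model(topic: str) -> Callable[[str], float]:
    # Stand-in for selecting a topic-specific neural network: the returned scorer
    # reports the fraction of a comment's words that belong to the topic vocabulary.
    vocab = TOPIC_KEYWORDS[topic]

    def score(comment: str) -> float:
        words = tokenize(comment)
        return sum(w in vocab for w in words) / len(words) if words else 0.0

    return score


def filter_action(relevance: float) -> str:
    # Hypothetical thresholds mapping a relevance score to the abstract's actions.
    if relevance >= 0.5:
        return "prioritize"
    if relevance >= 0.2:
        return "allow"
    if relevance > 0.0:
        return "de-emphasize"
    return "delete"


def filter_chat(transcript: str, comments: List[str]) -> List[Tuple[str, str]]:
    # Transcript -> topic -> selected model -> per-comment relevance -> chat action.
    model = select_relevance_model(determine_topic(transcript))
    return [(c, filter_action(model(c))) for c in comments]


transcript = "Today we cover how the GPU schedules each warp inside a CUDA kernel."
comments = [
    "CUDA kernel memory",
    "Does the warp size affect kernel memory use?",
    "I love my new GPU and my cat and my dog",
    "What's for lunch?",
]
for comment, action in filter_chat(transcript, comments):
    print(f"{action:>12}: {comment}")
```

With this made-up transcript the GPU topic is selected, and the four comments are prioritized, allowed, de-emphasized, and deleted, respectively. In a real system each stage would be a trained model and the thresholds would be tuned rather than fixed.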

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: determining a context corresponding to audio data from a data stream; selecting, based at least on the determined context, at least one natural language processing (NLP) machine learning model from at least two NLP machine learning models, each of the at least two NLP machine learning models having a different level of breadth to evaluate a relevance of textual data to the context; computing, using the at least one NLP machine learning model and based at least on the textual data, data indicative of the relevance of the textual data to the determined context evaluated at the different level of breadth that corresponds to the at least one NLP machine learning model; and filtering, based at least on the relevance, the textual data for a display of the textual data in an interface of an application associated with the data stream.

2. The method of claim 1, further comprising determining a breadth level corresponding to the at least one NLP machine learning model, wherein the selecting is further based at least on the breadth level.

3. The method of claim 1, wherein the selecting is based at least on associating a breadth level with the at least one NLP machine learning model using a hierarchical data structure corresponding to the at least two NLP machine learning models with relative breadth levels corresponding to the context.

4. The method of claim 1, further comprising receiving the textual data from a chat feature of the application, wherein the display of the textual data is within the chat feature.

5. The method of claim 1, wherein the at least two NLP machine learning models include a first neural network to output a first prediction of the relevance evaluated at a first level of breadth and a second neural network to output a second prediction of the relevance evaluated at a second level of breadth.

6. The method of claim 1, further comprising: computing, using the at least one NLP machine learning model and based at least on the textual data, data indicating whether the textual data is offensive or inoffensive, wherein the display of the textual data in the interface of the application is further based at least on whether the textual data is offensive or inoffensive.

7. The method of claim 1, wherein the context comprises a topic of discussion for one or more users of the application.

8. The method of claim 1, wherein the application includes at least one of a game streaming application, a video conferencing application, or a video streaming application.

9. A system comprising: one or more processors to execute operations including: determining a context corresponding to a data stream; selecting, based at least on the determined context, at least one natural language processing (NLP) machine learning model from at least two NLP machine learning models, each of the at least two NLP machine learning models having a different level of breadth to evaluate a relevance of textual data to the context; computing, using the at least one NLP machine learning model and based at least on the textual data, data indicative of the relevance of the textual data to the determined context evaluated at the different level of breadth that corresponds to the at least one NLP machine learning model; and presenting, based at least on the relevance, the textual data using an interface of an application associated with the data stream.

10. The system of claim 9, wherein the operations further comprise determining a breadth level corresponding to the at least one NLP machine learning model, wherein the selecting is further based at least on the breadth level.

11. The system of claim 9, wherein the selecting is based at least on associating a breadth level with the at least one NLP machine learning model using a hierarchical data structure corresponding to the at least two NLP machine learning models with relative breadth levels corresponding to the context.

12. The system of claim 9, wherein the operations further include receiving the textual data from a chat feature of the application, wherein the presentation of the textual data is within the chat feature.

13. The system of claim 9, wherein the determining the context corresponding to the data stream includes applying audio data from the application to one or more NLP machine learning models.

14. The system of claim 9, further comprising: computing, using the at least one NLP machine learning model and based at least on the textual data, data indicating whether the textual data is offensive or inoffensive, wherein the presenting is further based at least on whether the textual data is offensive or inoffensive.

15. The system of claim 9, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for presenting at least one of virtual reality content or augmented reality content; a system incorporating one or more virtual machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources.

16. At least one processor comprising: one or more circuits to present textual data from an application associated with a data stream using data computed based at least on applying the textual data to at least one natural language processing (NLP) machine learning model, the at least one NLP machine learning model selected, based at least on a determined context of the data stream, from at least two NLP machine learning models, each of the at least two NLP machine learning models having a different level of scope to evaluate a relevance of the textual data to the context.

17. The at least one processor of claim 16, wherein the one or more circuits are further to determine a scope corresponding to the context, wherein the at least one NLP machine learning model is selected based at least on the scope corresponding to the context.

18. The at least one processor of claim 16, wherein the at least one NLP machine learning model is selected based at least on associating a scope corresponding to the context with the at least one NLP machine learning model using a hierarchical arrangement of the at least two NLP machine learning models with relative scopes corresponding to the context.

19. The at least one processor of claim 16, wherein the one or more circuits are further to receive the textual data from a chat feature of the application, wherein the presentation of the textual data is within the chat feature.

20. The at least one processor of claim 16, wherein the processor is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing light transport simulation; a system for performing deep learning operations; a system implemented using an edge device; a system implemented using a robot; a system for presenting at least one of virtual reality content or augmented reality content; a system incorporating one or more virtual machines (VMs); a sy
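
Claims 1 to 3 and 6 describe three elements: choosing among relevance models that differ in breadth, associating a breadth level with a model through a hierarchy tied to the context, and separately checking whether the text is offensive. The following is a minimal sketch of that selection step under stated assumptions. The topic tree, the names `TopicNode`, `select_model`, `relevance`, and `should_present`, the keyword-overlap scorer, the 0.2 threshold, and the `OFFENSIVE_TERMS` blocklist are all hypothetical stand-ins for the trained NLP models the claims recite. None of them is the patented implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical blocklist standing in for claim 6's offensive/inoffensive classifier.
OFFENSIVE_TERMS: Set[str] = {"idiot", "stupid"}


def tokenize(text: str) -> List[str]:
    # Lowercase, strip simple punctuation, and drop empty tokens.
    return [w for w in (t.strip(".,!?;:").lower() for t in text.split()) if w]


@dataclass(eq=False)
class TopicNode:
    # One node (one "model") in a hypothetical topic hierarchy; depth 0 is broadest.
    name: str
    keywords: Set[str]
    parent: Optional["TopicNode"] = field(default=None, repr=False)
    children: List["TopicNode"] = field(default_factory=list)

    def add(self, child: "TopicNode") -> "TopicNode":
        child.parent = self
        self.children.append(child)
        return child

    def depth(self) -> int:
        return 0 if self.parent is None else 1 + self.parent.depth()

    def vocabulary(self) -> Set[str]:
        # A broader node accepts its own terms plus every descendant's terms.
        vocab = set(self.keywords)
        for child in self.children:
            vocab |= child.vocabulary()
        return vocab


def select_model(context: TopicNode, breadth_level: int) -> TopicNode:
    # Walk up from the node matching the determined context until reaching the
    # requested breadth level (a smaller level means a broader scope).
    node = context
    while node.depth() > breadth_level and node.parent is not None:
        node = node.parent
    return node


def relevance(model: TopicNode, text: str) -> float:
    # Fraction of the comment's words found in the selected node's vocabulary.
    words = tokenize(text)
    if not words:
        return 0.0
    vocab = model.vocabulary()
    return sum(w in vocab for w in words) / len(words)


def should_present(model: TopicNode, text: str, threshold: float = 0.2) -> bool:
    # Hide offensive comments outright; otherwise show only sufficiently relevant ones.
    if any(w in OFFENSIVE_TERMS for w in tokenize(text)):
        return False
    return relevance(model, text) >= threshold


technology = TopicNode("technology", {"software", "hardware", "computer"})
gpus = technology.add(TopicNode("gpus", {"gpu", "graphics", "vram"}))
cuda = gpus.add(TopicNode("cuda", {"cuda", "kernel", "warp"}))

comment = "How much VRAM does that GPU have?"
print(should_present(select_model(cuda, 2), comment))  # False: no CUDA-level terms match
print(should_present(select_model(cuda, 1), comment))  # True: "vram" and "gpu" match
```

The narrow CUDA-level model rejects a question about VRAM, but the broader GPU-level model accepts it. That difference is the practical effect of choosing a breadth level. Walking up a parent chain is only one simple way to represent the claimed hierarchical data structure.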

Assignees

Nvidia Corp

Inventors

Classifications

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Interoperability with other network applications or services · CPC title

  • using artificial neural networks · CPC title

  • Word spotting · CPC title

  • Learning methods · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12488792B2 cover?
In various examples, as a user is speaking or presenting content during an online video conference, the data stream may be processed to generate a textual representation (e.g., transcript) of the audio and/or information relating to the video. The textual representation and/or video related information may then be processed to determine a context or one or more topic(s) of discussion. Based on …
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/1815. Mapped technology areas include Physics.
When was this patent published?
Published on Dec 2, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).