Video conference transcript querying using artificial intelligence

US12499894B1 · US · B1

Patent metadata
Publication number: US-12499894-B1
Application number: US-202318383751-A
Country: US
Kind code: B1
Filing date: Oct 25, 2023
Priority date: Sep 15, 2023
Publication date: Dec 16, 2025
Grant date: Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for video conference transcript querying using artificial intelligence are provided. In an example method, a video conference provider joins a first client device of a plurality of client devices to a video conference hosted by a video conference provider. The video conference provider receives an audio stream from the first client device and generates, based on the audio stream, a portion of a transcript of the video conference. The video conference provider processes the portion of the transcript to configure an AI service to respond to queries based on the video conference. The video conference provider receives a query relating to the video conference and causes the AI service to process the query and the portion of the transcript. The video conference provider outputs a response, generated by the AI service, to the query. The video conference provider then deletes the transcript, responsive to the video conference concluding.
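
The lifecycle in the abstract (accumulate transcript portions, answer queries while the conference is live, delete the transcript when it concludes) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `ConferenceSession`, `echo_service`, and the in-memory list standing in for transcript storage. The toy `echo_service` stands in for the AI service, which the patent does not specify at this level.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for the AI service: maps (query, context) to a response.
AIService = Callable[[str, str], str]


def echo_service(query: str, context: str) -> str:
    """Toy AI service (an assumption): reports how much context it received."""
    return f"{query!r} answered from {len(context.split())} transcript words"


@dataclass
class ConferenceSession:
    """Illustrative session that keeps transcript text only while the call is live."""
    ai_service: AIService
    transcript: List[str] = field(default_factory=list)
    concluded: bool = False

    def add_transcript_portion(self, text: str) -> None:
        # Append a newly generated portion of the transcript.
        if self.concluded:
            raise RuntimeError("conference concluded; no new portions accepted")
        self.transcript.append(text)

    def query(self, question: str) -> str:
        # Queries are answered only before the conference concludes.
        if self.concluded:
            raise RuntimeError("conference concluded; transcript was deleted")
        return self.ai_service(question, " ".join(self.transcript))

    def conclude(self) -> None:
        # Responsive to the conference concluding, drop the stored transcript.
        self.transcript.clear()
        self.concluded = True


session = ConferenceSession(ai_service=echo_service)
session.add_transcript_portion("Alice: ship the release on Friday")
print(session.query("What was decided?"))
session.conclude()
```

In this sketch deletion is just clearing a list; a real deployment would also have to clear any cache or disk copy, which is outside what this example shows.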

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: joining a first client device of a plurality of client devices to a video conference hosted by a video conference provider; receiving an audio stream from the first client device; generating, based on the audio stream, a portion of a transcript of the video conference; prior to the video conference concluding: processing the portion of the transcript to configure an AI service to respond to queries based on the video conference, including the portion of the transcript; receiving a query relating to the video conference; causing the AI service to process the query and the portion of the transcript; and outputting a response, generated by the AI service, to the query; and responsive to the video conference concluding, deleting the portion of the transcript for configuring the AI service.

2. The method of claim 1, further comprising responsive to the video conference concluding, deleting the transcript.

3. The method of claim 1, further comprising receiving a deletion election, wherein deleting the portion of the transcript for configuring the AI service is further responsive to the deletion election.

4. The method of claim 1, wherein generating the portion of the transcript of the video conference comprises: determining, from the audio stream, one or more audio stream portions; responsive to an end of meeting signal, generating the transcript based on the one or more audio stream portions; and designating the transcript as the portion of the transcript.

5. The method of claim 1, wherein generating the portion of the transcript of the video conference comprises: determining, from the audio stream, one or more audio stream portions; and generating, based on the one or more audio stream portions, a first portion of the transcript.

6. The method of claim 5, wherein the one or more audio stream portions comprise one or more utterances, each utterance comprising at least a portion of a word.

7. The method of claim 5, wherein generating the portion of the transcript of the video conference further comprises: determining, from the audio stream, one or more second audio stream portions; generating, based on the one or more second audio stream portions, a second portion of the transcript; generating, based on the first portion of the transcript and the second portion of the transcript, the portion of the transcript; determining, from the audio stream, one or more third audio stream portions; responsive to an end of meeting signal, generating a third portion of the transcript based on the one or more third audio stream portions; generating the transcript based on the first portion of the transcript, the second portion of the transcript, and the third portion of the transcript; and designating the transcript as the portion of the transcript.

8. The method of claim 1, wherein the AI service comprises a large language model.

9. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations including: joining a first client device of a plurality of client devices to a video conference hosted by a video conference provider; receiving an audio stream from the first client device; generating, based on the audio stream, a portion of a transcript of the video conference; prior to the video conference concluding: processing the portion of the transcript to configure an AI service to respond to queries based on the video conference, including the portion of the transcript; receiving a query relating to the video conference; causing the AI service to process the query and the portion of the transcript; and outputting a response, generated by the AI service, to the query; and responsive to the video conference concluding, deleting the portion of the transcript for configuring the AI service.

10. The non-transitory computer-readable medium of claim 9, further responsive to the video conference concluding, deleting the transcript.

11. The non-transitory computer-readable medium of claim 9, wherein generating the portion of the transcript of the video conference comprises: determining, from the audio stream, one or more audio stream portions; and generating, based on the one or more audio stream portions, a first portion of the transcript; determining, from the audio stream, one or more second audio stream portions; generating, based on the one or more second audio stream portions, a second portion of the transcript; generating, based on the first portion of the transcript and the second portion of the transcript, the portion of the transcript; determining, from the audio stream, one or more third audio stream portions; responsive to an end of meeting signal, generating a third portion of the transcript based on the one or more third audio stream portions; generating the transcript based on the first portion of the transcript, the second portion of the transcript, and the third portion of the transcript; and designating the transcript as the portion of the transcript.

12. The non-transitory computer-readable medium of claim 9, wherein the portion of the transcript is generated in near-real-time.

13. The non-transitory computer-readable medium of claim 9, further comprising receiving an indication of a period of time, wherein the AI service is configured to base the response on the portion of the transcript included in the period of time.

14. The non-transitory computer-readable medium of claim 9, wherein the query is human-readable and the query is a request to summarize the video conference.

15. A system comprising: one or more processors; and one or more computer-readable storage media storing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations including: joining a first client device of a plurality of client devices to a video conference hosted by a video conference provider; receiving an audio stream from the first client device; generating, based on the audio stream, a portion of a transcript of the video conference; prior to the video conference concluding: processing the portion of the transcript to configure an AI service to respond to queries based on the video conference, including the portion of the transcript; receiving a query relating to the video conference; causing the AI service to process the query and the portion of the transcript; and outputting a response, generated by the AI service, to the query; and responsive to the video conference concluding, deleting the portion of the transcript for configuring the AI service.

16. The system of claim 15, further comprising receiving a deletion election comprising an election to delete the transcript, wherein deleting the portion of the transcript for configuring the AI service is further responsive to the deletion election.

17. The system of claim 15, wherein the query is human-readable and wherein the query comprises query context information and the query is a request to generate one or more tasks based on the video conference.

18. The system of claim 15, wherein deleting the transcript comprises removing information from at least one of a hard disk, a memory device, or a memory cache.

19. The system of claim 15, wherein the portion of the transcript is generated in near-real-time and the query comprises query parameters, wherein the query
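
Two ideas from the dependent claims lend themselves to a short sketch: combining separately generated transcript portions into one transcript (claims 5, 7, and 11), and restricting the AI service's input to a requested period of time (claim 13). The Python below is one possible reading under stated assumptions, not the patented implementation. The `Segment` type, the use of start times in seconds, and the prompt layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One transcribed utterance (hypothetical structure, not from the patent)."""
    start_s: float  # offset from the start of the conference, in seconds
    speaker: str
    text: str


def merge_portions(*portions: List[Segment]) -> List[Segment]:
    """Combine first, second, third, ... portions into one time-ordered transcript."""
    merged = [seg for portion in portions for seg in portion]
    return sorted(merged, key=lambda seg: seg.start_s)


def select_period(transcript: List[Segment], start_s: float, end_s: float) -> List[Segment]:
    """Keep only segments that begin inside the half-open window [start_s, end_s)."""
    return [seg for seg in transcript if start_s <= seg.start_s < end_s]


def build_prompt(query: str, segments: List[Segment]) -> str:
    """Assemble the text handed to the AI service (the layout is an assumption)."""
    lines = [f"[{seg.start_s:.0f}s] {seg.speaker}: {seg.text}" for seg in segments]
    return "Transcript:\n" + "\n".join(lines) + f"\n\nQuery: {query}"


first = [Segment(0.0, "Alice", "Welcome everyone")]
second = [Segment(30.0, "Alice", "First item is budget")]
third = [Segment(65.0, "Bob", "Budget is approved")]

transcript = merge_portions(first, second, third)
window = select_period(transcript, 20.0, 70.0)
print(build_prompt("Summarize the budget discussion", window))
```

Filtering before the prompt is built keeps out-of-window text away from the AI service entirely, which is one simple way to base a response only on the requested period.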

Assignees

Inventors

Classifications

  • Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission · CPC title

  • involving storage of or access to video conference sessions (tracking arrangements for later retrieval of a computer conference content or participants activities H04L12/1831) · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status · CPC title

  • Querying · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12499894B1 cover?
Techniques for video conference transcript querying using artificial intelligence are provided. In an example method, a video conference provider joins a first client device of a plurality of client devices to a video conference hosted by a video conference provider. The video conference provider receives an audio stream from the first client device and generates, based on the audio stream, a p…
Who is the assignee on this patent?
Zoom Video Communications Inc, Zoom Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 16, 2025 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).