Conference summary generation

US12200402B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12200402-B2
Application number  US-202318513437-A
Country             US
Kind code           B2
Filing date         Nov 17, 2023
Priority date       Apr 30, 2021
Publication date    Jan 14, 2025
Grant date          Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A transcript of a conference (e.g., a video conference, an audio conference, or a telephone call with two or more participants) is processed to extract a conference summary. The transcript includes strings that are associated with respective timestamps and respective speaker identifiers. Speaker segments, which are sequences of consecutive strings attributed to the same speaker, are identified in the transcript. A speaker segment is selected based on its duration in time, and one or more strings are selected from within the selected speaker segment for inclusion in the conference summary. A short video conference summary or a short audio conference summary is then generated using timestamps from the transcript associated with strings (e.g., sentences) that have been selected for inclusion in the conference summary. The short video or audio summary may be presented to users to enable efficient storage and transmission of information from the conference within a unified communications system.
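The segment-based selection described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Utterance` type, the `min_duration` threshold, the `per_segment` limit, and the use of text length as a stand-in for a relevance score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Utterance:
    """One transcript string with its timing and speaker (hypothetical type)."""
    speaker: str
    start: float  # seconds from the start of the conference
    end: float
    text: str


def speaker_segments(transcript):
    """Group consecutive utterances by the same speaker into segments."""
    return [list(group) for _, group in groupby(transcript, key=lambda u: u.speaker)]


def segment_duration(segment):
    """Seconds from the first string's start to the last string's end."""
    return segment[-1].end - segment[0].start


def select_strings(transcript, min_duration=10.0, per_segment=1):
    """Pick strings from segments lasting at least min_duration seconds.

    Assumption: text length stands in for a relevance score here.
    """
    selected = []
    for segment in speaker_segments(transcript):
        if segment_duration(segment) >= min_duration:
            ranked = sorted(segment, key=lambda u: len(u.text), reverse=True)
            selected.extend(ranked[:per_segment])
    # Return the chosen strings in chronological order.
    return sorted(selected, key=lambda u: u.start)


transcript = [
    Utterance("alice", 0.0, 4.0, "Hi all."),
    Utterance("alice", 4.0, 15.0, "The release slips one week because of the login bug."),
    Utterance("bob", 15.0, 18.0, "Understood."),
    Utterance("alice", 18.0, 21.0, "Thanks."),
]
summary = select_strings(transcript)
```

In this toy transcript only the first segment (15 seconds, one speaker) passes the duration threshold, so the summary holds a single string. Its `start` and `end` values are what a later step would use to cut the matching audio or video excerpt.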

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a transcript of a conference, wherein the transcript includes strings with respective timestamps; inputting one or more strings from the transcript to a machine learning model to obtain respective scores for the one or more strings, wherein the respective scores are used to rank the one or more strings of the transcript; selecting a string for highlighting from the transcript based on the respective scores of the one or more strings; selecting a video excerpt from a video of the conference based on the respective timestamp of the selected string; and generating a video conference summary as a sequence of video excerpts from the video, including the selected video excerpt.

2. The method of claim 1, wherein the strings of the transcript have respective speaker identifiers and the respective speaker identifier for the selected string is associated with a role identifier, and further comprising: selecting the selected string for highlighting from the transcript based on the role identifier.

3. The method of claim 1, further comprising: generating a highlighted transcript as a copy of the transcript with a subset of the strings highlighted, wherein the selected string is highlighted; presenting the highlighted transcript to a user; receiving user edits to the highlighting of the highlighted transcript; and selecting a video excerpt from the video of the conference based on the respective timestamp of a string selected based on the user edits to the highlighting.

4. The method of claim 1, further comprising: detecting one or more words from a set of keywords in the selected string from the transcript, wherein the selected string is selected based on presence of the one or more words from the set of keywords.

5. The method of claim 1, further comprising: detecting an action item phrase in the selected string from the transcript, wherein the selected string is selected based on presence of the action item phrase.

6. The method of claim 5, wherein detecting an action item phrase in the selected string from the transcript comprises: inputting the strings from the transcript to a machine learning classifier that has been trained to output predictions of whether a string includes an action item phrase.

7. A system comprising: a processor, and a memory, wherein the memory stores instructions executable by the processor to: obtain a transcript of a conference, wherein the transcript includes strings with respective timestamps; input one or more strings from the transcript to a machine learning model to obtain respective scores for the strings, wherein the respective scores are used to rank the one or more strings of the transcript; select a string for highlighting from the transcript based on the respective scores of the one or more strings; select a video excerpt from a video of the conference based on the respective timestamp of the selected string; and generate a video conference summary as a sequence of video excerpts from the video, including the selected video excerpt.

8. The system of claim 7, wherein the strings of the transcript have respective speaker identifiers and the respective speaker identifier for the selected string is associated with a role identifier, and wherein the memory stores instructions executable by the processor to: select the selected string for highlighting from the transcript based on the role identifier.

9. The system of claim 7, wherein the memory stores instructions executable by the processor to: detect one or more words from a set of keywords in the selected string from the transcript, wherein the selected string is selected based on presence of the one or more words from the set of keywords.

10. The system of claim 7, wherein the memory stores instructions executable by the processor to: generate a highlighted transcript as a copy of the transcript with a subset of the strings highlighted, wherein the selected string is highlighted; present the highlighted transcript to a user; receive user edits to the highlighting of the highlighted transcript; and select a video excerpt from the video of the conference based on the respective timestamp of a string selected based on the user edits to the highlighting.

11. The system of claim 7, wherein the memory stores instructions executable by the processor to: detect an action item phrase in the selected string from the transcript, wherein the selected string is selected based on presence of the action item phrase.

12. The system of claim 11, wherein the memory stores instructions executable by the processor to: input the strings from the transcript to a machine learning classifier that has been trained to output predictions of whether a string includes an action item phrase.

13. A method comprising: obtaining a transcript of a conference, wherein the transcript includes strings with respective timestamps; determining, using a processing apparatus, respective scores for one or more strings of the transcript based on content of the one or more strings, wherein the respective scores are used to rank the one or more strings of the transcript; selecting a string for highlighting from the transcript based on the respective scores of the one or more strings; selecting an audio excerpt from a recording of the conference based on the respective timestamp of the selected string; and generating an audio conference summary as a sequence of audio excerpts from the recording, including the selected audio excerpt.

14. The method of claim 13, wherein determining respective scores for one or more strings of the transcript based on content of the one or more strings comprises: inputting the one or more strings from the transcript to a machine learning model to obtain the respective scores for the one or more strings.

15. The method of claim 13, further comprising: generating a highlighted transcript as a copy of the transcript with a subset of the strings highlighted, wherein the selected string is highlighted; presenting the highlighted transcript to a user; receiving user edits to the highlighting of the highlighted transcript; and selecting an audio excerpt from a recording of the conference based on the respective timestamp of a string selected based on the user edits to the highlighting.

16. The method of claim 13, wherein the strings of the transcript have respective speaker identifiers, and further comprising: identifying speaker segments with respective durations in the transcript, wherein a speaker segment is a sequence of consecutive strings in the transcript that have the same speaker identifier; selecting a speaker segment from the transcript based on a respective duration of the speaker segment; and selecting the string for highlighting from the selected speaker segment based on the respective scores of strings in the speaker segment.

17. The method of claim 16, wherein the respective speaker identifier for the selected string is associated with a role identifier, and further comprising: selecting the speaker segment from the transcript based on the role identifier.

18. The method of claim 13, further comprising: detecting one or more words from a set of keywords in a string from the transcript, wherein the selected string is selected based on presence of the one or more words from the set of keywords.

19. The method of claim 13, further comprising: detecting an action item phrase in the selected string from the transcript, wherein th
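
The scoring, ranking, and excerpt-selection steps recited in the independent claims can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the claimed implementation: the `Line` type, the `KEYWORDS` and `ACTION_PHRASES` lists, the `top_k` and `padding` parameters, and the hand-written `score` heuristic (a placeholder where the claims recite a machine learning model) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """A transcript string with its timestamps (hypothetical type)."""
    start: float  # seconds
    end: float
    text: str


# Hypothetical cue lists; a trained model would replace this heuristic.
KEYWORDS = {"deadline", "budget", "decision"}
ACTION_PHRASES = ("will follow up", "action item", "needs to")


def score(line):
    """Toy stand-in for a model that scores each transcript string."""
    text = line.text.lower()
    words = {w.strip(".,!?") for w in text.split()}
    keyword_hits = len(words & KEYWORDS)
    action_hits = sum(phrase in text for phrase in ACTION_PHRASES)
    return keyword_hits + 2 * action_hits


def select_highlights(lines, top_k=2):
    """Rank lines by score and keep up to top_k with a positive score."""
    ranked = sorted(lines, key=score, reverse=True)
    return [line for line in ranked[:top_k] if score(line) > 0]


def excerpts(highlights, padding=1.0):
    """Turn highlighted lines into merged (start, end) excerpt intervals."""
    intervals = sorted(
        (max(0.0, line.start - padding), line.end + padding) for line in highlights
    )
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching excerpts are joined into one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


lines = [
    Line(0.0, 3.0, "Good morning everyone."),
    Line(3.0, 9.0, "The budget decision is due Friday."),
    Line(9.0, 20.0, "Sounds good, let us move on to the demo."),
    Line(20.0, 25.0, "Dana will follow up with the vendor."),
]
plan = excerpts(select_highlights(lines))
```

The resulting `plan` is an ordered list of time intervals. Cutting those intervals from the recording and concatenating them, a step not shown here, would yield the sequence of video or audio excerpts that forms the summary.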

Assignees

Zoom Video Communications Inc

Inventors

Classifications

  • Electronic editing of digitised analogue information signals, e.g. audio or video signals · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Editing, e.g. inserting or deleting · CPC title

  • Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals (selecting H04Q) · CPC title

  • H04N7/155 (Primary)

    involving storage of or access to video conference sessions (tracking arrangements for later retrieval of a computer conference content or participants activities H04L12/1831) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12200402B2 cover?
A transcript of a conference (e.g., a video conference, an audio conference, or a telephone call with two or more participants) is processed to extract a conference summary. The transcript includes strings that are associated with respective timestamps and respective speaker identifiers. Speaker segments—sequences of consecutive strings attributed to the same speaker—are identified in the trans…
Who is the assignee on this patent?
Zoom Video Communications Inc
What technology area does this patent fall under?
Primary CPC classification H04N7/155. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jan 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).