Automated recording highlights for conferences

US11570403B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11570403-B2
Application number: US-202117245962-A
Country: US
Kind code: B2
Filing date: Apr 30, 2021
Priority date: Apr 30, 2021
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A transcript of a conference (e.g., a video conference, an audio conference, or a telephone call with two or more participants) is processed to extract a conference summary. A short video conference summary or a short audio conference summary is then generated using timestamps from the transcript associated with strings (e.g., sentences) in the transcript that have been selected for highlighting as part of the conference summary. The short video or audio summary may be presented to users along with a text summary of the conference to enable efficient storage and transmission of information from the conference within a unified communications system.
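The timestamp step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `TranscriptString` type, the `excerpt_ranges` helper, the use of both a start and an end time per string, and the `padding` parameter are all assumptions introduced here. The sketch turns the strings selected for highlighting into an ordered list of time ranges that a media tool could then cut from the recording and concatenate.

```python
from dataclasses import dataclass


@dataclass
class TranscriptString:
    """Hypothetical transcript entry; start/end are seconds into the recording."""
    text: str
    start: float
    end: float


def excerpt_ranges(selected, padding=0.5):
    """Map selected strings to ordered, non-overlapping (start, end) ranges.

    `padding` (an assumption, not from the patent) widens each excerpt a
    little so that speech is not clipped at the boundaries.
    """
    ranges = sorted((max(0.0, s.start - padding), s.end + padding) for s in selected)
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous excerpt: extend it instead.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


highlights = [
    TranscriptString("We shipped the release on Monday.", 10.0, 12.0),
    TranscriptString("Bob will follow up with the vendor.", 12.5, 15.0),
    TranscriptString("Next review is in two weeks.", 60.0, 63.0),
]
print(excerpt_ranges(highlights))  # [(9.5, 15.5), (59.5, 63.5)]
```

Merging adjacent ranges keeps the summary from replaying the same seconds twice when two highlighted strings sit next to each other in the transcript.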

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining a transcript of a conference, wherein the transcript includes strings with respective timestamps; inputting strings from the transcript to a machine learning model to obtain respective scores for the strings; selecting a string for highlighting from the transcript based on respective scores of strings; selecting a video excerpt from a video of the conference based on the respective timestamp of the selected string; generating a highlighted transcript as a copy of the transcript with a subset of the strings highlighted, wherein the selected string is highlighted; presenting the highlighted transcript to a user; receiving user edits to the highlighting of the highlighted transcript; selecting a video excerpt from the video of the conference based on the respective timestamp of a string selected based on the user edits to the highlighting; and generating a video conference summary as a sequence of video excerpts from the video, including the selected video excerpt.

2. The method of claim 1, wherein the strings of the transcript have respective speaker identifiers and the respective speaker identifier for the selected string is associated with a role identifier, and further comprising: selecting the string for highlighting from the transcript based on the role identifier.

3. The method of claim 1, further comprising: detecting one or more words from a set of keywords in a string from the transcript, wherein the selected string is selected based on presence of the one or more words from the set of keywords.

4. The method of claim 1, further comprising: detecting an action item phrase in a string from the transcript, wherein the selected string is selected based on presence of the action item phrase.

5. The method of claim 4, wherein detecting an action item phrase in a string from the transcript comprises: inputting the strings from the transcript to a machine learning classifier that has been trained to output predictions of whether a string includes an action item phrase.

6. A system comprising: a processor, and a memory, wherein the memory stores instructions executable by the processor to: obtain a transcript of a conference, wherein the transcript includes strings with respective timestamps; detect an action item phrase in a string from the transcript, wherein detecting an action item phrase in a string from the transcript comprises inputting the strings from the transcript to a machine learning classifier that has been trained to output predictions of whether a string includes an action item phrase; input strings from the transcript to a machine learning model to obtain respective scores for the strings; select a string for highlighting from the transcript based on respective scores of strings, wherein the selected string is selected based on presence of the action item phrase; select a video excerpt from a video of the conference based on the respective timestamp of the selected string; and generate a video conference summary as a sequence of video excerpts from the video, including the selected video excerpt.

7. The system of claim 6, wherein the strings of the transcript have respective speaker identifiers and the respective speaker identifier for the selected string is associated with a role identifier, and wherein the memory stores instructions executable by the processor to: select the string for highlighting from the transcript based on the role identifier.

8. The system of claim 6, wherein the memory stores instructions executable by the processor to: detect one or more words from a set of keywords in a string from the transcript, wherein the selected string is selected based on presence of the one or more words from the set of keywords.

9. The system of claim 6, wherein the memory stores instructions executable by the processor to: generate a highlighted transcript as a copy of the transcript with a subset of the strings highlighted, wherein the selected string is highlighted; present the highlighted transcript to a user; receive user edits to the highlighting of the highlighted transcript; and select a video excerpt from the video of the conference based on the respective timestamp of a string selected based on the user edits to the highlighting.

10. The system of claim 6, wherein determining respective scores for strings of the transcript based on content of the strings comprises: determining respective sentence vectors for strings of the transcript, wherein a sentence vector has elements corresponding to words present in the transcript that are non-zero for words present in the string; determining pairwise dot products of the sentence vectors; and determining a respective score for one of the strings based on a sum of the pairwise dot products for the sentence vector of the string.

11. The system of claim 10, wherein a non-zero element of the respective sentence vector for one of the strings of the transcript is a term frequency-inverse document frequency for a word associated with the non-zero element.

12. A method comprising: obtaining a transcript of a conference, wherein the transcript includes strings with respective timestamps; determining, using a processing apparatus, respective scores for strings of the transcript based on content of the strings; selecting a string for highlighting from the transcript based on respective scores of strings; selecting an audio excerpt from a recording of the conference based on the respective timestamp of the selected string; and generating an audio conference summary as a sequence of audio excerpts from the recording, including the selected audio excerpt, wherein determining respective scores for strings of the transcript based on content of the strings comprises: determining respective sentence vectors for strings of the transcript, wherein a sentence vector has elements corresponding to words present in the transcript that are non-zero for words present in the string; determining pairwise dot products of the sentence vectors; and determining a respective score for one of the strings based on a sum of the pairwise dot products for the sentence vector of the string.

13. The method of claim 12, wherein determining respective scores for strings of the transcript based on content of the strings comprises: inputting the strings from the transcript to a machine learning model to obtain the respective scores for the strings.

14. The method of claim 12, wherein a non-zero element of the respective sentence vector for one of the strings of the transcript is a term frequency-inverse document frequency for a word associated with the non-zero element.

15. The method of claim 12, wherein the strings of the transcript have respective speaker identifiers, and further comprising: identifying speaker segments with respective durations in the transcript, wherein a speaker segment is a sequence of consecutive strings in the transcript that have the same speaker identifier; selecting a speaker segment from the transcript based on a respective duration of the speaker segment; and selecting the string for highlighting from the selected speaker segment based on respective scores of strings in the speaker segment.

16. The method of claim 15, wherein the respective speaker identifier for the selected string is associated with a role identifier, and further comprising: selecting the speaker segment from the transcript based on the role identifier.

17. The method of claim 12, further comprising: detecting one or
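The sentence-vector scoring recited in claims 10 to 12 and 14 can be illustrated with a short sketch. The claims do not fix a tokenizer, an exact TF-IDF formula, or whether a string's dot product with itself counts toward its score, so the choices below are assumptions: a simple regex tokenizer, each string treated as its own document for the inverse document frequency, a smoothed `log(n / df) + 1` weight, and self products excluded from the sum. The function names `tokenize` and `score_strings` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the claims do not specify one)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_strings(strings):
    """Score each string by the sum of its dot products with every other string.

    Each string becomes a sparse TF-IDF vector over the transcript vocabulary;
    words absent from the string are implicitly zero.
    """
    counts = [Counter(tokenize(s)) for s in strings]
    n = len(counts)
    # Document frequency: number of strings that contain each word.
    df = Counter(word for c in counts for word in c)
    # Smoothed IDF variant (assumed), so words present in every string keep weight.
    idf = {word: math.log(n / df[word]) + 1.0 for word in df}
    vectors = [{w: tf * idf[w] for w, tf in c.items()} for c in counts]

    def dot(u, v):
        if len(u) > len(v):
            u, v = v, u  # iterate over the shorter sparse vector
        return sum(x * v.get(w, 0.0) for w, x in u.items())

    return [
        sum(dot(u, v) for j, v in enumerate(vectors) if j != i)
        for i, u in enumerate(vectors)
    ]


transcript = [
    "The budget review is due Friday.",
    "Please send the budget review to finance.",
    "Nice weather today.",
]
scores = score_strings(transcript)
best = max(range(len(transcript)), key=scores.__getitem__)
print(transcript[best])
```

A string that shares many weighted words with the rest of the transcript gets a high score, which is why the off-topic third string scores zero here. Highlight selection could then take the top-scoring strings, optionally restricted to a speaker segment as in claim 15.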

Assignees

Inventors

Classifications

  • Electronic editing of digitised analogue information signals, e.g. audio or video signals

  • Phrasal analysis, e.g. finite state techniques or chunking

  • H04N7/155 (Primary)

    involving storage of or access to video conference sessions (tracking arrangements for later retrieval of a computer conference content or participants activities H04L12/1831)

  • G06F40/166 (Primary)

    Editing, e.g. inserting or deleting

  • Recognition of textual entities

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11570403B2 cover?
A transcript of a conference (e.g., a video conference, an audio conference, or a telephone call with two or more participants) is processed to extract a conference summary. A short video conference summary or a short audio conference summary is then generated using timestamps from the transcript associated with strings (e.g., sentences) in the transcript that have been selected for highlightin…
Who is the assignee on this patent?
Zoom Video Communications Inc
What technology area does this patent fall under?
Primary CPC classification H04N7/155. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Jan 31, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).