Conference segmentation based on conversational dynamics

US2018336902A1 · US · A1

Patent metadata
Publication number: US-2018336902-A1
Application number: US-201615546109-A
Country: US
Kind code: A1
Filing date: Feb 3, 2016
Priority date: Feb 3, 2015
Publication date: Nov 22, 2018
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various disclosed implementations involve processing and/or playback of a recording of a conference involving a plurality of conference participants. Some implementations disclosed herein involve analyzing conversational dynamics of the conference recording. Some examples may involve searching the conference recording to determine instances of segment classifications. The segment classifications may be based, at least in part, on conversational dynamics data. Some implementations may involve segmenting the conference recording into a plurality of segments, each of the segments corresponding with a time interval and at least one of the segment classifications. Some implementations allow a listener to scan through a conference recording quickly according to segments, words, topics and/or talkers of interest.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing audio data, the method comprising: receiving, by a conversational dynamics analysis module, audio data corresponding to a conference recording of a conference involving a plurality of conference participants, the audio data including at least one of: (a) conference participant speech data from multiple endpoints, recorded separately or (b) conference participant speech data from a single endpoint corresponding to multiple conference participants and including information for identifying conference participant speech for each conference participant of the multiple conference participants; analyzing conversational dynamics of the conference recording to determine conversational dynamics data; searching the conference recording to determine instances of each of a plurality of segment classifications, each of the segment classifications based, at least in part, on the conversational dynamics data; and segmenting the conference recording into a plurality of segments, each of the segments corresponding with a time interval and at least one of the segment classifications, wherein the analyzing, searching and segmenting processes are performed by the conversational dynamics analysis module.

2. The method of claim 1, wherein the searching and segmenting processes are recursive.

3. The method of claim 1, wherein the searching and segmenting processes are performed multiple times at different time scales.

4. The method of claim 1, wherein the searching and segmenting processes are based, at least in part, on a hierarchy of segment classifications.

5. The method of claim 4, wherein the hierarchy of segment classifications is based upon one or more criteria from a list of criteria consisting of: a level of confidence with which segments of a particular segment classification may be identified; a level of confidence with which a start time of a segment may be determined; a level of confidence with which an end time of a segment may be determined; and a likelihood that a particular segment classification includes conference participant speech corresponding to a conference topic.

6. The method of claim 1, wherein instances of the segment classifications are determined according to a set of rules and wherein the rules are based on one or more conversational dynamics data types selected from a group of conversational dynamics data types consisting of: (a) a doubletalk ratio indicating a fraction of speech time in a time interval during which at least two conference participants are speaking simultaneously; (b) a speech density metric indicating a fraction of the time interval during which there is any conference participant speech; and (c) a dominance metric indicating a fraction of total speech uttered by a dominant conference participant during the time interval, the dominant conference participant being a conference participant who spoke the most during the time interval.

7. The method of claim 6, wherein the set of rules includes a rule that classifies a segment as a Mutual Silence segment if the speech density metric is less than a mutual silence threshold.

8. The method of claim 7, wherein the set of rules includes a rule that classifies a segment as a Babble segment if the speech density metric is greater than or equal to the mutual silence threshold and the doubletalk ratio is greater than a babble threshold.

9. The method of claim 8, wherein the set of rules includes a rule that classifies a segment as a Discussion segment if the speech density metric is greater than or equal to the silence threshold and if the doubletalk ratio is less than or equal to the babble threshold but greater than a discussion threshold.

10. The method of claim 9, wherein the set of rules includes a rule that classifies a segment as a Presentation segment if the speech density metric is greater than or equal to the silence threshold, if the doubletalk ratio is less than or equal to the discussion threshold and if the dominance metric is greater than a presentation threshold.

11. The method of claim 10, wherein the set of rules includes a rule that classifies a segment as a Question and Answer segment if the speech density metric is greater than or equal to the silence threshold, if the doubletalk ratio is less than or equal to the discussion threshold and if the dominance metric is less than or equal to the presentation threshold but greater than a question and answer threshold.

12. The method of claim 11, wherein the searching and segmenting processes are based, at least in part, on a hierarchy of segment classifications and wherein a first hierarchical level of the searching process involves searching the conference recording to determine instances of Babble segments.

13. The method of claim 12, wherein a second hierarchical level of the searching process involves searching the conference recording to determine instances of Presentation segments.

14. The method of claim 13, wherein a third hierarchical level of the searching process involves searching the conference recording to determine instances of Question and Answer segments and wherein a fourth hierarchical level of the searching process involves searching the conference recording to determine instances of Discussion segments.

15. The method of claim 1, wherein instances of the segment classifications are determined according to a machine learning classifier.

16. The method of claim 15, wherein the machine learning classifier is selected from a group of machine learning classifiers consisting of: (a) an adaptive boosting technique; (b) a support vector machine technique; (c) a Bayesian network model technique; (d) a neural networks technique; (e) a hidden Markov model technique; and (f) a conditional random fields technique.

17. An apparatus for processing audio data, the apparatus comprising: an interface system; and a control system capable of: receiving, via the interface system, audio data corresponding to a conference recording of a conference involving a plurality of conference participants, the audio data including at least one of: (a) conference participant speech data from multiple endpoints, recorded separately or (b) conference participant speech data from a single endpoint corresponding to multiple conference participants and including information for identifying conference participant speech for each conference participant of the multiple conference participants; analyzing conversational dynamics of the conference recording to determine conversational dynamics data; searching the conference recording to determine instances of each of a plurality of segment classifications, each of the segment classifications based, at least in part, on the conversational dynamics data; and segmenting the conference recording into a plurality of segments, each of the segments corresponding with a time interval and at least one of the segment classifications.

18. The apparatus of claim 17, wherein the control system is capable of performing the searching and segmenting processes multiple times at different time scales.

19. The apparatus of claim 17, wherein the searching and segmenting processes are based, at least in part, on a hierarchy of segment classifications.

20.-21. (canceled)

22. A non-transitory medium having software stored thereon, the software including instructions for controlling one or more devices for processing audio data, the software incl
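
Illustrative sketch (not part of the patent text). The three conversational dynamics metrics of claim 6 and the threshold rules of claims 7 to 11 can be read as a simple rule cascade. The Python below is a minimal sketch under stated assumptions: voice activity is represented as fixed-length frames, each holding the set of talkers active in that frame; the names `Dynamics`, `Thresholds`, `analyze` and `classify` are hypothetical; the numeric thresholds are placeholders, since the claims give no values; and the "silence threshold" of claims 9 to 11 is assumed to be the mutual silence threshold of claim 7.

```python
from collections import Counter
from dataclasses import dataclass
from typing import AbstractSet, Optional, Sequence


@dataclass(frozen=True)
class Dynamics:
    speech_density: float    # fraction of the interval containing any speech
    doubletalk_ratio: float  # fraction of speech time with two or more talkers
    dominance: float         # share of total speech from the most active talker


@dataclass(frozen=True)
class Thresholds:
    # Placeholder values chosen for illustration only; the claims give none.
    mutual_silence: float = 0.1
    babble: float = 0.25
    discussion: float = 0.1
    presentation: float = 0.8
    question_and_answer: float = 0.6


def analyze(frames: Sequence[AbstractSet[str]]) -> Dynamics:
    """Compute the claim 6 metrics; frames[i] is the set of talkers active in frame i."""
    if not frames:
        raise ValueError("the time interval must contain at least one frame")
    speech_frames = sum(1 for talkers in frames if talkers)
    overlap_frames = sum(1 for talkers in frames if len(talkers) >= 2)
    per_talker = Counter(t for talkers in frames for t in talkers)
    total_talk = sum(per_talker.values())
    return Dynamics(
        speech_density=speech_frames / len(frames),
        doubletalk_ratio=overlap_frames / speech_frames if speech_frames else 0.0,
        dominance=max(per_talker.values()) / total_talk if total_talk else 0.0,
    )


def classify(d: Dynamics, th: Thresholds = Thresholds()) -> Optional[str]:
    """Apply the rules of claims 7 to 11 in order; each test assumes the earlier ones failed."""
    if d.speech_density < th.mutual_silence:
        return "Mutual Silence"
    if d.doubletalk_ratio > th.babble:
        return "Babble"
    if d.doubletalk_ratio > th.discussion:
        return "Discussion"
    if d.dominance > th.presentation:
        return "Presentation"
    if d.dominance > th.question_and_answer:
        return "Question and Answer"
    return None  # claims 7 to 11 name no classification for this case


if __name__ == "__main__":
    # One talker speaks in 9 of 10 frames with no overlap.
    interval = [{"A"}] * 9 + [set()]
    print(classify(analyze(interval)))  # prints "Presentation"
```

Ordering the tests as a cascade lets each rule omit the conditions already ruled out above it, which matches how each dependent claim restates the earlier bounds. The sketch classifies a single interval; the hierarchical, multi-time-scale search of claims 3, 4 and 12 to 14 is not modeled here.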

Assignees

Dolby Laboratories Licensing Corp

Inventors

Classifications

  • Semantic analysis · CPC title

  • Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities (video conference systems H04N7/15) · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Marking · CPC title

  • specially adapted for particular use · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018336902A1 cover?
Various disclosed implementations involve processing and/or playback of a recording of a conference involving a plurality of conference participants. Some implementations disclosed herein involve analyzing conversational dynamics of the conference recording. Some examples may involve searching the conference recording to determine instances of segment classifications. The segment classification…
Who is the assignee on this patent?
Dolby Laboratories Licensing Corp
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Published Nov 22, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).