Intelligent detection and automatic correction of erroneous audio settings in a video conference

US11570223B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11570223-B2
Application number: US-202117360807-A
Country: US
Kind code: B2
Filing date: Jun 28, 2021
Priority date: Aug 20, 2020
Publication date: Jan 31, 2023
Grant date: Jan 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and software to provide intelligent detection and automatic correction of erroneous audio settings in a video conference. Electronic conferences can often be the source of frustration and wasted resources as participants may be forced to contend with extraneous sounds, such as background/ambient noises, or conversations not intended for the conference, provided by an endpoint that should be muted. Similarly, participants may speak with the intention of providing their speech to the conference while their associated endpoint is muted. As a result, the conference may be awkward and lack a productive flow while endpoints are erroneously muted or non-muted. By intelligently processing at least the video portion of a video conference, endpoints/participants may be prompted to mute/unmute or automatically muted/unmuted.
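The correction logic summarized in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the names `EndpointState`, `Action`, and `correct_audio_setting`, and the three boolean inputs, are hypothetical and assume that separate video and audio analysis has already produced them.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    PROMPT_UNMUTE = "prompt_unmute"
    PROMPT_MUTE = "prompt_mute"


@dataclass
class EndpointState:
    # All three fields are assumed outputs of upstream analysis (hypothetical).
    muted: bool                   # current audio setting of the endpoint
    speaking_to_conference: bool  # video suggests the participant is addressing the conference
    audio_detected: bool          # the endpoint is producing sound of any kind


def correct_audio_setting(state: EndpointState) -> Action:
    """Pick a corrective action for one endpoint (illustrative rules only)."""
    # Participant appears to speak to the conference while muted.
    if state.muted and state.speaking_to_conference:
        return Action.PROMPT_UNMUTE
    # Endpoint contributes sound, but the participant does not appear to be
    # addressing the conference (background noise, side conversation).
    if not state.muted and state.audio_detected and not state.speaking_to_conference:
        return Action.PROMPT_MUTE
    return Action.NONE
```

The sketch returns a prompt in both cases; the abstract also allows the endpoint to be muted or unmuted automatically instead of prompting.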

First claim

Opening claim text (preview).

What is claimed is:

1. A video conference server, comprising: a network interface to a network; a storage component comprising a non-transitory storage device; a processor, comprising at least one microprocessor; and wherein the processor, upon accessing machine-executable instructions, causes the processor to perform: broadcast conference content to each of a plurality of endpoints, wherein the conference content comprises an audio portion and/or a video portion received from each of the plurality of endpoints; process the video portion from at least one endpoint to determine whether a respective participant may be unintentionally muted, and to determine a confidence score associated with the determination whether the respective participant may be unintentionally muted; and upon determining that the respective participant is unintentionally muted, execute signaling to an endpoint associated with the respective participant to cause the associated endpoint to prompt the respective participant to unmute their audio.

2. The video conference server of claim 1, wherein additional instructions, when executed further cause the processor to: determine that audio is being muted from the at least one endpoint; and determine based on analyzing the video portion from the at least one endpoint that a participant appears to be speaking.

3. The video conference server of claim 1, wherein additional instructions, when executed further cause the processor to: determine that audio is being muted from the at least one endpoint; and determine based on analyzing the video portion from the at least one endpoint that participant's lips are moving.

4. The video conference server of claim 1, wherein additional instructions, when executed further cause the processor to: determine that audio is being muted from the at least one endpoint; and determine based on analyzing the video portion from the at least one endpoint that participant is looking at a camera and/or screen, and at least one of: the participant's lips are moving, the participant's other facial parts indicate speech, and/or the participant's facial expressions indicate speech.

5. The video conference server of claim 1, wherein the conference content comprises the audio portion and wherein additional instructions, when executed further cause the processor to: process the audio portion from at least one endpoint to determine a name associated with a particular conference participant was spoken; and upon determining that the name associated with the particular conference participant was spoken, transmit to an endpoint associated with the particular conference participant a prompt to unmute their audio.

6. The video conference server of claim 5, wherein the prompt comprises at least one of: a textual, visual, and/or audible alert.

7. A method of unmuting an endpoint in a video conference, the method comprising: broadcasting conference content to each of a plurality of endpoints, wherein the conference content comprises an audio portion and/or a video portion received from each of the plurality of endpoints; processing video portion from at least one endpoint to determine whether a respective participant may be unintentionally muted, and to determine a confidence score associated with the determination whether the respective participant may be unintentionally muted; and upon determining that the respective participant is unintentionally muted, executing signaling to an endpoint associated with the respective participant to cause the associated endpoint to prompt the respective participant to unmute their audio.

8. The method of claim 7, wherein processing the video portion from the at least one endpoint to determine whether the respective participant is unintentionally muted comprises: determining that the at least one endpoint is muted; and determining from the video portion from the at least one endpoint that the respective participant appears to be speaking.

9. The method of claim 7, wherein processing the video portion from the at least one endpoint to determine whether the respective participant is unintentionally muted comprises: determining that the at least one endpoint is muted; and determining from the video portion from the at least one endpoint that the respective participant's lips are moving.

10. The method of claim 7, wherein processing the video portion from the at least one endpoint to determine whether the respective participant is unintentionally muted comprises: determining that the at least one endpoint is muted; and determining from the video portion from the at least one endpoint that the participant is looking at a camera and/or screen, and at least one of: the respective participant's lips are moving, the respective participant's other facial parts indicate speech, and/or the respective participant's facial expressions indicate speech.

11. The method of claim 7, wherein the conference content comprises the audio portion and wherein processing the video portion from the at least one endpoint to determine whether the respective participant may be unintentionally muted further comprises: processing the audio portion from at least one endpoint to determine a name associated with a particular conference participant was spoken; and upon determining that the name associated with the particular conference participant was spoken, signaling an endpoint associated with the particular conference participant to prompt particular conference participant to unmute their audio.

12. The method of claim 11, wherein the prompt comprises at least one of: a textual, visual, and/or audible alert.

13. A video conferencing endpoint, comprising: a network interface to a network; a storage component comprising a non-transitory storage device; a processor, comprising at least one microprocessor; and wherein the processor, upon accessing machine-executable instructions, causes the processor to perform: receive conference content intended for a video conference, wherein the conference content comprises an audio portion and/or a video portion, and wherein audio of the video conferencing endpoint is muted; process the video portion of a video conferencing endpoint to determine whether the video conferencing endpoint may be unintentionally muted, and to determine a confidence score associated with the determination whether the video conferencing endpoint may be unintentionally muted; and upon determining that the video conferencing endpoint is unintentionally muted, display a prompt to unmute.

14. The video conferencing endpoint of claim 13, wherein additional machine-executable instructions, when executed further cause the processor to: process the video portion associated with the video conferencing endpoint to determine a participant appears to be speaking.

15. The video conferencing endpoint of claim 13, wherein additional machine-executable instructions, when executed further cause the processor to: process the video portion associated with the video conferencing endpoint to determine a participant's lips are moving.

16. The video conferencing endpoint of claim 13, wherein additional machine-executable instructions, when executed further cause the processor to: process the video portion associated with the video conferencing endpoint to determine a participant is looking at a camera and/or screen, and at least one of: the participant's lips are moving, the participant's other facial features indicate speech, and/or the participant's facial expressions indicate speech.

17. The vide
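As a rough illustration of the confidence-score step in claims 1, 7, and 13, the visual cues listed in claims 4, 10, and 16, and the name-detection step in claims 5 and 11, the following sketch may help. The claims do not specify how the score is computed, so the weights, the threshold, the 0 to 100 scale, and every name below (`VideoCues`, `unintentional_mute_confidence`, `should_prompt_unmute`, `names_mentioned`) are assumptions. The cue flags and the transcript are assumed to come from separate video-analysis and speech-to-text components that are not shown.

```python
import re
from dataclasses import dataclass


@dataclass
class VideoCues:
    # Hypothetical per-participant flags produced by video analysis.
    looking_at_camera_or_screen: bool
    lips_moving: bool
    other_facial_parts_indicate_speech: bool
    facial_expression_indicates_speech: bool


def unintentional_mute_confidence(muted: bool, cues: VideoCues) -> int:
    """Return a score from 0 to 100; the weights are illustrative assumptions."""
    # Mirrors the shape of claim 4: the endpoint is muted, the participant
    # looks at the camera and/or screen, and at least one speech cue is present.
    if not muted or not cues.looking_at_camera_or_screen:
        return 0
    score = 0
    if cues.lips_moving:
        score += 60
    if cues.other_facial_parts_indicate_speech:
        score += 20
    if cues.facial_expression_indicates_speech:
        score += 20
    return score


def should_prompt_unmute(score: int, threshold: int = 60) -> bool:
    """Prompt only when the score reaches an assumed threshold."""
    return score >= threshold


def names_mentioned(transcript: str, participant_names: list[str]) -> list[str]:
    """Return participant names that occur as whole words in a transcript."""
    return [
        name
        for name in participant_names
        if re.search(r"\b" + re.escape(name) + r"\b", transcript, re.IGNORECASE)
    ]
```

In this sketch, a muted participant who faces the camera with moving lips reaches the assumed threshold and would be prompted, and a participant whose name appears in another endpoint's transcript would be a candidate for the unmute prompt described in claim 5.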

Assignees

Inventors

Classifications

  • with floor control · CPC title

  • Management of the audio stream, e.g. setting of volume, audio stream path · CPC title

  • H04N7/152 · Primary

    Multipoint control units therefor · CPC title

  • Dynamic expression · CPC title

  • In-session procedures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11570223B2 cover?
Systems, methods, and software to provide intelligent detection and automatic correction of erroneous audio settings in a video conference. Electronic conferences can often be the source of frustration and wasted resources as participants may be forced to contend with extraneous sounds, such as background/ambient noises, or conversations not intended for the conference, provided by an endpoint …
Who is the assignee on this patent?
Avaya Management L.P.
What technology area does this patent fall under?
Primary CPC classification H04L65/4038. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jan 31, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).