Reducing noise in a shared media session
US-9293148-B2 · Mar 22, 2016 · US
US10440324B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10440324-B1 |
| Application number | US-201816123653-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 6, 2018 |
| Priority date | Sep 6, 2018 |
| Publication date | Oct 8, 2019 |
| Grant date | Oct 8, 2019 |
Official abstract text for this publication.
This disclosure describes techniques implemented partly by a communications service for identifying and altering undesirable portions of communication data, such as audio data and video data, from a communication session between computing devices. For example, the communications service may monitor the communications session to alter or remove undesirable audio data, such as a dog barking, a doorbell ringing, etc., and/or video data, such as rude gestures, inappropriate facial expressions, etc. The communications service may stream the communication data for the communication session partly through managed servers and analyze the communication data to detect undesirable portions. The communications service may alter or remove the portions of communication data received from a first user device, such as by filtering, refraining from transmitting, or modifying the undesirable portions. The communications service may send the modified communication data to a second user device engaged in the communication session after removing the undesirable portions.
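The relay behavior in the abstract can be illustrated with a minimal sketch, assuming audio arrives as short frames and that a detector callable flags undesirable frames. The names `Frame`, `relay`, and `is_undesirable`, and the two modes shown, are hypothetical and do not come from the patent. The peak-amplitude detector in the usage lines is only a stand-in for the fingerprint or ML-based detection the disclosure describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Frame:
    """One short chunk of audio samples from the first user device (hypothetical type)."""
    samples: List[float]


def relay(frames: Iterable[Frame],
          is_undesirable: Callable[[Frame], bool],
          mode: str = "silence") -> Iterator[Frame]:
    """Yield the frames to forward to the second user device.

    mode="drop"    : refrain from transmitting flagged frames
    mode="silence" : replace flagged frames with zeros of equal length
    """
    if mode not in ("drop", "silence"):
        raise ValueError(f"unknown mode: {mode}")
    for frame in frames:
        if not is_undesirable(frame):
            yield frame  # desirable data passes through unchanged
        elif mode == "silence":
            yield Frame([0.0] * len(frame.samples))
        # mode == "drop": nothing is yielded for a flagged frame


# Usage with a stand-in detector (assumption: very loud frames are undesirable).
frames = [Frame([0.1, 0.2]), Frame([0.9, -0.95]), Frame([0.0, 0.1])]
loud = lambda f: max(abs(s) for s in f.samples) > 0.8
silenced = list(relay(frames, loud, mode="silence"))
dropped = list(relay(frames, loud, mode="drop"))
```

Silencing keeps the stream timing intact at the receiver, while dropping shortens the stream. Which is preferable depends on how the receiving device handles gaps.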
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving, at one or more computing devices of a cloud-based service provider, a request from a first user device to establish a communication session between the first user device and a second user device via a network-based connection managed by a communications service at least partly managed by the cloud-based service provider; establishing the communication session between the first user device and the second user device via the network-based connection; receiving, from the first user device and via the network-based connection, first audio call data representing sound from an environment of the first user device; receiving, from the first user device and via the network-based connection, first video data representing the environment of the first user device; identifying a first portion of the first audio call data that corresponds to an acoustic fingerprint associated with an undesirable sound; identifying a first portion of the first video data that corresponds to an image fingerprint associated with an undesirable image; determining a first amount of time associated with a first duration of the acoustic fingerprint; determining a second amount of time associated with a second direction of the image fingerprint; altering a second portion of the first audio call data corresponding to the first amount of time associated with the acoustic fingerprint to generate second audio call data, the second portion of the first audio call data being subsequent to the first portion of the first audio call data; altering a second portion of the first video data corresponding to the second amount of time associated with the image fingerprint to generate second video data, the second portion of the first video data being subsequent to the first portion of the first video data; sending, via the network-based connection, the second audio call data to the second user device; and sending, via the network-based connection, the second video data to the second user device.

2. The computer-implemented method of claim 1, further comprising: identifying substitute audio data associated with the acoustic fingerprint, the substitute audio data representing at least one of a word or a sound to replace the first portion of the first audio call data; and inserting the substitute audio data into the second audio call data at a location from which the second portion of the first audio call data was altered such that the substitute audio data is configured to be output at the second user device in place of the second portion of the first audio call data.

3. The computer-implemented method of claim 1, wherein identifying the first portion of the first audio call data that corresponds to the acoustic fingerprint associated with the undesirable sound is performed at least partly using a machine-learning (ML) model, and further comprising: identifying the ML model based at least in part on a user account associated with the first user device; generating training audio data based at least in part on the first audio call data, wherein the generating includes: labeling at least one of the first portion of the first audio call data or the second portion of the first audio call data with a first indication that the at least one of the first portion of the first audio call data or the second portion of the first audio call data represents an undesirable sound; and labeling a third portion of the first audio call data with a second indication that the third portion of the first audio call data represents desirable sound, wherein the third portion of the first audio call data does not overlap with the first portion of the first audio call data or the second portion of the first audio call data; and training the ML model using the training audio data.

4. The computer-implemented method of claim 1, wherein: the identifying the first portion of the first audio call data is performed in real-time or near-real-time for the communication session; and the second audio call data includes the first portion of the first audio call data.

5. A system comprising: one or more processors; and one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to: establishing, at least partly by a communication service associated with a cloud-based service provider, a network-based communication session between a first computing device and a second computing device; receiving, from the first computing device and via the network-based communication session, first audio data representing sound from an environment of the first computing device; identifying a first portion of the first audio data that corresponds to an initial portion of an acoustic fingerprint associated with a sound; in response to identifying the first portion, altering a second portion of the first audio data to generate second audio data, the second portion being adjacent to the first portion of the audio data; and sending the second audio data to the second computing device via the network-based communication session.

6. The system of claim 5, further comprising: receiving, from the first computing device and via the network-based communication session, first video data representing the environment of the first computing device; identifying a portion of the first video data that corresponds to an image fingerprint associated with an undesirable image; altering the portion of the first video data to generate second video data; and sending the second video data via the network-based communication session to the first computing device.

7. The system of claim 5, wherein altering the second portion of the first audio data to generate the second audio data comprises refraining from sending the second portion of the first audio data to the second computing device.

8. The system of claim 5, wherein altering second the portion of the first audio data to generate the second audio data comprises removing the second portion of the first audio data such that the second audio data does not include audio data at a location corresponding to the second portion of the first audio data.

9. The system of claim 8, comprising further instructions that, when executed by the one or more processors, cause the one or more processors to: identify substitute audio data associated with the acoustic fingerprint, the substitute audio data representing at least one of a word or a noise to replace the second portion of the first audio data; and insert the substitute audio data into the second audio data at the location corresponding to the second portion of the first audio data that was removed.

10. The system of claim 5, wherein identifying the first portion of the first audio data that corresponds to the initial portion of the acoustic fingerprint associated with the sound comprises utilizing a machine-learning (ML) model to determine that the first portion of the first audio data corresponds to the initial portion of the acoustic fingerprint.

11. The system of claim 10, comprising further instructions that, when executed by the one or more processors, cause the one or more processors to: identifying the ML model based at least in part on a user account associated with the first computing device; generating training audio data based at least in part on the first audio data, wherein the generating includes: labeling the second portion of the first audio data with a first indication that the second portion of the first audio data represents the sound; and labeling a third portion of the first audio d
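
The onset-triggered alteration recited in claims 1, 4, and 5 can be sketched as follows. This is a rough illustration under loud assumptions, not the patented implementation: audio is a list of float samples, the "acoustic fingerprint" is a raw sample template, and matching uses a simple mean-absolute-error threshold. The names `matches`, `alter_after_onset`, `onset_len`, and `tol` are hypothetical. A practical system would more likely compare spectral features or use an ML model, as claims 3 and 10 suggest.

```python
from typing import List, Sequence


def matches(window: Sequence[float], template: Sequence[float],
            tol: float = 0.05) -> bool:
    """True if window is within tol mean absolute error of template."""
    if len(window) != len(template):
        return False
    err = sum(abs(w - t) for w, t in zip(window, template)) / len(template)
    return err <= tol


def alter_after_onset(audio: Sequence[float],
                      fingerprint: Sequence[float],
                      onset_len: int) -> List[float]:
    """Zero the samples that follow each detected fingerprint onset.

    The onset (the "first portion") is left untouched, in the spirit of
    claim 4. The "second portion" that gets silenced lasts for the rest
    of the fingerprint's duration.
    """
    if not 0 < onset_len <= len(fingerprint):
        raise ValueError("onset_len must be in 1..len(fingerprint)")
    out = list(audio)
    onset = fingerprint[:onset_len]
    remaining = len(fingerprint) - onset_len  # duration still to come
    i = 0
    while i + onset_len <= len(audio):
        if matches(audio[i:i + onset_len], onset):
            start = i + onset_len
            end = min(start + remaining, len(audio))
            for j in range(start, end):
                out[j] = 0.0  # alter the subsequent portion
            i = end  # resume scanning after the altered span
        else:
            i += 1
    return out


# Usage: the fingerprint's first two samples act as the onset.
fingerprint = [0.5, 0.9, 0.7, 0.3]
audio = [0.0, 0.1, 0.5, 0.9, 0.7, 0.3, 0.1, 0.0]
cleaned = alter_after_onset(audio, fingerprint, onset_len=2)
# cleaned == [0.0, 0.1, 0.5, 0.9, 0.0, 0.0, 0.1, 0.0]
```

Acting on the onset alone lets the service alter the rest of the sound without buffering the whole event, which suits the real-time or near-real-time operation described in claim 4. The cost is that the onset itself still reaches the second device.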
using kernel methods, e.g. support vector machines [SVM] · CPC title
Learning methods · CPC title
for comparison or discrimination · CPC title
Network arrangements for conference optimisation or adaptation · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title