Altering undesirable communication data for communication sessions

US10440324B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10440324-B1
Application number	US-201816123653-A
Country	US
Kind code	B1
Filing date	Sep 6, 2018
Priority date	Sep 6, 2018
Publication date	Oct 8, 2019
Grant date	Oct 8, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure describes techniques implemented partly by a communications service for identifying and altering undesirable portions of communication data, such as audio data and video data, from a communication session between computing devices. For example, the communications service may monitor the communications session to alter or remove undesirable audio data, such as a dog barking, a doorbell ringing, etc., and/or video data, such as rude gestures, inappropriate facial expressions, etc. The communications service may stream the communication data for the communication session partly through managed servers and analyze the communication data to detect undesirable portions. The communications service may alter or remove the portions of communication data received from a first user device, such as by filtering, refraining from transmitting, or modifying the undesirable portions. The communications service may send the modified communication data to a second user device engaged in the communication session after removing the undesirable portions.
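The abstract's relay flow (receive data from the first device, detect undesirable portions, alter or withhold them, forward the rest) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `Frame` type, the `relay` function, and the `too_loud` detector, which is a toy amplitude threshold standing in for whatever detection the service actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Frame:
    """One short chunk of audio as PCM sample values (illustrative type)."""
    samples: Tuple[int, ...]


def relay(frames: Iterable[Frame],
          is_undesirable: Callable[[Frame], bool],
          mode: str = "mute") -> Iterator[Frame]:
    """Yield the frames that would be forwarded to the second device.

    mode="mute": replace a flagged frame with silence of equal length.
    mode="drop": refrain from transmitting a flagged frame at all.
    """
    if mode not in ("mute", "drop"):
        raise ValueError(f"unknown mode: {mode!r}")
    for frame in frames:
        if not is_undesirable(frame):
            yield frame                      # pass clean data through unchanged
        elif mode == "mute":
            yield Frame(samples=(0,) * len(frame.samples))
        # mode == "drop": yield nothing for this frame


def too_loud(frame: Frame) -> bool:
    """Toy detector (assumption): flag frames whose peak amplitude exceeds 500."""
    return max(abs(s) for s in frame.samples) > 500


incoming = [Frame((1, 2)), Frame((900, -950)), Frame((3, 4))]
muted = list(relay(incoming, too_loud, mode="mute"))
dropped = list(relay(incoming, too_loud, mode="drop"))
```

In this sketch `muted` keeps three frames with the middle one silenced, while `dropped` keeps only the two clean frames, mirroring the abstract's distinction between modifying a portion and refraining from transmitting it.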

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, at one or more computing devices of a cloud-based service provider, a request from a first user device to establish a communication session between the first user device and a second user device via a network-based connection managed by a communications service at least partly managed by the cloud-based service provider; establishing the communication session between the first user device and the second user device via the network-based connection; receiving, from the first user device and via the network-based connection, first audio call data representing sound from an environment of the first user device; receiving, from the first user device and via the network-based connection, first video data representing the environment of the first user device; identifying a first portion of the first audio call data that corresponds to an acoustic fingerprint associated with an undesirable sound; identifying a first portion of the first video data that corresponds to an image fingerprint associated with an undesirable image; determining a first amount of time associated with a first duration of the acoustic fingerprint; determining a second amount of time associated with a second direction of the image fingerprint; altering a second portion of the first audio call data corresponding to the first amount of time associated with the acoustic fingerprint to generate second audio call data, the second portion of the first audio call data being subsequent to the first portion of the first audio call data; altering a second portion of the first video data corresponding to the second amount of time associated with the image fingerprint to generate second video data, the second portion of the first video data being subsequent to the first portion of the first video data; sending, via the network-based connection, the second audio call data to the second user device; and sending, via the network-based connection, the second video data to the second user device.

2. The computer-implemented method of claim 1, further comprising: identifying substitute audio data associated with the acoustic fingerprint, the substitute audio data representing at least one of a word or a sound to replace the first portion of the first audio call data; and inserting the substitute audio data into the second audio call data at a location from which the second portion of the first audio call data was altered such that the substitute audio data is configured to be output at the second user device in place of the second portion of the first audio call data.

3. The computer-implemented method of claim 1, wherein identifying the first portion of the first audio call data that corresponds to the acoustic fingerprint associated with the undesirable sound is performed at least partly using a machine-learning (ML) model, and further comprising: identifying the ML model based at least in part on a user account associated with the first user device; generating training audio data based at least in part on the first audio call data, wherein the generating includes: labeling at least one of the first portion of the first audio call data or the second portion of the first audio call data with a first indication that the at least one of the first portion of the first audio call data or the second portion of the first audio call data represents an undesirable sound; and labeling a third portion of the first audio call data with a second indication that the third portion of the first audio call data represents desirable sound, wherein the third portion of the first audio call data does not overlap with the first portion of the first audio call data or the second portion of the first audio call data; and training the ML model using the training audio data.

4. The computer-implemented method of claim 1, wherein: the identifying the first portion of the first audio call data is performed in real-time or near-real-time for the communication session; and the second audio call data includes the first portion of the first audio call data.

5. A system comprising: one or more processors; and one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to: establishing, at least partly by a communication service associated with a cloud-based service provider, a network-based communication session between a first computing device and a second computing device; receiving, from the first computing device and via the network-based communication session, first audio data representing sound from an environment of the first computing device; identifying a first portion of the first audio data that corresponds to an initial portion of an acoustic fingerprint associated with a sound; in response to identifying the first portion, altering a second portion of the first audio data to generate second audio data, the second portion being adjacent to the first portion of the audio data; and sending the second audio data to the second computing device via the network-based communication session.

6. The system of claim 5, further comprising: receiving, from the first computing device and via the network-based communication session, first video data representing the environment of the first computing device; identifying a portion of the first video data that corresponds to an image fingerprint associated with an undesirable image; altering the portion of the first video data to generate second video data; and sending the second video data via the network-based communication session to the first computing device.

7. The system of claim 5, wherein altering the second portion of the first audio data to generate the second audio data comprises refraining from sending the second portion of the first audio data to the second computing device.

8. The system of claim 5, wherein altering second the portion of the first audio data to generate the second audio data comprises removing the second portion of the first audio data such that the second audio data does not include audio data at a location corresponding to the second portion of the first audio data.

9. The system of claim 8, comprising further instructions that, when executed by the one or more processors, cause the one or more processors to: identify substitute audio data associated with the acoustic fingerprint, the substitute audio data representing at least one of a word or a noise to replace the second portion of the first audio data; and insert the substitute audio data into the second audio data at the location corresponding to the second portion of the first audio data that was removed.

10. The system of claim 5, wherein identifying the first portion of the first audio data that corresponds to the initial portion of the acoustic fingerprint associated with the sound comprises utilizing a machine-learning (ML) model to determine that the first portion of the first audio data corresponds to the initial portion of the acoustic fingerprint.

11. The system of claim 10, comprising further instructions that, when executed by the one or more processors, cause the one or more processors to: identifying the ML model based at least in part on a user account associated with the first computing device; generating training audio data based at least in part on the first audio data, wherein the generating includes: labeling the second portion of the first audio data with a first indication that the second portion of the first audio data represents the sound; and labeling a third portion of the first audio d
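Claims 1, 4, and 5 describe a look-ahead pattern: once a first portion of the audio matches the start of an acoustic fingerprint, the following portion is altered for the fingerprint's duration, while the first portion itself may still be forwarded. A minimal Python sketch of that control flow follows. It is an illustration under stated assumptions, not the patented implementation: the function name `alter_following`, the string labels standing in for audio frames, and the measurement of duration in whole frames are all hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def alter_following(frames: Iterable[T],
                    starts_fingerprint: Callable[[T], bool],
                    duration_frames: int,
                    alter: Callable[[T], T]) -> List[T]:
    """Forward frames, altering those that follow a fingerprint match.

    The matching frame (the "first portion") is forwarded unchanged;
    the next `duration_frames` frames (the "second portion") are altered.
    """
    out: List[T] = []
    remaining = 0  # frames still to alter after the latest match
    for frame in frames:
        if remaining > 0:
            out.append(alter(frame))
            remaining -= 1
            continue
        out.append(frame)
        if starts_fingerprint(frame):
            remaining = duration_frames
    return out


# Toy input: labels stand in for real audio frames (assumption).
call = ["speech", "bark_start", "bark", "bark", "speech"]
sent = alter_following(
    call,
    starts_fingerprint=lambda f: f == "bark_start",
    duration_frames=2,
    alter=lambda f: "<muted>",
)
```

Here `sent` still contains the `bark_start` frame, consistent with claim 4's statement that the second audio call data includes the first portion, and the two frames after it are replaced. Swapping the `alter` callback for one that returns substitute audio would correspond loosely to claims 2 and 9.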

Assignees

Amazon Tech Inc

Inventors

Classifications

G06N20/10
using kernel methods, e.g. support vector machines [SVM] · CPC title
G06N3/08
Learning methods · CPC title
G10L25/51
for comparison or discrimination · CPC title
H04L12/1827
Network arrangements for conference optimisation or adaptation · CPC title
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

View patent family 68102025

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10440324B1 cover?: This disclosure describes techniques implemented partly by a communications service for identifying and altering undesirable portions of communication data, such as audio data and video data, from a communication session between computing devices. For example, the communications service may monitor the communications session to alter or remove undesirable audio data, such as a dog barking, a do…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification H04N7/15. Mapped technology areas include Electricity.
When was this patent published?: Publication date Oct 8, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).