What technology area does this patent fall under?

Primary CPC classification G06T3/4046. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 5, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for processing content using convolutional neural networks

US9754351B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9754351-B2
Application number	US-201514983477-A
Country	US
Kind code	B2
Filing date	Dec 29, 2015
Priority date	Nov 5, 2015
Publication date	Sep 5, 2017
Grant date	Sep 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and non-transitory computer-readable media can obtain a set of video frames at a first resolution. Process the set of video frames using a convolutional neural network to output one or more signals, the convolutional neural network including (i) a set of two-dimensional convolutional layers and (ii) a set of three-dimensional convolutional layers, wherein the processing causes the set of video frames to be reduced to a second resolution. Process the one or more signals using a set of three-dimensional de-convolutional layers of the convolutional neural network. Obtain one or more outputs corresponding to the set of video frames from the convolutional neural network.
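The resolution changes described in the abstract can be traced with simple shape arithmetic. The sketch below is only an illustration, not the patented implementation: the kernel sizes, strides, padding, the single-layer stages, and the helper names (`conv_out`, `deconv_out`, `trace`) are all assumptions chosen for the example. It uses the standard output-size formulas for strided and transposed convolutions.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (frames, height, width); channels are ignored here


def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a strided convolution along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def deconv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a transposed ("de-") convolution along one axis."""
    return (n - 1) * stride - 2 * padding + kernel


def conv2d(shape: Shape, kernel: int = 3, stride: int = 2, padding: int = 1) -> Shape:
    """2D convolution: applied per frame, so only height and width change."""
    t, h, w = shape
    return (t, conv_out(h, kernel, stride, padding), conv_out(w, kernel, stride, padding))


def conv3d(shape: Shape, kernel: int = 3, stride: int = 2, padding: int = 1) -> Shape:
    """3D convolution: reduces the time axis as well as height and width."""
    t, h, w = shape
    return (
        conv_out(t, kernel, stride, padding),
        conv_out(h, kernel, stride, padding),
        conv_out(w, kernel, stride, padding),
    )


def deconv3d(shape: Shape, kernel: int = 4, stride: int = 2, padding: int = 1) -> Shape:
    """3D transposed convolution: upsamples time, height, and width."""
    t, h, w = shape
    return (
        deconv_out(t, kernel, stride, padding),
        deconv_out(h, kernel, stride, padding),
        deconv_out(w, kernel, stride, padding),
    )


def trace(frames: Shape) -> List[Tuple[str, Shape]]:
    """Record the shape after each assumed stage of the pipeline."""
    first = conv2d(frames)    # 2D convolutional layers
    second = conv3d(first)    # 3D convolutional layers (the reduced, "second" resolution)
    third = deconv3d(second)  # 3D de-convolutional layers (upsampling)
    return [("input", frames), ("conv2d", first), ("conv3d", second), ("deconv3d", third)]


if __name__ == "__main__":
    for stage, shape in trace((16, 128, 128)):
        print(f"{stage:>9}: {shape}")
```

With these assumed hyperparameters, 16 frames of 128×128 video shrink to 8 frames of 32×32 after the 3D convolution, and the transposed convolution doubles each axis again. The source does not state actual layer counts or filter sizes, so the specific numbers carry no weight beyond illustrating the down-then-up flow.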

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: obtaining, by a computing system, a set of video frames at a first resolution; processing, by the computing system, the set of video frames using a convolutional neural network to output one or more signals corresponding to the set of video frames, the convolutional neural network including (i) a set of two-dimensional convolutional layers, (ii) a set of three-dimensional convolutional layers, and (iii) a set of three-dimensional de-convolutional layers, wherein the three-dimensional convolutional layers reduce the set of video frames to a second resolution, and wherein the three-dimensional de-convolutional layers upsample the set of video frames; and obtaining, by the computing system, the one or more outputted signals corresponding to the set of video frames from the convolutional neural network.

2. The computer-implemented method of claim 1, wherein obtaining the one or more outputs corresponding to the set of video frames further comprises: obtaining, by the computing system, one or more respective feature descriptors for one or more voxels in the set of video frames, wherein each feature descriptor references a recognized scene, object, or action.

3. The computer-implemented method of claim 1, wherein obtaining the one or more outputs corresponding to the set of video frames further comprises: obtaining, by the computing system, a respective optical flow for one or more voxels in the set of video frames, wherein the optical flow for a voxel describes at least a predicted direction and magnitude of the voxel.

4. The computer-implemented method of claim 1, wherein obtaining the one or more outputs corresponding to the set of video frames further comprises: obtaining, by the computing system, a respective depth measurement for one or more voxels in the set of video frames.

5. The computer-implemented method of claim 1, wherein processing the one or more signals using the set of three-dimensional de-convolutional layers of the convolutional neural network further comprises: inputting, by the computing system, at least a portion of signals produced by the set of three-dimensional convolutional layers to the set of three-dimensional de-convolutional layers, the three-dimensional de-convolutional layers being trained to apply at least one three-dimensional de-convolutional operation to the portion of signals.

6. The computer-implemented method of claim 5, wherein the at least one three-dimensional de-convolutional operation is based at least on one or more three-dimensional filters to de-convolve the portion of signals, and wherein the three-dimensional de-convolutional operation causes the representation of the video content to be increased in signal size.

7. The computer-implemented method of claim 1, wherein processing the set of video frames using the convolutional neural network to output one or more signals, further comprises: inputting, by the computing system, a representation of the set of video frames to the set of two-dimensional convolutional layers to output a set of first signals, the two-dimensional convolutional layers being trained to apply at least one two-dimensional convolutional operation to the representation of the video content; inputting, by the computing system, at least a portion of the set of first signals to the set of three-dimensional convolutional layers to output a set of second signals, the three-dimensional convolutional layers being trained to apply at least one three-dimensional convolutional operation to the set of first signals; and inputting, by the computing system, at least a portion of the set of second signals to the set of three-dimensional de-convolutional layers to output a set of third signals, the three-dimensional de-convolutional layers being trained to apply at least one three-dimensional de-convolutional operation to the set of second signals.

8. The computer-implemented method of claim 7, wherein the at least one two-dimensional convolutional operation is based at least on one or more two-dimensional filters to convolve the representation of the video content, and wherein the two-dimensional convolutional operation causes the representation of the video content to be reduced in signal size.

9. The computer-implemented method of claim 7, wherein the at least one three-dimensional convolutional operation is based at least on one or more three-dimensional filters to convolve the set of first signals, and wherein the three-dimensional convolutional operation causes the representation of the video content to be reduced in signal size.

10. The computer-implemented method of claim 1, wherein the set of video frames includes more than two video frames.

11. A system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform: obtaining a set of video frames at a first resolution; processing the set of video frames using a convolutional neural network to output one or more signals corresponding to the set of video frames, the convolutional neural network including (i) a set of two-dimensional convolutional layers, (ii) a set of three-dimensional convolutional layers, and (iii) a set of three-dimensional de-convolutional layers, wherein the three-dimensional convolutional layers reduce the set of video frames to a second resolution, and wherein the three-dimensional de-convolutional layers upsample the set of video frames; and obtaining the one or more outputted signals corresponding to the set of video frames from the convolutional neural network.

12. The system of claim 11, wherein obtaining the one or more outputs corresponding to the set of video frames further causes the system to perform: obtaining one or more respective feature descriptors for one or more voxels in the set of video frames, wherein each feature descriptor references a recognized scene, object, or action.

13. The system of claim 11, wherein obtaining the one or more outputs corresponding to the set of video frames further causes the system to perform: obtaining a respective optical flow for one or more voxels in the set of video frames, wherein the optical flow for a voxel describes at least a predicted direction and magnitude of the voxel.

14. The system of claim 11, wherein obtaining the one or more outputs corresponding to the set of video frames further causes the system to perform: obtaining a respective depth measurement for one or more voxels in the set of video frames.

15. The system of claim 11, wherein processing the one or more signals using the set of three-dimensional de-convolutional layers of the convolutional neural network further causes the system to perform: inputting, by the computing system, at least a portion of signals produced by the set of three-dimensional convolutional layers to the set of three-dimensional de-convolutional layers, the three-dimensional de-convolutional layers being trained to apply at least one three-dimensional de-convolutional operation to the portion of signals.

16. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing system, cause the computing system to perform: obtaining a set of video frames at a first resolution; processing the set of video frames using a convolutional neural network to output one or more signals corresponding to the set of video frames, the convolutional neural network including (i) a set of two-dimensional convolutional layers, (ii) a set of three-d
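Claims 2 through 4 describe three kinds of per-voxel output: a feature descriptor referencing a recognized scene, object, or action; an optical flow with direction and magnitude; and a depth measurement. As a rough sketch of how such outputs might be held together, the record below uses hypothetical field names (`label`, `flow_dx`, `flow_dy`, `depth`) and an assumed two-component flow vector; none of these names or representations come from the patent text.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

VoxelIndex = Tuple[int, int, int]  # (frame, row, column); an assumed indexing scheme


@dataclass(frozen=True)
class VoxelOutput:
    """Hypothetical per-voxel record combining the outputs named in claims 2-4."""

    label: str      # recognized scene, object, or action (claim 2)
    flow_dx: float  # horizontal flow component, in pixels per frame (assumed unit)
    flow_dy: float  # vertical flow component, in pixels per frame (assumed unit)
    depth: float    # depth measurement (claim 4); unit left unspecified

    @property
    def flow_magnitude(self) -> float:
        """Length of the flow vector (claim 3: predicted magnitude)."""
        return math.hypot(self.flow_dx, self.flow_dy)

    @property
    def flow_direction_deg(self) -> float:
        """Angle of the flow vector in degrees (claim 3: predicted direction)."""
        return math.degrees(math.atan2(self.flow_dy, self.flow_dx))


# A sparse map from voxel position to its outputs; values are made up for illustration.
outputs: Dict[VoxelIndex, VoxelOutput] = {
    (0, 10, 12): VoxelOutput(label="person", flow_dx=3.0, flow_dy=4.0, depth=2.5),
    (1, 10, 13): VoxelOutput(label="person", flow_dx=0.0, flow_dy=1.0, depth=2.4),
}

if __name__ == "__main__":
    for index, voxel in outputs.items():
        print(index, voxel.label, voxel.flow_magnitude, voxel.flow_direction_deg)
```

Storing the flow as components and deriving direction and magnitude on demand is one design choice among several; a network could equally emit the polar form directly.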

Assignees

Facebook Inc

Inventors

Classifications

G06V10/764
using classification, e.g. of video objects · CPC title
G06F18/24137
Distances to cluster centroïds · CPC title
G06N3/045
Combinations of networks · CPC title
G06V10/454
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title

Patent family

Related publications grouped by family.

View patent family 58662372

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9754351B2 cover?: Systems, methods, and non-transitory computer-readable media can obtain a set of video frames at a first resolution. Process the set of video frames using a convolutional neural network to output one or more signals, the convolutional neural network including (i) a set of two-dimensional convolutional layers and (ii) a set of three-dimensional convolutional layers, wherein the processing causes…
Who is the assignee on this patent?: Facebook Inc

Related patents

Video annotation using deep network architectures

Image classification using images with separate grayscale and color channels

System and method for fast template matching in 3D

Image Classification Using Images with Separate Grayscale and Color Channels
