Systems and methods for processing content using convolutional neural networks

US9754351B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9754351-B2
Application number  US-201514983477-A
Country             US
Kind code           B2
Filing date         Dec 29, 2015
Priority date       Nov 5, 2015
Publication date    Sep 5, 2017
Grant date          Sep 5, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and non-transitory computer-readable media can obtain a set of video frames at a first resolution. Process the set of video frames using a convolutional neural network to output one or more signals, the convolutional neural network including (i) a set of two-dimensional convolutional layers and (ii) a set of three-dimensional convolutional layers, wherein the processing causes the set of video frames to be reduced to a second resolution. Process the one or more signals using a set of three-dimensional de-convolutional layers of the convolutional neural network. Obtain one or more outputs corresponding to the set of video frames from the convolutional neural network.
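
The flow described in the abstract (two-dimensional convolution, then three-dimensional convolution down to a lower resolution, then three-dimensional de-convolution back up) can be illustrated by tracing tensor sizes through one layer of each kind. This is only a sketch: the function names, kernel sizes, strides, and padding below are assumptions chosen for illustration and do not come from the patent. It uses the standard output-size formulas for convolution and transposed convolution.

```python
def conv_out(n, kernel, stride=1, pad=0):
    """Output length of a convolution along one axis."""
    return (n + 2 * pad - kernel) // stride + 1


def deconv_out(n, kernel, stride=1, pad=0):
    """Output length of a transposed convolution (de-convolution) along one axis."""
    return (n - 1) * stride - 2 * pad + kernel


def trace_shapes(frames, height, width):
    """Return (stage, (T, H, W)) pairs for a hypothetical three-stage network.

    All kernel/stride/pad values are illustrative assumptions.
    """
    shapes = [("input", (frames, height, width))]

    # 2D convolution: applied within each frame, so only H and W change.
    h = conv_out(height, kernel=3, stride=2, pad=1)
    w = conv_out(width, kernel=3, stride=2, pad=1)
    shapes.append(("conv2d", (frames, h, w)))

    # 3D convolution: the filter also spans time, so T shrinks as well.
    t = conv_out(frames, kernel=3, stride=2, pad=1)
    h = conv_out(h, kernel=3, stride=2, pad=1)
    w = conv_out(w, kernel=3, stride=2, pad=1)
    shapes.append(("conv3d", (t, h, w)))

    # 3D de-convolution: with kernel 4, stride 2, pad 1 each axis doubles.
    t, h, w = (deconv_out(n, kernel=4, stride=2, pad=1) for n in (t, h, w))
    shapes.append(("deconv3d", (t, h, w)))
    return shapes


if __name__ == "__main__":
    for stage, shape in trace_shapes(frames=8, height=64, width=64):
        print(f"{stage:9s} T x H x W = {shape}")
```

With these assumed settings, the three-dimensional convolution halves the time, height, and width axes, and the three-dimensional de-convolution doubles them again. A real network would stack several layers of each kind and would also track a channel dimension, which is omitted here.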

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: obtaining, by a computing system, a set of video frames at a first resolution; processing, by the computing system, the set of video frames using a convolutional neural network to output one or more signals corresponding to the set of video frames, the convolutional neural network including (i) a set of two-dimensional convolutional layers, (ii) a set of three-dimensional convolutional layers, and (iii) a set of three-dimensional de-convolutional layers, wherein the three-dimensional convolutional layers reduce the set of video frames to a second resolution, and wherein the three-dimensional de-convolutional layers upsample the set of video frames; and obtaining, by the computing system, the one or more outputted signals corresponding to the set of video frames from the convolutional neural network.

2. The computer-implemented method of claim 1, wherein obtaining the one or more outputs corresponding to the set of video frames further comprises: obtaining, by the computing system, one or more respective feature descriptors for one or more voxels in the set of video frames, wherein each feature descriptor references a recognized scene, object, or action.

3. The computer-implemented method of claim 1, wherein obtaining the one or more outputs corresponding to the set of video frames further comprises: obtaining, by the computing system, a respective optical flow for one or more voxels in the set of video frames, wherein the optical flow for a voxel describes at least a predicted direction and magnitude of the voxel.

4. The computer-implemented method of claim 1, wherein obtaining the one or more outputs corresponding to the set of video frames further comprises: obtaining, by the computing system, a respective depth measurement for one or more voxels in the set of video frames.

5. The computer-implemented method of claim 1, wherein processing the one or more signals using the set of three-dimensional de-convolutional layers of the convolutional neural network further comprises: inputting, by the computing system, at least a portion of signals produced by the set of three-dimensional convolutional layers to the set of three-dimensional de-convolutional layers, the three-dimensional de-convolutional layers being trained to apply at least one three-dimensional de-convolutional operation to the portion of signals.

6. The computer-implemented method of claim 5, wherein the at least one three-dimensional de-convolutional operation is based at least on one or more three-dimensional filters to de-convolve the portion of signals, and wherein the three-dimensional de-convolutional operation causes the representation of the video content to be increased in signal size.

7. The computer-implemented method of claim 1, wherein processing the set of video frames using the convolutional neural network to output one or more signals, further comprises: inputting, by the computing system, a representation of the set of video frames to the set of two-dimensional convolutional layers to output a set of first signals, the two-dimensional convolutional layers being trained to apply at least one two-dimensional convolutional operation to the representation of the video content; inputting, by the computing system, at least a portion of the set of first signals to the set of three-dimensional convolutional layers to output a set of second signals, the three-dimensional convolutional layers being trained to apply at least one three-dimensional convolutional operation to the set of first signals; and inputting, by the computing system, at least a portion of the set of second signals to the set of three-dimensional de-convolutional layers to output a set of third signals, the three-dimensional de-convolutional layers being trained to apply at least one three-dimensional de-convolutional operation to the set of second signals.

8. The computer-implemented method of claim 7, wherein the at least one two-dimensional convolutional operation is based at least on one or more two-dimensional filters to convolve the representation of the video content, and wherein the two-dimensional convolutional operation causes the representation of the video content to be reduced in signal size.

9. The computer-implemented method of claim 7, wherein the at least one three-dimensional convolutional operation is based at least on one or more three-dimensional filters to convolve the set of first signals, and wherein the three-dimensional convolutional operation causes the representation of the video content to be reduced in signal size.

10. The computer-implemented method of claim 1, wherein the set of video frames includes more than two video frames.

11. A system comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the system to perform: obtaining a set of video frames at a first resolution; processing the set of video frames using a convolutional neural network to output one or more signals corresponding to the set of video frames, the convolutional neural network including (i) a set of two-dimensional convolutional layers, (ii) a set of three-dimensional convolutional layers, and (iii) a set of three-dimensional de-convolutional layers, wherein the three-dimensional convolutional layers reduce the set of video frames to a second resolution, and wherein the three-dimensional de-convolutional layers upsample the set of video frames; and obtaining the one or more outputted signals corresponding to the set of video frames from the convolutional neural network.

12. The system of claim 11, wherein obtaining the one or more outputs corresponding to the set of video frames further causes the system to perform: obtaining one or more respective feature descriptors for one or more voxels in the set of video frames, wherein each feature descriptor references a recognized scene, object, or action.

13. The system of claim 11, wherein obtaining the one or more outputs corresponding to the set of video frames further causes the system to perform: obtaining a respective optical flow for one or more voxels in the set of video frames, wherein the optical flow for a voxel describes at least a predicted direction and magnitude of the voxel.

14. The system of claim 11, wherein obtaining the one or more outputs corresponding to the set of video frames further causes the system to perform: obtaining a respective depth measurement for one or more voxels in the set of video frames.

15. The system of claim 11, wherein processing the one or more signals using the set of three-dimensional de-convolutional layers of the convolutional neural network further causes the system to perform: inputting, by the computing system, at least a portion of signals produced by the set of three-dimensional convolutional layers to the set of three-dimensional de-convolutional layers, the three-dimensional de-convolutional layers being trained to apply at least one three-dimensional de-convolutional operation to the portion of signals.

16. A non-transitory computer-readable storage medium including instructions that, when executed by at least one processor of a computing system, cause the computing system to perform: obtaining a set of video frames at a first resolution; processing the set of video frames using a convolutional neural network to output one or more signals corresponding to the set of video frames, the convolutional neural network including (i) a set of two-dimensional convolutional layers, (ii) a set of three-d
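
Claims 2 to 4 (mirrored in claims 12 to 14) describe three kinds of per-voxel output: a feature descriptor, an optical flow with a direction and magnitude, and a depth measurement. One way to picture such an output record is sketched below. The class name, field names, the two-component flow vector, and the example values are all hypothetical; the claims do not specify any data layout or units.

```python
import math
from dataclasses import dataclass


@dataclass
class VoxelOutput:
    """Hypothetical per-voxel record; the field layout is an assumption."""
    label: str      # feature descriptor: a recognized scene, object, or action
    flow_dx: float  # optical flow, horizontal component (assumed representation)
    flow_dy: float  # optical flow, vertical component (assumed representation)
    depth: float    # depth measurement; units are not given in the claims

    def flow_magnitude(self) -> float:
        """Length of the flow vector."""
        return math.hypot(self.flow_dx, self.flow_dy)

    def flow_direction(self) -> float:
        """Angle of the flow vector in radians, measured from the x axis."""
        return math.atan2(self.flow_dy, self.flow_dx)


if __name__ == "__main__":
    v = VoxelOutput(label="running", flow_dx=3.0, flow_dy=4.0, depth=2.5)
    print(v.label, v.flow_magnitude(), v.flow_direction(), v.depth)
```

Storing the flow as a displacement vector and deriving direction and magnitude from it is one common representation; the claims only require that the flow describe at least a predicted direction and magnitude.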

Assignees

Inventors

Classifications

  • using classification, e.g. of video objects

  • Distances to cluster centroïds

  • Combinations of networks

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

  • Auto-encoder networks; Encoder-decoder networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9754351B2 cover?
Systems, methods, and non-transitory computer-readable media can obtain a set of video frames at a first resolution. Process the set of video frames using a convolutional neural network to output one or more signals, the convolutional neural network including (i) a set of two-dimensional convolutional layers and (ii) a set of three-dimensional convolutional layers, wherein the processing causes…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06T3/4046. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 5, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).