Feeling-expressing-word processing device, feeling-expressing-word processing method, and feeling-expressing-word processing program
US-9224033-B2 · Dec 29, 2015 · US
US9330171B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9330171-B1 |
| Application number | US-201414161146-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jan 22, 2014 |
| Priority date | Oct 17, 2013 |
| Publication date | May 3, 2016 |
| Grant date | May 3, 2016 |
Official abstract text for this publication.
A method includes receiving, by a processing device of a content sharing platform, a video content, selecting at least one video frame from the video content, subsampling the at least one video frame to generate a first representation of the at least one video frame, selecting a sub-region of the at least one video frame to generate a second representation of the at least one video frame, and applying a convolutional neuron network to the first and second representations of the at least one video frame to generate an annotation for the video content.
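The two representations described in the abstract can be illustrated with a minimal sketch. It assumes a grayscale frame stored as a list of pixel rows, plain strided subsampling with no smoothing, and a centered sub-region; the names `subsample`, `center_crop`, and `two_representations` and the default factor of 2 are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

# Hypothetical frame type: a grayscale image as rows of pixel values.
Frame = List[List[int]]


def subsample(frame: Frame, factor: int) -> Frame:
    """Keep every `factor`-th pixel in both directions.

    The result covers the same spatial area at a lower resolution.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return [row[::factor] for row in frame[::factor]]


def center_crop(frame: Frame, out_h: int, out_w: int) -> Frame:
    """Cut a sub-region at the original resolution.

    Centering the crop is an assumption of this sketch.
    """
    h, w = len(frame), len(frame[0])
    if out_h > h or out_w > w:
        raise ValueError("crop is larger than the frame")
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return [row[left:left + out_w] for row in frame[top:top + out_h]]


def two_representations(frame: Frame, factor: int = 2) -> Tuple[Frame, Frame]:
    """Return (first, second) representations of one frame."""
    first = subsample(frame, factor)                        # whole frame, coarser sampling
    second = center_crop(frame, len(first), len(first[0]))  # smaller area, original sampling
    return first, second


# Usage: an 8x8 frame yields two 4x4 inputs.
frame = [[r * 8 + c for c in range(8)] for r in range(8)]
first, second = two_representations(frame)
```

Sizing the crop to match the subsampled frame is a design choice of the sketch so that both inputs have the same shape; the abstract only requires that the sub-region cover a smaller area than the frame.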
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving, by a processing device of a content sharing platform, a video content; selecting at least one video frame from the video content, wherein the at least one video frame covers a spatial area at a first resolution; subsampling the at least one video frame to generate a first representation of the at least one video frame, wherein the first representation is at a second resolution that is lower than the first resolution; selecting, at the first resolution, a sub-region of the at least one video frame to generate a second representation of the at least one video frame, wherein the sub-region covers a spatial area that is smaller than the spatial area covered by the at least one video frame; and executing a convolutional neuron network, using the first representation as a first input of the convolutional neuron network and using the second representation as a second input of the convolutional neuron network, to generate an annotation for the video content.

2. The method of claim 1, wherein the second representation is a fovea representation that is at a same spatial sampling rate as the at least one video frame.

3. The method of claim 1, wherein the at least one video frame is a single frame.

4. The method of claim 1, wherein the at least one video frame includes one of two consecutive video frames or at least two non-consecutive video frames.

5. The method of claim 1, wherein the convolutional neuron network includes at least one convolution layer, at least one pooling layer, and a connected neuron network.

6. The method of claim 5, wherein a first convolution layer and a first pooling layer are applied to a first number of video frames, and a second convolution layer and a second pooling layer are applied to a second number of video frames, and wherein the first number is different from the second number.

7. The method of claim 5, wherein an earlier layer of the convolutional neuron network is applied to a higher number of video frames than a later layer of the convolutional neuron network.

8. The method of claim 1, further comprising making the video content searchable according to the annotation.

9. A non-transitory machine-readable storage medium storing instructions which, when executed, cause a processing device to perform operations comprising: receiving a video content; selecting at least one video frame from the video content, wherein the at least one video frame covers a spatial area at a first resolution; subsampling the at least one video frame to generate a first representation of the at least one video frame, wherein the first representation is at a second resolution that is lower than the first resolution; selecting, at the first resolution, a sub-region of the at least one video frame to generate a second representation of the at least one video frame, wherein the sub-region covers a spatial area that is smaller than the spatial area covered by the at least one video frame; and executing a convolutional neuron network, using the first representation as a first input of the convolutional neuron network and using second representation as a second input of the convolutional neuron network, to generate an annotation for the video content.

10. The machine-readable storage medium of claim 9, wherein the second representation is a fovea representation that is at a same spatial sampling rate as the at least one video frame.

11. The machine-readable storage medium of claim 9, wherein the at least one video frame is a single frame.

12. The machined-readable storage medium of claim 9, wherein the at least one video frame includes one of at least two consecutive video frames or at least two non-consecutive video frames.

13. The machine-readable storage medium of claim 9, wherein the convolutional neuron network includes at least one convolution layer, at least one pooling layer, and a connected neuron network.

14. The machine-readable storage medium of claim 11, wherein a first convolution layer and a first pooling layer are applied to a first number of video frames, and a second convolution layer and a second pooling layer are applied to a second number of video frames, and wherein the first number is different from the second number.

15. A system comprising: a memory; and a processor, operatively coupled to the memory, to: receive a video content; select at least one video frame from the video content, wherein the at least one video frame covers a spatial area at a first resolution; subsample the at least one video frame to generate a first representation of the at least one video frame, wherein the first representation is at a second resolution that is lower than the first resolution; select, at the first resolution, a sub-region of the at least one video frame to generate a second representation of the at least one video frame, wherein the sub-region covers a spatial area that is smaller than the spatial area covered by the at least one video frame; and execute a convolutional neuron, using the first representation as a first input of the convolutional neuron network and using the second representation as a second input of the convolutional neuron network, to generate an annotation for the video content.

16. The system of claim 15, wherein the second representation is a fovea representation that is at a same spatial sampling rate as the at least one video frame.

17. The system of claim 15, wherein the convolutional neuron network includes at least one convolution layer, at least one pooling layer, and a connected neuron network.

18. The user device of claim 17, wherein a first convolution layer and a first pooling layer are applied to a first number of video frames, and a second convolution layer and a second pooling layer are applied to a second number of video frames, and wherein the first number is different from the second number.
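Claims 5, 13, and 17 name three building blocks: a convolution layer, a pooling layer, and a connected network. The toy forward pass below shows how two input representations could flow through those blocks to produce a single label. Everything specific in it is an assumption made for illustration: the single shared kernel, the ReLU after convolution, 2x2 max pooling, concatenating the two streams before the connected layer, and the function and label names. It is not the architecture disclosed in the patent, and it involves no training.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation followed by ReLU (assumed activation)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(image: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = len(image) // size * size
    w = len(image[0]) // size * size
    return [[max(image[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, w, size)]
            for i in range(0, h, size)]


def fully_connected(features: Sequence[float], weights: Matrix,
                    biases: Sequence[float]) -> List[float]:
    """One dense layer: a score per output unit."""
    return [sum(w * x for w, x in zip(ws, features)) + b
            for ws, b in zip(weights, biases)]


def annotate(first: Matrix, second: Matrix, kernel: Matrix, weights: Matrix,
             biases: Sequence[float], labels: Sequence[str]) -> str:
    """Run both inputs through conv + pool, concatenate, and pick the top label."""
    features: List[float] = []
    for stream in (first, second):
        pooled = max_pool(conv2d_valid(stream, kernel))
        features += [v for row in pooled for v in row]
    scores = fully_connected(features, weights, biases)
    return labels[max(range(len(scores)), key=scores.__getitem__)]


# Usage with hand-picked (untrained, hypothetical) parameters.
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
zeros = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
kernel = [[1, 1], [1, 1]]
label = annotate(ones, zeros, kernel, [[1, 0], [0, 1]], [0, 0], ["wide", "close"])
```

In a real system the kernel and dense-layer weights would be learned from labeled video, and each stream would typically have its own stack of layers; sharing one kernel here only keeps the sketch short.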
Combinations of networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Physics · mapped topic