Video compression using deep generative models
US-2020304802-A1 · Sep 24, 2020 · US
US11575938B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11575938-B2 |
| Application number | US-202017137609-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 30, 2020 |
| Priority date | Jan 10, 2020 |
| Publication date | Feb 7, 2023 |
| Grant date | Feb 7, 2023 |
Official abstract text for this publication.
Data may be encoded to minimize distortion after decoding, but the quality required for presentation of the decoded data to a machine and the quality required for presentation to a human may be different. To accommodate different quality requirements, video data may be encoded to produce a first set of encoded data and a second set of encoded data, where the first set may be decoded for use by one of a machine consumer or a human consumer, and a combination of the first set and the second set may be decoded for use by the other of a machine consumer or a human consumer. The first and second set may be produced with a neural encoder and a neural decoder, and/or may be produced with the use of prediction and transform neural network modules. A human-targeted structure and a machine-targeted structure may produce the sets of encoded data.
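The layered scheme described in the abstract, with the consumer-to-layer assignment taken from claim 1, can be illustrated with a small sketch. This is not the patented implementation. The names `Consumer`, `LayeredBitstream`, `dequantize`, `convert_features`, and `decode` are hypothetical, and so are the uniform quantization step and the use of element-wise addition for the "compensating" step. The lossless (entropy) decoding and inverse transform stages are omitted, and the conversion neural network is replaced by an identity placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Consumer(Enum):
    MACHINE = "machine"
    HUMAN = "human"
    BOTH = "both"


@dataclass
class LayeredBitstream:
    base: List[int]         # first set: quantized machine-targeted symbols
    enhancement: List[int]  # second set: quantized human-targeted residual symbols


def dequantize(symbols: List[int], step: float) -> List[float]:
    """Uniform inverse quantization (assumed scheme): symbol -> value."""
    return [s * step for s in symbols]


def convert_features(features: List[float]) -> List[float]:
    """Identity placeholder standing in for the conversion neural network."""
    return list(features)


def decode(stream: LayeredBitstream, consumer: Consumer, step: float = 0.5) -> List[float]:
    """Decode only the first set for a machine, or both sets otherwise."""
    machine_features = dequantize(stream.base, step)
    if consumer is Consumer.MACHINE:
        return machine_features
    # Human (or mixed) consumption: combine the converted machine-targeted
    # features with the decoded second set. Addition is an assumption here.
    residual = dequantize(stream.enhancement, step)
    prediction = convert_features(machine_features)
    return [p + r for p, r in zip(prediction, residual)]


stream = LayeredBitstream(base=[2, 4], enhancement=[1, -1])
machine_out = decode(stream, Consumer.MACHINE)
human_out = decode(stream, Consumer.HUMAN)
```

In this sketch a machine consumer never touches the second set, so a receiver serving only task networks could skip that part of the bitstream entirely.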
Opening claim text (preview).
What is claimed is:
1. A method comprising: determining whether a human agent or a computer agent will use decoded data, wherein the decoded data comprises at least one of: video data, audio data, image data, or neural features; obtaining a first set of encoded data, wherein the first set of encoded data comprises data encoded with a machine-targeted encoder neural network; obtaining a second set of encoded data, wherein the second set of encoded data comprises data encoded with a human-targeted encoder neural network, wherein the human-targeted encoder neural network is at least partially different from the machine-targeted encoder neural network; and based on a determination that the computer agent will use the decoded data, decoding a first set of encoded data to produce first data and providing the first data for the computer agent; based on a determination that the human agent will use the decoded data, decoding a combination of the first set of encoded data and the second set of encoded data to produce second data and providing the second data for the human agent; or based on a determination that the computer agent and the human agent will use the decoded data, decoding the combination of the first set of encoded data and the second set of encoded data to produce the second data and providing the second data for at least one of the human agent or the computer agent.
2. The method of claim 1, wherein the decoding of the first set of encoded data to produce the first data comprises: lossless decoding the first set of encoded data; and inverse quantizing the lossless decoded first set of encoded data.
3. The method of claim 1, wherein the decoding of the combination of the first set of encoded data and the second set of encoded data to produce the second data comprises: lossless decoding the second set of encoded data; inverse quantizing the lossless decoded second set of encoded data; inverse transforming the inverse quantized lossless decoded second set of encoded data; and compensating a combination of the inverse transformed inverse quantized lossless decoded second set of encoded data and machine-targeted features which are converted with a conversion neural network.
4. The method of claim 1, further comprising at least one of: determining a first rate loss based, at least partially, on the first set of encoded data; transmitting the first data to one or more task neural networks and determining a respective task loss for the one or more task neural networks; determining a consumption loss based, at least partially, on the second set of encoded data; or determining a second rate loss based, at least partially, on the second set of encoded data.
5. The method of claim 4, further comprising at least one of: causing training of at least one neural network used to encode the first set of encoded data based, at least partially, on the first rate loss; causing training of at least one neural network used to decode the first set of encoded data based, at least partially, on the first rate loss; causing training of the one or more task neural networks based, at least partially, on the first rate loss; causing training of the at least one neural network used to encode the first set of encoded data based, at least partially, on the one or more task losses; causing training of the at least one neural network used to decode the first set of encoded data based, at least partially, on the one or more task losses; causing training of the one or more task neural networks based, at least partially, on the one or more task losses; causing training of at least one neural network used to encode the second set of encoded data based, at least partially, on the consumption loss; causing training of at least one neural network used to decode the second set of encoded data based, at least partially, on the consumption loss; causing training of the at least one neural network used to encode the second set of encoded data based, at least partially, on the second rate loss; or causing training of the at least one neural network used to decode the second set of encoded data based, at least partially, on the second rate loss.
6. An apparatus comprising: at least one processor; and at least one non-transitory memory and computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: determine whether a human agent or a computer agent will use decoded data, wherein the decoded data comprises at least one of: video data, audio data, or image data; obtain a first set of encoded data, wherein the first set of encoded data comprises data encoded with a machine-targeted encoder neural network; obtain a second set of encoded data, wherein the second set of encoded data comprises data encoded with a human-targeted encoder neural network, wherein the human-targeted encoder neural network is at least partially different from the machine-targeted encoder neural network; and based on a determination that the computer agent will use the decoded data, decode a first set of encoded data to produce first data and provide the first data for the computer agent; based on a determination that the human agent will use the decoded data, decode a combination of the first set of encoded data and the second set of encoded data to produce second data and provide the second data for the human agent; or based on a determination that the computer agent and the human agent will use the decoded data, decode the combination of the first set of encoded data and the second set of encoded data to produce the second data and provide the second data for at least one of the human agent or the computer agent.
7. The apparatus of claim 6, wherein decoding the first set of encoded data to produce the first data comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: lossless decode the first set of encoded data; and inverse quantize the lossless decoded first set of encoded data.
8. The apparatus of claim 6, wherein decoding the combination of the first set of encoded data and the second set of encoded data to produce the second data comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: lossless decode the second set of encoded data; inverse quantize the lossless decoded second set of encoded data; and inverse transform the inverse quantized lossless decoded second set of encoded data; and compensate a combination of the inverse transformed inverse quantized lossless decoded second set of encoded data and machine-targeted features which are converted with a conversion neural network.
9. The apparatus of claim 6, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform at least one of: determine a first rate loss based, at least partially, on the first set of encoded data; transmit the first data to one or more task neural networks and determine a respective task loss for the one or more task neural networks; determine a consumption loss based, at least partially, on the second set of encoded data; or determine a second rate loss based, at least partially, on the second set of encoded data.
10. The apparatus of claim 9, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform at least one of: cause training of at least one neural network used to encode th
using hierarchical techniques, e.g. scalability (H04N19/63 takes precedence) · CPC title
Quantisation · CPC title
characterised by the element, parameter or criterion affecting or controlling the adaptive coding · CPC title
the transform being operated outside the prediction loop · CPC title
the region being a block, e.g. a macroblock · CPC title