Cascaded prediction-transform approach for mixed machine-human targeted video coding

US11575938B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11575938-B2
Application number  US-202017137609-A
Country             US
Kind code           B2
Filing date         Dec 30, 2020
Priority date       Jan 10, 2020
Publication date    Feb 7, 2023
Grant date          Feb 7, 2023

Abstract

Official abstract text for this publication.

Data may be encoded to minimize distortion after decoding, but the quality required for presentation of the decoded data to a machine and the quality required for presentation to a human may be different. To accommodate different quality requirements, video data may be encoded to produce a first set of encoded data and a second set of encoded data, where the first set may be decoded for use by one of a machine consumer or a human consumer, and a combination of the first set and the second set may be decoded for use by the other of a machine consumer or a human consumer. The first and second set may be produced with a neural encoder and a neural decoder, and/or may be produced with the use of prediction and transform neural network modules. A human-targeted structure and a machine-targeted structure may produce the sets of encoded data.
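
The abstract's layered idea can be illustrated with a toy numeric sketch. This is not the patented neural codec. Plain uniform scalar quantizers stand in for the machine-targeted and human-targeted encoder networks, the prediction of the human layer from the machine layer assumes an identity conversion, and the names `encode_layers`, `decode_for`, `MACHINE_STEP` and `HUMAN_STEP` are hypothetical. The routing follows the arrangement in claim 1: the first set alone serves a machine consumer, while the combination of both sets serves a human consumer (or both consumers).

```python
from enum import Enum
from typing import List, Tuple

# Hypothetical step sizes: a coarse machine layer and a finer human residual layer.
MACHINE_STEP = 4.0
HUMAN_STEP = 0.5


class Agent(Enum):
    MACHINE = "machine"
    HUMAN = "human"
    BOTH = "both"


def quantize(values: List[float], step: float) -> List[int]:
    """Uniform scalar quantizer (toy stand-in for a learned encoder)."""
    return [round(v / step) for v in values]


def dequantize(indices: List[int], step: float) -> List[float]:
    return [i * step for i in indices]


def encode_layers(signal: List[float]) -> Tuple[List[int], List[int]]:
    """Return (first_set, second_set): machine layer plus human residual layer."""
    first_set = quantize(signal, MACHINE_STEP)
    # Predict the signal from the machine layer; the conversion is assumed to be identity.
    prediction = dequantize(first_set, MACHINE_STEP)
    residual = [s - p for s, p in zip(signal, prediction)]
    second_set = quantize(residual, HUMAN_STEP)
    return first_set, second_set


def decode_for(agent: Agent, first_set: List[int], second_set: List[int]) -> List[float]:
    """Route decoding as in claim 1: first set alone for a machine, combination otherwise."""
    base = dequantize(first_set, MACHINE_STEP)
    if agent is Agent.MACHINE:
        return base
    # Human consumer, or both consumers: add the decoded residual to the prediction.
    residual = dequantize(second_set, HUMAN_STEP)
    return [b + r for b, r in zip(base, residual)]


signal = [1.3, 7.9, -2.2, 10.4]
first, second = encode_layers(signal)
print(decode_for(Agent.MACHINE, first, second))  # → [0.0, 8.0, -4.0, 12.0]
print(decode_for(Agent.HUMAN, first, second))    # → [1.5, 8.0, -2.0, 10.5]
```

With these step sizes the machine reconstruction error is at most `MACHINE_STEP / 2` per sample and the combined reconstruction error at most `HUMAN_STEP / 2`. That is the point of the cascade: the second set only has to carry the residual detail a human viewer needs beyond what the machine layer already provides.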

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: determining whether a human agent or a computer agent will use decoded data, wherein the decoded data comprises at least one of: video data, audio data, image data, or neural features; obtaining a first set of encoded data, wherein the first set of encoded data comprises data encoded with a machine-targeted encoder neural network; obtaining a second set of encoded data, wherein the second set of encoded data comprises data encoded with a human-targeted encoder neural network, wherein the human-targeted encoder neural network is at least partially different from the machine-targeted encoder neural network; and based on a determination that the computer agent will use the decoded data, decoding a first set of encoded data to produce first data and providing the first data for the computer agent; based on a determination that the human agent will use the decoded data, decoding a combination of the first set of encoded data and the second set of encoded data to produce second data and providing the second data for the human agent; or based on a determination that the computer agent and the human agent will use the decoded data, decoding the combination of the first set of encoded data and the second set of encoded data to produce the second data and providing the second data for at least one of the human agent or the computer agent.

2. The method of claim 1, wherein the decoding of the first set of encoded data to produce the first data comprises: lossless decoding the first set of encoded data; and inverse quantizing the lossless decoded first set of encoded data.

3. The method of claim 1, wherein the decoding of the combination of the first set of encoded data and the second set of encoded data to produce the second data comprises: lossless decoding the second set of encoded data; inverse quantizing the lossless decoded second set of encoded data; inverse transforming the inverse quantized lossless decoded second set of encoded data; and compensating a combination of the inverse transformed inverse quantized lossless decoded second set of encoded data and machine-targeted features which are converted with a conversion neural network.

4. The method of claim 1, further comprising at least one of: determining a first rate loss based, at least partially, on the first set of encoded data; transmitting the first data to one or more task neural networks and determining a respective task loss for the one or more task neural networks; determining a consumption loss based, at least partially, on the second set of encoded data; or determining a second rate loss based, at least partially, on the second set of encoded data.

5. The method of claim 4, further comprising at least one of: causing training of at least one neural network used to encode the first set of encoded data based, at least partially, on the first rate loss; causing training of at least one neural network used to decode the first set of encoded data based, at least partially, on the first rate loss; causing training of the one or more task neural networks based, at least partially, on the first rate loss; causing training of the at least one neural network used to encode the first set of encoded data based, at least partially, on the one or more task losses; causing training of the at least one neural network used to decode the first set of encoded data based, at least partially, on the one or more task losses; causing training of the one or more task neural networks based, at least partially, one the one or more task losses; causing training of at least one neural network used to encode the second set of encoded data based, at least partially, on the consumption loss; causing training of at least one neural network used to decode the second set of encoded data based, at least partially, on the consumption loss; causing training of the at least one neural network used to encode the second set of encoded data based, at least partially, on the second rate loss; or causing training of the at least one neural network used to decode the second set of encoded data based, at least partially, on the second rate loss.

6. An apparatus comprising: at least one processor; and at least one non-transitory memory and computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: determine whether a human agent or a computer agent will use decoded data, wherein the decoded data comprises at least one of: video data, audio data, or image data; obtain a first set of encoded data, wherein the first set of encoded data comprises data encoded with a machine-targeted encoder neural network; obtain a second set of encoded data, wherein the second set of encoded data comprises data encoded with a human-targeted encoder neural network, wherein the human-targeted encoder neural network is at least partially different from the machine-targeted encoder neural network; and based on a determination that the computer agent will use the decoded data, decode a first set of encoded data to produce first data and provide the first data for the computer agent; based on a determination that the human agent will use the decoded data, decode a combination of the first set of encoded data and the second set of encoded data to produce second data and provide the second data for the human agent; or based on a determination that the computer agent and the human agent will use the decoded data, decode the combination of the first set of encoded data and the second set of encoded data to produce the second data and provide the second data for at least one of the human agent or the computer agent.

7. The apparatus of claim 6, wherein decoding the first set of encoded data to produce the first data comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: lossless decode the first set of encoded data; and inverse quantize the lossless decoded first set of encoded data.

8. The apparatus of claim 6, wherein decoding the combination of the first set of encoded data and the second set of encoded data to produce the second data comprises the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to: lossless decode the second set of encoded data; inverse quantize the lossless decoded second set of encoded data; and inverse transform the inverse quantized lossless decoded second set of encoded data; and compensate a combination of the inverse transformed inverse quantized lossless decoded second set of encoded data and machine-targeted features which are converted with a conversion neural network.

9. The apparatus of claim 6, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform at least one of: determine a first rate loss based, at least partially, on the first set of encoded data; transmit the first data to one or more task neural networks and determine a respective task loss for the one or more task neural networks; determine a consumption loss based, at least partially, on the second set of encoded data; or determine a second rate loss based, at least partially, on the second set of encoded data.

10. The apparatus of claim 9, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform at least one of: cause training of at least one neural network used to encode th…
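
Claims 2 and 3 (mirrored by apparatus claims 7 and 8) list the decoding steps for each set. A minimal sketch of that chain follows, under stated assumptions: `zlib` plus `struct` stand in for the lossless (entropy) decoder, a uniform step size stands in for inverse quantization, an unnormalized one-level Haar inverse stands in for the inverse transform network, and "compensating a combination" is read as adding the decoded residual to the converted machine-targeted features. The `convert` callable is a placeholder for the conversion neural network, and all function names are hypothetical.

```python
import struct
import zlib
from typing import Callable, List


def lossless_encode(indices: List[int]) -> bytes:
    """Stand-in entropy coder: pack as little-endian int32, then deflate."""
    return zlib.compress(struct.pack(f"<{len(indices)}i", *indices))


def lossless_decode(payload: bytes) -> List[int]:
    """Inverse of lossless_encode: inflate, then unpack int32 values."""
    raw = zlib.decompress(payload)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


def inverse_quantize(indices: List[int], step: float) -> List[float]:
    return [i * step for i in indices]


def inverse_haar(coeffs: List[float]) -> List[float]:
    """Hypothetical inverse transform: one-level unnormalized Haar.

    Input layout is [a0, a1, ..., d0, d1, ...] with a = (x0 + x1) / 2 and
    d = (x0 - x1) / 2, so each pair is recovered as (a + d, a - d).
    """
    half = len(coeffs) // 2
    out: List[float] = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        out.extend([a + d, a - d])
    return out


def decode_machine(first_payload: bytes, step: float) -> List[float]:
    """Claim 2: lossless decode, then inverse quantize."""
    return inverse_quantize(lossless_decode(first_payload), step)


def decode_human(
    second_payload: bytes,
    step: float,
    machine_features: List[float],
    convert: Callable[[List[float]], List[float]],
) -> List[float]:
    """Claim 3: lossless decode, inverse quantize, inverse transform, then
    compensate by adding the residual to the converted machine features."""
    residual = inverse_haar(inverse_quantize(lossless_decode(second_payload), step))
    prediction = convert(machine_features)
    return [p + r for p, r in zip(prediction, residual)]


machine = decode_machine(lossless_encode([0, 2, -1, 3]), step=4.0)
human = decode_human(lossless_encode([1, 0, 2, -1]), 0.5, machine, convert=list)
print(machine)  # → [0.0, 8.0, -4.0, 12.0]
print(human)    # → [1.5, 7.5, -4.5, 12.5]
```

In a real codec the inverse transform and the conversion would be learned networks. Keeping them as plain callables makes it easy to see which stages both decoding paths share (lossless decoding and inverse quantization) and which belong only to the human-targeted path.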
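
Claims 4 and 5 name four training signals: a first rate loss on the first set, task losses from task networks fed the first data, a consumption loss, and a second rate loss on the second set. The sketch below is one hypothetical way to combine them. It assumes the rate is estimated from an entropy model's symbol probabilities, that the consumption loss is a mean squared error against the original, and that each group of networks is trained on a weighted sum. The weights and function names are illustrative assumptions; the claims only require training based, at least partially, on any of these losses.

```python
import math
from typing import Dict, Optional, Sequence


def rate_loss(symbol_probs: Sequence[float]) -> float:
    """Estimated bits for the coded symbols: -sum(log2 p) under an entropy model."""
    return -sum(math.log2(p) for p in symbol_probs)


def mse(reference: Sequence[float], reconstruction: Sequence[float]) -> float:
    return sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)


def machine_objective(
    first_probs: Sequence[float],
    task_losses: Dict[str, float],
    rate_weight: float = 0.01,
    task_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Hypothetical objective for the first-set encoder/decoder and task networks:
    weighted first rate loss plus weighted task losses (weight 1.0 by default)."""
    weights = task_weights or {}
    tasks = sum(weights.get(name, 1.0) * loss for name, loss in task_losses.items())
    return rate_weight * rate_loss(first_probs) + tasks


def human_objective(
    second_probs: Sequence[float],
    original: Sequence[float],
    reconstruction: Sequence[float],
    rate_weight: float = 0.01,
) -> float:
    """Hypothetical objective for the second-set encoder/decoder:
    consumption loss (MSE here) plus weighted second rate loss."""
    return mse(original, reconstruction) + rate_weight * rate_loss(second_probs)


print(rate_loss([0.5, 0.25]))  # → 3.0
print(machine_objective([0.5], {"detection": 0.4, "segmentation": 0.6}, rate_weight=1.0))
```

Splitting the objectives this way follows the pairing in claim 5: the first-set networks and task networks see the first rate loss and task losses, while the second-set networks see the consumption loss and second rate loss. Training on a single joint sum would also be consistent with the "at least one of" wording.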

Assignees

Nokia Technologies Oy

Inventors

Classifications

  • H04N19/30 (primary)

    using hierarchical techniques, e.g. scalability (H04N19/63 takes precedence) · CPC title

  • Quantisation · CPC title

  • characterised by the element, parameter or criterion affecting or controlling the adaptive coding · CPC title

  • H04N19/619 (primary)

    the transform being operated outside the prediction loop · CPC title

  • the region being a block, e.g. a macroblock · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11575938B2 cover?
Data may be encoded to minimize distortion after decoding, but the quality required for presentation of the decoded data to a machine and the quality required for presentation to a human may be different. To accommodate different quality requirements, video data may be encoded to produce a first set of encoded data and a second set of encoded data, where the first set may be decoded for use by …
Who is the assignee on this patent?
Nokia Technologies Oy
What technology area does this patent fall under?
Primary CPC classification H04N19/30. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Feb 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).