Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
US-2022138904-A1 · May 5, 2022 · US
US12266165B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12266165-B2 |
| Application number | US-202217824587-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 25, 2022 |
| Priority date | May 25, 2021 |
| Publication date | Apr 1, 2025 |
| Grant date | Apr 1, 2025 |
Official abstract text for this publication.
An electronic device is provided. The electronic device includes a memory storing one or more instructions, and a processor configured to execute the one or more instructions stored in the memory. The processor is configured to execute the one or more instructions to obtain a subjective assessment score for each of a plurality of sub-regions included in an input frame, the subjective assessment score being a Mean Opinion Score (MOS); obtain a location weight for each of the plurality of sub-regions, the location weight indicating characteristics according to a location of a display; obtain a weighted assessment score for each of the plurality of sub-regions, based on the subjective assessment score for each of the plurality of sub-regions and the location weight for each of the plurality of sub-regions; and obtain a final quality score for the entire video frame, based on the weighted assessment score for each of the plurality of sub-regions.
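The scoring pipeline in the abstract can be illustrated with a minimal sketch. The abstract only says the weighted and final scores are "based on" their inputs; the multiplication and the mean used here are assumptions borrowed from the wording of dependent claims 5 and 8, and the function name, grid layout, and numeric values are hypothetical.

```python
from statistics import fmean


def final_quality_score(sub_region_mos, location_weights):
    """Combine per-sub-region MOS values into one frame-level score."""
    if len(sub_region_mos) != len(location_weights):
        raise ValueError("need exactly one location weight per sub-region")
    if not sub_region_mos:
        raise ValueError("at least one sub-region is required")
    # Weighted assessment score: each sub-region's MOS times its location
    # weight (assumption: multiplication, as in dependent claim 8).
    weighted = [m * w for m, w in zip(sub_region_mos, location_weights)]
    # Assumption: the final score is the mean of the weighted scores.
    return fmean(weighted)


# Hypothetical 3x3 grid of sub-regions, flattened row by row.
# The centre region is given a larger weight purely for illustration.
mos = [3.0, 3.5, 3.0, 3.5, 4.5, 3.5, 3.0, 3.5, 3.0]
weights = [0.8, 1.0, 0.8, 1.0, 1.4, 1.0, 0.8, 1.0, 0.8]
score = final_quality_score(mos, weights)
```

With all weights set to 1.0 the result reduces to the plain average of the sub-region scores, which makes the effect of the location weights easy to compare.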
Opening claim text (preview).
What is claimed is:
1. An electronic device comprising: a memory storing one or more instructions; and a processor configured to execute the one or more instructions stored in the memory to: obtain a subjective assessment score for each of a plurality of sub-regions included in an input frame, the subjective assessment score being a Mean Opinion Score (MOS); obtain a location weight for each of the plurality of sub-regions based on the subjective assessment score for each of the plurality of sub-regions and a subjective assessment score for the input frame, the location weight indicating characteristics according to a location of a display; obtain a weighted assessment score for each of the plurality of sub-regions, based on the subjective assessment score for each of the plurality of sub-regions and the location weight for each of the plurality of sub-regions; and obtain a final quality score for the entire input frame, based on the weighted assessment score for each of the plurality of sub-regions.
2. The electronic device of claim 1, wherein the processor is further configured to execute the one or more instructions to predict the subjective assessment score for each of the plurality of sub-regions included in the input frame, by using a first neural network trained to learn, from a video frame received, a subjective assessment score for each of the plurality of sub-regions included in the video frame.
3. The electronic device of claim 2, wherein the first neural network is trained to allow the subjective assessment score for each of the plurality of sub-regions included in the input frame to be equal to a Ground Truth (GT) subjective assessment score for the entire input frame, the GT subjective assessment score being a GT MOS.
4. The electronic device of claim 2, wherein the processor is further configured to execute the one or more instructions to predict the location weight for each of the plurality of sub-regions from the subjective assessment score for each of the plurality of sub-regions by using a second neural network, and the second neural network is trained to predict a weight corresponding to a difference between the subjective assessment score for each sub-region and a Ground Truth (GT) subjective assessment score for the entire input frame as the location weight for each sub-region, from the subjective assessment score for each of the plurality of sub-regions included in the input frame predicted through the first neural network.
5. The electronic device of claim 4, wherein the second neural network is trained to allow a mean value of weighted assessment scores obtained by multiplying the subjective assessment score for each of the plurality of sub-regions included in the input frame by the location weight to be equal to the GT subjective assessment score for the entire input frame.
6. The electronic device of claim 1, wherein the processor is further configured to execute the one or more instructions to obtain the location weight for each of the plurality of sub-regions from the memory.
7. The electronic device of claim 6, wherein the location weight for each of the plurality of sub-regions is predicted through a second neural network and stored in the memory, and the second neural network is trained to predict a weight corresponding to a difference between the subjective assessment score for each sub-region and a Ground Truth (GT) subjective assessment score for the entire input frame as the location weight for each sub-region, from the subjective assessment score for each of the plurality of sub-regions included in the input frame received, and the second neural network is trained to allow a mean value of weighted assessment scores obtained by multiplying the subjective assessment score for each of the plurality of sub-regions by the location weight to be equal to the GT subjective assessment score for the entire input frame.
8. The electronic device of claim 1, wherein the processor is further configured to execute the one or more instructions to obtain the weighted assessment score for each respective sub-region of the plurality of sub-regions by multiplying the subjective assessment score for the respective sub-region by the location weight for the respective sub-region.
9. The electronic device of claim 1, wherein the processor is further configured to execute the one or more instructions to: obtain high-complexity information indicating a region of interest from the input frame; and obtain the final quality score for the entire input frame based on the weighted assessment score and the high-complexity information.
10. The electronic device of claim 9, wherein the high-complexity information includes at least one of speaker identification information, semantic segmentation information, object detection information, or saliency map information.
11. A video quality assessment method performed by an electronic device, the video quality assessment method comprising: obtaining a subjective assessment score for each of a plurality of sub-regions included in an input frame, the subjective assessment score being a Mean Opinion Score (MOS); obtaining a location weight for each of the plurality of sub-regions based on the subjective assessment score for each of the plurality of sub-regions and a subjective assessment score for the input frame, the location weight indicating characteristics according to a location of a display; obtaining a weighted assessment score for each of the plurality of sub-regions, based on the subjective assessment score for each of the plurality of sub-regions and the location weight for each of the plurality of sub-regions; and obtaining a final quality score for the entire input frame, based on the weighted assessment score for each of the plurality of sub-regions.
12. The video quality assessment method of claim 11, wherein the obtaining of the subjective assessment score for each of the plurality of sub-regions included in the input frame comprises predicting the subjective assessment score for each of the plurality of sub-regions, by using a first neural network trained to learn, from a video frame received a subjective assessment score for each of the plurality of sub-regions included in the video frame.
13. The video quality assessment method of claim 12, wherein the first neural network is trained to allow the subjective assessment score for each of the plurality of sub-regions included in the input frame to be equal to a Ground Truth (GT) subjective assessment score for the entire input frame, the GT subjective assessment score being a GT MOS.
14. The video quality assessment method of claim 12, wherein the obtaining of the location weight for each of the plurality of sub-regions comprises predicting the location weight for each of the plurality of sub-regions from the subjective assessment score for each of the plurality of sub-regions by using a second neural network, and the second neural network is trained to predict a weight corresponding to a difference between the subjective assessment score for each sub-region and a Ground Truth (GT) subjective assessment score for the entire input frame as the location weight for each sub-region, from the subjective assessment score for each of the plurality of sub-regions included in the input frame predicted through the first neural network.
15. The video quality assessment method of claim 14, wherein the second neural network is trained to allow a mean value of weighted assessment scores obtained by multiplying the subjective assessment score for each of the plurality of sub-regions included in the input f
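Claims 4, 5, and 7 describe the second neural network as trained so that the mean of the weighted assessment scores equals the GT MOS for the entire frame. The sketch below illustrates only that objective. It is a simplification under stated assumptions: free per-region weights fitted by plain gradient descent stand in for the network, the squared-error loss is one possible choice the claims do not specify, and the function names, learning rate, and step count are hypothetical.

```python
from statistics import fmean


def weighted_mean_loss(sub_region_mos, weights, gt_mos):
    """Squared gap between the mean weighted score and the frame-level GT MOS."""
    weighted = [m * w for m, w in zip(sub_region_mos, weights)]
    return (fmean(weighted) - gt_mos) ** 2


def fit_weights(sub_region_mos, gt_mos, steps=2000, lr=0.01):
    """Fit free per-region weights by gradient descent.

    Assumption: this replaces the second neural network with directly
    optimised weights, only to show the training target from claim 5.
    """
    n = len(sub_region_mos)
    weights = [1.0] * n  # start from the unweighted average
    for _ in range(steps):
        err = fmean(m * w for m, w in zip(sub_region_mos, weights)) - gt_mos
        # d(loss)/d(w_i) = 2 * err * m_i / n
        weights = [
            w - lr * 2.0 * err * m / n
            for m, w in zip(sub_region_mos, weights)
        ]
    return weights


# Hypothetical example: four sub-region scores and a frame-level GT MOS.
region_scores = [3.0, 4.0, 3.5, 2.5]
gt = 3.6
fitted = fit_weights(region_scores, gt)
```

Many weight vectors satisfy this constraint, so the objective alone does not determine a unique set of location weights; the small learning rate is chosen for the score range in this example and may need adjusting for other inputs.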
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Combinations of networks · CPC title
using neural networks · CPC title
in video content (extracting overlay text G06V20/62; video retrieval G06F16/70; processing of video elementary streams in video servers H04N21/234; processing of video elementary streams in video clients H04N21/44) · CPC title