Method for encoding/decoding image and device therefor
US-2020162751-A1 · May 21, 2020 · US
US12309526B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12309526-B2 |
| Application number | US-201917417550-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 24, 2019 |
| Priority date | Jan 23, 2019 |
| Publication date | May 20, 2025 |
| Grant date | May 20, 2025 |
Official abstract text for this publication.
The present application relates to a video image transmission method, device, an interactive intelligent tablet and a storage medium. The method comprises: acquiring a video image captured by a first video communication end; acquiring semantic information in the video image; and sending the semantic information to a second video communication end, wherein the semantic information is used to reconstruct a reconstruction image of the video image at the second video communication end.
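The flow in the abstract (capture a frame, derive a compact description, send it, rebuild the frame at the far end) can be sketched as follows. This is only an illustration: block averaging stands in for the trained neural network and carries no real semantic meaning, and the names `SemanticPacket`, `encode`, and `decode` are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel values


@dataclass
class SemanticPacket:
    """Hypothetical payload sent in place of the full frame."""
    height: int
    width: int
    block: int
    vector: List[float]  # low-dimensional description of the frame


def encode(frame: Frame, block: int) -> SemanticPacket:
    """Placeholder encoder: one mean value per block x block tile.

    Assumption: this replaces the trained network purely for illustration.
    """
    h, w = len(frame), len(frame[0])
    vector = []
    for top in range(0, h, block):
        for left in range(0, w, block):
            tile = [frame[y][x]
                    for y in range(top, min(top + block, h))
                    for x in range(left, min(left + block, w))]
            vector.append(sum(tile) / len(tile))
    return SemanticPacket(h, w, block, vector)


def decode(packet: SemanticPacket) -> Frame:
    """Placeholder decoder: paint every tile with its stored mean."""
    tiles_per_row = -(-packet.width // packet.block)  # ceiling division
    return [[packet.vector[(y // packet.block) * tiles_per_row
                           + x // packet.block]
             for x in range(packet.width)]
            for y in range(packet.height)]
```

In this sketch the receiver only ever sees `packet.vector` plus the frame dimensions, which is the point of the scheme: the transmitted description is much smaller than the frame it stands for.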
Opening claim text (preview).
What is claimed is:
1. A video image transmission method, comprising: acquiring a video image captured by a first video communication end; determining an encoding mode, wherein the encoding mode comprises one of a preset object mode; recognizing the preset object in the video image to obtain a sub-image of the preset object; providing the sub-image of the preset object to a trained neural network, wherein the trained neural network comprises an encoder comprising a series of one or more convolution layers and a middle layer to sequentially process the sub-image, and wherein the one or more convolution layers comprise a lower convolution layer whose output is fed to the middle layer; executing the trained neural network to output a part of feature vectors extracted from the lower convolution layer and a low-dimensional vector from the middle layer, the low-dimensional vector representing semantic information of the preset object in the video image; and sending, through a communication network, the part of the feature vectors extracted from the lower convolution layer and the low-dimensional vector representing the semantic information to a second video communication end, wherein the semantic information is used by a decoder to reconstruct a reconstruction image of the video image at the second video communication end.
2. The method according to claim 1, wherein the trained neural network is configured to recognize image semantic information, and wherein an amount of information of the part of the feature vectors extracted from the lower convolution layer is dynamically adjusted based on a condition of the communication network.
3. The method according to claim 2, wherein the semantic information in the video image comprises any one of the following: the preset object in the video image, or global semantic information of the video image.
4. The method according to claim 1, wherein the preset object comprises a human face or a human body, wherein if the preset object is a human face, extracting the semantic information of the preset object in the video image comprises: recognizing a human face area in the video image to obtain a human face sub-image; inputting the human face sub-image into the trained neural network; and acquiring the output of the trained neural network to obtain semantic information of the human face in the video image, and wherein if the preset object is a human body, extracting the semantic information of the preset object in the video image comprises: recognizing a human body area in the video image to obtain a human body sub-image; inputting the human body sub-image into the trained neural network; and acquiring the output of the trained neural network to obtain semantic information of the human body in the video image.
5. The method according to claim 1, wherein the encoding mode further comprises a global semantic information mode, the method further comprising: responsive to determining that the encoding mode is the global semantic information mode, extracting the global semantic information in the video image by: inputting the video image into the trained neural network; and acquiring the output of the trained neural network to obtain the global semantic information in the video image.
6. The method according to claim 1, further comprising: sending a first reference image to the second video communication end by using a preset image transmission mode at intervals of N frames, wherein data volume of the first reference image transmitted in the preset image transmission mode is greater than that of the semantic information, and N is greater than 1; the first reference image is part of a video image captured by the first video communication end, and the first reference image is used to enable the second video communication end to reconstruct the reconstruction image of the video image according to the semantic information and the first reference image.
7. The method according to claim 6, wherein if the semantic information is the preset object, the first reference image is used to enable the second video communication end to obtain a reconstructed sub-image of the preset object according to the received semantic information, and fuse the reconstructed sub-image with the first reference image to obtain the reconstruction image of the video image.
8. The method according to claim 7, wherein if the semantic information is the preset object, the method further comprises: acquiring position information of the preset object in the video image; and sending the position information to the second video communication end, wherein the position information is used to enable the second video communication end to fuse the reconstructed sub-image of the preset object with the first reference image according to the position information to obtain the reconstruction image of the video image.
9. The method according to claim 6, wherein if the semantic information is the global semantic information, the first reference image is used to enable the second video communication end to obtain an initial reconstruction image according to the received semantic information, and fuse the initial reconstruction image with the first reference image to obtain the reconstruction image of the video image.
10. The method according to claim 6, further comprising: sending a second reference image to the second video communication end by using a preset image transmission mode, wherein the data volume of the second reference image transmitted in the preset image transmission mode is greater than that of the semantic information, the second reference image is at least one of an image of the preset object or an environment image of the first video communication end, and the second reference image is used to enable the second video communication end to reconstruct the reconstruction image of the video image according to the semantic information and the second reference image.
11. A video image transmission method, comprising: receiving semantic information of a video image and a part of feature vectors, wherein the video image is a video image captured by a first video communication end, wherein a trained neural network comprising an encoder comprising a series of one or more convolution layers and a middle layer to sequentially process a sub-image of the video image, and wherein the one or more convolution layers comprise a lower convolution layer whose output is fed to the middle layer, and the part of feature vectors is extracted from the lower convolution layer and the semantic information is output from the middle layer; determining an encoding mode, wherein the encoding mode comprises a preset object mode; inputting the semantic information and the part of the feature vectors into a pre-trained neural network; executing the pre-trained neural network to output a reconstructed sub-image of the preset object; obtaining a reconstruction image of the video image; and displaying the reconstruction image on a display screen of a second video communication end.
12. The method according to claim 11, wherein reconstructing an image according to the semantic information and the part of the feature vectors to obtain the reconstruction image of the video image comprises: acquiring a first reference image received by using a preset image transmission mode in the most recent time, wherein the first reference image is a video image captured and sent by the first video communication end, and data volume of the first reference image received by using the preset image transmission mode is greater than that of the semantic information; and reconstructing an image accord
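Three of the mechanisms recited in the claims lend themselves to a short sketch: the reference image sent at intervals of N frames (claim 6), the amount of lower-layer feature data adjusted to the network condition (claim 2), and the fusion of a reconstructed sub-image with the reference image by position (claims 7 and 8). The code below is a minimal illustration under stated assumptions, not the patented implementation. The function names, the linear bandwidth rule, the 1000 kbps default, and the overwrite-style fusion are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values


def plan_frame(frame_index: int, n: int) -> str:
    """Claim 6: a full reference image at intervals of N frames (N > 1),
    semantic information for every other frame."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    return "reference_image" if frame_index % n == 0 else "semantic_info"


def feature_budget(available: int, bandwidth_kbps: float,
                   full_rate_kbps: float = 1000.0) -> int:
    """Claim 2: scale how many lower-layer feature vectors are sent with
    the network condition.

    Assumption: the linear rule and the 1000 kbps default are invented
    for this sketch; the claim does not specify a formula.
    """
    ratio = max(0.0, min(1.0, bandwidth_kbps / full_rate_kbps))
    return int(available * ratio)


def fuse(reference: Image, sub_image: Image,
         position: Tuple[int, int]) -> Image:
    """Claims 7-8: place the reconstructed sub-image on a copy of the
    reference image at (top, left).

    Assumption: plain overwrite stands in for whatever fusion the
    receiver actually performs.
    """
    top, left = position
    sub_h, sub_w = len(sub_image), len(sub_image[0])
    if (top < 0 or left < 0 or top + sub_h > len(reference)
            or left + sub_w > len(reference[0])):
        raise ValueError("sub-image does not fit inside the reference image")
    fused = [row[:] for row in reference]  # leave the reference untouched
    for dy, row in enumerate(sub_image):
        fused[top + dy][left:left + sub_w] = row
    return fused
```

With N = 5, for example, frames 0, 5, 10 would carry a full reference image and the frames in between only the semantic payload, which the receiver decodes and fuses onto the most recent reference image.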
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Facial expression recognition · CPC title
Static hand or arm · CPC title
using neural networks · CPC title