Video image transmission method, device, interactive intelligent tablet and storage medium

US12309526B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12309526-B2
Application number: US-201917417550-A
Country: US
Kind code: B2
Filing date: Dec 24, 2019
Priority date: Jan 23, 2019
Publication date: May 20, 2025
Grant date: May 20, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present application relates to a video image transmission method, device, an interactive intelligent tablet and a storage medium. The method comprises: acquiring a video image captured by a first video communication end; acquiring semantic information in the video image; and sending the semantic information to a second video communication end, wherein the semantic information is used to reconstruct a reconstruction image of the video image at the second video communication end.
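The flow in the abstract (capture, extract semantic information, send, reconstruct) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: block averaging stands in for the trained neural network, and the names `extract_semantic_vector` and `reconstruct` are hypothetical, not taken from the patent. The point is only that the sender transmits a compact representation and the receiver rebuilds an approximation from it.

```python
from typing import List

Image = List[List[float]]  # grayscale frame: rows of pixel intensities


def extract_semantic_vector(image: Image, block: int) -> List[float]:
    """Toy encoder (assumption): average each block x block tile.

    A stand-in for the trained network; it only shows that the
    sender emits far fewer values than the frame contains.
    """
    height, width = len(image), len(image[0])
    vector = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [
                image[y][x]
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            ]
            vector.append(sum(tile) / len(tile))
    return vector


def reconstruct(vector: List[float], height: int, width: int, block: int) -> Image:
    """Toy decoder (assumption): paint each tile with its average."""
    tiles_per_row = (width + block - 1) // block
    return [
        [vector[(y // block) * tiles_per_row + (x // block)] for x in range(width)]
        for y in range(height)
    ]


# First video communication end: capture a frame and encode it.
frame = [[float((x + y) % 8) for x in range(8)] for y in range(8)]
payload = extract_semantic_vector(frame, block=4)

# Second video communication end: rebuild an approximation of the frame.
rebuilt = reconstruct(payload, height=8, width=8, block=4)
print(len(payload), "values sent instead of", 8 * 8)
```

The reconstruction is lossy by design; in the patent the quality gap is closed by a trained decoder rather than by tile painting.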

First claim

Opening claim text (preview).

What is claimed is:

1. A video image transmission method, comprising: acquiring a video image captured by a first video communication end; determining an encoding mode, wherein the encoding mode comprises one of a preset object mode; recognizing the preset object in the video image to obtain a sub-image of the preset object; providing the sub-image of the preset object to a trained neural network, wherein the trained neural network comprises an encoder comprising a series of one or more convolution layers and a middle layer to sequentially process the sub-image, and wherein the one or more convolution layers comprise a lower convolution layer whose output is fed to the middle layer; executing the trained neural network to output a part of feature vectors extracted from the lower convolution layer and a low-dimensional vector from the middle layer, the low-dimensional vector representing semantic information of the preset object in the video image; and sending, through a communication network, the part of the feature vectors extracted from the lower convolution layer and the low-dimensional vector representing the semantic information to a second video communication end, wherein the semantic information is used by a decoder to reconstruct a reconstruction image of the video image at the second video communication end.

2. The method according to claim 1, wherein the trained neural network is configured to recognize image semantic information, and wherein an amount of information of the part of the feature vectors extracted from the lower convolution layer is dynamically adjusted based on a condition of the communication network.

3. The method according to claim 2, wherein the semantic information in the video image comprises any one of the following: the preset object in the video image, or global semantic information of the video image.

4. The method according to claim 1, wherein the preset object comprises a human face or a human body, wherein if the preset object is a human face, extracting the semantic information of the preset object in the video image comprises: recognizing a human face area in the video image to obtain a human face sub-image; inputting the human face sub-image into the trained neural network; and acquiring the output of the trained neural network to obtain semantic information of the human face in the video image, and wherein if the preset object is a human body, extracting the semantic information of the preset object in the video image comprises: recognizing a human body area in the video image to obtain a human body sub-image; inputting the human body sub-image into the trained neural network; and acquiring the output of the trained neural network to obtain semantic information of the human body in the video image.

5. The method according to claim 1, wherein the encoding mode further comprises a global semantic information mode, the method further comprising: responsive to determining that the encoding mode is the global semantic information mode, extracting the global semantic information in the video image by: inputting the video image into the trained neural network; and acquiring the output of the trained neural network to obtain the global semantic information in the video image.

6. The method according to claim 1, further comprising: sending a first reference image to the second video communication end by using a preset image transmission mode at intervals of N frames, wherein data volume of the first reference image transmitted in the preset image transmission mode is greater than that of the semantic information, and N is greater than 1; the first reference image is part of a video image captured by the first video communication end, and the first reference image is used to enable the second video communication end to reconstruct the reconstruction image of the video image according to the semantic information and the first reference image.

7. The method according to claim 6, wherein if the semantic information is the preset object, the first reference image is used to enable the second video communication end to obtain a reconstructed sub-image of the preset object according to the received semantic information, and fuse the reconstructed sub-image with the first reference image to obtain the reconstruction image of the video image.

8. The method according to claim 7, wherein if the semantic information is the preset object, the method further comprises: acquiring position information of the preset object in the video image; and sending the position information to the second video communication end, wherein the position information is used to enable the second video communication end to fuse the reconstructed sub-image of the preset object with the first reference image according to the position information to obtain the reconstruction image of the video image.

9. The method according to claim 6, wherein if the semantic information is the global semantic information, the first reference image is used to enable the second video communication end to obtain an initial reconstruction image according to the received semantic information, and fuse the initial reconstruction image with the first reference image to obtain the reconstruction image of the video image.

10. The method according to claim 6, further comprising: sending a second reference image to the second video communication end by using a preset image transmission mode, wherein the data volume of the second reference image transmitted in the preset image transmission mode is greater than that of the semantic information, the second reference image is at least one of an image of the preset object or an environment image of the first video communication end, and the second reference image is used to enable the second video communication end to reconstruct the reconstruction image of the video image according to the semantic information and the second reference image.

11. A video image transmission method, comprising: receiving semantic information of a video image and a part of feature vectors, wherein the video image is a video image captured by a first video communication end, wherein a trained neural network comprising an encoder comprising a series of one or more convolution layers and a middle layer to sequentially process a sub-image of the video image, and wherein the one or more convolution layers comprise a lower convolution layer whose output is fed to the middle layer, and the part of feature vectors is extracted from the lower convolution layer and the semantic information is output from the middle layer; determining an encoding mode, wherein the encoding mode comprises a preset object mode; inputting the semantic information and the part of the feature vectors into a pre-trained neural network; executing the pre-trained neural network to output a reconstructed sub-image of the preset object; obtaining a reconstruction image of the video image; and displaying the reconstruction image on a display screen of a second video communication end.

12. The method according to claim 11, wherein reconstructing an image according to the semantic information and the part of the feature vectors to obtain the reconstruction image of the video image comprises: acquiring a first reference image received by using a preset image transmission mode in the most recent time, wherein the first reference image is a video image captured and sent by the first video communication end, and data volume of the first reference image received by using the preset image transmission mode is greater than that of the semantic information; and reconstructing an image accord
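Three mechanisms in the dependent claims lend themselves to a short sketch: scaling the amount of lower-layer feature data with network conditions (claim 2), sending a full reference image at intervals of N frames (claim 6), and fusing a reconstructed sub-image into the reference image by position (claims 7 and 8). The code below is a minimal illustration under stated assumptions, not the patented implementation: the function names, the 1000 kbps threshold, the "every frame whose index is a multiple of N" reading of the interval, and the paste-style fusion are all hypothetical choices.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image: rows of pixel intensities


def feature_budget(bandwidth_kbps: float, total_features: int) -> int:
    """Assumed policy: send a share of the lower-layer feature values
    proportional to bandwidth, saturating at 1000 kbps (arbitrary)."""
    fraction = min(1.0, max(0.0, bandwidth_kbps / 1000.0))
    return int(total_features * fraction)


def is_reference_frame(frame_index: int, n: int) -> bool:
    """Assumed schedule: frames 0, N, 2N, ... carry the full image."""
    if n <= 1:
        raise ValueError("N must be greater than 1")
    return frame_index % n == 0


def fuse(reference: Image, sub_image: Image, position: Tuple[int, int]) -> Image:
    """Assumed fusion: paste the reconstructed sub-image onto a copy of
    the reference image at (row, col); out-of-bounds pixels are dropped."""
    top, left = position
    fused = [row[:] for row in reference]  # leave the reference untouched
    for dy, row in enumerate(sub_image):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(fused) and 0 <= x < len(fused[0]):
                fused[y][x] = pixel
    return fused


# Sender side: decide per frame what kind of payload to transmit.
schedule = ["reference" if is_reference_frame(i, 4) else "semantic" for i in range(8)]
budget = feature_budget(bandwidth_kbps=500, total_features=64)

# Receiver side: place a reconstructed 2x2 face patch into the last reference.
reference = [[0] * 4 for _ in range(4)]
face_patch = [[9, 9], [9, 9]]
fused = fuse(reference, face_patch, position=(1, 2))
print(schedule, budget, fused)
```

A real system would likely blend the patch edges rather than overwrite pixels, and would measure network conditions rather than take bandwidth as a parameter; both are simplified here to keep the data flow visible.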

Assignees

Guangzhou Shiyuan Electronics Co Ltd, Guangzhou Shizhen Information Tech Co Ltd

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Facial expression recognition · CPC title

  • Static hand or arm · CPC title

  • using neural networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12309526B2 cover?
The present application relates to a video image transmission method, device, an interactive intelligent tablet and a storage medium. The method comprises: acquiring a video image captured by a first video communication end; acquiring semantic information in the video image; and sending the semantic information to a second video communication end, wherein the semantic information is used to rec…
Who is the assignee on this patent?
Guangzhou Shiyuan Electronics Co Ltd, Guangzhou Shizhen Information Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification H04N7/15. Mapped technology areas include Electricity.
When was this patent published?
Publication date: May 20, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).