Document image analysis apparatus, document image analysis method and program thereof
US-2021383106-A1 · Dec 9, 2021 · US
US11798210B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11798210-B2 |
| Application number | US-202017116944-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 9, 2020 |
| Priority date | Dec 9, 2020 |
| Publication date | Oct 24, 2023 |
| Grant date | Oct 24, 2023 |
Abstract
Disclosed herein are system, method and computer readable storage medium for detecting space suitable for overlaying media content onto an image. The system receives a candidate image which may be an image or a video frame. The candidate image is then input into a neural network. The neural network may output coordinates and one or more dimensions representing one or more bounding boxes for inserting media content into the candidate image. The one or more bounding boxes may be transmitted with a request for a media content item to be displayed in a bounding box. In response to the request the media content item may be received, and the candidate image and the media content item overlaid on top of the candidate image within the bounding box may be displayed.
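The flow in the abstract, together with the probability threshold that claim 1 adds, can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the patented implementation. The neural network is replaced by a hard-coded list of detections. `BoundingBox`, `select_candidate_boxes`, `build_media_request`, and `overlay` are hypothetical names. "Meets a threshold" is read as `>=`, the 0.5 threshold is arbitrary, and images are small 2D lists of integer pixel values.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical container for one detector output (claims 1 and 3).

    x, y: offsets of the top-left corner along the horizontal and vertical
    axes of the candidate image, in pixels (assumed convention).
    width, height: dimensions extending from (x, y).
    probability: score that the area is suitable for a media overlay.
    """
    x: int
    y: int
    width: int
    height: int
    probability: float


def select_candidate_boxes(boxes: List[BoundingBox], threshold: float) -> List[BoundingBox]:
    """Keep boxes whose probability meets the threshold (read as >=)."""
    return [b for b in boxes if b.probability >= threshold]


def build_media_request(box: BoundingBox) -> dict:
    """Request payload carrying the box dimensions; field names are hypothetical."""
    return {"width": box.width, "height": box.height}


def overlay(image: List[List[int]], media: List[List[int]], box: BoundingBox) -> List[List[int]]:
    """Return a copy of `image` with `media` drawn inside `box`.

    Media pixels that fall outside the box or the image are dropped,
    and the input image is left unchanged.
    """
    out = [row[:] for row in image]
    for dy, media_row in enumerate(media[: box.height]):
        for dx, value in enumerate(media_row[: box.width]):
            yy, xx = box.y + dy, box.x + dx
            if 0 <= yy < len(out) and 0 <= xx < len(out[yy]):
                out[yy][xx] = value
    return out


# Usage with made-up detections standing in for the network's output.
detections = [
    BoundingBox(x=1, y=1, width=2, height=2, probability=0.9),
    BoundingBox(x=0, y=0, width=4, height=1, probability=0.3),
]
candidates = select_candidate_boxes(detections, threshold=0.5)
request = build_media_request(candidates[0])  # -> {'width': 2, 'height': 2}
image = [[0] * 4 for _ in range(4)]
media = [[7, 7], [7, 7]]
result = overlay(image, media, candidates[0])
```

Sending only the box dimensions in the request mirrors claim 1, which requires the request to comprise the dimension of the candidate bounding box. How the media service uses those dimensions is not specified.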
Claims (preview)
What is claimed is:
1. A computer implemented method for detecting space suitable for overlaying media content onto an image, the method comprising: training a neural network using a plurality of images, each of the plurality of images corresponding to one or more vectors, each vector comprising (1) a coordinate representing a point associated with a bounding box in a corresponding image, and (2) a dimension of the bounding box, wherein each bounding box is suitable for media content overlay; receiving by a media content insertion system a candidate image for a media content overlay; inputting the candidate image into the neural network; outputting, by the neural network, one or more bounding boxes as one or more data structures, each of which comprises at least (1) a coordinate representing a point associated with the bounding box, (2) a dimension of the bounding box, and (3) a probability that the bounding box is located in an area suitable for the media content overlay; for at least one of the one or more bounding boxes, determining whether the probability meets a threshold probability; responsive to determining that the probability meets the threshold probability, determining that the bounding box is a candidate bounding box of the candidate image; receiving, from the neural network, the data structure corresponding to the candidate bounding box of the candidate image; transmitting a request for a media content item to be displayed in the candidate bounding box of the candidate image, the request comprising the dimension of the candidate bounding box; receiving the media content item in response to the request; and causing a display of the candidate image and the media content item overlaid on top of the candidate image within the candidate bounding box.
2. The method of claim 1, further comprising: receiving the plurality of images; receiving for each of the plurality of images, one or more vectors, each vector comprising a set of coordinates and a set of dimensions, wherein each vector of the one or more vectors represents a particular bounding box; training the neural network using each of the plurality of images and corresponding one or more vectors.
3. The method of claim 1, wherein receiving the coordinates and the one or more dimensions representing the one or more bounding boxes for inserting the media content into the candidate image comprises receiving for each bounding box: a first coordinate representing a first offset along a horizontal axis of the candidate image and a second coordinate representing a second offset along a vertical axis of the candidate image, a first dimension extending from the first coordinate along the horizontal axis, and a second dimension extending from the second coordinate along the vertical axis.
4. The method of claim 1, further comprising: in response to the request, receiving a plurality of media content items corresponding to the one or more bounding boxes; identifying from the plurality of media content items, a particular media content item corresponding to a bounding box with the highest probability; and selecting the particular media content item as the media content item.
5. The method of claim 1, further comprising: determining that the candidate image is a video frame associated with a video content item; retrieving a set of video frames of the video content item, wherein the set of video frames comprises video frames that are played subsequently to the candidate image; inputting each video frame of the set of video frames into the neural network; receiving, from the neural network for each video frame in the set of video frames, corresponding coordinates and corresponding one or more dimensions representing one or more bounding boxes; identifying in each video frame of the set of video frames, a bounding box matching a bounding box in each other video frame within the set of video frames; and including the bounding box matching the bounding box of each other video frame in the request.
6. The method of claim 5, further comprising causing a display of the set of video frames and the media content item overlaid on top of each of a plurality of subsequent video frames within the bounding box.
7. A system for detecting space suitable for overlaying media content onto an image, the system comprising: memory with instructions encoded thereon; and one or more processors that, when executing the instructions, are caused to perform operations comprising: training a neural network using a plurality of images, each of the plurality of images corresponding to one or more vectors, each vector comprising (1) a coordinate representing a point associated with a bounding box in a corresponding image, and (2) a dimension of the bounding box, wherein each bounding box is suitable for media content overlay; receiving by a media content insertion system a candidate image for a media content overlay; inputting the candidate image into the neural network; outputting, by the neural network, one or more bounding boxes as one or more data structures, each of which comprises at least (1) a coordinate representing a point associated with the bounding box, (2) a dimension of the bounding box, and (3) a probability that the bounding box is located in an area suitable for the media content overlay; for at least one of the one or more bounding boxes, determining whether the probability meets a threshold probability; responsive to determining that the probability meets the threshold probability, determining that the bounding box is a candidate bounding box of the candidate image; receiving, from the neural network, the data structure corresponding to the candidate bounding box of the candidate image; transmitting a request for a media content item to be displayed in the candidate bounding box of the candidate image, the request comprising the dimension of the candidate bounding box; receiving the media content item in response to the request; and causing a display of the candidate image and the media content item overlaid on top of the candidate image within the candidate bounding box.
8. The system of claim 7, wherein the instructions cause the one or more processors to perform operations comprising: receiving the plurality of images; receiving for each of the plurality of images, one or more vectors, each vector comprising a set of coordinates and a set of dimensions, wherein each vector of the one or more vectors represents a particular bounding box; training the neural network using each of the plurality of images and corresponding one or more vectors.
9. The system of claim 7, wherein receiving the coordinates and the one or more dimensions representing the one or more bounding boxes for inserting the media content into the candidate image comprises receiving for each bounding box: a first coordinate representing a first offset along a horizontal axis of the candidate image and a second coordinate representing a second offset along a vertical axis of the candidate image, a first dimension extending from the first coordinate along the horizontal axis, and a second dimension extending from the second coordinate along the vertical axis.
10. The system of claim 7, wherein the instructions cause the one or more processors to perform operations comprising: in response to the request, receiving a plurality of media content items corresponding to the one or more bounding boxes; identifying from the plurality of media content items, a particular media content item corresponding to a bounding box with the highest probability; and selecting the particular media content item as the media content item.
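Claims 4 and 5 add two selection steps. Claim 4 picks the returned media item whose bounding box has the highest probability. Claim 5 keeps a bounding box only if it matches a box in each of the frames played after the candidate frame.

The Python sketch below illustrates both steps. It relies on assumptions the claims do not state:

* Matching is decided by intersection-over-union (IoU), with an arbitrary `min_iou` of 0.7.
* Boxes are reported from the first (candidate) frame.
* `Box`, `iou`, `boxes_stable_across_frames`, and `pick_media_item` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Hypothetical detector output: top-left offsets, size, and score.

    frozen=True makes instances hashable so they can key a dict.
    """
    x: float
    y: float
    width: float
    height: float
    probability: float = 1.0


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    inter = ix * iy
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


def boxes_stable_across_frames(frames: Sequence[List[Box]], min_iou: float = 0.7) -> List[Box]:
    """Boxes in the first frame that overlap some box in every later frame.

    IoU with `min_iou` is an assumed matching criterion; claim 5 does not
    say how "matching" is decided.
    """
    if not frames:
        return []
    first, rest = frames[0], frames[1:]
    return [
        box for box in first
        if all(any(iou(box, other) >= min_iou for other in frame) for frame in rest)
    ]


def pick_media_item(items_by_box: Dict[Box, str]) -> Optional[str]:
    """Return the item whose bounding box has the highest probability (claim 4)."""
    if not items_by_box:
        return None
    best_box = max(items_by_box, key=lambda b: b.probability)
    return items_by_box[best_box]


# Usage with made-up per-frame detections (candidate frame first).
frames = [
    [Box(10, 10, 100, 50, 0.9), Box(200, 0, 40, 40, 0.8)],
    [Box(12, 11, 100, 50, 0.85)],
    [Box(11, 10, 98, 50, 0.88), Box(300, 300, 10, 10, 0.6)],
]
stable = boxes_stable_across_frames(frames)  # only the first box survives
chosen = pick_media_item({
    Box(10, 10, 100, 50, 0.9): "banner-a",
    Box(200, 0, 40, 40, 0.8): "banner-b",
})
```

Anchoring the match on the candidate frame keeps the reported box in that frame's coordinates. A production system might instead average the matched boxes or track them with a dedicated tracker.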
CPC classification titles:
Supervised learning
Convolutional networks [CNN, ConvNet]
Creating or editing images; Combining images with text
Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Learning methods