US9734867B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9734867-B2 |
| Application number | US-201113069136-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 22, 2011 |
| Priority date | Mar 22, 2011 |
| Publication date | Aug 15, 2017 |
| Grant date | Aug 15, 2017 |
Official abstract text for this publication.
In accordance with an embodiment of the present invention, a method for inserting secondary content into a media stream includes dividing the media stream having a plurality of frames into a plurality of shots at a processor. The method further includes grouping consecutive shots from the plurality of shots into a plurality of scenes. A first list of insertion points is generated for introducing the secondary content. The insertion points of the first list are boundaries between consecutive scenes in the plurality of scenes. An average insertion point saliency of the media stream is generated at the insertion points in the first list. A second list of insertion points is then generated. The insertion points in the second list are arranged to maximize a function of the average insertion point saliency and a distance between each insertion point in the second list with other insertion points in the second list.
Opening claim text (preview).
What is claimed is:

1. A method for inserting secondary content into a media stream having primary content, the method comprising: at a processor, dividing the media stream comprising a plurality of frames into a plurality of shots; assigning visual concept labels to shots of the plurality of shots to produce a sequence of visual concept labels; grouping consecutive shots from the plurality of shots into a plurality of scenes, each scene comprising a cluster of interrelated shots in accordance with the sequence of visual concept labels; generating a first list of insertion points between the frames for introducing the secondary content, wherein the insertion points of the first list are boundaries between consecutive scenes in the plurality of scenes; generating an average insertion point saliency of the media stream at the insertion points of the first list; generating a second list of insertion points between the frames, wherein the insertion points are arranged in the second list to maximize a function of the average insertion point saliency and a distance in frames between each insertion point in the second list with other insertion points in the second list, and wherein the function is:

$$\sum_{I_j \in Ins} \mathrm{dist}(I_i, I_j) \cdot X(I_i), \quad \forall I_i \in Ins,$$

wherein $\mathrm{dist}(I_i, I_j)$ is a metric for a distance between a first insertion point $I_i$ and a second insertion point $I_j$, wherein $Ins$ is the first list of insertion points, and wherein $X(I_i)$ is the average insertion point saliency; wherein generating the average insertion point saliency of the media stream at the insertion points in the first list comprises selecting a first insertion point from the first list having a highest value of the function as the first insertion point of the second list; and inserting one or more other media streams into the media stream in accordance with an insertion point order in the second list.

2. The method of claim 1, further comprising: determining a distance between each possible insertion point with other insertion points in the first list.

3. The method of claim 1, wherein generating the average insertion point saliency of the media stream comprises: generating a video frame saliency for each frame within each shot of the plurality of the shots forming the boundaries between consecutive scenes in the plurality of scenes; generating an attention factor caused by camera motion; and scaling the video frame saliency with the attention factor to generate a visual frame saliency.

4. The method of claim 3, wherein generating the average insertion point saliency of the media stream further comprises: generating an audio frame saliency for each frame within each shot of the plurality of the shots forming the boundaries between consecutive scenes in the plurality of scenes.

5. The method of claim 4, wherein generating an average insertion point saliency of the media stream further comprises: generating an audio-video frame saliency by combining the audio frame saliency with the visual frame saliency; computing a shot saliency by averaging the combined audio-video frame saliency over all frames of each shot of the plurality of the shots forming the boundaries between consecutive scenes in the plurality of scenes; and computing the average insertion point saliency by averaging the shot saliency at the insertion point.

6. The method of claim 5, wherein combining the audio frame saliency with the visual frame saliency comprises: normalizing the audio frame saliency for each frame; normalizing the visual frame saliency for each frame; and linearly combining the normalized audio frame saliency and the normalized visual frame saliency.

7. The method of claim 1, wherein the distance between a first insertion point $I_i$ and a second insertion point $I_j$ is:

$$\mathrm{dist}(I_i, I_j) = \exp\left[\lambda \cdot \frac{d(I_i, I_j) - \bar{d}}{L}\right],$$

where $\bar{d}$ is an average number of frames between two nearby insertion points in an uniform sampling of insertion point pairs in the first list of insertion points, $d(I_i, I_j)$ represents a number of frames between the first and the second insertion points $I_i$ and $I_j$, $L$ is a total number of frames in the media stream, and lamda $\lambda$ is a variance constant.

8. The method of claim 1, wherein generating the second list of insertion points further comprises: computing a second function, wherein the second function is a sum of the average insertion point saliency-weighted distance of the insertion point in the first list with other insertion points in the second list; and selecting a second insertion point from the first list having a highest rank of the second function as the second insertion point of the second list.

9. The method of claim 8, wherein the second function is:

$$\sum_{I_i \in SIns} \mathrm{dist}(I_i, I_j) \cdot X(I_i), \quad \forall I_i \in Ins,\ I_j \in SIns$$
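The saliency averaging of claims 3 to 6 and the ordering of claims 1 and 7 to 9 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. All function and parameter names are hypothetical. The claims do not specify the normalization scheme, so the sketch divides by a stream-wide maximum; it takes the saliency at an insertion point to be the mean of the two shots adjacent to the boundary; it takes the average gap in claim 7 to be the mean gap between consecutive candidates; and it repeats the selection step of claim 8 until every candidate is ranked, which the claims shown here do not state explicitly. The second-stage score sums over the points already selected, which is one reading of the claim 9 formula.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def shot_saliency(video: Sequence[float], attention: Sequence[float],
                  audio: Sequence[float], visual_max: float,
                  audio_max: float, alpha: float = 0.5) -> float:
    """Average audio-video saliency over the frames of one shot (claims 3-6)."""
    combined = []
    for v, att, a in zip(video, attention, audio):
        visual = v * att                  # claim 3: scale by the attention factor
        visual_n = visual / visual_max    # assumed normalization: stream-wide max
        audio_n = a / audio_max           # assumed normalization: stream-wide max
        # claim 6: linear combination; the weight alpha is a hypothetical choice
        combined.append(alpha * visual_n + (1.0 - alpha) * audio_n)
    return mean(combined)                 # claim 5: average over all frames


def insertion_point_saliency(shot_before: float, shot_after: float) -> float:
    """Assumed reading of claim 5: mean of the two shots at the boundary."""
    return (shot_before + shot_after) / 2.0


def insertion_distance(a: int, b: int, mean_gap: float,
                       total_frames: int, lam: float) -> float:
    """Claim 7: dist = exp(lambda * (d(a, b) - mean_gap) / L)."""
    return math.exp(lam * (abs(a - b) - mean_gap) / total_frames)


def rank_insertion_points(saliency: Dict[int, float], total_frames: int,
                          lam: float = 1.0) -> List[int]:
    """Order candidate frame positions by the saliency-weighted distance score.

    `saliency` maps each candidate frame index (first list) to X(I_i).
    """
    points = sorted(saliency)
    if len(points) < 2:
        return points
    # Assumption: average gap between consecutive candidates stands in for d-bar.
    mean_gap = mean(b - a for a, b in zip(points, points[1:]))

    def score(p: int, others: Sequence[int]) -> float:
        return saliency[p] * sum(
            insertion_distance(p, q, mean_gap, total_frames, lam) for q in others
        )

    # Claim 1: first pick maximizes the score against the whole first list.
    ranked = [max(points, key=lambda p: score(p, points))]
    remaining = [p for p in points if p != ranked[0]]
    # Claims 8-9: later picks are scored against points already selected.
    while remaining:
        best = max(remaining, key=lambda p: score(p, ranked))
        ranked.append(best)
        remaining.remove(best)
    return ranked


if __name__ == "__main__":
    x = {100: 0.9, 200: 0.8, 900: 0.7}   # made-up saliency values
    print(rank_insertion_points(x, total_frames=1000, lam=2.0))  # [900, 100, 200]
```

In this made-up example the candidate at frame 900 is ranked first even though it has the lowest saliency value, because its distance to the other two candidates dominates the score. Raising or lowering the hypothetical `lam` parameter changes how strongly spacing is weighted against saliency.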
Processing of audio elementary streams {(monitoring, identification or recognition of audio in broadcast systems H04H60/58)} · CPC title
involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement · CPC title
Electronic editing of digitised analogue information signals, e.g. audio or video signals · CPC title
by using information signals recorded by the same method as the main recording {(G11B27/22 takes precedence)} · CPC title
by decomposing the content in the time domain, e.g. in time segments · CPC title