Dynamic split-frame preview of video editing effects
US-9208819-B1 · Dec 8, 2015 · US
US10134440B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10134440-B2 |
| Application number | US-201113099391-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 3, 2011 |
| Priority date | May 3, 2011 |
| Publication date | Nov 20, 2018 |
| Grant date | Nov 20, 2018 |
Official abstract text for this publication.
A method for producing an audio-visual slideshow for a video sequence having an audio soundtrack and a corresponding video track including a time sequence of image frames, comprising: segmenting the audio soundtrack into a plurality of audio segments; subdividing the audio segments into a sequence of audio frames; determining a corresponding audio classification for each audio frame; automatically selecting a subset of the audio segments responsive to the audio classification for the corresponding audio frames; for each of the selected audio segments automatically analyzing the corresponding image frames to select one or more key image frames; merging the selected audio segments to form an audio summary; forming an audio-visual slideshow by combining the selected key frames with the audio summary, wherein the selected key frames are displayed synchronously with their corresponding audio segment; and storing the audio-visual slideshow in a processor-accessible storage memory.
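As a rough illustration of the segment-selection step described in the abstract, the sketch below assumes the per-frame audio classifications are already available. It uses a simple chronological greedy pass, which is an assumption of this example and not the selection rule stated in the patent. The function names `frame_spans` and `select_segments`, the class labels, and the midpoint rule for assigning frames to segments are all hypothetical.

```python
import math


def frame_spans(duration_s, frame_s):
    """Split [0, duration_s) into consecutive frames of frame_s seconds.

    The last frame may be shorter than frame_s.
    """
    count = math.ceil(duration_s / frame_s)
    return [(i * frame_s, min((i + 1) * frame_s, duration_s)) for i in range(count)]


def select_segments(segments, frames, labels):
    """Pick audio segments that together cover every audio class seen.

    segments: list of (start_s, end_s) tuples in chronological order
    frames:   list of (start_s, end_s) tuples, one per audio frame
    labels:   one classification label per frame (assumed precomputed)

    Hypothetical greedy rule: walk the segments in order and keep a segment
    only if it contains a frame whose class has not been covered yet.
    """
    covered = set()
    selected = []
    for seg_start, seg_end in segments:
        classes_here = set()
        for (f_start, f_end), label in zip(frames, labels):
            midpoint = (f_start + f_end) / 2
            if seg_start <= midpoint < seg_end:
                classes_here.add(label)
        if classes_here - covered:
            selected.append((seg_start, seg_end))
            covered |= classes_here
    return selected


def summary_length(selected):
    """Total duration of the merged audio summary, in seconds."""
    return sum(end - start for start, end in selected)


if __name__ == "__main__":
    frames = frame_spans(6.0, 1.0)
    labels = ["speech", "speech", "speech", "music", "music", "music"]
    segments = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
    chosen = select_segments(segments, frames, labels)
    print(chosen, summary_length(chosen))
```

Selecting by class coverage keeps the summary short while still representing each kind of audio content. A real system would also need the classifier and the change-detection segmentation, which this sketch takes as given.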
Opening claim text (preview).
The invention claimed is:

1. A method for producing an audio-visual slideshow from a video, comprising: receiving a video sequence, the video sequence comprising image frames and a corresponding audio soundtrack; dividing the audio soundtrack into audio frames, wherein the audio frames are divided based on a predefined time interval; extracting an audio feature vector from each of the audio frames; applying an audio classification model to the audio feature vectors, wherein the audio classification model determines a corresponding audio classification for each of the audio frames; using a clustering algorithm to form audio frame clusters, wherein the audio frame clusters comprise audio frames having a same corresponding audio classification; selecting an audio frame from each of the audio frame clusters; segmenting the audio soundtrack into audio segments using a change detection operation; selecting the audio segments that contain the selected audio frames; identifying a subset of the selected audio segments, wherein the subset of the selected audio segments includes selected audio frames from a diverse set of audio frame clusters; determining which of the image frames correspond to the selected subset of audio segments; selecting key image frames from the image frames corresponding to the selected subset of audio segments, wherein the selected number of key image frames is less than the total number of image frames that correspond to the selected subset of audio segments; merging the selected subset of audio segments to form an audio summary; combining the selected key image frames with the audio summary; and displaying the selected key image frames synchronously with their corresponding audio segments.
2. The method of claim 1, wherein the audio classification for each audio frame is determined using one or more audio classification models trained using a ground-truth data set.
3. The method of claim 2, wherein the one or more audio classification models comprises a support vector machine (SVM) model.
4. The method of claim 2, wherein a set of audio classification models are used to determine classification scores for each of a predetermined subset of the diverse set of audio frame clusters.
5. The method of claim 1, wherein the clustering algorithm comprises a K-means algorithm.
6. The method of claim 1, wherein identifying the subset of the selected audio segments includes: for each audio frame cluster, selecting an audio frame corresponding to each relevant audio classification; and selecting the audio segments that include the selected audio frames.
7. The method of claim 1, wherein the change detection operation identifies appropriate audio segment boundaries corresponding to substantial changes in audio characteristics.
8. The method of claim 7, wherein applying the change detection operation comprises applying a Bayesian information criterion.
9. The method of claim 1, further comprising expanding the selected audio segments by appending to the selected audio segments one or more other audio segments having similar audio characteristics to the selected audio segments.
10. The method of claim 1, wherein selecting the key image frames comprises: identifying an image frame subset corresponding to a particular audio segment; determining one or more visual quality scores for each of the image frames in the image frame subset; and selecting one or more key image frames from the image frame subset responsive to the one or more visual quality scores.
11. The method of claim 10, wherein the image frame subset includes a sampling of the image frames corresponding to the particular audio segment.
12. The method of claim 10, wherein the one or more visual quality scores include a facial quality score and an overall image quality score.
13. The method of claim 12, wherein a determination of the facial quality score for a particular image frame comprises: analyzing the particular image frame using a face detection process to detect the presence of any faces; determining visual feature vectors for the detected presence of faces; and determining the facial quality score responsive to the visual feature vectors.
14. The method of claim 10, wherein the key image frames are selected according to a visual diversity criterion.
15. The method of claim 10, wherein selecting the one or more key image frames from the image frame subset responsive to the one or more visual quality scores comprises: identifying a set of candidate key image frames having the highest visual quality scores; determining a visual feature vector for each of the candidate key image frames; computing visual distance values between the candidate key image values responsive to the visual feature values; and selecting a subset of the candidate key image frames to be the key image frames responsive to the visual distance values.
16. The method of claim 15, wherein the selected number of key image frames are selected such that each of the selected number of key image frames are separated by a visual distance value that exceeds a predefined threshold visual distance value.
17. The method of claim 1, wherein the selected number of key image frames are sorted into chronological order.
18. The method of claim 1, wherein combining the selected key image frames with the audio summary forms an audio-visual slideshow and the audio-visual slideshow is stored in a video file using a video file format adapted to be played using a standard video player.
19. The method of claim 1, wherein each of the selected number of key image frames is displayed for a time interval, and wherein the time interval is determined by dividing the length of the selected audio segment by the selected number of key image frames for the respective selected audio segment.
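The key-frame steps in claims 15 to 17 and the timing rule in claim 19 can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the quality scores and feature vectors are taken as given, Euclidean distance stands in for the unspecified "visual distance", and the names `select_key_frames` and `display_interval` are hypothetical.

```python
import math


def select_key_frames(frames, num_candidates, min_distance):
    """Choose visually distinct, high-quality key frames for one audio segment.

    frames: list of (frame_index, quality_score, feature_vector) tuples
    num_candidates: how many top-quality frames to consider (claim 15)
    min_distance: threshold that selected frames must exceed pairwise (claim 16)

    Returns the selected frame indices in chronological order (claim 17).
    """
    # Candidate set: the frames with the highest visual quality scores.
    candidates = sorted(frames, key=lambda f: f[1], reverse=True)[:num_candidates]

    selected = []
    for cand in candidates:
        # Assumption: Euclidean distance between feature vectors.
        far_enough = all(
            math.dist(cand[2], kept[2]) > min_distance for kept in selected
        )
        if far_enough:
            selected.append(cand)

    # Sort by frame index so the slideshow follows the original timeline.
    return sorted(f[0] for f in selected)


def display_interval(segment_length_s, num_key_frames):
    """Per-frame display time: segment length divided by key-frame count (claim 19)."""
    if num_key_frames <= 0:
        raise ValueError("at least one key frame is required")
    return segment_length_s / num_key_frames


if __name__ == "__main__":
    frames = [
        (10, 0.9, (0.0, 0.0)),
        (11, 0.8, (0.1, 0.0)),  # near-duplicate of frame 10
        (40, 0.7, (1.0, 1.0)),
        (70, 0.2, (5.0, 5.0)),  # low quality, outside the candidate set
    ]
    keys = select_key_frames(frames, num_candidates=3, min_distance=0.5)
    print(keys, display_interval(6.0, len(keys)))
```

Visiting candidates in descending quality order means that when two frames are near-duplicates, the higher-scoring one is kept.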
on discs (G11B27/036, G11B27/038 take precedence) · CPC title
by using information not detectable on the record carrier · CPC title