Video editing method, apparatus, and device, and storage medium
US-2021264952-A1 · Aug 26, 2021 · US
US11301641B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11301641-B2 |
| Application number | US-201916660407-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 22, 2019 |
| Priority date | Sep 30, 2017 |
| Publication date | Apr 12, 2022 |
| Grant date | Apr 12, 2022 |
Official abstract text for this publication.
A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
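The flow in the abstract can be read as a six-stage pipeline. The following is a minimal Python sketch of that data flow only. Every name in it (`generate_image_music`, `mix_with_background`, `music_gain`, and the stage callables passed in) is hypothetical, and the final step is shown as a plain sample-wise mix, which is an assumption rather than something the publication specifies.

```python
from typing import Callable, List, Sequence

Samples = List[float]  # mono audio, assumed to be floats in [-1.0, 1.0]


def mix_with_background(speech: Samples, music: Samples,
                        music_gain: float = 0.5) -> Samples:
    """Sample-wise mix; the shorter track is padded with silence."""
    length = max(len(speech), len(music))
    mixed = []
    for i in range(length):
        s = speech[i] if i < len(speech) else 0.0
        m = music[i] if i < len(music) else 0.0
        # Clip so the sum stays inside the assumed [-1.0, 1.0] range.
        mixed.append(max(-1.0, min(1.0, s + music_gain * m)))
    return mixed


def generate_image_music(
    images: Sequence[bytes],
    recognize_scenario: Callable[[bytes], str],
    describe: Callable[[bytes, str], str],
    match_rhymes: Callable[[List[str], List[str]], List[str]],
    text_to_speech: Callable[[List[str]], Samples],
    background_music: Samples,
) -> Samples:
    """Chain the stages named in the abstract; each stage is a stand-in."""
    # 1. Scenario recognition for each previously received image.
    scenarios = [recognize_scenario(img) for img in images]
    # 2. One description text per scenario.
    descriptions = [describe(img, sc) for img, sc in zip(images, scenarios)]
    # 3-4. Keyword-based rhyme matching yields the rhyming lyrics.
    lyrics = match_rhymes(descriptions, scenarios)
    # 5. Lyrics are converted into speech.
    speech = text_to_speech(lyrics)
    # 6. Speech is combined with preset background music.
    return mix_with_background(speech, background_music)
```

Passing the stages in as callables keeps the sketch independent of any particular recognition, captioning, or speech model, none of which the abstract names.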
Opening claim text (preview).
What is claimed is:

1. A method for generating music, the method comprising: identifying, by a terminal, based on execution of scenario recognition, scenarios for images previously received by the terminal; generating respective description texts for the scenarios; executing keyword-based rhyme matching based on the respective description texts; generating respective rhyming lyrics corresponding to the images; converting the respective rhyming lyrics corresponding to the images into a speech; and synthesizing the speech with preset background music to obtain image music.

2. The method of claim 1, wherein the identifying, by the terminal, based on execution of scenario recognition, the scenarios for images previously received by the terminal further comprises: obtaining image features for the images based on a deep learning neural network model; and determining the scenarios for the images based on the image features.

3. The method of claim 2, wherein generating respective description texts for the scenarios further comprises: generating image descriptions based on the image features and the scenarios for the images to obtain the respective description texts for the scenarios.

4. The method of claim 1, wherein executing keyword-based rhyme matching based on the respective description texts further comprises: obtaining, from the respective description texts, Chinese pinyins and rhymes, the Chinese pinyins and rhymes corresponding to last words in the respective description texts; and generating the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and the rhymes, wherein the respective rhyming lyrics each have a same corresponding rhyme as the last word in the respective description text.

5. The method of claim 4, wherein obtaining the Chinese pinyins and rhymes and generating the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and rhymes comprises: arranging the Chinese pinyins corresponding to the last words in the description texts; determining a distribution rule based on the arranged Chinese pinyins; determining the rhymes based on the Chinese pinyins that satisfy the distribution rule; and obtaining the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes.

6. The method of claim 5, wherein obtaining the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes comprises: generating image description lyrics based on the respective description text; obtaining supplementary lyrics from the pre-generated lyrics patterns based on the scenarios corresponding to the images and the rhymes; and synthesizing the image description lyrics with the supplementary lyrics to obtain the rhyming lyrics.

7. The method of claim 1, wherein the images are acquired by the terminal in response to the terminal entering a photographing mode, or the images are obtained from a photo album of the terminal.

8. The method of claim 1, wherein the converting the respective rhyming lyrics corresponding to the images into a speech further comprises: executing text analysis on the respective rhyming lyrics corresponding to the images to obtain a text analysis result; extracting a linguistic feature from the text analysis result; executing phoneme-level duration prediction and adaptive duration adjustment based on the linguistic feature to obtain a rhythm feature and a part-of-speech feature corresponding to the rhyming lyrics; and generating pronunciations based on a neural network model, the linguistic feature, the rhythm feature, and the part-of-speech feature to obtain the speech.

9. A terminal, comprising: a processor, the processor configured to: identify, based on execution of scenario recognition, scenarios for images previously received by the terminal; generate respective description texts for the scenarios; execute keyword-based rhyme matching based on the respective description texts; generate respective rhyming lyrics corresponding to the images; convert the respective rhyming lyrics corresponding to the images into a speech; and synthesize the speech with preset background music to obtain image music.

10. The terminal of claim 9, wherein to identify, based on execution of scenario recognition, the scenarios for images previously received by the terminal, the processor is further configured to: obtain image features for the images based on a deep learning neural network model; and determine the scenarios for the images based on the image features.

11. The terminal of claim 10, wherein to generate respective description texts for the scenarios, the processor is further configured to: generate image descriptions based on the image features and the scenarios for the images to obtain the respective description texts for the scenarios.

12. The terminal of claim 9, wherein to execute keyword-based rhyme matching based on the respective description texts, the processor is further configured to: obtain, from the respective description texts, Chinese pinyins and rhymes, the Chinese pinyins and rhymes corresponding to last words in the respective description texts; and generate the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and the rhymes, wherein the respective rhyming lyrics each have a same corresponding rhyme as the last word in the respective description text.

13. The terminal of claim 12, wherein to obtain the Chinese pinyins and rhymes and to generate the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and rhymes, the processor is configured to: arrange the Chinese pinyins corresponding to the last words in the description texts; determine a distribution rule based on the arranged Chinese pinyins; determine the rhyme based on the Chinese pinyins that satisfy the distribution rule; and obtain the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes.

14. The terminal of claim 13, wherein to obtain the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes, the processor is configured to: generate image description lyrics based on the respective description text; obtain supplementary lyrics from the pre-generated lyrics patterns based on the scenarios corresponding to the images and the rhymes; and synthesize the image description lyrics with the supplementary lyrics to obtain the rhyming lyrics.

15. The terminal of claim 9, wherein to convert the respective rhyming lyrics corresponding to the images into a speech, the processor is further configured to: executing text analysis on the respective rhyming lyrics corresponding to the images to obtain a text analysis result; extracting a linguistic feature from the text analysis result; executing phoneme-level duration prediction and adaptive duration adjustment based on the linguistic feature to obtain a rhythm feature and a part-of-speech feature corresponding to the rhyming lyrics; and generating pronunciations based on a neural network model, the linguistic feature, the rhythm feature, and the part-of-speech feature to obtain the speech.

16. A non-transitory computer-readable storage medium, comprising: a plurality of instructions executable by a processor, the instructions comprising: instructions e
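
Claims 4 to 6 (mirrored in claims 12 to 14) describe taking the pinyin of the last word of each description text, settling on a rhyme, and pulling supplementary lyrics from pre-generated patterns keyed by scenario and rhyme. The Python sketch below is one possible reading, not the patented implementation. It assumes the pinyin strings are already available (with optional tone digits), treats the "distribution rule" as "the most frequent final wins", and uses the raw pinyin final as the rhyme key. The names `final_of`, `dominant_rhyme`, `build_lyrics`, and the `patterns` dictionary layout are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Pinyin initials; two-letter initials come first so "zh" is tried before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def final_of(pinyin: str) -> str:
    """Return the final of one syllable, e.g. 'hai3' -> 'ai'."""
    syllable = pinyin.lower().rstrip("012345")  # drop a trailing tone digit
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return syllable[len(initial):]
    return syllable  # zero-initial syllable such as 'an' or 'er'


def dominant_rhyme(last_word_pinyins: List[str]) -> str:
    """Assumed distribution rule: pick the most frequent final."""
    if not last_word_pinyins:
        raise ValueError("at least one pinyin syllable is required")
    counts = Counter(final_of(p) for p in last_word_pinyins)
    return counts.most_common(1)[0][0]


def build_lyrics(descriptions: List[str], last_word_pinyins: List[str],
                 scenario: str,
                 patterns: Dict[Tuple[str, str], List[str]]) -> List[str]:
    """Interleave description lines with pattern lines for the chosen rhyme."""
    rhyme = dominant_rhyme(last_word_pinyins)
    supplementary = patterns.get((scenario, rhyme), [])
    lyrics = []
    for i, line in enumerate(descriptions):
        lyrics.append(line)
        if i < len(supplementary):
            lyrics.append(supplementary[i])
    return lyrics
```

Real rhyme matching would likely group related finals (for example treating "ian" and "an" as one rhyme class); the sketch keeps each final separate for brevity.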
Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems · CPC title
Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis · CPC title
Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece (automatically producing a series of tones G10H1/26) · CPC title
Lyrics displays, e.g. for karaoke applications · CPC title
Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title