Method and apparatus for generating music

US11301641B2 · US · B2

Patent metadata
Publication number: US-11301641-B2
Application number: US-201916660407-A
Country: US
Kind code: B2
Filing date: Oct 22, 2019
Priority date: Sep 30, 2017
Publication date: Apr 12, 2022
Grant date: Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the images. The terminal may convert the respective rhyming lyrics corresponding to the images into a speech. The terminal may synthesize the speech with preset background music to obtain image music.
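
The six steps in the abstract form a linear pipeline, which can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Stages` container, every stage callable, `mix_with_background`, and the `gain` parameter are hypothetical names introduced here, and audio is modeled as a plain list of float samples purely for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]  # assumption: audio is a flat list of samples


def mix_with_background(speech: Audio, background: Audio, gain: float = 0.3) -> Audio:
    """Overlay speech on background music, looping the music to the speech length.

    The sample-wise sum and the gain value are illustrative choices only.
    """
    if not background:
        return list(speech)
    return [s + gain * background[i % len(background)] for i, s in enumerate(speech)]


@dataclass
class Stages:
    """Hypothetical pluggable stages, one per step named in the abstract."""
    recognize_scenario: Callable[[bytes], str]       # image -> scenario label
    describe: Callable[[str], str]                   # scenario -> description text
    rhyme_lyrics: Callable[[List[str]], List[str]]   # descriptions -> rhyming lyrics
    to_speech: Callable[[List[str]], Audio]          # lyrics -> speech samples
    mix: Callable[[Audio, Audio], Audio] = mix_with_background


def generate_image_music(images: List[bytes], background: Audio, stages: Stages) -> Audio:
    """Run the stages in the order the abstract lists them."""
    scenarios = [stages.recognize_scenario(img) for img in images]
    descriptions = [stages.describe(s) for s in scenarios]
    lyrics = stages.rhyme_lyrics(descriptions)
    speech = stages.to_speech(lyrics)
    return stages.mix(speech, background)
```

Keeping each stage behind a callable means a real scenario classifier, lyric generator, or speech synthesizer could be swapped in without changing the orchestration function.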

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating music, the method comprising: identifying, by a terminal, based on execution of scenario recognition, scenarios for images previously received by the terminal; generating respective description texts for the scenarios; executing keyword-based rhyme matching based on the respective description texts; generating respective rhyming lyrics corresponding to the images; converting the respective rhyming lyrics corresponding to the images into a speech; and synthesizing the speech with preset background music to obtain image music.

2. The method of claim 1, wherein the identifying, by the terminal, based on execution of scenario recognition, the scenarios for images previously received by the terminal further comprises: obtaining image features for the images based on a deep learning neural network model; and determining the scenarios for the images based on the image features.

3. The method of claim 2, wherein generating respective description texts for the scenarios further comprises: generating image descriptions based on the image features and the scenarios for the images to obtain the respective description texts for the scenarios.

4. The method of claim 1, wherein executing keyword-based rhyme matching based on the respective description texts further comprises: obtaining, from the respective description texts, Chinese pinyins and rhymes, the Chinese pinyins and rhymes corresponding to last words in the respective description texts; and generating the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and the rhymes, wherein the respective rhyming lyrics each have a same corresponding rhyme as the last word in the respective description text.

5. The method of claim 4, wherein obtaining the Chinese pinyins and rhymes and generating the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and rhymes comprises: arranging the Chinese pinyins corresponding to the last words in the description texts; determining a distribution rule based on the arranged Chinese pinyins; determining the rhymes based on the Chinese pinyins that satisfy the distribution rule; and obtaining the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes.

6. The method of claim 5, wherein obtaining the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes comprises: generating image description lyrics based on the respective description text; obtaining supplementary lyrics from the pre-generated lyrics patterns based on the scenarios corresponding to the images and the rhymes; and synthesizing the image description lyrics with the supplementary lyrics to obtain the rhyming lyrics.

7. The method of claim 1, wherein the images are acquired by the terminal in response to the terminal entering a photographing mode, or the images are obtained from a photo album of the terminal.

8. The method of claim 1, wherein the converting the respective rhyming lyrics corresponding to the images into a speech further comprises: executing text analysis on the respective rhyming lyrics corresponding to the images to obtain a text analysis result; extracting a linguistic feature from the text analysis result; executing phoneme-level duration prediction and adaptive duration adjustment based on the linguistic feature to obtain a rhythm feature and a part-of-speech feature corresponding to the rhyming lyrics; and generating pronunciations based on a neural network model, the linguistic feature, the rhythm feature, and the part-of-speech feature to obtain the speech.

9. A terminal, comprising: a processor, the processor configured to: identify, based on execution of scenario recognition, scenarios for images previously received by the terminal; generate respective description texts for the scenarios; execute keyword-based rhyme matching based on the respective description texts; generate respective rhyming lyrics corresponding to the images; convert the respective rhyming lyrics corresponding to the images into a speech; and synthesize the speech with preset background music to obtain image music.

10. The terminal of claim 9, wherein to identify, based on execution of scenario recognition, the scenarios for images previously received by the terminal, the processor is further configured to: obtain image features for the images based on a deep learning neural network model; and determine the scenarios for the images based on the image features.

11. The terminal of claim 10, wherein to generate respective description texts for the scenarios, the processor is further configured to: generate image descriptions based on the image features and the scenarios for the images to obtain the respective description texts for the scenarios.

12. The terminal of claim 9, wherein to execute keyword-based rhyme matching based on the respective description texts, the processor is further configured to: obtain, from the respective description texts, Chinese pinyins and rhymes, the Chinese pinyins and rhymes corresponding to last words in the respective description texts; and generate the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and the rhymes, wherein the respective rhyming lyrics each have a same corresponding rhyme as the last word in the respective description text.

13. The terminal of claim 12, wherein to obtain the Chinese pinyins and rhymes and to generate the respective rhyming lyrics corresponding to the images based on the Chinese pinyins and rhymes, the processor is configured to: arrange the Chinese pinyins corresponding to the last words in the description texts; determine a distribution rule based on the arranged Chinese pinyins; determine the rhyme based on the Chinese pinyins that satisfy the distribution rule; and obtain the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes.

14. The terminal of claim 13, wherein to obtain the respective rhyming lyrics corresponding to the images from pre-generated lyrics patterns based on the scenarios for the images and the rhymes, the processor is configured to: generate image description lyrics based on the respective description text; obtain supplementary lyrics from the pre-generated lyrics patterns based on the scenarios corresponding to the images and the rhymes; and synthesize the image description lyrics with the supplementary lyrics to obtain the rhyming lyrics.

15. The terminal of claim 9, wherein to convert the respective rhyming lyrics corresponding to the images into a speech, the processor is further configured to: executing text analysis on the respective rhyming lyrics corresponding to the images to obtain a text analysis result; extracting a linguistic feature from the text analysis result; executing phoneme-level duration prediction and adaptive duration adjustment based on the linguistic feature to obtain a rhythm feature and a part-of-speech feature corresponding to the rhyming lyrics; and generating pronunciations based on a neural network model, the linguistic feature, the rhythm feature, and the part-of-speech feature to obtain the speech.

16. A non-transitory computer-readable storage medium, comprising: a plurality of instructions executable by a processor, the instructions comprising: instructions e
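
The rhyme-matching steps recited in claims 4 to 6 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the input is taken to be already-romanized pinyin for the last word of each description text, the "distribution rule" is approximated as "pick the most frequent final", semi-vowels `y` and `w` are treated as initials for simplicity, and the names `pinyin_final`, `dominant_rhyme`, `supplementary_lyrics`, and the pattern table layout are hypothetical. The claims do not specify any of these details.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Pinyin initials, two-letter ones first so "zh" is matched before "z".
# Treating "y" and "w" as initials is a simplification.
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
)


def pinyin_final(syllable: str) -> str:
    """Return the final (rhyme part) of one pinyin syllable, ignoring tone digits."""
    s = syllable.lower().rstrip("012345")
    for initial in INITIALS:
        if s.startswith(initial) and len(s) > len(initial):
            return s[len(initial):]
    return s  # syllable has no initial, e.g. "an" or "er"


def dominant_rhyme(last_word_pinyins: List[str]) -> str:
    """Assumed distribution rule: the final shared by the most last words."""
    counts = Counter(pinyin_final(p) for p in last_word_pinyins)
    return counts.most_common(1)[0][0]


def supplementary_lyrics(
    patterns: Dict[Tuple[str, str], List[str]], scenario: str, rhyme: str
) -> List[str]:
    """Look up pre-generated lyric lines for a (scenario, rhyme) pair."""
    return patterns.get((scenario, rhyme), [])
```

For example, with last-word pinyins `["hai3", "lai2", "tian1"]` the finals are `ai`, `ai`, and `ian`, so `dominant_rhyme` selects `ai`, and the lookup then returns whatever lines the hypothetical pattern table stores under that scenario and rhyme.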

Assignees

Inventors

Classifications

  • Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems · CPC title

  • Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis · CPC title

  • G10H1/0025 · Primary

    Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece (automatically producing a series of tones G10H1/26) · CPC title

  • Lyrics displays, e.g. for karaoke applications · CPC title

  • Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11301641B2 cover?
A terminal for generating music may identify, based on execution of scenario recognition, scenarios for images previously received by the terminal. The terminal may generate respective description texts for the scenarios. The terminal may execute keyword-based rhyme matching based on the respective description texts. The terminal may generate respective rhyming lyrics corresponding to the image…
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10H1/0025. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 12, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).