Artificial intelligence apparatus for generating text or speech having content-based style and method for the same

US11488576B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11488576-B2
Application number: US-201916492842-A
Country: US
Kind code: B2
Filing date: May 21, 2019
Priority date: May 21, 2019
Publication date: Nov 1, 2022
Grant date: Nov 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is an artificial intelligence (AI) apparatus for generating a speech having a content-based style, including: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text, extract at least one content keyword corresponding to the text, determine a speech style based on the extracted content keyword, generate a speech corresponding to the text by using a TTS engine corresponding to the determined speech style among the plurality of TTS engines, and output the generated speech.
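The flow in the abstract (extract a content keyword, map it to a speech style, pick the matching TTS engine) can be sketched roughly as below. This is a minimal illustration, not the patented implementation: the keyword table, the style names, and the `make_engine` stand-in that returns a tagged string instead of audio are all hypothetical, and the extractor is a toy substring match rather than a learned model.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a TTS engine: returns a tagged string, not audio.
TtsEngine = Callable[[str], str]


def make_engine(style: str) -> TtsEngine:
    """Build a fake engine that labels its output with the style name."""
    return lambda text: f"[{style}] {text}"


# Hypothetical keyword-to-style table; the patent does not publish one.
STYLE_BY_KEYWORD: Dict[str, str] = {
    "news": "announcer",
    "fairy tale": "voice_actor",
}
DEFAULT_STYLE = "neutral"

# The "memory storing a plurality of TTS engines", modeled as a dict.
ENGINES: Dict[str, TtsEngine] = {
    style: make_engine(style)
    for style in ("announcer", "voice_actor", "neutral")
}


def extract_keywords(text: str) -> List[str]:
    """Toy extractor: substring match against the known keywords."""
    lowered = text.lower()
    return [kw for kw in STYLE_BY_KEYWORD if kw in lowered]


def determine_style(keywords: List[str]) -> str:
    """Use the first matched keyword, or fall back to a default style."""
    return STYLE_BY_KEYWORD[keywords[0]] if keywords else DEFAULT_STYLE


def synthesize(text: str) -> str:
    """Run the full flow: keywords -> style -> engine -> output."""
    style = determine_style(extract_keywords(text))
    return ENGINES[style](text)
```

For example, `synthesize("A fairy tale about a fox")` is routed to the hypothetical `voice_actor` engine, while text with no known keyword falls back to `neutral`.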

First claim

Opening claim text (preview).

The invention claimed is:

1. An artificial intelligence (AI) apparatus for generating a speech having a content-based style, the AI apparatus comprising: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text including a plurality of different handwriting styles, extract at least one content keyword per predetermined unit for the text, the predetermined unit being one of a sentence unit or a word unit included in the text, determine different speech styles, respectively, mapped with the extracted content keywords within the text based on the extracted content keywords within the text and the plurality of different handwriting styles, generate a speech corresponding to the text by using a TTS engine corresponding to each of the determined one or more speech styles among the plurality of TTS engines, and output the generated speech, wherein the generated speech output by the processor includes at least two different voice styles for two different predetermined units within the text, wherein the generated speech output by the processor includes successively outputting first audio for a first predetermined unit of text using a first voice and second audio for a second predetermined unit of text using a second voice different than the first voice, and the first and second predetermined units are adjacent to each other within the text, and wherein the first voice is determined based on a first handwriting style included in the text and the second voice is determined based on a second handwriting style included in the text.

2. The AI apparatus of claim 1, wherein the processor is configured to select a speech style having a content keyword which is most similar to the extracted content keywords.

3. The AI apparatus of claim 2, wherein the content keyword includes at least one of an identification keyword for a content of the text, a type keyword indicating a type of the content, or a mood keyword indicating a mood of the content.

4. The AI apparatus of claim 3, wherein the identification keyword is a keyword for identifying media indicated by the content and includes at least one of a broadcast program title, a movie title, a music title, or a person's name.

5. The AI apparatus of claim 1, wherein the processor is configured to determine the speech style per the predetermined unit by using priority information between the extracted content keywords.

6. The AI apparatus of claim 1, wherein each of the plurality of TTS engines includes at least one speech style feature, and wherein the speech style feature includes at least one of a tone, a pitch, a speed, an accent, a speech volume, or a pronunciation.

7. The AI apparatus of claim 1, wherein the processor is configured to extract the content keyword by using a content keyword extraction model, and wherein at least one of the plurality of TTS engines and the content keyword extraction model is learned by using a machine learning algorithm or a deep learning algorithm.

8. The AI apparatus of claim 1, wherein the processor is further configured to: obtain image data or text data containing a first text, extract at least one first content keyword corresponding to the first text, determine a text style based on the extracted at least one first content keyword, generate a second text corresponding to the first text by using a text-generating engine corresponding to the determined text style among the plurality of text-generating engines, and output the generated second text.

9. The AI apparatus of claim 8, wherein the processor is configured to select the text style having a first content keyword which is most similar to the extracted at least one first content keyword.

10. The AI apparatus of claim 9, wherein the first content keyword includes at least one of a first identification keyword for a content of the first text, a type keyword indicating a type of the content, or a mood keyword indicating a mood of the content.

11. The AI apparatus of claim 10, wherein the first identification keyword is a keyword for identifying media indicated by the content of the first text and includes at least one of a broadcast program title, a movie title, a music title, or a person's name.

12. The AI apparatus of claim 10, wherein the processor is configured to determine the text style by extracting a content keyword per predetermined unit for the first text and generate the second text corresponding to the first text based on the text style which is determined per the predetermined unit.

13. The AI apparatus of claim 12, wherein the processor is configured to determine the text style per the predetermined unit by using priority information between the extracted content keywords.

14. The AI apparatus of claim 8, wherein each of the plurality of text-generating engines includes at least one text style feature, and wherein the text style feature includes at least one of a text size, a first letter size, an initial consonant size, a font, a color, a pen pressure, a writing speed, an angulated degree, regularity, a horizontal degree, a space between two adjacent lines, or a space between two adjacent letters.

15. The AI apparatus of claim 1, wherein the at least two different voice styles include a news style of a male or female announcer, a fairy tale style of a voice actor, an entertainment style of a celebrity, or a speech style of a specific actor.

16. The AI apparatus of claim 1, further comprising: a touch panel configured to receive handwriting inputs; and a speaker configured to output audio, wherein the processor is further configured to: receive, via the touch panel, a plurality of different handwriting inputs from a plurality of users, convert the plurality of different handwriting inputs into a plurality of different speaker voices, each of the plurality of different handwriting inputs corresponding to a different speech style, and output, via the speaker, the plurality of different speaker voices for the plurality of different handwriting inputs.

17. The AI apparatus of claim 1, wherein the processor is further configured to: assign priorities to the extracted content keywords within the text, and determine a speech style for a unit of the text based on a corresponding priority among the priorities, wherein the priorities are set in an order of a content identification keyword, a content type keyword and a content mood keyword.
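Claims 2, 5, and 17 describe choosing a style per unit of text by ranking keywords (identification, then type, then mood) and picking the style whose keyword is most similar. The sketch below is one possible reading under stated assumptions: the style registry and its keywords are hypothetical, string similarity via `difflib.SequenceMatcher` is a swapped-in stand-in for whatever similarity measure the patent intends, keywords are supplied already tagged by kind rather than produced by a learned extraction model, and the handwriting-style input of claim 1 is left out.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Priority order stated in claim 17: identification, then type, then mood.
PRIORITY: Dict[str, int] = {"identification": 0, "type": 1, "mood": 2}

# Hypothetical registry: each speech style is labeled with one content keyword.
STYLE_KEYWORDS: Dict[str, str] = {
    "announcer": "news",
    "voice_actor": "fairy tale",
    "celebrity": "entertainment",
}


def split_units(text: str) -> List[str]:
    """Split text into sentence units (one of the two units named in claim 1)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_keyword(keywords: List[Tuple[str, str]]) -> str:
    """Return the keyword whose kind ranks highest in PRIORITY.

    Each entry is a (keyword, kind) pair, e.g. ("news", "type").
    """
    return min(keywords, key=lambda kw: PRIORITY[kw[1]])[0]


def most_similar_style(keyword: str) -> str:
    """Choose the style whose registered keyword is closest by string similarity."""
    return max(
        STYLE_KEYWORDS,
        key=lambda style: SequenceMatcher(
            None, keyword.lower(), STYLE_KEYWORDS[style]
        ).ratio(),
    )


def style_for_unit(keywords: List[Tuple[str, str]]) -> str:
    """Combine both steps: rank the keywords, then match the winner to a style."""
    return most_similar_style(pick_keyword(keywords))
```

Calling `style_for_unit` once per sentence returned by `split_units` lets adjacent sentences resolve to different styles, which is the behavior claim 1 requires of the output speech.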

Assignees

LG Electronics Inc.

Inventors

Classifications

  • Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

  • G06F40/279 (Primary): Recognition of textual entities

  • Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191)

  • Voice editing, e.g. manipulating the voice of the synthesiser

  • Named entity recognition

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11488576B2 cover?
Provided is an artificial intelligence (AI) apparatus for generating a speech having a content-based style, including: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text, extract at least one content keyword corresponding to the text, determine a speech style based on the extracted content key…
Who is the assignee on this patent?
LG Electronics Inc.
What technology area does this patent fall under?
Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 1, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).