Analyzing messages with typographic errors due to phonemic spellings using text-to-speech and speech-to-text algorithms
US-2019295527-A1 · Sep 26, 2019 · US
US11488576B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11488576-B2 |
| Application number | US-201916492842-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 21, 2019 |
| Priority date | May 21, 2019 |
| Publication date | Nov 1, 2022 |
| Grant date | Nov 1, 2022 |
Abstract
Provided is an artificial intelligence (AI) apparatus for generating a speech having a content-based style, including: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text, extract at least one content keyword corresponding to the text, determine a speech style based on the extracted content keyword, generate a speech corresponding to the text by using a TTS engine corresponding to the determined speech style among the plurality of TTS engines, and output the generated speech.
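The abstract describes a three-step pipeline: extract content keywords from the text, choose a speech style from those keywords, and hand the text to the TTS engine mapped to that style. The Python sketch below illustrates that flow under stated assumptions. The style catalogue (`STYLES`), the `SpeechStyle` fields and every function name are hypothetical. A plain substring lookup stands in for the learned keyword extraction model, Jaccard similarity is an assumed reading of "most similar" in claim 2, and `synthesize` returns a tagged string rather than audio.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechStyle:
    """A speech style described by content keywords (hypothetical structure)."""
    name: str
    keywords: frozenset[str]
    # Example speech style features of the kind listed in claim 6; values are made up.
    pitch: float = 1.0
    speed: float = 1.0


# Illustrative style catalogue; the patent does not specify these keywords.
STYLES = [
    SpeechStyle("news_announcer", frozenset({"news", "report", "weather"}), 1.0, 1.1),
    SpeechStyle("fairy_tale_voice_actor", frozenset({"fairy tale", "once upon a time", "princess"}), 1.2, 0.9),
    SpeechStyle("celebrity_entertainment", frozenset({"entertainment", "show", "fun"}), 1.1, 1.2),
]


def extract_content_keywords(text: str, vocabulary: set[str]) -> set[str]:
    """Stand-in for the content keyword extraction model: naive substring lookup."""
    lowered = text.lower()
    return {kw for kw in vocabulary if kw in lowered}


def jaccard(a: set[str], b: frozenset[str]) -> float:
    """Jaccard similarity; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_style(keywords: set[str], styles: list[SpeechStyle]) -> SpeechStyle:
    """Pick the style whose keyword set is most similar to the extracted keywords.
    Ties (including 'no match at all') resolve to the first style in the list."""
    return max(styles, key=lambda s: jaccard(keywords, s.keywords))


def synthesize(text: str, style: SpeechStyle) -> str:
    """Placeholder for the TTS engine mapped to `style`: returns a tagged string, not audio."""
    return f"[{style.name} pitch={style.pitch} speed={style.speed}] {text}"


if __name__ == "__main__":
    vocabulary = set().union(*(s.keywords for s in STYLES))
    text = "Once upon a time, a princess lived in a castle."
    style = select_style(extract_content_keywords(text, vocabulary), STYLES)
    print(synthesize(text, style))
```

Run as a script, it prints `[fairy_tale_voice_actor pitch=1.2 speed=0.9] Once upon a time, a princess lived in a castle.` The substring matching is deliberately crude (for example, `fun` also matches `function`); a real system would use a trained extraction model such as the one claim 7 describes.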
Claims
The invention claimed is:
1. An artificial intelligence (AI) apparatus for generating a speech having a content-based style, the AI apparatus comprising: a memory configured to store a plurality of TTS (Text-To-Speech) engines; and a processor configured to: obtain image data or text data containing a text including a plurality of different handwriting styles, extract at least one content keyword per predetermined unit for the text, the predetermined unit being one of a sentence unit or a word unit included in the text, determine different speech styles, respectively, mapped with the extracted content keywords within the text based on the extracted content keywords within the text and the plurality of different handwriting styles, generate a speech corresponding to the text by using a TTS engine corresponding to each of the determined one or more speech styles among the plurality of TTS engines, and output the generated speech, wherein the generated speech output by the processor includes at least two different voice styles for two different predetermined units within the text, wherein the generated speech output by the processor includes successively outputting first audio for a first predetermined unit of text using a first voice and second audio for a second predetermined unit of text using a second voice different than the first voice, and the first and second predetermined units are adjacent to each other within the text, and wherein the first voice is determined based on a first handwriting style included in the text and the second voice is determined based on a second handwriting style included in the text.
2. The AI apparatus of claim 1, wherein the processor is configured to select a speech style having a content keyword which is most similar to the extracted content keywords.
3. The AI apparatus of claim 2, wherein the content keyword includes at least one of an identification keyword for a content of the text, a type keyword indicating a type of the content, or a mood keyword indicating a mood of the content.
4. The AI apparatus of claim 3, wherein the identification keyword is a keyword for identifying media indicated by the content and includes at least one of a broadcast program title, a movie title, a music title, or a person's name.
5. The AI apparatus of claim 1, wherein the processor is configured to determine the speech style per the predetermined unit by using priority information between the extracted content keywords.
6. The AI apparatus of claim 1, wherein each of the plurality of TTS engines includes at least one speech style feature, and wherein the speech style feature includes at least one of a tone, a pitch, a speed, an accent, a speech volume, or a pronunciation.
7. The AI apparatus of claim 1, wherein the processor is configured to extract the content keyword by using a content keyword extraction model, and wherein at least one of the plurality of TTS engines and the content keyword extraction model is learned by using a machine learning algorithm or a deep learning algorithm.
8. The AI apparatus of claim 1, wherein the processor is further configured to: obtain image data or text data containing a first text, extract at least one first content keyword corresponding to the first text, determine a text style based on the extracted at least one first content keyword, generate a second text corresponding to the first text by using a text-generating engine corresponding to the determined text style among the plurality of text-generating engines, and output the generated second text.
9. The AI apparatus of claim 8, wherein the processor is configured to select the text style having a first content keyword which is most similar to the extracted at least one first content keyword.
10. The AI apparatus of claim 9, wherein the first content keyword includes at least one of a first identification keyword for a content of the first text, a type keyword indicating a type of the content, or a mood keyword indicating a mood of the content.
11. The AI apparatus of claim 10, wherein the first identification keyword is a keyword for identifying media indicated by the content of the first text and includes at least one of a broadcast program title, a movie title, a music title, or a person's name.
12. The AI apparatus of claim 10, wherein the processor is configured to determine the text style by extracting a content keyword per predetermined unit for the first text and generate the second text corresponding to the first text based on the text style which is determined per the predetermined unit.
13. The AI apparatus of claim 12, wherein the processor is configured to determine the text style per the predetermined unit by using priority information between the extracted content keywords.
14. The AI apparatus of claim 8, wherein each of the plurality of text-generating engines includes at least one text style feature, and wherein the text style feature includes at least one of a text size, a first letter size, an initial consonant size, a font, a color, a pen pressure, a writing speed, an angulated degree, regularity, a horizontal degree, a space between two adjacent lines, or a space between two adjacent letters.
15. The AI apparatus of claim 1, wherein the at least two different voice styles include a news style of a male or female announcer, a fairy tale style of a voice actor, an entertainment style of a celebrity, or a speech style of a specific actor.
16. The AI apparatus of claim 1, further comprising: a touch panel configured to receive handwriting inputs; and a speaker configured to output audio, wherein the processor is further configured to: receive, via the touch panel, a plurality of different handwriting inputs from a plurality of users, convert the plurality of different handwriting inputs into a plurality of different speaker voices, each of the plurality of different handwriting inputs corresponding to a different speech style, and output, via the speaker, the plurality of different speaker voices for the plurality of different handwriting inputs.
17. The AI apparatus of claim 1, wherein the processor is further configured to: assign priorities to the extracted content keywords within the text, and determine a speech style for a unit of the text based on a corresponding priority among the priorities, wherein the priorities are set in an order of a content identification keyword, a content type keyword and a content mood keyword.
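Claims 1, 5 and 17 add a per-unit refinement: each sentence or word unit gets its own style, chosen by keyword priority (identification, then type, then mood), so adjacent units can be voiced differently. A minimal Python sketch of that selection logic follows, assuming sentence units and a hand-written keyword table. `PRIORITY`, `KEYWORD_TABLE`, `DEFAULT_STYLE` and the function names are hypothetical. The sketch does not model the handwriting styles that claim 1 also uses to pick voices, nor the audio synthesis itself.

```python
from __future__ import annotations

import re

# Claim 17 priority order: identification keyword > type keyword > mood keyword.
PRIORITY = {"identification": 0, "type": 1, "mood": 2}

# Hypothetical keyword table: keyword -> (keyword kind, mapped speech style).
KEYWORD_TABLE = {
    "cinderella": ("identification", "fairy_tale_voice_actor"),
    "news": ("type", "news_announcer"),
    "cheerful": ("mood", "celebrity_entertainment"),
}

DEFAULT_STYLE = "neutral"  # assumed fallback when a unit has no known keyword


def split_units(text: str) -> list[str]:
    """Split text into sentence units with a naive regex on ., ! or ? plus whitespace."""
    return [unit for unit in re.split(r"(?<=[.!?])\s+", text.strip()) if unit]


def style_for_unit(unit: str) -> str:
    """Choose a style for one unit from its highest-priority keyword."""
    lowered = unit.lower()
    hits = [entry for kw, entry in KEYWORD_TABLE.items() if kw in lowered]
    if not hits:
        return DEFAULT_STYLE
    # Lowest PRIORITY value wins, i.e. identification beats type beats mood.
    _, style = min(hits, key=lambda entry: PRIORITY[entry[0]])
    return style


def plan_speech(text: str) -> list[tuple[str, str]]:
    """Return (unit, style) pairs in text order; adjacent units may get different voices."""
    return [(unit, style_for_unit(unit)) for unit in split_units(text)]


if __name__ == "__main__":
    for unit, style in plan_speech("Here is the news. Cinderella lost her cheerful slipper."):
        print(f"{style:>24}: {unit}")
```

For the sample input, the first sentence maps to `news_announcer` through a type keyword. The second contains both an identification keyword (`cinderella`) and a mood keyword (`cheerful`); the identification keyword has higher priority, so that sentence maps to `fairy_tale_voice_actor`.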
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title
Recognition of textual entities · CPC title
Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title
Voice editing, e.g. manipulating the voice of the synthesiser · CPC title
Named entity recognition · CPC title