Intelligent text-to-speech conversion

US8996376B2 · US · B2

Patent metadata
Field: Value
Publication number: US-8996376-B2
Application number: US-9841708-A
Country: US
Kind code: B2
Filing date: Apr 5, 2008
Priority date: Apr 5, 2008
Publication date: Mar 31, 2015
Grant date: Mar 31, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues can be based on an analysis of a document prior to a text-to-speech conversion. Another aspect can produce an audio summary for a file. The audio summary for a document can thereafter be presented to a user so that the user can hear a summary of the document without having to process the document to produce its spoken text via text-to-speech conversion.
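As a rough illustration of the first aspect, the sketch below analyzes a document's elements before any conversion and builds a playback plan that places an audio cue ahead of the spoken text it belongs to. It is a minimal sketch under stated assumptions, not the patented implementation: the `Element` type, the `CUE_FOR_KIND` table, the cue names, and `build_playback_plan` are all hypothetical, and actual speech synthesis is left out.

```python
from dataclasses import dataclass

# Hypothetical mapping from element kind to a cue sound name (not from the patent).
CUE_FOR_KIND = {"heading": "chime", "link": "click", "quote": "tone"}


@dataclass
class Element:
    kind: str  # e.g. "heading", "paragraph", "link"
    text: str


def build_playback_plan(elements):
    """Analyze the document first, then emit (action, payload) steps.

    An element whose kind has a cue gets a ("cue", name) step before its
    ("speak", text) step; elements with no text are skipped.
    """
    plan = []
    for el in elements:
        text = el.text.strip()
        if not text:
            continue
        cue = CUE_FOR_KIND.get(el.kind)
        if cue is not None:
            plan.append(("cue", cue))
        plan.append(("speak", text))
    return plan


doc = [
    Element("heading", "Weather report"),
    Element("paragraph", "Rain is expected tomorrow."),
    Element("link", "Full forecast"),
]
plan = build_playback_plan(doc)
for step in plan:
    print(step)
```

A real system would hand each "speak" step to a speech synthesizer and mix in a recorded sound for each "cue" step; the plan only shows where the cues would fall relative to the spoken text.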

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of converting text to speech, the method comprising: selecting a document to be converted to speech, the selected document including base text and one or more links located within the base text; parsing the selected document, wherein the parsing comprises: resolving at least one of the one or more links in the selected document; and retrieving pre-existing text from one or more documents obtained by said resolving; appending at least a portion of the retrieved pre-existing text to the base text; generating speech by converting to speech the base text and the portion of the retrieved pre-existing text appended to the base text; and creating an audio file based on the converted text, wherein the audio file includes at least one audio cue configured to be beneficial to visually impaired listeners.

2. The computer-implemented method of claim 1, wherein the at least one audio cue is associated with a text element in the selected document.

3. The computer-implemented method of claim 1, wherein the at least one audio cue is associated with a non-text element in the selected document.

4. The computer-implemented method of claim 1, wherein parsing the selected document comprises: identifying one or more text elements in the selected document, including: determining a first subset of the one or more identified text elements, wherein the first subset of the one or more identified text elements includes one or more spoken text elements; and determining a second subset of the identified text elements, wherein the second subset of the identified text elements includes one or more non-spoken text elements, wherein the second subset of the one or more identified text elements is excluded from the first subset of the identified text elements; and determining an order in which to speak the first subset of the one or more identified text elements.

5. The computer-implemented method of claim 1, further comprising: storing the created audio file at a host computer for use by a media management application operating on the host computer.

6. The computer-implemented method of claim 5, further comprising: copying the audio file from the host computer to a portable media player where the audio file is stored on the portable media player in a predetermined organization.

7. The computer-implemented method of claim 1, wherein the selected document is selected from the group consisting of: an audio file, a webpage, a PDF, a text file, an RSS feed, an e-mail, a list of e-mail headers, a list of hyperlinks, and metadata information.

8. A computer-implemented method of generating an audio summary for a document, the method comprising: parsing a document to extract metadata from the document; generating an audio summary for the parsed document based on the extracted metadata; and associating the audio summary with the parsed document, wherein the associating the audio summary to the parsed document includes at least embedding the audio summary into the parsed document.

9. The computer-implemented method of claim 8, wherein generating the audio summary for the parsed document comprises summarizing the parsed document based on textual information contained in the document.

10. The computer-implemented method of claim 8, further comprising: after associating the audio summary with the parsed document, receiving a user selection of the parsed document; and presenting the audio summary for the parsed document.

11. The computer-implemented method of claim 10, wherein the parsed document is selected from the group consisting of: an audio file, a webpage, a PDF, a text file, an RSS feed, an e-mail, a list of e-mail headers, a list of hyperlinks, and metadata information.

12. The computer-implemented method of claim 10, wherein user selection of the parsed document occurs upon a mouse-over event or a mouse-click event.

13. The computer-implemented method of claim 10, wherein user selection of the parsed document occurs when the parsed document is selected using a portable media player.

14. The computer-implemented method of claim 8, wherein the audio summary includes audio content generated by converting at least a portion of the extracted metadata to speech.

15. A computer-implemented method of generating an audio summary for a document, the method comprising: parsing a document to extract metadata from the document; generating an audio summary for the parsed document based on the extracted metadata; and associating the audio summary with the parsed document by creating a software pointer from the parsed document to the audio summary, and embedding the software pointer into the parsed document.

16. The computer-implemented method of claim 15, wherein the audio summary includes audio content generated by converting at least a portion of the extracted metadata to speech.

17. A non-transitory computer readable storage medium including at least computer program code for converting text to speech, comprising: computer program code for selecting a document to be converted to speech, the selected document including base text and one or more links located within the base text; computer program code for parsing the selected document, wherein the computer program code for parsing comprises: computer program code for resolving at least one of the one or more links in the selected document; and computer program code for retrieving pre-existing text from one or more documents obtained by the said resolving; computer program code for appending at least a portion of the retrieved pre-existing text to the base text; computer program code for generating speech by converting to speech the base text and the portion of the retrieved pre-existing text appended to the base text; and computer program code for creating an audio file based on the converted text, wherein the audio file includes at least one audio cue configured to be beneficial to visually impaired listeners.

18. The non-transitory computer readable storage medium as recited in claim 17, further comprising: computer program code for copying the audio file from a host computer to a portable media player where the audio file is stored on the portable media player in a predetermined organization.
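The parsing steps recited in claims 1 and 17 (resolve links, retrieve pre-existing text, append a portion of it to the base text) can be sketched roughly as follows. This is an illustration under assumptions, not the claimed implementation: the `[[target]]` link syntax, the in-memory `store` standing in for retrieval of linked documents, the `max_chars` cutoff, and the function names are all hypothetical. Speech generation and audio file creation are not shown.

```python
import re

# Hypothetical link syntax for this sketch: [[target]]
LINK_PATTERN = re.compile(r"\[\[(.+?)\]\]")


def resolve_link(target, store):
    """Return the pre-existing text of the linked document, or None if unresolved."""
    return store.get(target)


def prepare_text_for_speech(base_text, store, max_chars=200):
    """Resolve links in base_text and append a portion of each linked text.

    The returned string is what would be handed to a speech synthesizer:
    the base text (with link markup reduced to its label) followed by up to
    max_chars characters of each document that could be resolved.
    """
    appended = []
    for target in LINK_PATTERN.findall(base_text):
        linked_text = resolve_link(target, store)
        if linked_text:
            appended.append(linked_text[:max_chars])
    spoken_base = LINK_PATTERN.sub(r"\1", base_text)
    return "\n".join([spoken_base] + appended)


# The dict stands in for documents reachable through the links.
store = {"forecast": "Tomorrow: rain, high of 12 degrees."}
text = prepare_text_for_speech("See the [[forecast]] and the [[archive]].", store)
print(text)
```

Links that cannot be resolved (here, `archive`) contribute nothing, which matches the claim language requiring only "at least one" of the links to be resolved.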

Assignees

Inventors

Classifications

  • Parsing · CPC title

  • G10L13/00 · Primary

    Speech synthesis; Text to speech systems · CPC title

  • Audio watermarking, i.e. embedding inaudible data in the audio signal · CPC title

  • G10L13/027 · Primary

    Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08) · CPC title

  • Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8996376B2 cover?
Techniques for improved text-to-speech processing are disclosed. The improved text-to-speech processing can convert text from an electronic document into an audio output that includes speech associated with the text as well as audio contextual cues. One aspect provides audio contextual cues to the listener when outputting speech (spoken text) pertaining to a document. The audio contextual cues …
Who is the assignee on this patent?
Fleizach Christopher Brian, Hudson Reginald Dean, Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L13/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 31, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).