Multi-component viewing tool for contact center agents
US-9880807-B1 · Jan 30, 2018 · US
US10198160B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10198160-B2 |
| Application number | US-201615171705-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 2, 2016 |
| Priority date | Jun 2, 2016 |
| Publication date | Feb 5, 2019 |
| Grant date | Feb 5, 2019 |
Official abstract text for this publication.
Several approaches are provided for processing audio data to generate transcription data that is supplemented with visual content items. The visual content items may be any type of data that may vary depending upon a particular implementation. Examples of visual content items include, without limitation, images, videos, symbols, etc. Embodiments include adding visual content items to transcription data based upon user input, specialized keywords contained in the transcription data and various correspondences with the audio data, including time-based correspondence and correspondences based upon a common user, storage location or logical entity.
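The abstract names two kinds of correspondence between the audio data and candidate visual content items. One is time-based. The other is based on a shared user, storage location, or logical entity. The sketch below shows one minimal way to pick candidates under those two tests. The field names (`start`, `end`, `captured_at`, `user`, `storage_location`, `logical_entity`) and the helper functions are hypothetical. They assume each item carries a timestamp and optional metadata, and they are not the patent's own implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class AudioRecording:
    # Hypothetical recording metadata; field names are illustrative assumptions.
    start: datetime
    end: datetime
    user: Optional[str] = None
    storage_location: Optional[str] = None
    logical_entity: Optional[str] = None


@dataclass
class VisualItem:
    # Hypothetical visual content item (image, video clip, symbol, ...).
    item_id: str
    captured_at: datetime
    user: Optional[str] = None
    storage_location: Optional[str] = None
    logical_entity: Optional[str] = None


def corresponds_by_time(audio: AudioRecording, item: VisualItem) -> bool:
    """Time-based correspondence: the item's timestamp falls within the recording."""
    return audio.start <= item.captured_at <= audio.end


def corresponds_by_context(audio: AudioRecording, item: VisualItem) -> bool:
    """Correspondence via a shared user, storage location, or logical entity."""
    pairs = [
        (audio.user, item.user),
        (audio.storage_location, item.storage_location),
        (audio.logical_entity, item.logical_entity),
    ]
    # A missing (None) attribute never counts as a match.
    return any(a is not None and a == b for a, b in pairs)


def candidate_items(audio: AudioRecording, items: List[VisualItem]) -> List[VisualItem]:
    """Return the items a user could choose from when annotating the transcript."""
    return [i for i in items if corresponds_by_time(audio, i) or corresponds_by_context(audio, i)]


audio = AudioRecording(datetime(2016, 6, 2, 10, 0), datetime(2016, 6, 2, 10, 30), user="alice")
items = [
    VisualItem("whiteboard.jpg", datetime(2016, 6, 2, 10, 12)),         # inside the time range
    VisualItem("slide.png", datetime(2016, 6, 1, 9, 0), user="alice"),  # same user
    VisualItem("other.mp4", datetime(2016, 6, 3, 8, 0), user="bob"),    # neither
]
print([i.item_id for i in candidate_items(audio, items)])
# ['whiteboard.jpg', 'slide.png']
```

The two tests are combined with a plain OR here. An implementation could instead rank or filter candidates differently. The abstract does not specify how the correspondences are weighed.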
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: one or more processors; one or more memories storing instructions which, when processed by the one or more processors, cause the apparatus to: retrieve audio data that represents a plurality of spoken words, cause the audio data to be processed to generate transcription data that provides a textual representation of the audio data, identify one or more specified keywords contained in the transcription data, wherein each specified keyword from the one or more specified keywords indicates a location in the transcription data where a visual content item is to be added to the transcription data, display, via a user interface, the transcription data and visually indicate one or more locations in the transcription data that correspond to the one or more specified keywords, provide user interface controls that allow a user to specify a visual content item or a link to a visual content item for each of the one or more locations in the transcription data, and generate revised transcription data that includes the visual content item or a reference to the visual content item at each of the one or more locations in the transcription data.

2. The apparatus of claim 1, wherein the transcription data contains visual content identification data that identifies a visual content item to be added to the revised transcription data.

3. The apparatus of claim 2, wherein the visual content identification data is adjacent to a specified keyword.

4. The apparatus of claim 2, wherein the visual content identification data is included in the visual content item or was generated by a device that acquired the visual content item.

5. The apparatus of claim 1, wherein a location in the revised transcription data of the added visual content item or link to the visual content item corresponds to one or more locations of the one or more specified keywords.

6. The apparatus of claim 1, wherein the user interface is implemented on a client device that is separate from the apparatus.

7. The apparatus of claim 1, wherein: the audio data corresponds to a plurality of visual content items or a plurality of references to visual content items based upon time, and the user interface controls allow the user to select the visual content item or the reference to the visual content item for each of the one or more locations in the transcription data from the plurality of visual content items or the plurality of references to visual content items.

8. The apparatus of claim 7, wherein the audio data corresponds to the plurality of visual content items or the plurality of references to the visual content items based upon a time for each visual content item from the plurality of visual content items having a specified time that is within a time range covered by the audio data.

9. The apparatus of claim 1, wherein: the audio data corresponds to a plurality of visual content items or a plurality of references to visual content items, and the user interface controls allow the user to select the visual content item or the reference to the visual content item for each of the one or more locations in the transcription data from the plurality of visual content items or the plurality of references to visual content items.

10. The apparatus of claim 9, wherein the audio data corresponds to the plurality of visual content items or the plurality of references to visual content items based upon one or more of a user in common, a storage location in common, or a logical entity in common.

11. The apparatus of claim 1, wherein the one or more memories store additional instructions which, when processed by the one or more processors, cause the apparatus to: remove the one or more specified keywords from the revised transcription data.

12. The apparatus of claim 1, wherein the plurality of visual content items includes one or more of one or more images, or one or more video clips.

13. One or more non-transitory computer-readable media storing instructions which, when processed by one or more processors, cause: at a computing device retrieving audio data that represents a plurality of spoken words, causing the audio data to be processed to generate transcription data that provides a textual representation of the audio data, identifying one or more specified keywords contained in the transcription data, wherein each specified keyword from the one or more specified keywords indicates a location in the transcription data where a visual content item is to be added to the transcription data, displaying, via a user interface, the transcription data and visually indicate one or more locations in the transcription data that correspond to the one or more specified keywords, providing user interface controls that allow a user to specify a visual content item or a link to a visual content item for each of the one or more locations in the transcription data, and generating revised transcription data that includes the visual content item or a reference to the visual content item at each of the one or more locations in the transcription data.

14. The one or more non-transitory computer-readable media of claim 13, wherein the transcription data contains visual content identification data that identifies a visual content item to be added to the revised transcription data.

15. The one or more non-transitory computer-readable media of claim 13, wherein a location in the revised transcription data of the added visual content item or link to the visual content item corresponds to one or more locations of the one or more specified keywords.

16. The one or more non-transitory computer-readable media of claim 13, wherein: the audio data corresponds to a plurality of visual content items or a plurality of references to visual content items based upon time, and the user interface controls allow the user to select the visual content item or the reference to the visual content item for each of the one or more locations in the transcription data from the plurality of visual content items or the plurality of references to visual content items.

17. A computer-implemented method comprising: at a computing device retrieving audio data that represents a plurality of spoken words, causing the audio data to be processed to generate transcription data that provides a textual representation of the audio data, identifying one or more specified keywords contained in the transcription data, wherein each specified keyword from the one or more specified keywords indicates a location in the transcription data where a visual content item is to be added to the transcription data, displaying, via a user interface, the transcription data and visually indicate one or more locations in the transcription data that correspond to the one or more specified keywords, providing user interface controls that allow a user to specify a visual content item or a link to a visual content item for each of the one or more locations in the transcription data, and generating revised transcription data that includes the visual content item or a reference to the visual content item at each of the one or more locations in the transcription data.

18. The computer-implemented method of claim 17, wherein the transcription data contains visual content identification data that identifies a visual content item to be added to the revised transcription data.

19. The computer-implemented method of claim 17, wherein a location in the revised transcription data of the added visual content
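Claim 1 describes a keyword-driven flow that runs after transcription:

- locate each specified keyword;
- mark those locations for the user;
- accept a chosen item or link per location;
- emit revised transcription data.

Claim 11 adds optional removal of the keywords. The sketch below illustrates that flow on a transcript string that has already been produced. Several details are hypothetical and not taken from the patent. The keyword `"insert image"`, the text-only `highlight` stand-in for the user interface, the `<visual:...>` reference syntax, and the `choices` dictionary that simulates the user's selections are all assumptions.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical specified keyword; the claims do not name a particular phrase.
KEYWORD = "insert image"


def find_keyword_locations(transcript: str, keyword: str = KEYWORD) -> List[Tuple[int, int]]:
    """Return the (start, end) character span of every occurrence of the keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [m.span() for m in pattern.finditer(transcript)]


def highlight(transcript: str, spans: List[Tuple[int, int]]) -> str:
    """Text-only stand-in for a UI that visually marks each numbered location."""
    out, prev = [], 0
    for n, (start, end) in enumerate(spans):
        out.append(transcript[prev:start])
        out.append(f"[#{n}: {transcript[start:end]}]")
        prev = end
    out.append(transcript[prev:])
    return "".join(out)


def revise(transcript: str, spans: List[Tuple[int, int]],
           choices: Dict[int, str], remove_keywords: bool = True) -> str:
    """Build revised transcription data with a reference at each chosen location.

    `choices` maps a location index to the item or link the user selected.
    The `<visual:...>` reference format is a hypothetical placeholder.
    With remove_keywords=True the keyword is replaced by the reference (cf. claim 11);
    locations the user left unfilled keep their keyword unchanged.
    """
    out, prev = [], 0
    for n, (start, end) in enumerate(spans):
        out.append(transcript[prev:start])
        if n in choices:
            if not remove_keywords:
                out.append(transcript[start:end] + " ")
            out.append(f"<visual:{choices[n]}>")
        else:
            out.append(transcript[start:end])
        prev = end
    out.append(transcript[prev:])
    return "".join(out)


transcript = "The pump housing was cracked insert image and the seal was worn insert image"
spans = find_keyword_locations(transcript)
print(highlight(transcript, spans))
# The pump housing was cracked [#0: insert image] and the seal was worn [#1: insert image]
print(revise(transcript, spans, {0: "photo_001.jpg", 1: "photo_002.jpg"}))
# The pump housing was cracked <visual:photo_001.jpg> and the seal was worn <visual:photo_002.jpg>
```

Locations are kept as character spans so that the highlighting step and the revision step refer to the same positions. This mirrors how the claims tie each inserted item to the location of a specified keyword (claim 5).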
Speech to text systems (G10L15/08 takes precedence) · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Word spotting · CPC title
Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title