Editing text in video captions
US-10917607-B1 · Feb 9, 2021 · US
US11272137B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11272137-B1 |
| Application number | US-202117170314-A |
| Country | US |
| Kind code | B1 |
| Filing date | Feb 8, 2021 |
| Priority date | Oct 14, 2019 |
| Publication date | Mar 8, 2022 |
| Grant date | Mar 8, 2022 |
Official abstract text for this publication.
This disclosure describes techniques that include modifying text associated with a sequence of images or a video sequence to thereby generate new text and overlaying the new text as captions in the video sequence. In one example, this disclosure describes a method that includes receiving a sequence of images associated with a scene occurring over a time period; receiving audio data of speech uttered during the time period; transcribing into text the audio data of the speech, wherein the text includes a sequence of original words; associating a timestamp with each of the original words during the time period; generating, responsive to input, a sequence of new words; and generating a new sequence of images by overlaying each of the new words on one or more of the images.
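The overlay step in the abstract can be illustrated with a minimal sketch that decides which word is shown on each image in the sequence. The `TimedWord` type, the `captions_per_frame` function, and the fixed frame rate are assumptions made for illustration; the publication does not specify a data structure or a frame-timing model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedWord:
    """A caption word with start/end times in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def captions_per_frame(
    words: List[TimedWord], fps: float, frame_count: int
) -> List[Optional[str]]:
    """Return the word to overlay on each frame, or None if no word is active.

    Assumes frame i is displayed at time i / fps and that a word is active
    on the half-open interval [start, end).
    """
    captions: List[Optional[str]] = []
    for i in range(frame_count):
        t = i / fps
        active = next((w.text for w in words if w.start <= t < w.end), None)
        captions.append(active)
    return captions


words = [TimedWord("hello", 0.0, 0.5), TimedWord("world", 0.5, 1.0)]
print(captions_per_frame(words, fps=4, frame_count=5))
```

An actual renderer would then draw each selected word onto its frame; that drawing step is omitted here because it depends on the imaging library used.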
Opening claim text (preview).
What is claimed is:
1. A system comprising a storage system and processing circuitry having access to the storage system, wherein the processing circuitry is configured to: receive audio data associated with a scene occurring over a time period, wherein the audio data includes data representing speech uttered during the time period; transcribe the audio data of the speech into text, wherein the text includes a sequence of original words; associate a timestamp with each of the original words during the time period; receive, responsive to user input, a sequence of new words; and associate a timestamp with each of the new words in the sequence of new words by using the timestamps associated with the original words to determine a corresponding time during the time period for each of the new words.
2. The system of claim 1, wherein the processing circuitry is further configured to: present the sequence of new words according to the timestamp associated with each of the new words.
3. The system of claim 2, wherein to receive audio data, the processing circuitry is further configured to receive a sequence of images and the audio data occurring over the time period; and wherein to present the sequence of new words, the processing circuitry is further configured to generate a new sequence of images with each of the new words overlaid on one or more of the images according to the timestamp associated with each of the new words.
4. The system of claim 3, wherein the sequence of original words includes a plurality of unchanged original words representing original words not changed in the sequence of new words, and wherein to generate the new sequence of images, the processing circuitry is further configured to: overlay each of the unchanged original words as a caption on one or more of the images in the sequence of images based on the respective timestamps associated with the unchanged original words.
5. The system of claim 1, wherein to receive the sequence of new words, the processing circuitry is further configured to: receive, for each of the new words, information identifying one or more original words that the new word replaces.
6. The system of claim 5, wherein to associate a timestamp with each of the new words, the processing circuitry is further configured to: associate the timestamp of the one or more original words that the new word replaces.
7. The system of claim 1, wherein to receive the sequence of new words, the processing circuitry is further configured to: receive a sequence of foreign language words by translating the sequence of original words.
8. The system of claim 1, wherein to associate the timestamp with each of the original words, the processing circuitry is further configured to: associate, for each of the original words, a starting timestamp corresponding to the start of the original word during the time period and an ending timestamp corresponding to the end of the original word during the time period.
9. The system of claim 1, wherein to associate the timestamp with each of the original words, the processing circuitry is further configured to: associate each of the original words with a time period corresponding to a pause between each of the original words.
10. The system of claim 1, wherein to receive the sequence of new words, the processing circuitry is further configured to receive a sequence of words in which a deleted original word has been removed from the sequence of original words; and wherein to associate each of the new words with one or more corresponding original words, the processing circuitry is further configured to associate each of the new words with one or more original words without associating any of the new words with the deleted original word.
11. A method comprising: receiving, by a computing system, audio data associated with a scene occurring over a time period, wherein the audio data includes data representing speech uttered during the time period; transcribing, by the computing system, the audio data of the speech into text, wherein the text includes a sequence of original words; associating, by the computing system, a timestamp with each of the original words during the time period; receiving, by the computing system and responsive to user input, a sequence of new words; and associating, by the computing system, a timestamp with each of the new words in the sequence of new words by using the timestamps associated with the original words to determine a corresponding time during the time period for each of the new words.
12. The method of claim 11, further comprising: presenting, by the computing system, the sequence of new words according to the timestamp associated with each of the new words.
13. The method of claim 12, wherein receiving audio data includes receiving a sequence of images and the audio data occurring over the time period, and wherein presenting the sequence of new words includes: generating a new sequence of images with each of the new words overlaid on one or more of the images according to the timestamp associated with each of the new words.
14. The method of claim 13, wherein the sequence of original words includes a plurality of unchanged original words representing original words not changed in the sequence of new words, and wherein generating the new sequence of images includes: overlaying each of the unchanged original words as a caption on one or more of the images in the sequence of images based on the respective timestamps associated with the unchanged original words.
15. The method of claim 11, wherein receiving the sequence of new words includes: receiving, for each of the new words, information identifying one or more original words that the new word replaces.
16. The method of claim 15, wherein associating a timestamp with each of the new words includes: associating the timestamp of the one or more original words that the new word replaces.
17. The method of claim 11, wherein receiving the sequence of new words includes: receiving a sequence of foreign language words by translating the sequence of original words.
18. The method of claim 11, wherein associating the timestamp with each of the original words includes: associating, for each of the original words, a starting timestamp corresponding to the start of the original word during the time period and an ending timestamp corresponding to the end of the original word during the time period.
19. The method of claim 11, wherein associating the timestamp with each of the original words includes: associating each of the original words with a time period corresponding to a pause between each of the original words.
20. A non-transitory computer-readable storage medium comprising instructions that, when executed, configure processing circuitry of a computing system to: receive audio data associated with a scene occurring over a time period, wherein the audio data includes data representing speech uttered during the time period; transcribe the audio data of the speech into text, wherein the text includes a sequence of original words; associate a timestamp with each of the original words during the time period; receive, responsive to user input, a sequence of new words; and associate a timestamp with each of the new words in the sequence of new words by using the timestamps associated with the original words to determine a corresponding time during the time period for each of the new words.
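The timestamp association recited in claims 5, 6, 8, and 10 can be sketched as follows. This is one possible reading, not the patented implementation: the `TimedWord` and `Replacement` types, the `retime` function, and the choice to span the earliest start and latest end when a new word replaces several original words are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class TimedWord:
    """A word with starting and ending timestamps in seconds (hypothetical)."""
    text: str
    start: float
    end: float


@dataclass(frozen=True)
class Replacement:
    """A new word plus the indices of the original words it replaces."""
    text: str
    replaces: Sequence[int]


def retime(original: List[TimedWord], edits: List[Replacement]) -> List[TimedWord]:
    """Give each new word the time span of the original words it replaces.

    Original words that no edit refers to (deleted words) contribute no
    timestamp to the output.
    """
    result: List[TimedWord] = []
    for edit in edits:
        if not edit.replaces:
            raise ValueError(f"new word {edit.text!r} replaces no original word")
        start = min(original[i].start for i in edit.replaces)
        end = max(original[i].end for i in edit.replaces)
        result.append(TimedWord(edit.text, start, end))
    return result


original = [
    TimedWord("the", 0.0, 0.2),
    TimedWord("quick", 0.3, 0.6),
    TimedWord("um", 0.7, 0.8),   # filler word the user deletes
    TimedWord("fox", 0.9, 1.2),
]
edits = [
    Replacement("the", [0]),
    Replacement("speedy", [1]),  # replaces "quick"
    Replacement("fox", [3]),
]
for word in retime(original, edits):
    print(word)
```

In this sketch the deleted word "um" is simply never referenced, so no new word is associated with its time span, which mirrors the behavior described in claim 10.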
Insert-editing · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Video hosting of uploaded data from client · CPC title
Indicating arrangements {(indicating means incorporated in magazine or cassette G11B23/046 and G11B23/0875; indicating measured values in general G01D)} · CPC title
involving the mixing of the reproduced video signal with a non-recorded signal, e.g. a text signal · CPC title