Editing text in video captions

US11272137B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11272137-B1
Application number: US-202117170314-A
Country: US
Kind code: B1
Filing date: Feb 8, 2021
Priority date: Oct 14, 2019
Publication date: Mar 8, 2022
Grant date: Mar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure describes techniques that include modifying text associated with a sequence of images or a video sequence to thereby generate new text and overlaying the new text as captions in the video sequence. In one example, this disclosure describes a method that includes receiving a sequence of images associated with a scene occurring over a time period; receiving audio data of speech uttered during the time period; transcribing into text the audio data of the speech, wherein the text includes a sequence of original words; associating a timestamp with each of the original words during the time period; generating, responsive to input, a sequence of new words; and generating a new sequence of images by overlaying each of the new words on one or more of the images.
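The overlay step in the abstract can be pictured with a short sketch. It is illustrative only and is not taken from the patent: the names `TimedWord` and `captions_per_frame`, the use of seconds for timestamps, and the fixed frame rate are all assumptions. It only decides which caption text belongs on which frame; drawing the text onto the image is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TimedWord:
    """A caption word with hypothetical start/end times in seconds."""
    text: str
    start: float
    end: float


def captions_per_frame(words: Sequence[TimedWord], frame_count: int, fps: float) -> List[str]:
    """Return the caption text for each frame index ("" where no word is shown)."""
    captions = [""] * frame_count
    for word in words:
        first = max(0, int(word.start * fps))                  # first frame the word covers
        last = min(frame_count, math.ceil(word.end * fps))     # one past the last frame
        for index in range(first, last):
            captions[index] = word.text
    return captions


words = [TimedWord("hello", 0.0, 0.5), TimedWord("world", 0.5, 1.0)]
frames = captions_per_frame(words, frame_count=4, fps=4)
print(frames)  # ['hello', 'hello', 'world', 'world']
```

A real renderer would likely show several words at once and handle variable frame rates; the sketch keeps one word per frame to make the timestamp-to-frame mapping visible.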

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising a storage system and processing circuitry having access to the storage system, wherein the processing circuitry is configured to: receive audio data associated with a scene occurring over a time period, wherein the audio data includes data representing speech uttered during the time period; transcribe the audio data of the speech into text, wherein the text includes a sequence of original words; associate a timestamp with each of the original words during the time period; receive, responsive to user input, a sequence of new words; and associate a timestamp with each of the new words in the sequence of new words by using the timestamps associated with the original words to determine a corresponding time during the time period for each of the new words.

2. The system of claim 1, wherein the processing circuitry is further configured to: present the sequence of new words according to the timestamp associated with each of the new words.

3. The system of claim 2, wherein to receive audio data, the processing circuitry is further configured to receive a sequence of images and the audio data occurring over the time period; and wherein to present the sequence of new words, the processing circuitry is further configured to generate a new sequence of images with each of the new words overlaid on one or more of the images according to the timestamp associated with each of the new words.

4. The system of claim 3, wherein the sequence of original words includes a plurality of unchanged original words representing original words not changed in the sequence of new words, and wherein to generate the new sequence of images, the processing circuitry is further configured to: overlay each of the unchanged original words as a caption on one or more of the images in the sequence of images based on the respective timestamps associated with the unchanged original words.

5. The system of claim 1, wherein to receive the sequence of new words, the processing circuitry is further configured to: receive, for each of the new words, information identifying one or more original words that the new word replaces.

6. The system of claim 5, wherein to associate a timestamp with each of the new words, the processing circuitry is further configured to: associate the timestamp of the one or more original words that the new word replaces.

7. The system of claim 1, wherein to receive the sequence of new words, the processing circuitry is further configured to: receive a sequence of foreign language words by translating the sequence of original words.

8. The system of claim 1, wherein to associate the timestamp with each of the original words, the processing circuitry is further configured to: associate, for each of the original words, a starting timestamp corresponding to the start of the original word during the time period and an ending timestamp corresponding to the end of the original word during the time period.

9. The system of claim 1, wherein to associate the timestamp with each of the original words, the processing circuitry is further configured to: associate each of the original words with a time period corresponding to a pause between each of the original words.

10. The system of claim 1, wherein to receive the sequence of new words, the processing circuitry is further configured to receive a sequence of words in which a deleted original word has been removed from the sequence of original words; and wherein to associate each of the new words with one or more corresponding original words, the processing circuitry is further configured to associate each of the new words with one or more original words without associating any of the new words with the deleted original word.

11. A method comprising: receiving, by a computing system, audio data associated with a scene occurring over a time period, wherein the audio data includes data representing speech uttered during the time period; transcribing, by the computing system, the audio data of the speech into text, wherein the text includes a sequence of original words; associating, by the computing system, a timestamp with each of the original words during the time period; receiving, by the computing system and responsive to user input, a sequence of new words; and associating, by the computing system, a timestamp with each of the new words in the sequence of new words by using the timestamps associated with the original words to determine a corresponding time during the time period for each of the new words.

12. The method of claim 11, further comprising: presenting, by the computing system, the sequence of new words according to the timestamp associated with each of the new words.

13. The method of claim 12, wherein receiving audio data includes receiving a sequence of images and the audio data occurring over the time period, and wherein presenting the sequence of new words includes: generating a new sequence of images with each of the new words overlaid on one or more of the images according to the timestamp associated with each of the new words.

14. The method of claim 13, wherein the sequence of original words includes a plurality of unchanged original words representing original words not changed in the sequence of new words, and wherein generating the new sequence of images includes: overlaying each of the unchanged original words as a caption on one or more of the images in the sequence of images based on the respective timestamps associated with the unchanged original words.

15. The method of claim 11, wherein receiving the sequence of new words includes: receiving, for each of the new words, information identifying one or more original words that the new word replaces.

16. The method of claim 15, wherein associating a timestamp with each of the new words includes: associating the timestamp of the one or more original words that the new word replaces.

17. The method of claim 11, wherein receiving the sequence of new words includes: receiving a sequence of foreign language words by translating the sequence of original words.

18. The method of claim 11, wherein associating the timestamp with each of the original words includes: associating, for each of the original words, a starting timestamp corresponding to the start of the original word during the time period and an ending timestamp corresponding to the end of the original word during the time period.

19. The method of claim 11, wherein associating the timestamp with each of the original words includes: associating each of the original words with a time period corresponding to a pause between each of the original words.

20. A non-transitory computer-readable storage medium comprising instructions that, when executed, configure processing circuitry of a computing system to: receive audio data associated with a scene occurring over a time period, wherein the audio data includes data representing speech uttered during the time period; transcribe the audio data of the speech into text, wherein the text includes a sequence of original words; associate a timestamp with each of the original words during the time period; receive, responsive to user input, a sequence of new words; and associate a timestamp with each of the new words in the sequence of new words by using the timestamps associated with the original words to determine a corresponding time during the time period for each of the new words.
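One way to read the timestamp-transfer idea in claims 5, 6, and 10 is sketched below. This is an interpretation under stated assumptions, not the patented implementation: the `TimedWord` type, the `retime` function, and the edit format (each new word paired with the indices of the original words it replaces) are hypothetical. A new word that replaces several original words is assumed to span from the earliest start to the latest end, and a deleted original word is simply never referenced.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TimedWord:
    """A word with hypothetical start/end timestamps in seconds."""
    text: str
    start: float
    end: float


def retime(original: Sequence[TimedWord],
           edits: Sequence[Tuple[str, Sequence[int]]]) -> List[TimedWord]:
    """Give each new word the time span of the original word(s) it replaces.

    `edits` pairs each new word with the indices of the original words it
    replaces (an assumed input format). Original words that no edit
    references are treated as deleted and contribute no timestamps.
    """
    retimed = []
    for text, replaced in edits:
        if not replaced:
            raise ValueError(f"new word {text!r} must replace at least one original word")
        span = [original[i] for i in replaced]
        retimed.append(TimedWord(text,
                                 min(w.start for w in span),
                                 max(w.end for w in span)))
    return retimed


original = [
    TimedWord("their", 0.0, 0.4),
    TimedWord("going", 0.4, 0.8),
    TimedWord("um", 0.8, 1.0),      # filler word the editor deletes
    TimedWord("home", 1.0, 1.5),
]
# Fix a transcription error, keep two words, and drop "um" (index 2).
edits = [("they're", [0]), ("going", [1]), ("home", [3])]
retimed = retime(original, edits)
for word in retimed:
    print(word.text, word.start, word.end)
```

Because the deleted word's interval is not reassigned, the caption track in this sketch simply shows nothing between 0.8 s and 1.0 s; other policies, such as extending a neighboring word, would be equally plausible readings.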

Assignees

Inventors

Classifications

  • Insert-editing · CPC title

  • G10L15/26 (Primary)

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Video hosting of uploaded data from client · CPC title

  • Indicating arrangements {(indicating means incorporated in magazine or cassette G11B23/046 and G11B23/0875; indicating measured values in general G01D)} · CPC title

  • involving the mixing of the reproduced video signal with a non-recorded signal, e.g. a text signal · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11272137B1 cover?
This disclosure describes techniques that include modifying text associated with a sequence of images or a video sequence to thereby generate new text and overlaying the new text as captions in the video sequence. In one example, this disclosure describes a method that includes receiving a sequence of images associated with a scene occurring over a time period; receiving audio data of speech ut…
Who is the assignee on this patent?
Facebook Tech LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 8, 2022 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).