US2025166417A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025166417-A1 |
| Application number | US-202418957195-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 22, 2024 |
| Priority date | Nov 22, 2023 |
| Publication date | May 22, 2025 |
| Grant date | — |
Official abstract text for this publication.
A method including obtaining, during a communication session between a first device and a second device, video data that includes sign language content. In these and other embodiments, the sign language content may include one or more video frames of a figure performing sign language. The method may further include obtaining audio data that represents the sign language content in the video data and providing, during the communication session, the video data and the audio data to a sign language processing system that includes a machine learning model. In these and other embodiments, the video data and the audio data may be generated independent of the sign language processing system. The method may also include training the machine learning model during the communication session using the video data and the audio data.
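The in-session training described in the abstract, read together with claims 6 and 7, can be sketched as a loop over media segments: transcribe the audio, run the video through the model, compare the two texts, and adjust. This is a minimal illustration and not the patent's implementation. The claims do not name a comparison metric, so word error rate is an assumption here, and `SessionTrainer`, `transcribe`, `recognize`, and `adjust` are hypothetical names standing in for the automatic speech recognition system, the model's forward pass, and the update step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


@dataclass
class SessionTrainer:
    # All three callables are hypothetical stand-ins, not a real library API.
    transcribe: Callable[[bytes], str]   # audio -> "first text data" (ASR)
    recognize: Callable[[bytes], str]    # video -> "second text data" (model)
    adjust: Callable[[float], None]      # update step driven by the comparison
    errors: List[float] = field(default_factory=list)

    def on_segment(self, video: bytes, audio: bytes) -> float:
        """Process one segment while the communication session is still active."""
        first_text = self.transcribe(audio)
        second_text = self.recognize(video)
        error = word_error_rate(first_text, second_text)
        self.adjust(error)
        self.errors.append(error)
        return error


if __name__ == "__main__":
    trainer = SessionTrainer(
        transcribe=lambda audio: "nice to meet you",
        recognize=lambda video: "nice meet you",
        adjust=lambda error: None,
    )
    print(trainer.on_segment(b"video-bytes", b"audio-bytes"))
```

Word error rate is not differentiable, so a real system would more likely derive a training loss from the transcript; the metric above only makes the compare-then-adjust ordering concrete.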
Opening claim text (preview).
1. A method comprising: obtaining, during a communication session between a first device and a second device, video data that includes sign language content, the sign language content including one or more video frames of a figure performing sign language; obtaining audio data that represents the sign language content in the video data; providing, during the communication session, the video data and the audio data to a sign language processing system that includes a machine learning model, the video data and the audio data being generated independent of the sign language processing system; and training the machine learning model during the communication session using the video data and the audio data.
2. The method of claim 1, wherein the audio data and the video data are obtained from different devices.
3. The method of claim 1, wherein one of the first device and the second device provides one of the audio data and the video data and the other of the first device and the second device does not provide the video data and does not provide the audio data.
4. The method of claim 1, wherein the machine learning model is part of a sign language generation system or a sign language recognition system.
5. The method of claim 1, wherein the audio data is obtained before the video data.
6. The method of claim 1, wherein training the machine learning model during the communication session using the video data and the audio data includes directing the audio data to an automatic speech recognition system configured to generate first text data that includes a transcription of spoken words in the audio data, the first text data used in training the machine learning model.
7. The method of claim 6, wherein training the machine learning model during the communication session includes: generating, by the sign language processing system, second text data by providing the video data to the machine learning model, the second text data representing the sign language content in the video data; comparing the first text data and the second text data; and adjusting the machine learning model based on the comparison.
8. The method of claim 7, wherein the steps of generating, comparing, and adjusting occur before an end of the communication session.
9. The method of claim 6, wherein training the machine learning model during the communication session includes: generating, by the sign language processing system, second video data by providing the first text data to the machine learning model, the second video data including sign language representing the first text data; comparing the video data and the second video data; and adjusting the machine learning model based on the comparison.
10. The method of claim 1, wherein training the machine learning model during the communication session using the video data and the audio data includes training the machine learning model using data that is not obtained from the communication session in conjunction with the video data and the audio data from the communication session.
11. The method of claim 1, wherein the video data and the audio data are deleted at an end of the communication session.
12. The method of claim 1, wherein the video data and the audio data are deleted after the machine learning model is trained using the video data and the audio data.
13. The method of claim 1, wherein the video data and the audio data are deleted within a predetermined amount of time after an end of the communication session.
14. At least one non-transitory computer-readable media configured to store one or more instructions that, in response to being executed by a system, cause or direct the system to perform the method of claim 1.
15. A system comprising: one or more computer readable mediums including instructions; one or more computing systems coupled to the one or more computer readable mediums and configured to execute the instructions to cause or direct the system to perform operations, the operations comprising: obtaining, during a communication session between a first device and a second device, video data that includes sign language content, the sign language content including one or more video frames of a figure performing sign language; obtaining audio data that represents the sign language content in the video data; providing, during the communication session, the video data and the audio data to a sign language processing system that includes a machine learning model, the video data and the audio data being generated independent of the sign language processing system; and training the machine learning model during the communication session using the video data and the audio data.
16. The system of claim 15, wherein the machine learning model is part of a sign language generation system or a sign language recognition system.
17. The system of claim 15, wherein training the machine learning model during the communication session using the video data and the audio data includes directing the audio data to an automatic speech recognition system configured to generate first text data that includes a transcription of spoken words in the audio data, the first text data used in training the machine learning model.
18. The system of claim 17, wherein training the machine learning model during the communication session includes: generating, by the sign language processing system, second text data by providing the video data to the machine learning model, the second text data representing the sign language content in the video data; comparing the first text data and the second text data; and adjusting the machine learning model based on the comparison.
19. The system of claim 18, wherein the steps of generating, comparing, and adjusting occur before an end of the communication session.
20. The system of claim 17, wherein training the machine learning model during the communication session includes: generating, by the sign language processing system, second video data by providing the first text data to the machine learning model, the second video data including sign language representing the first text data; comparing the video data and the second video data; and adjusting the machine learning model based on the comparison.
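Claims 11 to 13 give three alternative points at which the session's video and audio data are deleted. A minimal sketch of how those alternatives could be expressed as a policy check follows. The names `Retention`, `SessionMedia`, and `deletion_due` are hypothetical and appear nowhere in the claims, and timestamps are assumed to be plain seconds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Retention(Enum):
    AT_SESSION_END = "at_session_end"        # claim 11
    AFTER_TRAINING = "after_training"        # claim 12
    WITHIN_TIME_LIMIT = "within_time_limit"  # claim 13


@dataclass
class SessionMedia:
    session_ended_at: Optional[float] = None  # seconds; None while the session is active
    trained: bool = False                     # True once the model has used this data


def deletion_due(media: SessionMedia, policy: Retention, now: float,
                 limit_seconds: float = 0.0) -> bool:
    """Return True once the chosen policy requires the media to be gone."""
    if policy is Retention.AFTER_TRAINING:
        return media.trained
    if media.session_ended_at is None:
        return False  # the other two policies only apply after the session ends
    if policy is Retention.AT_SESSION_END:
        return now >= media.session_ended_at
    # WITHIN_TIME_LIMIT: the limit is treated as a deadline. Deleting earlier
    # would also satisfy claim 13; this only reports the latest allowed moment.
    return now >= media.session_ended_at + limit_seconds


if __name__ == "__main__":
    media = SessionMedia(session_ended_at=100.0)
    print(deletion_due(media, Retention.WITHIN_TIME_LIMIT, now=130.0, limit_seconds=60.0))
    print(deletion_due(media, Retention.WITHIN_TIME_LIMIT, now=160.0, limit_seconds=60.0))
```

Keeping the decision in a pure function that takes `now` as an argument makes the policy easy to test without a real clock.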
Data-driven translation · CPC title
Machine-assisted translation, e.g. using translation memory · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Active pattern-learning, e.g. online learning of image or video features · CPC title
Transforming into visible information · CPC title