Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06V20/47. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Media management system for video data processing and adaptation data generation

US11501546B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11501546-B2
Application number	US-202016940209-A
Country	US
Kind code	B2
Filing date	Jul 27, 2020
Priority date	Jan 27, 2018
Publication date	Nov 15, 2022
Grant date	Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, methods and systems for implementing a media management system, for video data processing and adaptation data generation, are provided. At a high level, a video data processing engine relies on different types of video data properties and additional auxiliary data resources to perform video optical character recognition operations for recognizing characters in video data. In operation, video data is accessed to identify recognized characters. A video OCR operation to perform on the video data for character recognition is determined from video character processing and video auxiliary data processing. Video auxiliary data processing includes processing an auxiliary reference object; the auxiliary reference object is an indirect reference object that is a derived input element used as a factor in determining the recognized characters. The video data is processed based on the video OCR operation and based on processing the video data, at least one recognized character is communicated.
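As a rough illustration of how an auxiliary reference object might act as "a factor in determining the recognized characters", the sketch below re-scores OCR candidates against terms derived from another source (for example, names returned by a facial recognition knowledgebase). The function name `rescore_with_auxiliary`, the `boost` parameter, and the additive scoring rule are all assumptions made for this example; the patent text does not specify them.

```python
def rescore_with_auxiliary(candidates, auxiliary_terms, boost=0.2):
    """Pick the best OCR candidate after boosting auxiliary matches.

    candidates: list of (text, ocr_confidence) pairs (hypothetical format).
    auxiliary_terms: strings derived from an auxiliary data source.
    boost: assumed additive bonus for candidates found in auxiliary_terms.
    """
    aux = {term.lower() for term in auxiliary_terms}
    best_text, best_score = None, float("-inf")
    for text, confidence in candidates:
        # Case-insensitive match against the auxiliary vocabulary.
        score = confidence + (boost if text.lower() in aux else 0.0)
        if score > best_score:
            best_text, best_score = text, score
    return best_text


# Raw OCR slightly prefers the misread "J0HN SMITH"; the auxiliary
# term tips the decision toward the correct reading.
ocr_candidates = [("J0HN SMITH", 0.55), ("JOHN SMITH", 0.50)]
print(rescore_with_auxiliary(ocr_candidates, ["John Smith"]))  # JOHN SMITH
print(rescore_with_auxiliary(ocr_candidates, []))              # J0HN SMITH
```

An additive bonus is only one plausible way to combine the two signals; a real system could just as well use the auxiliary data to constrain the language model or the candidate list.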

First claim

Opening claim text (preview).

The invention claimed is:
1. A system for providing video data processing, the system comprising: a video data processing engine configured to: access video data; execute video auxiliary data processing, wherein video auxiliary data processing comprises processing an auxiliary reference object, wherein the auxiliary reference object is an indirect reference object that is a derived input element used in determining recognized characters; based on a video optical character recognition (“OCR”) operation, determine a recognized character; and communicate the recognized character.
2. The system of claim 1, wherein executing video auxiliary data processing is responsive to determining the video OCR operation to perform on the video data based on properties of the video data or based on a user indication of the video OCR operation to perform on the video data.
3. The system of claim 1, further comprising video cluster processing, wherein video cluster processing is video character processing of a plurality of characters across a video to determine the recognized characters.
4. The system of claim 1, wherein the video auxiliary data processing comprises language detection and selection, wherein the language detection and selection includes detecting two or more languages corresponding to the video data and executing character detection based on the selected two or more languages, while excluding other potential OCR languages corresponding to the video data.
5. The system of claim 1, wherein the indirect reference object is determined based on an auxiliary data source from a plurality of auxiliary data sources comprising the following: an entity linking knowledgebase, a facial recognition knowledgebase, an object recognition knowledgebase, a user account profile comprising an account language model, and an audio recognition knowledgebase.
6. The system of claim 1, further comprising: executing video character processing, wherein video character processing comprises two or more of aggregating, aligning, weighting, and voting on a plurality of characters across a plurality of video character processing scenes to determine the recognized characters, wherein weighting the plurality of characters comprises adjusting a weighted score corresponding to each of the plurality of characters based on a plurality of weighting attributes.
7. The system of claim 6, wherein voting on the plurality of characters, provided as candidate variants, is based on majority voting or weighted voting.
8. The system of claim 1, wherein the video data processing engine further comprises an adaptation data generator configured to: access the video data comprising a visual signal and audio data; identify a first visual signal in the video data; determine an adaptation data item that corresponds to the first visual signal, based on one of a public visual knowledgebase or a private visual knowledgebase; identify a speech recognition model feature of the adaptation data item; modify a parameter of the speech recognition model based on the speech recognition model feature of the adaptation data item; based on the speech recognition model, determining a speech recognition data item in the audio data; and communicate the speech recognition data item.
9. A computer-implemented method for providing video data processing, the method comprising: accessing video data; determining, based on the video data, two or more optical character recognition (“OCR”) reference objects, wherein each of the two or more OCR reference objects comprises a frame of a video character processing scene of the video data, wherein a first frame of a first OCR reference object is different from a second frame of a second OCR reference object; and executing a video character processing operation based on the two or more OCR reference objects comprising frames, wherein the video character processing operation comprises determining a recognized character from the video.
10. The computer-implemented method of claim 9, wherein the first frame of the first OCR reference object is in a first video different from a second video, the second video having the second frame of the second OCR reference object.
11. The computer-implemented method of claim 9, wherein determining the recognized character from the video is based on: aggregating a plurality of characters from the frames of the two or more OCR reference objects; and weighting the plurality of characters, wherein weighting comprises adjusting a weighted score corresponding to each of the plurality of characters based on a plurality of weighting attributes; and determining the recognized character from the video data based on voting on the plurality of characters, provided as candidate variants, in the frames of the two or more OCR reference objects, wherein voting on the plurality of characters, provided as the candidate variants, is based on majority voting or weighted voting.
12. The computer-implemented method of claim 9, wherein determining the recognized character is further based on accessing an auxiliary reference object, wherein the auxiliary reference object identifies two or more languages corresponding to the video data, wherein executing the video character processing is based on the selected two or more languages, while excluding other potential OCR languages corresponding to the video data.
13. The computer-implemented method of claim 9, wherein determining the recognized character is further based on accessing an auxiliary reference object, wherein the auxiliary reference object is an indirect reference object, wherein the indirect reference object is a derived input element that is factor in determining the recognized character, wherein the indirect reference object is determined based on any of the following: an entity linking knowledgebase, a facial recognition knowledgebase, an object recognition knowledgebase, a user account profile comprising an account language model, and an audio recognition knowledgebase.
14. The computer-implemented method of claim 9, the method further comprising: accessing the video data comprising audio data; identifying metadata corresponding to the video data; selecting a secondary data source based on the metadata; accessing an adaptation data item from the secondary data source; identifying a speech recognition model feature of the adaptation data item; modifying a parameter of the speech recognition model based on the speech recognition model feature of the adaptation data item; based on the speech recognition model, determining a speech recognition data items in the audio data; and communicating the speech recognition data item.
15. One or more computer storage media having computer-executable instructions embodied thereon that, when executed, by one or more processors, cause the one or more processors to perform a method for providing video data processing, the method comprising: accessing video data; determining, based on the video data, two or more optical character recognition (“OCR”) reference objects, wherein a first OCR reference object comprises a frame of a video character processing scene of the video data, and wherein a second reference object comprises an auxiliary reference object based on the video data, wherein the auxiliary reference object is an indirect reference object that is a derived input element used in determining recognized characters; executing a video OCR operation on the video data based on the two or more OCR reference objects; and determining a recognized character from the video data based at least on the first OCR r
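The claims mention aggregating, aligning, weighting, and voting on candidate characters across frames, with voting based on "majority voting or weighted voting". The sketch below shows one simple way those two voting schemes could work on OCR readings of the same text taken from several frames. The function names, the `(text, weight)` input format, and the assumption that readings are already aligned to equal length are all illustrative choices, not details taken from the patent.

```python
from collections import Counter, defaultdict


def majority_vote(candidates):
    """Return the most frequent candidate; ties go to the first one seen."""
    return Counter(candidates).most_common(1)[0][0]


def weighted_vote(scored_candidates):
    """Sum the weights per candidate text and return the heaviest one.

    scored_candidates: iterable of (text, weight) pairs, where the weight
    stands in for whatever weighting attributes a real system would use
    (for example OCR confidence or frame sharpness).
    """
    totals = defaultdict(float)
    for text, weight in scored_candidates:
        totals[text] += weight
    return max(totals, key=totals.get)


def vote_per_position(readings):
    """Majority-vote each character position across aligned readings.

    Assumes every reading has the same length, i.e. alignment is done.
    """
    return "".join(majority_vote(column) for column in zip(*readings))


# Whole-string majority vote across four frames.
print(majority_vote(["HELLO", "HELL0", "HELLO", "HE1LO"]))  # HELLO

# One high-weight frame can outvote two low-weight frames.
print(weighted_vote([("HELLO", 0.4), ("HELL0", 0.9), ("HELLO", 0.3)]))  # HELL0

# Per-character voting recovers the text even when no frame is clean twice.
print(vote_per_position(["HELLO", "HELL0", "HE1LO"]))  # HELLO
```

The second call shows why the choice of weighting attributes matters: weighted voting can override the majority when a single reading carries a large weight.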

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06V30/1908
Region based matching · CPC title
G06V20/47 · Primary
Detecting features for summarising video content · CPC title
G06V30/153 · Primary
using recognition of characters or words · CPC title
G06V30/10
Character recognition · CPC title
G06V20/635
Overlay text, e.g. embedded captions in a TV programme · CPC title

Patent family

Related publications grouped by family.

View patent family 67393535

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11501546B2 cover?: In various embodiments, methods and systems for implementing a media management system, for video data processing and adaptation data generation, are provided. At a high level, a video data processing engine relies on different types of video data properties and additional auxiliary data resources to perform video optical character recognition operations for recognizing characters in video data…