Who is the assignee on this patent?

Microsoft Technology Licensing Llc

What technology area does this patent fall under?

Primary CPC classification G06F16/287. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 2, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Inferring topics with entity linking and ontological data

US10936630B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10936630-B2
Application number	US-201816130992-A
Country	US
Kind code	B2
Filing date	Sep 13, 2018
Priority date	Sep 13, 2018
Publication date	Mar 2, 2021
Grant date	Mar 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for inferring topics from a file containing both audio and video, for example a multimodal or multimedia file, in order to facilitate video indexing. A set of entities is extracted from the file and linked to produce a graph, and reference information is also obtained for the set of entities. Entities may be drawn, for example, from Wikipedia categories, or other large ontological data sources. Analysis of the graph, using unsupervised learning, permits determining clusters in the graph. Extracting features from the clusters, possibly using supervised learning, provides for selection of topic identifiers. The topic identifiers are then used for indexing the file.
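The core of the abstract (link entities, build a graph from their reference information, find clusters, and read topic candidates off each cluster) can be sketched in a few lines of Python. This is a minimal sketch, not Microsoft's implementation. The entity-to-category data is hypothetical, and a connected-components pass stands in for whatever unsupervised clustering method the patent actually uses.

```python
# Hypothetical linked entities and their reference categories (Wikipedia-style).
# Real input would come from entity linking against an ontological data source.
ENTITY_CATEGORIES = {
    "Satya Nadella": ["Microsoft people", "Chief executives"],
    "Bill Gates": ["Microsoft people", "Philanthropists"],
    "Espresso": ["Coffee drinks"],
    "Latte": ["Coffee drinks", "Milk-based drinks"],
}


def build_graph(entity_categories):
    """Adjacency map linking each entity node to its category nodes."""
    graph = {}
    for entity, categories in entity_categories.items():
        ent_node = ("entity", entity)
        graph.setdefault(ent_node, set())
        for cat in categories:
            cat_node = ("category", cat)
            graph[ent_node].add(cat_node)
            graph.setdefault(cat_node, set()).add(ent_node)
    return graph


def connected_components(graph):
    """Cluster the graph by connected components (a simple stand-in for the
    unsupervised clustering step; the patent does not name an algorithm here)."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        clusters.append(component)
    return clusters


def topic_candidates(cluster):
    """The category nodes in a cluster become its topic candidates."""
    return sorted(name for kind, name in cluster if kind == "category")


if __name__ == "__main__":
    for cluster in connected_components(build_graph(ENTITY_CATEGORIES)):
        print(topic_candidates(cluster))
```

With this toy data the graph splits into one cluster of Microsoft-related people and one of coffee drinks, because entities that share a reference category end up in the same component. A real system would need a clustering method that can split one large connected graph, which connected components alone cannot do.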

Claims

Full claim text (claims 1 to 20; claims 1, 14, and 19 are independent).

What is claimed is:

1. A method of inferring topics from a multimodal file, the method comprising: receiving a multimodal file; extracting a set of entities from the multimodal file; linking the set of entities to produce a set of linked entities; obtaining reference information for the set of entities; based at least on the reference information, generating a graph of the set of linked entities, the graph comprising nodes and edges; based at least on the nodes and edges of the graph, determining clusters in the graph; based at least on the clusters in the graph, identifying topic candidates; extracting features from the clusters in the graph; based at least on the extracted features, mapping the topic candidates into a probability interval for at least one cluster; based at least on the mapping, ranking the topic candidates within the at least one cluster; based at least on the ranking, selecting at least one TopicID from among the topic candidates to represent the at least one cluster; and indexing the multimodal file with the at least one TopicID.

2. The method of claim 1 wherein the multimodal file comprises a video portion and an audio portion and wherein extracting a set of entities from the multimodal file comprises: detecting objects in the video portion of the multimodal file; and detecting text in the audio portion of the multimodal file.

3. The method of claim 2 wherein detecting objects comprises performing face recognition.

4. The method of claim 2 wherein detecting text comprises performing a speech to text process.

5. The method of claim 4 further comprising: identifying a language used in the audio portion of the multimodal file, and wherein performing a speech to text process comprises performing a speech to text process in the identified language.

6. The method of claim 4 further comprising: translating the detected text.

7. The method of claim 1 wherein extracting a set of entities from the multimodal file further comprises disambiguating among a set of detected entity names.

8. The method of claim 1 further comprising: extracting categories from the reference information for the set of entities.

9. The method of claim 1 further comprising: determining significant clusters and insignificant clusters in the determined clusters, and wherein extracting features from the clusters in the graph comprises extracting features from the significant clusters in the graph.

10. The method of claim 1 wherein extracting features from the clusters in the graph comprises at least one process selected from the list consisting of: determining a graph diameter and determining a Jaccard coefficient.

11. The method of claim 1 wherein indexing the multimodal file with the at least one TopicID further comprises: generating index data for the multimodal file to produce an indexed file, the index data including the at least one TopicID and corresponding time indexing information.

12. The method of claim 1 wherein ranking topic candidates comprises at least one process selected from the list consisting of: logistic regression and support vector machine (SVM).

13. The method of claim 1 further comprising: translating the at least one TopicID, and wherein indexing the multimodal file with the at least one TopicID comprises indexing the multimodal file with the at least one translated TopicID.

14. A system for inferring topics from a multimodal file, the system comprising: an entity extraction component comprising an object detection component and a speech to text component, that extracts a set of entities from a multimodal file comprising a video portion and an audio portion; an entity linking component that links the extracted set of entities to produce a set of linked entities; an information retrieval component that obtains reference information for the extracted set of entities; a graphing and analysis component that: generates a graph of the set of linked entities, the graph comprising nodes and edges; based at least on the nodes and edges of the graph, determines clusters in the graph; based at least on the clusters in the graph, identifies topic candidates; and extracts features from the clusters in the graph; a TopicID selection component that: maps the topic candidates into a probability interval for at least one cluster based at least on the extracted features; ranks the topic candidates within the at least one cluster based at least on the mapping; and based at least on the ranking, selects at least one TopicID from among the topic candidates to represent the at least one cluster; and a video indexer that indexes the multimodal file with the at least one TopicID.

15. The system of claim 14 wherein the object detection component further performs face recognition.

16. The system of claim 14 wherein the speech to text component extracts entity information in at least two different languages.

17. The system of claim 14 further comprising: a disambiguation component that disambiguates among a set of detected entity names.

18. The system of claim 14 further comprising: a training component for training classifiers used for ranking topic candidates.

19. One or more computer storage devices having computer-executable instructions stored thereon for inferring topics from a multimodal file, which, on execution by a computer, cause the computer to perform operations comprising: receiving a multimodal file comprising a video portion and an audio portion; extracting a set of entities from the multimodal file, wherein extracting a set of entities from the multimodal file comprises: detecting objects in the video portion of the multimodal file with face recognition; detecting text in the audio portion of the multimodal file with a speech to text process; and disambiguating among a set of detected entity names; linking the set of entities to produce a set of linked entities; obtaining reference information for the set of entities; based at least on the reference information, generating a graph of the set of linked entities, the graph comprising nodes and edges; based at least on the nodes and edges of the graph, determining clusters in the graph; determining significant clusters and insignificant clusters in the determined clusters; based at least on the significant clusters in the graph, identifying topic candidates; extracting features from the significant clusters in the graph; based at least on the extracted features, mapping the topic candidates into a probability interval; based at least on the mapping, ranking the topic candidates within at least one significant cluster, based on the ranking, selecting at least one TopicID from among the topic candidates to represent the at least one significant cluster; and indexing the multimodal file with the at least one TopicID.

20. The one or more computer storage devices of claim 19 wherein the operations further comprise: identifying a language used in the audio portion of the multimodal file, and detecting text in the audio portion of the multimodal file with a speech to text process comprises performing a speech to text process in the identified language.
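Claims 1 and 9 to 12 describe the scoring half of the pipeline: features are extracted from a cluster (claim 10 names graph diameter and a Jaccard coefficient), candidates are mapped into a probability interval, ranked (claim 12 names logistic regression or an SVM), and the file is indexed with the winning TopicID plus time indexing information (claim 11). The Python sketch below illustrates that flow under stated assumptions. The cluster contents, appearance times, and weights are all hypothetical (claim 18 implies that real weights would come from a trained classifier). Applying the Jaccard coefficient to "entities tagged with the candidate" versus "all entities in the cluster" is also an assumption, not something the claims specify.

```python
import math
from collections import deque

# Hypothetical cluster: linked entities -> reference categories.
CLUSTER = {
    "Espresso": {"Coffee drinks", "Italian cuisine"},
    "Latte": {"Coffee drinks", "Milk-based drinks"},
    "Cappuccino": {"Coffee drinks", "Milk-based drinks", "Italian cuisine"},
}
# Hypothetical (start, end) times in seconds where each entity appears.
APPEARANCES = {"Espresso": (12.0, 20.5), "Latte": (31.0, 40.0), "Cappuccino": (44.5, 58.0)}
# Hypothetical hand-set logistic weights; a trained classifier would supply these.
WEIGHTS = {"jaccard": 4.0, "diameter": -0.5}
BIAS = -1.0


def cluster_graph(cluster):
    """Bipartite adjacency map of entity nodes and category nodes."""
    adj = {}
    for entity, cats in cluster.items():
        adj.setdefault(("e", entity), set())
        for cat in cats:
            adj[("e", entity)].add(("c", cat))
            adj.setdefault(("c", cat), set()).add(("e", entity))
    return adj


def diameter(adj):
    """Longest shortest-path length, found by BFS from every node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        best = max(best, max(dist.values()))
    return best


def jaccard(a, b):
    """|A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(cluster):
    """Map each candidate into (0, 1) with a logistic function; highest first."""
    entities = set(cluster)
    d = diameter(cluster_graph(cluster))
    scored = []
    for cat in set().union(*cluster.values()):
        tagged = {e for e, cats in cluster.items() if cat in cats}
        z = WEIGHTS["jaccard"] * jaccard(tagged, entities) + WEIGHTS["diameter"] * d + BIAS
        scored.append((1.0 / (1.0 + math.exp(-z)), cat))
    return sorted(scored, reverse=True)


def index_entry(cluster, appearances):
    """Select the top candidate as TopicID and attach the cluster's time ranges."""
    prob, topic_id = rank_candidates(cluster)[0]
    spans = sorted(appearances[e] for e in cluster)
    return {"topic_id": topic_id, "confidence": round(prob, 3), "time_ranges": spans}


if __name__ == "__main__":
    print(index_entry(CLUSTER, APPEARANCES))
```

With these made-up numbers, "Coffee drinks" covers every entity in the cluster (Jaccard coefficient 1.0) and ranks first with a score of about 0.818, while the two partial categories tie at about 0.542. The diameter is the same for every candidate in a cluster, so it shifts absolute scores but not the ranking within that cluster.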

Assignees

Microsoft Technology Licensing Llc

Inventors

Classifications

G06V40/16
Human faces, e.g. facial parts, sketches or expressions · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G06F16/483
using metadata automatically derived from the content · CPC title
G06N5/046
Forward inferencing; Production systems · CPC title
G06F17/18
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 67470636
