Ontology customization for indexing digital content

US12222974B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12222974-B2
Application number  US-202217853086-A
Country             US
Kind code           B2
Filing date         Jun 29, 2022
Priority date       Jun 29, 2022
Publication date    Feb 11, 2025
Grant date          Feb 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for automatically classifying terms of a first ontology into categories of a classification scheme defined with respect to a second ontology includes generating, for each term in the first ontology and each term in the second ontology, an embedding encoding the term and a description of the term. The method further includes adding the generated embeddings to a transformer model and computing, for each pair of the embeddings consisting of a first term from the first ontology and a second term from the second ontology, a similarity metric quantifying a similarity of the first term and the second term. The method still further provides for determining a matching scheme based on the similarity metric computed with respect to each pair of the embeddings, where the matching scheme associates term of the first ontology with one or more relevant categories of the classification scheme defined with respect to the second ontology. The method further provides for returning the one or more relevant categories of the classification scheme that are matched, by the determined matching scheme, to a term of the second ontology received as an input.
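
The embedding and pairwise-similarity steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `embed` function is a toy bag-of-words stand-in for the transformer encoder, cosine similarity is assumed as the similarity metric, and the ontology terms and descriptions are invented for the example.

```python
import math
from collections import Counter


def embed(term, description):
    """Toy stand-in for a transformer encoder (assumption): word counts
    over the term and its description, encoded together."""
    return Counter((term + " " + description).lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pairwise_similarity(first, second):
    """Score every (first-ontology term, second-ontology term) pair."""
    emb1 = {t: embed(t, d) for t, d in first.items()}
    emb2 = {t: embed(t, d) for t, d in second.items()}
    return {(t1, t2): cosine(e1, e2)
            for t1, e1 in emb1.items()
            for t2, e2 in emb2.items()}


# Hypothetical ontologies: term -> description.
first_ontology = {
    "soccer": "team sport played with a ball on a field",
    "sedan": "passenger car with four doors",
}
second_ontology = {
    "football": "team sport played with a ball",
    "automobile": "passenger car or motor vehicle",
}

sim = pairwise_similarity(first_ontology, second_ontology)
for pair, score in sorted(sim.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In a real system the `embed` stand-in would be replaced by a learned encoder such as BERT, so that terms with related meanings but no shared words still score as similar.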

First claim

Opening claim text (preview).

What is claimed is:

1. A processor-implemented method, comprising: generating an embedding encoding each term of a plurality of terms and a corresponding description, the plurality of terms comprising terms from a first ontology and terms from a second ontology; adding the embeddings to a transformer model; generating a plurality of pairs of the embeddings, each pair of the plurality of pairs comprising a first term from the first ontology and a second term from the second ontology; computing a similarity metric quantifying a similarity of the first term and the second term for each pair of the plurality of pairs; determining, based on the similarity metric, a stable matching scheme that matches each term of the first ontology to a single best match category of a classification scheme defined with respect to the second ontology; and responsive to receiving as input a particular term of the first ontology, returning a corresponding best match category of the classification scheme based on a match, by the stable matching scheme, to the particular term.

2. The processor-implemented method of claim 1, wherein determining the stable matching scheme further comprises: constructing a graph that includes: a first set of nodes corresponding to terms of the first ontology, a second set of nodes corresponding to terms of the second ontology, and a third set of nodes corresponding to categories of the classification scheme; a first set of edges linking the first set of nodes to the second set of nodes, each edge in the first set of edges having an edge weight based on the similarity metric computed with respect to endpoints of the edge; and a second set of edges linking the second set of nodes to the third set of nodes such that paths extend between the first set of nodes and the third set of nodes, each path of the paths incorporating an edge selected from the first set of edges and an edge selected from the second set of edges.

3. The processor-implemented method of claim 2, wherein determining the stable matching scheme further comprises: assigning each path of the paths a path weight based on the edge weight of a particular edge of the first set of edges that is within the path; and producing a modified graph by generating merged paths by merging select paths that extend between common endpoints and assigning a merged path weight to the merged paths; wherein the stable matching scheme is determined based on the merged path weights of the modified graph.

4. The processor-implemented method of claim 3, wherein determining the stable matching scheme is based on the modified graph.

5. The processor-implemented method of claim 1, further comprising: using the similarity metric computed between a first term of the first ontology and of a second term of the second ontology to quantify a strength of association between the first term of the first ontology and one or more categories of the classification scheme defined with respect to the second ontology.

6. The processor-implemented method of claim 2, wherein determining the matching scheme further comprises executing a stable marriage algorithm or a Hungarian matching algorithm with respect to the paths connecting each of the categories of the classification scheme to one or more of the terms of the first ontology.

7. The processor-implemented method of claim 1, wherein the particular term received as input of the first ontology is associated with digital content and wherein the method further comprises adding the particular relevant category to metadata that is used to index the digital content.

8. The processor-implemented method of claim 1, wherein the transformer model is a Bidirectional Encoder Representations from Transformers (BERT) model.

9. A system for ontology matching, comprising: memory; a processing system; an encoder stored in the memory and executable by the processing system to generate a plurality of embeddings that correspond to terms of a first ontology or a second ontology, each embedding of the plurality of embeddings encoding a term and a description of the term; a transformer model stored in the memory and executable by the processing system to add the plurality of embeddings to a vector space in which similarity between terms is correlated with distance between respective pairs of the embeddings; a similarity determination engine stored in the memory and executable by the processing system to compute a similarity metric for each of the respective pairs of the embeddings that comprise a first term from the first ontology and a second term from the second ontology, the similarity metric quantifying a semantic similarity of the first term and the second term; and a stable match identifier stored in the memory and executable by the processing system to determine, based on the similarity metric computed with respect to each pair of the embeddings, a stable matching scheme that matches each term of the second ontology to a single best match category of a classification scheme used to classify the terms of the first ontology; and an ontology translation engine stored in memory and configured to: receive as input a particular term of the second ontology; and utilize the stable matching scheme to identify and return a particular relevant category classifying the particular term, the particular relevant category being selected from categories of the classification scheme.

10. The system of claim 9, wherein the similarity determination engine is further executable to construct a graph that includes: a first set of nodes corresponding to the terms of the first ontology, a second set of nodes corresponding to the terms of the second ontology, and a third set of nodes corresponding to categories of the classification scheme; a first set of edges linking the first set of nodes to the second set of nodes, each edge in the first set of edges having an edge weight based on the similarity metric computed with respect to endpoints of the edge; and a second set of edges linking the first set of nodes to third set of nodes such that there exist a number of paths extending between the first set of nodes and the third set of nodes, each of the number of paths incorporating an edge selected from the first set of edges and an edge selected from the second set of edges.

11. The system of claim 10, wherein the stable match identifier is further executable to: assign each path of the number of paths a path weight based on the edge weight of a particular edge of the first set of edges that is within the path; and produce a modified graph by generating merged paths by merging select paths that extend between common endpoints and assigning a merged path weight to the merged paths; wherein the stable matching scheme is computed based on the path weights of the modified graph.

12. The system of claim 10, wherein the stable match identifier executes a stable marriage algorithm or a Hungarian matching algorithm with respect to the number of paths connecting each of the categories of the classification scheme to one or more of the terms of the second ontology.

13. The system of claim 9, wherein the particular term of the second ontology is associated with digital content and wherein the system further includes an indexing engine stored in memory that is executable to add the particular relevant category to metadata that is used to index the digital content.

14. The system of claim 9, wherein the transformer model is a Bidirectional Encoder Representations from Transformers (BERT) model.

15. A tangible computer-readable storage media encoding compu
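
The graph construction, path merging, and stable matching recited in claims 2, 3, and 6 can be sketched roughly as below. Several details are assumptions made for illustration only: the claims preview does not say how a merged path weight is computed, so the maximum over merged paths is used here; the stable marriage option is implemented as a plain Gale-Shapley procedure with first-ontology terms proposing to categories; and all terms, categories, and similarity values are hypothetical.

```python
def merged_path_weights(similarity, categories_of):
    """Collapse term -> second-ontology term -> category paths.

    similarity:    {(first_term, second_term): edge weight}
    categories_of: {second_term: [category, ...]}
    Each path takes the weight of its first-set edge; paths sharing the
    same endpoints are merged by taking the max (assumed merge rule).
    """
    merged = {}
    for (t1, t2), w in similarity.items():
        for cat in categories_of.get(t2, []):
            key = (t1, cat)
            merged[key] = max(merged.get(key, 0.0), w)
    return merged


def stable_match(weights):
    """Gale-Shapley: terms propose to categories in descending weight
    order; a category keeps whichever proposer has the higher weight."""
    terms = sorted({t for t, _ in weights})
    prefs = {t: sorted((c for (u, c) in weights if u == t),
                       key=lambda c: -weights[(t, c)])
             for t in terms}
    holder = {}                       # category -> currently matched term
    next_choice = {t: 0 for t in terms}
    free = list(terms)
    while free:
        t = free.pop()
        if next_choice[t] >= len(prefs[t]):
            continue                  # preference list exhausted: unmatched
        c = prefs[t][next_choice[t]]
        next_choice[t] += 1
        rival = holder.get(c)
        if rival is None:
            holder[c] = t
        elif weights[(t, c)] > weights[(rival, c)]:
            holder[c] = t
            free.append(rival)        # displaced term proposes again
        else:
            free.append(t)            # rejected; try next category later
    return {t: c for c, t in holder.items()}


# Hypothetical similarity scores and category assignments.
similarity = {
    ("soccer", "football"): 0.9, ("soccer", "rugby"): 0.6,
    ("soccer", "automobile"): 0.1, ("sedan", "football"): 0.1,
    ("sedan", "rugby"): 0.05, ("sedan", "automobile"): 0.8,
}
categories_of = {"football": ["Sports"], "rugby": ["Sports"],
                 "automobile": ["Vehicles"]}

merged = merged_path_weights(similarity, categories_of)
matching = stable_match(merged)
print(matching)
```

Gale-Shapley yields a one-to-one matching, so this sketch only covers the case where there are at least as many categories as terms; a Hungarian (minimum-cost assignment) solver is the other option the claims name.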

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • involving differential geometry, e.g. embedding of pattern manifold

  • based on graph theory, e.g. minimum spanning trees [MST] or graph cuts

  • Matching criteria, e.g. proximity measures

  • G06F40/30 (Primary)

    Semantic analysis

  • Clustering; Classification

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12222974B2 cover?
A method for automatically classifying terms of a first ontology into categories of a classification scheme defined with respect to a second ontology includes generating, for each term in the first ontology and each term in the second ontology, an embedding encoding the term and a description of the term. The method further includes adding the generated embeddings to a transformer model and com…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 11, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).