Topic set refinement

US11157539B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11157539-B2
Application number  US-201816016352-A
Country             US
Kind code           B2
Filing date         Jun 22, 2018
Priority date       Jun 22, 2018
Publication date    Oct 26, 2021
Grant date          Oct 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing system including one or more processors generates a topic set for a domain. A taxonomic evaluator is executed by the one or more processors to evaluate a set of category clusters generated from domain-specific textual data against a domain-specific taxonomic tree based on a coherency condition and to identify the category clusters that satisfy the coherency condition. The domain-specific taxonomic tree is generated from hierarchical structures of documents relating to the domain. Each identified category cluster is labeled with a label. A topic set creator is executed by the one or more processors to insert the labels of the set of identified category clusters into the topic set for the domain.
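
The abstract states that the domain-specific taxonomic tree is generated from hierarchical structures of documents relating to the domain. A minimal sketch of that step follows, assuming the hierarchical structures are available as ordered category paths (for example, breadcrumb-style trails). The `Node` class, the `build_taxonomic_tree` and `to_nested` helpers, and the sample paths are all hypothetical illustrations, not details taken from the patent.

```python
from typing import Dict, Iterable, List, Optional


class Node:
    """One category in the taxonomic tree (hypothetical structure)."""

    def __init__(self, category: str, parent: Optional["Node"] = None):
        self.category = category
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


def build_taxonomic_tree(paths: Iterable[List[str]], domain: str) -> Node:
    """Merge ordered category paths into a single tree rooted at the domain."""
    root = Node(domain)
    for path in paths:
        node = root
        for category in path:
            # Assumption: categories are merged on a case-insensitive key.
            key = category.strip().lower()
            if key not in node.children:
                node.children[key] = Node(category.strip(), parent=node)
            node = node.children[key]
    return root


def to_nested(node: Node) -> dict:
    """Render the subtree below a node as nested dicts, for inspection."""
    return {child.category: to_nested(child) for child in node.children.values()}


# Hypothetical hierarchical structures extracted from domain documents.
paths = [
    ["Sports", "Ball games", "Soccer"],
    ["Sports", "Ball games", "Tennis"],
    ["Sports", "Water sports", "Swimming"],
]
tree = build_taxonomic_tree(paths, "sports-domain")
print(to_nested(tree))
```

Shared path prefixes collapse into shared nodes, so sibling and ancestor-descendant relationships fall out of the merge without extra bookkeeping.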

First claim

Opening claim text (preview).

What is claimed is:

1. A method of generating a topic set for a domain, the method comprising: generating a set of category clusters from domain-specific textual data; labeling each category cluster with a label; evaluating the labeled category clusters against a domain-specific taxonomic tree based on a coherency condition, the domain-specific taxonomic tree being generated from hierarchical structures of documents relating to the domain, the coherency condition being satisfied based on relative positions of the labeled category clusters within the domain-specific taxonomic tree, wherein the domain-specific taxonomic tree includes multiple unique coherent sets, each unique coherent set of the domain-specific taxonomic tree including node categories that are siblings in the domain-specific taxonomic tree and node categories sharing an ancestor-descendant relationship of the domain-specific taxonomic tree; identifying the labeled category clusters that satisfy the coherency condition, wherein the coherency condition is satisfied by a category cluster having all cluster members semantically matched to node categories in the same unique coherent set of the domain-specific taxonomic tree; and inserting the label of each of the identified category clusters into the topic set for the domain, responsive to identifying the labeled category clusters that satisfy the coherency condition.

2. The method of claim 1 wherein the evaluating operation comprises: identifying semantic matches between each cluster member of each labeled category cluster of the set and node categories of the domain-specific taxonomic tree.

3. The method of claim 1 wherein the evaluating operation further comprises: embedding the domain-specific textual data as phrase vectors in a multidimensional vector space; and generating the set of category clusters from the domain-specific textual data by clustering the phrase vectors into phrase clusters based on a similarity condition, the set of category clusters being selected from the phrase clusters.

4. The method of claim 1 further comprising: generating a different set of category clusters from the domain-specific textual data; labeling each category cluster of the different set with a label; evaluating the labeled category clusters of the different set against the domain-specific taxonomic tree based on the coherency condition; identifying the labeled category clusters of the different set that satisfy the coherency condition; and inserting the label of each of the identified category clusters of the different set into the topic set for the domain, responsive to identifying the labeled category clusters that satisfy the coherency condition.

5. The method of claim 4 wherein the set of category clusters and the different set of category clusters are generated using a different clustering parameter value.

6. The method of claim 1 wherein at least some of the domain-specific textual data is extracted from a set of domain-specific websites.

7. The method of claim 1 wherein at least some of the domain-specific textual data is extracted from a query-URL click graph.

8. A computing system for generating a topic set for a domain, the computing system comprising: one or more processors; a taxonomic evaluator executed by the one or more processors and configured to evaluate a set of category clusters generated from domain-specific textual data against a domain-specific taxonomic tree based on a coherency condition, wherein the domain-specific taxonomic tree includes multiple unique coherent sets, each unique coherent set of the domain-specific taxonomic tree including node categories that are siblings in the domain-specific taxonomic tree and node categories sharing an ancestor-descendant relationship of the domain-specific taxonomic tree, and to identify the category clusters that satisfy the coherency condition, wherein the coherency condition is satisfied by a category cluster having all cluster members semantically matched to node categories in the same unique coherent set of the domain-specific taxonomic tree, the domain-specific taxonomic tree being generated from hierarchical structures of documents relating to the domain, each identified category cluster being labeled with a label, the coherency condition being satisfied based on relative positions of the category clusters within the domain-specific taxonomic tree; and a topic set creator executed by the one or more processors and configured to insert each of the identified category clusters into the topic set for the domain.

9. The computing system of claim 8 wherein the taxonomic evaluator is further configured to identify semantic matches between each cluster member of each labeled category cluster of the set and node categories of the domain-specific taxonomic tree.

10. The computing system of claim 8 wherein the taxonomic evaluator is further configured to generate a different set of category clusters from the domain-specific textual data, evaluate the labeled category clusters of the different set against the domain-specific taxonomic tree based on the coherency condition, and identify the labeled category clusters of the different set that satisfy the coherency condition, each identified category cluster of the different set being labeled with a label, and the topic set creator is further configured to insert the label of each of the identified category clusters of the different set into the topic set for the domain, responsive to identification of the labeled category clusters that satisfy the coherency condition.

11. The computing system of claim 10 wherein the set of category clusters and the different set of category clusters are generated using a different clustering parameter value.

12. One or more tangible processor-readable storage media devices of a tangible article of manufacture encoding processor-executable instructions for executing on an electronic computing system a process of generating a topic set for a domain, the process comprising: generating a set of category clusters from domain-specific textual data; labeling each category cluster with a label; evaluating the labeled category clusters against a domain-specific taxonomic tree based on a coherency condition, the domain-specific taxonomic tree being generated from hierarchical structures of documents relating to the domain, the coherency condition being satisfied based on relative positions of the labeled category clusters within the domain-specific taxonomic tree, wherein the domain-specific taxonomic tree includes multiple unique coherent sets, each unique coherent set of the domain-specific taxonomic tree including node categories that are siblings in the domain-specific taxonomic tree and node categories sharing an ancestor-descendant relationship of the domain-specific taxonomic tree; identifying the labeled category clusters that satisfy the coherency condition, wherein the coherency condition is satisfied by a category cluster having all cluster members semantically matched to node categories in the same unique coherent set of the domain-specific taxonomic tree; and inserting the label of each of the identified category clusters into the topic set for the domain, responsive to identifying the labeled category clusters that satisfy the coherency condition.

13. The one or more tangible processor-readable storage media devices of claim 12 wherein the domain-specific taxonomic tree includes multiple unique coherent sets, each unique coherent set of the domain-specific taxonomic tree including node categories that are siblings in the domain-specific taxonomic tree and node categories sharing an ance
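
The coherency condition recited in claim 1 can be illustrated with a short sketch. It rests on two loudly stated assumptions that are not taken from the patent: (1) a "unique coherent set" is read here as the children of one parent node (the siblings) together with that parent and its ancestors (the ancestor-descendant chain), which is only one possible reading of the claim language; and (2) "semantic matching" is replaced by a case-insensitive exact string match as a stand-in. The tree, cluster labels, and function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Hypothetical toy tree stored as child -> parent (the root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "sports": None,
    "ball games": "sports",
    "water sports": "sports",
    "soccer": "ball games",
    "tennis": "ball games",
    "swimming": "water sports",
}


def coherent_sets(parent: Dict[str, Optional[str]]) -> Set[FrozenSet[str]]:
    """Assumed reading: one set per parent = its children + itself + its ancestors."""
    children: Dict[str, Set[str]] = {}
    for node, par in parent.items():
        if par is not None:
            children.setdefault(par, set()).add(node)
    result: Set[FrozenSet[str]] = set()
    for par, kids in children.items():
        members = set(kids)
        ancestor: Optional[str] = par
        while ancestor is not None:  # walk up to the root
            members.add(ancestor)
            ancestor = parent[ancestor]
        result.add(frozenset(members))
    return result


def semantic_match(member: str, parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Stand-in for semantic matching: case-insensitive exact match."""
    key = member.strip().lower()
    return key if key in parent else None


def satisfies_coherency(members: List[str], parent, sets) -> bool:
    """True if every member matches a node and all matches share one coherent set."""
    matched = [semantic_match(m, parent) for m in members]
    if not matched or any(m is None for m in matched):
        return False
    return any(set(matched) <= s for s in sets)


def create_topic_set(labeled_clusters: Dict[str, List[str]], parent) -> Set[str]:
    """Insert the label of each cluster that satisfies the coherency condition."""
    sets = coherent_sets(parent)
    return {
        label
        for label, members in labeled_clusters.items()
        if satisfies_coherency(members, parent, sets)
    }


clusters = {
    "ball games": ["Soccer", "Tennis"],   # siblings under one parent
    "mixed": ["Soccer", "Swimming"],      # members fall in different sets
    "unknown": ["Soccer", "Chess"],       # "Chess" has no match in the tree
}
topics = create_topic_set(clusters, PARENT)
print(topics)
```

Under these assumptions only the "ball games" label reaches the topic set. Claims 4 and 5 describe repeating the same evaluation on a different set of clusters generated with a different clustering parameter value; in this sketch that would amount to calling `create_topic_set` again and taking the union of the results.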

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11157539B2 cover?
A computing system including one or more processors generates a topic set for a domain. A taxonomic evaluator is executed by the one or more processors to evaluate a set of category clusters generated from domain-specific textual data against a domain-specific taxonomic tree based on a coherency condition and to identify the category clusters that satisfy the coherency condition. The domain-spe…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/353. Mapped technology areas include Physics.
When was this patent published?
Published Oct 26, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).