Automatic taxonomy construction from keywords

US9501569B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9501569-B2
Application number: US-201313868758-A
Country: US
Kind code: B2
Filing date: Apr 23, 2013
Priority date: Apr 23, 2013
Publication date: Nov 22, 2016
Grant date: Nov 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, method or computer readable storage device to derive a taxonomy from keywords is described herein. A domain-dependent taxonomy from a set of keywords may be automatically derived by leveraging both a general knowledgebase and keyword search. For example, concepts may be deduced with the technique of conceptualization, and context information may be extracted from a search engine. Then, the taxonomy may be constructed using a tree algorithm.
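As a rough illustration of the two signals the abstract names, the sketch below ranks snippet words by frequency (the step spelled out in the first claim) and merges them with concept scores into one weighted bag of terms per keyword. It is a minimal sketch under stated assumptions, not the patented implementation: the function names, the stopword list, the `concept_weight` mixing parameter, and the cosine comparison are all hypothetical, and the snippets and concept scores would in practice come from a search engine and a knowledgebase that are not modelled here.

```python
from collections import Counter
import math
import re

# Illustrative stopword list (an assumption; the source does not specify one).
STOPWORDS = {"a", "an", "the", "of", "and", "for", "in", "to", "is", "with"}


def top_snippet_words(snippets, k):
    """Return the k most frequent non-stopword words across all snippets."""
    counts = Counter()
    for snippet in snippets:
        for word in re.findall(r"[a-z0-9]+", snippet.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]


def build_features(concept_scores, context_words, concept_weight=0.5):
    """Merge concept probabilities and context words into one sparse vector.

    concept_scores maps a concept to a probability-like score;
    concept_weight is a hypothetical knob balancing the two signals.
    """
    features = {}
    for concept, score in concept_scores.items():
        features["concept:" + concept] = concept_weight * score
    if context_words:
        share = (1.0 - concept_weight) / len(context_words)
        for word in context_words:
            features["context:" + word] = share
    return features


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up snippets standing in for search results.
snippets = [
    "cheap car insurance quotes",
    "car insurance for new drivers",
    "compare car insurance",
]
context = top_snippet_words(snippets, 2)
car = build_features({"insurance": 0.8}, context)
auto = build_features({"insurance": 0.7}, ["car", "quotes"])
juice = build_features({"fruit": 0.9}, ["apple", "juice"])
```

With these made-up inputs, `car` and `auto` share a concept and a context word, so their cosine similarity is positive, while `car` and `juice` share nothing and score zero. Prefixing terms with `concept:` and `context:` keeps the two sources from colliding when the same word appears in both.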

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: receiving a set of keywords; determining a set of concepts corresponding to the set of keywords, wherein the determining the set of concepts comprises utilizing a general purpose knowledgebase and one or more of the set of concepts are associated with a score to indicate a probability that a term from the general purpose knowledgebase is a concept of a keyword of the set of keywords; obtaining context information corresponding to the set of keywords by: collecting snippets from search results obtained from a search engine; ranking a predetermined number of snippet words based at least in part on frequency of occurrence; and storing the predetermined number of highest ranked snippet words as the context information; determining a weight for a term based at least in part on the set of concepts and the context information; and performing hierarchical clustering to automatically generate a taxonomy based at least in part on the weight, the set of concepts and the context information.
2. The method of claim 1, wherein the taxonomy comprises a multi-branch hierarchy clustering.
3. The method of claim 1, wherein the performing the hierarchical clustering comprises storing nearest neighbor information.
4. The method of claim 1, wherein the obtaining the context information further comprises analyzing a set of the snippet words as a bag of words.
5. The method of claim 1, wherein the hierarchical clustering is performed using a Bayesian approach.
6. A system comprising: one or more processors; a memory, accessible by the one or more processors; a keyword module stored in the memory and executable on the one or more processors to receive a set of keywords; a concepts module stored in the memory and executable on the one or more processors to determine a set of concepts for the keywords, wherein the concepts module: determines the set of concepts for the keywords with a general purpose knowledgebase such that one or more of the set of concepts are associated with a score to indicate a probability that a term from the general purpose knowledgebase is a concept of the keyword; a context module stored in the memory and executable on the one or more processors to obtain context information for the keywords, wherein the context module: accesses a search engine; collects snippets from search results obtained from the search engine; ranks a predetermined number of snippet words based at least in part on frequency of occurrence; and stores the predetermined number of highest ranked snippet words as the context information; and a taxonomy module stored in the memory and executable on the one or more processors to determine a weight for a term based at least in part on the set of concepts and the context information and perform hierarchical clustering based at least in part on the weight, the set of concepts and the context information.
7. The system of claim 6, wherein the set of concepts is based at least in part on terms returned by the general purpose knowledgebase.
8. The system of claim 6, wherein the concepts module parses the set of keywords.
9. The system of claim 6, wherein the taxonomy module generates a multi-branch hierarchy clustering.
10. The system of claim 6, wherein context module further analyzes a set of the snippet words as a bag of words.
11. The system of claim 6, wherein the taxonomy module caches nearest neighbor information in response to performing the hierarchical clustering.
12. A computer-readable storage device storing a plurality of executable instructions configured to program a computing device to perform operations comprising: receiving a set of keywords; parsing the set of keywords to provide a keyword; determining a set of concepts for the keyword, wherein the set of concepts is determined with a general purpose knowledgebase such that one or more of the concepts are associated with a score to indicate a probability that a term from the general purpose knowledgebase is a concept of the keyword; obtaining context information for the keyword by: collecting snippets from search results obtained from a search engine; ranking a predetermined number of snippet words based at least in part on frequency of occurrence; and storing the predetermined number of highest ranked snippet words as the context information; and performing hierarchical clustering to automatically generate a taxonomy with the keyword based at least in part on the set of concepts and the context information.
13. The device of claim 12, wherein the operations further comprise caching nearest neighbor information in response to the performing of the hierarchical clustering.
14. The device of claim 12, wherein the performing hierarchical clustering includes generating a multi-branch tree.
15. The device of claim 12, wherein the operations further comprise determining a weight for the term based at least in part on the set of concepts and the context information.
16. The device of claim 12, wherein the obtaining context information for the keyword further comprises analyzing a set of the snippet words as a bag of words.
17. The device of claim 12, wherein the hierarchical clustering is a Bayesian-based hierarchical clustering.
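The claims end with hierarchical clustering and mention storing or caching nearest neighbor information. The sketch below shows one way those two ideas can fit together. It is an assumption-laden illustration, not the claimed method: it swaps in plain centroid-based agglomerative clustering with cosine similarity rather than the Bayesian approach of claims 5 and 17, it yields a binary tree rather than the multi-branch hierarchy of claims 2, 9 and 14, and every name in it (`build_tree`, `nearest`, the sample vectors) is hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(cid, vectors):
    """Return (similarity, other_id) for the cluster closest to cid, or None."""
    best = None
    for other in vectors:
        if other != cid:
            sim = cosine(vectors[cid], vectors[other])
            if best is None or sim > best[0]:
                best = (sim, other)
    return best


def build_tree(features):
    """Greedily merge the most similar clusters; return nested tuples.

    features maps each keyword to a sparse weighted vector.
    """
    if not features:
        raise ValueError("at least one keyword is required")
    vectors = {i: dict(vec) for i, vec in enumerate(features.values())}
    trees = {i: keyword for i, keyword in enumerate(features)}
    # Nearest-neighbor cache: one (similarity, neighbor) entry per cluster.
    cache = {cid: nearest(cid, vectors) for cid in vectors}
    next_id = len(vectors)
    while len(vectors) > 1:
        left = max(cache, key=lambda cid: cache[cid][0])
        right = cache[left][1]
        # The merged cluster's vector is the sum of its two children.
        merged = dict(vectors[left])
        for term, value in vectors[right].items():
            merged[term] = merged.get(term, 0.0) + value
        subtree = (trees.pop(left), trees.pop(right))
        for cid in (left, right):
            del vectors[cid]
            del cache[cid]
        vectors[next_id] = merged
        trees[next_id] = subtree
        for cid in list(cache):
            if cache[cid][1] in (left, right):
                # Cached neighbor disappeared: rescan for this cluster only.
                cache[cid] = nearest(cid, vectors)
            else:
                # Cached neighbor survives: only the new cluster can beat it.
                sim = cosine(vectors[cid], merged)
                if sim > cache[cid][0]:
                    cache[cid] = (sim, next_id)
        cache[next_id] = nearest(next_id, vectors)
        next_id += 1
    return next(iter(trees.values()))


# Made-up feature vectors for three keywords.
features = {
    "car insurance": {"insurance": 1.0, "car": 0.5},
    "auto insurance": {"insurance": 1.0, "auto": 0.5},
    "apple juice": {"fruit": 1.0, "juice": 0.5},
}
tree = build_tree(features)
```

With these inputs the two insurance keywords are merged first and the unrelated keyword joins at the root. The cache avoids a full rescan for every cluster after each merge: only clusters whose cached neighbor was just merged need one.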

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9501569B2 cover?
A system, method or computer readable storage device to derive a taxonomy from keywords is described herein. A domain-dependent taxonomy from a set of keywords may be automatically derived by leveraging both a general knowledgebase and keyword search. For example, concepts may be deduced with the technique of conceptualization, and context information may be extracted from a search engine. Then…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 22, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).