Categorizing objects, such as documents and/or clusters, with respect to a taxonomy and data structures derived from such categorization

US9971813B2 · US · B2

Patent metadata
Publication number: US-9971813-B2
Application number: US-201414560067-A
Country: US
Kind code: B2
Filing date: Dec 4, 2014
Priority date: Apr 22, 2005
Publication date: May 15, 2018
Grant date: May 15, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A Website may be automatically categorized by accepting Website information, determining a set of scored clusters for the Website using the Website information, and determining at least one category of a predefined taxonomy using at least some of the set of clusters.
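The pipeline in the abstract (Website information, then scored clusters, then taxonomy categories) can be illustrated with a minimal Python sketch. It loosely follows the concept steps of dependent claims 2 to 5: terms are counted, terms with the same meaning are grouped into a concept, concepts are scored by summed term frequency and filtered by a threshold, and a concept-category index turns concept scores into category scores. The synonym table `TERM_TO_CONCEPT`, the index `CONCEPT_TO_CATEGORIES`, the category paths, and the threshold value are all hypothetical. This page does not say how the semantic clusters are actually formed, so a hand-written synonym lookup stands in for that step.

```python
import re
from collections import Counter

# Hypothetical synonym table: surface terms with the same meaning map to
# one concept ID (a stand-in for the patent's "semantic clusters").
TERM_TO_CONCEPT = {
    "car": "automobile", "cars": "automobile", "sedan": "automobile",
    "loan": "financing", "loans": "financing", "financing": "financing",
}

# Hypothetical concept-category index (cf. claim 5).
CONCEPT_TO_CATEGORIES = {
    "automobile": ["/Autos/Vehicles"],
    "financing": ["/Finance/Loans", "/Autos/Financing"],
}


def score_concepts(text, threshold=0):
    """Score each concept by the summed frequency of its terms and keep
    only concepts whose score is above `threshold` (cf. claims 3 and 4)."""
    term_freq = Counter(re.findall(r"[a-z]+", text.lower()))
    concept_scores = Counter()
    for term, freq in term_freq.items():
        concept = TERM_TO_CONCEPT.get(term)
        if concept is not None:
            concept_scores[concept] += freq
    return {c: s for c, s in concept_scores.items() if s > threshold}


def categories_from_concepts(concept_scores):
    """Turn concept scores into category scores via the concept-category
    index; a category reached by several concepts accumulates their scores."""
    category_scores = Counter()
    for concept, score in concept_scores.items():
        for category in CONCEPT_TO_CATEGORIES.get(concept, []):
            category_scores[category] += score
    return dict(category_scores)


page = "Used cars, new cars, sedan reviews. Car loan rates."
concepts = score_concepts(page, threshold=1)
print(concepts)                            # {'automobile': 4}
print(categories_from_concepts(concepts))  # {'/Autos/Vehicles': 4}
```

Here "cars" (2), "sedan" (1), and "car" (1) all fold into the automobile concept for a score of 4, while the financing concept scores only 1 and is dropped by the threshold.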

First claim

Opening claim text (preview).

What is claimed is:

1. A method for categorizing a property into one or more categories of a predefined taxonomy, the method comprising:
  a) receiving, by a computer system, information about a property;
  b) identifying, by the computer system using the received information about the property, multiple semantic clusters of re-occurring terms within the information;
  c) identifying, by the computer system, a set of one or more categories for the property from among the multiple semantic clusters based on a frequency of occurrence of the re-occurring terms in the information, including:
    for each level of multiple different levels of a hierarchical taxonomy of categories, determining whether a cluster score for a category at that level of the hierarchical taxonomy meets a pre-specified cluster score threshold;
    identifying, based on the determination, a deepest level from a top level of the hierarchical taxonomy that includes a given category having the cluster score that was determined to meet the pre-specified threshold, wherein the cluster score of a given category at a given level of the hierarchical taxonomy is a combination of the cluster score for the given category at that given level and cluster scores of one or more lower level categories that are subsumed by the given category at that level; and
    assigning the given category of the most specific deepest level from the top level of the hierarchical taxonomy having the cluster score that meets the pre-specified threshold value as an assigned category for the property;
  d) generating, using the identified set of categories, a mapping of the property to at least some of the one or more categories, including generating a mapping of the property to the assigned category;
  e) receiving, by the computer system, a term submitted by a user;
  f) identifying, by the computer system and using a mapping of terms to categories, the assigned category as a category that is mapped to the term; and
  g) providing, to the user, information identifying the property based on the property being assigned to the assigned category that is mapped to the term.

2. The method of claim 1, wherein each of the re-occurring terms has a re-occurrence frequency, and wherein step (c) further comprises identifying, by the computer system based on the clusters of re-occurring terms within the information, one or more concepts for the property, each concept identifying different re-occurring terms having identical meanings.

3. The method of claim 2, wherein step (c) further comprises scoring, by the computer system, the identified one or more concepts based on the re-occurrence frequencies of each of the re-occurring terms identified by said concept.

4. The method of claim 3, wherein identifying one or more concepts for the property further comprises comparing the score of each of the identified one or more concepts to a threshold and identifying a subset of the identified one or more concepts with scores above the threshold.

5. The method of claim 3, wherein step (c) further comprises identifying, by the computer system, the set of one or more categories by identifying categories in a concept-category index responsive to the concept scores of the identified one or more concepts.

6. The method of claim 1, wherein a category corresponds to a node of the hierarchical taxonomy defining a structured set of categories.

7. The method of claim 1, wherein the property is a Webpage or a Website including a plurality of Webpages.

8. The method of claim 1, wherein step (d) further comprises generating and storing an index entry mapping the received information about the property to each of the at least some of the one or more categories.

9. The method of claim 1, wherein step (c) further comprises determining the cluster score for each of the set of one or more categories based on a sum of values including (1) an intra-category cluster score of the category, and (2) intra-category cluster scores of categories that are descendants of the category in a hierarchical taxonomy.

10. A system for associating a property with one or more categories of a predefined taxonomy, the system comprising: a computer system comprising a processor and a memory storing an advertising targeting database, the processor configured to perform operations including:
  receiving information about a property;
  identifying, by the computer system using the received information about the property, multiple semantic clusters of re-occurring terms within the information;
  identifying, by the computer system, a set of one or more categories using the multiple semantic clusters, including:
    for each level of multiple different levels of a hierarchical taxonomy of categories, determining whether a cluster score for a category at that level of the hierarchical taxonomy meets a pre-specified cluster score threshold;
    identifying, based on the determination, a deepest level from a top level of the hierarchical taxonomy that includes a given category having the cluster score that was determined to meet the pre-specified threshold, wherein the cluster score of a given category at a given level of the hierarchical taxonomy is a combination of the cluster score for the given category at that given level and cluster scores of one or more lower level categories that are subsumed by the given category at that level; and
    assigning the given category of the deepest level from the top level of the hierarchical taxonomy having the cluster score that meets the pre-specified threshold value as an assigned category for the property;
  generating a mapping of the property to at least some of the one or more categories, including generating a mapping of the property to the assigned category;
  receiving, by the computer system, a term submitted by a user;
  identifying, by the computer system and using a mapping of terms to categories, the assigned category as a category that is mapped to the term; and
  providing, to the user, information identifying the property based on the property being assigned to the assigned category that is mapped to the term.

11. The system of claim 10, wherein each of the re-occurring terms has a re-occurrence frequency, and wherein the processor is further configured to identify, based on the clusters of re-occurring terms within the information, one or more concepts for the property, each concept identifying different re-occurring terms having identical meanings.

12. The system of claim 11, wherein the processor is further configured to score the identified one or more concepts based on the re-occurrence frequencies of each of the reoccurring terms identified by said concept.

13. The system of claim 12, wherein the processor is further configured to compare the score of each of the identified one or more concepts to a threshold and identifying a subset of the identified one or more concepts with scores above the threshold.

14. The system of claim 12, wherein the processor is further configured to identify the set of one or more categories by identifying categories in a concept-category index responsive to the concept scores of the identified one or more concepts.

15. The system of claim 10, wherein a category corresponds to a node of the hierarchical taxonomy defining a structured set of categories.

16. The system of claim 10, wherein the property is a Webpage or a Website including a plurality of Webpages.

17. The system of claim 10, wherein the processor is further configured to generate and store, in the memory, an index entry mapping the received information about the property to each o
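The hierarchical step of claims 1 and 9 (sum each category's own intra-category score with those of its descendants, then assign the deepest category whose combined score meets a threshold) and the retrieval steps (e) to (g) can be sketched in Python as follows. Categories are modeled as hypothetical slash-separated paths, the intra-category scores, threshold, and property name `example-cars.test` are made-up inputs, and the helper names `rolled_up_scores`, `assign_deepest_category`, and `lookup_properties` are illustrative only. The tie-break between equally deep categories is an assumption, since the claims do not address ties.

```python
from collections import defaultdict


def rolled_up_scores(intra_scores):
    """Score each category as its own intra-category score plus the
    intra-category scores of all its descendants (the sum in claim 9)."""
    # Include every ancestor of a scored category, even without its own score.
    categories = set()
    for path in intra_scores:
        parts = path.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            categories.add("/" + "/".join(parts[:depth]))
    return {
        cat: sum(score for path, score in intra_scores.items()
                 if path == cat or path.startswith(cat + "/"))
        for cat in categories
    }


def assign_deepest_category(intra_scores, threshold):
    """Return the deepest category whose rolled-up score meets the
    threshold, or None if no category qualifies."""
    totals = rolled_up_scores(intra_scores)
    qualifying = [cat for cat, score in totals.items() if score >= threshold]
    if not qualifying:
        return None
    # Depth = number of path segments. Ties at equal depth go to the higher
    # score, then to the lexically larger name (an assumed tie-break).
    return max(qualifying, key=lambda cat: (cat.count("/"), totals[cat], cat))


def lookup_properties(term, term_to_categories, assignments):
    """Steps (e)-(g): map a user's term to categories, then return the
    properties assigned to those categories."""
    by_category = defaultdict(list)
    for prop, category in assignments.items():
        by_category[category].append(prop)
    results = []
    for category in term_to_categories.get(term.lower(), []):
        results.extend(by_category.get(category, []))
    return results


intra = {"/Autos": 1, "/Autos/Vehicles": 2, "/Autos/Vehicles/Sedans": 1,
         "/Autos/Financing": 2, "/Finance": 2}
assigned = assign_deepest_category(intra, threshold=3)
print(assigned)  # /Autos/Vehicles
print(lookup_properties("Sedan", {"sedan": ["/Autos/Vehicles"]},
                        {"example-cars.test": assigned}))  # ['example-cars.test']
```

With a threshold of 3, "/Autos" (rolled-up score 6) and "/Autos/Vehicles" (3) qualify but "/Autos/Vehicles/Sedans" (1) does not, so "/Autos/Vehicles" is the deepest qualifying category. Because scores are summed upward and are non-negative, an ancestor's score is never lower than a descendant's, so every ancestor of a qualifying category also qualifies.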


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9971813B2 cover?
A Website may be automatically categorized by accepting Website information, determining a set of scored clusters for the Website using the Website information, and determining at least one category of a predefined taxonomy using at least some of the set of clusters.
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification: G06F16/353. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 15, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (publications cited in our corpus, or other publications sharing the same primary CPC classification).