Categorizing keywords

US10606944B2 · US · B2

Patent metadata
Publication number: US-10606944-B2
Application number: US-201715705302-A
Country: US
Kind code: B2
Filing date: Sep 15, 2017
Priority date: Feb 12, 2014
Publication date: Mar 31, 2020
Grant date: Mar 31, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A keyword to be categorized is received. A category dictionary including categories having associated registered keywords, and a text corpus are received. Registered keywords are identified in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and the categories associated with the identified registered keywords are extracted. Registered keywords are identified that are co-occurring in the text corpus with the keyword to be categorized, and the categories associated with the identified co-occurring registered keywords are extracted. A degree of importance is determined for each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords. The extracted categories are outputted, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.
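
The flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `candidate_categories`, the dictionary layout (category name mapped to a list of registered keywords), whitespace tokenization, document-level co-occurrence, the `difflib` similarity ratio, and the 0.8 threshold are all hypothetical choices made for the example. Importance here is simply the number of identified keywords pointing at each category, which is one possible "function" of those keywords.

```python
from collections import Counter
from difflib import SequenceMatcher


def candidate_categories(keyword, category_dict, corpus, min_similarity=0.8):
    """Rank categories for `keyword` (illustrative sketch, not the patent's code)."""
    # Reverse index: registered keyword -> set of categories it belongs to.
    keyword_to_categories = {}
    for category, registered in category_dict.items():
        for word in registered:
            keyword_to_categories.setdefault(word, set()).add(category)

    # Registered keywords whose similarity to the input meets the threshold.
    # SequenceMatcher.ratio() is an assumed stand-in for "degree of similarity".
    similar = {
        word for word in keyword_to_categories
        if SequenceMatcher(None, keyword, word).ratio() >= min_similarity
    }

    # Registered keywords that appear in the same corpus document as the input.
    # Whitespace tokenization and document-level co-occurrence are assumptions.
    cooccurring = set()
    for document in corpus:
        tokens = set(document.lower().split())
        if keyword in tokens:
            cooccurring.update(tokens.intersection(keyword_to_categories))
    cooccurring.discard(keyword)

    # Importance: how many identified keywords are associated with each category.
    importance = Counter()
    for word in similar | cooccurring:
        for category in keyword_to_categories[word]:
            importance[category] += 1

    # Output the extracted categories, most important first.
    return importance.most_common()


if __name__ == "__main__":
    dictionary = {
        "fruit": ["apple", "banana", "grape"],
        "company": ["apple", "google"],
    }
    documents = ["aple and banana smoothie", "google hired staff"]
    print(candidate_categories("aple", dictionary, documents))
```

With the toy data above, "apple" is picked up by similarity and "banana" by co-occurrence, so "fruit" ranks ahead of "company".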

First claim

Opening claim text (preview).

What is claimed is:

1. A method for categorizing keywords, the method comprising:
    receiving, by a computer, a keyword to be categorized;
    receiving, by the computer, a category dictionary including categories having associated respective pluralities of registered keywords;
    receiving, by the computer, a text corpus;
    identifying, by the computer, one or more registered keywords in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and extracting the categories associated with the identified registered keywords;
    identifying, by the computer, one or more registered keywords co-occurring in the text corpus with the keyword to be categorized, and extracting the categories associated with the identified co-occurring registered keywords;
    determining, by the computer, a degree of importance of each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords; and
    outputting, by the computer, the extracted categories, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.

2. A method in accordance with claim 1, wherein the degree of similarity is determined on the basis of the number of insertion, deletion, and/or substitution edits required to be performed on the keyword to be categorized for the resulting edited word to match a registered keyword.

3. A method in accordance with claim 1, wherein the degree of importance of the extracted categories is determined on the basis of the number of identified registered keywords associated with each extracted category.

4. A method in accordance with claim 1, wherein the degree of importance of an extracted category is determined on the basis of the number of identified registered keywords associated with the category that are identified registered keywords associated with another category.

5. A computer program product for categorizing keywords, the computer program product comprising one or more computer-readable storage devices and program instructions stored on the one or more computer-readable storage devices, the program instructions comprising:
    program instructions to receive a keyword to be categorized;
    program instructions to receive a category dictionary including categories having associated respective pluralities of registered keywords;
    program instructions to receive a text corpus;
    program instructions to identify one or more registered keywords in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and to extract the categories associated with the identified registered keywords;
    program instructions to identify [one] or more registered keywords co-occurring in the text corpus with the keyword to be categorized, and to extract the categories associated with the identified co-occurring registered keywords;
    program instructions to determine a degree of importance of each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords; and
    program instructions to output the extracted categories, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.

6. A computer program product in accordance with claim 5, wherein the degree of similarity is determined on the basis of the number of insertion, deletion, and/or substitution edits required to be performed on the keyword to be categorized for the resulting edited word to match a registered keyword.

7. A computer program product in accordance with claim 5, wherein the degree of importance of the extracted categories is determined on the basis of the number of identified registered keywords associated with each extracted category.

8. A computer program product in accordance with claim 5, wherein the degree of importance of an extracted category is determined on the basis of the number of identified registered keywords associated with the category that are identified registered keywords associated with another category.

9. A computer system for categorizing keywords, the computer system comprising one or more computer processors, one or more computer-readable storage devices, and program instructions stored on one or more of the computer-readable storage devices for execution by at least one of the one or more processors, the program instructions comprising:
    program instructions to receive a keyword to be categorized;
    program instructions to receive a category dictionary including categories having associated respective pluralities of registered keywords;
    program instructions to receive a text corpus;
    program instructions to identify one or more registered keywords in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and to extract the categories associated with the identified registered keywords;
    program instructions to identify [one] or more registered keywords co-occurring in the text corpus with the keyword to be categorized, and to extract the categories associated with the identified co-occurring registered keywords;
    program instructions to determine a degree of importance of each extracted category based on a function of the identified registered keywords in the category dictionary and/or a function of the identified co-occurring registered keywords; and
    program instructions to output the extracted categories, with at least an indication of each category's relative importance, as category candidates for categorizing the keyword to be categorized.

10. A computer system in accordance with claim 9, wherein the degree of similarity is determined on the basis of the number of insertion, deletion, and/or substitution edits required to be performed on the keyword to be categorized for the resulting edited word to match a registered keyword.

11. A computer system in accordance with claim 9, wherein the degree of importance of the extracted categories is determined on the basis of the number of identified registered keywords associated with each extracted category.

12. A computer system in accordance with claim 9, wherein the degree of importance of an extracted category is determined on the basis of the number of identified registered keywords associated with the category that are identified registered keywords associated with another category.
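
The dependent claims name two concrete measures: an edit-count similarity (claims 2, 6, 10) and count-based importance (claims 3 and 4 and their counterparts). The sketch below shows one way these could look in Python. It is an illustration only: the function names, the normalization of edit distance into a 0 to 1 score, and the reading of claim 4 as "identified keywords in this category that are also registered under some other category" are assumptions, not details confirmed by the claim text.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (Levenshtein distance)."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(keyword, registered):
    """Assumed normalization: 1.0 for identical strings, lower as edits grow."""
    longest = max(len(keyword), len(registered))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(keyword, registered) / longest


def importance_by_count(identified, category_dict):
    """Number of identified keywords in each category (claim 3 style)."""
    scores = {}
    for category, registered in category_dict.items():
        hits = identified & set(registered)
        if hits:
            scores[category] = len(hits)
    return scores


def importance_by_shared(identified, category_dict):
    """Number of identified keywords in each category that are also
    registered under another category (one reading of claim 4)."""
    scores = {}
    for category, registered in category_dict.items():
        elsewhere = set()
        for other, words in category_dict.items():
            if other != category:
                elsewhere.update(words)
        hits = identified & set(registered)
        if hits:
            scores[category] = len(hits & elsewhere)
    return scores
```

For example, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion). A keyword would count as similar to a registered keyword when `similarity(...)` meets whatever threshold is chosen as the "predetermined value".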

Assignees

IBM

Inventors

Classifications

  • G06F40/242 · Primary

    Dictionaries · CPC title

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10606944B2 cover?
A keyword to be categorized is received. A category dictionary including categories having associated registered keywords, and a text corpus are received. Registered keywords are identified in the category dictionary having a degree of similarity to the keyword to be categorized that is equal to or greater than a predetermined value, and the categories associated with the identified registered …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/242. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 31, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).