Who is the assignee on this patent?

Zhou Bao-Yao, Luo Ping, Yang Sheng-Wen, and 3 more

What technology area does this patent fall under?

Primary CPC classification G06F40/205. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jan 13, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Document key phrase extraction method

US8935260B2 · US · B2

Patent metadata
Field	Value
Publication number	US-8935260-B2
Application number	US-200913264806-A
Country	US
Kind code	B2
Filing date	May 12, 2009
Priority date	May 12, 2009
Publication date	Jan 13, 2015
Grant date	Jan 13, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of extracting key phrases from a document is disclosed comprising the steps of accessing a repository comprising linked subjects, the repository comprising first and second data structures representing the relationship between said subjects using different representation criteria; pruning the first data structure by removing links between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not linked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings. A computer program for implementing the steps of this method when executed on a computer is also disclosed.
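The matching and further-pruning steps in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names `match_phrases` and `further_prune`, the toy repository, the naive substring matching, and the choice to treat a link in either direction as "linked" are all hypothetical.

```python
def match_phrases(text, subjects):
    """Return the subjects whose title appears in the text.

    Assumption: a naive case-insensitive substring match stands in for
    whatever phrase matching the patent's description actually uses.
    """
    lowered = text.lower()
    return {s for s in subjects if s.lower() in lowered}


def further_prune(graph, matched):
    """Keep matched subjects and any unmatched subject linked to a matched one.

    Assumption: a link in either direction counts as "linked".
    """
    keep = set(matched)
    for src, targets in graph.items():
        for dst in targets:
            if src in matched:
                keep.add(dst)
            if dst in matched:
                keep.add(src)
    # Rebuild the graph so that every kept subject is a key and every
    # remaining edge points at a kept subject.
    pruned = {node: set() for node in keep}
    for src, targets in graph.items():
        if src in keep:
            pruned[src] = {dst for dst in targets if dst in keep}
    return pruned


# Hypothetical toy repository: subject -> subjects it links to.
graph = {
    "Graph theory": {"PageRank", "Mathematics"},
    "PageRank": {"Graph theory", "Google"},
    "Cooking": {"Recipe"},
    "Recipe": set(),
}
text = "PageRank treats the web as a graph, and graph theory supplies the tools."

matched = match_phrases(text, graph)
pruned = further_prune(graph, matched)
print(sorted(matched))
print(sorted(pruned))
```

In this toy example the unmatched subjects "Mathematics" and "Google" survive because matched subjects link to them, while "Cooking" and "Recipe" are dropped because neither is linked to anything matched.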

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer-implemented method of extracting key phrases from a document comprising: accessing a repository comprising hyperlinked subjects, the repository comprising first and second data structures representing the relationship between said hyperlinked subjects using different representation criteria; pruning the first data structure by removing hyperlinks between subjects based on a further relationship between said subjects in the second data structure; matching phrases in said document to said subjects in the pruned first data structure; further pruning the pruned first data structure by removing unmatched subjects that are not hyperlinked to matched subjects; determining a ranking for each matched subject; and selecting key phrases using the determined subject rankings, wherein the first data structure is a directional graph comprising the subjects as nodes and the hyperlinks between subjects as edges between nodes; the second data structure is a directional graph comprising organized subject categories; and the further relationship comprises the shortest distance between respective categories to which respective subjects belong in the second data structure, the hyperlink between said subjects being removed if the shortest distance exceeds a threshold value.
2. The method of claim 1, wherein the threshold value is configurable.
3. The method of claim 2, further comprising restoring a hyperlink between subjects in said pruned first data structure if a bidirectional hyperlink exists between the subjects in said repository.
4. The method of claim 1, wherein the phrase matching step includes a disambiguation evaluation step.
5. The method of claim 1, further comprising adding a bi-directional hyperlink between matched subjects prior to said further pruning step, wherein said bi-directional hyperlink is added if the phrases matched to said subjects occur in the document within a defined distance from each other.
6. The method of claim 5, wherein the defined distance is configurable.
7. The method of claim 1, wherein the matched subject ranking step utilizes an algorithm considering the number of hyperlinks to a subject and the ranking of the subjects from which said hyperlinks originate.
8. The method of claim 1, wherein the subject ranking step further comprises determining an initial ranking based on the number of occurrences of the corresponding phrase in the document.
9. The method of claim 1, wherein the repository is an Internet-accessible database.
10. The method of claim 9, wherein the database is Wikipedia.
11. The method of claim 1, further comprising extracting key phrases from a further document by repeating the phrase matching, further pruning, subject ranking, and key phrase selection steps for the further document.
12. The method of claim 1, further comprising inserting the hyperlinks to the respective subjects corresponding to the selected key phrases into the document.
13. A non-transitory computer-readable data storage device comprising instructions which cause the computer program to: access a repository comprising hyperlinked subjects, the repository comprising first and second data structures representing the relationship between said hyperlinked subjects using different representation criteria; prune the first data structure by removing hyperlinks between subjects based on a further relationship between said subjects in the second data structure; match phrases in said document to said subjects in the pruned first data structure; further prune the pruned first data structure by removing unmatched subjects that are not [hyperlinked to matched subjects;] determine a ranking for each matched subject; and select key phrases using the determined subject rankings, wherein the first data structure is a directional graph comprising the subjects as nodes and the hyperlinks between subjects as edges between nodes; the second data structure is a directional graph comprising organized subject categories; and the further relationship comprises the shortest distance between respective categories to which respective subjects belong in the second data structure, the hyperlink between said subjects being removed if the shortest distance exceeds a threshold value.
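A minimal sketch of two steps recited in the claims may help: removing a hyperlink when the shortest category distance exceeds a threshold, and ranking subjects by incoming hyperlinks with an initial score taken from phrase counts. Everything here is an assumption made for illustration, not the patent's implementation: the function names, the toy data, walking category edges in both directions, taking the minimum distance when a subject has several categories, dropping links whose categories are unreachable, and using a PageRank-style iteration as a stand-in for the ranking algorithm described in claims 7 and 8.

```python
from collections import deque


def category_distance(category_graph, start, goal):
    """Breadth-first shortest path length between two categories, or None.

    Assumption: category edges are walked in both directions.
    The adjacency map is rebuilt on every call, which is fine for a sketch.
    """
    neighbours = {}
    for parent, children in category_graph.items():
        for child in children:
            neighbours.setdefault(parent, set()).add(child)
            neighbours.setdefault(child, set()).add(parent)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def prune_links(subject_graph, subject_categories, category_graph, threshold):
    """Keep a hyperlink only if its closest category pair is within threshold."""
    pruned = {}
    for src, targets in subject_graph.items():
        pruned[src] = set()
        for dst in targets:
            dists = [category_distance(category_graph, a, b)
                     for a in subject_categories.get(src, ())
                     for b in subject_categories.get(dst, ())]
            dists = [d for d in dists if d is not None]
            # Assumption: unreachable or uncategorized pairs are dropped.
            if dists and min(dists) <= threshold:
                pruned[src].add(dst)
    return pruned


def rank_subjects(graph, occurrences, damping=0.85, iterations=50):
    """PageRank-style scores seeded by how often each phrase occurs."""
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    total = sum(occurrences.get(n, 0) for n in nodes) or 1
    base = {n: occurrences.get(n, 0) / total for n in nodes}
    rank = dict(base)
    for _ in range(iterations):
        new = {n: (1 - damping) * base[n] for n in nodes}
        for src in nodes:
            targets = graph.get(src, ())
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        rank = new
    return rank


# Hypothetical toy data: category -> subcategories, subject -> categories,
# subject -> linked subjects.
category_graph = {"Science": {"Mathematics", "Computer science"},
                  "Mathematics": {"Graph theory"},
                  "Culture": {"Food"}}
subject_categories = {"PageRank": {"Computer science"},
                      "Graph (mathematics)": {"Graph theory"},
                      "Pizza": {"Food"}}
subject_graph = {"PageRank": {"Graph (mathematics)", "Pizza"},
                 "Graph (mathematics)": {"PageRank"},
                 "Pizza": set()}

pruned = prune_links(subject_graph, subject_categories, category_graph, threshold=3)
rank = rank_subjects(pruned, {"PageRank": 3, "Graph (mathematics)": 1})
print(pruned)
print(sorted(rank, key=rank.get, reverse=True))
```

With a threshold of 3 the link from "PageRank" to "Pizza" is removed because their categories are not connected, while the link to "Graph (mathematics)" survives at a distance of exactly 3. Lowering the threshold to 2 would remove that link as well, which is the kind of tuning claim 2 refers to when it calls the threshold configurable.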

Classifications

G06F16/2246
Trees, e.g. B+trees · CPC title
G06F16/9027
Trees · CPC title
G06F40/258
Heading extraction; Automatic titling; Numbering · CPC title
G06F40/205 (Primary)
Parsing · CPC title
G06N5/022 (Primary)
Knowledge engineering; Knowledge acquisition · CPC title

Patent family

Related publications grouped by family.

View patent family 43084601
