Method for disambiguated features in unstructured text

US9239875B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9239875-B2
Application number	US-201414557794-A
Country	US
Kind code	B2
Filing date	Dec 2, 2014
Priority date	Dec 2, 2013
Publication date	Jan 19, 2016
Grant date	Jan 19, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for disambiguating features in unstructured text is provided. The disclosed method may not require pre-existing links to be present. The method for disambiguating features in unstructured text may use co-occurring features derived from both the source document and a large document corpus. The disclosed method may include multiple modules, including a linking module for linking the derived features from the source document to the co-occurring features of an existing knowledge base. The disclosed method for disambiguating features may allow identifying unique entities from a knowledge base that includes entities with a unique set of co-occurring features, which in turn may allow for increased precision in knowledge discovery and search results, employing advanced analytical methods over a massive corpus, employing a combination of entities, co-occurring entities, topic IDs, and other derived features.
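The central idea of the abstract, choosing among same-named knowledge-base entities by comparing co-occurring features, can be illustrated with a minimal Python sketch. The text shown here does not name a similarity measure, so Jaccard overlap is an assumption, as are the `Entity` class, the `disambiguate` function, and the sample data; this is an illustration, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class Entity:
    """Hypothetical knowledge-base entry: an ID, a surface name, and co-occurring features."""
    unique_id: str
    name: str
    cooccurring: Set[str] = field(default_factory=set)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets (assumed measure; 0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(mention: str, context: Set[str],
                 knowledge_base: Iterable[Entity]) -> Optional[Entity]:
    """Return the same-named entity whose co-occurring features best overlap the context."""
    candidates = [e for e in knowledge_base if e.name.lower() == mention.lower()]
    if not candidates:
        return None
    return max(candidates, key=lambda e: jaccard(context, e.cooccurring))


# Illustrative data only: two entities share the name "Jaguar".
kb = [
    Entity("E1", "Jaguar", {"engine", "sedan", "luxury"}),
    Entity("E2", "Jaguar", {"jungle", "predator", "cat"}),
]
best = disambiguate("Jaguar", {"engine", "sedan", "dealer"}, kb)
print(best.unique_id)  # E1
```

No pre-existing link between the source document and the knowledge base is needed in this sketch: the features that surround the mention are the only evidence used.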

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: in response to receiving a search query from an end user device: searching, by a node of a system, a set of candidate records including co-occurring features to identify one or more candidate records matching one or more extracted features, wherein an extracted feature that matches a candidate record is a primary feature, wherein the node comprises a main memory hosting an in-memory database, wherein the in-memory database stores a knowledge base of clusters, each cluster comprises a disambiguated primary feature with a unique identifier (“unique ID”), and a set of associated secondary features; associating, by the node, each of the extracted features with one or more machine-generated topic identifiers (“topic IDs”); disambiguating, by the node, each of the primary features from one another based on relatedness of topic IDs; identifying, by the node, a set of secondary features associated with each primary feature based upon the relatedness of topic IDs; disambiguating, by the node, each of the primary features from each of the secondary features in the associated set of secondary features based on relatedness of topic IDs; linking, by the node, in real-time, as data is retrieved from the knowledgebase from the in-memory database, each primary feature to the associated set of secondary features to form a new cluster; determining, by a disambiguation module of the in-memory database of the node, whether each of the new cluster matches an existing knowledgebase cluster by assignment of relative matching scores to existing knowledge clusters with disambiguated primary features, wherein, when there is a match, determining, an existing unique ID corresponding to each matching primary feature in the existing knowledgebase cluster and updating the existing knowledgebase cluster to include the new cluster; when there is no match, creating, a new knowledgebase cluster and assigning a new unique ID to the primary feature of the new knowledgebase cluster; and transmitting, one of the existing unique ID and the new unique ID for the primary feature to the user device.

2. The method according to claim 1, further comprising: comparing, by the node, each of the candidate records matching an extracted feature; and assigning, by the node, a weighted match score result to each of the extracted features based upon the comparison.

3. The method according to claim 2, further comprising associating, by the node, each of the extracted features with a set of weighted feature attributes.

4. The method according to claim 3, further comprising determining, by the node, relatedness of each of the extracted features based on one or more weighted feature attributes.

5. The method according to claim 1, further comprising: recognizing and extracting, by an extraction module of the node, one or more extracted features, wherein one or more primary features are identified in the one or more extracted features; and storing, by the extraction module of the node, each of the extracted features in a database.

6. The method according to claim 5, further comprising assigning, by the extraction module of the node, an extraction certainty score to each of the features.

7. The method according to claim 1, wherein each primary feature is associated with a set of one or more feature attributes.

8. The method according to claim 7, wherein a feature attribute is selected from the group consisting of: a topic ID, a document identifier (“document ID”), a feature type, a feature name, a confidence score, and a feature position.

9. The method according to claim 1, wherein each associated feature is associated with a set of lower-ordinal features according to a pre-defined cluster hierarchy.

10. The method according to claim 1, further comprising performing, by a node, a fuzzy key search of the set of candidate records.

11. The method according to claim 7, further comprising linking, by a link-on-the fly module of the node, two or more data sources based on co-occurrence of related topic IDs and one or more feature attributes.

12. The method according to claim 1, further comprising: determining, by the node, whether an extracted feature in a data source co-occurs in a second data source by comparing the extracted feature with a feature in the second data source; and linking, by the node, each of the data sources based upon the comparison.

13. The method according to claim 1, further comprising analyzing, by the node, co-occurrence of an extracted feature from different data sources to improve accuracy of disambiguating extracted features.

14. The method according to claim 1, further comprising: continuously receiving, by the node, one or more new data sources; continuously extracting, by the node, one or more extracted features; continuously performing, by the node, candidate searching on the one or more extracted features; continuously disambiguating, by the node, the extracted features; and continuously linking, by the node, the extracted features into one or more new clusters.

15. A non-transitory computer readable medium having stored thereon computer executable instructions when executed by a processor performs functions comprising: in response to receiving a search query from an end user device: searching, by a node of a system, a set of candidate records including co-occurring features to identify one or more candidates records matching one or more extracted features, wherein the node comprises a main memory hosting the in-memory database, wherein the node comprises a main memory hosting an in-memory database, wherein the in-memory database stores a knowledge base of clusters, each cluster comprises a disambiguated primary feature with a unique identifier (“unique ID”), and a set of associated secondary features; associating, by the node, each of the extracted features with one or more machine-generated topic identifiers (“topic IDs”); disambiguating, by the node, each of the primary features from one another based on relatedness of topic IDs; identifying, by the node, a set of secondary features associated with each primary feature based upon the relatedness of topic IDs; disambiguating, by the node, each of the primary features from each of the secondary features in the associated set of secondary features based on relatedness of topic IDs; linking, by the node, in real-time, as data is retrieved from the knowledgebase from the in-memory database, each primary feature to the associated set of secondary features to form a new cluster; determining, by a disambiguation module of the in-memory database of the node, whether each of the new cluster matches an existing knowledgebase cluster by assignment of relative matching scores to existing knowledge clusters with disambiguated primary features, wherein, when there is a match, determining, an existing unique ID corresponding to each matching primary feature in the existing knowledgebase cluster and updating the existing knowledgebase cluster to include the new cluster; when there is no match, creating, a new knowledgebase cluster and assigning a new unique ID to the primary feature of the new knowledgebase cluster; and transmitting, one of the existing unique ID and the new unique ID for the primary feature to the user device.

16. The non-transitory computer readable medium according to claim 15, wherein the instructions further comprise: comparing, by the node, each of the candidate records matching an extracted feature; and assigning a weighted match score result to each
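The final steps of claim 1 (score a new cluster against existing knowledge-base clusters, reuse and update the matching cluster's unique ID, or create a new cluster with a new ID) can be sketched in Python as follows. The claim does not give a scoring formula, so the equal-weight blend of secondary-feature and topic-ID overlap, the `0.5` threshold, the `ID-n` naming scheme, and the names `Cluster`, `KnowledgeBase`, and `resolve` are all hypothetical choices made for illustration.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Cluster:
    """A primary feature with its secondary features and topic IDs."""
    primary: str
    secondary: Set[str]
    topic_ids: Set[int]


def overlap(a: set, b: set) -> float:
    """Jaccard overlap (an assumed relatedness measure)."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


class KnowledgeBase:
    """In-memory store of clusters keyed by unique ID (illustrative only)."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.clusters: Dict[str, Cluster] = {}
        self.threshold = threshold          # assumed cut-off for a "match"
        self._counter = itertools.count(1)  # source of new unique IDs

    def _score(self, new: Cluster, existing: Cluster) -> float:
        """Relative matching score; zero unless the primary names agree."""
        if new.primary.lower() != existing.primary.lower():
            return 0.0
        return (0.5 * overlap(new.secondary, existing.secondary)
                + 0.5 * overlap(new.topic_ids, existing.topic_ids))

    def resolve(self, new: Cluster) -> Tuple[str, bool]:
        """Return (unique_id, matched_existing) for the new cluster."""
        best_id: Optional[str] = None
        best_score = 0.0
        for uid, existing in self.clusters.items():
            score = self._score(new, existing)
            if score > best_score:
                best_id, best_score = uid, score
        if best_id is not None and best_score >= self.threshold:
            # Match: fold the new cluster into the existing one.
            matched = self.clusters[best_id]
            matched.secondary |= new.secondary
            matched.topic_ids |= new.topic_ids
            return best_id, True
        # No match: create a new cluster under a fresh unique ID.
        uid = f"ID-{next(self._counter)}"
        self.clusters[uid] = Cluster(new.primary, set(new.secondary), set(new.topic_ids))
        return uid, False


kb = KnowledgeBase()
print(kb.resolve(Cluster("Jaguar", {"engine", "sedan"}, {1, 2})))       # ('ID-1', False)
print(kb.resolve(Cluster("Jaguar", {"engine", "sedan", "dealer"}, {1})))  # ('ID-1', True)
print(kb.resolve(Cluster("Jaguar", {"jungle", "predator"}, {7})))       # ('ID-2', False)
```

The returned ID corresponds to what the claim describes as being transmitted to the user device; candidate search, topic-ID assignment, and the in-memory database itself are outside the scope of this sketch.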

Assignees

Qbase LLC

Inventors

Classifications

G06F16/3344
using natural language analysis
G16C20/70
Machine learning, data mining or chemometrics
G16B40/00
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
G06F16/35 (Primary)
Clustering; Classification
G06F40/279
Recognition of textual entities
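Each CPC symbol above follows the pattern section letter, two-digit class, subclass letter, main group, slash, subgroup. A small Python sketch can split a symbol into those parts and look up the section title; section G is Physics, which is why this page maps the patent to that area. The `parse_cpc` function, the `CpcSymbol` type, and the regular expression are illustrative assumptions, not an official CPC tool.

```python
import re
from typing import NamedTuple

# CPC section letters and their titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}


class CpcSymbol(NamedTuple):
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str


# Assumed shape: e.g. "G06F16/35" -> G | 06 | F | 16 | 35
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol into its parts, or raise ValueError if it does not fit."""
    match = _CPC_RE.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


primary = parse_cpc("G06F16/35")
print(primary.section, CPC_SECTIONS[primary.section])  # G Physics
```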

Patent family

Related publications grouped by family.

Patent family ID: 53265533

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9239875B2 cover?: A method for disambiguating features in unstructured text is provided. The disclosed method may not require pre-existing links to be present. The method for disambiguating features in unstructured text may use co-occurring features derived from both the source document and a large document corpus. The disclosed method may include multiple modules, including a linking module for linking the deri…
Who is the assignee on this patent?: Qbase LLC
What technology area does this patent fall under?: Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?: Publication date Jan 19, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).