Method for disambiguated features in unstructured text

US2016110446A1 · US · A1

Patent metadata
Publication number: US-2016110446-A1
Application number: US-201514979703-A
Country: US
Kind code: A1
Filing date: Dec 28, 2015
Priority date: Dec 2, 2013
Publication date: Apr 21, 2016
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for disambiguating features in unstructured text is provided. The disclosed method may not require pre-existing links to be present. The method for disambiguating features in unstructured text may use co-occurring features derived from both the source document and a large document corpus. The disclosed method may include multiple modules, including a linking module for linking the derived features from the source document to the co-occurring features of an existing knowledge base. The disclosed method for disambiguating features may allow identifying unique entities from a knowledge base that includes entities with a unique set of co-occurring features, which in turn may allow for increased precision in knowledge discovery and search results, employing advanced analytical methods over a massive corpus, employing a combination of entities, co-occurring entities, topic IDs, and other derived features.
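The abstract's central idea is to resolve an ambiguous mention by comparing the features that co-occur with it in the source document against the co-occurring features stored for each candidate entity in a knowledge base. A minimal Python sketch can illustrate it. The entity layout, the Jaccard overlap used as the relatedness score, and the names `Entity`, `overlap`, and `disambiguate` are assumptions for illustration only. The publication does not specify a scoring formula, and this is not the patented implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical knowledge-base entry: a unique ID, a surface name, and the
    features observed to co-occur with this entity across a document corpus."""
    uid: str
    name: str
    co_features: set[str] = field(default_factory=set)


def overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity, used here as an assumed relatedness score."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def disambiguate(mention: str, context: set[str],
                 kb: list[Entity]) -> tuple[str | None, float]:
    """Pick the candidate whose stored co-occurring features best overlap the
    features extracted around `mention` in the source document.

    Returns (uid, score), or (None, 0.0) if no candidate shares any feature."""
    best_uid: str | None = None
    best_score = 0.0
    for entity in kb:
        if entity.name.lower() != mention.lower():
            continue  # only entities with the same surface form are candidates
        score = overlap(context, entity.co_features)
        if score > best_score:  # ties keep the earlier candidate
            best_uid, best_score = entity.uid, score
    return best_uid, best_score


if __name__ == "__main__":
    kb = [
        Entity("E1", "Paris", {"france", "seine", "louvre"}),
        Entity("E2", "Paris", {"texas", "lamar county"}),
    ]
    print(disambiguate("Paris", {"louvre", "museum", "seine"}, kb))  # ('E1', 0.5)
```

Jaccard overlap is only one simple choice. A weighted score, for example one that weights feature attributes as claim 4 contemplates, could replace `overlap` without changing the selection logic.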

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: in response to receiving, by a server, a search query from a client: searching, by the server, a set of records comprising a co-occurring feature, wherein the server comprises a main memory hosting a database, wherein the database stores a first cluster, wherein the first cluster comprises a disambiguated primary feature with a unique identifier and a set of secondary features, wherein the first cluster comprises a first score; identifying, by the server, a record in the set of records, wherein the record matches an extracted feature such that the extracted feature is a primary feature; associating, by the server, the extracted feature with a topic identifier; disambiguating, by the server, the primary feature based on a relatedness of the topic identifier; identifying, by the server, the set of secondary features based on the relatedness; disambiguating, by the server, the primary feature from the set of secondary features based on the relatedness; accessing, by the server, the database; linking, by the server, in real-time, during the accessing, the primary feature to the set of secondary features; forming, by the server, a second cluster based on the linking, wherein the second cluster comprises a second score; comparing, by the server, the first score against the second score; determining, by the server, whether the first score matches the second score; identifying, by the server, the unique identifier related to the primary feature in the first cluster based on the first score matching the second score; amending, by the server, based on the identifying the unique identifier, the first cluster such that the first cluster includes the second cluster; and sending, by the server, the unique identifier to the client.

2. The method of claim 1, further comprising: comparing, by the server, each member of the set of records which matches the extracted feature against a data item; assigning, by the server, a third score to the extracted feature based on the comparing of the each of the member.

3. The method of claim 2, further comprising: associating, by the server, the extracted feature with a feature attribute.

4. The method of claim 3, wherein the feature attribute is weighted.

5. The method of claim 2, further comprising: determining, by the server, a relatedness of the extracted feature based on the feature attribute.

6. The method of claim 1, wherein the primary feature is associated with a feature attribute.

7. The method of claim 1, wherein the extracted feature is associated with a lower-ordinal feature in accordance with a cluster hierarchy.

8. The method of claim 1, wherein the searching is in a fuzzy manner.

9. The method of claim 1, further comprising: comparing, by the server, a first feature against a second feature, wherein the first feature comprises the extracted feature, wherein the first feature is provided via a first data source, wherein the second feature is provided via a second data source; determining, by the server, if the first feature co-occurs in the second data source based on the comparing of the first feature against the second feature; linking, by the server, at least one of the first data source or the second data source.

10. The method of claim 1, further comprising: determining, by the server, a co-occurrence of the extracted feature in a plurality of data sources; improving, by the server, a rate of accuracy of the disambiguating based on the determining of the co-occurrence of the extracted feature.

11. A method comprising: in response to receiving, by a server, a search query from a client: searching, by the server, based on the receiving, a set of records comprising a co-occurring feature, wherein the server comprises a main memory hosting a database, wherein the database stores a first cluster, wherein the first cluster comprises a disambiguated primary feature with a first unique identifier and a set of secondary features, wherein the first cluster comprises a first score; identifying, by the server, a record in the set of records, wherein the record matches an extracted feature such that the extracted feature is a first primary feature; associating, by the server, the extracted feature with a topic identifier; disambiguating, by the server, the first primary feature based on a relatedness of the topic identifier; identifying, by the server, the set of secondary features based on the relatedness; disambiguating, by the server, the first primary feature from the set of secondary features based on the relatedness; accessing, by the server, the database; linking, by the server, in real-time, during the accessing, the first primary feature to the set of secondary features; forming, by the server, a second cluster based on the linking, wherein the second cluster comprises a second score; comparing, by the server, the first score against the second score; determining, by the server, whether the first score matches the second score; generating, by the server, a third cluster based on the first score not matching the second score, wherein the third cluster comprises a second primary feature; assigning, by the server, a second unique identifier to the second primary feature; sending, by the server, the second unique identifier to the client.

12. The method of claim 1, further comprising: comparing, by the server, each member of the set of records which matches the extracted feature against a data item; assigning, by the server, a third score to the extracted feature based on the comparing of the each of the member.

13. The method of claim 2, further comprising: associating, by the server, the extracted feature with a feature attribute.

14. The method of claim 3, wherein the feature attribute is weighted.

15. The method of claim 2, further comprising: determining, by the server, a relatedness of the extracted feature based on the feature attribute.

16. The method of claim 1, wherein at least one of the first primary feature or the second primary feature is associated with a feature attribute.

17. The method of claim 1, wherein the extracted feature is associated with a lower-ordinal feature in accordance with a cluster hierarchy.

18. The method of claim 1, wherein the searching is in a fuzzy manner.

19. The method of claim 1, further comprising: comparing, by the server, a first feature against a second feature, wherein the first feature comprises the extracted feature, wherein the first feature is provided via a first data source, wherein the second feature is provided via a second data source; determining, by the server, if the first feature co-occurs in the second data source based on the comparing of the first feature against the second feature; linking, by the server, at least one of the first data source or the second data source.

20. The method of claim 1, further comprising: determining, by the server, a co-occurrence of the extracted feature in a plurality of data sources; improving, by the server, a rate of accuracy of the disambiguating based on the determining of the co-occurrence of the extracted feature.
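Claims 1 and 11 describe two outcomes of the same score comparison. If the newly formed cluster's score matches the stored cluster's score, the stored cluster is amended and its existing unique identifier is returned (claim 1). Otherwise, a new cluster with a new unique identifier is generated and that identifier is returned (claim 11). The sketch below models that branch in Python under stated assumptions. The in-memory `ClusterStore`, the Jaccard-against-topic-profile score, the `tol` tolerance used to decide that two scores "match", and the `UID-n` identifier format are all hypothetical. The claims do not define how scores are computed or compared.

```python
from __future__ import annotations

import itertools
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical record: a disambiguated primary feature, its unique ID,
    its secondary (co-occurring) features, and the cluster's score."""
    uid: str
    primary: str
    secondary: set[str] = field(default_factory=set)
    score: float = 0.0


def score_features(secondary: set[str], topic_profile: set[str]) -> float:
    """Assumed scoring rule (not from the patent): Jaccard overlap between a
    cluster's secondary features and the feature profile of its topic."""
    if not secondary or not topic_profile:
        return 0.0
    return len(secondary & topic_profile) / len(secondary | topic_profile)


class ClusterStore:
    """Hypothetical in-memory stand-in for the database held in main memory."""

    def __init__(self) -> None:
        self.clusters: dict[str, Cluster] = {}
        self._ids = itertools.count(1)

    def resolve(self, primary: str, secondary: set[str],
                topic_profile: set[str], tol: float = 0.05) -> str:
        """Form a second cluster by linking `primary` to `secondary`, compare its
        score with each stored (first) cluster for the same primary feature, and
        return the unique ID to send back to the client."""
        second_score = score_features(secondary, topic_profile)
        for first in self.clusters.values():
            if first.primary == primary and math.isclose(
                    first.score, second_score, abs_tol=tol):
                # Scores match (claim 1): amend the first cluster to include
                # the second one; the stored score is left unchanged here.
                first.secondary |= secondary
                return first.uid
        # No match (claim 11): generate a third cluster whose primary feature
        # (same surface form, different entity) gets a new unique ID.
        third = Cluster(f"UID-{next(self._ids)}", primary, set(secondary),
                        second_score)
        self.clusters[third.uid] = third
        return third.uid


if __name__ == "__main__":
    store = ClusterStore()
    topic = {"france", "seine", "louvre", "museum"}
    print(store.resolve("Paris", {"seine", "louvre"}, topic))   # UID-1 (new)
    print(store.resolve("Paris", {"louvre", "museum"}, topic))  # UID-1 (merged)
    print(store.resolve("Paris", {"texas"}, topic))             # UID-2 (new)
```

Comparing scores within a tolerance, rather than by exact floating-point equality, is a design choice of this sketch. The claims say only that the scores must match.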

Assignees

Qbase Llc

Classifications

  • ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title

  • using natural language analysis · CPC title

  • Recognition of textual entities · CPC title

  • Machine learning, data mining or chemometrics · CPC title

  • G06F16/35 (primary)

    Clustering; Classification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016110446A1 cover?
A method for disambiguating features in unstructured text is provided. The disclosed method may not require pre-existing links to be present. The method for disambiguating features in unstructured text may use co-occurring features derived from both the source document and a large document corpus. The disclosed method may include multiple modules, including a linking module for linking the deri…
Who is the assignee on this patent?
Qbase Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).