US2016110446A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016110446-A1 |
| Application number | US-201514979703-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 28, 2015 |
| Priority date | Dec 2, 2013 |
| Publication date | Apr 21, 2016 |
| Grant date | — |
Official abstract text for this publication.
A method for disambiguating features in unstructured text is provided. The disclosed method may not require pre-existing links to be present. The method for disambiguating features in unstructured text may use co-occurring features derived from both the source document and a large document corpus. The disclosed method may include multiple modules, including a linking module for linking the derived features from the source document to the co-occurring features of an existing knowledge base. The disclosed method for disambiguating features may allow identifying unique entities from a knowledge base that includes entities with a unique set of co-occurring features, which in turn may allow for increased precision in knowledge discovery and search results, employing advanced analytical methods over a massive corpus, employing a combination of entities, co-occurring entities, topic IDs, and other derived features.
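The core idea in the abstract, choosing among knowledge-base entities by comparing co-occurring features, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the `disambiguate` function, the Jaccard overlap measure, and the `threshold` value do not come from the publication, which does not specify how relatedness is computed.

```python
from typing import Dict, Optional, Set

# Hypothetical toy knowledge base: entity ID -> features that co-occur with it.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "jaguar_animal": {"cat", "jungle", "prey", "amazon"},
    "jaguar_car": {"engine", "sedan", "dealer", "british"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets (an assumed relatedness measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(
    doc_features: Set[str],
    kb: Dict[str, Set[str]] = KNOWLEDGE_BASE,
    threshold: float = 0.1,
) -> Optional[str]:
    """Return the entity whose co-occurring features best match the document,
    or None when no candidate reaches the (assumed) threshold."""
    best_id: Optional[str] = None
    best_score = 0.0
    for entity_id, kb_features in kb.items():
        score = jaccard(doc_features, kb_features)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None
```

For example, a document mentioning "jaguar" alongside `{"engine", "dealer", "price"}` would resolve to the hypothetical `jaguar_car` entry, while a document with no overlapping features would resolve to nothing.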
Opening claim text (preview).
What is claimed is:

1. A method comprising: in response to receiving, by a server, a search query from a client:
   searching, by the server, a set of records comprising a co-occurring feature, wherein the server comprises a main memory hosting a database, wherein the database stores a first cluster, wherein the first cluster comprises a disambiguated primary feature with a unique identifier and a set of secondary features, wherein the first cluster comprises a first score;
   identifying, by the server, a record in the set of records, wherein the record matches an extracted feature such that the extracted feature is a primary feature;
   associating, by the server, the extracted feature with a topic identifier;
   disambiguating, by the server, the primary feature based on a relatedness of the topic identifier;
   identifying, by the server, the set of secondary features based on the relatedness;
   disambiguating, by the server, the primary feature from the set of secondary features based on the relatedness;
   accessing, by the server, the database;
   linking, by the server, in real-time, during the accessing, the primary feature to the set of secondary features;
   forming, by the server, a second cluster based on the linking, wherein the second cluster comprises a second score;
   comparing, by the server, the first score against the second score;
   determining, by the server, whether the first score matches the second score;
   identifying, by the server, the unique identifier related to the primary feature in the first cluster based on the first score matching the second score;
   amending, by the server, based on the identifying the unique identifier, the first cluster such that the first cluster includes the second cluster; and
   sending, by the server, the unique identifier to the client.

2. The method of claim 1, further comprising: comparing, by the server, each member of the set of records which matches the extracted feature against a data item; assigning, by the server, a third score to the extracted feature based on the comparing of the each of the member.

3. The method of claim 2, further comprising: associating, by the server, the extracted feature with a feature attribute.

4. The method of claim 3, wherein the feature attribute is weighted.

5. The method of claim 2, further comprising: determining, by the server, a relatedness of the extracted feature based on the feature attribute.

6. The method of claim 1, wherein the primary feature is associated with a feature attribute.

7. The method of claim 1, wherein the extracted feature is associated with a lower-ordinal feature in accordance with a cluster hierarchy.

8. The method of claim 1, wherein the searching is in a fuzzy manner.

9. The method of claim 1, further comprising: comparing, by the server, a first feature against a second feature, wherein the first feature comprises the extracted feature, wherein the first feature is provided via a first data source, wherein the second feature is provided via a second data source; determining, by the server, if the first feature co-occurs in the second data source based on the comparing of the first feature against the second feature; linking, by the server, at least one of the first data source or the second data source.

10. The method of claim 1, further comprising: determining, by the server, a co-occurrence of the extracted feature in a plurality of data sources; improving, by the server, a rate of accuracy of the disambiguating based on the determining of the co-occurrence of the extracted feature.

11. A method comprising: in response to receiving, by a server, a search query from a client:
   searching, by the server, based on the receiving, a set of records comprising a co-occurring feature, wherein the server comprises a main memory hosting a database, wherein the database stores a first cluster, wherein the first cluster comprises a disambiguated primary feature with a first unique identifier and a set of secondary features, wherein the first cluster comprises a first score;
   identifying, by the server, a record in the set of records, wherein the record matches an extracted feature such that the extracted feature is a first primary feature;
   associating, by the server, the extracted feature with a topic identifier;
   disambiguating, by the server, the first primary feature based on a relatedness of the topic identifier;
   identifying, by the server, the set of secondary features based on the relatedness;
   disambiguating, by the server, the first primary feature from the set of secondary features based on the relatedness;
   accessing, by the server, the database;
   linking, by the server, in real-time, during the accessing, the first primary feature to the set of secondary features;
   forming, by the server, a second cluster based on the linking, wherein the second cluster comprises a second score;
   comparing, by the server, the first score against the second score;
   determining, by the server, whether the first score matches the second score;
   generating, by the server, a third cluster based on the first score not matching the second score, wherein the third cluster comprises a second primary feature;
   assigning, by the server, a second unique identifier to the second primary feature;
   sending, by the server, the second unique identifier to the client.

12. The method of claim 1, further comprising: comparing, by the server, each member of the set of records which matches the extracted feature against a data item; assigning, by the server, a third score to the extracted feature based on the comparing of the each of the member.

13. The method of claim 2, further comprising: associating, by the server, the extracted feature with a feature attribute.

14. The method of claim 3, wherein the feature attribute is weighted.

15. The method of claim 2, further comprising: determining, by the server, a relatedness of the extracted feature based on the feature attribute.

16. The method of claim 1, wherein at least one of the first primary feature or the second primary feature is associated with a feature attribute.

17. The method of claim 1, wherein the extracted feature is associated with a lower-ordinal feature in accordance with a cluster hierarchy.

18. The method of claim 1, wherein the searching is in a fuzzy manner.

19. The method of claim 1, further comprising: comparing, by the server, a first feature against a second feature, wherein the first feature comprises the extracted feature, wherein the first feature is provided via a first data source, wherein the second feature is provided via a second data source; determining, by the server, if the first feature co-occurs in the second data source based on the comparing of the first feature against the second feature; linking, by the server, at least one of the first data source or the second data source.

20. The method of claim 1, further comprising: determining, by the server, a co-occurrence of the extracted feature in a plurality of data sources; improving, by the server, a rate of accuracy of the disambiguating based on the determining of the co-occurrence of the extracted feature.
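The two independent claims differ only in what happens after the score comparison: on a match, the stored cluster absorbs the new one and its existing identifier is returned (claim 1); on a mismatch, a new cluster with a new identifier is created (claim 11). The sketch below illustrates that branch under stated assumptions. The `Cluster` class, the `resolve` function, the use of Jaccard overlap as a stand-in for "the first score matches the second score", and the `threshold` value are all hypothetical; the claims do not define how scores are computed or compared.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Cluster:
    """Hypothetical stored cluster: a primary feature plus secondary features."""
    unique_id: str
    primary: str
    secondary: Set[str] = field(default_factory=set)


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap, used here as an assumed proxy for a score match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(
    primary: str,
    secondary: Set[str],
    store: Dict[str, Cluster],
    threshold: float = 0.3,
) -> str:
    """Return the identifier for the (primary, secondary) candidate cluster."""
    best: Optional[Cluster] = None
    best_sim = 0.0
    for cluster in store.values():
        if cluster.primary != primary:
            continue
        sim = overlap(secondary, cluster.secondary)
        if sim > best_sim:
            best, best_sim = cluster, sim

    if best is not None and best_sim >= threshold:
        # Match branch (claim 1): amend the stored cluster, reuse its ID.
        best.secondary |= secondary
        return best.unique_id

    # No-match branch (claim 11): create a new cluster with a new ID.
    new_id = uuid.uuid4().hex
    store[new_id] = Cluster(new_id, primary, set(secondary))
    return new_id
```

Calling `resolve` twice with overlapping secondary features for the same primary feature returns the same identifier and widens the stored cluster; calling it with unrelated secondary features yields a second, distinct identifier.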
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
using natural language analysis · CPC title
Recognition of textual entities · CPC title
Machine learning, data mining or chemometrics · CPC title
Clustering; Classification · CPC title