Entity Display Priority in a Distributed Geographic Information System
US-2015169588-A1 · Jun 18, 2015 · US
US10025846B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10025846-B2 |
| Application number | US-201514853823-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2015 |
| Priority date | Sep 14, 2015 |
| Publication date | Jul 17, 2018 |
| Grant date | Jul 17, 2018 |
Official abstract text for this publication.
Entity mappings that produce matching entities for a first data asset having attributes with attribute values and a second data asset having attributes with attribute values are generated by: matching the attribute values of the attributes of the first data asset with the attribute values of the attributes of the second data asset, using the matching attribute values to generate matching attribute pairs, and using the matching attribute pairs to identify entity mappings. An entity mapping score is computed for each of the entity mappings based on a combination of factors. The entity mappings are ranked based on each entity mapping score, and some of the ranked entity mappings are used to determine whether a same real-world entity is described by the first data asset and the second data asset.
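The three generation steps in the abstract (match attribute values, form matching attribute pairs, identify entity mappings) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the exact-equality matching, the `max_size` limit, and the sample records are all assumptions made for illustration. Each data asset is modelled as a list of dictionaries, and a value-to-attributes index is used purely as a convenient way to find shared values.

```python
from collections import defaultdict
from itertools import combinations


def index_values(asset):
    """Map each attribute value to the set of attributes in which it occurs."""
    index = defaultdict(set)
    for record in asset:
        for attribute, value in record.items():
            index[value].add(attribute)
    return index


def matching_attribute_pairs(first, second):
    """Count the shared values for every (first attribute, second attribute) pair.

    Assumption: two values match only when they are exactly equal.
    """
    first_index = index_values(first)
    second_index = index_values(second)
    counts = defaultdict(int)
    for value in first_index.keys() & second_index.keys():
        for a in first_index[value]:
            for b in second_index[value]:
                counts[(a, b)] += 1
    return dict(counts)


def candidate_entity_mappings(pairs, max_size=2):
    """Treat every combination of up to max_size attribute pairs as a candidate.

    Assumption: max_size is a hypothetical cap that keeps the candidate set small.
    """
    ordered = sorted(pairs)
    return [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(ordered, size)
    ]


# Hypothetical sample data, not taken from the patent.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
contacts = [
    {"full_name": "Ada Lovelace", "mail": "ada@example.com"},
    {"full_name": "Grace Hopper", "mail": "grace@example.com"},
]

pairs = matching_attribute_pairs(customers, contacts)
mappings = candidate_entity_mappings(pairs)
```

Here `pairs` records that `name` lines up with `full_name` and `email` with `mail`, each through one shared value, and `mappings` lists the single-pair and two-pair candidates that a later scoring step would rank.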
Opening claim text (preview).
What is claimed is:

1. A computer program product, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable by at least one processor to perform: generating entity mappings that produce matching entities for a first data asset having attributes with attribute values and a second data asset having attributes with attribute values by: matching the attribute values of the attributes of the first data asset with the attribute values of the attributes of the second data asset; using the matching attribute values to generate matching attribute pairs; and using the matching attribute pairs to identify entity mappings; computing an entity mapping score for each of the entity mappings based on a combination of factors; ranking the entity mappings based on each entity mapping score; and using the ranked entity mappings to determine which of the entity mappings are to be used to determine whether a same real-world entity is described by the first data asset and the second data asset.

2. The computer program product of claim 1, wherein the program code is executable by the at least one processor to perform: generating a first inverted index of entity identifier pairs for the first data asset; generating a second inverted index of entity identifier pairs for the second data asset; and using the first inverted index and the second inverted index to generate the matching attribute pairs based on matching attribute values that form the entity mappings.

3. The computer program product of claim 1, wherein values match fuzzily for the matching entities.

4. The computer program product of claim 1, wherein, for computing the entity mapping score for each of the entity mappings, the program code is executable by the at least one processor to perform: generating an entity mapping score for factors selected from: a number of attributes involved in an entity mapping, a cardinality of that individual entity mapping, support of that entity mapping, a probability of one to one matching for that entity mapping, a join utility measure for that entity mapping, and a probability of previous user selections for that entity mapping; and adding the entity mapping score for each of the factors to generate the entity mapping score for that entity mapping.

5. The computer program product of claim 1, wherein one of the first data asset and the second data asset is semi-structured data having hierarchical data that is flattened.

6. The computer program product of claim 1, wherein one of the first data asset and the second data asset is an unstructured data asset formed by a collection of documents and is modelled based on one of a bag of words and annotated words.

7. The computer program product of claim 1, wherein the program code is executable by the at least one processor to perform: integrating the first data asset and the second data asset using ranked entity mappings by performing one of a join operation, a merge operation, and a union operation.

8. The computer program product of claim 1, wherein a Software as a Service (SaaS) is configured to perform computer program product operations.

9. A computer system, comprising: one or more processors, one or more computer-readable memories and one or more computer-readable, tangible storage devices; and program instructions, stored on at least one of the one or more computer-readable, tangible storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to perform operations comprising: generating entity mappings that produce matching entities for a first data asset having attributes with attribute values and a second data asset having attributes with attribute values by: matching the attribute values of the attributes of the first data asset with the attribute values of the attributes of the second data asset; using the matching attribute values to generate matching attribute pairs; and using the matching attribute pairs to identify entity mappings; computing an entity mapping score for each of the entity mappings based on a combination of factors; ranking the entity mappings based on each entity mapping score; and using the ranked entity mappings to determine which of the entity mappings are to be used to determine whether a same real-world entity is described by the first data asset and the second data asset.

10. The computer system of claim 9, wherein the operations further comprise: generating a first inverted index of entity identifier pairs for the first data asset; generating a second inverted index of entity identifier pairs for the second data asset; and using the first inverted index and the second inverted index to generate the matching attribute pairs based on matching attribute values that form the entity mappings.

11. The computer system of claim 9, wherein values match fuzzily for the matching entities.

12. The computer system of claim 9, wherein the operations for computing the entity mapping score for each of the entity mappings further comprise: generating an entity mapping score for factors selected from: a number of attributes involved in an entity mapping, a cardinality of that individual entity mapping, support of that entity mapping, a probability of one to one matching for that entity mapping, a join utility measure for that entity mapping, and a probability of previous user selections for that entity mapping; and adding the entity mapping score for each of the factors to generate the entity mapping score for that entity mapping.

13. The computer system of claim 9, wherein one of the first data asset and the second data asset is semi-structured data having hierarchical data that is flattened.

14. The computer system of claim 9, wherein one of the first data asset and the second data asset is an unstructured data asset formed by a collection of documents and is modelled based on one of a bag of words and annotated words.

15. The computer system of claim 9, wherein the operations further comprise: integrating the first data asset and the second data asset using ranked entity mappings by performing one of a join operation, a merge operation, and a union operation.

16. The computer system of claim 9, wherein a Software as a Service (SaaS) is configured to perform computer system operations.
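The scoring and ranking recited in the claims (a score per factor, added together, then used to order the entity mappings) can be sketched as follows. This is an illustrative reading only: the claims name the factors but give no formulas here, so the per-factor scores are supplied as inputs, and the factor keys, the `EntityMapping` class, the default of 0.0 for an unselected factor, and all numbers are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the six factors named in the claims.
FACTORS = (
    "num_attributes",
    "cardinality",
    "support",
    "one_to_one_probability",
    "join_utility",
    "user_selection_probability",
)


@dataclass
class EntityMapping:
    attribute_pairs: tuple
    factor_scores: dict = field(default_factory=dict)

    def score(self):
        """Add the per-factor scores; a factor that was not selected counts as 0.0."""
        return sum(self.factor_scores.get(name, 0.0) for name in FACTORS)


def rank(mappings):
    """Order the entity mappings from highest to lowest total score."""
    return sorted(mappings, key=lambda m: m.score(), reverse=True)


# Hypothetical per-factor scores, chosen only to make the arithmetic easy to follow.
by_email = EntityMapping(
    attribute_pairs=(("email", "mail"),),
    factor_scores={
        "num_attributes": 0.5,
        "support": 0.25,
        "one_to_one_probability": 0.75,
    },
)
by_name_and_email = EntityMapping(
    attribute_pairs=(("email", "mail"), ("name", "full_name")),
    factor_scores={
        "num_attributes": 1.0,
        "support": 0.5,
        "one_to_one_probability": 0.25,
        "join_utility": 0.5,
    },
)

ranked = rank([by_email, by_name_and_email])
```

With these inputs the two-attribute mapping totals 2.25 and the email-only mapping totals 1.5, so the two-attribute mapping ranks first. A caller would then take some prefix of `ranked` as the mappings used to decide whether records describe the same real-world entity.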
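The claims covering semi-structured data state that hierarchical data is flattened but do not say how. One common approach, shown here as an assumption and not as the patent's method, is to turn each nested path into a single dotted attribute name so that the record looks like any other flat set of attributes and values. The `flatten` helper, the `.` separator, and the sample record are hypothetical.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries and lists into one level of dotted keys."""
    flat = {}
    for key, value in record.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, path, sep))
        elif isinstance(value, list):
            # Index list items so that each one gets its own attribute name.
            for i, item in enumerate(value):
                item_path = f"{path}{sep}{i}"
                if isinstance(item, dict):
                    flat.update(flatten(item, item_path, sep))
                else:
                    flat[item_path] = item
        else:
            flat[path] = value
    return flat


# Hypothetical semi-structured record, for example parsed from JSON or XML.
nested = {
    "name": "Ada",
    "address": {"city": "London", "postcode": "W1"},
    "emails": ["ada@example.com"],
}

flat = flatten(nested)
```

After flattening, `flat` has the attributes `name`, `address.city`, `address.postcode`, and `emails.0`, each with a scalar value that can be compared against the attribute values of another data asset.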
Inverted lists · CPC title
of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML (content-based retrieval of web data G06F16/95) · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Mapping to a database · CPC title
Search customisation based on user profiles and personalisation · CPC title