Data extraction and transformation method and system

US9898515B1 · US · B1

Patent metadata
Publication number: US-9898515-B1
Application number: US-201414527345-A
Country: US
Kind code: B1
Filing date: Oct 29, 2014
Priority date: Oct 29, 2014
Publication date: Feb 20, 2018
Grant date: Feb 20, 2018

Abstract

Official abstract text for this publication.

A system and method for processing raw transaction records received from multiple data sources. The system and method receive multiple raw transaction records from multiple data sources. Transaction pair records are generated from the raw transaction records. Location and entity fields including raw information are identified from the transaction pair records. The raw location and entity information is resolved to generate resolved location and entity information capable of aggregation and further processing, such as the deriving of analytics.
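The first stages of this pipeline group related raw records into transaction pair records and then locate the raw entity and location fields, whose format varies by data source. The Python below is a minimal sketch of those two stages under stated assumptions. It assumes each raw record is a dict with a shared `txn_id`, a `leg` label (`source`, `intermediate`, or `destination`), and a `source_system` name. The rule table `FIELD_RULES`, the memo pattern `MEMO_RULE`, and every field name are hypothetical, because the patent text shown here does not specify record layouts, how pairs are matched, or what the transaction record rules look like.

```python
from collections import defaultdict
import re

# Hypothetical per-source rules: which raw key holds each data category.
FIELD_RULES = {
    "bank_a": {"entity": "beneficiary_name", "location": "beneficiary_addr"},
    "bank_b": {"entity": "cpty", "location": "cpty_city_state"},
}

# Hypothetical fallback rule for free-text memos: "PAY TO <entity> AT <location>".
MEMO_RULE = re.compile(r"PAY TO (?P<entity>.+?) AT (?P<location>.+)$")

# Destination legs are consulted first, so their values take precedence.
LEG_ORDER = {"destination": 0, "intermediate": 1, "source": 2}


def build_pair_records(raw_records):
    """Group raw records sharing a txn_id into one transaction pair record."""
    groups = defaultdict(list)
    for rec in raw_records:
        groups[rec["txn_id"]].append(rec)
    pairs = []
    for txn_id, legs in groups.items():
        kinds = {leg["leg"] for leg in legs}
        # Keep only groups that connect a transaction source to a destination.
        if {"source", "destination"} <= kinds:
            pairs.append({"txn_id": txn_id, "legs": legs})
    return pairs


def identify_fields(pair_record):
    """Find raw entity/location text using per-source rules or the memo rule."""
    found = {}
    for leg in sorted(pair_record["legs"], key=lambda r: LEG_ORDER[r["leg"]]):
        rules = FIELD_RULES.get(leg["source_system"])
        if rules:
            for category, key in rules.items():
                if key in leg:
                    found.setdefault(category, leg[key])
        elif "memo" in leg:
            match = MEMO_RULE.search(leg["memo"])
            if match:
                for category in ("entity", "location"):
                    found.setdefault(category, match.group(category).strip())
    return found


raw = [
    {"txn_id": "T1", "leg": "source", "source_system": "bank_a",
     "beneficiary_name": "ACME CORP", "beneficiary_addr": "Austin, TX"},
    {"txn_id": "T1", "leg": "destination", "source_system": "bank_c",
     "memo": "PAY TO Acme Corporation AT Austin, TX"},
    {"txn_id": "T2", "leg": "source", "source_system": "bank_b", "cpty": "Foo"},
]
pairs = build_pair_records(raw)  # T2 has no destination leg, so it is dropped
fields = identify_fields(pairs[0])
print(fields)  # {'entity': 'Acme Corporation', 'location': 'Austin, TX'}
```

The two sources describe the same entity in different formats. That mismatch is why the resolution steps in the claims are needed before the values can be aggregated.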

Claims

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a processing device, a plurality of raw transaction records from a plurality of data sources; identifying transaction pairs from the raw transaction records, transaction pairs including multiple transactions relating to a common transaction between a transaction source and a transaction destination, at least some transaction pairs including a source transaction, an intermediate transaction, and a destination transaction; generating a plurality of transaction pair records from the identified transaction pairs, wherein each transaction pair record comprises a plurality of related raw transaction records; identifying one or more selected fields corresponding to one or more selected data categories from each of the plurality of transaction pair records, wherein the one or more selected fields comprise raw information; wherein the format of the one or more selected fields varies among the transaction pair records such that selected fields are identified based on the use of at least one field identification technique that applies transaction record rules to determine selected fields in at least some transaction pair records; wherein the one or more selected fields includes at least an entity field; determining pair match scores corresponding to a plurality of candidate entity names using a similarity measure; identifying a set of top candidate entity names having similar pair match scores; performing list matching on the set of top candidate entity names using an adjusted similarity measure to identify a top match; establishing the top match as the resolved entity information; resolving the raw information in the one or more identified selected fields to generate resolved information corresponding to the one or more data categories; and aggregating the resolved information for storing in a data store.

2. The computer-implemented method of claim 1, wherein the one or more selected fields corresponding to one or more data categories comprise at least one location field and at least one entity field.

3. The computer-implemented method of claim 2, wherein the at least one location field comprises raw location information and the at least one entity field comprises raw entity information.

4. The computer-implemented method of claim 3, wherein resolving the raw location information further comprises: extracting the raw location information from the one or more location fields; searching one or more geographic databases based on the extracted raw location information; identifying, based on the search, a plurality of candidate locations comprising city information and state information; determining a score for each of the plurality of candidate locations; and identifying a resolved location based on the scores for each of the plurality of candidate locations.

5. The computer-implemented method of claim 3, wherein resolving the raw entity information further comprises: searching an entity database based on an entity query associated with the raw entity information to identify a plurality of candidate entity names; and performing pairwise matching based on the identified plurality of candidate entity names to generate a pair match score for each of the identified plurality of candidate entity names.

6. A system comprising: a memory; and a processing device coupled to the memory, the processing device configured to: receive a plurality of raw transaction records from a plurality of data sources; identify transaction pairs from the raw transaction records, the transaction pairs including multiple transactions relating to a common transaction between a transaction source and a transaction destination, at least some transaction pairs including a source transaction, an intermediate transaction, and a destination transaction; generate a plurality of transaction pair records from the identified transaction pairs, wherein each transaction pair record comprises a plurality of related raw transaction records; identify one or more selected fields corresponding to one or more selected data categories from each of the plurality of transaction pair records, wherein the one or more selected fields comprise raw information; wherein the format of the one or more selected fields varies among the transaction pair records such that selected fields are identified based on the use of at least one field identification technique that applies transaction record rules to determine selected fields in at least some transaction pair records; wherein the one or more selected fields includes at least an entity field; determine pair match scores corresponding to a plurality of candidate entity names using a similarity measure; identify a set of top candidate entity names having similar pair match scores; perform list matching on the set of top candidate entity names using an adjusted similarity measure to identify a top match; establish the top match as the resolved entity information; resolve the raw information in the one or more identified selected fields to generate resolved information corresponding to the one or more data categories; and aggregate the resolved information for storing in a data store.

7. The system of claim 6, wherein the one or more selected fields corresponding to one or more selected data categories comprise at least one location field and at least one entity field.

8. The system of claim 7, wherein the at least one location field comprises raw location information and the at least one entity field comprises raw entity information.

9. The system of claim 8, wherein the processing device is configured to resolve the raw location information by: extracting the raw location information from the one or more location fields; searching one or more geographic databases based on the extracted raw location information; identifying, based on the search, a plurality of candidate locations comprising city information and state information; determining a score for each of the plurality of candidate locations; and identifying a resolved location based on the scores for each of the plurality of candidate locations.

10. The system of claim 8, wherein the processing device is configured to resolve the raw entity information by: searching an entity database based on an entity query associated with the raw entity information to identify a plurality of candidate entity names; and performing pairwise matching based on the identified plurality of candidate entity names to generate a pair match score for each of the identified plurality of candidate entity names.

11. A non-transitory computer-readable storage medium comprising instructions that, when executed by a processing device, cause the processing device to perform operations comprising: receiving a plurality of raw transaction records from a plurality of data sources; identifying transaction pairs from the raw transaction records, the transaction pairs including multiple transactions relating to a common transaction between a transaction source and a transaction destination, at least some transaction pairs including a source transaction, an intermediate transaction, and a destination transaction; generating a plurality of transaction pair records from the identified transaction pairs, wherein each transaction pair record comprises a plurality of related raw transaction records; identifying one or more selected fields corresponding to one or more selected data categories from each of the plurality of transaction pair records, wherein the one or more selected fields comprise raw information; wherein the format of the one or more selected fields varie…
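The entity-resolution steps of claim 1 run in three passes. First, each candidate entity name gets a pair match score. Second, the candidates with similar scores form a top set. Third, list matching with an adjusted similarity measure picks the top match. The Python below is a minimal sketch of that flow. The claim does not name either measure, so this sketch assumes `difflib.SequenceMatcher.ratio()` as the base similarity measure and a Jaccard overlap of name tokens, with common legal-form words dropped, as the adjusted measure. The `tolerance` that defines "similar" scores and the `LEGAL_SUFFIXES` set are hypothetical. The sketch also skips the entity database search of claim 5 and takes the candidate names as given.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal-form words ignored by the adjusted measure.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "company", "llc", "ltd", "na"}


def pair_score(query, candidate):
    """Base similarity measure (assumed): character-level SequenceMatcher ratio."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def name_tokens(name):
    """Lower-case word tokens with legal-form words removed."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {w for w in words if w not in LEGAL_SUFFIXES}


def adjusted_score(query, candidate):
    """Adjusted similarity measure (assumed): Jaccard overlap of name tokens."""
    q, c = name_tokens(query), name_tokens(candidate)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def resolve_entity(query, candidates, tolerance=0.1):
    """Return the resolved entity name, or None when there are no candidates."""
    if not candidates:
        return None
    # 1. Pair match score for every candidate entity name.
    scored = [(pair_score(query, c), c) for c in candidates]
    best = max(score for score, _ in scored)
    # 2. Top set: candidates whose scores lie within `tolerance` of the best.
    top = [c for score, c in scored if best - score <= tolerance]
    if len(top) == 1:
        return top[0]
    # 3. List matching: re-rank only the top set with the adjusted measure,
    #    breaking ties with the base score.
    return max(top, key=lambda c: (adjusted_score(query, c), pair_score(query, c)))


# A wide tolerance puts both names in the top set, so list matching decides.
print(resolve_entity("Acme Corp", ["Acme Corporation", "Acme Copiers"], tolerance=1.0))
# Acme Corporation
```

The second pass only reorders candidates that the first pass already judged close. A cheap character-level measure can therefore narrow the field, and a measure that knows about naming conventions settles near-ties.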
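The location-resolution steps of claim 4 follow a similar pattern: extract, search, score candidates, and pick the best. The Python below is a minimal sketch of those steps under several assumptions. It assumes raw location text shaped like `city, ST` or `CITY ST ZIP`, and it uses a four-row in-memory list (`GEO_DB`) as a stand-in for the geographic databases. Each (city, state) candidate is scored as city-name similarity plus a fixed bonus when the state agrees. The parsing pattern, the 0.6 search threshold, and the 0.5 state bonus are illustrative choices, not values from the patent.

```python
import re
from difflib import SequenceMatcher

# Hypothetical stand-in for a geographic database of (city, state) rows.
GEO_DB = [("Austin", "TX"), ("Boston", "MA"), ("Houston", "TX"), ("Austin", "MN")]


def extract_location(raw):
    """Split text such as 'Austin, TX' or 'AUSTIN TX 78701' into (city, state)."""
    match = re.match(r"\s*([A-Za-z .'-]+?)[,\s]+([A-Za-z]{2})\b", raw)
    if not match:
        return raw.strip(), None
    return match.group(1).strip(), match.group(2).upper()


def city_similarity(a, b):
    """Case-insensitive SequenceMatcher ratio between two city names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve_location(raw, geo_db=GEO_DB, min_city_sim=0.6, state_bonus=0.5):
    """Return the best-scoring (city, state) candidate, or None if none qualify."""
    city, state = extract_location(raw)
    # "Search" the database: keep rows whose city name is loosely similar.
    candidates = [(c, s) for c, s in geo_db if city_similarity(city, c) >= min_city_sim]
    if not candidates:
        return None

    def score(candidate):
        cand_city, cand_state = candidate
        # City-name similarity, plus a bonus when the parsed state agrees.
        bonus = state_bonus if state == cand_state else 0.0
        return city_similarity(city, cand_city) + bonus

    return max(candidates, key=score)


print(resolve_location("AUSTIN TX 78701"))  # ('Austin', 'TX')
```

The state bonus is what separates two cities with identical names, such as Austin, TX and Austin, MN.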

Assignees

Jpmorgan Chase Bank Na

Classifications

  • Geographical information databases · CPC title

  • Database migration support · CPC title

  • G06F16/254 (primary) · Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Clustering or classification · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Jpmorgan Chase Bank Na
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
The patent was published on Feb 20, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.