Matching data based on numeric difference

US9229971B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9229971-B2
Application number  US-97396110-A
Country             US
Kind code           B2
Filing date         Dec 21, 2010
Priority date       Dec 21, 2010
Publication date    Jan 5, 2016
Grant date          Jan 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for matching data based on numeric difference are described herein. Input data elements are parsed to identify a first number and a second number. A difference between the first number and the second number is calculated based on a predefined formula. Based on the difference, a matching score between the input data elements is evaluated. The matching score is proportional to a base matching score corresponding to a threshold difference, and a maximum score corresponding to a match between the first number and the second number. A similarity between the input data elements is reported based on the evaluated matching score.
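The scoring flow in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation: the function names, the default values (`threshold=10.0`, `base_score=80.0`, `max_score=100.0`, `min_score=0.0`), the use of the absolute difference as the "predefined formula", and the linear interpolation between base and maximum score are all assumptions made for the example.

```python
import re


def parse_first_number(text):
    """Return the first run of digits (optionally with a decimal part) as a float.

    Non-numeric characters are treated as separators. Returns None when the
    text contains no digits. (Hypothetical helper, not named in the patent.)
    """
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def matching_score(first, second, threshold=10.0,
                   base_score=80.0, max_score=100.0, min_score=0.0):
    """Score two numbers by how close they are.

    Assumed formula: absolute difference, scaled linearly so that
    a difference of 0 gives max_score and a difference equal to the
    threshold gives base_score. Beyond the threshold, min_score is returned.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    difference = abs(first - second)
    if difference > threshold:
        return min_score
    return max_score - (max_score - base_score) * (difference / threshold)


def compare(first_text, second_text, **score_options):
    """Parse both inputs and return their matching score, or None if either has no number."""
    first = parse_first_number(first_text)
    second = parse_first_number(second_text)
    if first is None or second is None:
        return None
    return matching_score(first, second, **score_options)
```

With these assumed defaults, `compare("Invoice 1005", "INV-1000")` parses 1005 and 1000, finds a difference of 5 (half the threshold), and returns a score halfway between the base and maximum scores.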

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system for matching data based on numeric difference, the computer system comprising a processor, the processor communicating with one or more memory devices storing instructions, the instructions operable to: receive a first data element from a first data source of multiple data sources communicatively accessible at said computer system, wherein said multiple data sources include one or more data sources selected from a group consisting of a file, a database table and an electronic message; receive a second data element from a second data source of said multiple data sources; parse said first data element to identify and convert numeric characters to a first number, wherein said first number includes at least one digit; parse said second data element to identify and convert numeric characters to a second number, wherein said second number includes at least one digit; at a runtime environment of said computer system, generate a matching score based on a numeric difference between the magnitudes of said first number and said second number, wherein said numeric difference is calculated at said computer system based on metadata accessible by said runtime environment; and consolidate said first data element with said second data element into a master record when said matching score is greater than or equal to a base score, wherein said master record includes information selected from one or more of said first data element and said second data element based on one or more priorities selected from a group consisting of source, frequency, completeness and recency.

2. The system of claim 1, wherein generating said matching score further comprises: calculating said numeric difference based on a predefined proximity formula.

3. The system of claim 1 further comprising: assigning said matching score to said base score when said numeric difference equals a threshold value; assigning said matching score to a maximum score when said numeric difference is zero; and assigning said matching score to a proportional value between said base score and said maximum score when said numeric difference is between said threshold value and zero.

4. The system of claim 1, wherein generating said matching score comprises: calculating a first numeric difference between the magnitudes of said first number and a predefined master value; calculating a second difference between the magnitudes of said second number and said predefined master value; and evaluating said matching score based on said first numeric difference and said second numeric difference.

5. An article of manufacture including non-transitory computer readable storage medium to tangibly store instructions for matching data based on numeric difference, which when executed by a computer, cause the computer to: receive a first string from a first data source of multiple data sources communicatively accessible at said computer, wherein said multiple data sources include one or more data sources selected from a group consisting of a file, a database table and an electronic message; receive a second string from a second data source of said multiple data sources; parse said first string to identify and convert at least one numeric character to a first number; parse said second string to identify and convert at least one numeric character to a second number; at a runtime environment of said computer, generate a matching score based on a numeric difference between the magnitudes of said first number and said second number, wherein said numeric difference is calculated at said computer based on metadata accessible by said runtime environment; report a level of similarity between said first string and said second string based on said matching score according to a base score; and consolidate said first string and said second string into a master record when said matching score is greater than or equal to a base score, wherein said master record includes information selected from one or more of said first string and said second string based on one or more priorities selected from a group consisting of source, frequency, completeness and recency.

6. The article of manufacture of claim 5, wherein parsing said first string comprises: interpreting at least one non-numeric character of said first string as a separator between numbers.

7. The article of manufacture of claim 5, wherein parsing said first string comprises: performing quality check to determine invalid data in said first string according to predefined criteria; and upon determining, replacing an invalid numerical fraction with at least one predefined numerical character.

8. The article of manufacture of claim 5, wherein generating said matching score further comprises: calculating said numeric difference based on a formula corresponding to information type of said at least one numerical character of said first string.

9. The article of manufacture of claim 5, wherein the non-transitory computer readable storage medium tangibly stores further instructions, which when executed by the computer cause the computer to: assign said matching score to said base score when said numeric difference equals a threshold value; assign said matching score to a maximum score when said numeric difference equals or is less than a minimum value; and assign said matching score to a proportional value between said base score and said maximum score when said numeric difference is between said threshold value and said minimum value.

10. The article of manufacture of claim 5, wherein the non-transitory computer readable storage medium tangibly stores further instructions, which when executed by the computer cause the computer to: assign said matching score to a minimum score when said numeric difference is greater than a threshold value.

11. The article of manufacture of claim 5, wherein generating said matching score comprises: calculating a first numeric difference between the magnitudes of said first number and a predefined master value; calculating a second numeric difference between the magnitudes of said second number and said predefined master value; and evaluating said matching score based on said first numeric difference and said second numeric difference.

12. The article of manufacture of claim 5, wherein reporting the level of similarity between said first string and said second string comprises: displaying a message indicating duplicate data when said matching score is greater than or equal to said base score.

13. A computer implemented method for matching data based on numeric difference comprising: loading a first string in a computer memory from a first data source of multiple data sources, wherein said multiple data sources include one or more data sources selected from a group consisting of a file, a database table and an electronic message; loading a second string in said computer memory from a second data source of said multiple data sources; parsing said first string to identify and convert numeric characters to a first number; parsing said second string to identify and convert numeric characters to a second number; generating by a processor coupled to said memory a matching score based on a numeric difference between the magnitudes of said first number and said second number, wherein said numeric difference is calculated at said computer based on metadata accessible by said processor; indicating a level of correspondence between said first string and said second string based on said matching score according to a base score; and consolidating said first string and said second string int…
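The consolidation step recited in the independent claims (merging two matched elements into a master record when the score reaches the base score, choosing values by priorities such as completeness and recency) might look roughly like the sketch below. It is a hypothetical illustration: the record layout (`"updated"`, `"fields"`), the function names, the default `base_score=80.0`, and the choice to apply only the recency and completeness priorities are assumptions, since the claims do not specify a data format or how the priorities are combined.

```python
from datetime import date


def _is_filled(value):
    """Treat None and the empty string as missing values (assumed convention)."""
    return value is not None and value != ""


def consolidate(first, second, score, base_score=80.0):
    """Merge two records into a master record when score >= base_score.

    Each record is assumed to be a dict with an "updated" date and a
    "fields" dict. Returns None when the score is below the base score.
    """
    if score < base_score:
        return None

    # Recency priority: the most recently updated record is consulted first.
    newer, older = sorted((first, second),
                          key=lambda record: record["updated"],
                          reverse=True)

    master = {}
    for field in newer["fields"].keys() | older["fields"].keys():
        candidate = newer["fields"].get(field)
        # Completeness priority: fall back to the older record
        # when the newer one has no value for this field.
        if not _is_filled(candidate):
            candidate = older["fields"].get(field)
        master[field] = candidate
    return master


record_a = {"updated": date(2015, 1, 1),
            "fields": {"name": "Acme Corp", "phone": "", "zip": "94301"}}
record_b = {"updated": date(2014, 6, 1),
            "fields": {"name": "ACME", "phone": "555-0100"}}

master_record = consolidate(record_a, record_b, score=90.0)
```

In this example the newer record supplies `name` and `zip`, while the missing `phone` is filled from the older record. The source and frequency priorities named in the claims are not modeled here.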

Assignees

Inventors

Classifications

  • Ensuring data consistency and integrity · CPC title

  • Redundancy elimination performed by the file system (error detection or correction of the data by redundancy in operations G06F11/14) · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9229971B2 cover?
Systems and methods for matching data based on numeric difference are described herein. Input data elements are parsed to identify a first number and a second number. A difference between the first number and the second number is calculated based on a predefined formula. Based on the difference, a matching score between the input data elements is evaluated. The matching score is proportional to…
Who is the assignee on this patent?
Woody Jeffrey, Gujjewar Abhiram, Spiess Mark, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/2365. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 5, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).