Just-in-Time Data Quality Assessment for Best Record Creation
US-2015006491-A1 · Jan 1, 2015 · US
US9779146B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9779146-B2 |
| Application number | US-201414175161-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 7, 2014 |
| Priority date | Feb 7, 2014 |
| Publication date | Oct 3, 2017 |
| Grant date | Oct 3, 2017 |
Official abstract text for this publication.
The subject matter disclosed herein provides methods for identifying duplicate data records using a graphical user interface. One or more data records may be accessed from one or more source files. The data records may have one or more data fields associated with one or more data types. One or more match themes may be proposed based on the data types. The match themes may have one or more rules for identifying duplicate data records. A selection of a match theme and at least one rule associated with the selected match theme may be received. The data records may be processed using the selected match theme and rules to identify the duplicate data records. A graphical user interface previewing the duplicate data records may be displayed. The duplicate data records may be organized into match groups. Related apparatus, systems, techniques, and articles are also described.
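The flow described in the abstract (inspect data types, propose match themes, apply selected rules, collect match groups) could be sketched roughly as follows. This is only an illustration under stated assumptions: the theme names, the `MATCH_THEMES` table, the field names, and the exact-match-after-normalization rule are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical themes: each lists the data types it needs (illustrative only).
MATCH_THEMES = {
    "person": {"name", "email"},
    "household": {"name", "address"},
}

def propose_themes(field_types):
    """Return the themes whose required data types all occur in the records."""
    available = set(field_types.values())
    return sorted(t for t, needed in MATCH_THEMES.items() if needed <= available)

def find_match_groups(records, field_types, rule_types):
    """Group records whose normalized values agree on every selected data type."""
    fields = [f for f, t in field_types.items() if t in rule_types]
    buckets = defaultdict(list)
    for rec in records:
        # Assumed rule: compare trimmed, lowercased values for equality.
        key = tuple(rec[f].strip().lower() for f in fields)
        buckets[key].append(rec["id"])
    # Only buckets with two or more records count as match groups.
    return [ids for ids in buckets.values() if len(ids) > 1]

# Hypothetical source data: field name -> data type, plus three records.
field_types = {"full_name": "name", "mail": "email"}
records = [
    {"id": 1, "full_name": "Ann Lee", "mail": "ann@example.com"},
    {"id": 2, "full_name": "ann lee ", "mail": "ANN@example.com"},
    {"id": 3, "full_name": "Bob Ray", "mail": "bob@example.com"},
]

themes = propose_themes(field_types)          # only "person" is fully covered
groups = find_match_groups(records, field_types, MATCH_THEMES[themes[0]])
print(themes, groups)
```

Here the "household" theme is not proposed because no field carries an address data type, and records 1 and 2 fall into one match group once case and whitespace are normalized.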
Opening claim text (preview).
What is claimed is:

1. A method comprising: accessing, by at least one processor, one or more data records from one or more source files, the one or more data records having one or more data fields associated with one or more data types; determining, based on the one or more data types of the one or more data records, one or more match themes; generating, by the at least one processor, a graphical user interface for presentation at a display, the graphical user interface being configured to enable selection of the determined one or more match themes; receiving, by the at least one processor, a selection of a match theme being presented via the graphical user interface, wherein the generated graphical user interface includes, in response to the received selection of the match theme, one or more match rules for the selected match theme, the one or more match rules enabling identification of one or more duplicate data records from the accessed one or more data records; receiving, by the at least one processor, a selection of at least one match rule being presented in response to the received selection of the match theme and presented via the graphical user interface; processing, by the at least one processor, the one or more data records using the selected match theme and the selected at least one match rule to identify the one or more duplicate data records; and displaying a preview graphical user interface that provides a preview of the one or more duplicate data records, the one or more duplicate data records organized into a match group, wherein the match group includes matching records selected based on the match theme and the selected at least one match rule, the match group including the one or more duplicate data records further including a near matching record representing a possible duplicate record, wherein the one or more duplicate data records including the near matching record being previewed is adjusted in response to a selection at a graphical slider, wherein the adjustment of the graphical slider adjusts a match strictness of the selected at least one match rule such that moving the graphical slider in a first direction tightens one or more options associated with matching in accordance with the selected at least one match rule and removes at least the near matching record from the match group being displayed via the preview graphical user interface, wherein moving the graphical slider in a second, opposite direction loosens the one or more options associated with matching in accordance with the selected at least one match rule and adds at least the near matching record from the match group being displayed via the preview graphical user interface, and wherein the preview graphical user interface further displays a message identifying a status of a removed duplicate data record, the status comprising one or more of a transfer of the removed duplicate data record to a new match group and/or a failure of the removed duplicate record to match with any of the one or more match groups.

2. The method of claim 1, wherein the preview graphical user interface further displays one or more change indicators identifying a change to one or more duplicate data records in the at least one match group, and wherein the change to the one or more duplicate data records comprise one or more of a removal of a duplicate data record from the at least one match group and an addition of a new duplicate data record to the at least one match group.

3. The method of claim 2, wherein the preview graphical user interface further displays the message to identify the change in a floating window in response to hovering over the change indicator.

4. The method of claim 1 further comprising displaying one or more statistics relating to the one or more data records, the one or more statistics including one or more of the following: a first quantity representing a number of the match groups after the processing, a second quantity representing a number of changes to the match groups after the processing, and a list of changes to the match groups based on the processing.

5. The method of claim 1, wherein the preview graphical user interface further displays a list of one or more advanced matching options defining additional match conditions, the one or more advanced matching options based on the one or more data.

6. The method of claim 1, wherein the match groups comprise two or more data records satisfying one or more of the following conditions: the two or more data records have identical data values in each of the data types associated with the selected match theme and the at least one rule; the two or more data records are a near match; the two or more data records are a suspect match; and the two or more data records are a conflicting match.

7. The method of claim 6, wherein the preview graphical user interface further displays one or more review indicators associated with the one or more match groups, the one or more review indicators flagging the near match, the suspect match, or the conflicting match.

8. A non-transitory computer-readable medium containing instructions to configure a processor to perform operations comprising: accessing, by at least one processor, one or more data records from one or more source files, the one or more data records having one or more data fields associated with one or more data types; determining, based on the one or more data types of the one or more data records, one or more match themes; generating, by the at least one processor, a graphical user interface for presentation at a display, the graphical user interface being configured to enable selection of the determined one or more match themes; receiving, by the at least one processor, a selection of a match theme being presented via the graphical user interface, wherein the generated graphical user interface includes, in response to the received selection of the match theme, one or more match rules for the selected match theme, the one or more match rules enabling identification of one or more duplicate data records from the accessed one or more data records; receiving, by the at least one processor, a selection of at least one match rule being presented in response to the received selection of the match theme and presented via the graphical user interface; processing, by the at least one processor, the one or more data records using the selected match theme and the selected at least one match rule to identify the one or more duplicate data records; and displaying a preview graphical user interface that provides a preview of the one or more duplicate data records, the one or more duplicate data records organized into a match group, wherein the match group includes matching records selected based on the match theme and the selected at least one match rule, the match group including the one or more duplicate data records further including a near matching record representing a possible duplicate record, wherein the one or more duplicate data records including the near matching record being previewed is adjusted in response to a selection at a graphical slider, wherein the adjustment of the graphical slider adjusts a match strictness of the selected at least one match rule such that moving the graphical slider in a first direction tightens one or more options associated with matching in accordance with the selected at least one match rule and removes at least the near matching record from the match group being displayed via the preview graphical user interface, wherein moving the graphical slider in a second, opposite direction loosens the one or more options associated with matching in accordance with the selected at least one match rule and adds at least the nea
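The slider behavior recited in claims 1 and 4 (tightening removes near matches, and each removed record either lands in a new match group or matches nothing) might look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the claims do not specify a similarity measure, so `difflib.SequenceMatcher` stands in for one; the 0 to 100 slider range, its mapping to a threshold, the greedy anchor-based grouping, and the sample names are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def threshold_for(slider):
    """Map an assumed slider position 0..100 to a threshold from 0.7 to 1.0."""
    return 1.0 - 0.3 * (100 - slider) / 100

def match_groups(records, threshold):
    """Greedy grouping: a record joins the first group whose anchor it resembles."""
    groups = []  # each group is a list of record ids; the first id is the anchor
    for rid, name in records.items():
        for group in groups:
            if similarity(name, records[group[0]]) >= threshold:
                group.append(rid)
                break
        else:
            groups.append([rid])
    return groups

def removal_status(before, after):
    """Describe what happened to each record that left its match group."""
    group_of = {rid: g for g in after for rid in g}
    status = {}
    for old in before:
        if len(old) < 2:
            continue  # singletons were never in a match group
        for rid in old:
            new = group_of[rid]
            if new[0] == old[0]:
                continue  # still in the group anchored by the same record
            status[rid] = ("moved to a new match group" if len(new) > 1
                           else "no longer matches any group")
    return status

# Hypothetical records: id -> name.
records = {1: "John Smith", 2: "john smith", 3: "Jon Smith",
           4: "Jon Smith", 5: "Jonn Smith", 6: "Mary Jones"}

loose = match_groups(records, threshold_for(0))    # slider at the loose end
tight = match_groups(records, threshold_for(100))  # slider at the strict end
status = removal_status(loose, tight)

# Statistics in the spirit of claim 4: group count and number of changes.
group_count = sum(1 for g in tight if len(g) > 1)
change_count = len(status)
print(tight, status, group_count, change_count)
```

With the slider at the loose end, records 1 through 5 share one group. At the strict end, records 3 and 4 form a new match group and record 5 matches nothing, which corresponds to the two statuses named in the claim.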
Physics · mapped topic
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Querying · CPC title