Resolving pairwise links to groups

US9298753B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9298753-B2
Application numberUS-201313907723-A
CountryUS
Kind codeB2
Filing dateMay 31, 2013
Priority dateMay 31, 2013
Publication dateMar 29, 2016
Grant dateMar 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for resolving pairwise links to groups. Embodiments of the invention use an iterative algorithm to transform a collection of pairwise links to groups of records that correspond to the same entity. The algorithm can be stopped after any number of iterations for an increasing accurate approximation result. The algorithm essentially guarantees a correct solution for groups of size up to the number of iterations. This algorithm scales linearly on the size of the record set, with little impact from the number of links.

First claim

Opening claim text (preview).

What is claimed: 1. A method for resolving a set of pairwise links to groups, the method comprising for one or more iterations over a plurality of records until a stop condition is satisfied, each iteration including: accessing a plurality of neighborhood data structures that are active, each neighborhood data structure of the plurality of neighborhood data structures corresponding to a different specified record from among the plurality of records, each neighborhood data structure of the plurality of neighborhood data structures including a record identifier as a label, the record identifier corresponding to a specified record included in the plurality of records, each neighborhood data structure of the plurality of neighborhood data structures also including an activity state indicator and a neighborhood, the activity state indicator indicating whether or not the neighborhood data structure is active, the neighborhood including a list of record identifiers for other records expressly linked to the specified record by a pairwise link, the other records also included in the plurality of records; for each neighborhood data structure of the plurality of neighborhood data structures, creating a message for each of the other records expressly linked to the specified record, the message including the label and the activity state indicator for the neighborhood data structure; subsequent to creating the messages, grouping the plurality of neighborhood data structures and the messages by the record identifiers such that each group includes one neighborhood data structure of the plurality of neighborhood data structures and one or more messages; for each grouping of the plurality of neighborhood data structures and the one or more messages, setting the label for the neighborhood data structures in the group to a minimum record identifier included in the group of the plurality of neighborhood data structures and the one or more messages; and when the minimum record identifier for a group of the plurality of neighborhood data structures and the one or more messages is the record identifier for the specified record of a particular neighborhood data structure, removing the particular neighborhood data structure from the plurality of neighborhood data structures that are active by setting the activity state indicator to indicate that the particular neighborhood data structure is not active. 2. The method as recited in claim 1 , further comprising subsequent to the stop condition being satisfied, for each neighborhood data structure of the plurality of neighborhood data structures: writing out the label. 3. The method of claim 1 , wherein the minimum record identifier is based on a lexicographical order associated with the record identifiers. 4. The method of claim 1 , wherein the stop condition indicates a specified maximum number of the one or more iterations to perform. 5. The method of claim 1 , wherein the stop condition is based on detecting that no neighborhood data structures of the plurality of neighborhood data structures are active. 6. The method claim 1 , wherein one or more of the plurality of records is associated with a retail purchase transaction. 7. A computer program product for implementing a method for resolving a set of pairwise links to groups, the computer program product comprising one or more computer storage devices having stored thereon computer-executable instructions that, when executed at a processor, perform the method including one or more iterations over a plurality of records until a stop condition is satisfied, each iteration including: access a plurality of neighborhood data structures that are active, each neighborhood data structure of the plurality of neighborhood data structures corresponding to a different specified record from among the plurality of records, each neighborhood data structure of the plurality of neighborhood data structures including a record identifier as a label, the record identifier corresponding to a specified record included in the plurality of records, each neighborhood data structure of the plurality of neighborhood data structures also including an activity state indicator and a neighborhood, the activity state indicator indicating whether or not the neighborhood data structure is active, the neighborhood including a list of record identifiers for other records expressly linked to the specified record by a pairwise link, the other records also included in the plurality of records; for each neighborhood data structure of the plurality of neighborhood data structures, create a message for each of the other records expressly linked to the specified record, the message including the label and the activity state indicator for the neighborhood data structure; subsequent to creating the messages, group the plurality of neighborhood data structures and the messages by the record identifiers such that each group includes one neighborhood data structure of the plurality of neighborhood data structures and one or more messages; for each grouping of the plurality of neighborhood data structures and the one or more messages, set the label for the neighborhood data structures in the group to a minimum record identifier included in the group of the plurality of neighborhood data structures and the one or more messages; and when the minimum record identifier for a group of the plurality of neighborhood data structures and the one or more messages is the record identifier for the specified record of a particular neighborhood data structures, remove the particular neighborhood data structure from the plurality of neighborhood data structures that are active by setting the activity state indicator to indicate that the particular neighborhood data structure is not active. 8. The computer program product of claim 7 , further comprising computer-executable instructions that, when executed subsequent to the stop condition being satisfied, for each neighborhood data structure of the plurality of neighborhood data structures, perform the method of: write out the label. 9. The computer program product of claim 7 , wherein the minimum record identifier is based on a lexicographical order associated with the record identifiers. 10. The computer program product of claim 7 , wherein the stop condition indicates a specified maximum number of the one or more iterations to perform. 11. The computer program product of claim 7 , wherein the stop condition is based on detecting that no neighborhood data structures of the plurality of neighborhood data structures are active. 12. At a computer system, the computer system including one or more processors, a system memory, and one or more local storage devices, the computer system communicatively coupled to a cluster of servers, a plurality of pairwise links of records being stored across the cluster of servers, each pairwise link of records of the plurality of pairwise links of records including a pair of records that correspond to a same entity, each record of the pair of records being identified by an assigned identifier, the system memory being insufficient to store an entirety of the plurality of pairwise links of records, a method for resolving the plurality of pairwise links of records into one or more groups of the records corresponding to the same entity, the method comprising: a neighborhood creation phase, including for each of the records: creating a neighborhood data structure for the record, the neighborhood data structure comprising a list of neighbors, the neighbors being other records linked to the record in at least one pairwise link of the plurality of pairwise links of records, the other records included i

Assignees

Inventors

Classifications

  • G06Q40/12Primary

    Accounting · CPC title

  • Geographical information databases · CPC title

  • Clustering or classification · CPC title

  • Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9298753B2 cover?
The present invention extends to methods, systems, and computer program products for resolving pairwise links to groups. Embodiments of the invention use an iterative algorithm to transform a collection of pairwise links to groups of records that correspond to the same entity. The algorithm can be stopped after any number of iterations for an increasing accurate approximation result. The algori…
Who is the assignee on this patent?
Wal Mart Stores Inc
What technology area does this patent fall under?
Primary CPC classification G06Q40/12. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 29 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).