Automatic discovery of related data records

US12111870B2 · US · B2

Patent metadata
Publication number: US-12111870-B2
Application number: US-202117213946-A
Country: US
Kind code: B2
Filing date: Mar 26, 2021
Priority date: Mar 26, 2021
Publication date: Oct 8, 2024
Grant date: Oct 8, 2024

Abstract

Techniques are provided for automatic discovery of data records. One method comprises obtaining data records each corresponding to a different item and comprising features extracted from a data source, wherein the data records identify related items identified using a collaborative filter that relates items based on user preferences; generating an item network comprising multiple nodes each corresponding to a different item, where two nodes are connected by an edge based on: (i) an item type of the two nodes, (ii) a ratio of numerical values associated with the two nodes, and/or (iii) a pairwise configuration similarity score for the two nodes; clustering the nodes into node clusters based on topological properties of the item network; and identifying items related to a given item that (i) share an edge with the given item and (ii) are in a node cluster comprising a node of the given item.
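
The abstract does not say how the collaborative filter relates items. Claims 1 and 4 describe it only as returning items "obtained or researched" by users who also obtained or researched the given item. A minimal sketch, assuming plain item-to-item co-occurrence counting as that filter; the function name, the `histories` structure, and `top_k` are hypothetical and not taken from the patent.

```python
from collections import Counter, defaultdict


def co_occurrence_related(histories, top_k=3):
    """Hypothetical item-to-item collaborative filter based on co-occurrence.

    histories maps a user id to the items that user obtained or researched.
    For each item, return up to top_k other items most often obtained or
    researched by the same users.
    """
    counts = defaultdict(Counter)
    for items in histories.values():
        unique = sorted(set(items))  # count each user at most once per item pair
        for a in unique:
            for b in unique:
                if a != b:
                    counts[a][b] += 1
    # most_common sorts by count; ties keep first-encountered order
    return {item: [other for other, _ in c.most_common(top_k)]
            for item, c in counts.items()}


histories = {
    "user-1": ["laptop-a", "dock-x", "monitor-m"],
    "user-2": ["laptop-a", "dock-x"],
    "user-3": ["laptop-b", "monitor-m"],
}
related = co_occurrence_related(histories, top_k=2)
print(related["laptop-a"])  # ['dock-x', 'monitor-m']
```

Here `dock-x` ranks first for `laptop-a` because two users obtained both. A production filter would more likely normalise these counts (for example by item popularity), which this sketch does not attempt.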

Claims

Claim text (preview, truncated partway through claim 13).

What is claimed is:

1. A method, comprising: obtaining a plurality of data records, wherein each data record corresponds to a different one of a plurality of items and comprises a plurality of features extracted from at least one data source, wherein at least one data record associated with a first item identifies at least one related item that is related to the first item, wherein the at least one related item is identified in the plurality of data records using a collaborative filter that relates at least some of the items of the plurality of items based at least in part on preferences of a plurality of users, and wherein the collaborative filter identifies, for a given item, one or more additional items obtained or researched by one or more users that also obtained or researched, respectively, the given item; generating, using the plurality of data records, an item network comprising a plurality of nodes, wherein each node in the item network corresponds to a different one of the plurality of items, wherein two nodes in the item network are selectively connected by an edge in response to an evaluation of: (i) an item type of the items associated with the two nodes, (ii) a ratio of numerical values associated with the two nodes, and (iii) a pairwise configuration similarity score for the two nodes, wherein the pairwise configuration similarity score for the two nodes is based at least in part on a similarity analysis of a textual description of a configuration of each of the items associated with the two nodes, extracted from the at least one data source, for each of the two nodes, wherein the two nodes in the item network are selectively connected by the edge in response to the evaluation determining that: (i) the respective item types of the items associated with the two nodes satisfy one or more similarity criteria, (ii) the ratio of the numerical values associated with the two nodes satisfies a first designated threshold, and (iii) the pairwise configuration similarity score for the two nodes satisfies a second designated threshold, wherein the first designated threshold and the second designated threshold are distinct and wherein the ratio of the numerical values is distinct from the pairwise configuration similarity score; clustering the plurality of nodes in the item network into a plurality of node clusters based at least in part on an analysis of one or more topological properties of the item network; identifying one or more items related to a given item by querying the item network to return the one or more identified related items having a corresponding node in the item network that (i) shares an edge with a node in the item network corresponding to the given item and (ii) are in at least one node cluster comprising a node corresponding to the given item; and initiating an automated processing of at least a given one of the plurality of data records associated with the given item using at least some of the identified one or more items related to the given item; wherein the method is performed by at least one processing device comprising a processor coupled to a memory.

2. The method of claim 1, wherein the plurality of items comprises a plurality of products and wherein the features extracted from the at least one data source comprise one or more of a product type, a product name, a product price, a product configuration and a product family.

3. The method of claim 1, wherein the plurality of items comprises a plurality of products and wherein the plurality of features is extracted from the at least one data source for one or more additional products provided by competitors of a provider of a given product.

4. The method of claim 1, wherein the plurality of items comprises a plurality of products and wherein the collaborative filter identifies, for a given product, one or more additional products purchased or researched by customers that also purchased or researched, respectively, the given product.

5. The method of claim 1, wherein the plurality of items comprises a plurality of products and wherein the two nodes in the item network are connected by the edge in response to the two corresponding products having a same product type and having a price ratio that satisfies one or more pricing criteria.

6. The method of claim 1, further comprising adding one or more edges to the item network using a prediction model trained using one or more features of the item network extracted from the item network, wherein the trained prediction model identifies topological link patterns in the item network to predict at least one missing edge to add to the item network, wherein a weight of the at least one added edge is based at least in part on the pairwise configuration similarity score for the two nodes connected by the at least one added edge.

7. The method of claim 1, wherein the nodes in a given cluster are more closely related to the nodes in the given cluster than to the nodes in other clusters.

8. The method of claim 1, wherein the similarity analysis of the textual description of the configuration of each of the items associated with the two nodes comprises one or more of determining a Jaccard similarity and determining a cosine similarity of the configuration of each of the items associated with the two nodes.

9. The method of claim 1, wherein the plurality of items comprises a plurality of products and wherein the identifying one or more items related to the given item comprises identifying, for a given product, one or more additional products that: (i) are associated with nodes in the item network that share an edge with the node associated with the given product and (ii) are found in the same cluster as the given product.

10. The method of claim 1, further comprising querying the item network for a particular item of interest to a particular organization, wherein the query returns one or more items that (i) share an edge with a node in the item network corresponding to the particular item, wherein the one or more items that share the edge with the node corresponding to the particular item comprise items competing with the particular item of interest and (ii) are in at least one node cluster comprising a node corresponding to the particular item, wherein the one or more items in the at least one node cluster comprise similar items competing with the particular item of interest.

11. The method of claim 10, further comprising identifying whether the one or more items returned by the query are provided by one or more of the particular organization and a different organization.

12. The method of claim 1, wherein the evaluation of the pairwise configuration similarity score for the two nodes is performed in response to the evaluation determining that the respective item types of the items associated with the two nodes satisfy the one or more similarity criteria and the ratio of the numerical values associated with the two nodes satisfies the first designated threshold.

13. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured to implement the following steps: obtaining a plurality of data records, wherein each data record corresponds to a different one of a plurality of items and comprises a plurality of features extracted from at least one data source, wherein at least one data record associated with a first item identifies at least one related item that is related to the first item, wherein the at least one related item is identified in the plurality of data records using a collaborative filter that relates at least some of the items of the plurality of ite
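
Claims 1, 5, 8, 9 and 12 together describe the edge rule and the "neighbour and same cluster" query. A minimal sketch under these assumptions: items are hypothetical product dicts with `type`, `price` and `config` fields; the numerical value is the price (claim 5); the ratio test is larger price over smaller price against `max_price_ratio`; the configuration similarity is token-level Jaccard, with cosine as the alternative named in claim 8. The claims do not name a clustering algorithm, so `cluster_by_triangles` (connected components after dropping edges that lie on no triangle) is a deliberately simple stand-in for "topological properties", not the patented method. All function names and threshold values are illustrative.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lower-cased alphanumeric tokens of a configuration description."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count * cb[t] for t, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def build_item_network(items, max_price_ratio=1.25, min_similarity=0.5, similarity=jaccard):
    """Adjacency sets; an edge needs (i) the same type, (ii) a price ratio
    within max_price_ratio and (iii) configuration similarity >= min_similarity."""
    edges = {item_id: set() for item_id in items}
    for a, b in combinations(items, 2):
        ia, ib = items[a], items[b]
        if ia["type"] != ib["type"]:
            continue
        low, high = sorted((ia["price"], ib["price"]))
        if low <= 0 or high / low > max_price_ratio:
            continue
        # Similarity is computed only after (i) and (ii) pass, as in claim 12.
        if similarity(ia["config"], ib["config"]) >= min_similarity:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def cluster_by_triangles(edges):
    """Stand-in clustering: connected components over edges whose endpoints
    share at least one common neighbour (i.e. edges lying on a triangle)."""
    strong = {n: {m for m in nbrs if edges[n] & edges[m]} for n, nbrs in edges.items()}
    labels, next_label = {}, 0
    for start in sorted(edges):
        if start in labels:
            continue
        labels[start], stack = next_label, [start]
        while stack:
            for nbr in strong[stack.pop()]:
                if nbr not in labels:
                    labels[nbr] = next_label
                    stack.append(nbr)
        next_label += 1
    return labels


def related_items(edges, labels, item_id):
    """Neighbours of item_id that are also in item_id's cluster."""
    return sorted(n for n in edges[item_id] if labels[n] == labels[item_id])


items = {
    "a1": {"type": "laptop", "price": 1000, "config": "14 inch 16gb ram 512gb ssd"},
    "a2": {"type": "laptop", "price": 1050, "config": "14 inch 16gb ram 1tb ssd"},
    "a3": {"type": "laptop", "price": 1200, "config": "14 inch 32gb ram 1tb ssd"},
    "x1": {"type": "laptop", "price": 1400, "config": "16 inch 32gb ram 1tb ssd"},
    "x2": {"type": "laptop", "price": 1600, "config": "16 inch 32gb ram 2tb ssd"},
    "x3": {"type": "laptop", "price": 1550, "config": "16 inch 64gb ram 2tb ssd"},
    "c1": {"type": "laptop", "price": 1000, "config": "11 inch 4gb ram emmc"},
    "m1": {"type": "monitor", "price": 300, "config": "27 inch 4k ips"},
}
edges = build_item_network(items)
labels = cluster_by_triangles(edges)
print(related_items(edges, labels, "a3"))  # ['a1', 'a2']
```

In this example `m1` fails the type test and `c1` fails the similarity test, so both stay isolated. `x1` does share an edge with `a3`, but that single bridging edge closes no triangle, so `x1` lands in a different cluster and the query drops it, which shows how the cluster condition narrows the plain neighbour set. Passing `similarity=cosine` swaps in the other measure named in claim 8; claim 6's link prediction step is not modelled here.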

Assignees

EMC IP Holding Co LLC

Classifications

  • G06F16/906 (primary): Clustering; Classification
  • H04L45/02 (primary): Topology update or discovery
  • Filtering based on additional data, e.g. user or group profiles
  • Presentation of query results
  • Graphs; Linked lists (G06F16/9027 takes precedence)

Frequently asked questions

Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/906. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 8, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).