Efficient generation of embedding vectors of nodes in a corpus graph

US11227012B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11227012-B2
Application numberUS-201916273860-A
CountryUS
Kind codeB2
Filing dateFeb 12, 2019
Priority dateMar 13, 2018
Publication dateJan 18, 2022
Grant dateJan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating embeddings for nodes of a corpus graph are presented. More particularly, operations for generation of an aggregated embedding vector for a target node is efficiently divided among operations on a central processing unit and operations on a graphic processing unit. With regard to a target node within a corpus graph, processing by one or more central processing units (CPUs) is conducted to identify the target node's relevant neighborhood (of nodes) within the corpus graph. This information is prepared and passed to one or more graphic processing units (GPUs) that determines the aggregated embedding vector for the target node according to data of the relevant neighborhood of the target node.

First claim

Opening claim text (preview).

What is claimed: 1. A non-transitory computer-readable medium bearing computer-executable instructions which, when executed on a computing system comprising at least a processor retrieving instructions from the medium, carry out a method on the computing system comprising: accessing a corpus graph comprising a plurality of nodes from a data store maintained by the computing system; executing, on the computing system, a framework to process at least a first node of the plurality of nodes, comprising: executing, on the computing system, at least a first mapper function of a plurality of mapper functions of the framework with regard to the first node, the execution of the first mapper function comprising: determining a first relevant neighborhood of nodes corresponding to the first node, the first relevant neighborhood of nodes comprising a first set of nodes of the plurality of nodes, and associating data identifying the first relevant neighborhood of nodes in the data store with the first node; generating first neighborhood information corresponding to the first node according to the nodes of the first set of nodes; and generating a first entry in a queue maintained on the computing system, the first entry including an embedding vector associated with the first node and the first neighborhood information; executing, on the computing system, at least a first reducer function of a plurality of reducer functions with regard to the first entry in the queue, the execution of the first reducer function comprising: accessing the embedding vector of the first node and the first neighborhood information in the first entry; combining the embedding vector of the first node with the first neighborhood information to generate a first aggregated embedding vector corresponding to the first node; and returning the first aggregated embedding vector to the framework; and storing the first aggregated embedding vector in the data store in association with the first node; and in response to determining that a query corresponds to the first node, providing at least one item of content corresponding to at least one node as responsive to the query, the at least one node determined from the corpus graph based at least in part on the first aggregated embedding vector. 2. The non-transitory computer-readable medium of claim 1 , wherein the first mapper function is executed under the control of the framework on a central processing unit (CPU), and the first reducer function is executed asynchronously under the control of the framework on a graphics processing unit (GPU). 3. The non-transitory computer-readable medium of claim 1 , wherein the first neighborhood information is generated from an aggregated embedding vector of each respective node of the first set of nodes. 4. The non-transitory computer-readable medium of claim 1 , wherein the execution of the first mapper function is asynchronous to the execution of the first reducer function. 5. The non-transitory computer-readable medium of claim 1 , wherein the first relevant neighborhood of nodes of the first node is determined according to a Random Walk process among the plurality of nodes originating with the first node. 6. The non-transitory computer-readable medium of claim 5 , wherein the first neighborhood information is generated according to stacking levels corresponding to the nodes of the first set of nodes. 7. The non-transitory computer-readable medium of claim 6 , wherein the first neighborhood information is further generated according to an importance of each node of the first set of nodes to the first node. 8. A computer-implemented method to generate aggregated embedding vectors for nodes of a corpus graph, the method comprising: under the control of one or more processors executing on a computer system: maintaining, in a data store, the corpus graph comprising a plurality of nodes; generating an embedding vector, through the execution of a trained machine learning model, for each node of the plurality of nodes, and storing each embedding vector in the data store in association with the corresponding node; executing, on the computer system, a processing framework to generate, for at least a first node of the plurality of nodes, a first aggregated embedding vector, the execution comprising: executing a first producer task of a plurality of producer tasks of the processing framework with regard to the first node of the plurality of nodes, comprising: determining a first relevant neighborhood corresponding to the first node and associating data identifying the first relevant neighborhood with the first node in the data store, the first relevant neighborhood comprising a first set of nodes of the plurality of nodes; generating first neighborhood information corresponding to the first node according to the nodes of the first set of nodes; and adding the embedding vector of the first node and the first neighborhood information as a first entry in a queue maintained by the processing framework on the computer system; executing a first consumer task of a plurality of consumer tasks of the processing framework with regard to the first entry in the queue, comprising: accessing the embedding vector of the first node and the first neighborhood information from the first entry in the queue; combining the embedding vector of the first node and the first neighborhood information to generate a first aggregated embedding vector for the first node; returning the first aggregated embedding vector to the processing framework; and storing the first aggregated embedding vector in association with the first node in the data store; and in response to determining that a query corresponds to the first node, providing at least one item of content corresponding to at least one node as responsive to the query, the at least one node determined from the corpus graph based at least in part on the first aggregated embedding vector. 9. The computer-implemented method of claim 8 , wherein the first neighborhood information is generated from an aggregated embedding vector of each respective node of the first set of nodes. 10. The computer-implemented method of claim 8 , wherein each aggregated embedding vector of each respective node of the first set of nodes is generated from neighborhood information of a relevant neighborhood of nodes corresponding to each respective node. 11. The computer-implemented method of claim 8 , wherein the first producer task executes asynchronously to the execution of the first consumer task. 12. The computer-implemented method of claim 8 , wherein the first relevant neighborhood is determined according to a Random Walk process among the plurality of nodes originating with the first node. 13. The computer-implemented method of claim 12 , wherein the first neighborhood information is generated according to stacking levels corresponding to the nodes of the first set of nodes. 14. The computer-implemented method of claim 13 , wherein the first neighborhood information is further generated according to an importance of each node of the first set of nodes to the first node. 15. A computer system comprising at least a processor and a memory, wherein the processor, in executing computer-executable instructions retrieved from the memory, configures the computer system to, at least: maintain, in a data store, a corpus graph comprising a plurality of nodes; generate, through execution of one or more processes on the computer system, an embedding vector for each node of the plurality of nodes, and store the embedding vectors in the data store in association with

Assignees

Inventors

Classifications

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • based on distances to training or reference patterns · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Selection of the most significant subset of features · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11227012B2 cover?
Systems and methods for generating embeddings for nodes of a corpus graph are presented. More particularly, operations for generation of an aggregated embedding vector for a target node is efficiently divided among operations on a central processing unit and operations on a graphic processing unit. With regard to a target node within a corpus graph, processing by one or more central processing …
Who is the assignee on this patent?
Pinterest Inc, Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 18 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).