What technology area does this patent fall under?

Primary CPC classification G06F16/9024. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 26, 2017 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Scalable graph propagation for knowledge expansion

US9852231B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9852231-B1
Application number	US-201414531102-A
Country	US
Kind code	B1
Filing date	Nov 3, 2014
Priority date	Nov 3, 2014
Publication date	Dec 26, 2017
Grant date	Dec 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for adding labels to a graph are disclosed. One system includes a plurality of computing devices including processors and memory storing an input graph generated based on a source data set, where an edge represents a similarity measure between two nodes in the input graph, the input graph being distributed across the plurality of computing devices, and some of the nodes are seed nodes associated with one or more training labels from a set of labels, each training label having an associated original weight. The memory may also store instructions that, when executed by the processors, cause the plurality of distributed computing devices to propagate the training labels through the input graph using a sparsity approximation for label propagation, resulting in learned weights for respective node and label pairs, and automatically update the source data set using node and label pairs selected based on the learned weights.
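
The propagate-then-select loop in the abstract can be sketched on a single machine. This is a minimal sketch under stated assumptions, not the patented implementation. The graph is an in-memory dict rather than being distributed across devices. Each node keeps an exact top-k of its label weights, which stands in for the probability-based sparsity approximation. The mixing weights `mu_seed`, `mu_nbr` and `mu_prior` and the helpers `propagate_labels` and `select_updates` are hypothetical. The seed, neighbor and uniform terms divided by a normalizer loosely follow the structure described in claim 8.

```python
import heapq
from collections import defaultdict
from typing import Dict

Graph = Dict[str, Dict[str, float]]   # node -> {neighbor: edge similarity}
Labels = Dict[str, Dict[str, float]]  # node -> {label: weight}


def propagate_labels(graph: Graph, seeds: Labels, k: int = 2, iterations: int = 10,
                     mu_seed: float = 1.0, mu_nbr: float = 1.0,
                     mu_prior: float = 0.01) -> Labels:
    """Synchronous label propagation keeping at most k labels per node.

    Hypothetical sketch: exact top-k truncation replaces the patent's
    probability-based sparsity approximation; mu_* are assumed weights.
    """
    all_labels = {label for dist in seeds.values() for label in dist}
    prior = 1.0 / max(len(all_labels), 1)  # uniform-distribution term per label
    learned: Labels = {v: dict(seeds.get(v, {})) for v in graph}
    for _ in range(iterations):
        updated: Labels = {}
        for v, nbrs in graph.items():
            # Aggregate the label weights "sent" by each neighbor, scaled by similarity.
            agg: Dict[str, float] = defaultdict(float)
            for u, sim in nbrs.items():
                for label, weight in learned.get(u, {}).items():
                    agg[label] += sim * weight
            seed = seeds.get(v, {})
            seed_on = 1.0 if seed else 0.0  # only labeled nodes keep a seed term
            norm = mu_seed * seed_on + mu_nbr * sum(nbrs.values()) + mu_prior
            scores = {
                label: (mu_seed * seed_on * seed.get(label, 0.0)
                        + mu_nbr * agg.get(label, 0.0)
                        + mu_prior * prior) / norm
                for label in sorted(set(agg) | set(seed))
            }
            # Keep only the k highest-weighted labels for this node.
            updated[v] = dict(heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]))
        learned = updated
    return learned


def select_updates(learned: Labels, threshold: float):
    """(node, label) pairs whose learned weight meets or exceeds the threshold."""
    return sorted((v, label) for v, dist in learned.items()
                  for label, w in dist.items() if w >= threshold)
```

On a three-node chain `a - b - c` with seed label `x` on `a` and `y` on `c`, ten iterations leave the middle node with equal weights for `x` and `y`. `select_updates(learned, 0.6)` then returns `[('a', 'x'), ('c', 'y')]`. Those are the node and label pairs that would be written back to the source data set.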

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: initializing, for nodes in a distributed graph comprising labeled nodes and unlabeled nodes, wherein an edge between two nodes in the distributed graph represents a similarity measure between the two nodes, learned label weights for at least a non-zero quantity k of labels per node; for a first node in the distributed graph: sending the learned label weights for the node to each neighbor in the distributed graph, receiving a set of at least k learned label weights from each neighbor, determining top-ranked labels for the first node based on a probability-based sparsity approximation using the received sets of learned label weights, and calculating learned label weights for top-ranked labels of the first node based on an aggregation of the received sets of learned label weights from the neighbors; repeating the sending, receiving, determining, and calculating for a quantity of iterations; determining, from the learned label weights for the first node, a first label with a weight that meets or exceeds a threshold; and automatically updating a source data set with the first label, responsive to the determining.

2. The method of claim 1, wherein the nodes in the distributed graph represent textual information and the method further comprises, prior to initializing the learned label weights, adding additional edges between nodes in the distributed graph based on deep learning of a large corpus of text.

3. The method of claim 2, wherein adding the additional edges includes: learning a semantic embedding for each node in the distributed graph using the deep learning; generating a signature for each node by applying locality sensitive hashing on the semantic embedding for the node; using the signature of a third node and the signature of a second node to determine a similarity metric between the third node and the second node; and adding an edge between the third node and the second node when the similarity metric meets a second threshold.

4. The method of claim 1, wherein determining the top-ranked labels for the first node includes, for each of the labels in the sets of labels from the neighbors: determining a probability for the label based on a weighted frequency with which the label is encountered; and determining a maximum error of the weighted frequency for the label, wherein the sum of the probability and the maximum error is used to determine the top-ranked labels.

5. The method of claim 4, wherein determining the probability and the maximum error includes, as the set of learned label weights for a t-th neighbor u_t are received: determining whether a probability-estimation entry exists for a label l for the first node, the probability-estimation entry including a label identifier for the label l, a frequency component, and an error component; when the probability-estimation entry exists, adding the product of the learned label weight for the label l and a similarity measure between the neighbor u_t and the first node to the frequency component; when the probability-estimation entry does not exist, creating a new probability-estimation entry for the label by: setting the frequency component of the new probability-estimation entry to the product of the learned label weight for label l and a similarity measure between the neighbor u_t and the first node, and setting the error component of the new probability-estimation entry to a probability threshold; and repeating the determining, adding and creating for each label l with a learned label weight for the neighbor u_t.

6. The method of claim 5, wherein the probability threshold is a dynamic threshold calculated by adding the product, calculated for each previously received neighbor u, of a similarity measure between the previously received neighbor u and the first node and an average probability mass for neighbor u.

7. The method of claim 6, further comprising: discarding probability-estimation entries for labels where the sum of the frequency component and the error component is less than the sum of, for each of the t neighbors u, the similarity measure between the first node and the neighbor u and the average probability mass for neighbor u.

8. The method of claim 1, wherein calculating the learned weights of top-ranked labels for the first node includes, for a label l of the top-ranked labels: determining a seed component for the label l that maintains an original weight for labels of labeled nodes; for each neighbor, determining a neighbor component for the label l, the neighbor component being based on similarity of the neighbor to the first node and similarity of the k labels for the neighbor to the label l; calculating a total neighbor component for the label l by adding the neighbor components and multiplying the sum by a component weight; calculating a uniform distribution component for the label l; and setting the learned label weight for the label l to a sum of the seed component, the total neighbor component, and the uniform distribution component, the sum being divided by a normalization component for the first node and the label l.

9. The method of claim 1, wherein aggregating the received sets of learned label weights from neighbors of the first node includes, for each neighbor u: determining a product by multiplying a sum of learned label weights for neighbor u by a similarity measure between the first node and the neighbor u; adding the products together; and normalizing the added products.

10. The method of claim 9, wherein the similarity measure is multiplied by an entropy parameter for the neighbor u, the entropy parameter being based on an entropy of label distribution in neighbor u.

11. The method of claim 1, wherein the source data set includes entities and attributes, a node in the distributed graph represents an entity in the source data set, a label for the node represents an attribute of the entity in the source data set, and updating the source data set includes adding, in the source data set, the attribute represented by the first label to the entity represented by the first node.

12. A system comprising: a plurality of computing devices including processors formed in a substrate and memory storing: an input graph of nodes connected by edges, an edge representing a similarity measure between two nodes, the graph being distributed across the plurality of computing devices, wherein at least some of the nodes are seed nodes associated with one or more training labels from a set of labels, each training label having an associated original weight, the input graph being generated based on a source data set; and instructions that, when executed by the processors, cause the plurality of distributed computing devices to perform operations comprising: propagating the training labels through the input graph using a sparsity approximation for label propagation, resulting in learned weights for respective node and label pairs, and automatically updating the source data set using node and label pairs selected based on the learned weights.

13. The system of claim 12, wherein the source data set is a knowledge base and a node in the input graph represents a pair of entities in the knowledge base and a label for the node represents a relationship between the pair of entities in the knowledge base.

14. The system of claim 12, wherein the source data set is a graph-based data store, a node in the graph represents an entity in the graph-based data store, a label for the node represents an attribute of the entity in the graph-based data store, and updating the source dat
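
Claims 4 to 7 describe a streaming estimate of which labels deserve a top-k slot. Each label entry keeps a frequency and a maximum error, somewhat like lossy counting. The following is a minimal sketch under assumptions, not the claimed implementation. The claims do not define "average probability mass" here, so the sketch assumes it is the mean of the weights a neighbor sends. Claim 7's comparison is read as a sum of similarity-times-mass products, consistent with claim 6. The names `Entry`, `mean_mass` and `stream_top_labels` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Message = Tuple[float, Dict[str, float]]  # (similarity to this node, {label: weight})


@dataclass
class Entry:
    freq: float   # weighted frequency accumulated so far
    error: float  # maximum under-count: the threshold when the entry was created


def mean_mass(weights: Dict[str, float]) -> float:
    # Assumption: a neighbor's "average probability mass" is the mean weight it sent.
    return sum(weights.values()) / len(weights) if weights else 0.0


def stream_top_labels(messages: List[Message], k: int) -> List[str]:
    entries: Dict[str, Entry] = {}
    threshold = 0.0  # sum of sim(u, v) * mean_mass(u) over neighbors seen so far
    for sim, weights in messages:
        for label, w in weights.items():
            if label in entries:
                entries[label].freq += sim * w
            else:
                # A new label may have been missed by earlier neighbors; that
                # missed mass is bounded by the threshold accumulated so far.
                entries[label] = Entry(freq=sim * w, error=threshold)
        threshold += sim * mean_mass(weights)
        # Discard labels whose upper bound (freq + error) falls below the threshold.
        entries = {lab: e for lab, e in entries.items() if e.freq + e.error >= threshold}
    ranked = sorted(entries.items(), key=lambda kv: kv[1].freq + kv[1].error, reverse=True)
    return [label for label, _ in ranked[:k]]
```

For the three messages `(1.0, {"x": 0.75, "y": 0.25})`, `(1.0, {"x": 0.375, "z": 0.625})` and `(0.5, {"z": 0.625, "x": 0.375})`, label `y` is pruned after the first neighbor. The call returns `["z", "x"]`. `z` outranks `x` because ranking uses the upper bound (frequency plus error), as claim 4 describes, even though `x` has the larger accumulated frequency.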

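Claim 3's edge-augmentation step can be illustrated with random-hyperplane (SimHash) locality-sensitive hashing. This is one common LSH family; the claim does not say which family is used. This minimal sketch assumes the semantic embeddings have already been learned and are passed in as plain vectors. `add_lsh_edges` and its parameters are hypothetical. For clarity it compares all pairs of signatures. A large-scale version would typically bucket signatures so that only colliding nodes are compared.

```python
import itertools
import random
from typing import Dict, List, Sequence, Tuple


def simhash_signature(vec: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    # One bit per random hyperplane: which side of the plane the vector falls on.
    return tuple(1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes)


def signature_similarity(a: Tuple[int, ...], b: Tuple[int, ...]) -> float:
    # Fraction of matching bits; its expectation is 1 - angle(a, b) / pi.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def add_lsh_edges(embeddings: Dict[str, Sequence[float]], num_bits: int = 64,
                  threshold: float = 0.9, seed: int = 0) -> List[Tuple[str, str, float]]:
    rng = random.Random(seed)
    dim = len(next(iter(embeddings.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
    sigs = {node: simhash_signature(vec, planes) for node, vec in embeddings.items()}
    edges = []
    for a, b in itertools.combinations(sorted(sigs), 2):
        sim = signature_similarity(sigs[a], sigs[b])
        if sim >= threshold:  # add an edge only when the signatures agree enough
            edges.append((a, b, sim))
    return edges
```

Vectors pointing in the same direction always receive identical signatures, and opposite vectors almost surely share no bits. So for `{"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-1.0, -2.0, -3.0]}` only the `a`-`b` edge is added.
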
Assignees

Google Inc, Google LLC

Inventors

Classifications

Code	Description	Source
G06N5/02	Knowledge representation; Symbolic representation	CPC title
G06F16/9024 (primary)	Graphs; Linked lists (G06F16/9027 takes precedence)	CPC title
G06N20/00	Machine learning	CPC title
G06F17/30958 (primary)	Physics	Mapped topic
G06N99/005	Physics	Mapped topic

Patent family

Related publications grouped by family.

View patent family 60674694

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9852231B1 cover?: Systems and methods for adding labels to a graph are disclosed. One system includes a plurality of computing devices including processors and memory storing an input graph generated based on a source data set, where an edge represents a similarity measure between two nodes in the input graph, the input graph being distributed across the plurality of computing devices, and some of the nodes are …
Who is the assignee on this patent?: Google Inc, Google LLC