Generating feature vectors from RDF graphs

US11775859B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11775859-B2
Application number  US-201916354648-A
Country             US
Kind code           B2
Filing date         Mar 15, 2019
Priority date       Aug 28, 2015
Publication date    Oct 3, 2023
Grant date          Oct 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology disclosed describes systems and methods for generating feature vectors from resource description framework (RDF) graphs. Machine learning tasks frequently operate on vectors of features. Available systems for parsing multiple documents often generate RDF graphs. Once a set of interesting features to be considered has been established, the disclosed technology describes systems and methods for generating feature vectors from the RDF graphs for the documents. In one example setting, a machine learning system can use generated feature vectors to determine how interesting a news article might be, or to learn information-of-interest about a specific subject reported in multiple articles. In another example setting, viable interview candidates for a particular job opening can be identified using feature vectors generated from a resume database, using the disclosed systems and methods for generating feature vectors from RDF graphs.
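The core idea in the abstract, turning an RDF graph into a fixed-length vector once the features of interest are known, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `ex:` identifiers, the sample triples, the feature list, and the simple presence/absence encoding are hypothetical and are not taken from the patent.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical RDF graph for one news article, written as plain triples.
ARTICLE_TRIPLES: List[Triple] = [
    ("ex:article1", "ex:mentions", "ex:AcmeCorp"),
    ("ex:AcmeCorp", "rdf:type", "ex:Company"),
    ("ex:article1", "ex:topic", "ex:Acquisition"),
    ("ex:article1", "ex:mentions", "ex:JaneDoe"),
    ("ex:JaneDoe", "rdf:type", "ex:Person"),
]

# Hypothetical, pre-established set of interesting features.
FEATURES_OF_INTEREST: List[str] = [
    "ex:Company",
    "ex:Person",
    "ex:Acquisition",
    "ex:Bankruptcy",
]


def feature_vector(triples: List[Triple], features: List[str]) -> List[int]:
    """Return 1 for each feature that appears as an object in the graph, else 0."""
    objects = {obj for (_, _, obj) in triples}
    return [1 if feature in objects else 0 for feature in features]


vector = feature_vector(ARTICLE_TRIPLES, FEATURES_OF_INTEREST)
print(vector)  # [1, 1, 1, 0]
```

Because every document is encoded against the same ordered feature list, the resulting vectors have the same length and can be passed to a standard machine learning library.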

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying a first document relevant to a topic of interest, the method comprising: producing, by a processor, a Resource Description Framework (RDF) graph of a second document, wherein the RDF graph includes a plurality of nodes and at least one edge node; receiving the topic of interest to be evaluated; determining, by the processor and based on the RDF graph, key-attributes identified from nodes in the RDF graph responsive to the topic of interest to be evaluated, root node-attributes collected from root nodes in the RDF graph pointing to the topic of interest to be evaluated, and a Boolean value additional-attribute of interest derived from the identified key-attributes in one of the responsive node or a node connected by a single edge to the responsive node, wherein the Boolean value additional-attribute of interest is one of a true or false feature value; generating, by the processor, feature vectors based on a feature value of the identified key-attributes, a feature subject of the root node-attributes, and a determined “true” value of the Boolean value additional-attribute of interest; and producing, by the processor and using a machine learning algorithm and the feature vectors, computer instructions configured to identify, in response to a receipt of the first document, that the first document is relevant to the received topic of interest.

2. The method of claim 1, further comprising storing, by the processor, the feature vectors.

3. The method of claim 1, further comprising storing, by the processor, the RDF graph.

4. The method of claim 1, wherein the producing the RDF graph comprises parsing the second document.

5. The method of claim 4, wherein the parsing comprises: extracting, using a natural language processing technology, semantic information from unstructured text in the second document; and generating, from the semantic information, a serialized representation of the RDF graph.

6. The method of claim 5, wherein the serialized representation comprises a markup language file.

7. The method of claim 5, further comprising converting the serialized representation of the RDF graph to a form that is capable of being queried.

8. The method of claim 1, wherein the generating comprises: determining a set of features relevant to the topic of interest; and identifying, in the RDF graph, a first node, the first node being identified by a feature of the set of features.

9. The method of claim 8, wherein the determining the set of features comprises receiving the set of features.

10. The method of claim 8, wherein the generating further comprises determining, from the first node, a value of the feature.

11. The method of claim 10, wherein the value is based on a degree of relevance of the feature to the topic of interest.

12. The method of claim 8, wherein the generating further comprises identifying, in the RDF graph, a second node, the second node being a subject node of a triple in the RDF graph, the first node being an object node of the triple.

13. The method of claim 12, wherein the identifying the second node comprises: querying the RDF graph; and receiving, in response to a query of the RDF graph, the second node.

14. The method of claim 12, wherein a subgraph of interest comprises the first node and the second node.

15. The method of claim 14, wherein the attribute determined from the information external to the second document is based on a node of the subgraph of interest.

16. The method of claim 12, wherein the generating further comprises identifying, in the RDF graph, a third node, the third node being connected, in the RDF graph, to a node of the subgraph of interest by an edge.

17. The method of claim 16, wherein the edge is a sequence of edges.

18. The method of claim 1, further comprising searching, by the processor, for a third document, wherein: the third document is associated with a feature vector having a feature that identifies an entity associated with the third document; at least one of the feature vectors associated the second document includes a feature that identifies an entity associated with the second document; and the entity associated with the second document is the entity associated with the third document.

19. A non-transitory computer-readable medium storing computer code for identifying a first document relevant to a topic of interest, the computer code including instructions to cause the processor to: produce a Resource Description Framework (RDF) graph of a second document, wherein the RDF graph includes a plurality of nodes and at least one edge node; receive the topic of interest to be evaluated; determine, based on the RDF graph, key-attributes identified from nodes in the RDF graph responsive to the topic of interest to be evaluated, root node-attributes collected from root nodes in the RDF graph pointing to the topic of interest to be evaluated, and a Boolean value additional-attribute of interest derived from the identified key-attributes in one of the responsive node or a node connected by a single edge to the responsive node, wherein the Boolean value additional-attribute of interest is one of a true or false feature value; generate feature vectors based on a feature value of the identified key-attributes, a feature subject of the root node-attributes, and a determined “true” value of the Boolean value additional-attribute of interest; and produce, using a machine learning algorithm and the feature vectors, computer instructions configured to identify, in response to a receipt of the first document, that the first document is relevant to the received topic of interest.

20. A system identifying a first document relevant to a topic of interest, the system comprising: a memory configured to store the first document, a second document, a Resource Description Framework (RDF) graph, and feature vectors; and a processor configured to: produce the RDF graph of the second document, wherein the RDF graph includes a plurality of nodes and at least one edge node; receive the topic of interest to be evaluated; determine, based on the RDF graph, key-attributes identified from nodes in the RDF graph responsive to the topic of interest to be evaluated, root node-attributes collected from root nodes in the RDF graph pointing to the topic of interest to be evaluated, and a Boolean value additional-attribute of interest derived from the identified key-attributes in one of the responsive node or a node connected by a single edge to the responsive node, wherein the Boolean value additional-attribute of interest is one of a true or false feature value; generate feature vectors based on a feature value of the identified key-attributes, a feature subject of the root node-attributes, and a determined “true” value of the Boolean value additional-attribute of interest; and produce, using a machine learning algorithm and the feature vectors, computer instructions configured to identify, in response to a receipt of the first document, that the first document is relevant to the received topic of interest.
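As a reading aid, the three kinds of attributes named in claim 1 (key-attributes of responsive nodes, root nodes pointing toward the topic, and a Boolean additional attribute found within a single edge) can be sketched over a toy triple list. This is one possible interpretation, not the patented implementation: the resume graph, the `ex:` names, the helper functions, and the choice of "has a degree" as the Boolean attribute are all hypothetical, and the final machine learning step is omitted.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical graph for one resume (the "second document" in the claim).
TRIPLES: List[Triple] = [
    ("ex:resume42", "ex:describes", "ex:candidate7"),
    ("ex:candidate7", "ex:hasSkill", "ex:Python"),
    ("ex:candidate7", "ex:yearsExperience", "8"),
    ("ex:candidate7", "ex:hasDegree", "ex:MSc"),
]


def subjects_of(triples: List[Triple], obj: str) -> Set[str]:
    """Subject nodes of triples whose object is `obj` (compare claims 12 and 13)."""
    return {s for (s, _, o) in triples if o == obj}


def key_attributes(triples: List[Triple], node: str) -> Dict[str, str]:
    """Predicate -> object pairs attached to `node` (last value wins on repeats)."""
    return {p: o for (s, p, o) in triples if s == node}


def roots_reaching(triples: List[Triple], node: str) -> Set[str]:
    """Walk edges backwards from `node`; keep ancestors with no incoming edge."""
    seen: Set[str] = set()
    frontier: Set[str] = {node}
    roots: Set[str] = set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = subjects_of(triples, current)
        if not parents and current != node:
            roots.add(current)
        frontier |= parents - seen
    return roots


def has_predicate_near(triples: List[Triple], node: str, predicate: str) -> bool:
    """True if `node`, or a node one edge away, is the subject of `predicate`."""
    neighbors = {o for (s, _, o) in triples if s == node}
    neighbors |= {s for (s, _, o) in triples if o == node}
    candidates = {node} | neighbors
    return any(s in candidates and p == predicate for (s, p, _) in triples)


def build_feature_vectors(triples: List[Triple], topic: str) -> List[dict]:
    """One feature record per node that responds to the topic of interest."""
    vectors = []
    for node in sorted(subjects_of(triples, topic)):
        attrs = key_attributes(triples, node)
        vectors.append({
            "node": node,
            "years_experience": int(attrs.get("ex:yearsExperience", "0")),
            "roots": sorted(roots_reaching(triples, node)),
            "has_degree": has_predicate_near(triples, node, "ex:hasDegree"),
        })
    return vectors


vectors = build_feature_vectors(TRIPLES, "ex:Python")
print(vectors)
```

In a fuller pipeline, records like these would be encoded numerically and passed to a classifier that decides whether a newly received document is relevant to the topic; that step is left out here because the claim does not name a specific algorithm.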

Assignees
Salesforce Com Inc, Salesforce Inc

Inventors

Classifications

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Clustering; Classification · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11775859B2 cover?
The technology disclosed describes systems and methods for generating feature vectors from resource description framework (RDF) graphs. Machine learning tasks frequently operate on vectors of features. Available systems for parsing multiple documents often generate RDF graphs. Once a set of interesting features to be considered has been established, the disclosed technology describes systems an…
Who is the assignee on this patent?
Salesforce Com Inc, Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 3, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).