Structured graph-to-text generation with two step fine-tuning

US11727210B2 · US · B2

Patent metadata
Publication number: US-11727210-B2
Application number: US-202117162040-A
Country: US
Kind code: B2
Filing date: Jan 29, 2021
Priority date: Aug 14, 2020
Publication date: Aug 15, 2023
Grant date: Aug 15, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide systems and methods for data-to-text generation. The embodiments receive input data that includes resource description framework (RDF) triples in an RDF graph. A data-to-text generation system generates position aware embeddings, including position embeddings, triple role embeddings, and tree-level embeddings. Using the position aware embeddings and the RDF graph, the data-to-text generation system generates a textual description for the RDF graph.
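
As an illustration of the abstract, the following is a minimal sketch of how RDF triples might be flattened into tokens with a position id and a triple role id per token. The marker tokens (`<S>`, `<R>`, `<O>`), the role id values, the whitespace tokenizer, and the function name `linearize` are assumptions made for this example; the patent text on this page does not specify them.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical role ids and marker tokens; not taken from the patent.
ROLE_IDS = {"subject": 1, "relation": 2, "object": 3}
MARKERS = {"subject": "<S>", "relation": "<R>", "object": "<O>"}
ROLE_ORDER = ("subject", "relation", "object")


def linearize(triples: List[Triple]) -> Tuple[List[str], List[int], List[int]]:
    """Flatten triples into tokens plus parallel position and role ids."""
    tokens: List[str] = []
    position_ids: List[int] = []
    role_ids: List[int] = []
    for triple in triples:
        for role, phrase in zip(ROLE_ORDER, triple):
            # A marker token announces the role, followed by the words
            # of the phrase (split on whitespace as a stand-in tokenizer).
            for word in [MARKERS[role]] + phrase.split():
                position_ids.append(len(tokens))  # index in the sequence
                tokens.append(word)
                role_ids.append(ROLE_IDS[role])   # subject/relation/object
    return tokens, position_ids, role_ids


if __name__ == "__main__":
    toks, pos, roles = linearize([("Alan Bean", "occupation", "Test pilot")])
    for row in zip(toks, pos, roles):
        print(row)
```

In a real model each id would index a learned embedding table; this sketch stops at the ids because the page does not describe how the embeddings are combined.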

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a non-transitory memory; and one or more hardware processors coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising: receiving, at a data-to-text generation system that includes a generative language model, an input data that includes resource description framework (RDF) triples in an RDF graph; generating, using the data-to-text generation system, embeddings from the RDF graph based on tokens of the input data, wherein the embeddings include a position aware embedding that identifies a position of an RDF triple of the RDF triples in the RDF graph; and generating, using the data-to-text generation system, a textual description of the input data based on the embeddings and the RDF graph, wherein the position aware embedding includes a position embedding that identifies a position of a token indicating whether a word in the RDF triple of the RDF triples is a subject, a relation, or an object.

2. The system of claim 1, wherein the RDF triple includes words that correspond to a subject, a relation, or an object.

3. The system of claim 1, wherein the position aware embedding includes a position embedding that identifies a position of a token that stores a word in the RDF triple from the RDF triples.

4. The system of claim 1, wherein the position aware embedding includes a triple role embedding that identifies that a token includes a word or an indication of a role of the word in the RDF triple from the RDF triples that corresponds to a subject, an object, or a relation.

5. The system of claim 1, wherein the position aware embedding includes a tree-level embedding that identifies a tree distance from a root of a parsing tree to a level in the parsing tree that includes a token, wherein the token stores a word or an indication of a role of the word in the RDF triple from the RDF triples.

6. The system of claim 1, wherein the generating the embeddings further comprises generating a token embedding that identifies a token that stores a word or an indication of a role of the word in the RDF triple from the RDF triples.

7. A method comprising: receiving, at a data-to-text generation system that includes a generative language model, the data-to-text generation system configured to execute on a processor, an input data that includes resource description framework (RDF) triples in an RDF graph; generating, using the data-to-text generation system, embeddings from the RDF graph based on tokens of the input data, wherein the embeddings include a position aware embedding that identifies a position of an RDF triple of the RDF triples in the RDF graph; and generating, using the data-to-text generation system, a textual description of the input data based on the position aware embedding and the RDF graph, wherein the position aware embedding includes a position embedding that identifies a position of a token indicating whether a word in the RDF triple of the RDF triples is a subject, a relation, or an object.

8. The method of claim 7, further comprising: training the generative language model to generate the position aware embeddings.

9. The method of claim 7, wherein the position aware embedding includes a position embedding that identifies a position of a token that stores a word in the RDF triple from the RDF triples.

10. The method of claim 7, wherein the position aware embedding includes a triple role embedding that identifies that a token includes a word or an indication of a role of the word in the RDF triple from the RDF triples that corresponds to a subject, an object, or a relation.

11. The method of claim 7, wherein the position aware embedding includes a tree-level embedding that identifies a tree distance from a root of a parsing tree to a level in the parsing tree that includes a token, wherein the token stores a word or an indication of a role of the word in the RDF triple from the RDF triples.

12. The method of claim 7, wherein the generating the embeddings further comprises generating a token embedding that identifies a token that stores a word or an indication of a role of the word in the RDF triple from the RDF triples.

13. A non-transitory machine-readable medium having stored thereon machine-readable instructions executable to cause a machine to perform operations comprising: receiving, at a data-to-text generator that includes a generative language model, an input data that includes structured data triples in a structured graph; generating, using the data-to-graph generator, embeddings from the structured graph based on tokens of the input data, wherein the embeddings include position aware embeddings that identify position of a triple of triples in the structured graph; and generating, using a data-to-text module, a textual description of the input data based on the position aware embeddings and the structured graph, wherein the position aware embeddings include a position embedding that identifies a position of a token indicating whether a word in the triple of the triples is a subject, a relation, or an object.

14. The non-transitory machine-readable medium of claim 13, wherein the position aware embeddings include a position embedding that identifies a position of a token that stores a word in the triple from the triples.

15. The non-transitory machine-readable medium of claim 13, wherein the position aware embeddings include a triple role embedding that identifies that a token includes a word or an indication of a role of the word in the triple from the triples that corresponds to a subject, an object, or a relation.

16. The non-transitory machine-readable medium of claim 13, wherein the position aware embeddings include a tree-level embedding that identifies a tree distance from a root of a parsing tree to a level in the parsing tree that stores a token, wherein the token includes a word or an indication of a role of the word in the triple from the triples.

17. The non-transitory machine-readable medium of claim 13, wherein the generating the embeddings further comprises generating token embeddings that identify a token that stores a word or an indication of a role of the word in the triple from the triples.
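
Claims 5, 11, and 16 describe a tree-level embedding based on the distance from the root of a parsing tree. The sketch below shows one possible way to derive such levels from a list of triples. It rests on several assumptions that are not stated on this page: the tree is built directly from the triples, a root is any subject that never appears as an object, each triple counts as one hop, and a relation shares the level of the object it leads to. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def entity_levels(triples: List[Triple]) -> Dict[str, int]:
    """Breadth-first distance of each entity from the root entities."""
    children: Dict[str, List[str]] = {}
    objects = set()
    for subj, _rel, obj in triples:
        children.setdefault(subj, []).append(obj)
        objects.add(obj)
    # Assumption: roots are subjects that never occur as an object.
    roots = [s for s in children if s not in objects]
    if not roots and triples:
        roots = [triples[0][0]]  # fallback for cyclic graphs
    levels = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in levels:  # first visit is the shortest distance
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


def triple_levels(triples: List[Triple]) -> List[Tuple[int, int, int]]:
    """Return (subject, relation, object) levels for each triple."""
    levels = entity_levels(triples)
    # Entities unreachable from any root default to level 0 (assumption).
    return [
        (levels.get(s, 0), levels.get(o, 0), levels.get(o, 0))
        for s, _r, o in triples
    ]


if __name__ == "__main__":
    example = [
        ("Alan Bean", "mission", "Apollo 12"),
        ("Apollo 12", "operator", "NASA"),
        ("Alan Bean", "birthPlace", "Wheeler"),
    ]
    print(triple_levels(example))
```

Each level would then index a tree-level embedding table, in the same way a position id indexes a position embedding table. Other level conventions, such as placing the relation one level above its object, would fit the claim language equally well.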

Assignees

Salesforce Com Inc

Inventors

Classifications

  • G06F40/284 · Primary

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

  • Parsing · CPC title

  • G06F40/56 · Primary

    Natural language generation · CPC title

  • Tree-structured documents (parsing G06F40/205; validation G06F40/226) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11727210B2 cover?
Embodiments described herein provide systems and methods for data-to-text generation. The embodiments receive input data that includes resource description framework (RDF) triples in an RDF graph. A data-to-text generation system generates position aware embeddings, including position embeddings, triple role embeddings, and tree-level embeddings. Using the position aware embeddings and the RD…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 15, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).