Bridging textual and tabular data for cross domain text-to-query language semantic parsing with a pre-trained transformer language encoder and anchor text

US11720559B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11720559-B2
Application number: US-202017064466-A
Country: US
Kind code: B2
Filing date: Oct 6, 2020
Priority date: Jun 2, 2020
Publication date: Aug 8, 2023
Grant date: Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A text-to-database neural network architecture is provided. The architecture receives a natural language question and a database schema and generates a serialized question-schema representation that includes a question and at least one table and at least one field from the database schema. The serialized question-schema representation is appended with at least one value that matches a word in the natural language question and at least one field in a database picklist. An encoder in the architecture generates question and schema encodings from the appended question-schema representation. Schema encodings are associated with metadata that indicates a data type of the fields and whether fields are associated with primary or foreign keys. A decoder in the architecture generates an executable query from the question encodings and schema encodings.
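The serialization step described in the abstract can be illustrated with a minimal sketch. It assumes the question, table names, field names, and matched picklist values are joined into one string with marker tokens. The token spellings (`[CLS]`, `[SEP]`, `[T]`, `[C]`, `[V]`), the trailing separator, the function names, the 0.85 threshold, and the use of Python's `difflib` as the fuzzy matcher are all illustrative assumptions; the patent text names only the roles of the tokens and "a fuzzy string match".

```python
from difflib import SequenceMatcher

# Hypothetical token spellings; the patent only names their roles
# (separator, table, field, and value tokens).
CLS, SEP, TABLE, FIELD, VALUE = "[CLS]", "[SEP]", "[T]", "[C]", "[V]"


def fuzzy_picklist_matches(question, picklist, threshold=0.85):
    """Return picklist values that approximately match a word span in the question."""
    words = question.lower().replace("?", "").split()
    matches = []
    for value in picklist:
        n = len(value.split())
        if n == 0:
            continue  # skip empty picklist entries
        # Compare the value against every question span with the same word count.
        spans = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        if any(SequenceMatcher(None, span, value.lower()).ratio() >= threshold
               for span in spans):
            matches.append(value)
    return matches


def serialize(question, schema, picklists):
    """Build the serialized question-schema string.

    schema:    {table_name: [field_name, ...]}
    picklists: {(table_name, field_name): [value, ...]}
    """
    parts = [CLS, question, SEP]
    for table, fields in schema.items():
        parts += [TABLE, table]
        for field in fields:
            parts += [FIELD, field]
            # Matched values are appended right after their field name,
            # each one introduced by the value token.
            values = picklists.get((table, field), [])
            for value in fuzzy_picklist_matches(question, values):
                parts += [VALUE, value]
    parts.append(SEP)  # trailing separator is an assumption (BERT-style input)
    return " ".join(parts)


question = "Show names of properties that are either houses or apartments"
schema = {"properties": ["property_type_code", "property_name"]}
picklists = {("properties", "property_type_code"): ["House", "Apartment", "Field"]}
print(serialize(question, schema, picklists))
```

In this toy input, "houses" and "apartments" are close enough to the picklist values "House" and "Apartment" to be appended after `property_type_code`, while "Field" has no counterpart in the question and is left out.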

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a natural language question and a database schema; generating a serialized question-schema representation from the natural language question and the database schema, wherein the serialized question-schema representation includes at least one word from the natural language question, at least one table name of a table in the database schema and at least one field name of a field associated with the table; generating, using a fuzzy string match, a set of multiple values from a picklist associated with the field, wherein the picklist links at least one word in the natural language question to the field, and wherein the set of multiple values includes words that match the at least one word in the natural language question and at least one value in the picklist; separating values in the set of multiple values with a value token; appending the set of multiple values from the picklist to the serialized question-schema representation; generating, using an encoder and at least one bi-directional long-short term memory (LSTM), question encodings and schema encodings from the serialized question-schema representation; and generating, using a decoder, an executable query from the question encodings and the schema encodings.

2. The method of claim 1, wherein generating the serialized question-schema representation further comprises: separating the natural language question and the database schema with a separator token; separating a table name in the at least one table name of the table with a table token; and separating a field name in the at least one field name of the field with a field token.

3. The method of claim 1, wherein the appending further comprises: appending the one or more values after the field name of the field; and separating a value in the one or more values with a value token.

4. The method of claim 1, wherein generating the question encodings further comprises: generating, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generating, using a second bi-directional LSTM in the at least one bi-directional LSTM and a question segment of the base question-schema encodings the question encodings.

5. The method of claim 1, wherein generating the schema encodings further comprises: generating, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generating, using a schema segment of the base question-schema encodings and a projection layer, the schema encodings.

6. The method of claim 5, wherein generating the schema encodings using the schema segment of the base question-schema encodings further comprises: determining, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a primary key.

7. The method of claim 5, wherein generating the schema encodings using the schema segment of the base question-schema encodings further comprises: determining, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a foreign key.

8. The method of claim 5, wherein generating the schema encodings using the schema segment of the base question-schema encodings further comprises: determining, using the projection layer that includes a fusion neural network with a rectifier linear unit, a data type of the field in the schema encodings.

9. The method of claim 1, wherein generating the executable query further comprises: selecting, using the decoder, the question encodings, and the schema encodings, and an internal state of the decoder, a token from the natural language question, a token from the database schema or a token from a vocabulary for inclusion into the executable query.

10. A system comprising: a memory; a processor coupled to the memory and configured to: receive a natural language question and a database schema; generate a serialized question-schema representation from the natural language question and the database schema, wherein the serialized question-schema representation includes at least one word from the natural language question, at least one table name of a table in the database schema and at least one field name of a field associated with the table; generate, using a fuzzy string match, a set of multiple values from a picklist associated with the field, wherein the picklist links at least one word in the natural language question to the field, and wherein the set of multiple values includes words that match the at least one word in the natural language question and at least one value in the picklist; separate values in the set of multiple values with a value token; append the set of multiple values from the picklist to the serialized question-schema representation; generate, using an encoder and at least one bi-directional long-short term memory (LSTM) stored in the memory, question encodings and schema encodings from the serialized question-schema representation; and generate, using a decoder, an executable query from the question encodings and the schema encodings.

11. The system of claim 10, wherein to generate the serialized question-schema representation the processor is further configured to: separate the natural language question and the database schema with a separator token; separate a table name in the at least one table name of the table with a table token; and separate a field name in the at least one field name of the field with a field token.

12. The system of claim 10, wherein to generate the question encodings the processor is further configured to: generate, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generate, using a second bi-directional LSTM in the at least one bi-directional LSTM and a question segment of the base question-schema encodings, the question encodings.

13. The system of claim 10, wherein to generate the schema encodings, the processor is further configured to: generate, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generate, using a schema segment of the base question-schema encodings and a projection layer, the schema encodings.

14. The system of claim 13, wherein to generate the schema encodings using the schema segment of the base question-schema encodings, the processor is further configured to: determine, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a primary key.

15. The system of claim 13, wherein to generate the schema encodings using the schema segment of the base question-schema encodings, the processor is further configured to: determine, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a foreign key.

16. The system of claim 13, wherein to generate the schema encodings using the schema segment of the base question-schema encodings, the processor is further configured to: determine, using the projection layer that includes a fusion neural network with a rectifier linear unit, a data type of the field in the schema encodings.

17. The system of claim 10, wherein to generat
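
The projection layer recited in claims 5 to 8 and 13 to 16 can be pictured with a small sketch. It assumes that the primary-key flag, foreign-key flag, and data type of a field are each looked up as a small feature vector, concatenated with the field's base encoding, and passed through one dense layer with a rectified linear unit. The concatenation, the `params` layout, and the names `init_params` and `fuse_field` are hypothetical; the claims state only that the projection layer includes a fusion neural network with a rectifier linear unit. The weights here are random and untrained.

```python
import random


def relu(xs):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in xs]


def linear(weights, bias, xs):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, xs)) + b
            for row, b in zip(weights, bias)]


def init_params(base_dim, meta_dim, out_dim, data_types, seed=0):
    """Randomly initialised, untrained parameters (hypothetical layout)."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]

    in_dim = base_dim + 3 * meta_dim  # base encoding plus three metadata vectors
    return {
        "pri": {flag: vec(meta_dim) for flag in (False, True)},
        "for": {flag: vec(meta_dim) for flag in (False, True)},
        "type": {t: vec(meta_dim) for t in data_types},
        "W": [vec(in_dim) for _ in range(out_dim)],
        "b": [0.0] * out_dim,
    }


def fuse_field(base_encoding, is_primary, is_foreign, data_type, params):
    """Fuse a field's base encoding with its metadata features."""
    features = (list(base_encoding)
                + params["pri"][is_primary]
                + params["for"][is_foreign]
                + params["type"][data_type])
    return relu(linear(params["W"], params["b"], features))


params = init_params(base_dim=4, meta_dim=2, out_dim=3,
                     data_types=["text", "number"])
encoding = fuse_field([0.1, 0.2, 0.3, 0.4], is_primary=True,
                      is_foreign=False, data_type="number", params=params)
print(len(encoding))
```

Keeping the metadata as separate lookup tables means the same layer can encode any schema, since only the flags and the data type change from field to field.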

Assignees

Inventors

Classifications

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Translation of natural language queries to structured queries · CPC title

  • with details for data modelling support · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720559B2 cover?
A text-to-database neural network architecture is provided. The architecture receives a natural language question and a database schema and generates a serialized question-schema representation that includes a question and at least one table and at least one field from the database schema. The serialized question-schema representation is appended with at least one value that matches a word in t…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24522. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 8, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).