Neural network based translation of natural language queries to database queries
US-2018336198-A1 · Nov 22, 2018 · US
US11720559B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720559-B2 |
| Application number | US-202017064466-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 6, 2020 |
| Priority date | Jun 2, 2020 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
Official abstract text for this publication.
A text-to-database neural network architecture is provided. The architecture receives a natural language question and a database schema and generates a serialized question-schema representation that includes a question and at least one table and at least one field from the database schema. The serialized question-schema representation is appended with at least one value that matches a word in the natural language question and at least one field in a database picklist. An encoder in the architecture generates question and schema encodings from the appended question-schema representation. Schema encodings are associated with metadata that indicates a data type of the fields and whether fields are associated with primary or foreign keys. A decoder in the architecture generates an executable query from the question encodings and schema encodings.
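The serialization step in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. The token spellings `[SEP]`, `[T]`, `[C]`, `[V]`, the function names `match_picklist` and `serialize`, the 0.8 similarity cutoff, and the use of Python's `difflib.SequenceMatcher` as the fuzzy string matcher are all hypothetical, because the document does not name a matching algorithm or fix the token strings.

```python
import difflib

# Hypothetical token strings: the claims only name a "separator token",
# "table token", "field token" and "value token" without fixing their spelling.
SEP, TABLE, FIELD, VALUE = "[SEP]", "[T]", "[C]", "[V]"


def match_picklist(question, picklist, cutoff=0.8):
    """Return picklist values that fuzzily match a word span of the question.

    difflib.SequenceMatcher is an assumed stand-in for the unspecified
    fuzzy string match; cutoff is an illustrative threshold.
    """
    words = question.lower().replace("?", " ").split()
    matches = []
    for value in picklist:
        n = len(value.split())
        # Compare the value against every n-word window of the question.
        spans = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        if any(difflib.SequenceMatcher(None, value.lower(), s).ratio() >= cutoff
               for s in spans):
            matches.append(value)
    return matches


def serialize(question, schema, picklists):
    """Build a flat question-schema sequence.

    schema maps table name -> list of field names; picklists maps
    (table, field) -> list of known cell values (both layouts are assumptions).
    """
    parts = [question, SEP]
    for table, fields in schema.items():
        parts += [TABLE, table]
        for field in fields:
            parts += [FIELD, field]
            # Matched values go right after their field name,
            # each preceded by a value token.
            for value in match_picklist(question, picklists.get((table, field), [])):
                parts += [VALUE, value]
    return " ".join(parts)
```

For example, `serialize("How many singers are from france?", {"singer": ["name", "country", "age"]}, {("singer", "country"): ["France", "Netherlands", "Japan"]})` returns `How many singers are from france? [SEP] [T] singer [C] name [C] country [V] France [C] age`. The picklist value `France` is appended after the `country` field because it fuzzily matches the question word "france", while the other picklist values fall below the cutoff.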
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving a natural language question and a database schema; generating a serialized question-schema representation from the natural language question and the database schema, wherein the serialized question-schema representation includes at least one word from the natural language question, at least one table name of a table in the database schema and at least one field name of a field associated with the table; generating, using a fuzzy string match, a set of multiple values from a picklist associated with the field, wherein the picklist links at least one word in the natural language question to the field, and wherein the set of multiple values includes words that match the at least one word in the natural language question and at least one value in the picklist; separating values in the set of multiple values with a value token; appending the set of multiple values from the picklist to the serialized question-schema representation; generating, using an encoder and at least one bi-directional long-short term memory (LSTM), question encodings and schema encodings from the serialized question-schema representation; and generating, using a decoder, an executable query from the question encodings and the schema encodings.
2. The method of claim 1, wherein generating the serialized question-schema representation further comprises: separating the natural language question and the database schema with a separator token; separating a table name in the at least one table name of the table with a table token; and separating a field name in the at least one field name of the field with a field token.
3. The method of claim 1, wherein the appending further comprises: appending the one or more values after the field name of the field; and separating a value in the one or more values with a value token.
4. The method of claim 1, wherein generating the question encodings further comprises: generating, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generating, using a second bi-directional LSTM in the at least one bi-directional LSTM and a question segment of the base question-schema encodings the question encodings.
5. The method of claim 1, wherein generating the schema encodings further comprises: generating, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generating, using a schema segment of the base question-schema encodings and a projection layer, the schema encodings.
6. The method of claim 5, wherein generating the schema encodings using the schema segment of the base question-schema encodings further comprises: determining, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a primary key.
7. The method of claim 5, wherein generating the schema encodings using the schema segment of the base question-schema encodings further comprises: determining, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a foreign key.
8. The method of claim 5, wherein generating the schema encodings using the schema segment of the base question-schema encodings further comprises: determining, using the projection layer that includes a fusion neural network with a rectifier linear unit, a data type of the field in the schema encodings.
9. The method of claim 1, wherein generating the executable query further comprises: selecting, using the decoder, the question encodings, and the schema encodings, and an internal state of the decoder, a token from the natural language question, a token from the database schema or a token from a vocabulary for inclusion into the executable query.
10. A system comprising: a memory; a processor coupled to the memory and configured to: receive a natural language question and a database schema; generate a serialized question-schema representation from the natural language question and the database schema, wherein the serialized question-schema representation includes at least one word from the natural language question, at least one table name of a table in the database schema and at least one field name of a field associated with the table; generate, using a fuzzy string match, a set of multiple values from a picklist associated with the field, wherein the picklist links at least one word in the natural language question to the field, and wherein the set of multiple values includes words that match the at least one word in the natural language question and at least one value in the picklist; separate values in the set of multiple values with a value token; append the set of multiple values from the picklist to the serialized question-schema representation; generate, using an encoder and at least one bi-directional long-short term memory (LSTM) stored in the memory, question encodings and schema encodings from the serialized question-schema representation; and generate, using a decoder, an executable query from the question encodings and the schema encodings.
11. The system of claim 10, wherein to generate the serialized question-schema representation the processor is further configured to: separate the natural language question and the database schema with a separator token; separate a table name in the at least one table name of the table with a table token; and separate a field name in the at least one field name of the field with a field token.
12. The system of claim 10, wherein to generate the question encodings the processor is further configured to: generate, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generate, using a second bi-directional LSTM in the at least one bi-directional LSTM and a question segment of the base question-schema encodings, the question encodings.
13. The system of claim 10, wherein to generate the schema encodings, the processor is further configured to: generate, using the encoder and a first bi-directional LSTM in the at least one bi-directional LSTM, base question-schema encodings; and generate, using a schema segment of the base question-schema encodings and a projection layer, the schema encodings.
14. The system of claim 13, wherein to generate the schema encodings using the schema segment of the base question-schema encodings, the processor is further configured to: determine, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a primary key.
15. The system of claim 13, wherein to generate the schema encodings using the schema segment of the base question-schema encodings, the processor is further configured to: determine, using the projection layer that includes a fusion neural network with a rectifier linear unit, that the schema encodings include the field that corresponds to a foreign key.
16. The system of claim 13, wherein to generate the schema encodings using the schema segment of the base question-schema encodings, the processor is further configured to: determine, using the projection layer that includes a fusion neural network with a rectifier linear unit, a data type of the field in the schema encodings.
17. The system of claim 10, wherein to generat
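Claims 5 to 9 describe folding primary-key, foreign-key and data-type metadata into the schema encodings through a projection layer with a ReLU fusion network, and a decoder that takes each output token from the question, the schema, or a vocabulary. The Python sketch below illustrates both ideas under loudly stated assumptions: the fusion network is taken to be a single affine layer followed by ReLU, the data-type list is hypothetical, the three-way token choice is modeled as a pointer-generator-style mixture controlled by a given generation probability `p_gen`, and the bi-directional LSTM encoder is omitted, so encodings are passed in as plain lists. All function names are illustrative, not taken from the patent.

```python
import math

# Hypothetical data-type vocabulary; the claims mention a field's data type
# but do not enumerate the types.
DATA_TYPES = ["text", "number", "time", "boolean", "others"]


def metadata_features(is_primary, is_foreign, data_type):
    """Feature vector: primary-key flag, foreign-key flag, one-hot data type."""
    type_onehot = [1.0 if t == data_type else 0.0 for t in DATA_TYPES]
    return [float(is_primary), float(is_foreign)] + type_onehot


def fuse(field_encoding, features, weights, bias):
    """Assumed projection layer: ReLU(W [h; f] + b) over encoder state plus metadata."""
    x = field_encoding + features  # concatenate encoder state and metadata
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_token(state, vocab_scores, question_encs, question_tokens,
                 schema_encs, schema_tokens, p_gen):
    """Pick the next query token from the vocabulary, the question, or the schema.

    Copy scores are dot products between the decoder state and each input
    encoding (an assumption); p_gen weights generating from the vocabulary
    against copying from the input. Identical tokens from different sources
    have their probabilities summed.
    """
    copy_scores = [sum(a * b for a, b in zip(state, e))
                   for e in question_encs + schema_encs]
    copy_probs = softmax(copy_scores)
    vocab_probs = softmax(list(vocab_scores.values()))
    candidates = {}
    for tok, p in zip(vocab_scores, vocab_probs):
        candidates[tok] = candidates.get(tok, 0.0) + p_gen * p
    for tok, p in zip(question_tokens + schema_tokens, copy_probs):
        candidates[tok] = candidates.get(tok, 0.0) + (1 - p_gen) * p
    return max(candidates, key=candidates.get)
```

Mixing a vocabulary distribution with copy distributions is one common way to realize the three token sources named in claim 9: SQL keywords can come from the vocabulary while table, field and value names are copied verbatim from the serialized input. With a low `p_gen` and a decoder state aligned with a schema encoding, `choose_token` returns that schema token; with a high `p_gen` it favors the highest-scoring vocabulary entry.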
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Translation of natural language queries to structured queries · CPC title
with details for data modelling support · CPC title