Automated question-answer generation system for documents

US12333246B1 · US · B1

Patent metadata
Field               Value
Publication number  US-12333246-B1
Application number  US-202117554761-A
Country             US
Kind code           B1
Filing date         Dec 17, 2021
Priority date       Dec 17, 2021
Publication date    Jun 17, 2025
Grant date          Jun 17, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for generating question-answer pairs is disclosed. The system and method can receive a document. A sentence and/or a further sentence in the document may be identified. A syntactic map for the sentence and/or the further sentence may be generated. Noun phrases and prepositional phrases may be identified based on the syntactic map. Sentence level questions may be generated based on phrases identified using natural language processing (NLP) techniques. Document level questions can also be generated based on syntactic maps generated and NLP techniques.
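As a rough illustration of the phrase-identification step described in the abstract, the sketch below treats a syntactic map as a list of tokens with head/dependency links and reads noun phrases and prepositional phrases off it. The `Token` structure, the hand-annotated parse, and the helper names are assumptions made for this example; the patent does not specify a data layout, and a real system would obtain the parse from a dependency parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int  # position in the sentence
    text: str
    pos: str    # coarse part-of-speech tag
    head: int   # index of the syntactic head (-1 for the root)
    dep: str    # dependency label linking this token to its head


# Hypothetical, hand-annotated parse of
# "The customer paid the annual fee in March".
SENTENCE = [
    Token(0, "The", "DET", 1, "det"),
    Token(1, "customer", "NOUN", 2, "nsubj"),
    Token(2, "paid", "VERB", -1, "root"),
    Token(3, "the", "DET", 5, "det"),
    Token(4, "annual", "ADJ", 5, "amod"),
    Token(5, "fee", "NOUN", 2, "dobj"),
    Token(6, "in", "ADP", 2, "prep"),
    Token(7, "March", "PROPN", 6, "pobj"),
]


def subtree(tokens, root):
    """Return the root token plus all of its descendants, in sentence order."""
    keep = {root.index}
    changed = True
    while changed:
        changed = False
        for tok in tokens:
            if tok.head in keep and tok.index not in keep:
                keep.add(tok.index)
                changed = True
    return [tok for tok in tokens if tok.index in keep]


def phrases_headed_by(tokens, pos_tags):
    """Collect the subtree text of every token whose POS tag is in pos_tags."""
    return [
        " ".join(t.text for t in subtree(tokens, tok))
        for tok in tokens
        if tok.pos in pos_tags
    ]


def noun_phrases(tokens):
    return phrases_headed_by(tokens, {"NOUN", "PROPN"})


def prepositional_phrases(tokens):
    return phrases_headed_by(tokens, {"ADP"})


print(noun_phrases(SENTENCE))
print(prepositional_phrases(SENTENCE))
```

Each extracted phrase is a candidate answer span; how the system turns such spans into sentence-level questions is not detailed in the abstract, so that step is left out here.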

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for question-answer pair generation, the method comprising:

    receiving, by one or more computing devices, a document;

    identifying, by the one or more computing devices, a sentence in the document;

    generating, by the one or more computing devices, a syntactic map for the sentence, wherein the syntactic map represents a grammatical structure of the sentence based on dependencies between words in the sentence;

    identifying, by the one or more computing devices, a further sentence in the document;

    generating, by the one or more computing devices, a further syntactic map for the further sentence;

    generating, by the one or more computing devices, a combined syntactic map from the syntactic map and the further syntactic map by connecting the syntactic map and the further syntactic map using common words found in each of the syntactic map and the further syntactic map;

    generating, by the one or more computing devices, word vector representations for encoding each word of the sentence and the further sentence, by processing each of the sentence and the further sentence using a Bi-Directional Gated Recurrent Unit (BiGRU) and giving weights to each word of the sentence and the further sentence based on the BiGRU being trained to recognize a relative importance of each word to the sentence and the further sentence based on its part of speech;

    generating, by the one or more computing devices, a combined vector representation of each word of the sentence and the further sentence by computing a weighted average based on each of the word vector representations;

    generating, by the one or more computing devices, a structurally aware vector representation of the sentence and the further sentence by processing the combined syntactic map and the combined vector representation of each word of the sentence and the further sentence, using a graph attention network (GAT);

    generating, by the one or more computing devices, a semantic enriched vector representation of the sentence and the further sentence by processing the structurally aware vector representation of the sentence and the further sentence and the word vector representations for each word of the sentence and the further sentence, using a neural network, wherein the semantic enriched vector representation comprises a value representing the importance of each word in the combined syntactic map;

    generating, by the one or more computing devices, document level questions based on the semantic enriched vector representation;

    determining, by the one or more computing devices, a cosine similarity between: a document level question from the document level questions generated, and one or more paragraphs of the document;

    for a paragraph from the one or more paragraphs determined to have a highest cosine similarity, transmitting, by the one or more computing devices, the document level question and the paragraph to a Question-Answer Model (QA Model);

    determining, by the one or more computing devices and using the QA Model, an answer to the document level question from the paragraph; and

    post-processing, by the one or more computing devices, the answer to determine whether the answer is redundant, incorrect, or irrelevant based on using a further trained model trained to recognize correct answers based on patterns of previous answers to similarly posed questions and determining which answers are most similar to the answer and discard answers deemed redundant, incorrect, or irrelevant.

2. The method of claim 1, wherein generating the document level questions further comprises:

    (a) identifying, by the one or more computing devices, a word from the semantic enriched vector representation, wherein the word is labeled as very important or important;

    (b) identifying, by the one or more computing devices, an interrogative word based on the word;

    (c) inserting, by the one or more computing devices, the interrogative word as a first word of a document level question;

    (d) determining, by the one or more computing devices, what further word to append to the first word based on calculating a probability of what the further word will be;

    (e) appending, by the one or more computing devices, to the first word the further word with a highest probability calculated; and

    (f) repeating (a)-(e) until a maximum sequence length is reached or an end of a sequence token is generated.

3. The method of claim 1, wherein the QA Model is deployed on a dedicated server as a Flask Application.

4. The method of claim 1, wherein the further trained model is trained using supervised learning techniques.

5. A non-transitory computer readable medium including instructions stored thereon that when executed by one or more processors of a computing system, cause the computing system to perform operations for question-answer pair generation, the operations comprising:

    receiving a document;

    identifying a sentence in the document;

    generating a syntactic map for the sentence, wherein the syntactic map represents a grammatical structure of the sentence based on dependencies between words in the sentence;

    identifying a further sentence in the document;

    generating a further syntactic map for the further sentence;

    generating a combined syntactic map from the syntactic map and the further syntactic map by connecting the syntactic map and the further syntactic map using common words found in each of the syntactic map and the further syntactic map;

    generating word vector representations for encoding each word of the sentence and the further sentence, by processing each of the sentence and the further sentence using a Bi-Directional Gated Recurrent Unit (BiGRU) and giving weights to each word of the sentence and the further sentence based on the BiGRU being trained to recognize a relative importance of each word to the sentence and the further sentence based on its part of speech;

    generating a combined vector representation of each word of the sentence and the further sentence by computing a weighted average based on each of the word vector representations;

    generating a structurally aware vector representation of the sentence and the further sentence by processing the combined syntactic map and the combined vector representation of each word of the sentence and the further sentence, using a graph attention network (GAT);

    generating a semantic enriched vector representation of the sentence and the further sentence by processing the structurally aware vector representation of the sentence and the further sentence and the word vector representations for each word of the sentence and the further sentence, using a neural network, wherein the semantic enriched vector representation comprises a value representing the importance of each word in the combined syntactic map;

    generating document level questions based on the semantic enriched vector representation;

    determining a cosine similarity between: a document level question from the document level questions generated, and one or more paragraphs of the document;

    for a paragraph from the one or more paragraphs determined to have a highest cosine similarity, transmitting the document level question and the paragraph to a Question-Answer Model (QA Model);

    determining, using the QA Model, an answer to the document level question from the paragraph; and

    post-processing the answer to determine whether the answer is redundant, incorrect, or irrelevant based on using a further trained model trained to recognize correct answers based on patterns of previous answers to similarly posed questions and determining which answers are most similar to the answer and discard answers deemed redundant, incorrect, or irrelevant.

6. The non-transitory computer readable medium of claim 5, wherein the opera
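The paragraph-selection step recited in claims 1 and 5 (score each paragraph against a generated question by cosine similarity, then forward the best match to the QA Model) can be sketched as follows. This is a minimal illustration under stated assumptions: the claims do not say how the question and paragraphs are vectorized, so plain bag-of-words counts stand in for whatever representation the system uses, and the sample paragraphs, question, and function names are hypothetical. The call to the QA Model itself is not shown.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Assumed stand-in vectorizer: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_paragraph(question, paragraphs):
    """Return (index, score) of the paragraph most similar to the question."""
    q_vec = bag_of_words(question)
    scores = [cosine_similarity(q_vec, bag_of_words(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical document paragraphs and a generated document-level question.
PARAGRAPHS = [
    "The annual fee is charged on the account anniversary date.",
    "Reward points expire after three years of inactivity.",
    "Late payments may incur an additional charge.",
]
QUESTION = "When is the annual fee charged?"

index, score = best_paragraph(QUESTION, PARAGRAPHS)
print(index, round(score, 3))
```

In the claimed method, the selected paragraph and the question would then be transmitted to the QA Model, which extracts the answer from that paragraph.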

Assignees

  • American Express Travel Related Services Co Inc

  • American Express India Private Ltd

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • G06F40/211 · Primary

    Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

  • Semantic analysis · CPC title

  • Editing, e.g. inserting or deleting · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12333246B1 cover?
A system and method for generating question-answer pairs is disclosed. The system and method can receive a document. A sentence and/or a further sentence in the document may be identified. A syntactic map for the sentence and/or the further sentence may be generated. Noun phrases and prepositional phrases may be identified based on the syntactic map. Sentence level questions may be generated ba…
Who is the assignee on this patent?
American Express Travel Related Services Co Inc, American Express India Private Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/211. Mapped technology areas include Physics.
When was this patent published?
Published Jun 17, 2025 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).