Automated cognitive processing of source agnostic data
US-2019102375-A1 · Apr 4, 2019 · US
US12333246B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12333246-B1 |
| Application number | US-202117554761-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 17, 2021 |
| Priority date | Dec 17, 2021 |
| Publication date | Jun 17, 2025 |
| Grant date | Jun 17, 2025 |
Official abstract text for this publication.
A system and method for generating question-answer pairs is disclosed. The system and method can receive a document. A sentence and/or a further sentence in the document may be identified. A syntactic map for the sentence and/or the further sentence may be generated. Noun phrases and prepositional phrases may be identified based on the syntactic map. Sentence level questions may be generated based on phrases identified using natural language processing (NLP) techniques. Document level questions can also be generated based on syntactic maps generated and NLP techniques.
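The syntactic map described in the abstract can be pictured as a graph whose nodes are words and whose edges are grammatical dependencies. The sketch below is illustrative only and is not the patented implementation: the dependency edges are written by hand (a real system would obtain them from a dependency parser), and the helper names `build_map` and `combine_maps` are hypothetical. It also shows one simple way two maps could be connected through the words they share, as claim 1 describes for the combined syntactic map.

```python
from collections import defaultdict


def build_map(edges):
    """Build an undirected adjacency map from (head, dependent) word pairs."""
    graph = defaultdict(set)
    for head, dependent in edges:
        graph[head].add(dependent)
        graph[dependent].add(head)
    return graph


def combine_maps(map_a, map_b):
    """Merge two maps; words present in both become shared nodes linking them."""
    combined = defaultdict(set)
    for graph in (map_a, map_b):
        for word, neighbours in graph.items():
            combined[word] |= neighbours
    common = set(map_a) & set(map_b)
    return combined, common


# Hand-written edges standing in for parser output (an assumption).
# Sentence: "the system receives a document"
first = build_map([
    ("receives", "system"),
    ("system", "the"),
    ("receives", "document"),
    ("document", "a"),
])
# Further sentence: "the document contains sentences"
second = build_map([
    ("contains", "document"),
    ("document", "the"),
    ("contains", "sentences"),
])

combined, common = combine_maps(first, second)
print(sorted(common))                # words shared by both maps
print(sorted(combined["document"]))  # neighbours drawn from both sentences
```

In this toy merge, function words such as "the" also act as connectors; a practical system would likely filter or down-weight them, which this sketch does not attempt.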
Opening claim text (preview).
What is claimed is:

1. A computer implemented method for question-answer pair generation, the method comprising:
receiving, by one or more computing devices, a document;
identifying, by the one or more computing devices, a sentence in the document;
generating, by the one or more computing devices, a syntactic map for the sentence, wherein the syntactic map represents a grammatical structure of the sentence based on dependencies between words in the sentence;
identifying, by the one or more computing devices, a further sentence in the document;
generating, by the one or more computing devices, a further syntactic map for the further sentence;
generating, by the one or more computing devices, a combined syntactic map from the syntactic map and the further syntactic map by connecting the syntactic map and the further syntactic map using common words found in each of the syntactic map and the further syntactic map;
generating, by the one or more computing devices, word vector representations for encoding each word of the sentence and the further sentence, by processing each of the sentence and the further sentence using a Bi-Directional Gated Recurrent Unit (BiGRU) and giving weights to each word of the sentence and the further sentence based on the BiGRU being trained to recognize a relative importance of each word to the sentence and the further sentence based on its part of speech;
generating, by the one or more computing devices, a combined vector representation of each word of the sentence and the further sentence by computing a weighted average based on each of the word vector representations;
generating, by the one or more computing devices, a structurally aware vector representation of the sentence and the further sentence by processing the combined syntactic map and the combined vector representation of each word of the sentence and the further sentence, using a graph attention network (GAT);
generating, by the one or more computing devices, a semantic enriched vector representation of the sentence and the further sentence by processing the structurally aware vector representation of the sentence and the further sentence and the word vector representations for each word of the sentence and the further sentence, using a neural network, wherein the semantic enriched vector representation comprises a value representing the importance of each word in the combined syntactic map;
generating, by the one or more computing devices, document level questions based on the semantic enriched vector representation;
determining, by the one or more computing devices, a cosine similarity between: a document level question from the document level questions generated, and one or more paragraphs of the document;
for a paragraph from the one or more paragraphs determined to have a highest cosine similarity, transmitting, by the one or more computing devices, the document level question and the paragraph to a Question-Answer Model (QA Model);
determining, by the one or more computing devices and using the QA Model, an answer to the document level question from the paragraph; and
post-processing, by the one or more computing devices, the answer to determine whether the answer is redundant, incorrect, or irrelevant based on using a further trained model trained to recognize correct answers based on patterns of previous answers to similarly posed questions and determining which answers are most similar to the answer and discard answers deemed redundant, incorrect, or irrelevant.

2. The method of claim 1, wherein generating the document level questions further comprises:
(a) identifying, by the one or more computing devices, a word from the semantic enriched vector representation, wherein the word is labeled as very important or important;
(b) identifying, by the one or more computing devices, an interrogative word based on the word;
(c) inserting, by the one or more computing devices, the interrogative word as a first word of a document level question;
(d) determining, by the one or more computing devices, what further word to append to the first word based on calculating a probability of what the further word will be;
(e) appending, by the one or more computing devices, to the first word the further word with a highest probability calculated; and
(f) repeating (a)-(e) until a maximum sequence length is reached or an end of a sequence token is generated.

3. The method of claim 1, wherein the QA Model is deployed on a dedicated server as a Flask Application.

4. The method of claim 1, wherein the further trained model is trained using supervised learning techniques.

5. A non-transitory computer readable medium including instructions stored thereon that when executed by one or more processors of a computing system, cause the computing system to perform operations for question-answer pair generation, the operations comprising:
receiving a document;
identifying a sentence in the document;
generating a syntactic map for the sentence, wherein the syntactic map represents a grammatical structure of the sentence based on dependencies between words in the sentence;
identifying a further sentence in the document;
generating a further syntactic map for the further sentence;
generating a combined syntactic map from the syntactic map and the further syntactic map by connecting the syntactic map and the further syntactic map using common words found in each of the syntactic map and the further syntactic map;
generating word vector representations for encoding each word of the sentence and the further sentence, by processing each of the sentence and the further sentence using a Bi-Directional Gated Recurrent Unit (BiGRU) and giving weights to each word of the sentence and the further sentence based on the BiGRU being trained to recognize a relative importance of each word to the sentence and the further sentence based on its part of speech;
generating a combined vector representation of each word of the sentence and the further sentence by computing a weighted average based on each of the word vector representations;
generating a structurally aware vector representation of the sentence and the further sentence by processing the combined syntactic map and the combined vector representation of each word of the sentence and the further sentence, using a graph attention network (GAT);
generating a semantic enriched vector representation of the sentence and the further sentence by processing the structurally aware vector representation of the sentence and the further sentence and the word vector representations for each word of the sentence and the further sentence, using a neural network, wherein the semantic enriched vector representation comprises a value representing the importance of each word in the combined syntactic map;
generating document level questions based on the semantic enriched vector representation;
determining a cosine similarity between: a document level question from the document level questions generated, and one or more paragraphs of the document;
for a paragraph from the one or more paragraphs determined to have a highest cosine similarity, transmitting the document level question and the paragraph to a Question-Answer Model (QA Model);
determining, using the QA Model, an answer to the document level question from the paragraph; and
post-processing the answer to determine whether the answer is redundant, incorrect, or irrelevant based on using a further trained model trained to recognize correct answers based on patterns of previous answers to similarly posed questions and determining which answers are most similar to the answer and discard answers deemed redundant, incorrect, or irrelevant.

6. The non-transitory computer readable medium of claim 5, wherein the opera
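The paragraph-selection step in claims 1 and 5 (scoring each paragraph against a generated question by cosine similarity and forwarding the highest-scoring one to the QA Model) can be sketched as follows. This is a minimal illustration under stated assumptions: the excerpt does not say how the question and paragraphs are turned into vectors, so simple word-count vectors are swapped in here, and the function names `bag_of_words`, `cosine_similarity`, and `best_paragraph` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word-count vector; a stand-in for whatever embedding a real system uses."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def best_paragraph(question, paragraphs):
    """Return (index, score) of the paragraph most similar to the question."""
    q_vec = bag_of_words(question)
    scores = [cosine_similarity(q_vec, bag_of_words(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return best, scores[best]


paragraphs = [
    "the cat sat on the mat",
    "a syntactic map represents grammatical structure",
]
question = "what does a syntactic map represent"
index, score = best_paragraph(question, paragraphs)
print(index, round(score, 3))
```

The selected paragraph and the question would then be passed to the QA Model; that model and the post-processing filter are outside the scope of this sketch.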
Recurrent networks, e.g. Hopfield networks · CPC title
Combinations of networks · CPC title
Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title
Semantic analysis · CPC title
Editing, e.g. inserting or deleting · CPC title