Question and answer pair generation using machine learning
US-2019228099-A1 · Jul 25, 2019 · US
US11080598B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11080598-B2 |
| Application number | US-201815979855-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 15, 2018 |
| Priority date | May 15, 2018 |
| Publication date | Aug 3, 2021 |
| Grant date | Aug 3, 2021 |
Official abstract text for this publication.
In an example embodiment, factual question generation from freeform content is achieved through semantic role labeling and recurrent neural networks (RNNs). Specifically, semantic role labeling is used to identify an answer phrase so that it can be replaced with an appropriate question word. RNNs are then used to extract triples (Subject-Object-Predicate) from the sentence, and each of these triples can be used as an answer phrase/word. An RNN is then fed with training data to generate the questions more efficiently.
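The answer-phrase replacement described in the abstract can be illustrated with a minimal sketch. It assumes the answer phrase and a coarse role label have already been produced by a semantic role labeler. The label names (`PERSON`, `LOCATION`, `TIME`), the `QUESTION_WORDS` table, and the function `replace_answer_phrase` are hypothetical and do not come from the patent.

```python
# Hypothetical mapping from coarse role labels to question words.
# The abstract does not name a label set; these labels are assumptions.
QUESTION_WORDS = {
    "PERSON": "Who",
    "LOCATION": "Where",
    "TIME": "When",
}


def replace_answer_phrase(sentence: str, answer_phrase: str, role: str) -> str:
    """Swap a labeled answer phrase for a question word (toy version).

    Only handles the easy case where the answer phrase is the subject,
    so no verb movement is needed to form a grammatical question.
    """
    if answer_phrase not in sentence:
        raise ValueError("answer phrase not found in sentence")
    # Fall back to "What" when the role label is not in the table.
    question_word = QUESTION_WORDS.get(role, "What")
    body = sentence.replace(answer_phrase, question_word, 1).rstrip(". ")
    # Capitalize only the first character and end with a question mark.
    return body[0].upper() + body[1:] + "?"


if __name__ == "__main__":
    print(replace_answer_phrase(
        "Marie Curie discovered polonium in 1898.", "Marie Curie", "PERSON"))
```

This sketch only covers subject-position answers. Answer phrases elsewhere in the sentence would also need the kind of verb-complex and syntactic transformations that the claims describe, which are not attempted here.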
Opening claim text (preview).
What is claimed is:
1. A system comprising: a memory; and a computer-readable medium having instructions stored thereon, which, when executed by a processor, cause the system to perform operations comprising: obtain computerized textual input; tokenize the computerized textual input into a plurality of sentences; identify keywords in the plurality of sentences; generate a summary of the computerized textual input by automatically selecting a predetermined percentage of sentences from the plurality of sentences using the identified keywords; for each sentence in the summary: for each keyword in the sentence: replace the keyword with a gap; use a semantic-based approach to transform the gap-filled sentence into a question; and use a recurrent neural network approach to transform the gap-filled sentence into a question; compute an intersection between questions generated using the semantic-based approach and questions generated using the recurrent neural network approach; and cause one or more of the questions in the intersection to be presented in a graphical user interface.
2. The system of claim 1, wherein the generating the summary comprises: generating a frequency map of words in the plurality of sentences; filtering out words in the frequency map; identifying a number of non-filtered-out words in the frequency map having a highest frequency as keywords; ranking each of the plurality of sentences based on a number of keywords contained in each of the plurality of sentences; and generating the summary by selecting the predetermined percentage of sentences from the plurality of sentences having the highest rank.
3. The system of claim 1, wherein the instructions further cause the system to: perform co-reference resolution on the plurality of sentences by, for each sentence of the plurality of sentences: parsing the sentence using a natural language processing (NLP) parser to identify pronouns and noun phrases; identifying attributes associated with each pronoun and noun phrase, the attributes including singularity, living/non-living status, and gender, with values assigned to each attribute for each pronoun or noun phrase being selected from a fixed value, a wild-card value, and a non-applicable value; and for each pronoun, mapping the pronoun to a referent noun phrase by identifying a closest noun phrase having matching attributes.
4. The system of claim 1, wherein the instructions further cause the system to identify a pronoun as expletive if it has no noun phrase having matching attributes within two sentences.
5. The system of claim 1, wherein a wild-card value for an attribute is considered a limited matching attribute for a particular noun phrase, and the mapping includes mapping the pronoun to the referent noun phrase by identifying a closest noun phrase having limited matching attributes if there are no noun phrases having matching attributes within two sentences.
6. The system of claim 1, wherein the graphical user interface is designed to present one or more questions in the intersection to a user and solicit answers, wherein incorrect answers by users identified as high confidence are passed to a recurrent neural network used in the recurrent neural network approach to retrain the recurrent neural network.
7. The system of claim 1, wherein the using the semantic-based approach includes: performing semantic role labeling on the sentence to identify potential answer phrases; for each potential answer phrase, identifying a corresponding verb complex; applying one or more transformations to the sentence based on a predicate and target identified by the semantic role labeling; and applying one or more syntactic transformation rules to the verb complex of the sentence to transform the sentence into a question.
8. A method comprising: obtaining computerized textual input; tokenizing the computerized textual input into a plurality of sentences; identifying keywords in the plurality of sentences; generating a summary of the computerized textual input by automatically selecting a predetermined percentage of sentences from the plurality of sentences using the identified keywords; for each sentence in the summary: for each keyword in the sentence: replacing the keyword with a gap; using a semantic-based approach to transform the gap-filled sentence into a question; and using a recurrent neural network approach to transform the gap-filled sentence into a question; computing an intersection between questions generated using the semantic-based approach and questions generated using the recurrent neural network approach; and causing one or more of the questions in the intersection to be presented in a graphical user interface.
9. The method of claim 8, wherein the generating the summary comprises: generating a frequency map of words in the plurality of sentences; filtering out words in the frequency map; identifying a number of non-filtered-out words in the frequency map having a highest frequency as keywords; ranking each of the plurality of sentences based on a number of keywords contained in each of the plurality of sentences; and generating the summary by selecting the predetermined percentage of sentences from the plurality of sentences having the highest rank.
10. The method of claim 8, further comprising: performing co-reference resolution on the plurality of sentences by, for each sentence of the plurality of sentences: parsing the sentence using a natural language processing (NLP) parser to identify pronouns and noun phrases; identifying attributes associated with each pronoun and noun phrase, the attributes including singularity, living/non-living status, and gender, with values assigned to each attribute for each pronoun or noun phrase being selected from a fixed value, a wild-card value, and a non-applicable value; and for each pronoun, mapping the pronoun to a referent noun phrase by identifying a closest noun phrase having matching attributes.
11. The method of claim 8, further comprising identifying a pronoun as expletive if it has no noun phrase having matching attributes within two sentences.
12. The method of claim 8, wherein a wild-card value for an attribute is considered a limited matching attribute for a particular noun phrase, and the mapping includes mapping the pronoun to the referent noun phrase by identifying a closest noun phrase having limited matching attributes if there are no noun phrases having matching attributes within two sentences.
13. The method of claim 8, wherein the graphical user interface is designed to present one or more questions in the intersection to a user and solicit answers, wherein incorrect answers by users identified as high confidence are passed to a recurrent neural network used in the recurrent neural network approach to retrain the recurrent neural network.
14. The method of claim 8, wherein the using the semantic-based approach includes: performing semantic role labeling on the sentence to identify potential answer phrases; for each potential answer phrase, identifying a corresponding verb complex; applying one or more transformations to the sentence based on a predicate and target identified by the semantic role labeling; and applying one or more syntactic transformation rules to the verb complex of the sentence to transform the sentence into a question.
15. A non-transitory machine-readable storage medium comprising instructions which, when implemented by one or more machines, cause the one or more machines to perform operations comprising: obtaining computerized text
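The summary-generation steps recited in claims 2 and 9, together with the gap replacement and intersection steps of claims 1 and 8, might be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation. The stop-word list, the regex-based tokenizers, counting distinct keywords per sentence, the whitespace and case normalization used for the intersection, and all function names are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list standing in for the claimed "filtering out words".
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "it", "was"}


def split_sentences(text: str) -> list[str]:
    """Naive sentence tokenizer: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence: str) -> list[str]:
    """Lowercased word tokens of a sentence."""
    return re.findall(r"[a-z']+", sentence.lower())


def top_keywords(sentences: list[str], num_keywords: int) -> list[str]:
    """Build a frequency map, drop stop words, keep the most frequent words."""
    freq = Counter(w for s in sentences for w in words(s) if w not in STOP_WORDS)
    return [w for w, _ in freq.most_common(num_keywords)]


def summarize(sentences: list[str], keywords: list[str], fraction: float) -> list[str]:
    """Rank sentences by distinct keywords contained; keep the top fraction."""
    keyset = set(keywords)
    count = max(1, int(len(sentences) * fraction))
    # sorted() is stable, so tied sentences keep their original order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -len(keyset & set(words(sentences[i]))))
    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(ranked[:count])]


def gap_fill(sentence: str, keyword: str) -> str:
    """Replace the first whole-word occurrence of the keyword with a gap."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.sub(pattern, "_____", sentence, count=1, flags=re.IGNORECASE)


def intersect_questions(semantic_qs: list[str], rnn_qs: list[str]) -> list[str]:
    """Keep questions produced by both generators (assumed normalization)."""
    def normalize(q: str) -> str:
        return " ".join(q.lower().split())
    rnn_norm = {normalize(q) for q in rnn_qs}
    return [q for q in semantic_qs if normalize(q) in rnn_norm]


if __name__ == "__main__":
    text = ("Neural networks learn patterns. Neural networks need data. "
            "The weather is nice. Data drives learning.")
    sents = split_sentences(text)
    keys = top_keywords(sents, 3)
    for sentence in summarize(sents, keys, 0.5):
        print(sentence)
```

The two question generators themselves (the semantic-based approach and the recurrent neural network approach) are outside the scope of this sketch; `intersect_questions` simply takes their outputs as lists of strings.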
Knowledge-based neural networks; Logical representations of neural networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Learning methods · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title