Generating high-level questions from sentences

US10769958B2 · US · B2

Patent metadata
Publication number: US-10769958-B2
Application number: US-201916524798-A
Country: US
Kind code: B2
Filing date: Jul 29, 2019
Priority date: Aug 26, 2014
Publication date: Sep 8, 2020
Grant date: Sep 8, 2020

Abstract

Official abstract text for this publication.

Questions about a passage of text that includes a sequence of two or more sentences are generated. Each question covers the content of a plurality of sentences in the passage, and includes a context portion of the passage and a question statement that is contextually related to the context portion of the passage. A user is also provided with questions about a passage of text they are reading. Each question is presented to the user, where this presentation includes displaying the context portion of the passage and the question statement that is contextually related to the context portion of the passage.
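The abstract describes each question as a pair: a context portion taken from the passage and a question statement related to that context. A minimal sketch of such a record and its display might look like the following. The class name, field names, and the sample sentences are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassageQuestion:
    """Hypothetical container for one generated question."""
    context: str      # context portion copied from the passage
    statement: str    # question statement tied to that context
    answer: str = ""  # optional answer text, not shown with the question


def render(question: PassageQuestion) -> str:
    """Return the text shown to the reader: context first, then the question."""
    return f"{question.context}\n{question.statement}"


# Illustrative data only; these sentences do not come from the patent.
example = PassageQuestion(
    context="The river flooded the valley in spring.",
    statement="What happened as a result?",
    answer="The farmers moved their herds to higher ground.",
)
shown = render(example)
```

Keeping the answer out of `render` reflects the abstract's wording, which says only the context portion and the question statement are displayed.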

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented process for generating questions about a passage including a sequence of two or more sentences, comprising: receiving computer-readable text data representing the passage; counting occurrences of different phrases in the computer-readable text data; ranking the different phrases by frequency of occurrence; selecting a set of topic phrases based on the ranking of the different phrases; operating a discourse relation prediction model previously trained to predict, for each pair of adjacent clauses in the computer-readable text data, a computer-readable discourse relationship; operating a classifier previously trained to: receive the computer-readable text data, the set of topic phrases, and the computer-readable discourse relationship for each of said pair of adjacent clauses in the computer-readable text data, and output a context clause and a focus clause in the computer-readable text data; translating the context clause into a question statement, wherein the question statement has an answer related to the focus clause; and outputting a question based on the question statement.

2. The process of claim 1, further comprising outputting an answer text based on the answer related to the focus clause.

3. The process of claim 1, wherein the classifier is trained on training data including a plurality of annotated passages, wherein each annotated passage includes exemplary computer-readable text data, annotated to indicate a boundary between an exemplary context clause in the computer-readable text data and an exemplary focus clause in the computer-readable text data.

4. The process of claim 1, wherein the classifier is configured to identify, in the computer-readable text data, an explicit discourse marker indicating a boundary between the context clause and the focus clause in the computer-readable text data.

5. The process of claim 4, wherein the explicit discourse marker is a phrase from a finite set of phrases.

6. The process of claim 1, wherein the classifier includes a machine-learning split point boundary classifier previously trained to output a split point boundary location indicating a boundary between the context clause and the focus clause in the computer-readable text data.

7. The process of claim 6, wherein operating the classifier includes: using the machine-learning split point boundary classifier in conjunction with the set of topic phrases and the computer-readable discourse relationship predicted by the discourse relation prediction model, to identify a set of candidate split point boundaries within said passage; using the machine-learning split point boundary classifier to score each of the candidate split point boundaries; selecting one of the candidate split point boundaries having a highest score; and assigning such selected candidate split point boundary to be the split point boundary location indicating the boundary between the context clause and the focus clause in the computer-readable text data.

8. The process of claim 6, wherein the machine-learning split point boundary classifier is trained on training data including a plurality of annotated passages, wherein each annotated passage includes exemplary computer-readable text data, an exemplary set of discourse relations for the exemplary computer-readable text data, an exemplary set of topic phrases for the exemplary computer-readable text data, and a split point label indicating an exemplary split point boundary location for the exemplary computer-readable text data.

9. The process of claim 1, wherein the discourse relation prediction model is trained on training data including a plurality of exemplary adjacent clause pairs, each exemplary adjacent clause pair labelled with an exemplary computer-readable discourse relationship.

10. The process of claim 1, wherein said passage further comprises one or more noun phrases, and selecting the set of topic phrases based on the ranking of the different phrases includes: identifying each noun phrase in said passage; computing coreference of anaphora in said passage and the identified noun phrases; for each identified noun phrase, determining a syntactic role of the identified noun phrase in one or more syntactic units of said passage that the identified noun phrase appears in; determining the frequency of occurrence of each of the identified noun phrases and anaphora referring thereto in said passage; and ranking the identified noun phrases using the syntactic role of each of the identified noun phrases, and the frequency of occurrence of each of the identified noun phrases and anaphora referring thereto.

11. The process of claim 1, wherein said passage includes a sequence of word n-grams, and selecting the set of topic phrases based on the ranking of the different phrases includes: identifying each word n-gram in said passage; determining a frequency of occurrence of each identified word n-gram; for each identified word n-gram, adjusting a corresponding frequency of occurrence to account for a length of the identified word n-gram; and ranking the identified word n-grams according to such adjusted frequency of occurrence.

12. The process of claim 1, wherein the discourse relation prediction model comprises a pre-configured relation template and a pre-trained relation type classifier, and using the discourse relation prediction model to identify the discourse relation between each pair of identified clauses that are adjacent to each other in said passage comprises: whenever the pair of identified clauses that are adjacent to each other in said passage is explicitly connected, using the pre-configured relation template to identify the discourse relationship between said pair; and whenever the pair of the identified clauses that are adjacent to each other in said passage is not explicitly connected, using the pre-trained relation type classifier to identify the discourse relationship between said pair.

13. The process of claim 1, wherein translating the context clause into the question statement includes: using the discourse relationship predicted for each of said pair of adjacent clauses to compute a computed discourse relation that exists at a boundary between the context clause and the focus clause; selecting a question fragment that corresponds to said computed discourse relation; assigning the selected question fragment to be the question statement; and using the text after the boundary to establish the answer related to the focus clause.

14. The process of claim 13, wherein selecting the question fragment that corresponds to said computed discourse relation includes using a pre-configured question template to select the question fragment, said template mapping each possible discourse relation to a specific question fragment corresponding thereto.

15. The process of claim 13, wherein selecting the question fragment that corresponds to said computed discourse relation includes using a pre-trained question type classifier to select said question fragment, said classifier taking into account contextual features of said passage.

16. A system for generating questions about a passage of text, comprising: a logic device; and a storage device holding instructions executable by the logic device to: receive computer-readable text data representing the passage; count occurrences of different phrases in the computer-readable text data; rank the different phrases by frequency of occurrence; select a set of topic phrases based on the ranking of the different phrases; operate a discour
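Claim 11 selects topic phrases by counting word n-grams, adjusting each count for n-gram length, and ranking by the adjusted value. The claim does not say how the adjustment is computed, so the sketch below assumes a simple one: multiply the count by the number of words, and ignore n-grams seen fewer than `min_count` times. The function name, parameters, and tokenizer are all hypothetical.

```python
import re
from collections import Counter


def rank_ngrams(passage, max_n=3, min_count=2, top_k=3):
    """Rank word n-grams by a length-adjusted frequency (assumed formula)."""
    # Naive tokenizer; n-grams may span sentence boundaries in this sketch.
    tokens = re.findall(r"[a-z0-9']+", passage.lower())

    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1

    # Assumed adjustment: raw count times n-gram length, so a repeated
    # two-word phrase outranks its equally frequent single words.
    scored = {
        gram: count * len(gram.split())
        for gram, count in counts.items()
        if count >= min_count
    }
    # Highest adjusted score first; ties broken alphabetically.
    ranked = sorted(scored, key=lambda gram: (-scored[gram], gram))
    return ranked[:top_k]


topics = rank_ngrams(
    "Solar panels convert sunlight. Solar panels need sunlight. Sunlight is free."
)
```

In this illustrative passage the repeated bigram "solar panels" scores 4 and ranks ahead of "sunlight", which occurs three times but is a single word. Claim 10 describes a different route that uses noun phrases, coreference, and syntactic role, which this sketch does not attempt.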
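Claims 4, 5, and 12 to 14 describe a template path: an explicit discourse marker from a finite set of phrases marks the boundary between context and focus, a relation template names the discourse relation, and a question template maps that relation to a question fragment, with the text after the boundary supplying the answer. The sketch below covers only that explicit-marker path. The marker list, relation names, question fragments, and sample passage are invented for illustration; the trained classifiers the claims describe for implicit relations and split-point scoring are not modelled.

```python
import re

# Hypothetical relation template: explicit marker -> discourse relation.
MARKER_RELATIONS = {
    "as a result": "result",
    "because": "cause",
    "however": "contrast",
}

# Hypothetical question template: discourse relation -> question fragment.
QUESTION_FRAGMENTS = {
    "result": "What was the result?",
    "cause": "Why?",
    "contrast": "What contrasts with this?",
}


def generate_question(passage):
    """Split the passage at the first explicit marker and build a question."""
    # Try longer markers first so multi-word phrases are not cut short.
    markers = sorted(MARKER_RELATIONS, key=len, reverse=True)
    pattern = "|".join(re.escape(marker) for marker in markers)
    match = re.search(rf"\b({pattern})\b,?\s*", passage, flags=re.IGNORECASE)
    if match is None:
        # No explicit marker: the claims hand this case to a trained
        # relation type classifier, which is outside this sketch.
        return None

    relation = MARKER_RELATIONS[match.group(1).lower()]
    return {
        "relation": relation,
        "context": passage[:match.start()].strip(),  # text before the boundary
        "question": QUESTION_FRAGMENTS[relation],
        "answer": passage[match.end():].strip(),     # text after the boundary
    }


result = generate_question(
    "The dam failed overnight. As a result, the town was evacuated."
)
```

A lookup table keeps the mapping easy to inspect, which matches the "pre-configured" wording in claims 12 and 14. Claim 15 describes a trained question type classifier as an alternative to the table.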

Assignees

  • Microsoft Technology Licensing, LLC

Inventors

Classifications

  • G06F16/3329

    Natural language query formulation · CPC title

  • G09B7/00 · Primary

    Electrically-operated teaching apparatus or devices working with questions and answers (mechanically operated G09B3/00; computing arrangements G06F) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10769958B2 cover?
Questions about a passage of text that includes a sequence of two or more sentences are generated. Each question covers the content of a plurality of sentences in the passage, and includes a context portion of the passage and a question statement that is contextually related to the context portion of the passage. A user is also provided with questions about a passage of text they are reading. E…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/3329. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 8, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).