Generating a summary based on readability

US9727641B2 · US · B2

Patent metadata
Publication number: US-9727641-B2
Application number: US-201313870267-A
Country: US
Kind code: B2
Filing date: Apr 25, 2013
Priority date: Apr 25, 2013
Publication date: Aug 8, 2017
Grant date: Aug 8, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A technique to generate a summary of a set of sentences. Each sentence in the set can be evaluated based on a criterion, such as informativeness of the sentence. The sentences may also be evaluated for readability based on a readability measure. Sentences can be selected for inclusion in the summary based on the evaluations.
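
The abstract does not say how either score is computed. The claims list candidate factors: words per sentence, syllables per word, vocabulary frequency, and reading lists for readability; similarity to other sentences and tf-idf key words for informativeness. The Python sketch below shows one way to compute both scores under stated assumptions. It uses Flesch Reading Ease as the readability measure (a standard formula built from words per sentence and syllables per word) and a crude vowel-group syllable heuristic. For informativeness it uses a sentence's mean tf-idf cosine similarity to the other sentences. These choices and all function names are illustrative assumptions, not the patent's own implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

WORD_RE = re.compile(r"[A-Za-z']+")
VOWEL_GROUP_RE = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Heuristic: one syllable per vowel group, minus a silent final 'e'."""
    w = word.lower().strip("'")
    groups = len(VOWEL_GROUP_RE.findall(w))
    if w.endswith("e") and not w.endswith("le") and groups > 1:
        groups -= 1  # "make" -> 1, but "little" keeps 2
    return max(1, groups)


def readability(text: str) -> float:
    """Flesch Reading Ease (assumed measure): higher means easier to read."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    n_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / n_sentences)
            - 84.6 * (n_syllables / len(words)))


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    """Treat each sentence as a document: weight = term frequency * smoothed idf."""
    bags = [Counter(w.lower() for w in WORD_RE.findall(s)) for s in sentences]
    n = len(bags)
    df = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values()) or 1
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in bag.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(sentences: List[str]) -> List[Tuple[float, float]]:
    """Return (informativeness, readability) for each sentence.

    Informativeness is the mean tf-idf cosine similarity to the other
    sentences, an assumed stand-in for the patent's measure.
    """
    vectors = tfidf_vectors(sentences)
    scores = []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, other) for j, other in enumerate(vectors) if j != i]
        info = sum(sims) / len(sims) if sims else 0.0
        scores.append((info, readability(sentences[i])))
    return scores


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "The cat ate the fish.",
            "Quantum chromodynamics characterizes interactions."]
    for sentence, (info, read) in zip(docs, score_sentences(docs)):
        print(f"{info:.3f}  {read:8.2f}  {sentence}")
```

In this toy input, the third sentence shares no words with the others, so its informativeness is zero. Its long, polysyllabic words also give it a much lower readability score than the first two.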

First claim

Opening claim text (preview).

What is claimed is:

1. A method executed by a computer system, comprising: extracting a set of sentences from a digital document; scoring each sentence of the set of sentences using a respective informativeness measure; scoring each sentence of the set of sentences using a readability measure, wherein the readability measure is based at least in part on one of: a number of words in the sentence, a number of syllables per word, a frequency of a word based on a vocabulary frequency, a frequency of a word based on context, or if words of the sentence appear on a reading list; selecting selected sentences in the set of sentences based on the readability measures and informativeness measures, wherein the selecting comprises: determining a subset of sentences from the set of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences; identifying a low readability, high informativeness sentence from the set of sentences, wherein: a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and a high informativeness sentence includes greater similarity to other sentences in the set of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words; generating a concatenated sentence by concatenating at least one contextual sentence with the low readability, high informativeness sentence, wherein the concatenated sentence has a higher readability than the low readability, high informativeness sentence; and generating a readable summary of the digital document, the readable summary including the concatenated sentence and the selected sentences.

2. The method of claim 1, wherein the contextual sentence comprises a sentence preceding or following the identified low readability, high informativeness sentence in the digital document.

3. The method of claim 1, wherein the selected sentences are selected using a linear program optimization that maximizes informativeness and readability of the readable summary as measured by the informativeness measures and the readability measures of the sentences in the set of sentences.

4. The method of claim 1, further comprising: computing a readability measure of the concatenated sentence; and including the concatenated sentence in the readable summary in response to the readability measure of the concatenated sentence satisfying a specified criterion.

5. The method of claim 4, wherein the specified criterion comprises a specified threshold, and including the concatenated sentence in the readable summary is in response to the readability measure of the concatenated sentence exceeding the specified threshold.

6. The method of claim 4, wherein the specified criterion comprises a threshold amount greater than a readability measure of the low readability, high informativeness sentence, and including the concatenated sentence in the readable summary is in response to the readability measure of the concatenated sentence exceeding the readability measure of the low readability, high informativeness sentence by greater than the threshold amount.

7. A system comprising: a processor; and a non-transitory storage medium storing instructions executable on the processor to: extract a plurality of sentences from a digital document; identify sentences from the plurality of sentences for inclusion in a summary of the digital document based on a criterion; evaluate a readability of the identified sentences using respective readability measures, wherein each readability measure assigned to each sentence is based at least in part on one of: a number of words in the sentence, a number of syllables per word, a frequency of a word based on a vocabulary frequency, a frequency of a word based on context, or if words of the sentence appear on a reading list; select sentences based in part on the evaluated readability of the identified sentences, wherein the selecting comprises: determining a subset of sentences from the plurality of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences; add a low readability, high informativeness sentence to at least one of the selected sentences to create a concatenated sentence, wherein the concatenated sentence has a higher readability than the low readability, high informativeness sentence, and wherein: a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and a high informativeness sentence includes greater similarity to other sentences in the plurality of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words.

8. The system of claim 7, wherein the instructions are executable on the processor to assign an informativeness measure to each sentence of the plurality of sentences, wherein the identifying is based on the informativeness measures.

9. The system of claim 8, wherein the criterion is informativeness.

10. The system of claim 7, wherein the instructions are executable on the processor to: compute a readability measure of the concatenated sentence; and include the concatenated sentence in the summary in response to the readability measure of the concatenated sentence satisfying a specified criterion.

11. A non-transitory computer readable storage medium storing instructions that when executed cause a computer system to: assign a respective informativeness measure to each sentence of a set of sentences in a digital document; assign a respective readability measure to each sentence of the set of sentences; select selected sentences in the set of sentences based on the readability measures and informativeness measures, wherein the selecting comprises: determining a subset of sentences from the set of sentences, wherein the sentences in the subset of sentences have informativeness measures greater than a threshold, and selecting, from the subset of sentences, the selected sentences based on a ranking of the sentences in the subset of sentences according to readability measures of the sentences in the subset of sentences, wherein the selected sentences are the top ranked sentences in the subset of sentences; identify a low readability, high informativeness sentence from the set of sentences, wherein: a low readability sentence includes at least one of fewer syllables per word, fewer words on a reading list, or a lower frequency of words associated with a vocabulary frequency list; and a high informativeness sentence includes greater similarity to other sentences in the set of sentences and more words having term frequency-inverse document frequency (tf-idf) values indicating that the words are key words; generate a concatenated sentence by concatenating at least one contextual sentence onto the low readability, high informativeness
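
Claims 1, 2, and 6 describe the selection and concatenation steps more concretely than the abstract does. The Python sketch below illustrates them under several assumptions. Readability is scored with Flesch Reading Ease, one formula built from words per sentence and syllables per word. The informativeness scores are supplied by the caller. "Top ranked sentences" is read as "the top `k`", where `k` is a hypothetical parameter. The contextual sentence is taken to be the preceding sentence, or the following one when the hard sentence comes first. All function names are illustrative and are not taken from the patent.

```python
import re
from typing import List, Optional, Sequence

WORD_RE = re.compile(r"[A-Za-z']+")
VOWEL_GROUP_RE = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Heuristic: one syllable per vowel group, minus a silent final 'e'."""
    w = word.lower().strip("'")
    groups = len(VOWEL_GROUP_RE.findall(w))
    if w.endswith("e") and not w.endswith("le") and groups > 1:
        groups -= 1
    return max(1, groups)


def readability(text: str) -> float:
    """Flesch Reading Ease (assumed measure); higher means easier to read."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    n_syllables = sum(count_syllables(w) for w in words)
    return (206.835 - 1.015 * len(words) / n_sentences
            - 84.6 * n_syllables / len(words))


def select_sentences(info: Sequence[float], read: Sequence[float],
                     threshold: float, k: int) -> List[int]:
    """Keep sentences whose informativeness exceeds `threshold`, rank that
    subset by readability, and return the indices of the top k."""
    subset = [i for i, score in enumerate(info) if score > threshold]
    ranked = sorted(subset, key=lambda i: read[i], reverse=True)
    return sorted(ranked[:k])  # back to document order for the summary


def concatenate_with_context(sentences: List[str], idx: int,
                             margin: float) -> Optional[str]:
    """Join sentence `idx` with its preceding sentence (or the following one
    if `idx` is first). Return the joined text only if its readability beats
    the original sentence's by more than `margin`, else None."""
    if len(sentences) < 2:
        return None
    if idx > 0:
        combined = f"{sentences[idx - 1]} {sentences[idx]}"
    else:
        combined = f"{sentences[idx]} {sentences[idx + 1]}"
    if readability(combined) > readability(sentences[idx]) + margin:
        return combined
    return None
```

Because Flesch Reading Ease averages over every sentence in a passage, pairing a dense sentence with a short, plain neighbour raises the combined score. That is the effect claim 1 requires of the concatenated sentence. Claim 3 describes an alternative selection step, a linear program that jointly maximizes informativeness and readability. This sketch does not model it.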

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9727641B2 cover?
A method, system, and storage medium for summarizing a digital document. Each sentence is scored for informativeness and readability, and the most readable of the sufficiently informative sentences are selected. A hard-to-read but highly informative sentence is concatenated with a contextual sentence (for example, the preceding or following one) so that it reads more easily before it is added to the summary.
Who is the assignee on this patent?
Hewlett Packard Development Co Lp, Entit Software Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/30719. Mapped technology areas include Physics.
When was this patent published?
Published Aug 8, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page: publications linked to this one by citation within our corpus, or sharing its primary CPC classification.