Generating a summary based on readability
US-9727641-B2 · Aug 8, 2017 · US
US10042880B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10042880-B1 |
| Application number | US-201614989098-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jan 6, 2016 |
| Priority date | Jan 6, 2016 |
| Publication date | Aug 7, 2018 |
| Grant date | Aug 7, 2018 |
Official abstract text for this publication.
A machine-learning system analyzes electronic books to determine a “start-of-reading location” (SRL) in each book. Based on this location, when an electronic book is opened on a reading device for the first time, the book can be opened to where a reader is likely to want to start reading, automatically skipping past introductory pages. Books are divided into logical blocks (e.g., title page, foreword, chapters, etc.), and a title portion and a body-text portion are identified in each block. A title classifier attempts to determine whether or not a block should be marked as the SRL. If the score from the title classifier is indefinite, a body-text classifier is used.
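The two-stage decision described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the `Block` type, the threshold values, the function names, the toy scorers, and the choice to return the first accepted block in document order are all hypothetical. The source only says that the title classifier runs first and the body-text classifier is consulted when the title score is indefinite.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Block:
    """One logical block of a book (hypothetical structure)."""
    title: str
    body: str


# Hypothetical thresholds; the source gives no numeric values.
REJECT_BELOW = 0.3   # title score below this: block is not the SRL
ACCEPT_ABOVE = 0.7   # title score above this: block is the SRL
BODY_ACCEPT = 0.5    # body-text score at or above this: block is the SRL


def find_srl(
    blocks: Sequence[Block],
    title_score: Callable[[str], float],
    body_score: Callable[[str], float],
) -> Optional[int]:
    """Return the index of the first block judged to be the SRL, or None."""
    for index, block in enumerate(blocks):
        score = title_score(block.title)
        if score < REJECT_BELOW:
            continue                      # title alone rules the block out
        if score > ACCEPT_ABOVE:
            return index                  # title alone is decisive
        # Indefinite title score: fall back to the body-text classifier.
        if body_score(block.body) >= BODY_ACCEPT:
            return index
    return None


def toy_title_score(title: str) -> float:
    """Stand-in for a trained title classifier (assumption)."""
    lowered = title.lower()
    if any(word in lowered for word in ("copyright", "contents", "dedication")):
        return 0.1
    if lowered.startswith("chapter"):
        return 0.9
    return 0.5


def toy_body_score(body: str) -> float:
    """Stand-in for a trained body-text classifier (assumption)."""
    return 0.8 if len(body.split()) > 5 else 0.2
```

With the toy scorers, a "Copyright" block is rejected on its title alone, a "Chapter 1" block is accepted on its title alone, and a block titled "Prologue" falls in the indefinite band and is decided by its body text.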
Opening claim text (preview).
What is claimed is:
1. A method, comprising: processing an electronic-book (“e-book”) to extract a plurality of blocks, each block constituting a logical entity within the e-book; categorizing text within each block, of the plurality of blocks, as corresponding to a title or to body-text of that block; determining a first set of title features relating to a first title of a first block of the plurality of blocks, including features based on a bag-of-words analysis of the e-book; providing the first set of title features to a title classifier; receiving, from the title classifier, a first title score for the first set of title features; determining, based on the first title score, that the first block is unlikely to be where a hypothetical person reading the e-book would begin reading; determining a second set of title features relating to a second title of a second block of the plurality of blocks, including features based on the bag-of-words analysis of the e-book; providing the second set of title features to the title classifier; receiving, from the title classifier, a second title score for the second set of title features; determining, based on the second title score, that further processing is required to determine whether or not the second block is likely to be where the hypothetical person reading the e-book would begin reading; determining a first set of body-text features relating to first body-text of the second block of the plurality of blocks, including features based on name-entity recognition; providing the first set of body-text features to a body-text classifier; receiving, from the body-text classifier, a first body-text score for the first set of body-text features; determining, based on the first body-text score, that the second block is likely to be where the hypothetical person reading the e-book would begin reading; and annotating metadata of the e-book with a start-of-reading location (“SRL”) indicator, wherein the SRL indicator indicates, to a computing device used to access the e-book, to output the second block upon initially opening the e-book.
2. The method of claim 1, further comprising: performing named-entity recognition on the e-book to generate a list of named entities within the e-book, and a frequency of occurrence of each of the named entities in the e-book, wherein named entities include at least names of people and place names; ranking the named entities based on their frequency of occurrence; and selecting a first named entity based on the ranking, wherein determining the first set of body-text features further includes: identifying occurrences of the first named entity in the body-text of the second block; and calculating a ratio of a number of occurrences of the first named entity in the body-text of the second block to a number of named entities in the list of named entities, the first set of body-text features including the ratio.
3. The method of claim 1, further comprising: comparing words in the body-text of the second block with a list of words that indicate that the second block is likely to occur prior to the SRL, each word in the list being associated with a weight; and calculating a sum of the weight of words in the list that correspond to words occurring in the body-text of the second block, wherein the first set of body-text features includes the sum.
4. A computing system comprising: at least one processor; and a memory including instructions operable to be executed by the at least one processor to configure the computing system to: process a first electronic document to determine a first block and a second block, each block constituting a logical entity within the first electronic document; categorize portions of the first block to identify a first title portion and a first body-text portion; determine a first plurality of features from the first block, wherein the first plurality of features relate, at least in part, to the first body-text portion; provide the first plurality of features from the first block to a first classifier, the first classifier to identify whether the first block is likely to be where a hypothetical person would begin reading the first electronic document; determine, based on a first score output by the first classifier in response to the first plurality of features, that the first block is not likely to be where the hypothetical person would begin reading the electronic document; categorize portions of the second block to identify a second title, a second title portion and a second body-text portion; determine a second plurality of features from a second block, wherein the second plurality of features relate, at least in part, to the second body-text portion; provide the second plurality of features to the first classifier; determine, based on a second score output by the first classifier in response to the second plurality of features, that the second block is likely to be where the hypothetical person would begin reading the first electronic document; and generate data for the first electronic document to indicate a start-of-reading location to a document output device, used to access the first electronic document, to output the second block upon initially opening the first electronic document.
5. The computing system of claim 4, the memory further comprises instructions that further configure the computing system to: determine a third plurality of features from the second block, the third plurality of features relating, at least in part, to the second title portion; provide the third plurality of features to a second classifier, the second classifier to identify whether the second block is likely to be where the hypothetical person would begin reading the first electronic document based on the second title portion; and determine, based on a third score output by the second classifier in response to the third plurality of features, that the second block is not likely to be where the hypothetical person would begin reading the first electronic document, wherein the third score is determined prior to the second plurality of features being provided to the first classifier.
6. The computing system of claim 5, the memory further comprises instructions that further configure the computing system to: process each document in a training set of documents to determine training blocks, each training block constituting a logical entity within the training set; categorize a portion of each training block as being a title portion of the training block; determine a frequency of occurrence of each words appearing in title portions of the training blocks; rank the words based on their frequency of occurrence; and select a set of the words based on their ranking, wherein the instructions to determine the third plurality of features further configure the computing device to: determine which of the words in the set occur in the second title portion and which of the words in the set do not occur in the second title portion, wherein the third plurality of features include an indication of how many of the words in the set do occur in the second title portion, and how many of the words in the set do not occur in the second title portion.
7. The computing system of claim 5, the memory further comprises instructions that further configure the computing system to: process each document in a set of documents to determine training blocks, wherein metadata for each document in the set includes an annotation indicating from where the document should be opened upon initially opening the respective document; categorize portions of each training block, categories for the portions including a title portion and a body-text portion; train the first classifi
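The feature computations recited in claims 2, 3, and 6 above can be illustrated with a small sketch. It is one possible reading of the claim language, not the patentee's code. All function names are hypothetical, the tokenizer is a simple regular expression, named-entity recognition is assumed to have been run elsewhere (the sketch takes a precomputed entity-frequency table), the "first named entity" is assumed to be the top-ranked one, and each listed word is assumed to contribute its weight at most once.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def entity_ratio(body: str, entity_counts: Counter) -> float:
    """Claim 2 reading: occurrences of the top-ranked named entity in the
    block's body text, divided by the number of entities in the list."""
    if not entity_counts:
        return 0.0
    top_entity, _ = entity_counts.most_common(1)[0]
    pattern = r"\b" + re.escape(top_entity) + r"\b"
    occurrences = len(re.findall(pattern, body))
    return occurrences / len(entity_counts)


def weighted_word_sum(body: str, weights: Dict[str, float]) -> float:
    """Claim 3 reading: sum the weights of listed words that occur in the
    body text. Each listed word counts once (assumption)."""
    words = set(tokenize(body))
    return sum(weight for word, weight in weights.items() if word in words)


def select_top_title_words(training_titles: Iterable[str], k: int) -> List[str]:
    """Claim 6 reading: rank words from training titles by frequency and
    keep the k most frequent."""
    counts = Counter(word for title in training_titles for word in tokenize(title))
    return [word for word, _ in counts.most_common(k)]


def title_word_counts(title: str, selected: Iterable[str]) -> Tuple[int, int]:
    """Claim 6 reading: how many selected words occur in the title, and
    how many do not."""
    title_words = set(tokenize(title))
    selected_set = set(selected)
    present = sum(1 for word in selected_set if word in title_words)
    return present, len(selected_set) - present
```

In this reading, a block whose body text mentions the book's most frequent named entity several times yields a higher ratio, while a block containing heavily weighted front-matter words (for example a copyright notice) yields a larger weighted sum. Both values would then be passed to the body-text classifier alongside any other features.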
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces · CPC title
Hierarchical processing, e.g. outlines · CPC title
Handling of whitespace · CPC title
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title