Scalable and effective document summarization framework

US10042924B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10042924-B2
Application number: US-201615019646-A
Country: US
Kind code: B2
Filing date: Feb 9, 2016
Priority date: Feb 9, 2016
Publication date: Aug 7, 2018
Grant date: Aug 7, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and apparatuses are disclosed for adaptively generating a summary of web-based content based on an attribute of a mobile communication device having transmitted a request for the web-based content. By adaptively generating the summary based on an attribute of the mobile communication device such as an amount of visual space available or a number of characters permitted in the interface, a display of the web-based content may be controlled on the mobile communication device in a way that was not previously available. This enables control of displaying web-based content that has been adaptively generated to be displayed on limited display screens based on a learned attribute of the mobile communication device requesting the web-based content.
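
As a rough illustration of the adaptive step the abstract describes, the sketch below derives a target summary length from attributes of the requesting device. The `DeviceProfile` fields, the `target_summary_length` function, and the 280-character default are hypothetical names and values chosen for this example; none of them comes from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceProfile:
    """Hypothetical attributes reported by the requesting device."""
    max_chars: Optional[int] = None       # characters the interface permits
    visible_lines: Optional[int] = None   # lines of text that fit on screen
    chars_per_line: Optional[int] = None  # characters that fit on one line


def target_summary_length(profile: DeviceProfile, default: int = 280) -> int:
    """Return a target summary length in characters (illustrative rule only)."""
    limits = []
    if profile.max_chars is not None:
        limits.append(profile.max_chars)
    if profile.visible_lines is not None and profile.chars_per_line is not None:
        limits.append(profile.visible_lines * profile.chars_per_line)
    # Use the tightest known limit; fall back to the default when nothing is known.
    return min(limits) if limits else default
```

Taking the minimum of the known limits is one possible design choice: it keeps the summary within both the interface's character cap and the visible screen area when both are reported.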

First claim

Opening claim text (preview).

What is claimed is:

1. A summarization engine comprising: a memory configured to store a document including textual information; an interface configured to receive a viewing request from a communication device, the viewing request corresponding to the document; and a processor configured to: communicate with the memory and interface; in response to receiving the viewing request, determine a target summary length, wherein the target summary length identifies a targeted length for a generated summary; extract the textual information from the document; parse the textual information; identify a plurality of sentence structures from the textual information based on the parsing; assign each sentence structure a weighted score based on the target summary length; generate, in accordance with a summarization policy, candidate summaries to include one or more sentence structures from the plurality of sentence structures; determine a candidate score for each candidate summary as a linear function of the scores assigned to each sentence structure included in the respective candidate summary; learn coefficients of the linear function from a training dataset of documents and authored summaries via a predetermined learning algorithm; and select, from the generated candidate summaries, a generated summary determined to have the highest candidate score under the linear function.

2. The summarization engine of claim 1, wherein the processor is configured to generate the weighted score by: determining a position of each sentence structure within the document; and generating the weighted score of each sentence structure based on the determined position of each corresponding sentence structure.

3. The summarization engine of claim 1, wherein the processor is configured to generate the weighted score by: determining a content of each sentence structure within the document; and generating the weighted score of each sentence structure based on the determined content of each corresponding sentence structure.

4. The summarization engine of claim 1, wherein the processor is configured to generate the weighted score by: identifying one or more tokens from each sentence structure, wherein each identified token corresponds to a token type; determining a token type of each identified token; and generating the weighted score of each sentence structure based on the determined token type of each identified token of each corresponding sentence structure.

5. The summarization engine of claim 1, wherein the processor is configured to generate the weighted score by: identifying one or more lexical cues from each sentence structure; and generating the weighted score of each sentence structure based on the identified lexical cues of each corresponding sentence structure.

6. The summarization engine of claim 1, wherein the target summary length is included in the viewing request; and wherein the processor is configured to determine the target summary length by extracting the target summary length from the viewing request.

7. The summarization engine of claim 6, wherein the target summary length corresponds to an attribute of the communication device transmitting the viewing request.

8. A method for generating a summary of a document, the method comprising: receiving, through an interface, a viewing request from a communication device, the viewing request corresponding to a document including textual information stored on a memory; in response to receiving the viewing request, determining a target summary length; extracting the textual information from the document; parsing the textual information; identifying a plurality of sentence structures from the textual information based on the parsing; assigning each sentence structure a weighted score based on the target summary length; generating, in accordance with a summarization policy, candidate summaries to include one or more sentence structures from the plurality of sentence structures; determining a candidate score for each candidate summary as a linear function of the scores assigned to each sentence structure included in the respective candidate summary; learning coefficients of the linear function from a training dataset of documents and authored summaries via a predetermined learning algorithm; and selecting, from the generated candidate summaries, a generated summary determined to have the highest candidate score under the linear function.

9. The method of claim 8, further comprising generating the weighted score by: determining a position of each sentence structure within the document; and generating the weighted score of each sentence structure based on the determined position of each corresponding sentence structure.

10. The method of claim 8, further comprising generating the weighted score by: determining a content of each sentence structure within the document; and generating the weighted score of each sentence structure based on the determined content of each corresponding sentence structure.

11. The method of claim 8, further comprising generating the weighted score by: identifying one or more tokens from each sentence structure, wherein each identified token corresponds to a token type; determining a token type of each identified token; and generating the weighted score of each sentence structure based on the determined token type of each identified token of each corresponding sentence structure.

12. The method of claim 8, wherein assigning each sentence structure the weighted score based on the analysis and the target summary length comprises: scoring each sentence structure as it appears in the document; and scoring each sentence structure as it appears in a partial summary including one or more sentence structures already selected from the document.

13. The method of claim 8, wherein the predetermined learning algorithm is a structured perception.

14. The method of claim 8, wherein generating, in accordance with the summarization policy, the candidate summaries comprises: considering each sentence structure included in the document; revising each sentence score corresponding to a considered sentence structure; and adding a highest-scoring sentence structure to the candidate summary such that the target summary length is not exceeded.

15. The method of claim 8, wherein generating, in accordance with the summarization policy, the candidate summaries comprises: considering each sentence structure in the document that appears after every sentence structure in the partial summary; revising each sentence score corresponding to a considered sentence structure; and adding a highest-scoring sentence structure to the candidate summary such that the target summary length is not exceeded.

16. The method of claim 8, wherein generating, in accordance with the summarization policy, the candidate summaries comprises: considering each sentence structure in the document that appears at a beginning or an end of a paragraph in the document; revising each sentence score corresponding to a considered sentence structure; and adding a highest-scoring sentence to the candidate summary such that the target summary length is not exceeded.

17. A method for generating a summary of a document, the method comprising: receiving, through an interface, a viewing request from a communication device, the viewing request corresponding to a document including textual information stored on a memory; in response to receiving the viewing request, determining a target summary length; ext
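
The claimed flow (score sentences, build candidate summaries under a policy, score each candidate linearly, keep the best) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two features (`position`, `cue`), the phrase "in summary" as a lexical cue, the hand-set weights, the character-based length measure, and all function names are hypothetical. The candidate score here is simply the sum of the sentence scores, which is one possible linear function.

```python
from typing import Callable, Dict, List, Sequence

Weights = Dict[str, float]


def sentence_score(index: int, sentence: str, total: int, weights: Weights) -> float:
    """Score one sentence from two hypothetical features: position and a lexical cue."""
    position = 1.0 - index / total  # earlier sentences rank higher
    cue = 1.0 if "in summary" in sentence.lower() else 0.0
    return weights["position"] * position + weights["cue"] * cue


def greedy_candidate(
    sentences: Sequence[str],
    scores: Sequence[float],
    target_length: int,
    allowed: Callable[[int, List[int]], bool],
) -> List[int]:
    """Build one candidate by repeatedly adding the best allowed sentence that fits."""
    chosen: List[int] = []
    used = 0  # characters used so far, ignoring the joining spaces
    while True:
        options = [
            i for i in range(len(sentences))
            if i not in chosen
            and allowed(i, chosen)
            and used + len(sentences[i]) <= target_length
        ]
        if not options:
            return sorted(chosen)  # restore document order
        best = max(options, key=lambda i: scores[i])
        chosen.append(best)
        used += len(sentences[best])


def any_sentence(i: int, chosen: List[int]) -> bool:
    """Policy: every sentence in the document may be considered."""
    return True


def forward_only(i: int, chosen: List[int]) -> bool:
    """Policy: only sentences after everything already in the partial summary."""
    return all(i > j for j in chosen)


def summarize(sentences: Sequence[str], weights: Weights, target_length: int) -> str:
    """Generate one candidate per policy and return the highest-scoring one."""
    total = len(sentences)
    scores = [sentence_score(i, s, total, weights) for i, s in enumerate(sentences)]
    candidates = [
        greedy_candidate(sentences, scores, target_length, policy)
        for policy in (any_sentence, forward_only)
    ]
    # Candidate score: the sum of its sentence scores (a linear function of them).
    best = max(candidates, key=lambda cand: sum(scores[i] for i in cand))
    return " ".join(sentences[i] for i in best)
```

The claims describe learning the coefficients from documents paired with authored summaries; in this sketch they are fixed by hand, for example `summarize(sentences, {"position": 1.0, "cue": 2.0}, 50)`.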

Assignees

Yahoo Holdings Inc, Oath Inc

Inventors

Classifications

  • Discourse or dialogue representation · CPC title

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06F16/345 · Primary

    Summarisation for human users · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10042924B2 cover?
Systems, methods, and apparatuses are disclosed for adaptively generating a summary of web-based content based on an attribute of a mobile communication device having transmitted a request for the web-based content. By adaptively generating the summary based on an attribute of the mobile communication device such as an amount of visual space available or a number of characters permitted in the …
Who is the assignee on this patent?
Yahoo Holdings Inc, Oath Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/345. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 7, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).