Extractive query-focused multi-document summarization
US-10019525-B1 · Jul 10, 2018 · US
US10885281B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10885281-B2 |
| Application number | US-201816212194-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 6, 2018 |
| Priority date | Dec 6, 2018 |
| Publication date | Jan 5, 2021 |
| Grant date | Jan 5, 2021 |
Official abstract text for this publication.
A mechanism is provided to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings. Responsive to receiving a query from a user for a summarization of the identified natural language document, the summarization mechanism produces a hyperbolic embedding model of embeddings of the query. The summarization mechanism compares the embeddings of the query to each of a set of embeddings associated with a set of sentences of the identified natural language document. Responsive to identifying a subset of embeddings associated with the set of sentences of the identified natural language document having a semantic specificity to a subset of embeddings associated with the query, the summarization mechanism adds the sentence to a summary of the identified natural language document. The summarization mechanism then outputs the summary to the user.
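The selection step described in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: it assumes embeddings are points in the Poincaré ball model of hyperbolic space, treats a sentence embedding as a "match" when it lies within a hypothetical `max_distance` of any query embedding, and uses made-up names (`poincare_distance`, `count_matches`, `summarize`). The abstract does not say how embeddings are compared; the distance formula is the standard one for the Poincaré ball.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

def count_matches(query_embs, sentence_embs, max_distance):
    """Count sentence embeddings within max_distance of any query embedding.

    max_distance is a hypothetical matching radius, not a value from the source.
    """
    return sum(
        1
        for s in sentence_embs
        if any(poincare_distance(q, s) <= max_distance for q in query_embs)
    )

def summarize(query_embs, sentences, specificity_threshold, max_distance=1.0):
    """Keep sentences whose match count meets the specificity threshold.

    sentences is a list of (text, [embedding, ...]) pairs in document order.
    """
    summary = []
    for text, embs in sentences:
        if count_matches(query_embs, embs, max_distance) >= specificity_threshold:
            summary.append(text)
    return summary

# Toy data: 2-D points chosen by hand, not trained embeddings.
query = [(0.1, 0.0)]
doc = [
    ("Sentence A", [(0.1, 0.05), (0.12, 0.0)]),
    ("Sentence B", [(0.9, 0.0)]),
]
result = summarize(query, doc, specificity_threshold=2)
```

With this toy data, both embeddings of the first sentence fall inside the matching radius and the single embedding of the second does not, so only the first sentence is kept.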
Opening claim text (preview).
What is claimed is:
1. A method, in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions that are executed by the at least one processor to cause the at least one processor to be configured to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings, the method comprising: responsive to receiving a query from a user for a summarization of the identified natural language document, producing, by the summarization mechanism, a hyperbolic embedding model of first embeddings of the query; comparing, by the summarization mechanism, the first embeddings to each second embedding of a set of second embeddings associated with a set of sentences of the identified natural language document; for at least one second embedding of at least one portion of a sentence in the set of sentences, determining, by the summarization mechanism, whether the at least one second embedding has a semantic specificity to at least one first embedding equal to or above a specificity threshold, wherein the specificity threshold is a number of matches between second embeddings and first embeddings; responsive to the at least one second embedding having a semantic specificity equal to or above the specificity threshold, adding, by the summarization mechanism, the sentence to a summary of the identified natural language document; and outputting, by the summarization mechanism, the summary to the user.
2. The method of claim 1, wherein the query from the user identifies a set of keywords or a set of natural language phrases for which a summarization is to be produced as well as an identification of the identified natural language document.
3. The method of claim 1, wherein the specificity threshold is defined by the user in the query.
4. The method of claim 1, wherein the specificity threshold is a dynamic threshold and wherein the dynamic threshold increases or decreases dynamically in order to meet a user defined length for the summary of the identified natural language document.
5. The method of claim 1, wherein the set of hyperbolic embedding models associated with the set of sentences of the identified natural language document is generated by the method comprising: performing, by the summarization mechanism, a parse on each natural language document in an identified corpus; generating, by the summarization mechanism, a corresponding parse tree representation of each sentence in a set of sentence associated with each natural language document; and for each parse tree representation of the set of sentences of each natural language document, performing, by the summarization mechanism, unsupervised hyperbolic embeddings training on each parse tree representation of each of the set of sentences of each natural language document to produce a hyperbolic embedding model of each sentence in the set of sentences or a phrase or clause within the sentence of the set of sentences of each natural language document.
6. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the data processing system to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings, and further causes the computing device to: responsive to receiving a query from a user for a summarization of the identified natural language document, produce, by the summarization mechanism, a hyperbolic embedding model of first embeddings of the query; compare, by the summarization mechanism, the first embeddings to each second embedding of a set of second embeddings associated with a set of sentences of the identified natural language document; for at least one second embedding of at least one portion of a sentence in the set of sentences, determining, by the summarization mechanism, whether the at least one second embedding has a semantic specificity to at least one first embedding equal to or above a specificity threshold, wherein the specificity threshold is a number of matches between second embeddings and first embeddings; responsive to the at least one second embedding having a semantic specificity equal to or above the specificity threshold, add, by the summarization mechanism, the sentence to a summary of the identified natural language document; and output, by the summarization mechanism, the summary to the user.
7. The computer program product of claim 6, wherein the query from the user identifies a set of keywords or a set of natural language phrases for which a summarization is to be produced as well as an identification of the identified natural language document.
8. The computer program product of claim 6, wherein the specificity threshold is defined by the user in the query.
9. The computer program product of claim 6, wherein the specificity threshold is a dynamic threshold and wherein the dynamic threshold increases or decreases dynamically in order to meet a user defined length for the summary of the identified natural language document.
10. The computer program product of claim 6, wherein the set of hyperbolic embedding models associated with the set of sentences of the identified natural language document is generated by the computer readable program further causing the computing device to: perform, by the summarization mechanism, a parse on each natural language document in an identified corpus; generate, by the summarization mechanism, a corresponding parse tree representation of each sentence in a set of sentence associated with each natural language document; and for each parse tree representation of the set of sentences of each natural language document, perform, by the summarization mechanism, unsupervised hyperbolic embeddings training on each parse tree representation of each of the set of sentences of each natural language document to produce a hyperbolic embedding model of each sentence in the set of sentences or a phrase or clause within the sentence of the set of sentences of each natural language document.
11. An apparatus comprising: at least one processor; and at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings, and further cause the at least one processor to: responsive to receiving a query from a user for a summarization of the identified natural language document, produce, by the summarization mechanism, a hyperbolic embedding model of first embeddings of the query; compare, by the summarization mechanism, the first embeddings to each second embedding of a set of second embeddings associated with a set of sentences of the identified natural language document; for at least one second embedding of at least one portion of a sentence in the set of sentences, determining, by the summarization mechanism, whether the at least one second embedding has a semantic specificity to at least one first embedding equal to or above a specificity threshold, wherein the specificity threshold is a number of matches between second embeddings and first embeddings; responsive to the at least one second embedding having a semantic specificity equal to or above the specificity threshold, add, by the summarization mechanism, the sentence to a summary of the identified natural language document; and output, by the summariza
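The dynamic threshold of claims 4 and 9 raises or lowers the specificity threshold so the summary meets a user-defined length. The claims do not say how the adjustment is carried out, so the following is only one possible sketch. It assumes the length is measured in sentences, that each sentence already has an integer match count, and that the threshold moves in steps of one; the name `adjust_threshold` and its parameters are hypothetical.

```python
def adjust_threshold(match_counts, target_len, threshold):
    """Move an integer threshold up or down until the selection fits target_len.

    match_counts[i] is the number of embedding matches for sentence i.
    Returns the adjusted threshold and the indices of the selected sentences.
    """
    def selected(t):
        return [i for i, count in enumerate(match_counts) if count >= t]

    # Summary too long: raise the threshold until it fits.
    while len(selected(threshold)) > target_len:
        threshold += 1

    # Summary shorter than necessary: lower the threshold while it still fits.
    while threshold > 1 and len(selected(threshold - 1)) <= target_len:
        threshold -= 1

    return threshold, selected(threshold)

# Toy match counts for five sentences, with a two-sentence target length.
counts = [3, 0, 1, 2, 5]
from_low = adjust_threshold(counts, target_len=2, threshold=1)
from_high = adjust_threshold(counts, target_len=2, threshold=6)
```

Starting from a threshold that is too permissive or too strict, the loop settles on the same value here, the lowest threshold whose selection does not exceed the target length.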
Semantic analysis · CPC title
Summarisation for human users · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Parsing · CPC title
Natural language generation · CPC title