Natural language document summarization using hyperbolic embeddings

US10885281B2 · US · B2

Patent metadata
Publication number: US-10885281-B2
Application number: US-201816212194-A
Country: US
Kind code: B2
Filing date: Dec 6, 2018
Priority date: Dec 6, 2018
Publication date: Jan 5, 2021
Grant date: Jan 5, 2021

Abstract

A mechanism is provided to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings. Responsive to receiving a query from a user for a summarization of the identified natural language document, the summarization mechanism produces a hyperbolic embedding model of embeddings of the query. The summarization mechanism compares the embeddings of the query to each of a set of embeddings associated with a set of sentences of the identified natural language document. Responsive to identifying a subset of embeddings associated with the set of sentences of the identified natural language document having a semantic specificity to a subset of embeddings associated with the query, the summarization mechanism adds the sentence to a summary of the identified natural language document. The summarization mechanism then outputs the summary to the user.
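The abstract describes query-focused extractive summarization: embed the query, compare its embeddings with each sentence's embeddings, and keep the sentences whose semantic specificity to the query meets a threshold. The patent text on this page does not say how a "match" between two embeddings is decided, so the following is only a minimal sketch under stated assumptions. It assumes embeddings are points in the Poincaré ball model of hyperbolic space, measures closeness with the standard Poincaré distance d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), and counts a pair as a match when that distance is within a hypothetical `radius`. Following claim 1, the specificity of a sentence is taken to be its number of matches. The names `poincare_distance`, `specificity`, `summarize`, and `radius` are illustrative and do not come from the patent.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def poincare_distance(u: Vector, v: Vector) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def specificity(query_embs: List[Vector], sentence_embs: List[Vector], radius: float) -> int:
    """Hypothetical specificity score: the number of (sentence, query) embedding
    pairs whose hyperbolic distance is within `radius` (an assumed match rule)."""
    return sum(
        1
        for s in sentence_embs
        for q in query_embs
        if poincare_distance(s, q) <= radius
    )


def summarize(
    sentences: List[str],
    sentence_embs: List[List[Vector]],  # one or more embeddings per sentence (phrases, clauses)
    query_embs: List[Vector],
    threshold: int,
    radius: float = 0.5,
) -> List[str]:
    """Keep, in document order, every sentence whose match count meets `threshold`."""
    summary = []
    for text, embs in zip(sentences, sentence_embs):
        if specificity(query_embs, embs, radius) >= threshold:
            summary.append(text)
    return summary


# Toy 2-D example: both embeddings of the first sentence lie near the query,
# while the second sentence's embedding lies near the ball's boundary.
doc = ["Sentence A.", "Sentence B."]
embs = [[[0.1, 0.0], [0.0, 0.1]], [[0.8, 0.0]]]
query = [[0.1, 0.0]]
print(summarize(doc, embs, query, threshold=1))  # ['Sentence A.']
```

Because hyperbolic distance grows rapidly toward the boundary of the ball, points near the edge (often used for more specific terms in hierarchical embeddings) are far from points near the origin even when their Euclidean gap is modest. This is why the sketch compares embeddings with the Poincaré distance rather than a Euclidean one.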

First claim

Opening claim text (preview).

What is claimed is:

1. A method, in a data processing system comprising at least one processor and at least one memory, the at least one memory comprising instructions that are executed by the at least one processor to cause the at least one processor to be configured to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings, the method comprising: responsive to receiving a query from a user for a summarization of the identified natural language document, producing, by the summarization mechanism, a hyperbolic embedding model of first embeddings of the query; comparing, by the summarization mechanism, the first embeddings to each second embedding of a set of second embeddings associated with a set of sentences of the identified natural language document; for at least one second embedding of at least one portion of a sentence in the set of sentences, determining, by the summarization mechanism, whether the at least one second embedding has a semantic specificity to at least one first embedding equal to or above a specificity threshold, wherein the specificity threshold is a number of matches between second embeddings and first embeddings; responsive to the at least one second embedding having a semantic specificity equal to or above the specificity threshold, adding, by the summarization mechanism, the sentence to a summary of the identified natural language document; and outputting, by the summarization mechanism, the summary to the user.

2. The method of claim 1, wherein the query from the user identifies a set of keywords or a set of natural language phrases for which a summarization is to be produced as well as an identification of the identified natural language document.

3. The method of claim 1, wherein the specificity threshold is defined by the user in the query.

4. The method of claim 1, wherein the specificity threshold is a dynamic threshold and wherein the dynamic threshold increases or decreases dynamically in order to meet a user defined length for the summary of the identified natural language document.

5. The method of claim 1, wherein the set of hyperbolic embedding models associated with the set of sentences of the identified natural language document is generated by the method comprising: performing, by the summarization mechanism, a parse on each natural language document in an identified corpus; generating, by the summarization mechanism, a corresponding parse tree representation of each sentence in a set of sentences associated with each natural language document; and for each parse tree representation of the set of sentences of each natural language document, performing, by the summarization mechanism, unsupervised hyperbolic embeddings training on each parse tree representation of each of the set of sentences of each natural language document to produce a hyperbolic embedding model of each sentence in the set of sentences or a phrase or clause within the sentence of the set of sentences of each natural language document.

6. A computer program product comprising a non-transitory computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the data processing system to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings, and further causes the computing device to: responsive to receiving a query from a user for a summarization of the identified natural language document, produce, by the summarization mechanism, a hyperbolic embedding model of first embeddings of the query; compare, by the summarization mechanism, the first embeddings to each second embedding of a set of second embeddings associated with a set of sentences of the identified natural language document; for at least one second embedding of at least one portion of a sentence in the set of sentences, determine, by the summarization mechanism, whether the at least one second embedding has a semantic specificity to at least one first embedding equal to or above a specificity threshold, wherein the specificity threshold is a number of matches between second embeddings and first embeddings; responsive to the at least one second embedding having a semantic specificity equal to or above the specificity threshold, add, by the summarization mechanism, the sentence to a summary of the identified natural language document; and output, by the summarization mechanism, the summary to the user.

7. The computer program product of claim 6, wherein the query from the user identifies a set of keywords or a set of natural language phrases for which a summarization is to be produced as well as an identification of the identified natural language document.

8. The computer program product of claim 6, wherein the specificity threshold is defined by the user in the query.

9. The computer program product of claim 6, wherein the specificity threshold is a dynamic threshold and wherein the dynamic threshold increases or decreases dynamically in order to meet a user defined length for the summary of the identified natural language document.

10. The computer program product of claim 6, wherein the set of hyperbolic embedding models associated with the set of sentences of the identified natural language document is generated by the computer readable program further causing the computing device to: perform, by the summarization mechanism, a parse on each natural language document in an identified corpus; generate, by the summarization mechanism, a corresponding parse tree representation of each sentence in a set of sentences associated with each natural language document; and for each parse tree representation of the set of sentences of each natural language document, perform, by the summarization mechanism, unsupervised hyperbolic embeddings training on each parse tree representation of each of the set of sentences of each natural language document to produce a hyperbolic embedding model of each sentence in the set of sentences or a phrase or clause within the sentence of the set of sentences of each natural language document.

11. An apparatus comprising: at least one processor; and at least one memory coupled to the at least one processor, wherein the at least one memory comprises instructions which, when executed by the at least one processor, cause the at least one processor to implement a summarization mechanism for summarizing an identified natural language document using hyperbolic embeddings, and further cause the at least one processor to: responsive to receiving a query from a user for a summarization of the identified natural language document, produce, by the summarization mechanism, a hyperbolic embedding model of first embeddings of the query; compare, by the summarization mechanism, the first embeddings to each second embedding of a set of second embeddings associated with a set of sentences of the identified natural language document; for at least one second embedding of at least one portion of a sentence in the set of sentences, determine, by the summarization mechanism, whether the at least one second embedding has a semantic specificity to at least one first embedding equal to or above a specificity threshold, wherein the specificity threshold is a number of matches between second embeddings and first embeddings; responsive to the at least one second embedding having a semantic specificity equal to or above the specificity threshold, add, by the summarization mechanism, the sentence to a summary of the identified natural language document; and output, by the summariza…
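Claims 3 and 4 let the specificity threshold either come from the user's query or move up and down so the summary meets a user-defined length. The claims do not say how the adjustment is carried out. A minimal sketch, assuming summary length is counted in sentences and the threshold is an integer match count (as claim 1 defines it), scans the candidate thresholds and picks the one whose summary length is closest to the target. `summary_length`, `dynamic_threshold`, and the tie-break toward the stricter threshold are hypothetical choices, not the patent's method.

```python
from typing import Sequence


def summary_length(scores: Sequence[int], threshold: int) -> int:
    """Number of sentences whose specificity score (match count) meets the threshold."""
    return sum(1 for s in scores if s >= threshold)


def dynamic_threshold(scores: Sequence[int], target_len: int) -> int:
    """Choose the integer match-count threshold whose summary length is
    closest to `target_len` sentences.

    Ties are broken toward the higher (stricter) threshold.
    """
    if not scores:
        return 1
    # Threshold 1 keeps every sentence with at least one match; max + 1 keeps none.
    candidates = range(1, max(scores) + 2)
    return min(
        candidates,
        key=lambda t: (abs(summary_length(scores, t) - target_len), -t),
    )


scores = [3, 0, 1, 2, 3]  # hypothetical per-sentence match counts
t = dynamic_threshold(scores, target_len=2)
print(t, summary_length(scores, t))  # 3 2
```

Because the summary can only shrink as the threshold rises, the candidate range is small and a full scan is cheap. It also avoids an incremental raise-or-lower loop stalling when ties make several thresholds produce the same summary length.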

Assignees

IBM

Inventors

Classifications

  • G06F40/30 (primary): Semantic analysis

  • Summarisation for human users

  • Lexical analysis, e.g. tokenisation or collocates

  • G06F40/205 (primary): Parsing

  • Natural language generation

Frequently asked questions

What does patent US10885281B2 cover?
A method, computer program product, and apparatus for query-focused summarization of a natural language document: the user's query is embedded in a hyperbolic embedding model, its embeddings are compared with the embeddings of the document's sentences, and each sentence whose embeddings reach a specificity threshold (a number of matches with the query's embeddings) is added to the summary returned to the user.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Published January 5, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).