Creation of a summary for a plurality of texts

US11762893B2 · US · B2

Patent metadata

Field               Value
Publication number  US-11762893-B2
Application number  US-202117550121-A
Country             US
Kind code           B2
Filing date         Dec 14, 2021
Priority date       Aug 22, 2016
Publication date    Sep 19, 2023
Grant date          Sep 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Creating a summary of a plurality of texts includes tokenizing each of a plurality of texts to obtain tokens; generating a vector space using a first set of vectors having one or more obtained feature scores equal to or larger than a predefined value; executing non-hierarchical clustering using the vector space to generate a first plurality of clusters; choosing a first representative text in each of the plurality of clusters; generating a second set of vectors from each of the arrays generated based on a number of characters included in tokens of the representative texts; executing hierarchical clustering using the second set of vectors to generate a second plurality of clusters; and in response to a determining a number of clusters included in the second plurality of clusters, determining a second representative text for each of the clusters included in the second plurality of clusters.
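
The first stage described in the abstract (tokenize, score features, cluster non-hierarchically, pick one representative text per cluster) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract names neither a scoring function nor a clustering algorithm, so TF-IDF and k-means are stand-ins here, and `tokenize`, `feature_vectors`, `kmeans`, `representatives`, and the `min_score` threshold are hypothetical names chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_vectors(texts, min_score=0.1):
    """Return (vocab, vectors). A feature is kept only if its TF-IDF score
    reaches min_score (a hypothetical threshold) in at least one text."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    doc_freq = Counter(tok for d in docs for tok in d)
    scores = []
    for d in docs:
        total = sum(d.values()) or 1  # avoid dividing by zero on empty texts
        scores.append({tok: (c / total) * math.log(n / doc_freq[tok])
                       for tok, c in d.items()})
    vocab = sorted({tok for s in scores for tok, v in s.items() if v >= min_score})
    return vocab, [[s.get(tok, 0.0) for tok in vocab] for s in scores]


def dist2(a, b):
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    # Assumption: the first k vectors seed the centroids, which keeps
    # the sketch deterministic; real code would use a better initializer.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def representatives(texts, vectors, labels, centroids):
    # Pick, for each non-empty cluster, the text closest to its centroid.
    reps = {}
    for c, centroid in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            reps[c] = texts[min(idx, key=lambda i: dist2(vectors[i], centroid))]
    return reps


texts = [
    "login page fails to load",
    "invoice total is wrong",
    "login page fails again",
    "invoice total looks wrong",
]
vocab, vectors = feature_vectors(texts)
labels, centroids = kmeans(vectors, k=2)
reps = representatives(texts, vectors, labels, centroids)
```

With these four sample texts, the two login messages share one cluster and the two invoice messages share the other. Choosing the text nearest the centroid is only one plausible way to select a representative; the abstract leaves the selection rule open.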

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for summarizing a plurality of texts, the method comprising: generating a vector space based on a first set of vectors, wherein each vector includes one or more feature scores determined from tokens of the plurality of texts; determining a number of clusters that will be included in a first plurality of clusters when non-hierarchical clustering generates the first plurality of clusters, wherein the number of clusters that will be generated in the non-hierarchical clustering is determined according to a number of texts included in the plurality of texts; executing the non-hierarchical clustering using the vector space to generate the first plurality of clusters; generating a second set of vectors based on quantities of characters in tokens of first representative texts, wherein each first representative text is selected from a corresponding cluster of the first plurality of clusters, and wherein arrays are generated based on the quantities of characters in the tokens of the first representative texts to generate the second set of vectors when the number of clusters that will be generated in the non-hierarchical clustering is equal to or larger than a predefined number; aligning a dimension of each of the vectors of the second set of vectors and executing hierarchical clustering using the second set of vectors to generate a second plurality of clusters; determining a second representative text for each of the clusters included in the second plurality of clusters that represents a summary of the plurality of texts; and displaying, on a display, a visualization of the second plurality of clusters and an element that alters a number of clusters in the second plurality of clusters, wherein when the element alters the number of clusters in the second plurality of clusters, the visualization is automatically changed to reflect the altered number of clusters in the second plurality of clusters.

2. The computer-implemented method of claim 1, wherein executing the hierarchical clustering generates a tree diagram.

3. The computer-implemented method of claim 2, wherein determining the second representative text for each of the clusters in the second plurality of clusters further comprises: applying a threshold to the tree diagram.

4. The computer-implemented method of claim 3, further comprising: updating the number of clusters in the second plurality of clusters and the second representative text for each of the clusters in the second plurality of clusters by changing the threshold to a value altering the number of clusters in the second plurality of clusters.

5. The computer-implemented method of claim 1, wherein a number of texts in each of the clusters in the second plurality of clusters is obtained after the execution of the hierarchical clustering.

6. The computer-implemented method of claim 1, wherein the element changes one or more of the number of clusters in the second plurality of clusters and a distance between the clusters in the second plurality of clusters.

7. The computer-implemented method of claim 6, wherein, as a result of the execution of the hierarchical clustering, the display further displays one or more of: a tree diagram; the number of clusters included in the second plurality of clusters; a representative text for each of the clusters in the second plurality of clusters; and a URL for a source of the second representative text.

8. The computer-implemented method of claim 1, further comprising: sending an alert to a user or displaying an alert on a display when the second representative text for one or more of the clusters in the second plurality of clusters has a predefined term.

9. The computer-implemented method of claim 1, wherein the aligning a dimension of each of the vectors of the second set of vectors comprises: truncating one or more array elements in each array of the arrays by a predefined number of array elements from a beginning of the array; or padding a tail of each array of the arrays so that a number of digits in each array becomes the predefined number of array elements.

10. The computer-implemented method of claim 1, wherein the number of clusters included in the second plurality of clusters is determined automatically or by a user.

11. A system comprising: a processor; and a memory storing a program, which, when executed on the processor, summarizes a plurality of texts, the processor configured to perform operations comprising: generating a vector space based on a first set of vectors, wherein each vector includes one or more feature scores determined from tokens of the plurality of texts; determining a number of clusters that will be included in a first plurality of clusters when non-hierarchical clustering generates the first plurality of clusters, wherein the number of clusters that will be generated in the non-hierarchical clustering is determined according to a number of texts included in the plurality of texts; executing the non-hierarchical clustering using the vector space to generate the first plurality of clusters; generating a second set of vectors based on quantities of characters in tokens of first representative texts, wherein each first representative text is selected from a corresponding cluster of the first plurality of clusters, and wherein arrays are generated based on the quantities of characters in the tokens of the first representative texts to generate the second set of vectors when the number of clusters that will be generated in the non-hierarchical clustering is equal to or larger than a predefined number; aligning a dimension of each of the vectors of the second set of vectors and executing hierarchical clustering using the second set of vectors to generate a second plurality of clusters; determining a second representative text for each of the clusters included in the second plurality of clusters that represents a summary of the plurality of texts; and displaying, on a display, a visualization of the second plurality of clusters and an element that alters a number of clusters in the second plurality of clusters, wherein when the element alters the number of clusters in the second plurality of clusters, the visualization is automatically changed to reflect the altered number of clusters in the second plurality of clusters.

12. The system of claim 11, wherein executing the hierarchical clustering generates a tree diagram.

13. The system of claim 12, wherein determining the second representative text for each of the clusters in the second plurality of clusters further comprises one or more of: applying a threshold to the tree diagram; and updating the number of clusters in the second plurality of clusters and the second representative text for each of the clusters in the second plurality of clusters by changing the threshold to a value altering the number of clusters in the second plurality of clusters.

14. A computer program product for summarizing a plurality of texts, the computer program product comprising one or more computer readable storage media collectively having program instructions embodied therewith that are executable by at least one processor to cause the at least one processor to: generate a vector space based on a first set of vectors, wherein each vector includes one or more feature scores determined from tokens of the plurality of texts; determine a number of clusters that will be included in a first plurality of clusters when non-hierarchical clustering generates the first plurality of clusters, wherein the number of clusters that will be generated in the non-hierarchic…
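
The second stage recited in the claims (arrays of per-token character counts, alignment to a common dimension, hierarchical clustering cut at a threshold) can be sketched in Python as below. This is a minimal illustration under assumptions, not the claimed implementation: whitespace tokenization, zero as the padding value, Euclidean distance, and single linkage are all choices made for the example, and `char_count_array`, `align`, and `agglomerate` are hypothetical names. The `align` helper reflects one reading of the truncate-or-pad language: keep the first `size` elements, or pad the tail up to `size`.

```python
def char_count_array(text):
    # One element per whitespace-delimited token: its character count.
    return [len(tok) for tok in text.split()]


def align(array, size, pad_value=0):
    # Keep the first `size` elements, or pad the tail up to `size`.
    return (array + [pad_value] * size)[:size]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(vectors, threshold):
    """Single-linkage agglomerative clustering. Merging stops once the
    closest pair of clusters is farther apart than `threshold`, which
    corresponds to cutting the tree diagram at that height."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(euclidean(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters


first_reps = [
    "login page fails to load",
    "login page fails to open",
    "invoice total is wrong",
]
aligned = [align(char_count_array(t), size=5) for t in first_reps]
tight = agglomerate(aligned, threshold=1.0)
loose = agglomerate(aligned, threshold=10.0)
```

Here the two login messages have identical character-count arrays, so they merge at distance zero, while the invoice message joins them only when the threshold is raised. Changing the threshold therefore changes the number of clusters, which is the behavior a user-facing control for the cluster count would drive.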

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11762893B2 cover?
Creating a summary of a plurality of texts includes tokenizing each of a plurality of texts to obtain tokens; generating a vector space using a first set of vectors having one or more obtained feature scores equal to or larger than a predefined value; executing non-hierarchical clustering using the vector space to generate a first plurality of clusters; choosing a first representative text in e…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/345. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 19, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).