Creation of a summary for a plurality of texts
US-2019317956-A1 · Oct 17, 2019 · US
US11762893B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11762893-B2 |
| Application number | US-202117550121-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 14, 2021 |
| Priority date | Aug 22, 2016 |
| Publication date | Sep 19, 2023 |
| Grant date | Sep 19, 2023 |
Official abstract text for this publication.
Creating a summary of a plurality of texts includes tokenizing each of a plurality of texts to obtain tokens; generating a vector space using a first set of vectors having one or more obtained feature scores equal to or larger than a predefined value; executing non-hierarchical clustering using the vector space to generate a first plurality of clusters; choosing a first representative text in each of the plurality of clusters; generating a second set of vectors from each of the arrays generated based on a number of characters included in tokens of the representative texts; executing hierarchical clustering using the second set of vectors to generate a second plurality of clusters; and in response to determining a number of clusters included in the second plurality of clusters, determining a second representative text for each of the clusters included in the second plurality of clusters.
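The second set of vectors in the abstract is built from arrays of token character counts, and the claims add that the arrays are aligned to a common dimension by truncating or padding. The sketch below illustrates one possible reading of that step. Whitespace tokenization, the zero pad value, and the function names `token_length_array` and `align` are assumptions made for illustration; they are not taken from the patent text.

```python
from typing import List


def token_length_array(text: str) -> List[int]:
    """Return one character count per token.

    Assumption: tokens are whitespace-separated words. The patent does not
    fix a tokenizer in the text shown here.
    """
    return [len(token) for token in text.split()]


def align(array: List[int], dimension: int, pad_value: int = 0) -> List[int]:
    """Force an array to exactly `dimension` elements.

    Longer arrays keep their first `dimension` elements; shorter arrays are
    padded at the tail. The pad value of 0 is an assumption.
    """
    head = array[:dimension]
    return head + [pad_value] * (dimension - len(head))


# Hypothetical representative texts, used only to exercise the functions.
texts = ["disk is full", "the disk on node seven is nearly full"]
vectors = [align(token_length_array(t), dimension=5) for t in texts]
print(vectors)
```

After alignment every vector has the same length, which is what a distance-based clustering step needs in order to compare texts of different lengths.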
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for summarizing a plurality of texts, the method comprising: generating a vector space based on a first set of vectors, wherein each vector includes one or more feature scores determined from tokens of the plurality of texts; determining a number of clusters that will be included in a first plurality of clusters when non-hierarchical clustering generates the first plurality of clusters, wherein the number of clusters that will be generated in the non-hierarchical clustering is determined according to a number of texts included in the plurality of texts; executing the non-hierarchical clustering using the vector space to generate the first plurality of clusters; generating a second set of vectors based on quantities of characters in tokens of first representative texts, wherein each first representative text is selected from a corresponding cluster of the first plurality of clusters, and wherein arrays are generated based on the quantities of characters in the tokens of the first representative texts to generate the second set of vectors when the number of clusters that will be generated in the non-hierarchical clustering is equal to or larger than a predefined number; aligning a dimension of each of the vectors of the second set of vectors and executing hierarchical clustering using the second set of vectors to generate a second plurality of clusters; determining a second representative text for each of the clusters included in the second plurality of clusters that represents a summary of the plurality of texts; and displaying, on a display, a visualization of the second plurality of clusters and an element that alters a number of clusters in the second plurality of clusters, wherein when the element alters the number of clusters in the second plurality of clusters, the visualization is automatically changed to reflect the altered number of clusters in the second plurality of clusters.
2. The computer-implemented method of claim 1, wherein executing the hierarchical clustering generates a tree diagram.
3. The computer-implemented method of claim 2, wherein determining the second representative text for each of the clusters in the second plurality of clusters further comprises: applying a threshold to the tree diagram.
4. The computer-implemented method of claim 3, further comprising: updating the number of clusters in the second plurality of clusters and the second representative text for each of the clusters in the second plurality of clusters by changing the threshold to a value altering the number of clusters in the second plurality of clusters.
5. The computer-implemented method of claim 1, wherein a number of texts in each of the clusters in the second plurality of clusters is obtained after the execution of the hierarchical clustering.
6. The computer-implemented method of claim 1, wherein the element changes one or more of the number of clusters in the second plurality of clusters and a distance between the clusters in the second plurality of clusters.
7. The computer-implemented method of claim 6, wherein, as a result of the execution of the hierarchical clustering, the display further displays one or more of: a tree diagram; the number of clusters included in the second plurality of clusters; a representative text for each of the clusters in the second plurality of clusters; and a URL for a source of the second representative text.
8. The computer-implemented method of claim 1, further comprising: sending an alert to a user or displaying an alert on a display when the second representative text for one or more of the clusters in the second plurality of clusters has a predefined term.
9. The computer-implemented method of claim 1, wherein the aligning a dimension of each of the vectors of the second set of vectors comprises: truncating one or more array elements in each array of the arrays by a predefined number of array elements from a beginning of the array; or padding a tail of each array of the arrays so that a number of digits in each array becomes the predefined number of array elements.
10. The computer-implemented method of claim 1, wherein the number of clusters included in the second plurality of clusters is determined automatically or by a user.
11. A system comprising: a processor; and a memory storing a program, which, when executed on the processor, summarizes a plurality of texts, the processor configured to perform operations comprising: generating a vector space based on a first set of vectors, wherein each vector includes one or more feature scores determined from tokens of the plurality of texts; determining a number of clusters that will be included in a first plurality of clusters when non-hierarchical clustering generates the first plurality of clusters, wherein the number of clusters that will be generated in the non-hierarchical clustering is determined according to a number of texts included in the plurality of texts; executing the non-hierarchical clustering using the vector space to generate the first plurality of clusters; generating a second set of vectors based on quantities of characters in tokens of first representative texts, wherein each first representative text is selected from a corresponding cluster of the first plurality of clusters, and wherein arrays are generated based on the quantities of characters in the tokens of the first representative texts to generate the second set of vectors when the number of clusters that will be generated in the non-hierarchical clustering is equal to or larger than a predefined number; aligning a dimension of each of the vectors of the second set of vectors and executing hierarchical clustering using the second set of vectors to generate a second plurality of clusters; determining a second representative text for each of the clusters included in the second plurality of clusters that represents a summary of the plurality of texts; and displaying, on a display, a visualization of the second plurality of clusters and an element that alters a number of clusters in the second plurality of clusters, wherein when the element alters the number of clusters in the second plurality of clusters, the visualization is automatically changed to reflect the altered number of clusters in the second plurality of clusters.
12. The system of claim 11, wherein executing the hierarchical clustering generates a tree diagram.
13. The system of claim 12, wherein determining the second representative text for each of the clusters in the second plurality of clusters further comprises one or more of: applying a threshold to the tree diagram; and updating the number of clusters in the second plurality of clusters and the second representative text for each of the clusters in the second plurality of clusters by changing the threshold to a value altering the number of clusters in the second plurality of clusters.
14. A computer program product for summarizing a plurality of texts, the computer program product comprising one or more computer readable storage media collectively having program instructions embodied therewith that are executable by at least one processor to cause the at least one processor to: generate a vector space based on a first set of vectors, wherein each vector includes one or more feature scores determined from tokens of the plurality of texts; determine a number of clusters that will be included in a first plurality of clusters when non-hierarchical clustering generates the first plurality of clusters, wherein the number of clusters that will be generated in the non-hierarchic
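The claims on the tree diagram describe applying a threshold to the result of hierarchical clustering, and changing that threshold to alter the number of clusters and their representative texts. The sketch below shows the general idea under stated assumptions: single-linkage merging, Euclidean distance, and choosing the member nearest the cluster mean as the representative are all illustrative choices, and `cut_clusters` and `representative` are hypothetical names. The claim text shown here does not specify any of them.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cut_clusters(vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Agglomerative clustering (single linkage, an assumption).

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold`. This corresponds to cutting the
    tree diagram at that height. Returns lists of vector indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    euclidean(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


def representative(vectors: List[Vector], members: List[int]) -> int:
    """Index of the member closest to the cluster mean (an assumed rule)."""
    dim = len(vectors[members[0]])
    mean = [sum(vectors[m][k] for m in members) / len(members) for k in range(dim)]
    return min(members, key=lambda m: euclidean(vectors[m], mean))


# Hypothetical aligned token-length vectors, used only for illustration.
vectors = [
    [4, 2, 4, 0, 0],
    [4, 2, 5, 0, 0],
    [4, 2, 6, 0, 0],
    [3, 4, 2, 4, 5],
    [3, 4, 2, 4, 6],
]

for threshold in (0.5, 1.5, 10.0):
    clusters = cut_clusters(vectors, threshold)
    reps = [representative(vectors, c) for c in clusters]
    print(threshold, clusters, reps)
```

Raising the threshold merges more clusters, so a user-facing control such as the "element" in claim 1 could map directly onto this single parameter and recompute the representatives each time it changes.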
Summarisation for human users · CPC title
Clustering; Classification · CPC title
Trees · CPC title
Monitoring · CPC title