User behavior segmentation using latent topic detection
US-10242019-B1 · Mar 26, 2019 · US
US2025117443A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025117443-A1 |
| Application number | US-202318482975-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 9, 2023 |
| Priority date | Oct 9, 2023 |
| Publication date | Apr 10, 2025 |
| Grant date | — |
Abstract
A computer-implemented method for performing data difference evaluation is provided. Aspects include obtaining a first data set and a second data set, creating a first plurality of feature vectors by inputting the first data set into each of a plurality of models, and creating a second plurality of feature vectors by inputting the second data set into each of the plurality of models. Aspects also include identifying a mapping between elements of the first plurality of feature vectors and elements of the second plurality of feature vectors created by a same model of the plurality of models, calculating, for each of the plurality of models based at least in part on the mapping, a model distance between the first data set and the second data set, and calculating, based at least in part on the model distances, an ensemble distance between the first data set and the second data set.
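The abstract's pipeline (run several models on both data sets, map the elements each model produces, compute a per-model distance, then combine) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Model` interface, the brute-force minimum-cost mapping, the "mean distance between mapped pairs" model distance, the plain average for the ensemble, and the two toy models `mean_model` and `split_means_model` are all hypothetical choices made for the example.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical model interface: a data set in, a list of feature vectors out.
Model = Callable[[List[Vector]], List[Vector]]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_mapping(first: List[Vector], second: List[Vector]) -> List[Tuple[int, int]]:
    """Pair elements of two same-model outputs so the summed distance is minimal.

    Brute force over permutations: fine for a handful of elements, exponential beyond that.
    """
    if len(first) != len(second):
        raise ValueError("outputs of the same model must have the same number of elements")
    perm = min(itertools.permutations(range(len(second))),
               key=lambda p: sum(euclidean(first[i], second[j]) for i, j in enumerate(p)))
    return list(enumerate(perm))


def model_distance(first: List[Vector], second: List[Vector]) -> float:
    """Assumed per-model distance: mean distance between mapped element pairs."""
    pairs = best_mapping(first, second)
    if not pairs:
        raise ValueError("models must return at least one feature vector")
    return sum(euclidean(first[i], second[j]) for i, j in pairs) / len(pairs)


def ensemble_distance(models: List[Model], data_a: List[Vector], data_b: List[Vector]) -> float:
    """Run every model on both data sets and average the per-model distances."""
    distances = [model_distance(m(data_a), m(data_b)) for m in models]
    return sum(distances) / len(distances)


# Two toy, hypothetical models used only to exercise the functions above.
def mean_model(data: List[Vector]) -> List[Vector]:
    """One feature vector: the coordinate-wise mean of the data set."""
    return [[sum(col) / len(data) for col in zip(*data)]]


def split_means_model(data: List[Vector]) -> List[Vector]:
    """Two feature vectors: the means of the lower and upper halves after sorting."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return mean_model(ordered[:half]) + mean_model(ordered[half:])
```

For a data set and a copy shifted by one unit along the first axis, both toy models report a distance of 1.0, so `ensemble_distance([mean_model, split_means_model], a, shifted)` is 1.0. For larger outputs, SciPy's `scipy.optimize.linear_sum_assignment` solves the same minimum-cost matching in polynomial time and could replace the brute-force search.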
Claims (preview)
What is claimed is:
1. A computer-implemented method for data difference evaluation, the computer-implemented method comprising: obtaining a first data set and a second data set; inputting the first data set into each of a plurality of clustering models, wherein each of the plurality of clustering models separates the first data set into a different number of clusters; storing an output of each of the plurality of clustering models corresponding to the first data set into a first plurality of cluster vectors, where each of the first plurality of cluster vectors has a dimension that corresponds to the number of clusters; inputting the second data set into each of the plurality of clustering models, wherein each of the plurality of clustering models separates the second data set into a different number of clusters; storing the output of each of the plurality of clustering models corresponding to the second data set into a second plurality of cluster vectors, where each of the second plurality of cluster vectors has a dimension that corresponds to the number of clusters; identifying a mapping between elements of the first plurality of cluster vectors and elements of the second plurality of cluster vectors having a same dimension; calculating, for each dimension based at least in part on the mapping, a dimensional distance between the first data set and the second data set; and calculating, based at least in part on the dimensional distances, an ensemble distance between the first data set and the second data set.
2. The computer-implemented method of claim 1, wherein the elements of the first plurality of cluster vectors and the elements of the second plurality of cluster vectors each include a data cluster and wherein the mapping is identified based on a centroid for each data cluster.
3. The computer-implemented method of claim 2, wherein the dimensional distance between the first data set and the second data set for each dimension is calculated based on a size of the first data set, a size of the second data set, and a distance between the centroids of mapped elements of the first plurality of cluster vectors and elements of the second plurality of cluster vectors.
4. The computer-implemented method of claim 1, wherein the ensemble distance between the first data set and the second data set is calculated as an average of the dimensional distance for each dimension.
5. The computer-implemented method of claim 1, wherein the ensemble distance between the first data set and the second data set is calculated as a weighted average of the dimensional distance for each dimension, where a weight applied to each dimensional distance is based on a cluster quality associated with each dimension.
6. The computer-implemented method of claim 1, further comprising removing, from the first plurality of cluster vectors and the second plurality of cluster vectors, cluster vectors having a cluster quality below a threshold value.
7. The computer-implemented method of claim 1, wherein the plurality of clustering models are K-means clustering models.
8. A computer program product having one or more computer readable storage media having computer readable program code collectively stored on the one or more computer readable storage media, the computer readable program code being executed by a processor of a computer system to cause the computer system to perform operations comprising: obtaining a first data set and a second data set; inputting the first data set into each of a plurality of clustering models, wherein each of the plurality of clustering models separates the first data set into a different number of clusters; storing an output of each of the plurality of clustering models corresponding to the first data set into a first plurality of cluster vectors, where each of the first plurality of cluster vectors has a dimension that corresponds to the number of clusters; inputting the second data set into each of the plurality of clustering models, wherein each of the plurality of clustering models separates the second data set into a different number of clusters; storing the output of each of the plurality of clustering models corresponding to the second data set into a second plurality of cluster vectors, where each of the second plurality of cluster vectors has a dimension that corresponds to the number of clusters; identifying a mapping between elements of the first plurality of cluster vectors and elements of the second plurality of cluster vectors having a same dimension; calculating, for each dimension based at least in part on the mapping, a dimensional distance between the first data set and the second data set; and calculating, based at least in part on the dimensional distances, an ensemble distance between the first data set and the second data set.
9. The computer program product of claim 8, wherein the elements of the first plurality of cluster vectors and the elements of the second plurality of cluster vectors each include a data cluster and wherein the mapping is identified based on a centroid for each data cluster.
10. The computer program product of claim 9, wherein the dimensional distance between the first data set and the second data set for each dimension is calculated based on a size of the first data set, a size of the second data set, and a distance between the centroids of mapped elements of the first plurality of cluster vectors and elements of the second plurality of cluster vectors.
11. The computer program product of claim 8, wherein the ensemble distance between the first data set and the second data set is calculated as an average of the dimensional distance for each dimension.
12. The computer program product of claim 8, wherein the ensemble distance between the first data set and the second data set is calculated as a weighted average of the dimensional distance for each dimension, where a weight applied to each dimensional distance is based on a cluster quality associated with each dimension.
13. The computer program product of claim 8, wherein the operations further comprise removing, from the first plurality of cluster vectors and the second plurality of cluster vectors, cluster vectors having a cluster quality below a threshold value.
14. The computer program product of claim 8, wherein the plurality of clustering models are K-means clustering models.
15. A computing system comprising: a processor; a memory coupled to the processor; and one or more computer readable storage media coupled to the processor, the one or more computer readable storage media collectively containing instructions that are executed by the processor via the memory to cause the processor to perform operations comprising: obtaining a first data set and a second data set; inputting the first data set into each of a plurality of clustering models, wherein each of the plurality of clustering models separates the first data set into a different number of clusters; storing an output of each of the plurality of clustering models corresponding to the first data set into a first plurality of cluster vectors, where each of the first plurality of cluster vectors has a dimension that corresponds to the number of clusters; inputting the second data set into each of the plurality of clustering models, wherein each of the plurality of clustering models separates the second data set into a different number of clusters; storing the output of each of the plurality of clustering models corresponding to the second data set into a second plurality of cluster vectors, where each of the second pl
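Claims 1 to 7 describe a concrete instance: K-means models with different cluster counts, clusters mapped by centroid, a per-dimension distance that depends on data set sizes and centroid distances, optional quality filtering, and a plain or quality-weighted average. The sketch below is one way to realize that in pure Python under explicit assumptions. The claims do not give formulas, so the following are illustrative choices only: the farthest-point K-means initialisation, the brute-force centroid matching, the size weighting in `dimensional_distance` (the average share of each data set that falls in a mapped cluster pair), and the use of the mean silhouette coefficient as the "cluster quality".

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(data: List[Point], centroids: List[List[float]]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in data]


def kmeans(data: List[Point], k: int, iters: int = 50) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's K-means with deterministic farthest-point initialisation (an assumption)."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        centroids.append(list(max(data, key=lambda p: min(dist(p, c) for c in centroids))))
    for _ in range(iters):
        labels = assign(data, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])  # empty cluster stays put
        if new == centroids:
            break
        centroids = new
    return centroids, assign(data, centroids)


def silhouette(data: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient, used here as the assumed 'cluster quality'."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, lab in zip(data, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in pts) / len(pts) for m, pts in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def dimensional_distance(data_a: List[Point], data_b: List[Point], k: int) -> Tuple[float, float]:
    """Distance and quality for one dimension (one number of clusters k)."""
    ca, la = kmeans(data_a, k)
    cb, lb = kmeans(data_b, k)
    # Map clusters by centroid: brute-force minimum-cost matching, only sensible for small k.
    perm = min(itertools.permutations(range(k)),
               key=lambda pm: sum(dist(ca[i], cb[j]) for i, j in enumerate(pm)))
    total = 0.0
    for i, j in enumerate(perm):
        # ASSUMED weighting: average share of each data set that falls in the mapped pair.
        share = (la.count(i) / len(data_a) + lb.count(j) / len(data_b)) / 2
        total += share * dist(ca[i], cb[j])
    quality = (silhouette(data_a, la) + silhouette(data_b, lb)) / 2
    return total, quality


def ensemble_distance(data_a: List[Point], data_b: List[Point], ks: Sequence[int] = (2, 3, 4),
                      threshold: float = 0.0, weighted: bool = True) -> float:
    """Combine dimensional distances after dropping dimensions below the quality threshold."""
    kept = [r for r in (dimensional_distance(data_a, data_b, k) for k in ks) if r[1] >= threshold]
    if not kept:
        raise ValueError("no clustering passed the quality threshold")
    weights = [max(q, 0.0) if weighted else 1.0 for _, q in kept]
    if sum(weights) == 0:
        weights = [1.0] * len(kept)  # fall back to a plain average
    return sum(w * d for w, (d, _) in zip(weights, kept)) / sum(weights)
```

With `weighted=False` the result is the plain average of claim 4; with `weighted=True` it is a quality-weighted average in the spirit of claim 5, and `threshold` plays the role of the filter in claim 6. For two well-separated blobs and a copy shifted by one unit along one axis, `ensemble_distance(a, shifted, ks=(2,))` returns 1.0, because both mapped centroid pairs move by exactly one unit. The production-grade equivalents of the hand-written pieces are scikit-learn's `KMeans` and `silhouette_score`, plus SciPy's `linear_sum_assignment` for the matching.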
using vector quantisation · CPC title