Change-point driven feature selection for multi-variate time series clustering

US2021365498A1 · US · A1

Patent metadata
Field | Value
Publication number | US-2021365498-A1
Application number | US-202016877981-A
Country | US
Kind code | A1
Filing date | May 19, 2020
Priority date | May 19, 2020
Publication date | Nov 25, 2021
Grant date |

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a method, including: receiving a multi-variate time-series dataset comprising a plurality of time-dependent datasets; for each of the plurality of time-dependent datasets, segmenting each of the plurality of time-dependent datasets at a transition point; clustering segments of the plurality of time-dependent datasets into clusters having similar lengths of segments; for each cluster (i) selecting a representative segment length and (ii) identifying a feature subset in that cluster; identifying, across the feature subsets, subset transition points, wherein each of the subset transition points corresponds to a change in value that meets a predetermined threshold within its corresponding feature subset; and determining, by applying a threshold test to the subset transition points, a segment length to be used in segmenting the entire multi-variate time-series dataset.
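
The first steps of the abstract (segmenting each series at transition points, clustering segments by length, then picking a representative length and a feature subset per cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the consecutive-difference change test, the gap-based one-dimensional clustering, and the sample data are all hypothetical. Averaging the lengths follows claim 9, and mapping segments back to their source series follows claim 8.

```python
from statistics import mean


def segment_lengths(series, threshold):
    """Split one series wherever the jump between consecutive samples
    is at least `threshold` (assumed change test); return segment lengths."""
    lengths, start = [], 0
    for i in range(1, len(series)):
        if abs(series[i] - series[i - 1]) >= threshold:
            lengths.append(i - start)
            start = i
    lengths.append(len(series) - start)
    return lengths


def cluster_by_length(tagged_lengths, max_gap):
    """Group (feature, length) pairs: sort by length and start a new cluster
    whenever the next length exceeds the previous one by more than `max_gap`.
    This simple gap rule is an assumption; the source names no algorithm."""
    clusters = []
    for pair in sorted(tagged_lengths, key=lambda p: p[1]):
        if clusters and pair[1] - clusters[-1][-1][1] <= max_gap:
            clusters[-1].append(pair)
        else:
            clusters.append([pair])
    return clusters


def summarize(clusters):
    """Per cluster: average length (claim 9) and the set of source
    series its segments came from (claim 8)."""
    return [
        {
            "representative_length": mean(length for _, length in cluster),
            "feature_subset": sorted({feature for feature, _ in cluster}),
        }
        for cluster in clusters
    ]


# Hypothetical three-feature dataset, for illustration only.
dataset = {
    "temp":     [1, 1, 1, 9, 9, 9, 1, 1, 1],
    "pressure": [5, 5, 5, 0, 0, 0, 5, 5, 5],
    "flow":     [2, 2, 2, 2, 2, 2, 2, 2, 7],
}
tagged = [
    (name, length)
    for name, series in dataset.items()
    for length in segment_lengths(series, threshold=3)
]
summary = summarize(cluster_by_length(tagged, max_gap=1))
for row in summary:
    print(row)
```

In this toy data, `temp` and `pressure` change on the same schedule, so their segments fall into one cluster and the two features form one feature subset, while the segments of `flow` land in separate clusters.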

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving a multi-variate time-series dataset comprising a plurality of time-dependent datasets; for each of the plurality of time-dependent datasets, segmenting that time-dependent dataset at a transition point, wherein each of the transition points corresponds to a change in value that meets a predetermined threshold and occurs over a period of time; clustering segments of the plurality of time-dependent datasets into clusters having similar lengths of segments; for each cluster (i) selecting a representative segment length and (ii) identifying a feature subset in that cluster, wherein a feature subset comprises features from the time-dependent datasets that can be represented by the representative segment for the given cluster; identifying, across the feature subsets, subset transition points, wherein each of the subset transition points corresponds to a change in value that meets a predetermined threshold within its corresponding feature subset; and determining, by applying a threshold test to the subset transition points, a segment length to be used in segmenting the entire multi-variate time-series dataset.

2. The method of claim 1, wherein the segmenting comprises an iterative segmenting process that results in different numbers of segments across each iteration of the segmenting via modifying the predetermined threshold for each iteration.

3. The method of claim 2, comprising selecting a time-dependent dataset segment length by (i) forming a graph of the different numbers of segments produced via the iterative segmenting process and (ii) identifying a knee point within the graph, wherein the knee point of the graph corresponds to a segment length and is selected as the time-dependent dataset segment length, the knee point comprising a local maximum of the graph.

4. The method of claim 1, wherein the threshold test comprises a lower threshold boundary and an upper threshold boundary.

5. The method of claim 4, wherein the determining comprises (i) identifying that a number of the subset transition points are below the lower threshold boundary and (ii) augmenting the subset transition points with an additional segmentation of the multi-variate time-dependent datasets utilizing the representative segment length.

6. The method of claim 4, wherein the determining comprises (i) identifying that a number of the subset transition points are above the upper threshold boundary and (ii) selecting the representative segment length as the segment length.

7. The method of claim 4, wherein the determining comprises (i) identifying that a number of the subset transition points are within the lower threshold boundary and the upper threshold boundary and (ii) selecting the subset transition points as the segment change points.

8. The method of claim 1, wherein identifying a feature subset comprises mapping a given segment within a cluster to the time-dependent dataset that the given segment occurs within.

9. The method of claim 1, wherein the selecting a representative segment length for a given cluster comprises averaging the segment lengths within the given cluster.

10. The method of claim 1, wherein the identifying subset transition points comprises identifying a change in value within the feature subset that at least meets a predetermined threshold.

11. An apparatus, comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor, the computer readable program code comprising: computer readable program code configured to receive a multi-variate time-series dataset comprising a plurality of time-dependent datasets; computer readable program code configured to, for each of the plurality of time-dependent datasets, segment that time-dependent dataset at a transition point, wherein each of the transition points corresponds to a change in value that meets a predetermined threshold and occurs over a period of time; computer readable program code configured to cluster segments of the plurality of time-dependent datasets into clusters having similar lengths of segments; computer readable program code configured to, for each cluster, (i) select a representative segment length and (ii) identify a feature subset, wherein a feature subset comprises features from the time-dependent datasets that can be represented by the representative segment for the given cluster; computer readable program code configured to identify, across the feature subsets, subset transition points, wherein each of the subset transition points corresponds to a change in value that meets a predetermined threshold within its corresponding feature subset; and computer readable program code configured to determine, by applying a threshold test to the subset transition points, a segment length to be used in segmenting the entire multi-variate time-series dataset.

12. A computer program product, comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code executable by a processor and comprising: computer readable program code configured to receive a multi-variate time-series dataset comprising a plurality of time-dependent datasets; computer readable program code configured to, for each of the plurality of time-dependent datasets, segment that time-dependent dataset at a transition point, wherein each of the transition points corresponds to a change in value that meets a predetermined threshold and occurs over a period of time; computer readable program code configured to cluster segments of the plurality of time-dependent datasets into clusters having similar lengths of segments; computer readable program code configured to, for each cluster, (i) select a representative segment length and (ii) identify a feature subset, wherein a feature subset comprises features from the time-dependent datasets that can be represented by the representative segment for the given cluster; computer readable program code configured to identify, across the feature subsets, subset transition points, wherein each of the subset transition points corresponds to a change in value that meets a predetermined threshold within its corresponding feature subset; and computer readable program code configured to determine, by applying a threshold test to the subset transition points, a segment length to be used in segmenting the entire multi-variate time-series dataset.

13. The computer program product of claim 12, wherein the segmenting comprises an iterative segmenting process that results in different numbers of segments across each iteration of the segmenting via modifying the predetermined threshold for each iteration.

14. The computer program product of claim 13, comprising selecting a time-dependent dataset segment length by (i) forming a graph of the different numbers of segments produced via the iterative segmenting process and (ii) identifying a knee point within the graph, wherein the knee point of the graph corresponds to a segment length and is selected as the time-dependent dataset segment length, the knee point comprising a local maximum of the graph.

15. The computer program product of claim 12, wherein the determining comprises (i) identifying that a number of the subset transition points are below a lower threshold boundary of the threshold test and (ii) augmenting the subset transition points with an additional segmentation of the multi-variate time-dependent datasets utilizing th
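
The three-way threshold test of claims 4 to 7 can be illustrated with a short sketch. It rests on assumptions that the claim text does not settle: "a number of the subset transition points" is read here as their count, the "segmentation utilizing the representative segment length" is read as a fixed grid of cut points spaced by that length, and the function name `choose_change_points` and all parameter names are hypothetical.

```python
def choose_change_points(subset_points, n_samples, rep_length, lower, upper):
    """Apply a lower/upper threshold test to the count of subset transition points.

    Assumed reading of claims 5-7:
      - count below `lower`: keep the points and add fixed-length cuts (claim 5)
      - count above `upper`: use fixed-length cuts only (claim 6)
      - otherwise: use the subset transition points as they are (claim 7)
    """
    count = len(subset_points)
    # Cut positions spaced every `rep_length` samples (assumed interpretation).
    fixed_grid = list(range(rep_length, n_samples, rep_length))

    if count < lower:
        return sorted(set(subset_points) | set(fixed_grid))
    if count > upper:
        return fixed_grid
    return sorted(subset_points)


# Hypothetical values: 100 samples, representative length 25, boundaries 2 and 5.
print(choose_change_points([40], 100, 25, lower=2, upper=5))
print(choose_change_points([10, 20, 30, 40, 50, 60], 100, 25, lower=2, upper=5))
print(choose_change_points([30, 60, 10], 100, 25, lower=2, upper=5))
```

Too few detected points are supplemented with regular cuts, too many are replaced by regular cuts, and a count inside the boundaries is trusted as is.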

Assignees

Inventors

Classifications

  • based on graph theory, e.g. minimum spanning trees [MST] or graph cuts

  • Matching criteria, e.g. proximity measures

  • by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation

  • Clustering or classification

  • Graphs; Linked lists (G06F16/9027 takes precedence)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021365498A1 cover?
One embodiment provides a method, including: receiving a multi-variate time-series dataset comprising a plurality of time-dependent datasets; for each of the plurality of time-dependent datasets, segmenting each of the plurality of time-dependent datasets at a transition point; clustering segments of the plurality of time-dependent datasets into clusters having similar lengths of segments; for …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/9024. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 25, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).