Computer architecture for identifying data clusters using unsupervised machine learning in a correlithm object processing system

US2020175320A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2020175320-A1
Application number: US-201816208136-A
Country: US
Kind code: A1
Filing date: Dec 3, 2018
Priority date: Dec 3, 2018
Publication date: Jun 4, 2020
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device that includes a model training engine implemented by a processor. The model training engine is configured to obtain a set of data values associated with a feature vector. The model training engine is further configured to generate a set of gradients by dividing separation distances by an average separation distance and to compare each gradient to a gradient threshold value. The model training engine is further configured to identify a boundary in response to determining a gradient exceeds the gradient threshold value, to determine a number of identified boundaries, and to determine a number of clusters based on the number of identified boundaries. The model training engine is further configured to train the machine learning model to associate the determined number of clusters with the feature vector.
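The boundary-counting procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `count_clusters`, the example threshold of 2.0, and the rule "clusters = boundaries + 1" are assumptions made here for illustration (the abstract says only that the cluster count is "based on" the number of boundaries). The average separation distance is computed as the range divided by the number of data values, as recited in claim 1.

```python
def count_clusters(values, gradient_threshold):
    """Estimate how many clusters a 1-D set of data values contains."""
    ordered = sorted(values)  # ascending order
    if len(ordered) < 2:
        return len(ordered)

    value_range = ordered[-1] - ordered[0]  # maximum minus minimum
    if value_range == 0:
        return 1  # all values identical: treat as one cluster (assumption)

    # Average separation distance: range divided by the number of values.
    average_separation = value_range / len(ordered)

    # Separation distances between adjacent sorted values.
    separations = [b - a for a, b in zip(ordered, ordered[1:])]

    # Gradient: each separation relative to the average separation.
    gradients = [s / average_separation for s in separations]

    # A boundary is any gap whose gradient exceeds the threshold.
    boundaries = sum(1 for g in gradients if g > gradient_threshold)

    # Assumption: n boundaries split the sorted values into n + 1 clusters.
    return boundaries + 1


if __name__ == "__main__":
    data = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 9.0, 9.1]
    print(count_clusters(data, gradient_threshold=2.0))
```

In the sample data, the two large gaps (1.2 to 5.0 and 5.2 to 9.0) are several times the average separation, so each counts as a boundary; the small gaps inside each group do not.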

First claim

Opening claim text (preview).

1. A device, comprising: a memory operable to store a machine learning model configured to map a set of feature vector inputs to a plurality of clusters; and a model training engine implemented by a processor operably coupled to the memory, configured to: obtain a set of data values associated with a feature vector; sort the set of data values in an ascending order; determine a range value for the set of data values, wherein the range value is equal to a difference between a maximum data value and a minimum data value; determine an average separation distance by dividing the range value by the number of data values in the set of data values; determine separation distances between adjacent data values in the set of data values; generate a set of gradients by dividing the separation distances by the average separation distance; compare each gradient from the set of gradients to a gradient threshold value; identify a boundary in response to determining a gradient exceeds the gradient threshold value; determine a number of identified boundaries; determine a number of clusters based on the number of identified boundaries; and train the machine learning model to associate the determined number of clusters with the feature vector.

2. The device of claim 1, wherein the set of feature vector inputs comprises non-numerical values.

3. The device of claim 1, wherein the set of feature vector inputs comprises text inputs.

4. The device of claim 1, wherein the model training engine is configured to normalize the set of data values.

5. The device of claim 1, wherein each cluster in the plurality of clusters corresponds with a different network attack.

6. The device of claim 1, wherein the model training engine is further configured to assign the assign data values from the set of data values to the clusters.

7. The device of claim 6, wherein the model training engine is further configured to compute centroids for the clusters.

8. A machine learning model training method, comprising: obtaining, by a model training engine implemented by a processor, a set of data values associated with a feature vector; sorting, by the model training engine, the set of data values in an ascending order; determining, by the model training engine, a range value for the set of data values, wherein the range value is equal to a difference between a maximum data value and a minimum data value; determining, by the model training engine, an average separation distance by dividing the range value by the number of data values in the set of data values; determining, by the model training engine, separation distances between adjacent data values in the set of data values; generating, by the model training engine, a set of gradients by dividing the separation distances by the average separation distance; comparing, by the model training engine, each gradient from the set of gradients to a gradient threshold value; identifying, by the model training engine, a boundary in response to determining a gradient exceeds the gradient threshold value; determining, by the model training engine, a number of identified boundaries; determining, by the model training engine, a number of clusters based on the number of identified boundaries; and training, by the model training engine, a machine learning model to associate the determined number of clusters with the feature vector, wherein the machine learning model is configured to map a set of feature vector inputs to a plurality of clusters.

9. The method of claim 8, wherein the set of feature vector inputs comprises non-numerical values.

10. The method of claim 8, wherein the set of feature vector inputs comprises text inputs.

11. The method of claim 8, further comprising normalizing, by the model training engine, the set of data values.

12. The method of claim 8, wherein each cluster in the plurality of clusters corresponds with a different network attack.

13. The method of claim 8, further comprising assigning, by the model training engine, the assign data values from the set of data values to the clusters.

14. The method of claim 13, further comprising computing, by the model training engine, centroids for the clusters.

15. A computer program comprising executable instructions stored in a non-transitory computer readable medium that when executed by a processor causes the processor to: obtain a set of data values associated with a feature vector; sort the set of data values in an ascending order; determine a range value for the set of data values, wherein the range value is equal to a difference between a maximum data value and a minimum data value; determine an average separation distance by dividing the range value by the number of data values in the set of data values; determine separation distances between adjacent data values in the set of data values; generate a set of gradients by dividing the separation distances by the average separation distance; compare each gradient from the set of gradients to a gradient threshold value; identify a boundary in response to determining a gradient exceeds the gradient threshold value; determine a number of identified boundaries; determine a number of clusters based on the number of identified boundaries; and train the machine learning model to associate the determined number of clusters with the feature vector.

16. The computer program product of claim 15, wherein the set of feature vector inputs comprises non-numerical values.

17. The computer program product of claim 15, further comprising instructions that configure the processor to normalize the set of data values.

18. The computer program product of claim 15, wherein each cluster in the plurality of clusters corresponds with a different network attack.

19. The computer program product of claim 15, further comprising instructions that configure the processor to assign the assign data values from the set of data values to the clusters.

20. The computer program product of claim 19, further comprising instructions that configure the processor to compute centroids for the clusters.

Assignees

Bank of America

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning · CPC title

  • Traffic logging, e.g. anomaly detection · CPC title

  • G06K9/6218 (Primary)

    Physics · mapped topic

  • Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020175320A1 cover?
A device that includes a model training engine implemented by a processor. The model training engine is configured to obtain a set of data values associated with a feature vector. The model training engine is further configured to generate a set of gradients by dividing separation distances by an average separation distance and to compare each gradient to a gradient threshold value. The model t…
Who is the assignee on this patent?
Bank of America
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: June 4, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).