Computer architecture for identifying data clusters using correlithm objects and machine learning in a correlithm object processing system

US2020175321A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020175321-A1
Application number  US-201816208055-A
Country             US
Kind code           A1
Filing date         Dec 3, 2018
Priority date       Dec 3, 2018
Publication date    Jun 4, 2020
Grant date          (not listed)

Abstract

Official abstract text for this publication.

A device that includes a model training engine implemented by a processor. The model training engine is configured to obtain a set of data values associated with a feature vector. The model training engine is further configured to transform a first data value and a second data value from the set of data values into sub-string correlithm objects. The model training engine is further configured to compute a Hamming distance between the first sub-string correlithm object and the second sub-string correlithm object and to identify a boundary in response to determining that the Hamming distance exceeds a bit difference threshold value. The model training engine is further configured to determine a number of identified boundaries, to determine a number of clusters based on the number of identified boundaries, and to train the machine learning model to associate the determined number of clusters with the feature vector.
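
The boundary-counting step in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than part of the publication: the function names are hypothetical, the 8-bit example words are made up, the comparison is taken between consecutive words in an ordered sequence, and the cluster count is taken to be the number of boundaries plus one (the abstract only says the count is "based on" the number of boundaries).

```python
from typing import Sequence


def hamming_distance(a: int, b: int) -> int:
    """XOR two n-bit words and count the bits that differ."""
    return bin(a ^ b).count("1")


def count_boundaries(words: Sequence[int], threshold: int) -> int:
    """Count consecutive pairs whose Hamming distance exceeds the threshold."""
    return sum(
        1
        for prev, curr in zip(words, words[1:])
        if hamming_distance(prev, curr) > threshold
    )


def estimate_cluster_count(words: Sequence[int], threshold: int) -> int:
    """Assumption: k boundaries split an ordered sequence into k + 1 clusters."""
    if not words:
        return 0
    return count_boundaries(words, threshold) + 1


# Hypothetical 8-bit words standing in for sub-string correlithm objects.
example_words = [0b00000000, 0b00000001, 0b00000011, 0b11110011, 0b11110111]

if __name__ == "__main__":
    print(count_boundaries(example_words, threshold=2))
    print(estimate_cluster_count(example_words, threshold=2))
```

In this example only the jump from the third word to the fourth differs in more than two bits, so the sketch reports one boundary and two clusters. How data values are mapped to words, and how the model is then trained on the result, is not shown.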

First claim

Opening claim text (preview).

1. A device, comprising:
    a memory operable to store a machine learning model configured to map a set of feature vector inputs to a plurality of clusters; and
    a model training engine implemented by a processor operably coupled to the memory, configured to:
        obtain a set of data values associated with a feature vector, wherein the set of data values comprises non-numerical values;
        transform a first data value from the set of data values into a first sub-string correlithm object from a string correlithm object, wherein:
            the string correlithm object comprises a plurality of sub-string correlithm objects;
            each sub-string correlithm object is represented by an n-bit digital word; and
            each sub-string correlithm object is adjacent in n-dimensional space to a preceding sub-string correlithm object and a subsequent sub-string correlithm object to form the string correlithm object;
        transform a second data value from the set of data values into a second sub-string correlithm object in the string correlithm object;
        compute a Hamming distance between the first sub-string correlithm object and the second sub-string correlithm object;
        compare the Hamming distance to a bit difference threshold value, wherein the bit difference threshold value indicates a maximum number of different bits to be considered a part of the same cluster;
        identify a boundary between the first sub-string correlithm object and the second sub-string correlithm object in response to determining that the Hamming distance exceeds the bit difference threshold value;
        determine a number of identified boundaries;
        determine a number of clusters based on the number of identified boundaries;
        train the machine learning model to associate the determined number of clusters with the feature vector.

2. The device of claim 1, wherein the model training engine is further configured to:
    assign the first sub-string correlithm object to a first cluster;
    train the machine learning model with a mapping between the first sub-string correlithm object and the first cluster;
    assign the second sub-string correlithm object to a second cluster in response to determining that the Hamming distance exceeds the bit difference threshold value, wherein the second cluster is different than the first cluster; and
    train the machine learning model with a mapping between the second sub-string correlithm object and the second cluster.

3. The device of claim 1, wherein the model training engine is further configured to:
    assign the first sub-string correlithm object to a first cluster;
    train the machine learning model with a mapping between the first sub-string correlithm object and the first cluster;
    assign the second sub-string correlithm object to the first cluster in response to determining that the Hamming distance does not exceed the bit difference threshold value; and
    train the machine learning model with a mapping between the second sub-string correlithm object and the first cluster.

4. The device of claim 1, wherein computing the Hamming distance comprises:
    performing an XOR operation between the first sub-string correlithm object and the second sub-string correlithm object to generate a binary string; and
    counting the number of logical high values in the binary string.

5. The device of claim 1, wherein the model training engine is configured to normalize the set of data values.

6. The device of claim 1, wherein the set of data values comprises text.

7. The device of claim 1, wherein transforming the first data value into the first sub-string comprises:
    identifying a reference sub-string correlithm object linked with a reference data value;
    determining a difference between the reference data value and the first data value;
    determining a distance parameter in the n-dimensional space based on the difference; and
    modifying the reference sub-string correlithm object to generate the first sub-string correlithm object, wherein modifying the reference sub-string correlithm object comprises changing a number of bits equal to the distance parameter.

8. A machine learning model training method, comprising:
    obtaining, by a model training engine implemented by a processor, a set of data values associated with a feature vector, wherein the set of data values comprises non-numerical values;
    transforming, by the model training engine, a first data value from the set of data values into a first sub-string correlithm object from a string correlithm object, wherein:
        the string correlithm object comprises a plurality of sub-string correlithm objects;
        each sub-string correlithm object is represented by an n-bit digital word; and
        each sub-string correlithm object is adjacent in n-dimensional space to a preceding sub-string correlithm object and a subsequent sub-string correlithm object to form the string correlithm object;
    transforming, by the model training engine, a second data value from the set of data values into a second sub-string correlithm object in the string correlithm object;
    computing, by the model training engine, a Hamming distance between the first sub-string correlithm object and the second sub-string correlithm object;
    comparing, by the model training engine, the Hamming distance to a bit difference threshold value, wherein the bit difference threshold value indicates a maximum number of different bits to be considered a part of the same cluster;
    identifying, by the model training engine, a boundary between the first sub-string correlithm object and the second sub-string correlithm object in response to determining that the Hamming distance exceeds the bit difference threshold value;
    determining, by the model training engine, a number of identified boundaries;
    determining, by the model training engine, a number of clusters based on the number of identified boundaries;
    training, by the model training engine, a machine learning model to associate the determined number of clusters with the feature vector, wherein the machine learning model is configured to map a set of feature vector inputs to a plurality of clusters.

9. The method of claim 8, further comprising:
    assigning, by the model training engine, the first sub-string correlithm object to a first cluster;
    training, by the model training engine, the machine learning model with a mapping between the first sub-string correlithm object and the first cluster;
    assigning, by the model training engine, the second sub-string correlithm object to a second cluster in response to determining that the Hamming distance exceeds the bit difference threshold value, wherein the second cluster is different than the first cluster; and
    training, by the model training engine, the machine learning model with a mapping between the second sub-string correlithm object and the second cluster.

10. The method of claim 8, further comprising:
    assigning, by the model training engine, the first sub-string correlithm object to a first cluster;
    training, by the model training engine, the machine learning model with a mapping between the first sub-string correlithm object and the first cluster;
    assigning, by the model training engine, the second sub-string correlithm object to the first cluster in response to determining that the Hamming distance does not exceed the bit difference threshold value; and
    training, by the model training engine, the machine learning model with a mapping between the second sub-string correlithm object and the first cluster.

11. The method of claim 8, wherein computing the Hamming distance comprises:
    performing an XOR operation between the first sub-string correlithm object and the second sub-string correlithm object to generate a binary str
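
Two of the dependent-claim steps above lend themselves to a short illustration: generating a sub-string correlithm object by changing a number of bits equal to a distance parameter (claim 7), and assigning objects to the same or a new cluster depending on the threshold comparison (claims 2 and 3). The following Python sketch is not the applicant's implementation. The function names are hypothetical, the choice of which bits to flip is assumed to be random, the mapping from a data-value difference to a distance parameter is left to the caller because the claim text does not specify it, and extending the two-object assignment rule to a whole sequence is also an assumption.

```python
import random
from typing import List, Sequence


def derive_word(reference: int, n_bits: int, distance: int,
                rng: random.Random) -> int:
    """Flip `distance` distinct bit positions of an n-bit reference word.

    Assumption: positions are chosen at random; the claim only requires
    that the number of changed bits equals the distance parameter.
    """
    if not 0 <= distance <= n_bits:
        raise ValueError("distance must be between 0 and n_bits")
    mask = 0
    for position in rng.sample(range(n_bits), distance):
        mask |= 1 << position
    return reference ^ mask


def assign_clusters(words: Sequence[int], threshold: int) -> List[int]:
    """Label each word with a cluster index, opening a new cluster whenever
    the Hamming distance to the previous word exceeds the threshold."""
    labels: List[int] = []
    cluster = 0
    for index, word in enumerate(words):
        if index > 0 and bin(words[index - 1] ^ word).count("1") > threshold:
            cluster += 1
        labels.append(cluster)
    return labels


if __name__ == "__main__":
    rng = random.Random(0)
    reference = 0b0000000000000000
    derived = derive_word(reference, n_bits=16, distance=5, rng=rng)
    print(bin(reference ^ derived).count("1"))
    print(assign_clusters([0b0, 0b1, 0b11110001, 0b11110011], threshold=2))
```

Because the flipped positions are distinct, the derived word sits at a Hamming distance from the reference exactly equal to the distance parameter. The resulting labels could then serve as the object-to-cluster mappings that the claims describe feeding to the machine learning model.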

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020175321A1 cover?
A device that includes a model training engine implemented by a processor. The model training engine is configured to obtain a set of data values associated with a feature vector. The model training engine is further configured to transform a first data value and a second data value from the set of data values into sub-string correlithm objects. The model training engine is further configured to…
Who is the assignee on this patent?
Bank of America
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 4, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).