Selectively generating word vector and paragraph vector representations of fields for machine learning
US-10459962-B1 · Oct 29, 2019 · US
US11423249B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11423249-B2 |
| Application number | US-201816208136-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 3, 2018 |
| Priority date | Dec 3, 2018 |
| Publication date | Aug 23, 2022 |
| Grant date | Aug 23, 2022 |
Official abstract text for this publication.
A device that includes a model training engine implemented by a processor. The model training engine is configured to obtain a set of data values associated with a feature vector. The model training engine is further configured to generate a set of gradients by dividing separation distances by an average separation distance and to compare each gradient to a gradient threshold value. The model training engine is further configured to identify a boundary in response to determining a gradient exceeds the gradient threshold value, to determine a number of identified boundaries, and to determine a number of clusters based on the number of identified boundaries. The model training engine is further configured to train the machine learning model to associate the determined number of clusters with the feature vector.
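The abstract does not say how the average separation distance is obtained; claim 1 defines it as the range of the data values divided by the number of data values. Under that definition, the gradient test can be written compactly. The symbols below ($x_i$ for the sorted values, $n$ for their count, $\tau$ for the gradient threshold) are notation introduced here for illustration and do not appear in the patent text.

$$
\bar{d} = \frac{x_n - x_1}{n}, \qquad
g_i = \frac{x_{i+1} - x_i}{\bar{d}} \quad (i = 1, \dots, n-1), \qquad
\text{boundary at } i \iff g_i > \tau
$$

As a worked example with made-up values $1, 2, 3, 10, 11, 12$: $n = 6$, the range is $11$, and $\bar{d} = 11/6 \approx 1.83$. The adjacent gaps are $1, 1, 7, 1, 1$, giving gradients of about $0.55, 0.55, 3.82, 0.55, 0.55$. With an illustrative threshold $\tau = 2$, only the third gradient exceeds it, so one boundary is identified. The text says only that the number of clusters is "based on" the number of boundaries; reading that as boundaries plus one (an assumption) yields two clusters here.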
Opening claim text (preview).
The invention claimed is:

1. A device, comprising: a memory to store a machine learning model configured to map a set of feature vector inputs to a plurality of clusters; and a model training engine implemented by a processor coupled to the memory, configured to: obtain a set of data values associated with a feature vector; sort the set of data values in an ascending order; determine a range value for the set of data values, wherein the range value is equal to a difference between a maximum data value and a minimum data value; determine an average separation distance by dividing the range value by the number of data values in the set of data values; determine separation distances between adjacent data values in the set of data values; generate a set of gradients by dividing the separation distances by the average separation distance; compare each gradient from the set of gradients to a gradient threshold value; identify a boundary in response to determining a gradient exceeds the gradient threshold value; determine a number of identified boundaries; determine a number of clusters based on the number of identified boundaries; train the machine learning model to associate the determined number of clusters with the feature vector; and assign the data values from the set of data values to the clusters.

2. The device of claim 1, wherein the set of feature vector inputs comprises non-numerical values.

3. The device of claim 1, wherein the set of feature vector inputs comprises text inputs.

4. The device of claim 1, wherein the model training engine is configured to normalize the set of data values.

5. The device of claim 1, wherein each cluster in the plurality of clusters corresponds with a different network attack.

6. The device of claim 1, wherein the model training engine is further configured to compute centroids for the clusters.

7. A machine learning model training method, comprising: obtaining, by a model training engine implemented by a processor, a set of data values associated with a feature vector; sorting, by the model training engine, the set of data values in an ascending order; determining, by the model training engine, a range value for the set of data values, wherein the range value is equal to a difference between a maximum data value and a minimum data value; determining, by the model training engine, an average separation distance by dividing the range value by the number of data values in the set of data values; determining, by the model training engine, separation distances between adjacent data values in the set of data values; generating, by the model training engine, a set of gradients by dividing the separation distances by the average separation distance; comparing, by the model training engine, each gradient from the set of gradients to a gradient threshold value; identifying, by the model training engine, a boundary in response to determining a gradient exceeds the gradient threshold value; determining, by the model training engine, a number of identified boundaries; determining, by the model training engine, a number of clusters based on the number of identified boundaries; training, by the model training engine, a machine learning model to associate the determined number of clusters with the feature vector, wherein the machine learning model is configured to map a set of feature vector inputs to a plurality of clusters; and assigning, by the model training engine, the data values from the set of data values to the clusters.

8. The method of claim 7, wherein the set of feature vector inputs comprises non-numerical values.

9. The method of claim 7, wherein the set of feature vector inputs comprises text inputs.

10. The method of claim 7, further comprising normalizing, by the model training engine, the set of data values.

11. The method of claim 7, wherein each cluster in the plurality of clusters corresponds with a different network attack.

12. The method of claim 7, further comprising computing, by the model training engine, centroids for the clusters.

13. A computer program comprising executable instructions stored in a non-transitory computer readable medium that when executed by a processor causes the processor to: obtain a set of data values associated with a feature vector; sort the set of data values in an ascending order; determine a range value for the set of data values, wherein the range value is equal to a difference between a maximum data value and a minimum data value; determine an average separation distance by dividing the range value by the number of data values in the set of data values; determine separation distances between adjacent data values in the set of data values; generate a set of gradients by dividing the separation distances by the average separation distance; compare each gradient from the set of gradients to a gradient threshold value; identify a boundary in response to determining a gradient exceeds the gradient threshold value; determine a number of identified boundaries; determine a number of clusters based on the number of identified boundaries; train the machine learning model to associate the determined number of clusters with the feature vector; and assign the data values from the set of data values to the clusters.

14. The computer program product of claim 13, wherein the set of feature vector inputs comprises non-numerical values.

15. The computer program product of claim 13, further comprising instructions that configure the processor to normalize the set of data values.

16. The computer program product of claim 13, wherein each cluster in the plurality of clusters corresponds with a different network attack.

17. The computer program product of claim 13, further comprising instructions that configure the processor to compute centroids for the clusters.
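The data-handling steps recited in the independent claims (sort, range, average separation, gradients, threshold comparison, boundary count, cluster assignment) and the centroid step of the dependent claims can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `cluster_by_gradient` and `centroids` are hypothetical, the cluster count is taken as boundaries plus one (the claims say only "based on"), the handling of empty or all-identical inputs is our own choice, and the step of training a machine learning model on the result is not shown.

```python
from typing import List, Sequence, Tuple


def cluster_by_gradient(
    values: Sequence[float], threshold: float
) -> Tuple[int, List[List[float]]]:
    """Split 1-D values into clusters wherever a gap is unusually large.

    Hypothetical helper following the steps worded in claim 1.
    """
    if not values:
        return 0, []  # assumption: no data means no clusters

    ordered = sorted(values)                 # sort in ascending order
    value_range = ordered[-1] - ordered[0]   # maximum minus minimum
    # Claim 1 divides the range by the number of data values
    # (not by the number of gaps, which would be one fewer).
    avg_separation = value_range / len(ordered)

    if avg_separation == 0:
        return 1, [ordered]  # assumption: identical values form one cluster

    clusters: List[List[float]] = [[ordered[0]]]
    for prev, curr in zip(ordered, ordered[1:]):
        gradient = (curr - prev) / avg_separation  # separation / average
        if gradient > threshold:
            clusters.append([])  # boundary identified: start a new cluster
        clusters[-1].append(curr)

    # Assumption: number of clusters = number of boundaries + 1.
    return len(clusters), clusters


def centroids(clusters: List[List[float]]) -> List[float]:
    """Mean of each cluster, one reading of 'compute centroids'."""
    return [sum(c) / len(c) for c in clusters]


if __name__ == "__main__":
    count, groups = cluster_by_gradient([11, 1, 3, 10, 2, 12], threshold=2.0)
    print(count, groups, centroids(groups))
```

The threshold is a free parameter in the claims; the value 2.0 in the usage lines is purely illustrative. Because the average separation is the range divided by the number of values, evenly spaced data produces gradients slightly above 1, so a threshold at or below 1 would split every value into its own cluster.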
using clustering, e.g. of similar faces in social networks · CPC title
by analysing connectivity, e.g. edge linking, connected component analysis or slices · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
using classification, e.g. of video objects · CPC title
Clustering techniques · CPC title