Learning device and learning discrimination system
US-2018039822-A1 · Feb 8, 2018 · US
US10417530B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10417530-B2 |
| Application number | US-201715720372-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 29, 2017 |
| Priority date | Sep 30, 2016 |
| Publication date | Sep 17, 2019 |
| Grant date | Sep 17, 2019 |
Official abstract text for this publication.
Centroids are used for improving machine learning classification and information retrieval. A plurality of files are classified as malicious or not malicious based on a function dividing a coordinate space into at least a first portion and a second portion such that the first portion includes a first subset of the plurality of files classified as malicious. One or more first geometric regions are defined in the first portion that classify files from the first subset as not malicious. A file is determined to be malicious based on whether the file is located within the one or more first geometric regions.
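The decision flow in the abstract can be illustrated with a minimal sketch. It assumes a linear dividing function and circular regions; the names `Region`, `base_is_malicious`, `is_malicious`, `weights`, and `bias` are hypothetical and do not come from the patent, which leaves the dividing function open. The sketch follows the stated effect of the regions (they classify files inside them as not malicious), so a file is reported malicious only when it falls in the malicious portion and outside every such region.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Circular region in the coordinate space (hypothetical helper type)."""
    center: tuple
    radius: float

    def contains(self, point):
        # Inside when the distance to the center is at most the radius.
        return math.dist(self.center, point) <= self.radius


def base_is_malicious(point, weights, bias):
    """Assumed dividing function: the half-space where w.x + b > 0 is 'malicious'."""
    return sum(w * x for w, x in zip(weights, point)) + bias > 0


def is_malicious(point, weights, bias, benign_regions):
    """Apply the dividing function, then let benign regions override it."""
    if not base_is_malicious(point, weights, bias):
        return False
    # A file in the malicious portion is cleared if any benign region holds it.
    return not any(region.contains(point) for region in benign_regions)
```

For example, with `weights=(1.0, 0.0)` and `bias=-5.0`, every point with a first coordinate above 5 starts out as malicious, and a `Region((8.0, 8.0), 1.0)` clears the points that lie within distance 1 of (8, 8).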
Opening claim text (preview).
The invention claimed is:
1. A system, comprising: at least one processor; and at least one memory including program code which when executed by the at least one memory provides operations comprising: classifying at least a portion of a plurality of files as malicious based on a function dividing a coordinate space into at least a first portion and a second portion, wherein the first portion includes a first subset of the plurality of files classified as malicious; defining one or more first geometric regions in the first portion that classify files from the first subset as not malicious; identifying a plurality of clusters from the plurality of files; determining whether any of the plurality of clusters do not include known malicious files; defining individual geometric regions around at least one of the plurality of clusters which do not include known malicious files, wherein the one or more first geometric regions include the individual geometric regions; determining whether a file is malicious based on whether the file is located within the one or more first geometric regions; and preventing files determined to be malicious from such files from executing, opening, continuing to execute, writing, or being downloaded.
2. A system as in claim 1, wherein the second portion includes a second subset of the plurality of files classified as not malicious, and wherein the operations further comprise: defining one or more second geometric regions in the second portion that classify files from the second subset as malicious, wherein determining whether the file is malicious further comprises determining whether the file is located within a region of the second portion that does not include the one or more second geometric regions.
3. A system as in claim 1, wherein the operations further comprise: determining a plurality of attributes of the plurality of files; and mapping the plurality of files in a positive portion of the coordinate space defined by an intersection of at least two of the plurality of attributes.
4. A system as in claim 1, wherein the operations further comprise: determining whether any of the individual geometric regions include a radius greater than a threshold value; reducing the radius of the individual geometric regions which are greater than the threshold value such that the radius is less than or equal to the threshold value; and re-defining, after the reducing, the individual geometric regions which no longer include all files from a respective cluster of the plurality of clusters, wherein the re-defining includes defining multiple smaller geometric regions in place of the individual geometric regions.
5. A system as in claim 1, wherein the one or more first geometric regions include a circular geometry having a center point and a radius, and wherein the file is determined to be located within the one or more first geometric regions when a distance between the center point and a location of the file is less than or equal to the radius.
6. A system as in claim 5, wherein the center point is determined based on averaging locations for each of the plurality of files located within the one or more first geometric regions.
7. A system as in claim 5, wherein the center point is determined based on shared attributes for each of the plurality of files located within the one or more first geometric regions.
8. A system as in claim 5, wherein the radius is determined based on a maximum Euclidian distance between each of the plurality of files located within the one or more first geometric regions.
9. A system as in claim 1, wherein the classifying employs at least one machine learning model.
10. A system as in claim 1, wherein the classifying employs at least one of: a neural networks, a support vector machine, a logistic regression model, a Bayesian algorithm, or a decision tree.
11. A computer-implemented method, comprising: classifying at least a portion of a plurality of files as malicious based on a function dividing a coordinate space into at least a first portion and a second portion, wherein the first portion includes a first subset of the plurality of files classified as malicious; defining one or more first geometric regions in the first portion that classify files from the first subset as not malicious; identifying a plurality of clusters from the plurality of files; determining whether any of the plurality of clusters do not include known malicious files; defining individual geometric regions around at least one of the plurality of clusters which do not include known malicious files, wherein the one or more first geometric regions include the individual geometric regions; determining whether a file is malicious based on whether the file is located within the one or more first geometric regions; and preventing files determined to be malicious from such files from executing, opening, continuing to execute, writing, or being downloaded.
12. A computer-implemented method as in claim 11, wherein the second portion includes a second subset of the plurality of files classified as not malicious, wherein the method further comprises: defining one or more second geometric regions in the second portion that classify files from the second subset as malicious, and wherein determining whether the file is malicious further comprises determining whether the file is located within a region of the second portion that does not include the one or more second geometric regions.
13. A computer-implemented method as in claim 11, further comprising: determining a plurality of attributes of the plurality of files; and mapping the plurality of files in a positive portion of the coordinate space defined by an intersection of at least two of the plurality of attributes.
14. A computer-implemented method as in claim 11, further comprising: determining whether any of the individual geometric regions include a radius greater than a threshold value; reducing the radius of the individual geometric regions which are greater than the threshold value such that the radius is less than or equal to the threshold value; and re-defining, after the reducing, the individual geometric regions which no longer include all files from a respective cluster of the plurality of clusters, wherein the re-defining includes defining multiple smaller geometric regions in place of the individual geometric regions.
15. A computer-implemented method as in claim 11, wherein the one or more first geometric regions include a circular geometry having a center point and a radius, and wherein the file is determined to be located within the one or more first geometric regions when a distance between the center point and a location of the file is less than or equal to the radius.
16. A computer-implemented method as in claim 15, wherein the center point is determined based on averaging locations for each of the plurality of files located within the one or more first geometric regions.
17. A computer-implemented method as in claim 15, wherein the center point is determined based on shared attributes for each of the plurality of files located within the one or more first geometric regions.
18. A computer-implemented method as in claim 15, wherein the radius is determined based on a maximum Euclidian distance between each of the plurality of files located within the one or more first geometric regions.
19. A method as in claim 11, wherein the classifying employs at least one machine learning model.
20. A system
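The region construction recited in the claims (clusters free of known malicious files, a center point obtained by averaging, a radius from a maximum Euclidean distance, and a radius threshold) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the radius is read as the largest distance from the center to any cluster member, and the shrink-then-redefine step of claims 4 and 14 is simplified to a recursive split of the cluster along its widest axis.

```python
import math


def centroid(points):
    """Center point as the per-coordinate average of the member locations."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def radius(center, points):
    """Largest Euclidean distance from the center to any member (assumed reading)."""
    return max(math.dist(center, p) for p in points)


def regions_for_cluster(points, max_radius):
    """Cover one cluster with (center, radius) pairs, each radius <= max_radius.

    Assumes max_radius >= 0. Oversized regions are replaced by several smaller
    ones by halving the cluster along its widest coordinate axis.
    """
    center = centroid(points)
    r = radius(center, points)
    if r <= max_radius or len(points) == 1:
        return [(center, r)]
    dims = range(len(points[0]))
    # Pick the axis along which the cluster is most spread out.
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (regions_for_cluster(ordered[:mid], max_radius)
            + regions_for_cluster(ordered[mid:], max_radius))


def benign_regions(clusters, known_malicious, max_radius):
    """Build regions only around clusters that contain no known malicious file."""
    regions = []
    for points in clusters:
        if any(p in known_malicious for p in points):
            continue  # skip clusters that include a known malicious file
        regions.extend(regions_for_cluster(points, max_radius))
    return regions
```

Each cluster is a list of coordinate tuples, and `known_malicious` is a set of such tuples. Capping the radius keeps a single sparse cluster from producing one very large region that would also cover unrelated files lying between its members.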
Combinations of networks · CPC title
Learning methods · CPC title
Distances to cluster centroïds · CPC title
File meta data generation · CPC title
Machine learning · CPC title