Kernel parameter selection in support vector data description for outlier identification
US-2017236074-A1 · Aug 17, 2017 · US
US9830558B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9830558-B1 |
| Application number | US-201615185277-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 17, 2016 |
| Priority date | May 3, 2016 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A computing device determines an SVDD to identify an outlier in a dataset. First and second sets of observation vectors of a predefined sample size are randomly selected from a training dataset. First and second optimal values are computed using the first and second observation vectors to define a first set of support vectors and a second set of support vectors. A third optimal value is computed using the first set of support vectors updated to include the second set of support vectors to define a third set of support vectors. Whether or not a stop condition is satisfied is determined by comparing a computed value to a stop criterion. When the stop condition is not satisfied, the first set of support vectors is defined as the third set of support vectors, and operations are repeated until the stop condition is satisfied. The third set of support vectors is output.
Opening claim text (preview).
What is claimed is: 1. A non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a computing device cause the computing device to: randomly select a first set of observation vectors from a training dataset, wherein a number of the first set of observation vectors is a predefined sample size; compute a first optimal value of an objective function defined for a support vector data description (SVDD) model using the selected first set of observation vectors to define a first set of support vectors, wherein the first set of support vectors define a first data description for the training dataset; (a) randomly select a second set of observation vectors from the training dataset, wherein a number of the second set of observation vectors is the predefined sample size; (b) compute a second optimal value of the objective function using the selected second set of observation vectors to define a second set of support vectors, wherein the second set of support vectors define a second data description for the training dataset; (c) update the first set of support vectors to include the defined second set of support vectors; (d) compute a third optimal value of the objective function using the updated first set of support vectors to define a third set of support vectors, wherein the third set of support vectors define a third data description for the training dataset; (e) compute a value of a stop parameter; (f) determine whether or not a stop condition is satisfied by comparing the computed value to a stop criterion; (g) when the stop condition is not satisfied, define the first set of support vectors as the defined third set of support vectors; and repeat (a)-(g) until the stop condition is satisfied; and when the stop condition is satisfied, output the defined third set of support vectors; compute a threshold using the defined third set of support vectors; read an observation vector from a scoring dataset; compute a distance value using the output third set of support vectors and the read observation vector; when the computed distance value is greater than the computed threshold, output an abnormal indicator indicating that the read observation vector is abnormal relative to the output third set of support vectors; and when the computed distance value is not greater than the computed threshold, output a normal indicator indicating that the read observation vector is normal relative to the output third set of support vectors. 2. The non-transitory computer-readable medium of claim 1 , wherein the objective function defined for the SVDD model is max(Σ i=1 n α i K (x i , x i )−Σ i=1 n Σ j=1 n α i α j K(x i , x i )), subject to Σ i=1 n α i =1 and 0≦α i ≦C, ∀i=1, . . . , n, where K(x i , x j ) is a kernel function, n is the predefined sample size, C=1/nf where f is an expected outlier fraction, x i and x j are the selected observation vectors for each computation, and α i are Lagrange constants. 3. The non-transitory computer-readable medium of claim 2 , wherein the expected outlier fraction is a predefined input value. 4. The non-transitory computer-readable medium of claim 2 , wherein the x i that have 0<α i ≦C are the defined set of support vectors for each computation. 5. The non-transitory computer-readable medium of claim 4 , wherein, when the stop condition is satisfied, the computer-readable instructions further cause the computing device to output the Lagrange constants α k for each of the defined third set of support vectors. 6. The non-transitory computer-readable medium of claim 2 , wherein the kernel function is a Gaussian kernel function. 7. The non-transitory computer-readable medium of claim 1 , wherein the threshold is computed using R 2 =K(x k , x k )−2Σ i=1 N α i K (x i , x k )+Σ i=1 N Σ j=1 N α i α j K(x i , x j ), where x k is any support vector of the output third set of support vectors, x i and x j are the defined third set of support vectors, K(x k , x k ) is a kernel function of the associated support vectors, α i and α j are Lagrange constants of the associated support vector, and N is a number of support vectors included in the defined third set of support vectors. 8. The non-transitory computer-readable medium of claim 7 , wherein, when the stop condition is satisfied, the computer-readable instructions further cause the computing device to output the computed threshold for identifying the outlier. 9. The non-transitory computer-readable medium of claim 1 , wherein the distance value is computed using dist 2 (z)=K(z,z)−2Σ i=1 N α i K(x i , z)+Σ i=1 N Σ j=1 N α i α j K(x i , x j ), where z is the read observation vector, x i and x j are the output third set of support vectors, K(z,z) is a kernel function of the associated vectors, α i and α j are Lagrange constants of the associated support vector, and N is a number of support vectors included in the output third set of support vectors. 10. The non-transitory computer-readable medium of claim 1 , wherein each observation vector includes a plurality of values, wherein each value of the plurality of values is associated with a variable to define a plurality of variables, wherein each variable of the plurality of variables describes a characteristic of a physical object generated or captured by a device. 11. The non-transitory computer-readable medium of claim 10 , wherein the predefined sample size is greater than a number of the plurality of variables. 12. The non-transitory computer-readable medium of claim 1 , wherein, after (b) and before (c), the computer-readable instructions further cause the computing device to: initialize a set of iteration support vectors as the defined second set of support vectors; and a predefined number of times, randomly select a fourth set of observation vectors from the training dataset, wherein a number of the fourth set of observation vectors is the predefined sample size; compute a fourth optimal value of the objective function using the selected fourth set of observation vectors to define a fourth set of support vectors, wherein the fourth set of support vectors define a fourth data description for the training dataset; and update the set of iteration support vectors to include the defined fourth set of support vectors; wherein the updated set of iteration support vectors replace the defined second set of support vectors in (c). 13. The non-transitory computer-readable medium of claim 1 , wherein the computed value is a number of iterations of (d), and the stop criterion is a predefined maximum number of iterations, wherein the determination is that the stop condition is satisfied when the computed value is greater than or equal to the predefined maximum number of iterations. 14. The non-transitory computer-readable medium of claim 2 , wherein the computed value is computed using c p =∥α j −α j-1 ∥/∥α j-1 ∥, where α j =Σ i=1 N α i x i where x i are the defined support vectors for each computation, α i is the Lagrange constant of the associated support vector, and N is a number of support vectors included in the defined set of support vectors for each computation, and α j-1 =Σ i=1 N p α ip x ip where x ip are the defined support vectors for a previous computation, α ip is the Lagrange constant of the associated previously computed support vector, and N p is a number of support vectors included in the defined set of support vectors for the previous computation, and the stop criterion is a predefined center tolerance value. 15. The non-transitory computer-readable medium of claim 14 , wh
Physics · mapped topic
Physics · mapped topic
based on web technology, e.g. hypertext transfer protocol [HTTP] · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title
Machine learning · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.