Neural architecture for self supervised event learning and anomaly detection
US-2020410322-A1 · Dec 31, 2020 · US
US11669428B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11669428-B2 |
| Application number | US-202016878429-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 19, 2020 |
| Priority date | May 19, 2020 |
| Publication date | Jun 6, 2023 |
| Grant date | Jun 6, 2023 |
Official abstract text for this publication.
Techniques are disclosed relating to detecting matching datasets using encode values. In various embodiments, a data monitoring system may perform encoding operations on a first dataset to generate a first encode value that corresponds to a particular one of one or more fields included in the first dataset. The data monitoring system may then determine whether the first dataset matches a previously analyzed dataset. For example, in some embodiments, the data monitoring system may compare the first encode value to a previous encode value that corresponds to a second field of the previously analyzed dataset. Based on this comparison, the data monitoring system may generate an output value that is indicative of a similarity between the first encode value and the previous encode value. The data monitoring system may then determine whether the first dataset matches the previously analyzed dataset based on this output value.
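The flow described in the abstract (encode a field, compare against a stored encode value, produce an output value, decide on a match) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the character-class histogram encoder, the cosine-similarity output value, the 0.95 threshold, and the names `encode_field`, `similarity`, and `datasets_match` are all assumptions introduced here.

```python
from collections import Counter
from math import sqrt

CLASSES = ("digit", "alpha", "other")

def encode_field(values):
    """Hypothetical encoder: normalized histogram of character classes in a field."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            if ch.isdecimal():
                counts["digit"] += 1
            elif ch.isalpha():
                counts["alpha"] += 1
            else:
                counts["other"] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {name: counts[name] / total for name in CLASSES}

def similarity(first_encode, previous_encode):
    """Assumed output value: cosine similarity between two encode values."""
    dot = sum(first_encode[k] * previous_encode[k] for k in CLASSES)
    norm_a = sqrt(sum(v * v for v in first_encode.values()))
    norm_b = sqrt(sum(v * v for v in previous_encode.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def datasets_match(first_values, previous_encode, threshold=0.95):
    """Encode the new field, compare to the stored encode value, then threshold."""
    first_encode = encode_field(first_values)
    output_value = similarity(first_encode, previous_encode)
    return output_value >= threshold, output_value

# Usage: the encode value of the previously analyzed dataset is stored once,
# so later datasets can be compared without re-reading the earlier data.
previous = encode_field(["AB-1234", "CD-5678"])
print(datasets_match(["EF-9012", "GH-3456"], previous))
print(datasets_match(["hello world", "foo bar"], previous))
```

Storing only the encode value of the earlier dataset, rather than the dataset itself, is what makes the comparison cheap in this sketch; whether the patented system stores encode values this way is not stated in the abstract.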
Opening claim text (preview).
What is claimed is:
1. A method, comprising: performing, by a data monitoring system, encoding operations on a first dataset to generate a set of encode values including first and second encode values generated using different types of encoding, wherein the first dataset includes first data organized into a first plurality of fields, the first data includes data records having data values for multiple fields within the first plurality of fields, and the first encode value corresponds to a particular field of the first plurality of fields; determining, by the data monitoring system, whether the first dataset matches a previously analyzed dataset, wherein the previously analyzed dataset includes second data organized into a second plurality of fields, the second data includes data records having data values for multiple fields within the second plurality of fields, and the determining includes: comparing the set of encode values to a previous set of encode values, wherein the previous set of encode values includes third and fourth encode values generated using different types of encoding and the third encode value corresponds to a second field, of the second plurality of fields, of the previously analyzed dataset; based on the comparing, generating an output value that is indicative of a similarity between the set of encode values and the previous set of encode values; and based on the output value, determining whether the first dataset matches the previously analyzed dataset.
2. The method of claim 1, wherein, for the particular field, the performing the encoding operations includes: selecting a particular one of a plurality of encoder modules based on a data type associated with the particular field; and encoding data included in the particular field of the first dataset, using the particular encoder module, to generate the first encode value.
3. The method of claim 2, wherein the second field of the previously analyzed dataset has a same data type as the data type associated with the particular field of the first dataset; and wherein the determining whether the first dataset matches the previously analyzed dataset further includes: retrieving the third encode value corresponding to the second field of the previously analyzed dataset, wherein the third encode value was generated by encoding data included in the second field of the previously analyzed dataset using the particular encoder module.
4. The method of claim 2, wherein the particular field of the first dataset includes string-type data; a type of encoding used to generate the first encode value includes semantic encoding; and the encoding data included in the particular field of the first dataset includes: generating one or more vector word-embedding representations of the string-type data included in the particular field of the first dataset.
5. The method of claim 2, wherein the particular field of the first dataset includes string-type data; a type of encoding used to generate the first encode value includes value-format encoding; and the encoding data included in the particular field of the first dataset includes: generating a first regular expression based on the string-type data included in the particular field of the first dataset.
6. The method of claim 5, wherein the third encode value is a second regular expression generated based on string-type data included in the second field of the previously analyzed dataset; and wherein the comparing the set of encode values to the previous set of encode values includes comparing the first and second regular expressions.
7. The method of claim 2, wherein the particular field of the first dataset includes numerical data; a type of encoding used to generate the first encode value includes numerical distribution encoding; and the encoding data included in the particular field of the first dataset includes: calculating a first latent probability distribution corresponding to the numerical data in the particular field of the first dataset.
8. The method of claim 7, wherein the third encode value is a second latent probability distribution corresponding to numerical data included in the second field of the previously analyzed dataset; and wherein the comparing the set of encode values to the previous set of encode values includes comparing the first and second latent probability distributions.
9. The method of claim 1, further comprising: prior to determining whether the first dataset matches the previously analyzed dataset, comparing, by the data monitoring system, properties of a first schema associated with the first dataset to properties of a second schema associated with the previously analyzed dataset, wherein properties of the first schema include characteristics of the data records or the first plurality of fields; and in response to a determination that the first schema does not match the second schema, determining that the first dataset does not match the previously analyzed dataset.
10. The method of claim 1, wherein the output value is specified using Kullback-Leibler divergence.
11. A non-transitory, computer-readable medium having instructions stored thereon that are executable by a data monitoring system to perform operations comprising: performing encoding operations on a first dataset to generate a set of encode values including first and second encode values generated using different types of encoding, wherein the first dataset includes first data organized into a first plurality of fields, the first data includes data records having data values for multiple fields within the first plurality of fields, and the first encode value corresponds to a particular field of the first plurality of fields; determining whether the first dataset matches a previously analyzed dataset, wherein the previously analyzed dataset includes second data organized into a second plurality of fields, the second data includes data records having data values for multiple fields within the second plurality of fields, and the determining includes: comparing the set of encode values to a previous set of encode values, wherein the previous set of encode values includes third and fourth encode values generated using different types of encoding and the third encode value corresponds to a second field, of the second plurality of fields, of the previously analyzed dataset; based on the comparing, generating an output value that is indicative of a similarity between the set of encode values and the previous set of encode values; and based on the output value, determining whether the first dataset matches the previously analyzed dataset.
12. The non-transitory, computer-readable medium of claim 11, wherein, for the particular field, the performing the encoding operations includes: selecting a particular one of a plurality of encoder modules based on a data type associated with the particular field; and encoding data included in the particular field of the first dataset, using the particular encoder module, to generate the first encode value.
13. The non-transitory, computer-readable medium of claim 12, wherein the second field has a same data type as the data type associated with the particular field of the first dataset; and wherein the determining whether the first dataset matches the previously analyzed dataset further includes: retrieving the third encode value corresponding to the second field of the previously analyzed dataset, wherein the third encode value was generated by encoding data included in the second field of the previously analyzed dataset using the particular encoder module.
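Several dependent claims name concrete techniques: selecting an encoder by data type (claim 2), value-format encoding via a regular expression (claim 5), numerical distribution encoding (claim 7), and an output value based on Kullback-Leibler divergence (claim 10). The sketch below shows one way these pieces could fit together. It is illustrative only. In particular, a fitted univariate Gaussian stands in for the claimed "latent probability distribution", the regex generalization rule is a simple assumption, and all function names (`value_format_encode`, `numeric_distribution_encode`, `gaussian_kl`, `encode`) are hypothetical.

```python
import math
import re
import statistics

def value_format_encode(values):
    """Assumed value-format encoding: generalize each string into a regex."""
    def generalize(text):
        parts = []  # list of [token, run_length]
        for ch in text:
            if ch.isdecimal():
                token = r"\d"
            elif ch.isascii() and ch.isalpha():
                token = "[A-Za-z]"
            else:
                token = re.escape(ch)  # keep punctuation as a literal
            if parts and parts[-1][0] == token:
                parts[-1][1] += 1
            else:
                parts.append([token, 1])
        return "".join(f"{tok}{{{n}}}" if n > 1 else tok for tok, n in parts)
    return frozenset(generalize(v) for v in values)

def numeric_distribution_encode(values):
    """Stand-in for a latent distribution: Gaussian (mean, std) fit to the field."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1e-9  # guard against zero variance
    return mean, std

def gaussian_kl(p, q):
    """KL(p || q) for univariate Gaussians given as (mean, std) pairs."""
    (mu_p, sd_p), (mu_q, sd_q) = p, q
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)

# Hypothetical registry of encoder modules keyed by data type.
ENCODERS = {"string": value_format_encode, "number": numeric_distribution_encode}

def encode(field_values):
    """Select an encoder by data type, then encode a non-empty field."""
    is_numeric = all(isinstance(v, (int, float)) for v in field_values)
    kind = "number" if is_numeric else "string"
    values = field_values if is_numeric else [str(v) for v in field_values]
    return kind, ENCODERS[kind](values)

# Usage: numeric fields are compared by divergence, string fields by pattern set.
_, first = encode([10.0, 12.0, 14.0])
_, previous = encode([11.0, 12.0, 13.0])
print(gaussian_kl(first, previous))
print(encode(["AB-1234", "CD-5678"]) == encode(["EF-9012", "GH-3456"]))
```

A divergence near zero suggests the two numeric fields follow similar distributions, while equal pattern sets suggest the string fields share a value format. How the claimed system combines per-field results into a single match decision is not specified in the claims shown here.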
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
using adaptive string matching, e.g. the Lempel-Ziv method · CPC title
Encoder aspects · CPC title
Vector coding (for television signals, see H04N19/94) · CPC title