US11899693B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11899693-B2 |
| Application number | US-202217677323-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 22, 2022 |
| Priority date | Feb 22, 2022 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
A cluster generation system identifies data elements, from a first binary record, that each have a particular value and correspond to respective binary traits. A candidate description function describing the binary traits is generated, the candidate description function including a model factor that describes the data elements. Responsive to determining that a second record has additional data elements having the particular value and corresponding to the respective binary traits, the candidate description function is modified to indicate that the model factor describes the additional elements. The candidate description function is also modified to include a correction factor describing an additional binary trait excluded from the respective binary traits. Based on the modified candidate description function, the cluster generation system generates a data summary cluster, which includes a compact representation of the binary traits of the data elements and additional data elements.
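The abstract's model factor and correction factor can be illustrated with a minimal sketch. It is not the patented implementation: the `DescriptionFunction` class, the `summarize` function, and the representation of each binary record as a set of trait names whose data element has the value 1 are all assumptions made for illustration. Records that lack some of the model factor's traits are simply skipped, which is a simplification the abstract does not prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionFunction:
    """Hypothetical container for one model factor plus its correction factors."""
    model_traits: frozenset                            # traits the model factor covers
    model_records: set = field(default_factory=set)    # records the model factor describes
    corrections: set = field(default_factory=set)      # (record_id, trait) pairs


def summarize(records):
    """Build a description function from {record_id: set of traits with value 1}.

    The first record seeds the model factor. Each later record that has all of
    the model factor's traits is added to it, and any extra trait on that
    record is stored as a correction factor.
    """
    ids = list(records)
    first = ids[0]
    fn = DescriptionFunction(frozenset(records[first]), {first})
    for rid in ids[1:]:
        traits = records[rid]
        if fn.model_traits <= traits:                  # record shares every model trait
            fn.model_records.add(rid)
            for extra in sorted(traits - fn.model_traits):
                fn.corrections.add((rid, extra))       # trait excluded from the model factor
        # Assumption: records missing a model trait are left unsummarized here.
    return fn


example = {
    "r1": {"a", "b"},
    "r2": {"a", "b", "c"},
    "r3": {"a"},
}
result = summarize(example)
```

In this example the model factor covers traits `a` and `b` for records `r1` and `r2`, and the extra trait `c` on `r2` is held as a single correction factor, so two factors stand in for five individual data elements.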
Opening claim text (preview).
What is claimed is:
1. A method performed by one or more computing devices, the method comprising: accessing a set of binary records, wherein each record in the set of binary records includes multiple data elements corresponding to binary traits; identifying, in a first record from the set of binary records, a first group of data elements that each include a first value, wherein each data element in the first group of data elements corresponds to a respective binary trait; generating a candidate description function that describes the respective binary traits, wherein the candidate description function includes a model factor that describes the first group of data elements of the first record; responsive to determining that a second record has a second group of data elements corresponding to the respective binary traits, wherein each data element in the second group of data elements includes the first value, modifying the candidate description function to indicate that the model factor further describes the second group of data elements of the second record; responsive to determining that the second record has an additional data element corresponding to an additional binary trait that is excluded from the respective binary traits, wherein the additional data element includes the first value, modifying the candidate description function to include a correction factor that describes the additional data element of the second record; generating a data summary cluster based on the modified candidate description function, wherein the data summary cluster includes a compact representation of the respective binary traits corresponding to the first group of data elements and the second group of data elements; and providing the data summary cluster to a trait expansion query system that is configured for modifying the data summary cluster to identify an expansion trait associated with a subset of the set of binary records.
2. The method of claim 1, wherein modifying the candidate description function includes modifying metadata associated with one or more of the model factor or the correction factor.
3. The method of claim 1, further comprising: calculating a cost factor associated with modifying the candidate description function, wherein the cost factor indicates a change in a quantity of a combination of model factors and correction factors included in the candidate description function, wherein generating the data summary cluster is responsive to determining that the cost factor indicates a positive cost reduction.
4. The method of claim 1, further comprising: calculating a similarity of the first record and the second record; and responsive to determining that the similarity of the first record and the second record exceeds a partitioning threshold, generating a partition of the set of binary records, wherein the partition includes the first record and the second record.
5. The method of claim 4, wherein the similarity is calculated as a Jaccard similarity.
6. The method of claim 4, wherein generating the partition is based on a locality sensitive hashing (“LSH”) of the first record and the second record.
7. The method of claim 4, further comprising: generating a respective repartitioning key for each of the first record and the second record, wherein generating the partition is based on each respective repartitioning key of the first record and the second record having a same value.
8. The method of claim 1, further comprising: identifying an additional record from the set of binary records, the additional record having a further additional data element corresponding to the additional binary trait that is excluded from the respective binary traits; modifying the candidate description function to include an additional correction factor that describes the further additional data element of the additional record; and subsequent to modifying the candidate description function to include the correction factor and the additional correction factor and responsive to determining that each of the additional data element and the further additional data element correspond to the additional binary trait: modifying the candidate description function to include an additional model factor that describes the additional data element and the further additional data element, and modifying the candidate description function to omit the correction factor and the additional correction factor.
9. A system comprising one or more processors and a memory having stored thereon instructions that, upon execution by the one or more processors, cause the one or more processors to perform one or more operations, the system further comprising: a datastore that includes a set of binary records, wherein each record in the set of binary records includes multiple data elements corresponding to binary traits; and a cluster generation component that is configured for: identifying, in a first record from the set of binary records, a first group of data elements that each include a first value, wherein each data element in the first group of data elements corresponds to a respective binary trait; generating a candidate description function that describes the respective binary traits, wherein the candidate description function includes a model factor that describes the first group of data elements of the first record; responsive to determining that a second record has a second group of data elements corresponding to the respective binary traits, wherein each data element in the second group of data elements includes the first value, modifying the candidate description function to indicate that the model factor further describes the second group of data elements of the second record; responsive to determining that the second record has an additional data element corresponding to an additional binary trait that is excluded from the respective binary traits, wherein the additional data element includes the first value, modifying the candidate description function to include a correction factor that describes the additional data element of the second record; generating a data summary cluster based on the modified candidate description function, wherein the data summary cluster includes a compact representation of the respective binary traits corresponding to the first group of data elements and the second group of data elements; and providing the data summary cluster to a trait expansion query system that is configured for modifying the data summary cluster to identify an expansion trait associated with a subset of the set of binary records.
10. The system of claim 9, wherein modifying the candidate description function includes modifying metadata associated with one or more of the model factor or the correction factor.
11. The system of claim 9, the cluster generation component further configured for: calculating a cost factor associated with modifying the candidate description function, wherein the cost factor indicates a change in a quantity of a combination of model factors and correction factors included in the candidate description function, wherein generating the data summary cluster is responsive to determining that the cost factor indicates a positive cost reduction.
12. The system of claim 9, further comprising a partitioning component that is configured for: calculating a similarity of the first record and the second record; and responsive to determining that the similarity of the first record and the second record exceeds a partitioning threshold, generating a partition of the set of binary records, wherein the partition includes the first record and the second record.
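The similarity-based partitioning recited in the claims can be sketched as follows, using the Jaccard similarity that one dependent claim names. This is a hedged illustration only: the `jaccard` and `partition` functions, the greedy seed-based grouping, the treatment of two empty records as having similarity 0.0, and the set-of-traits record format are assumptions, and the pairwise comparison here stands in for the locality sensitive hashing and repartitioning keys that other claims mention.

```python
def jaccard(a, b):
    """Jaccard similarity of two trait sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty records
    return len(a & b) / len(union)


def partition(records, threshold):
    """Greedily group records whose similarity to a partition's seed exceeds threshold.

    records: {record_id: set of traits with value 1}
    Returns a list of partitions, each a list of record ids.
    """
    partitions = []  # list of (seed_traits, member_ids)
    for rid, traits in records.items():
        for seed, members in partitions:
            if jaccard(seed, traits) > threshold:   # "exceeds a partitioning threshold"
                members.append(rid)
                break
        else:
            # No existing partition is similar enough, so this record seeds a new one.
            partitions.append((traits, [rid]))
    return [members for _, members in partitions]


example_records = {
    "r1": {"a", "b", "c"},
    "r2": {"a", "b", "c", "d"},
    "r3": {"x", "y"},
}
groups = partition(example_records, threshold=0.5)
```

Here `r1` and `r2` share three of four distinct traits (similarity 0.75) and land in the same partition, while `r3` shares none and forms its own. Comparing against a single seed keeps the sketch short; it does not reflect how the patent's partitioning component is actually built.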
Clustering or classification · CPC title
Aggregation; Duplicate elimination · CPC title