Trait expansion techniques in binary matrix datasets

US11899693B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11899693-B2
Application number: US-202217677323-A
Country: US
Kind code: B2
Filing date: Feb 22, 2022
Priority date: Feb 22, 2022
Publication date: Feb 13, 2024
Grant date: Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A cluster generation system identifies data elements, from a first binary record, that each have a particular value and correspond to respective binary traits. A candidate description function describing the binary traits is generated, the candidate description function including a model factor that describes the data elements. Responsive to determining that a second record has additional data elements having the particular value and corresponding to the respective binary traits, the candidate description function is modified to indicate that the model factor describes the additional elements. The candidate description function is also modified to include a correction factor describing an additional binary trait excluded from the respective binary traits. Based on the modified candidate description function, the cluster generation system generates a data summary cluster, which includes a compact representation of the binary traits of the data elements and additional data elements.
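The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `DescriptionFunction`, `traits_with_value`, and `summarize` are hypothetical, records are assumed to be lists of 0/1 values, and the "particular value" is assumed to be 1. Records that lack any trait of the model factor are simply skipped here, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptionFunction:
    """Hypothetical container: one model factor plus correction factors."""
    model_traits: frozenset                            # traits covered by the model factor
    model_records: set = field(default_factory=set)    # records the model factor describes
    corrections: set = field(default_factory=set)      # (record_id, trait) pairs outside the model


def traits_with_value(record, value=1):
    """Return the trait indices whose data element equals the given value."""
    return frozenset(i for i, v in enumerate(record) if v == value)


def summarize(records):
    """Seed the model factor from the first record, then fold in matching records."""
    ids = list(records)
    first = ids[0]
    fn = DescriptionFunction(model_traits=traits_with_value(records[first]))
    fn.model_records.add(first)
    for rid in ids[1:]:
        traits = traits_with_value(records[rid])
        if fn.model_traits <= traits:
            # The record shares every model trait: extend the model factor to it.
            fn.model_records.add(rid)
            # Traits set in the record but excluded from the model factor
            # are recorded as correction factors.
            for extra in sorted(traits - fn.model_traits):
                fn.corrections.add((rid, extra))
    return fn


records = {
    "r1": [1, 1, 0, 0],
    "r2": [1, 1, 0, 1],
}
summary = summarize(records)
print(sorted(summary.model_traits), sorted(summary.model_records), sorted(summary.corrections))
# prints: [0, 1] ['r1', 'r2'] [('r2', 3)]
```

In this toy case the summary stores the shared traits {0, 1} once for both records and keeps a single correction for trait 3 of `r2`, which is the sense in which the cluster is a compact representation.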

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computing devices, the method comprising: accessing a set of binary records, wherein each record in the set of binary records includes multiple data elements corresponding to binary traits; identifying, in a first record from the set of binary records, a first group of data elements that each include a first value, wherein each data element in the first group of data elements corresponds to a respective binary trait; generating a candidate description function that describes the respective binary traits, wherein the candidate description function includes a model factor that describes the first group of data elements of the first record; responsive to determining that a second record has a second group of data elements corresponding to the respective binary traits, wherein each data element in the second group of data elements includes the first value, modifying the candidate description function to indicate that the model factor further describes the second group of data elements of the second record; responsive to determining that the second record has an additional data element corresponding to an additional binary trait that is excluded from the respective binary traits, wherein the additional data element includes the first value, modifying the candidate description function to include a correction factor that describes the additional data element of the second record; generating a data summary cluster based on the modified candidate description function, wherein the data summary cluster includes a compact representation of the respective binary traits corresponding to the first group of data elements and the second group of data elements; and providing the data summary cluster to a trait expansion query system that is configured for modifying the data summary cluster to identify an expansion trait associated with a subset of the set of binary records.

2. The method of claim 1, wherein modifying the candidate description function includes modifying metadata associated with one or more of the model factor or the correction factor.

3. The method of claim 1, further comprising: calculating a cost factor associated with modifying the candidate description function, wherein the cost factor indicates a change in a quantity of a combination of model factors and correction factors included in the candidate description function, wherein generating the data summary cluster is responsive to determining that the cost factor indicates a positive cost reduction.

4. The method of claim 1, further comprising: calculating a similarity of the first record and the second record; and responsive to determining that the similarity of the first record and the second record exceeds a partitioning threshold, generating a partition of the set of binary records, wherein the partition includes the first record and the second record.

5. The method of claim 4, wherein the similarity is calculated as a Jaccard similarity.

6. The method of claim 4, wherein generating the partition is based on a locality sensitive hashing (“LSH”) of the first record and the second record.

7. The method of claim 4, further comprising: generating a respective repartitioning key for each of the first record and the second record, wherein generating the partition is based on each respective repartitioning key of the first record and the second record having a same value.

8. The method of claim 1, further comprising: identifying an additional record from the set of binary records, the additional record having a further additional data element corresponding to the additional binary trait that is excluded from the respective binary traits; modifying the candidate description function to include an additional correction factor that describes the further additional data element of the additional record; and subsequent to modifying the candidate description function to include the correction factor and the additional correction factor and responsive to determining that each of the additional data element and the further additional data element correspond to the additional binary trait: modifying the candidate description function to include an additional model factor that describes the additional data element and the further additional data element, and modifying the candidate description function to omit the correction factor and the additional correction factor.

9. A system comprising one or more processors and a memory having stored thereon instructions that, upon execution by the one or more processors, cause the one or more processors to perform one or more operations, the system further comprising: a datastore that includes a set of binary records, wherein each record in the set of binary records includes multiple data elements corresponding to binary traits; and a cluster generation component that is configured for: identifying, in a first record from the set of binary records, a first group of data elements that each include a first value, wherein each data element in the first group of data elements corresponds to a respective binary trait; generating a candidate description function that describes the respective binary traits, wherein the candidate description function includes a model factor that describes the first group of data elements of the first record; responsive to determining that a second record has a second group of data elements corresponding to the respective binary traits, wherein each data element in the second group of data elements includes the first value, modifying the candidate description function to indicate that the model factor further describes the second group of data elements of the second record; responsive to determining that the second record has an additional data element corresponding to an additional binary trait that is excluded from the respective binary traits, wherein the additional data element includes the first value, modifying the candidate description function to include a correction factor that describes the additional data element of the second record; generating a data summary cluster based on the modified candidate description function, wherein the data summary cluster includes a compact representation of the respective binary traits corresponding to the first group of data elements and the second group of data elements; and providing the data summary cluster to a trait expansion query system that is configured for modifying the data summary cluster to identify an expansion trait associated with a subset of the set of binary records.

10. The system of claim 9, wherein modifying the candidate description function includes modifying metadata associated with one or more of the model factor or the correction factor.

11. The system of claim 9, the cluster generation component further configured for: calculating a cost factor associated with modifying the candidate description function, wherein the cost factor indicates a change in a quantity of a combination of model factors and correction factors included in the candidate description function, wherein generating the data summary cluster is responsive to determining that the cost factor indicates a positive cost reduction.

12. The system of claim 9, further comprising a partitioning component that is configured for: calculating a similarity of the first record and the second record; and responsive to determining that the similarity of the first record and the second record exceeds a partitioning threshold, generating a partition of the set of binary records, wherein the partition includes the first record and the second record.
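The similarity-based partitioning recited in the claims (a Jaccard similarity compared against a partitioning threshold) can be sketched as follows. This is an illustrative assumption, not the claimed method: the function names `jaccard` and `partition`, the greedy seed-based grouping, and the threshold value are all hypothetical, and the LSH and repartitioning-key variants are not shown.

```python
def jaccard(a, b):
    """Jaccard similarity of two binary records over the traits set to 1."""
    sa = {i for i, v in enumerate(a) if v == 1}
    sb = {i for i, v in enumerate(b) if v == 1}
    union = sa | sb
    # Two all-zero records are treated as identical (an assumption of this sketch).
    return len(sa & sb) / len(union) if union else 1.0


def partition(records, threshold):
    """Greedy grouping: a record joins the first partition whose seed it resembles."""
    partitions = []  # each entry is a list of record ids; the first id is the seed
    for rid, rec in records.items():
        for part in partitions:
            if jaccard(records[part[0]], rec) > threshold:
                part.append(rid)
                break
        else:
            # No existing partition is similar enough, so start a new one.
            partitions.append([rid])
    return partitions


records = {
    "r1": [1, 1, 0, 0],
    "r2": [1, 1, 0, 1],
    "r3": [0, 0, 1, 0],
}
parts = partition(records, threshold=0.5)
print(parts)
# prints: [['r1', 'r2'], ['r3']]
```

Here `r1` and `r2` share two of three set traits (similarity 2/3), which exceeds the assumed threshold of 0.5, so they land in the same partition; `r3` shares none and starts its own.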

Assignees

Adobe Inc

Inventors

Classifications

  • G06F16/285 · Primary

    Clustering or classification · CPC title

  • Aggregation; Duplicate elimination · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11899693B2 cover?
A cluster generation system identifies data elements, from a first binary record, that each have a particular value and correspond to respective binary traits. A candidate description function describing the binary traits is generated, the candidate description function including a model factor that describes the data elements. Responsive to determining that a second record has additional data …
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/285. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).