Flexible tables
US-9031976-B2 · May 12, 2015 · US
US9600503B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9600503-B2 |
| Application number | US-201313951435-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 25, 2013 |
| Priority date | Jul 25, 2013 |
| Publication date | Mar 21, 2017 |
| Grant date | Mar 21, 2017 |
Abstract
Techniques provided herein allow for management of data. In various embodiments, systems and methods prune and retain data being managed by a data management system, where the managed data can include log data aggregated from one or more servers for analysis purposes. According to some embodiments, pruning can be triggered according to one or more constraints, such as the age of managed data (e.g., retain only 30 days of managed data) or the memory space required to store the managed data (e.g., retain only 100 GB worth of managed data). The constraints that trigger data pruning can be based on a data retention policy. When triggered, pruning can be performed on a fraction of the managed data stored based on the data retention policy (e.g., 3 days of full managed data, 27 days of pruned managed data). The pruning may be performed by sampling, at a desired rate, the managed data.
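The abstract's trigger-and-tier logic can be sketched as follows. This is a minimal illustration, not the patented implementation. The `Record` and `RetentionPolicy` types are hypothetical, and their defaults simply copy the abstract's example figures (30 days, 100 GB, 3 days kept in full). The "expired" tier for data older than the retention window is also an assumption: the abstract says to retain only 30 days but does not say how older data is handled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """Hypothetical time-stamped log record, e.g. one aggregated server log line."""
    ts: datetime
    size_bytes: int


@dataclass
class RetentionPolicy:
    """Hypothetical retention policy; defaults copy the abstract's example numbers."""
    max_age: timedelta = timedelta(days=30)     # "retain only 30 days"
    max_bytes: int = 100 * 10**9                # "retain only 100 GB"
    full_window: timedelta = timedelta(days=3)  # newest data kept unpruned


def constraint_exceeded(records, now, policy):
    """True if any record is at or past max_age, or total size exceeds max_bytes."""
    too_old = any(now - r.ts >= policy.max_age for r in records)
    too_big = sum(r.size_bytes for r in records) > policy.max_bytes
    return too_old or too_big


def split_tiers(records, now, policy):
    """Partition records into (full, prunable, expired) tiers by age."""
    full, prunable, expired = [], [], []
    for r in records:
        age = now - r.ts
        if age < policy.full_window:
            full.append(r)      # kept at full fidelity
        elif age < policy.max_age:
            prunable.append(r)  # candidate for sampling-based pruning
        else:
            expired.append(r)   # assumed: outside the retention window entirely
    return full, prunable, expired
```

With one 10-byte record per day spanning 40 days, the age constraint fires and the records split into 3 full-fidelity days, 27 prune-eligible days, and 10 expired days, which matches the abstract's 3-day/27-day example.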
Claims
What is claimed is:
1. A computer system comprising: at least one processor; and a memory storing instructions configured to instruct the at least one processor to perform: detecting when a constraint for storing a data set has been exceeded; identifying, based on the constraint, an initial data subset from the data set for each of a plurality of time periods, from which at least some data elements will be removed by sampling; determining a sampling rate for data element retention; identifying a secondary data subset from the initial data subset for each of the plurality of time periods, based on sampling the initial data subset according to the sampling rate, the sampling rate applied to the initial data subset for each of the plurality of time periods; and removing from the data set one or more data elements of the initial data subset for each of the plurality of time periods while retaining data elements of the secondary data subset for each of the plurality of time periods, wherein the sampling rate is uniform, and wherein the sampling rate is determined such that a representative portion of the data set is retained when the one or more data elements of the initial data subset for each of the plurality of time periods are removed from the data set.
2. The computer system of claim 1, wherein the data set comprises log data.
3. The computer system of claim 2, wherein the log data is associated with operation of a social networking system.
4. The computer system of claim 3, wherein the log data comprises one or more time-stamped data elements regarding user activity occurring on the social networking system.
5. The computer system of claim 1, wherein the constraint relates to age of data elements in the data set.
6. The computer system of claim 1, wherein the constraint relates to storage space occupied by data elements in the data set.
7. The computer system of claim 1, wherein the constraint is based on a data retention policy.
8. The computer system of claim 1, wherein the data set comprises data sampled from a larger data set.
9. The computer system of claim 1, wherein the initial data subset for each of the plurality of time periods is identified according to a data retention policy.
10. The computer system of claim 9, wherein the data retention policy prohibits removal of data elements from the data set that have been maintained for less than a threshold period of time.
11. The computer system of claim 1, wherein the sampling rate is defined by a ratio of data elements.
12. The computer system of claim 1, wherein the sampling rate is determined based on a type of data element included in the data set.
13. The computer system of claim 12, wherein the data set comprises event log data and the type of data element is based on an event type.
14. The computer system of claim 1, wherein the data set is a database table.
15. The computer system of claim 14, wherein the sampling rate is determined based on a table type associated with the database table.
16. The computer system of claim 1, wherein the instructions are further configured to instruct the at least one processor to perform: designating data of the secondary data subset as being data retained during a data removal process.
17. The computer system of claim 1, wherein the instructions are further configured to instruct the at least one processor to perform: associating the sampling rate with data of the secondary data subset.
18. The computer system of claim 1, wherein the data set is being stored in an in-memory database.
19. A non-transitory computer-storage medium storing computer-executable instructions that, when executed, cause a computer system to perform a computer-implemented method comprising: detecting when a constraint for storing a data set has been exceeded; identifying, based on the constraint, an initial data subset from the data set for each of a plurality of time periods, from which at least some data elements will be removed by sampling; determining a sampling rate for data element retention; identifying a secondary data subset from the initial data subset for each of the plurality of time periods, based on sampling the initial data subset according to the sampling rate, the sampling rate applied to the initial data subset for each of the plurality of time periods; and removing from the data set one or more data elements of the initial data subset for each of the plurality of time periods while retaining data elements of the secondary data subset for each of the plurality of time periods, wherein the sampling rate is uniform, and wherein the sampling rate is determined such that a representative portion of the data set is retained when the one or more data elements of the initial data subset for each of the plurality of time periods are removed from the data set.
20. A computer implemented method comprising: detecting, by a computer system, when a constraint for storing a data set has been exceeded; identifying, by the computer system, based on the constraint, an initial data subset from the data set for each of a plurality of time periods, from which at least some data elements will be removed by sampling; determining, by the computer system, a sampling rate for data element retention; identifying, by the computer system, a secondary data subset from the initial data subset for each of the plurality of time periods, based on sampling the initial data subset according to the sampling rate, the sampling rate applied to the initial data subset for each of the plurality of time periods; and removing, by the computer system, from the data set one or more data elements of the initial data subset for each of the plurality of time periods while retaining data elements of the secondary data subset for each of the plurality of time periods, wherein the sampling rate is uniform, and wherein the sampling rate is determined such that a representative portion of the data set is retained when the one or more data elements of the initial data subset for each of the plurality of time periods are removed from the data set.
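The core steps of claim 1 can be sketched as below. Those steps are: take the initial subset selected for pruning, apply a uniform sampling rate within each time period, keep the resulting secondary subset, and remove everything else. This is an illustration under stated assumptions, not the patented implementation. The `Event` type, the one-day time period, systematic every-k-th sampling, and the optional per-event-type rate table are all hypothetical choices meant to mirror claims 11 to 13 and 17.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical time-stamped log element (cf. claims 2-4 and 13)."""
    day: date
    event_type: str
    sample_rate: int = 1  # 1 = unsampled; k = this element stands for k originals


def prune_by_sampling(initial_subset, keep_one_in, rates_by_type=None):
    """Return the secondary subset to retain; all other elements are removed.

    keep_one_in: default ratio, keep 1 of every k elements (cf. claim 11).
    rates_by_type: optional per-event-type overrides (cf. claims 12-13).
    """
    rates_by_type = rates_by_type or {}
    # One group per (time period, event type); the period here is a calendar day.
    groups = defaultdict(list)
    for e in initial_subset:
        groups[(e.day, e.event_type)].append(e)

    retained = []
    for (_day, etype), elems in sorted(groups.items()):
        k = rates_by_type.get(etype, keep_one_in)
        # Systematic sampling: keep every k-th element, the same ratio in each period.
        for e in elems[::k]:
            # Record the applied rate on the kept element (cf. claim 17); multiplying
            # handles data that was already sampled once before (cf. claim 8).
            retained.append(replace(e, sample_rate=e.sample_rate * k))
    return retained


def estimate_count(events, event_type):
    """Scale retained elements back up by their recorded sampling rate."""
    return sum(e.sample_rate for e in events if e.event_type == event_type)
```

In this sketch, the pruned data set is the full-fidelity data plus the returned secondary subset. Because each kept element carries its sampling rate, counts can be scaled back up. The estimate is exact only when a period's element count divides evenly by k. A hash-based or random sampler is a common alternative when the ordering within a period could bias every-k-th selection.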
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Design, administration or maintenance of databases · CPC title
characterised by the use of retention policies (retention policies for HSM systems G06F16/185) · CPC title
Triggers; Constraints · CPC title
Physics · mapped topic