Data collection method and apparatus, computer device, and storage medium

US2025190417A1 · US · A1

Patent metadata
Publication number: US-2025190417-A1
Application number: US-202418790545-A
Country: US
Kind code: A1
Filing date: Jul 31, 2024
Priority date: Dec 6, 2023
Publication date: Jun 12, 2025
Grant date: not listed

Abstract

The present disclosure provides a data collection method and apparatus, a computer device, and a storage medium. The method includes: determining, in response to a garbage collection request, respective first index entries from a first data table to be processed, wherein the first index entries store first key data and the storage location information, in the first data table, of the first key-value pair data corresponding to the first key data; selecting valid target key data from the first key data according to the current respective second data tables in a log-structured merge tree; reading target value data corresponding to the respective target key data from the first data table according to the storage location information in the first index entries; and constructing a new first data table from the target key data and the target value data, and collecting the first data table to be processed.
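The four steps of the abstract can be sketched in Python. This is a minimal, hypothetical illustration, not the patented implementation: `IndexEntry`, `ValueTable`, `current_version`, and `collect` are invented names; the LSM tree is modeled as a list of per-level dicts mapping each key to its current version (level 0 first); and validity is decided by the version comparison described in claim 3, treating the uppermost level that holds a key as authoritative (our reading of claim 5's "highest hierarchical level"). For brevity the data region stores only value bytes, whereas the patent stores whole key-value pairs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """Hypothetical first index entry: a key plus where its pair lives in the table."""
    key: bytes
    version: int  # assumed sequence number, compared against the LSM tree
    offset: int   # storage location inside ValueTable.data
    length: int


@dataclass
class ValueTable:
    """Hypothetical 'first data table': a data region plus its index entries."""
    data: bytearray = field(default_factory=bytearray)
    index: list[IndexEntry] = field(default_factory=list)

    def append(self, key: bytes, version: int, value: bytes) -> None:
        self.index.append(IndexEntry(key, version, len(self.data), len(value)))
        self.data += value


def current_version(levels: list[dict[bytes, int]], key: bytes) -> int | None:
    """Search the LSM levels top-down; the first (uppermost) hit is the live copy."""
    for level in levels:
        if key in level:
            return level[key]
    return None  # key no longer exists in the tree


def collect(table: ValueTable, levels: list[dict[bytes, int]]) -> ValueTable:
    """Rewrite only the still-valid pairs of `table` into a new table."""
    new_table = ValueTable()
    # Step 1: only the index entries are scanned to find candidate keys.
    for entry in table.index:
        # Step 2: a key is valid only if the LSM tree still holds this version.
        if current_version(levels, entry.key) != entry.version:
            continue
        # Step 3: read the value of a valid key via its stored location.
        value = bytes(table.data[entry.offset:entry.offset + entry.length])
        # Step 4: append to the new table, which records fresh locations.
        new_table.append(entry.key, entry.version, value)
    return new_table


old = ValueTable()
old.append(b"a", 1, b"apple")    # superseded by version 2 below
old.append(b"b", 1, b"banana")   # still current
old.append(b"c", 1, b"cherry")   # deleted from the tree
old.append(b"a", 2, b"avocado")  # current
levels = [{b"a": 2}, {b"a": 1, b"b": 1}]
new = collect(old, levels)
print([e.key for e in new.index])  # [b'b', b'a']
```

After `collect` returns, the caller can drop `old`, which corresponds to collecting the first data table. The point the sketch mirrors is that validity is decided from index entries alone, so values of dead pairs are never read. A real engine would presumably also need to update the LSM tree's references to the relocated values; the abstract does not describe that step and the sketch omits it.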

First claim

Opening claim text (preview).

1. A data collection method, comprising: determining, in response to a garbage collection request, respective first index entries from a first data table to be processed, wherein first key-value pair data and the first index entries are stored in the first data table, the first key-value pair data is derived from a key-value separated log-structured merge tree, and first key data and storage location information in the first data table of the first key-value pair data corresponding to the first key data are stored in the first index entries; selecting valid target key data from the first key data stored in the respective first index entries according to current respective second data tables in the log-structured merge tree; reading target value data corresponding to the respective target key data in the first data table according to storage location information in the first index entries where the respective target key data are located; and constructing a new first data table according to the target key data and the target value data, and collecting the first data table to be processed, wherein target key-value pair data consisting of the target key data and the target value data, as well as new first index entries, are stored in the new first data table, and the target key data and storage location information in the new first data table of the target key-value pair data corresponding to the target key data are comprised in the new first index entries.

2. The method according to claim 1, wherein selecting the valid target key data from the first key data stored in the respective first index entries according to the current respective second data tables in the log-structured merge tree comprises: selecting first key data identical to any one of the second key data, as the valid target key data, from the first key data stored in the respective first index entries according to respective second key data stored in the current respective second data tables in the log-structured merge tree.

3. The method according to claim 2, wherein selecting the first key data identical to any one of the second key data as the valid target key data from the first key data stored in the respective first index entries according to the respective second key data stored in the current respective second data tables in the log-structured merge tree comprises: traversing, with respect to any one of the first key data, the respective second data tables sequentially in accordance with a hierarchy to which the respective second data table belongs in the log-structured merge tree to search for a second data table associated with the first key data; and determining that the first key data is the valid target key data in response to the second data table associated with the first key data being found and the second key data in the second data table which is identical to the first key data having the same version as the first key data.

4. The method according to claim 2, wherein selecting the first key data identical to any one of the second key data as the valid target key data, from the first key data stored in the respective first index entries according to the respective second key data stored in the current respective second data tables in the log-structured merge tree, comprises: traversing, with respect to any one of the first key data, the respective second data tables sequentially in accordance with a hierarchy to which the respective second data table belongs in the log-structured merge tree to determine a second data table associated with the first key data; and reading a first target index data block from the second data table associated with the first key data, wherein the first target index data block comprises multiple second index entries, the second index entries are first type of index entries and/or second type of index entries, the first type of index entries are indexes associated with the second key data and table indexes of the first data table in which the first value data corresponding to the second key data is located, the second type of index entries are index entries associated with third key data and storage location information of key-value pair data corresponding to the third key data in the second data table, the key-value pair data corresponding to the third key data has a data volume less than a preset data volume, and the key-value pair data corresponding to the second key data has a data volume greater than or equal to the preset data volume; determining, in response to the multiple second index entries including the first type of index entries, whether second key data matching the first key data exists according to the second key data in the first type of index entries; and taking the first key data as the target key data in response to determining that second key data matching the first key data exists.

5. The method according to claim 3, wherein the second data table associated with the first key data contains second key data identical to the first key data, and in response to there being multiple second data tables containing the second key data identical to the first key data, the second data table with the highest hierarchical level among the multiple second data tables is taken as the second data table associated with the first key data.

6. The method according to claim 1, wherein before selecting the first key data identical to any one of the second key data as the valid target key data from the first key data stored in the respective first index entries according to the respective second key data stored in the current respective second data tables in the log-structured merge tree, the method comprises: determining, with respect to any one of the first key data, whether matched key data matching the first key data exists among buffered key data with a hot data feature that is contained in a set of key data buffered in a memory, wherein the hot data feature is used for indicating that the buffered key data is key data repeatedly written multiple times; determining whether the matched key data and the first key data are the same version in response to the existence of the matched key data matching the first key data; and determining that the first key data is invalid key data in response to the matched key data and the first key data not being the same version; or taking the first key data directly as the target key data in response to the matched key data and the first key data being the same version.

7. The method according to claim 6, wherein selecting the first key data identical to any one of the second key data as the valid target key data, from the first key data stored in the respective first index entries according to the respective second key data stored in the current respective second data tables in the log-structured merge tree, comprises: selecting the target key data from the first key data without the matched key data according to the respective second key data stored in the respective second data tables.

8. The method according to claim 7, wherein constructing the new first data table according to the target key data and the target value data comprises: determining the target key data with the matched key data as hot key data with the hot data feature, and/or determining the target key data without the matched key data as cold key data with a cold data feature, wherein the cold data feature is used for indicating that the cold key data is key data written once; and constructing the new first data table with the hot data feature according to the respective hot key data and target value data corresponding to the hot key data, and/or constructing the new first data table with the cold data feature according to the re
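Claims 6 to 8 add an in-memory shortcut and a hot/cold split on top of the basic collection pass. The Python sketch below is hypothetical: `hot_keys` stands in for the "set of key data buffered in a memory" and is assumed to map each repeatedly written key to its latest version; keys found there are judged by version alone (claim 6), keys not found fall back to an LSM-tree lookup (claim 7), and surviving pairs are written into separate hot and cold tables (claim 8). The names `ValueTable`, `lsm_version`, and `collect_hot_cold`, the tuple layout of index entries, and the list-of-dicts LSM model are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValueTable:
    """Hypothetical value table: data region plus (key, version, offset, length) entries."""
    data: bytearray = field(default_factory=bytearray)
    index: list[tuple[bytes, int, int, int]] = field(default_factory=list)

    def append(self, key: bytes, version: int, value: bytes) -> None:
        self.index.append((key, version, len(self.data), len(value)))
        self.data += value

    def read(self, offset: int, length: int) -> bytes:
        return bytes(self.data[offset:offset + length])


def lsm_version(levels: list[dict[bytes, int]], key: bytes) -> int | None:
    """Version held by the uppermost LSM level containing `key`, if any."""
    for level in levels:
        if key in level:
            return level[key]
    return None


def collect_hot_cold(
    table: ValueTable,
    levels: list[dict[bytes, int]],
    hot_keys: dict[bytes, int],
) -> tuple[ValueTable, ValueTable]:
    """Split the valid pairs of `table` into a hot table and a cold table."""
    hot, cold = ValueTable(), ValueTable()
    for key, version, offset, length in table.index:
        if key in hot_keys:
            # Matched a buffered hot key: the version check alone decides (claim 6).
            if hot_keys[key] != version:
                continue  # stale copy of a hot key
            target = hot
        else:
            # No in-memory match: consult the LSM tree instead (claim 7).
            if lsm_version(levels, key) != version:
                continue
            target = cold
        # Valid pair: copy it into the hot or the cold table (claim 8).
        target.append(key, version, table.read(offset, length))
    return hot, cold


old = ValueTable()
old.append(b"h", 1, b"old-hot")  # hot key, stale version
old.append(b"h", 2, b"new-hot")  # hot key, latest version
old.append(b"c", 1, b"cold")     # written once, still live
old.append(b"d", 1, b"dead")     # superseded in the LSM tree
levels = [{b"d": 2}, {b"c": 1, b"d": 1}]
hot, cold = collect_hot_cold(old, levels, hot_keys={b"h": 2})
print(bytes(hot.data), bytes(cold.data))  # b'new-hot' b'cold'
```

Keeping hot and cold pairs in separate tables plausibly means that tables of frequently overwritten keys become stale together, which would make later collection passes cheaper; that motivation is our reading rather than a statement in the claims shown here.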

Assignees

  • Douyin Vision Co Ltd

  • Lemon Inc

  • Univ Huazhong Science Tech

Classifications

  • Tablespace storage structures; Management thereof · CPC title

  • Trees, e.g. B+trees · CPC title

  • Garbage collection, i.e. reclamation of unreferenced memory · CPC title

  • Journaling file systems · CPC title

  • Indexing structures · CPC title

Frequently asked questions

What does patent US2025190417A1 cover?
It covers garbage collection of data tables in a key-value separated log-structured merge (LSM) tree: the index entries of the table to be collected are checked against the LSM tree's current data tables to find the keys that are still valid, only those keys' values are read and rewritten into a new data table, and the old table is then reclaimed.
Who is the assignee on this patent?
Douyin Vision Co Ltd, Lemon Inc, Univ Huazhong Science Tech
What technology area does this patent fall under?
Primary CPC classification G06F16/2282. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 12, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).