Data storage method and data storage apparatus for string data

US12450264B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12450264-B2
Application number  US-202418413729-A
Country             US
Kind code           B2
Filing date         Jan 16, 2024
Priority date       Jan 18, 2023
Publication date    Oct 21, 2025
Grant date          Oct 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage method for string data includes: performing pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data includes matched pattern data in the pattern data set; in response to that the string data includes the matched pattern data, extracting dedicated string data other than the matched pattern data from the string data, and storing the extracted dedicated string data in a dedicated data storage area of the data storage system, wherein an index relationship is formed between the stored dedicated string data and corresponding pattern data stored in the pattern data storage area; and in response to that the string data does not include the matched pattern data, storing original data of the string data in the dedicated data storage area as a whole.
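The two branches of the abstract (matched versus unmatched) can be illustrated with a minimal sketch. It is not the patented implementation: the `StringStore` class, the in-memory lists standing in for the two storage areas, and the representation of a pattern as a tuple of literal segments with gaps between them are all assumptions made for illustration.

```python
import re


class StringStore:
    """Toy in-memory stand-in for the two storage areas (hypothetical)."""

    def __init__(self, patterns):
        # Pattern data storage area. Assumption: each pattern is a tuple of
        # literal segments; the gaps between segments hold dedicated data.
        self.pattern_area = [tuple(p) for p in patterns]
        self._regexes = [
            re.compile("(.*?)".join(re.escape(seg) for seg in p), re.DOTALL)
            for p in self.pattern_area
        ]
        # Dedicated data storage area: records of (pattern index or None, payload).
        self.dedicated_area = []

    def put(self, s):
        """Store one string and return its record id."""
        for idx, rx in enumerate(self._regexes):
            m = rx.fullmatch(s)
            if m:
                # Matched: keep only the dedicated parts plus an index
                # back to the pattern they belong to.
                self.dedicated_area.append((idx, list(m.groups())))
                return len(self.dedicated_area) - 1
        # No match: store the original string as a whole.
        self.dedicated_area.append((None, s))
        return len(self.dedicated_area) - 1

    def get(self, record_id):
        """Rebuild the original string from a stored record."""
        idx, payload = self.dedicated_area[record_id]
        if idx is None:
            return payload
        segs = self.pattern_area[idx]
        out = [segs[0]]
        for gap, seg in zip(payload, segs[1:]):
            out.append(gap)
            out.append(seg)
        return "".join(out)


store = StringStore([("user ", " logged in from ", "")])
rid = store.put("user alice logged in from 10.0.0.1")
print(store.dedicated_area[rid])  # (0, ['alice', '10.0.0.1'])
print(store.get(rid))             # user alice logged in from 10.0.0.1
```

In this sketch the shared text is stored once in the pattern area, and each matched record carries only its variable parts and a pattern index, which is where the storage saving would come from.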

First claim

Opening claim text (preview).

What is claimed is:

1. A data storage method for string data, comprising: performing pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data comprises matched pattern data in the pattern data set, wherein the pattern data set is obtained through training by using a string data sample set sampled from the to-be-stored string data set, and each piece of pattern data is common string data of a plurality of string data samples and is stored in a pattern data storage area of a data storage system; in response to that the string data comprises the matched pattern data, extracting dedicated string data other than the matched pattern data from the string data, and storing the extracted dedicated string data in a dedicated data storage area of the data storage system, wherein an index relationship is formed between the stored dedicated string data and corresponding pattern data stored in the pattern data storage area; and in response to that the string data does not comprise the matched pattern data, storing original data of the string data in the dedicated data storage area as a whole.

2. The data storage method according to claim 1, wherein a data structure of the stored dedicated string data comprises at least an index data field and a dedicated data field, the index data field stores an index relationship to index corresponding pattern data, and the dedicated data field stores dedicated string data.

3. The data storage method according to claim 2, wherein when the stored dedicated string data comprises a plurality of pieces of dedicated string sub-sequence data, for each piece of dedicated string sub-sequence data, the dedicated data field comprises a sub-sequence data length field and a sub-sequence data body field, the sub-sequence data length field stores a data length of the piece of dedicated string sub-sequence data, and the sub-sequence data body field stores the piece of dedicated string sub-sequence data.

4. The data storage method according to claim 1, wherein the data storage system further has an index data storage area, and the data storage method further comprises: storing corresponding index data for each piece of string data in the index data storage area, wherein a data structure of the stored index data comprises a pattern data index field and a dedicated data index field, the pattern data index field stores first index data to index corresponding pattern data, and the dedicated data index field stores second index data to index corresponding dedicated string data.

5. The data storage method according to claim 1, wherein the pattern data set is obtained through training based on a hierarchical clustering algorithm by using the string data sample set.

6. The data storage method according to claim 5, wherein the pattern data set is obtained by: initializing each string data sample in the string data sample set as a whole to initial pattern data, to generate an initial pattern data set; and cyclically performing a pattern data set training process until a quantity of pieces of trained pattern data reaches a preset value, wherein the pattern data set training process comprises: calculating a pattern data similarity between every two pieces of pattern data in a current pattern data set; performing pattern data combination on two pieces of pattern data with a highest pattern data similarity to obtain combined pattern data; and replacing the two pieces of pattern data with the highest pattern data similarity in the current pattern data set with the combined pattern data, to update the pattern data set.

7. The data storage method according to claim 6, wherein the calculated pattern data similarity comprises a pattern data distance.

8. The data storage method according to claim 7, wherein the pattern data distance comprises a code length gain obtained after the two pieces of pattern data are combined.

9. The data storage method according to claim 1, further comprising at least one of: performing first data compression on each piece of pattern data before the piece of pattern data is stored in the pattern data storage area; or performing second data compression on each piece of dedicated string data before the piece of dedicated string data is stored in the dedicated data storage area.

10. The data storage method according to claim 1, further comprising: performing data compression on each piece of pattern data based on a data structure and a data composition of the pattern data; and performing data compression on each piece of dedicated string data based on a data structure and a data composition of the dedicated string data.

11. The data storage method according to claim 1, further comprising: performing data compression on the original data of the string data before the original data of the string data is stored in the dedicated data storage area as a whole.

12. The method according to claim 1, wherein the data storage system comprises a plurality of data storage devices, and the pattern data storage area and the dedicated data storage area are deployed in different data storage devices.

13. A data storage apparatus for string data, comprising: a processor; and a memory storing instructions executable by the processor; wherein the processor is configured to: perform pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data comprises matched pattern data in the pattern data set, wherein the pattern data set is obtained through training by using a string data sample set sampled from the to-be-stored string data set, and each piece of pattern data is common string data of a plurality of string data samples and is stored in a pattern data storage area of a data storage system; in response to that the string data comprises the matched pattern data, extract dedicated string data other than the matched pattern data from the string data, and store the extracted dedicated string data in a dedicated data storage area of the data storage system, wherein an index relationship is formed between the stored dedicated string data and corresponding pattern data stored in the pattern data storage area; and in response to that the string data does not comprise the matched pattern data, store original data of the string data in the dedicated data storage area as a whole.

14. The data storage apparatus according to claim 13, wherein a data structure of the stored dedicated string data comprises at least an index data field and a dedicated data field, the index data field stores an index relationship to index corresponding pattern data, and the dedicated data field stores dedicated string data.

15. The data storage apparatus according to claim 13, wherein the data storage system further has an index data storage area, and the processor is further configured to: store corresponding index data for each piece of string data in the index data storage area, wherein a data structure of the stored index data comprises a pattern data index field and a dedicated data index field, the pattern data index field stores first index data to index corresponding pattern data, and the dedicated data index field stores second index data to index corresponding dedicated string data.

16. The data storage apparatus according to claim 13, wherein the processor is further configured to obtain, through training, the pattern data set based on a hierarchical clustering algorithm by using the string data sample set.
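The training loop recited in claim 6 (start with one pattern per sample, repeatedly merge the most similar pair, stop at a preset count) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: samples are split on whitespace into tokens, the `<*>` wildcard marks a dedicated-data slot, and similarity is the `difflib.SequenceMatcher` ratio, swapped in for the code length gain named in claim 8. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

WILDCARD = "<*>"  # hypothetical placeholder for a dedicated-data slot


def similarity(a, b):
    """Token-level similarity in [0, 1]; a stand-in for the claimed measure."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def combine(a, b):
    """Keep the tokens two patterns share; collapse differences to one wildcard."""
    merged = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        tokens = a[i1:i2] if tag == "equal" else [WILDCARD]
        for tok in tokens:
            # Avoid emitting two wildcards in a row.
            if tok == WILDCARD and merged and merged[-1] == WILDCARD:
                continue
            merged.append(tok)
    return merged


def train_patterns(samples, target_count):
    """Agglomerative merge until only `target_count` patterns remain."""
    # Initialization: every sample, as a whole, is its own pattern.
    patterns = [s.split() for s in samples]
    while len(patterns) > max(target_count, 1):
        # Score every pair and pick the most similar one.
        i, j = max(
            combinations(range(len(patterns)), 2),
            key=lambda p: similarity(patterns[p[0]], patterns[p[1]]),
        )
        merged = combine(patterns[i], patterns[j])
        # Replace the two merged patterns with their combination.
        patterns = [p for k, p in enumerate(patterns) if k not in (i, j)]
        patterns.append(merged)
    return patterns


samples = [
    "user alice logged in",
    "user bob logged in",
    "disk full on sda",
    "disk full on sdb",
]
for pattern in train_patterns(samples, 2):
    print(" ".join(pattern))
```

Scoring every pair on each pass is quadratic in the number of patterns, which is one plausible reason the claims train on a sample set rather than on the full to-be-stored data set.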

Assignees

Alipay Hangzhou Inf Tech Co Ltd

Inventors

Classifications

  • Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • G06F16/288 · Primary

    Entity relationship models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12450264B2 cover?
A data storage method for string data includes: performing pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data includes matched pattern data in the pattern data set; in response to that the string data includes the matched pattern data, extracting dedicated string data other than the matched pattern da…
Who is the assignee on this patent?
Alipay Hangzhou Inf Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/288. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).