Speculative execution for regular expressions
US-2024176781-A1 · May 30, 2024 · US
US12450264B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12450264-B2 |
| Application number | US-202418413729-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 16, 2024 |
| Priority date | Jan 18, 2023 |
| Publication date | Oct 21, 2025 |
| Grant date | Oct 21, 2025 |
Official abstract text for this publication.
A data storage method for string data includes: performing pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data includes matched pattern data in the pattern data set; in response to that the string data includes the matched pattern data, extracting dedicated string data other than the matched pattern data from the string data, and storing the extracted dedicated string data in a dedicated data storage area of the data storage system, wherein an index relationship is formed between the stored dedicated string data and corresponding pattern data stored in the pattern data storage area; and in response to that the string data does not include the matched pattern data, storing original data of the string data in the dedicated data storage area as a whole.
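The storage flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that each piece of pattern data is a tuple of literal segments separated by variable gaps, and that the "index relationship" is simply the position of the pattern in a list. The names `StringStore`, `put`, and `get` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StringStore:
    # Pattern data storage area: each pattern is a tuple of literal segments
    # (assumed representation); the gaps between segments vary per string.
    patterns: List[Tuple[str, ...]]
    # Dedicated data storage area: (pattern index or None, stored parts).
    dedicated: List[Tuple[Optional[int], Tuple[str, ...]]] = field(default_factory=list)

    def _match(self, s: str) -> Optional[Tuple[int, Tuple[str, ...]]]:
        """Return (pattern index, gap contents) for the first matching pattern."""
        for idx, segments in enumerate(self.patterns):
            regex = "(.*?)".join(re.escape(seg) for seg in segments)
            m = re.fullmatch(regex, s, flags=re.DOTALL)
            if m:
                return idx, m.groups()
        return None

    def put(self, s: str) -> int:
        """Store one string and return its row number in the dedicated area."""
        hit = self._match(s)
        if hit is None:
            # No matched pattern: store the original string as a whole.
            self.dedicated.append((None, (s,)))
        else:
            # Matched: store only the non-pattern parts plus the pattern index.
            self.dedicated.append(hit)
        return len(self.dedicated) - 1

    def get(self, row: int) -> str:
        """Rebuild the original string from the pattern and the stored parts."""
        idx, parts = self.dedicated[row]
        if idx is None:
            return parts[0]
        segments = self.patterns[idx]
        out = [segments[0]]
        for gap, seg in zip(parts, segments[1:]):
            out.append(gap)
            out.append(seg)
        return "".join(out)


store = StringStore(patterns=[("user=", ";ip=", "")])
row = store.put("user=alice;ip=10.0.0.1")
print(store.dedicated[row])  # (0, ('alice', '10.0.0.1'))
print(store.get(row))        # user=alice;ip=10.0.0.1
```

Under these assumptions the shared text `user=` and `;ip=` is stored once in the pattern area, while each row keeps only its own values and a reference to the pattern.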
Opening claim text (preview).
What is claimed is:

1. A data storage method for string data, comprising: performing pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data comprises matched pattern data in the pattern data set, wherein the pattern data set is obtained through training by using a string data sample set sampled from the to-be-stored string data set, and each piece of pattern data is common string data of a plurality of string data samples and is stored in a pattern data storage area of a data storage system; in response to that the string data comprises the matched pattern data, extracting dedicated string data other than the matched pattern data from the string data, and storing the extracted dedicated string data in a dedicated data storage area of the data storage system, wherein an index relationship is formed between the stored dedicated string data and corresponding pattern data stored in the pattern data storage area; and in response to that the string data does not comprise the matched pattern data, storing original data of the string data in the dedicated data storage area as a whole.

2. The data storage method according to claim 1, wherein a data structure of the stored dedicated string data comprises at least an index data field and a dedicated data field, the index data field stores an index relationship to index corresponding pattern data, and the dedicated data field stores dedicated string data.

3. The data storage method according to claim 2, wherein when the stored dedicated string data comprises a plurality of pieces of dedicated string sub-sequence data, for each piece of dedicated string sub-sequence data, the dedicated data field comprises a sub-sequence data length field and a sub-sequence data body field, the sub-sequence data length field stores a data length of the piece of dedicated string sub-sequence data, and the sub-sequence data body field stores the piece of dedicated string sub-sequence data.

4. The data storage method according to claim 1, wherein the data storage system further has an index data storage area, and the data storage method further comprises: storing corresponding index data for each piece of string data in the index data storage area, wherein a data structure of the stored index data comprises a pattern data index field and a dedicated data index field, the pattern data index field stores first index data to index corresponding pattern data, and the dedicated data index field stores second index data to index corresponding dedicated string data.

5. The data storage method according to claim 1, wherein the pattern data set is obtained through training based on a hierarchical clustering algorithm by using the string data sample set.

6. The data storage method according to claim 5, wherein the pattern data set is obtained by: initializing each string data sample in the string data sample set as a whole to initial pattern data, to generate an initial pattern data set; and cyclically performing a pattern data set training process until a quantity of pieces of trained pattern data reaches a preset value, wherein the pattern data set training process comprises: calculating a pattern data similarity between every two pieces of pattern data in a current pattern data set; performing pattern data combination on two pieces of pattern data with a highest pattern data similarity to obtain combined pattern data; and replacing the two pieces of pattern data with the highest pattern data similarity in the current pattern data set with the combined pattern data, to update the pattern data set.

7. The data storage method according to claim 6, wherein the calculated pattern data similarity comprises a pattern data distance.

8. The data storage method according to claim 7, wherein the pattern data distance comprises a code length gain obtained after the two pieces of pattern data are combined.

9. The data storage method according to claim 1, further comprising at least one of: performing first data compression on each piece of pattern data before the piece of pattern data is stored in the pattern data storage area; or performing second data compression on each piece of dedicated string data before the piece of dedicated string data is stored in the dedicated data storage area.

10. The data storage method according to claim 1, further comprising: performing data compression on each piece of pattern data based on a data structure and a data composition of the pattern data; and performing data compression on each piece of dedicated string data based on a data structure and a data composition of the dedicated string data.

11. The data storage method according to claim 1, further comprising: performing data compression on the original data of the string data before the original data of the string data is stored in the dedicated data storage area as a whole.

12. The method according to claim 1, wherein the data storage system comprises a plurality of data storage devices, and the pattern data storage area and the dedicated data storage area are deployed in different data storage devices.

13. A data storage apparatus for string data, comprising: a processor; and a memory storing instructions executable by the processor; wherein the processor is configured to: perform pattern matching on each piece of string data in a to-be-stored string data set by using a pattern data set, to determine whether the string data comprises matched pattern data in the pattern data set, wherein the pattern data set is obtained through training by using a string data sample set sampled from the to-be-stored string data set, and each piece of pattern data is common string data of a plurality of string data samples and is stored in a pattern data storage area of a data storage system; in response to that the string data comprises the matched pattern data, extract dedicated string data other than the matched pattern data from the string data, and store the extracted dedicated string data in a dedicated data storage area of the data storage system, wherein an index relationship is formed between the stored dedicated string data and corresponding pattern data stored in the pattern data storage area; and in response to that the string data does not comprise the matched pattern data, store original data of the string data in the dedicated data storage area as a whole.

14. The data storage apparatus according to claim 13, wherein a data structure of the stored dedicated string data comprises at least an index data field and a dedicated data field, the index data field stores an index relationship to index corresponding pattern data, and the dedicated data field stores dedicated string data.

15. The data storage apparatus according to claim 13, wherein the data storage system further has an index data storage area, and the processor is further configured to: store corresponding index data for each piece of string data in the index data storage area, wherein a data structure of the stored index data comprises a pattern data index field and a dedicated data index field, the pattern data index field stores first index data to index corresponding pattern data, and the dedicated data index field stores second index data to index corresponding dedicated string data.

16. The data storage apparatus according to claim 13, wherein the processor is further configured to obtain, through training, the pattern data set based on a hierarchical clustering algorithm by using the string data sample set.
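The training loop recited in claim 6 (start with every sample as its own pattern, repeatedly merge the most similar pair, stop at a preset count) can be sketched in Python as follows. This is a hedged illustration only. The similarity measure here is `difflib.SequenceMatcher.ratio()`, used as a stand-in and not the code length gain named in claim 8. The wildcard representation and the names `similarity`, `combine`, `train`, and `render` are assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import combinations

WILDCARD = None  # assumed sentinel marking a variable gap inside a pattern


def similarity(a, b):
    """Stand-in similarity between two patterns (not the claimed code length gain)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def _push(out, token):
    # Append a token, collapsing consecutive wildcards into one.
    if token is WILDCARD and out and out[-1] is WILDCARD:
        return
    out.append(token)


def combine(a, b):
    """Keep the parts common to both patterns; replace differing parts with a wildcard."""
    merged = []
    i = j = 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for block in matcher.get_matching_blocks():
        if block.a > i or block.b > j:
            _push(merged, WILDCARD)  # something differed before this block
        for token in a[block.a:block.a + block.size]:
            _push(merged, token)
        i, j = block.a + block.size, block.b + block.size
    return tuple(merged)


def train(samples, target_count):
    """Agglomerative merging until the number of patterns reaches target_count."""
    # Each sample as a whole becomes one initial pattern.
    patterns = [tuple(s) for s in samples]
    while len(patterns) > max(target_count, 1):
        # Find the pair with the highest similarity in the current set.
        i, j = max(
            combinations(range(len(patterns)), 2),
            key=lambda p: similarity(patterns[p[0]], patterns[p[1]]),
        )
        merged = combine(patterns[i], patterns[j])
        # Replace the two merged patterns with the combined one.
        patterns = [p for k, p in enumerate(patterns) if k not in (i, j)] + [merged]
    return patterns


def render(pattern):
    """Display helper: show wildcards as '*' (for printing only)."""
    return "".join("*" if t is WILDCARD else t for t in pattern)


result = train(["GET /a 200", "GET /b 200", "disk full"], target_count=2)
print(sorted(render(p) for p in result))  # ['GET /* 200', 'disk full']
```

The pairwise search makes each round quadratic in the number of current patterns, which is one plausible reason the claims train on a sample of the data set and not on all of it.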
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Entity relationship models · CPC title