Hybrid comparison for unicode text strings consisting primarily of ASCII characters
US-10089281-B1 · Oct 2, 2018 · US
US12073172B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12073172-B2 |
| Application number | US-202217878187-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2022 |
| Priority date | Aug 1, 2022 |
| Publication date | Aug 27, 2024 |
| Grant date | Aug 27, 2024 |
Official abstract text for this publication.
A pointer is set to a first code unit of an original string that encodes characters via code units within an encoding scheme. Whether the code unit of the original string referenced by the pointer is valid within the encoding scheme is determined. If the code unit referenced by the pointer is valid, one or more code units of the original string that encode a single character within the encoding scheme are processed, starting at the code unit referenced by the pointer. The one or more code units as have been processed are appended to a processed string. A single shadow unit indicating that the one or more code units that have been processed are valid is appended to a shadow array. The pointer is advanced to the code unit of the original string following the one or more code units.
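The loop described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that the source does not state: the encoding scheme is taken to be UTF-8 (so a code unit is one byte), the per-character processing is an uppercase conversion, invalid code units are copied through unprocessed, and the names `VALID`, `INVALID`, `sequence_length`, and `process` are hypothetical.

```python
VALID = 1    # hypothetical shadow unit: one valid character was processed
INVALID = 0  # hypothetical shadow unit: one invalid code unit was copied as-is


def sequence_length(data: bytes, i: int) -> int:
    """Return the length of the valid UTF-8 sequence starting at data[i], or 0."""
    for n in (1, 2, 3, 4):
        try:
            data[i:i + n].decode("utf-8")  # strict decoding rejects ill-formed input
        except UnicodeDecodeError:
            continue
        return n
    return 0


def process(original: bytes, convert=str.upper):
    """Walk the original string once, building a processed string and a shadow array."""
    processed = bytearray()
    shadow = bytearray()
    i = 0  # pointer to the current code unit of the original string
    while i < len(original):
        n = sequence_length(original, i)
        if n:
            # Valid: decode one character, convert it, re-encode, record one shadow unit.
            ch = original[i:i + n].decode("utf-8")
            processed += convert(ch).encode("utf-8")
            shadow.append(VALID)
            i += n
        else:
            # Invalid: copy the single code unit unprocessed and flag it in the shadow array.
            processed.append(original[i])
            shadow.append(INVALID)
            i += 1
    return bytes(processed), bytes(shadow)


processed, shadow = process(b"a\xffb\xc3\xa9")
print(processed, list(shadow))
```

In this sketch the stray byte `0xFF` survives untouched while the surrounding characters are converted, and the shadow array holds exactly one entry per character or invalid code unit rather than one per byte.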
Opening claim text (preview).
We claim:
1. A non-transitory computer-readable data storage medium storing program code executable by a processor to perform processing comprising: setting a pointer to a first code unit of an original string encoding characters via a plurality of code units, including the first code unit, within an encoding scheme; determining whether the code unit of the original string referenced by the pointer is valid within the encoding scheme; in response to determining that the code unit of the original string referenced by the pointer is valid: starting at the code unit referenced by the pointer, processing one or more code units of the original string that encode a single character within the encoding scheme; appending the one or more code units as have been processed to a processed string; appending, to a shadow array corresponding to the original string, a single shadow unit indicating that the one or more code units that have been processed are valid; and advancing the pointer to the code unit of the original string following the one or more code units.
2. The non-transitory computer-readable data storage medium of claim 1, wherein the processing further comprises, in response to determining that the code unit of the string referenced by the pointer is invalid: appending, to the shadow array, a single shadow unit indicating that the code unit is invalid; advancing the pointer to a next code unit of the original string.
3. The non-transitory computer-readable data storage medium of claim 2, wherein the processing further comprises, in response to determining that the code unit of the string referenced by the pointer is invalid: appending the code unit of the string referenced by the code unit to the processed string without first processing the code unit.
4. The non-transitory computer-readable data storage medium of claim 3, wherein the shadow array comprises a plurality of shadow units that are each equal or smaller in size as compared to the code units of the original string, wherein the single shadow unit indicating that the code unit is invalid is a first prespecified shadow unit that does not encode a value of the code unit, and wherein the single shadow unit indicating that the one or more code units that have been processed are valid is a second prespecified shadow unit that does not encode on values of the one or more code units that have been processed.
5. The non-transitory computer-readable data storage medium of claim 2, wherein the processing further comprises, in response to determining that the code unit of the string referenced by the pointer is invalid: not appending any code units to the processed string.
6. The non-transitory computer-readable data storage medium of claim 5, wherein the shadow array comprises a plurality of shadow units that are each equal or larger in size as compared to the code units of the original string, wherein the single shadow unit indicating that the code unit is invalid is the code unit that is invalid, and wherein the single shadow unit indicating that the one or more code units that have been processed are valid is a prespecified code unit that is valid within the encoding scheme, the prespecified code unit not encoding values of the one or more code units that have been processed.
7. The non-transitory computer-readable data storage medium of claim 2, further comprising repeating the processing at determining whether the code unit of the original string referenced by the pointer is valid within the encoding scheme.
8. The non-transitory computer-readable data storage medium of claim 1, wherein processing the one or more code units of the original string comprises: decoding the single character encoded by the one or more code units within the encoding scheme; converting the single character to a processed character; and encoding the processed character within an encoding scheme that is identical to or different than the encoding scheme within which the one or more code units encode the single character, wherein the processed character as encoded within the encoding scheme constitute the one or more code units as processed that are appended to the processed string.
9. The non-transitory computer-readable data storage medium of claim 1, wherein processing the one or more code units of the original string comprises: decoding the single character encoded by the one or more code units within the encoding scheme; appending the single character to a first interim string; and generating a placeholder code unit constituting the one or more code units as processed that are appended to the processed string.
10. The non-transitory computer-readable data storage medium of claim 9, wherein the processing further comprises: converting the first interim string to a second interim string having a plurality of interim characters equal or greater in number than the first interim string; for each interim character of the second interim string, encoding the interim character as one or more interim code units within the encoding scheme of the original string; and for each interim character of the second interim string having logical positional correspondence with a respective placeholder code unit of the processed string, replacing the respective placeholder code unit in the processed string with the one or more interim code units within which the interim character has been encoded within the encoding scheme.
11. The non-transitory computer-readable data storage medium of claim 10, wherein the processing further comprises: for each interim character of the second interim string not having logical positional correspondence with a respective placeholder code unit of the processed string, appending to the processed string the one or more interim code units within which the interim character has been encoded within the encoding scheme.
12. The non-transitory computer-readable data storage medium of claim 10, wherein the original string is recoverable from the processed string without using the shadow array in a same manner in which the processed string is generated from the original string.
13. A method comprising: setting, by a processor, a shadow pointer to a first shadow unit of a plurality of shadow units of a shadow array; setting, by the processor, a string pointer to a first code unit of a processed string encoding characters via a plurality of code units, including the first code unit, within an encoding scheme, the shadow array corresponding to the processed string; determining, by the processor, whether the shadow unit referenced by the shadow pointer indicates that one or more code units of the processed string beginning with the code unit referenced by the string pointer are valid within the encoding scheme, or whether the shadow unit referenced by the shadow pointer indicates that the code unit referenced by the string pointer is invalid within the encoding scheme; in response to determining that the shadow unit referenced by the shadow pointer indicates that the one or more code units of the processed string beginning with the code unit referenced by the string pointer are valid: starting at the code unit referenced by the string pointer, processing, by the processor, the one or more code units that encode a single character; appending, by the processor, the one or more code units as have been processed to an original string; advancing, by the processor, the string pointer to the code unit of the processed string following the one or more code units; and advancing, by the processor, the shadow pointer to a next shadow unit within the shadow array.
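The two-pointer walk recited in claim 13 might look something like the sketch below. It is an illustration only, and it makes assumptions: UTF-8 as the encoding scheme, shadow units of 1 for valid and 0 for invalid, a lowercase conversion as the per-character processing, and an invalid branch that copies the code unit through unchanged (that branch is not part of the claim 13 text shown here). The names `utf8_length` and `restore` are hypothetical.

```python
VALID = 1    # hypothetical shadow unit marking one valid character
INVALID = 0  # hypothetical shadow unit marking one invalid code unit


def utf8_length(lead: int) -> int:
    """Length of a UTF-8 sequence, judged from a lead byte the shadow array vouches for."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    return 4


def restore(processed: bytes, shadow: bytes, convert=str.lower) -> bytes:
    """Rebuild an original string, using the shadow array instead of re-validating."""
    original = bytearray()
    sp = 0  # shadow pointer
    p = 0   # string pointer into the processed string
    while sp < len(shadow):
        if shadow[sp] == VALID:
            # Process the code units that encode one character and append the result.
            n = utf8_length(processed[p])
            ch = processed[p:p + n].decode("utf-8")
            original += convert(ch).encode("utf-8")
            p += n
        else:
            # Assumed handling: an invalid code unit is copied through unchanged.
            original.append(processed[p])
            p += 1
        sp += 1  # one shadow unit consumed per character or invalid code unit
    return bytes(original)


print(restore(b"A\xffB\xc3\x89", bytes([1, 0, 1, 1])))
```

Whether the recovered bytes match the true original depends on the conversion being invertible. Lowercasing undoes uppercasing for the characters in this example, but not for every Unicode character.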
Unicode · CPC title
Encoder aspects · CPC title
Character encoding · CPC title