Method and device for data compression, transmission, and decompression
US-9515737-B2 · Dec 6, 2016 · US
US10089282B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10089282-B1 |
| Application number | US-201815885646-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jan 31, 2018 |
| Priority date | Nov 6, 2016 |
| Publication date | Oct 2, 2018 |
| Grant date | Oct 2, 2018 |
Official abstract text for this publication.
Collating text strings having Unicode encoding includes receiving two text strings $S = s_1 s_2 \ldots s_n$ and $T = t_1 t_2 \ldots t_m$. When the two text strings are not identical, there is a smallest positive integer $p$ for which the two text strings differ. The process looks up the characters $s_p$ and $t_p$ in a predefined lookup table. If either of these characters is missing from the lookup table, the collation of the text strings is determined using the standard Unicode comparison of the text strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$. Otherwise, the lookup table assigns weights $v_p$ and $w_p$ for the characters $s_p$ and $t_p$. When $v_p \neq w_p$, these weights define the collation order of the strings $S$ and $T$. When $v_p = w_p$, the collation of $S$ and $T$ is determined recursively using the suffix strings $s_{p+1} \ldots s_n$ and $t_{p+1} \ldots t_m$.
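The procedure in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the lookup table `TABLE` (case-folded weights for printable ASCII plus a few accented letters), the NUL padding borrowed from claims 4-5, and `codepoint_fallback`, which stands in for the full Unicode comparison, are all hypothetical choices made for this example.

```python
from typing import Callable, Dict, Optional

# Hypothetical fast-path table (cf. claims 2-3): every non-control ASCII
# character (0x20-0x7E) plus a few accented Roman letters, each mapped to a
# single-byte weight. Case-folding and accent-stripping are illustrative choices.
TABLE: Dict[str, int] = {chr(c): ord(chr(c).lower()) for c in range(0x20, 0x7F)}
TABLE.update({"\u00e9": ord("e"), "\u00c9": ord("e"), "\u00fc": ord("u"), "\u00dc": ord("u")})


def codepoint_fallback(a: str, b: str) -> int:
    """Hypothetical stand-in for the full Unicode comparison: code-point order."""
    return (a > b) - (a < b)


def collate(s: str, t: str, table: Dict[str, int],
            fallback: Callable[[str, str], int]) -> int:
    """Return -1 if s collates before t, 1 if after, 0 for the same position."""
    # Pad the shorter string on the right with NUL characters (cf. claims 4-5).
    n = max(len(s), len(t))
    s, t = s.ljust(n, "\0"), t.ljust(n, "\0")
    while s != t:
        # Step (1): first position p at which the strings differ (0-based here).
        p = next(i for i in range(len(s)) if s[i] != t[i])
        # Step (2): look up both characters in the fast-path table.
        v: Optional[int] = table.get(s[p])
        w: Optional[int] = table.get(t[p])
        # Step (3): if either is missing (e.g. NUL padding or CJK text),
        # compare the tails from position p onward with the fallback.
        if v is None or w is None:
            return fallback(s[p:], t[p:])
        # Steps (4)-(5): different weights decide the order.
        if v != w:
            return -1 if v < w else 1
        # Equal weights: repeat on the suffixes after p (the recursive step).
        s, t = s[p + 1:], t[p + 1:]
    # Step (6): the remaining suffixes are identical.
    return 0
```

Because `A`/`a` and `é`/`e` share a weight in this hypothetical table, `collate("Apple", "apple", TABLE, codepoint_fallback)` moves past the first position and returns 0, while `"cafés"` versus `"cafez"` is decided later by `s` < `z`. The loop replaces explicit recursion on the suffixes, which keeps stack depth constant for long strings.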
Opening claim text (preview).
What is claimed is:

1. A method of collating text strings having Unicode encoding, comprising: at a computer having one or more processors, and memory storing one or more programs configured for execution by the one or more processors: receiving a first text string $S = s_1 s_2 \ldots s_n$ having Unicode encoding and a second text string $T = t_1 t_2 \ldots t_m$ having Unicode encoding, wherein $n$ and $m$ are positive integers, $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_m$ are Unicode characters, and $S$ is not identical to $T$; (1) identifying a positive integer $p$ with $s_1 = t_1, s_2 = t_2, \ldots, s_{p-1} = t_{p-1}$ and $s_p \neq t_p$, wherein at least one of $s_p$ and $t_p$ is a non-ASCII character; (2) looking up the characters $s_p$ and $t_p$ in a predefined lookup table to determine a weight $v_p$ for the character $s_p$ and a weight $w_p$ for the character $t_p$; (3) when at least one of $s_p$ and $t_p$ is not found in the lookup table, determining the collation order of the strings $S$ and $T$ using Unicode weights for the corresponding strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$; (4) when both $s_p$ and $t_p$ are found in the lookup table and $v_p < w_p$, determining that $S$ is collated before $T$; (5) when both $s_p$ and $t_p$ are found in the lookup table and $w_p < v_p$, determining that $T$ is collated before $S$; (6) when both $s_p$ and $t_p$ are found in the lookup table, $v_p = w_p$, and $s_{p+1} \ldots s_n = t_{p+1} \ldots t_m$, determining that $S$ and $T$ have the same collation position; and when both $s_p$ and $t_p$ are found in the lookup table, $v_p = w_p$, and $s_{p+1} \ldots s_n \neq t_{p+1} \ldots t_m$, determining the collation order of $S$ and $T$ recursively according to steps (1)-(6) using the suffix strings $s_{p+1} \ldots s_n$ and $t_{p+1} \ldots t_m$.

2. The method of claim 1, wherein the lookup table includes lookup values for each non-control ASCII character plus a plurality of accented Roman characters.

3. The method of claim 1, wherein each weight in the lookup table is encoded as a respective single byte.

4. The method of claim 1, further comprising, when $m \neq n$, padding the shorter of the text strings $S$ and $T$ on the right so that the text strings $S$ and $T$ have the same length.

5. The method of claim 4, wherein the padding comprises ASCII null characters.

6. The method of claim 1, wherein the Unicode weights for the strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$ are computed, the computation comprising: for each character, performing a lookup in a Unicode weight table to identify a respective primary weight, a respective accent weight, and a respective case-weight; forming a primary Unicode weight $w_p$ as a concatenation of the identified primary weights; forming an accent Unicode weight $w_a$ as a concatenation of the identified accent weights; forming a case Unicode weight $w_c$ as a concatenation of the identified case weights; and forming the Unicode weight as a concatenation $w_p + w_a + w_c$ of the primary Unicode weight, the accent Unicode weight, and the case Unicode weight.

7. The method of claim 6, wherein the collation order is in accordance with a specified language, and the Unicode weight table is selected according to the specified language.

8. A computing device, comprising: one or more processors; memory; and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for: receiving a first text string $S = s_1 s_2 \ldots s_n$ having Unicode encoding and a second text string $T = t_1 t_2 \ldots t_m$ having Unicode encoding, wherein $n$ and $m$ are positive integers, $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_m$ are Unicode characters, and $S$ is not identical to $T$; (1) identifying a positive integer $p$ with $s_1 = t_1, s_2 = t_2, \ldots, s_{p-1} = t_{p-1}$ and $s_p \neq t_p$, wherein at least one of $s_p$ and $t_p$ is a non-ASCII character; (2) looking up the characters $s_p$ and $t_p$ in a predefined lookup table to determine a weight $v_p$ for the character $s_p$ and a weight $w_p$ for the character $t_p$; (3) when at least one of $s_p$ and $t_p$ is not found in the lookup table, determining the collation order of the strings $S$ and $T$ using Unicode weights for the corresponding strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$; (4) when both $s_p$ and $t_p$ are found in the lookup table and $v_p < w_p$, determining that $S$ is collated before $T$; (5) when both $s_p$ and $t_p$ are found in the lookup table and $w_p < v_p$, determining that $T$ is collated before $S$; (6) when both $s_p$ and $t_p$ are found in the lookup table, $v_p = w_p$, and $s_{p+1} \ldots s_n = t_{p+1} \ldots t_m$, determining that $S$ and $T$ have the same collation position; and when both $s_p$ and $t_p$ are found in the lookup table, $v_p = w_p$, and $s_{p+1} \ldots s_n \neq t_{p+1} \ldots t_m$, determining the collation order of $S$ and $T$ recursively according to steps (1)-(6) using the suffix strings $s_{p+1} \ldots s_n$ and $t_{p+1} \ldots t_m$.

9. The computing device of claim 8, wherein the lookup table includes lookup values for each non-control ASCII character plus a plurality of accented Roman characters.

10. The computing device of claim 8, wherein each weight in the lookup table is encoded as a respective single byte.

11. The computing device of claim 8, wherein the one or more programs further comprise instructions padding the shorter of the text strings $S$ and $T$ on the right so that the text strings $S$ and $T$ have the same length when $m \neq n$.

12. The computing device of claim 11, wherein the padding comprises ASCII null characters.

13. The computing device of claim 8, wherein the one or more programs comprise instructions for computing the Unicode weights for the strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$ are computed, the computation comprising: for each character, performing a lookup in a Unicode weight table to identify a respective primary weight, a respective accent weight, and a respective case-weight; forming a primary Unicode weight $w_p$ as a concatenation of the identified primary weights; forming an accent Unicode weight $w_a$ as a concatenation of the identified accent weights; forming a case Unicode weight $w_c$ as a concatenation of the identified case weights; and forming the Unicode weight as a concatenation $w_p + w_a + w_c$ of the primary Unicode weight, the accent Unicode weight, and the case Unicode weight.

14. The computing device of claim 13, wherein the collation order is in accordance with a specified language, and the Unicode weight table is selected according to the specified language.

15. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computing device having one or more processors and memory, the one or more programs comprising instructions for: receiving a first text string $S = s_1 s_2 \ldots s_n$ having Unicode encoding and a second text string $T = t_1 t_2 \ldots t_m$ having Unicode encoding, wherein $n$ and $m$ are positive integers, $s_1, s_2, \ldots, s_n$ and $t_1, t_2, \ldots, t_m$ are Unicode characters, and $S$ is not identical to $T$; (1) identifying a positive integer $p$ with $s_1 = t_1, s_2 = t_2, \ldots, s_{p-1} = t_{p-1}$ and $s_p \neq t_p$, wherein at least one of $s_p$ and $t_p$ is a non-ASCII character; (2) looking up the characters $s_p$ and $t_p$ in a predefined lookup table to determine a weight $v_p$ for the character $s_p$ and a weight $w_p$ for the character $t_p$; (3) when at least one of $s_p$ and $t_p$ is not found in the lookup table, determining the collation order of the strings $S$ and $T$ using Unicode weights for the corresponding strings $s_p s_{p+1} \ldots s_n$ and $t_p t_{p+1} \ldots t_m$; (4) …
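The composite Unicode weight of claim 6 can be illustrated with a short sketch. The three-level table `WEIGHTS`, its numeric values, and the helper names `unicode_weight` and `fallback_compare` are hypothetical; claim 7 indicates that a real system would select the weight table according to the specified language, whereas this sketch uses one fixed table. Keys are compared as tuples, so any primary difference outranks any accent difference, which in turn outranks any case difference, provided both strings are first padded to equal length with NUL characters as in claims 4-5.

```python
from typing import Dict, Tuple

# Hypothetical per-character weights: (primary, accent, case).
# Values are illustrative only and are not taken from the patent.
WEIGHTS: Dict[str, Tuple[int, int, int]] = {
    "\0": (0x00, 0x00, 0x00),      # NUL padding sorts before everything
    "a": (0x10, 0x00, 0x00),
    "A": (0x10, 0x00, 0x01),       # same letter as "a", different case weight
    "\u00e1": (0x10, 0x01, 0x00),  # "á": same letter as "a", different accent weight
    "b": (0x11, 0x00, 0x00),
    "B": (0x11, 0x00, 0x01),
}


def unicode_weight(text: str, table: Dict[str, Tuple[int, int, int]] = WEIGHTS) -> Tuple[int, ...]:
    """Concatenate all primary weights, then all accent weights, then all case weights."""
    primary = tuple(table[ch][0] for ch in text)
    accent = tuple(table[ch][1] for ch in text)
    case = tuple(table[ch][2] for ch in text)
    return primary + accent + case


def fallback_compare(a: str, b: str) -> int:
    """Return -1, 0, or 1 by comparing composite weights of NUL-padded strings."""
    # Equal lengths keep the level boundaries aligned inside the concatenated keys.
    n = max(len(a), len(b))
    ka = unicode_weight(a.ljust(n, "\0"))
    kb = unicode_weight(b.ljust(n, "\0"))
    return (ka > kb) - (ka < kb)
```

With this table, `"áa"` sorts before `"ab"` because the primary level differs at the second letter, even though `"á"` carries a larger accent weight than `"a"`; only when all primary weights tie, as in `"áb"` versus `"ab"`, does the accent level decide.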
CPC titles:
- Plan optimisation
- of operators
- of sub-queries or views
- Editing, e.g. inserting or deleting
- Conversion to or from non-weighted codes