US2016203154A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016203154-A1 |
| Application number | US-201514594421-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 12, 2015 |
| Priority date | Jan 12, 2015 |
| Publication date | Jul 14, 2016 |
| Grant date | — |
Official abstract text for this publication.
Technology is disclosed herein for compressing, encoding, and otherwise reducing the size of resource files. In at least one implementation, similarity compression is employed to reduce the size of a resource file. In another implementation, map-less encoding is employed to reduce the number of bytes used to represent a resource string. Bit-level compression is employed in another implementation to reduce the quantity of bits used to encode each character in a string. In addition, implementations are disclosed related to technology for naming strings and accelerated string location and retrieval.
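The similarity compression named in the abstract is spelled out in claim 1: the strings are ordered, and the initial portion that a string shares with the next string is replaced by a similarity value. A minimal sketch of that idea follows, assuming the similarity value is simply the count of shared leading characters. The function names `compress` and `decompress`, the tuple layout, and the choice to store the last string in full (it has no following string, per claim 2) are illustrative assumptions, not the patent's actual file format.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def compress(strings):
    """Sort the strings, then store each as (similarity value, remainder).

    The similarity value is measured against the NEXT string in the
    ordered set; the last string has no successor and is kept whole.
    """
    ordered = sorted(strings)
    entries = []
    for i, s in enumerate(ordered):
        if i + 1 < len(ordered):
            k = shared_prefix_len(s, ordered[i + 1])
        else:
            k = 0  # assumption: a string with no successor does not qualify
        entries.append((k, s[k:]))
    return entries


def decompress(entries):
    """Rebuild the ordered strings, walking from the last entry backwards."""
    result = [""] * len(entries)
    following = ""
    for i in range(len(entries) - 1, -1, -1):
        k, remainder = entries[i]
        result[i] = following[:k] + remainder
        following = result[i]
    return result


if __name__ == "__main__":
    packed = compress(["Save", "Save As", "Save All", "Open"])
    print(packed)
    print(decompress(packed))
```

Because each entry depends on the string after it, this sketch has to decode from the end of the list; a real implementation would need some index or checkpoint scheme to reach an arbitrary string quickly, which the abstract hints at with "accelerated string location and retrieval".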
Opening claim text (preview).
1. A method to facilitate enhanced resource file compression comprising: ordering a set of resource strings in a resource file to produce an ordered set of resource strings in the resource file; and reducing a size of the resource file by, for any of the ordered set of resource strings that qualify for similarity compression, at least: identifying a similarity value representative of an extent to which an initial portion of a resource string is similar to a next resource string in the ordered set of resource strings; and replacing the initial portion of the resource string in the resource file with the similarity value while retaining in the file a remaining portion of the resource string that was not replaced by the similarity value.
2. The method of claim 1 wherein ordering the set of resource strings in the resource file comprises alphabetizing the set of resource strings and wherein the method further comprises determining whether or not any given resource string of the ordered set of resource strings qualifies for the similarity compression based at least in part on whether or not the given resource string is followed by any other resource string in the ordered set of resource strings.
3. The method of claim 1 wherein each of the ordered set of resource strings comprises a set of characters and wherein the method further comprises further reducing the size of the resource file by, for any of the ordered set of resource strings that qualify for map-less encoding, at least: identifying a double-byte Unicode representation of each character in the set of characters in the resource string, wherein the double-byte Unicode representation comprises a lower byte and an upper byte; identifying at least one character in the set of characters for which the upper byte of the one character comprises a non-zero value, wherein the non-zero value indicates an occurrence of non-Latin characters; setting a value of an encoding byte to the non-zero value and retaining the encoding byte in the resource file to reflect the occurrence of the non-Latin characters; discarding the upper byte from the resource file for each of the set of characters; and retaining the lower byte in the resource file for each of the set of characters.
4. The method of claim 3 further comprising determining whether or not any of the ordered set of resource strings qualifies for the map-less encoding based at least in part on whether or not the set of characters for any given resource string of the ordered set of resource strings includes characters from more than two character ranges corresponding to more than two different languages.
5. The method of claim 3 further comprising, when a value of the lower byte of any of the non-Latin characters falls within a lower half of a range of possible values for the lower byte, shifting a value of the lower byte of any Latin characters into an upper half of the range of possible values for the lower byte.
6. The method of claim 5 further comprising shifting the encoding byte to reflect the shifting of the value of the lower byte of the Latin characters into the upper half of the range of the possible values for the lower byte.
7. The method of claim 3 wherein the lower byte retained in the resource file for each of the set of characters comprises an initial quantity of bits and wherein the method further comprises further reducing the size of the resource file by, for any of the ordered set of resource strings that qualify for bit-level compression, at least: defining a dictionary specific to the resource string to include one or more characters of the set of characters in the resource string; and for each of the set of characters in the resource string, encoding the character in the resource file in a subsequent quantity of bits that is less than the initial quantity of bits and that represents a position of the character in either the dictionary or in a range of characters not included in the dictionary.
8. The method of claim 7 wherein the lower byte comprises eight bits initially and five bits subsequent to the encoding.
9. The method of claim 7 further comprising determining whether or not any of the ordered set of resource strings qualify for bit-level compression based at least in part on a length of a given string of the ordered set of resource strings.
10. The method of claim 1 wherein the resource file comprises one of a plurality of files associated with a productivity application and wherein the ordered set of resource strings describe features in the productivity application.
11. The method of claim 1 wherein the method further comprises further reducing the size of the resource file by, for any of the ordered set of resource strings that qualify for bit-level compression, at least: identifying a double-byte Unicode representation of each character in the set of characters in the resource string, wherein the double-byte Unicode representation comprises a lower byte and an upper byte, wherein the lower byte comprises an initial quantity of bits; defining a dictionary specific to the resource string to include one or more characters of the set of characters in the resource string; and for each of the set of characters in the resource string, encoding the character in the resource file in a subsequent quantity of bits that is less than the initial quantity of bits and that represents a position of the character in either the dictionary or in a range of characters not included in the dictionary.
12. The method of claim 1 wherein the resource file comprises the ordered set of resource strings and a resource name corresponding to each of the ordered set of resource strings.
13. The method of claim 12 wherein the method further comprises, for each of the ordered set of resource strings, hashing the resource name to generate a hash value and replacing the resource name with a resource identifier that comprises the hash value.
14. A method to facilitate enhanced resource file compression comprising: compressing at least a resource string of a plurality of resource strings in a resource file based at least in part on a similarity of the resource string to at least one other of the plurality of resource strings in the resource file; further compressing the resource string by encoding a double byte representation of each character in the resource string in a single byte representation of the character; and further compressing the resource string by compressing the single byte representation of the character from eight bits to five bits.
15. An apparatus comprising: one or more computer readable storage media; and program instructions stored on the one or more computer readable storage media for reducing a size of a resource file that, when executed by a processing system, direct the processing system to at least, for any of an ordered set of resource strings in the resource file that qualify for similarity compression: identify a similarity value representative of an extent to which an initial portion of a resource string is similar to a next resource string in the ordered set of resource strings; and replace the initial portion of the resource string in the resource file with the similarity value while retaining in the file a remaining portion of the resource string that was not replaced by the similarity value.
16. The apparatus of claim 15 wherein each of the ordered set of resource strings comprises a set of characters and wherein the program instructions further direct the processing system to reduce the size of the resource fil
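The map-less encoding of claim 3 (one encoding byte carrying the upper byte, followed by only the lower byte of each character) can be illustrated with a short sketch. It rests on a simplifying assumption that the claims do not state: a string is treated as qualifying only when every character shares the same upper byte, so the decoder needs no per-character map. The function names `encode_mapless` and `decode_mapless` are hypothetical, and the shifting of Latin characters described in claims 5 and 6 is not modeled.

```python
def encode_mapless(text: str):
    """Return encoding byte + one lower byte per character, or None.

    Simplifying assumption (not from the claims): the string qualifies
    only if all characters share one upper byte in their double-byte
    representation.
    """
    code_units = [ord(ch) for ch in text]
    if any(u > 0xFFFF for u in code_units):
        return None  # outside the double-byte range; not handled here
    upper_bytes = {u >> 8 for u in code_units}
    if len(upper_bytes) != 1:
        return None  # empty string or mixed ranges; left unencoded here
    encoding_byte = upper_bytes.pop()
    lower_bytes = [u & 0xFF for u in code_units]
    return bytes([encoding_byte] + lower_bytes)


def decode_mapless(data: bytes) -> str:
    """Recombine the encoding byte with each retained lower byte."""
    encoding_byte = data[0]
    return "".join(chr((encoding_byte << 8) | low) for low in data[1:])


if __name__ == "__main__":
    word = "Привет"  # Cyrillic, every character has upper byte 0x04
    packed = encode_mapless(word)
    print(len(word.encode("utf-16-le")), "bytes as double-byte text")
    print(len(packed), "bytes after map-less encoding")
    print(decode_mapless(packed) == word)
```

Under this assumption a six-character Cyrillic string drops from twelve bytes to seven. Strings that mix Latin and non-Latin characters return `None` in this sketch; handling them is where the lower-byte shifting in claims 5 and 6 would come in.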
employing the use of a dictionary, e.g. LZ78 · CPC title
Sorting · CPC title
Space efficiency improvement · CPC title
Version control (security arrangements therefor G06F21/57); Configuration management · CPC title
Multi-language systems; Localisation; Internationalisation · CPC title