Enhanced compression, encoding, and naming for resource strings

US2016203152A1 · US · A1

Patent metadata
Publication number: US-2016203152-A1
Application number: US-201514594559-A
Country: US
Kind code: A1
Filing date: Jan 12, 2015
Priority date: Jan 12, 2015
Publication date: Jul 14, 2016
Grant date: not listed

Abstract

Official abstract text for this publication.

Technology is disclosed herein for compressing, encoding, and otherwise reducing the size of resource files. In at least one implementation, similarity compression is employed to reduce the size of a resource file. In another implementation, map-less encoding is employed to reduce the number of bytes used to represent a resource string. Bit-level compression is employed in another implementation to reduce the quantity of bits used to encode each character in a string. In addition, implementations are disclosed related to technology for naming strings and accelerated string location and retrieval.
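
The similarity compression named in the abstract is spelled out in claims 2 and 3: alphabetize the strings, then replace the initial portion each string shares with the next string by a similarity value, keeping only the remainder. Below is a minimal Python sketch of that idea, not the patented implementation. It assumes that code-point order stands in for "alphabetizing" and that the similarity value is a shared-prefix length capped at 255 so it fits in one byte. The function names are hypothetical.

```python
from typing import List, Tuple


def common_prefix_len(a: str, b: str, limit: int = 255) -> int:
    """Length of the initial portion shared by a and b, capped so it fits in one byte."""
    n = 0
    for x, y in zip(a, b):
        if x != y or n == limit:
            break
        n += 1
    return n


def similarity_compress(strings: List[str]) -> List[Tuple[int, str]]:
    """Alphabetize, then store each string as (similarity value, remaining portion)."""
    ordered = sorted(strings)  # code-point order stands in for "alphabetizing"
    out: List[Tuple[int, str]] = []
    for i, s in enumerate(ordered):
        if i + 1 < len(ordered):
            # Qualifies: compare the initial portion with the NEXT string.
            k = common_prefix_len(s, ordered[i + 1])
        else:
            k = 0  # the last string has no successor, so it is stored in full
        out.append((k, s[k:]))
    return out


def similarity_decompress(entries: List[Tuple[int, str]]) -> List[str]:
    """Rebuild back to front, since each entry borrows its prefix from its successor."""
    result = [""] * len(entries)
    successor = ""
    for i in range(len(entries) - 1, -1, -1):
        k, rest = entries[i]
        result[i] = successor[:k] + rest
        successor = result[i]
    return result


if __name__ == "__main__":
    print(similarity_compress(["FileSave", "FileOpen", "FileSaveAs"]))
    # [(4, 'Open'), (8, ''), (0, 'FileSaveAs')]
```

Because each entry refers forward to the next string, decoding has to start from the last entry, which is always stored in full; this mirrors claim 3, where a string qualifies for similarity compression only if another string follows it.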

First claim

Opening claim text (preview).

1. A method to facilitate enhanced resource file compression comprising: ordering a set of resource strings in a resource file to produce an ordered set of resource strings in the resource file, wherein each of the ordered set of resource strings comprises a set of characters; and reducing a size of the resource file by, for any resource string of the ordered set of resource strings that qualifies for map-less encoding, at least: identifying a double-byte Unicode representation of each character in the set of characters in the resource string, wherein the double-byte Unicode representation comprises a lower byte and an upper byte; identifying at least one character in the set of characters for which the upper byte of the one character comprises a non-zero value, wherein the non-zero value indicates an occurrence of non-Latin characters; setting a value of an encoding byte to the non-zero value and retaining the encoding byte in the resource file to reflect the occurrence of the non-Latin characters; discarding the upper byte from the resource file for each of the set of characters; and retaining the lower byte in the resource file for each of the set of characters.

2. The method of claim 1 further comprising further reducing the size of the resource file by, for any of the ordered set of resource strings that qualify for similarity compression, at least: identifying a similarity value representative of an extent to which an initial portion of the resource string is similar to a next resource string in the ordered set of resource strings; and replacing the initial portion of the resource string in the resource file with the similarity value while retaining in the file a remaining portion of the resource string that was not replaced by the similarity value.

3. The method of claim 2 wherein ordering the set of resource strings in the resource file comprises alphabetizing the set of resource strings and wherein the method further comprises determining whether or not any given resource string of the ordered set of resource strings qualifies for the similarity compression based at least in part on whether or not the given resource string is followed by any other resource string in the ordered set of resource strings.

4. The method of claim 2 wherein the lower byte retained in the resource file for each of the set of characters comprises an initial quantity of bits and wherein the method further comprises further reducing the size of the resource file by, for any of the ordered set of resource strings that qualify for bit-level compression, at least: defining a dictionary specific to the resource string to include one or more characters of the set of characters in the resource string; and for each of the set of characters in the resource string, encoding the character in the resource file in a subsequent quantity of bits that is less than the initial quantity of bits and that represents a position of the character in either the dictionary or in a range of characters not included in the dictionary.

5. The method of claim 4 wherein the lower byte comprises eight bits initially and five bits subsequent to the encoding.

6. The method of claim 5 further comprising determining whether or not any of the ordered set of resource strings qualify for bit-level compression based at least in part on a length of a given string of the ordered set of resource strings.

7. The method of claim 1 further comprising determining whether or not any of the ordered set of resource strings qualifies for the map-less encoding based at least in part on whether or not the set of characters for any given resource string of the ordered set of resource strings includes characters from more than two character ranges corresponding to more than two different languages.

8. The method of claim 1 further comprising, when a value of the lower byte of any of the non-Latin characters falls within a lower half of a range of possible values for the lower byte, shifting a value of the lower byte of any Latin characters into an upper half of the range of possible values for the lower byte.

9. The method of claim 6 further comprising shifting the encoding byte to reflect the shifting of the value of the lower byte of the Latin characters into the upper half of the range of the possible values for the lower byte.

10. The method of claim 1 wherein the resource file comprises a one of a plurality of files associated with a productivity application and wherein the ordered set of resource strings describe features in the productivity application.

11. The method of claim 1 wherein the method further comprises further reducing the size of the resource file by, for any of the ordered set of resource strings that qualify for bit-level compression, at least: defining a dictionary specific to the resource string to include one or more characters of the set of characters in the resource string; and for each of the set of characters in the resource string, encoding the character in the resource file in a subsequent quantity of bits that is less than the initial quantity of bits and that represents a position of the character in either the dictionary or in a range of characters not included in the dictionary.

12. The method of claim 1 wherein the resource file comprises the ordered set of resource strings and a resource name corresponding to each of the ordered set of resource strings, wherein the method further comprises, for each of the ordered set of resources strings, hashing the resource name to generate a hash value and replacing the resource name with a resource identifier that comprises the hash value.

14. A system comprising: a storage system for storing software; a processing system operatively coupled to the storage system; and program instructions stored on the storage system for facilitating enhanced resource file compression that, when read and executed by the processing system, direct the processing system to at least: compress at least a resource string of a plurality of resource strings in a resource file based at least in part on a similarity of the resource string to at least one other of the plurality of resource strings in the resource file; further compress the resource string by encoding a double byte representation of each character in the resource string in a single byte representation of the character; and further compress the resource string by compressing the single byte representation of the character from eight bits to five bits.

15. An apparatus comprising: one or more computer readable storage media; and program instructions stored on the one or more computer readable storage media for reducing a size of a resource file that, when executed by a processing system, direct the processing system to at least, for any resource string of an ordered set of resource strings in the resource file having a set of characters that qualify for map-less encoding: identify a double-byte Unicode representation of each character in the set of characters in the resource string, wherein the double-byte Unicode representation comprises a lower byte and an upper byte; identify at least one character in the set of characters for which the upper byte of the one character comprises a non-zero value, wherein the non-zero value indicates an occurrence of non-Latin characters; set a value of an encoding byte to the non-zero value and retain the encoding byte in the resource file to reflect the occurrence of the non-Latin characters; discard the upper byte from the resource file for each of the set of characters; and retain the lower byte in
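
Claim 1's map-less encoding keeps one lower byte per character plus an encoding byte that holds the shared non-zero upper byte, and claim 8 shifts Latin bytes into the upper half when the non-Latin lower bytes occupy the lower half, so the two groups stay distinguishable. The Python sketch below is one hedged reading of claims 1, 7 and 8, not the patented format. Its qualification rules are assumptions: characters must be in the Basic Multilingual Plane, at most one distinct non-zero upper byte may occur (Latin plus one other range, in the spirit of claim 7), Latin characters must be ASCII when non-Latin characters are present, and the non-Latin lower bytes must all fall in the same half. Claim 9 records the shift through the encoding byte itself; because the claim text does not say how, this sketch stores a separate, hypothetical flag byte instead. The function names are also hypothetical.

```python
from typing import Optional


def mapless_encode(s: str) -> Optional[bytes]:
    """Encode s as [encoding byte, shift flag] + one lower byte per character.

    Returns None when s does not qualify under this sketch's (assumed) rules.
    """
    units = [ord(ch) for ch in s]
    if any(cp > 0xFFFF for cp in units):
        return None  # assumption: only BMP characters (one UTF-16 unit each)
    uppers = {cp >> 8 for cp in units if cp >> 8}
    if not uppers:
        # No non-Latin characters: encoding byte 0, every lower byte is the character.
        return bytes([0, 0]) + bytes(units)
    if len(uppers) > 1:
        return None  # two or more distinct non-zero upper bytes: does not qualify
    enc = uppers.pop()  # the shared non-zero upper byte becomes the encoding byte
    latin = [cp for cp in units if cp >> 8 == 0]
    non_latin_low = [cp & 0xFF for cp in units if cp >> 8]
    if any(cp >= 0x80 for cp in latin):
        return None  # assumption: Latin characters must be ASCII here
    if all(b >= 0x80 for b in non_latin_low):
        shift = 0  # halves already disjoint: Latin below 0x80, non-Latin above
    elif all(b < 0x80 for b in non_latin_low):
        shift = 0x80  # move Latin bytes into the upper half (cf. claim 8)
    else:
        return None  # non-Latin lower bytes straddle both halves
    # Discard every upper byte; keep the (possibly shifted) lower bytes.
    body = bytes(cp + shift if cp >> 8 == 0 else cp & 0xFF for cp in units)
    flag = 1 if shift else 0  # hypothetical flag byte standing in for claim 9
    return bytes([enc, flag]) + body


def mapless_decode(data: bytes) -> str:
    """Invert mapless_encode by re-attaching the encoding byte to non-Latin bytes."""
    enc, shifted = data[0], data[1]
    if enc == 0:
        return "".join(chr(b) for b in data[2:])
    out = []
    for b in data[2:]:
        is_latin = b >= 0x80 if shifted else b < 0x80
        if is_latin:
            out.append(chr(b - 0x80 if shifted else b))
        else:
            out.append(chr((enc << 8) | b))
    return "".join(out)


if __name__ == "__main__":
    print(mapless_encode("\u0424\u0430\u0439\u043b 1").hex())  # 04012430393ba0b1
```

In the usage line, "Файл 1" has four Cyrillic characters sharing upper byte 0x04 with lower bytes below 0x80 (for example, а is U+0430, whose lower byte 0x30 would otherwise collide with the digit "0"), so the space and digit are shifted up. The result is 8 bytes (2 header + 6 body) instead of the 12 bytes of UTF-16.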

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Indexing; Data structures therefor; Storage structures · CPC title

  • Unicode · CPC title

  • Sorting · CPC title

  • Multi-language systems; Localisation; Internationalisation · CPC title

  • G06F8/71 · Primary

    Version control (security arrangements therefor G06F21/57); Configuration management · CPC title

Frequently asked questions

What does patent US2016203152A1 cover?
Techniques for reducing the size of resource files: similarity compression, map-less encoding to cut the number of bytes per resource string, and bit-level compression to cut the number of bits per character, along with technology for naming strings and for accelerated string location and retrieval.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F8/71. Mapped technology areas include Physics.
When was this patent published?
The A1 application was published on July 14, 2016. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).