Xtreme Compression's Proprietary Data Compression Methods
Attribute Vector Coding
Attribute vector coding is a content-aware vector transform method for compressing multidimensional database tables. It captures and exploits the many relationships, functional dependencies, and statistical correlations in the data without first having to solve the intractable problem of identifying each one explicitly and defining it as an individual conditional mutual information term.
Attribute vector coding recognizes correlational structure and produces complex structured symbols that are fast to decompress. It achieves unequaled compression by systematically modeling data at high levels of abstraction, across dimensions and data types. Because it models dependencies that field-by-field methods ignore, it is far less constrained than conventional methods by the entropy bounds that apply to those simpler models.
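The benefit of modeling fields jointly can be illustrated with a small entropy calculation. The sketch below is not attribute vector coding itself, whose details are proprietary; it only compares the zero-order empirical entropy of a table coded column by column with that of the same table coded as whole-row vectors. The table contents and the `entropy` helper are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Zero-order empirical entropy, in bits per symbol."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

# Hypothetical table in which city determines both state and postal code.
rows = [
    ("Austin", "TX", 78701),
    ("Austin", "TX", 78701),
    ("Dallas", "TX", 75201),
    ("Boston", "MA", 2108),
    ("Boston", "MA", 2108),
    ("Dallas", "TX", 75201),
    ("Austin", "TX", 78701),
    ("Boston", "MA", 2108),
]

columns = list(zip(*rows))

# Cost per row when each column is coded independently of the others.
per_column_bits = sum(entropy(col) for col in columns)

# Cost per row when each row is coded as a single vector symbol.
joint_bits = entropy(rows)

print(f"independent columns: {per_column_bits:.3f} bits/row")
print(f"joint row vectors:   {joint_bits:.3f} bits/row")
```

Because the second and third columns are functions of the first, the joint entropy equals the entropy of the first column alone, and the independent-column cost is strictly higher. A practical coder has to obtain this gain without enumerating every dependency, which is the problem the text describes.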
Wordencoding
Wordencoding is a 0-order (context-independent) variable-to-variable-length algorithm for compressing text strings in database table record fields. It achieves compression close to the 0-order source entropy without sacrificing speed. It does so by maximizing the combined data locality of the compressed record fields, the lexicons that hold the strings, and the access data structures. Wordencoding deals explicitly with the data's correlational structure by recognizing that redundancy in text strings exists at multiple levels of granularity.
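As a rough illustration of a 0-order, word-level model, the sketch below builds a lexicon over the words in a set of record fields, replaces each field with a list of lexicon IDs, and computes the 0-order entropy of the word stream, which is the bound an entropy coder over those IDs could approach. This is a generic dictionary-coding sketch under stated assumptions, not the Wordencoding algorithm; the function names and sample fields are hypothetical.

```python
from collections import Counter
from math import log2

def build_lexicon(fields):
    """Assign each distinct word an integer ID, most frequent words first."""
    counts = Counter(word for field in fields for word in field.split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered)}, counts

def encode_field(field, lexicon):
    """Replace each word in a field with its lexicon ID."""
    return [lexicon[word] for word in field.split()]

def decode_field(ids, lexicon):
    """Invert encode_field for single-space-separated fields."""
    words = {i: word for word, i in lexicon.items()}
    return " ".join(words[i] for i in ids)

def zero_order_entropy(counts):
    """Context-independent entropy of the word stream, in bits per word."""
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

# Hypothetical record fields.
fields = ["new york city", "new jersey", "york road", "jersey city"]
lexicon, counts = build_lexicon(fields)
encoded = [encode_field(f, lexicon) for f in fields]
bits_per_word = zero_order_entropy(counts)

print(encoded)
print(f"{bits_per_word:.3f} bits/word")
```

Ordering the lexicon by frequency keeps the most common words at the smallest IDs, so a variable-length code over the IDs can give them the shortest codewords.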
Repopulation
Repopulation is a structural method for compressing monotonic integer sequences in hash tables and similar data structures. It fills table locations that would otherwise go unused with subsequences that would otherwise occupy memory elsewhere.
Unlike almost every other lossless compression method, repopulation is not a replacement scheme. Instead, it is transpositional and mechanistic, working like a chess-playing automaton, and it draws on no information-theoretic concepts. Repopulation achieves the access speed of a low load factor and the table compactness of a high one at the same time, avoiding the traditional trade-off between the two.
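The trade-off in question can be made concrete with a small simulation. The sketch below does not implement repopulation; it only measures the average number of probes per successful lookup in an ordinary open-addressing table with linear probing, at a low and a high load factor. The table size, the load factors, and the `average_probes` helper are assumptions chosen for illustration.

```python
import random

def average_probes(table_size, load_factor, seed=0):
    """Average probes per successful lookup under linear probing."""
    rng = random.Random(seed)
    table = [None] * table_size
    keys = rng.sample(range(10 * table_size), int(table_size * load_factor))

    # Insert every key, stepping forward until an empty slot is found.
    for key in keys:
        slot = key % table_size
        while table[slot] is not None:
            slot = (slot + 1) % table_size
        table[slot] = key

    # Look every key up again and count the slots inspected.
    total = 0
    for key in keys:
        slot = key % table_size
        probes = 1
        while table[slot] != key:
            slot = (slot + 1) % table_size
            probes += 1
        total += probes
    return total / len(keys)

sparse = average_probes(4096, 0.25)  # fast lookups, 75% of slots empty
dense = average_probes(4096, 0.90)   # compact table, slower lookups

print(f"load 0.25: {sparse:.2f} probes per lookup")
print(f"load 0.90: {dense:.2f} probes per lookup")
```

In a conventional table the empty slots at low load are pure overhead. The text describes repopulation as putting those slots to use, so that the sparse table's lookup speed no longer costs the unused memory.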
Superpopulation
Superpopulation is a variable-to-variable-length algorithm for compressing index tables, lists, arrays, zerotrees, and similar structures. It systematically accommodates wide local variations in data statistics. Superpopulation can be used by itself or in conjunction with repopulation.
Superpopulation recognizes that the distributions of values in access data structures are often far from random, with areas of high and low correlation. It classifies each such area as one of two distinct target types and applies an encoding method specific to that type.
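The general idea of classifying areas and encoding each by type can be sketched as follows. The actual target types and encodings used by superpopulation are not described here, so everything in this example is an assumption: the fixed block size, the "regular" and "irregular" labels, and the choice of an arithmetic-progression summary for regular blocks.

```python
def classify(block):
    """Label a block 'regular' if its successive differences are all equal."""
    deltas = {b - a for a, b in zip(block, block[1:])}
    return "regular" if len(deltas) <= 1 else "irregular"

def encode(values, block_size=4):
    """Encode each block with the method that matches its type."""
    out = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        if classify(block) == "regular":
            # Highly correlated area: store start, step, and length only.
            step = block[1] - block[0] if len(block) > 1 else 0
            out.append(("regular", block[0], step, len(block)))
        else:
            # Weakly correlated area: store the values verbatim.
            out.append(("irregular", list(block)))
    return out

def decode(encoded):
    """Rebuild the original sequence from the typed blocks."""
    values = []
    for item in encoded:
        if item[0] == "regular":
            _, start, step, count = item
            values.extend(start + step * k for k in range(count))
        else:
            values.extend(item[1])
    return values

# Hypothetical index data with both ordered and scattered areas.
data = [10, 11, 12, 13, 40, 7, 93, 2, 100, 100, 100, 100, 5]
encoded = encode(data)
print(encoded)
```

Choosing the encoding per block lets the ordered areas collapse to a few numbers while the scattered areas are never forced through a model that does not fit them.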