DBAs and database professionals have been aware of the pros and cons of compressing data for years. The traditional argument goes something like this: with compression you can store more data in less space, but at the cost of incurring CPU to compress the data upon insertion (and modification) and decompress the data upon reading it. Over time, the benefits of compression became greater as compression algorithms became more robust, hardware assist chips became available to augment compression speed, and the distributed model of computing made transmitting data across networks a critical piece of the business transaction (and transmitting compressed data is more efficient than transmitting uncompressed data).
IBM has significantly improved compression in DB2 for z/OS over the years. In the early days of mainframe DB2, no compression capability came with DB2 out-of-the-box; the only mechanism for compressing data was via an exit routine (EDITPROC). Many software vendors developed and sold compression routines for DB2. Eventually, IBM began shipping a sample compression routine with DB2. And then in DB2 Version 3 (1993), hardware-assisted compression was introduced. With the hardware assist, the CPU overhead of DB2 compression is minimal, and the cons list gets a little shorter.
Indeed, one piece of advice that I give to most shops when I consult for them is that they probably should be compressing more data than they currently are. Compressed data can improve performance these days because, in many cases, you can fit more rows per page. And therefore scans and sequential processes can process more data with the same number of I/Os, thereby improving performance. Of course, you should use the DSN1COMP utility to estimate the amount of savings that can accrue via compression before compressing any existing data.
Eventually, in DB2 9, we even got index compression capability (using different technology than data compression, of course). At any rate, compressing data on DB2 for z/OS is no longer the “only-if-I-have-to” task that it once was.
Then along comes the Big Data phenomenon where increasingly large data sets need to be stored and analyzed. Big Data is typified by data sets that are so large and complex that traditional tools and database systems are ill-suited to process them. Clearly, compressing such data could be advantageous… but is it possible to process and compress such large volumes of data?
New alternatives to traditional systems are becoming available that offer efficient resource usage based on principles of compressed sensing and other techniques. One example of this new technology is IBM’s BLU Acceleration, which is included in DB2 10.5 for Linux, Unix, and Windows. One feature of BLU Acceleration is “actionable compression,” which can deliver up to 10x storage space savings. Because BLU can operate directly on compressed data, it eliminates the need for indexes and pre-built aggregates, as well as the CPU time that would otherwise be required to decompress the data. Benchmark tests at some customer sites have achieved 90% to 95% data compression for their large data warehouse tables. But why is it called “actionable”? Well, there are two key ingredients that make the compression actionable: (1) new algorithms enable many predicates to be evaluated without having to decompress the data, and (2) the most frequently occurring values are compressed the most, thereby saving the greatest amount of storage space.
The bottom line is that IBM’s advanced algorithms are being used to maximize compression while preserving the order of encoding so compressed data can be quickly analyzed without decompressing it. It is an impressive technology as no changes are required to your existing SQL statements.
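BLU’s actual encoding scheme is proprietary, but the idea of order-preserving compression can be illustrated with a toy sketch (my own simplification, not IBM’s algorithm): if dictionary codes are assigned in value order, a range predicate can be evaluated on the codes themselves, with no decompression of the column.

```python
# Toy illustration (not BLU's actual algorithm): order-preserving
# dictionary encoding. Codes are assigned in sorted value order, so
# comparisons on encoded data give the same answers as on raw data.

def build_dictionary(values):
    """Map each distinct value to an integer code that preserves order."""
    return {v: code for code, v in enumerate(sorted(set(values)))}

def encode(values, dictionary):
    return [dictionary[v] for v in values]

salaries = [70000, 52000, 91000, 52000, 70000]
d = build_dictionary(salaries)   # {52000: 0, 70000: 1, 91000: 2}
encoded = encode(salaries, d)    # [1, 0, 2, 0, 1]

# Evaluate "salary > 60000" directly on the codes: 70000 is the
# smallest value satisfying the predicate, so compare against its code.
threshold_code = d[70000]
matches = [i for i, c in enumerate(encoded) if c >= threshold_code]
print(matches)  # [0, 2, 4]
```

Because the codes sort the same way the values do, the comparison never touches the original (wider) values — which is the essence of acting on data while it is still compressed.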
Of course, BLU Acceleration is much more than just additional compression techniques. BLU Acceleration adds a column store capability to DB2 10.5 for LUW. A column store physically stores data as sections of columns rather than as rows of data. This organization optimizes data warehouse queries, customer relationship management (CRM) reporting, and other ad-hoc queries that compute aggregates over large numbers of similar data items.
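A minimal sketch (again my own illustration, not DB2 internals) shows why columnar layout helps aggregates: the query reads only the one column it needs, instead of dragging every full row through the buffer pool.

```python
# Illustrative sketch (not DB2 internals): the same table stored
# row-wise versus column-wise.

rows = [
    ("Alice", "East", 100),
    ("Bob",   "West", 250),
    ("Carol", "East", 175),
]

# Column store: each column is held contiguously on its own.
columns = {
    "name":   [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# An aggregate such as SUM(amount) touches only one column's data,
# while a row store would have to read every complete row.
total = sum(columns["amount"])
print(total)  # 525
```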
Be aware that using a column store for your data, such as provided by BLU Acceleration, is predominantly for read-only types of workloads such as analytics, data warehousing, business intelligence and other reporting applications. So you will need to pick and choose the applications and data that match the capabilities of BLU.
Another new feature of BLU Acceleration comes via the exploitation of the SIMD (Single Instruction Multiple Data) capabilities of modern CPUs. The basic idea behind SIMD is that a single instruction can act upon multiple data items at the same time, which obviously can speed up processing.
BLU Acceleration also adds data skipping technology. You can probably guess what this does, but let’s explain it a little bit anyway. The basic idea is to skip over data that is not required in order to deliver an answer set for a query. Metadata is stored for sets of data records that can be accessed by DB2 to determine whether that particular set of data holds anything of interest. If not, it can be skipped over.
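The mechanics of data skipping can be sketched in a few lines (a simplification under my own assumptions, not DB2’s implementation): keep min/max metadata for each block of records, and consult it before reading the block at all.

```python
# Illustrative sketch (not DB2's implementation): min/max metadata is
# kept per block of records; blocks that cannot possibly contain a
# match for the predicate are skipped without being read.

BLOCK_SIZE = 4
data = [3, 7, 5, 2, 40, 41, 44, 48, 9, 8, 6, 7]

blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
synopsis = [(min(b), max(b)) for b in blocks]  # metadata per block

def find_greater_than(threshold):
    """Return matching values and how many blocks were actually scanned."""
    result, scanned = [], 0
    for (lo, hi), block in zip(synopsis, blocks):
        if hi <= threshold:
            continue  # whole block skipped using metadata alone
        scanned += 1
        result.extend(v for v in block if v > threshold)
    return result, scanned

values, blocks_scanned = find_greater_than(30)
print(values, blocks_scanned)  # [40, 41, 44, 48] 1  -- two blocks skipped
```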
And best of all, BLU Acceleration is simple to use. All that is necessary is to specify ORGANIZE BY COLUMN in the DDL for a table to make it BLU.
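For example, the DDL might look something like this (the table and its columns are hypothetical; ORGANIZE BY COLUMN is the only BLU-specific clause):

```sql
-- Hypothetical table; the ORGANIZE BY COLUMN clause makes it column-organized
CREATE TABLE sales (
    sale_date  DATE,
    region     VARCHAR(20),
    amount     DECIMAL(9,2)
) ORGANIZE BY COLUMN;
```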
“So what?” you may ask… Well, in my opinion, BLU Acceleration is a very significant milestone in the history of DB2. It brings a column store capability that can be implemented right inside of DB2, without any additional product or technology. So you can implement a multi-workload database implementation for the Big Data era using nothing more than DB2 software. BLU Acceleration provides blazing speed and can act upon large amounts of analytical data. And that is something we all should consider when embarking on our Big Data projects.
So compression is just one of several new capabilities and improvements added by IBM to DB2 with BLU Acceleration. But it looks like compression is becoming cool… who’d have thought that back in the 1980s when compression was something we only did when we absolutely had to?