Continued from Columnar Databases I ...
So if a situation allows us to live with the limitations of the columnar databases, how good are columnar databases. To find out for myself I set up an experiment to compare a popular row based database (Oracle 11g) with compression turned on with a columnar database (Infobright) that relies on an open source database engine (MySQL). I also set the experiment to mainly explore the compression in storage, rather than any query performance as I did not have resource to set up that elaborate an experiment.
For OLTP data structures(TCP-H(tm) Bench Mark) the Oracle compressed row data store used about 10 GB storage including some indexes, which are required for such databases. Infobright database on the other hand showed a 1.9GB. This is about 1/5th the storage required. This is a significant saving, when there are not a lot of indexes in row store, and if more indexes were added for performance reasons it would have shown even better comparison on storage requirements.
For Star Schema Bench Mark database, the data extracts were of the range of 6.7 GB of raw ASCII data, when pulled from Infobright (it by default provides quoted strings, etc) vs. about 6 GB of raw ASCII data when pulled from Oracle tables, using custom pipe delimiter. When loaded into Infobright it compressed the data into a size of about 800MB, again using no indexes. When loaded into Oracle database with the same compression scheme as before the data used about 6.5GB. From these observations, we conclude that while Oracle provides compression, considering that we had 3 large indexes on the lineorder, and smaller indexes on smaller tables as well. However the columnar database (Inforbright) provided an order of magnitude compression compared to the raw text data and row store's (Oracle) equivalent database with basic compression. Due to lack of appropriate storage (exadata machine) I could not test the more aggressive compression scheme available from Oracle row store database.
Query timings were better in case of Infobright database where the large fact table extraction to flat file took about 12 minutes and in case of Oracle the same took about 42 minutes. Thus highlighting the benefits of smaller storage, at the least, as the queries did not use any index for Oracle either as these queries gets all the data from the tables in the join.
The star schema shows a higher compression ratio for the columnar database, even though, it uses mainly numeric type data types in the large fact table.
Abadi, D., Boncz, P., Harizopoulos, S. "Column-oriented Database Systems" in VLDB ’09, August 24-28, 009, Lyon, France.
Abadi, D.J., Madden, S.R., and Ferreira, M. "Integrating compression and execution in column-oriented database systems" In Proc. SIGMOD, 2006.
Hodak, W., Jernigan, Kevin, "Advanced Compression with Oracle Database 11g Release 2" An Oracle White Paper from Oracle corporation, September 2009
Oracle, "Oracle 11g SQL Reference Guide" from otn.oracle.com
Oracle, "Oracle 11g Utilities Guide" from otn.oracle.com
Inforbright.org, "Direct data load guide" available from inforbright.org
About Sarbjit Parmar
A practitioner with technical and business knowledge in areas of Data Management( Online transaction processing, data modeling(relational, hierarchical, dimensional, etc.), S/M/L/XL/XXL & XML data, application design, batch processing, analytics(reporting + some statistical analysis), MBA+DBA), Project Management / Product/Software Development Life Cycle Management.