Sorting a Terabyte in 197 seconds
I just returned from the 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry to the sorting contest. I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today. [...]
In this post I’m going to talk about how TokuDB’s implementation of auto increment works, and contrast it with the behavior of MyISAM and InnoDB. We feel that the TokuDB behavior is easier to understand, more standards-compliant, and offers higher performance (especially when implemented with Fractal Tree indexes). In TokuDB, each table can have an [...]
Summary: An alternate approach, offered in response to our original post, provides excellent improvements for smaller databases, but clustered indexes offer better performance as database size increases. (This posting is by Dave.) Jay Pipes suggested an alternate approach to improving MySQL performance of Query 17 on a TPC-H-like database. Add the index (l_partkey, l_quantity) to [...]
Executive Summary: A query like TPC-H Query 17 can be sped up by large factors by using straight_joins and clustering indexes. (This entry posted by Dave.) In a previous post, we wrote about queries like TPC-H query 2, and the use of straight_join to improve performance. This week, we consider Query 17, described by the [...]
In this post we’ll describe a query that accrued significant performance advantages from using a relatively long index key. (This posting is by Zardosht and Bradley.) We ran across this query recently when interacting with a customer (who gave us permission to post this sanitized version of the story): SELECT name, Count(e2) AS CountOfe2 FROM [...]
The TokuDB storage engine for MySQL employs Fractal Tree technology. We’ve been planning to write a white paper explaining how fractal tree indexing works, but haven’t gotten to it yet. In the meantime, here are links to some academic papers that relate to our technology. Cache-Oblivious B-Trees by Michael A. Bender, Erik D. Demaine [...]
The talk I gave at the Percona Performance Conference at the MySQL Users Conference in April 2009 can be found at http://tokutek.com/presentations/kuszmaul-mysqluc-percona-09-slides.pdf. This talk provides some examples where covering indexes help, and then describes a performance model that can be used to understand and predict query performance. It covers clustering indexes (which are a kind [...]
Every time I visit the Sun Santa Clara Campus, I’m reminded of Mel Brooks’s movie “High Anxiety”. The campus was known as The Great Asylum for the Insane in the 19th century, and even includes a tower. High Anxiety, whenever you’re near. High Anxiety, it’s you that I fear. I went to the MySQL Storage [...]
Posted by Bradley C. Kuszmaul and David Wells. Executive Summary: A MySQL straight join can speed up a query that is very similar to TPC-H Q2 by a factor of 159 on MySQL. Recently, we began looking at TPC-H performance on MySQL. Our early tests yielded unexpectedly poor performance for MyISAM, InnoDB and the Tokutek [...]
We modified the iiBench benchmark to perform deletions as well as insertions, and compared InnoDB to Tokutek’s Fractal Tree™ storage engine, both running on MySQL 5.1. I’ll post the revised iiBench tarball soon. Here is what the performance looks like: The iiBench-with-deletions benchmark works as follows. The benchmark employs a fact table with an autoincremented [...]
