Sorting a Terabyte in 197 seconds

Published on 17 August 2009 by bradley in TokuView

I just returned from the 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry to the sorting contest. I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record which still stands today. [...]

0 comments | Continue Reading

Autoincrement Semantics

Published on 29 July 2009 by bradley in TokuView

In this post I’m going to talk about how TokuDB’s implementation of auto increment works, and contrast it to the behavior of MyISAM and InnoDB. We feel that the TokuDB behavior is easier to understand, more standard-compliant and offers higher performance (especially when implemented with Fractal Tree indexes). In TokuDB, each table can have an [...]

0 comments | Continue Reading

Summary: An alternate approach, offered in response to our original post, provides excellent improvements for smaller databases, but clustered indexes offer better performance as database size increases. (This posting is by Dave.) Jay Pipes suggested an alternate approach to improving MySQL performance of Query 17 on a TPC-H-like database. Add the index (l_partkey, l_quantity) to [...]

3 comments | Continue Reading

Improving TPC-H-like queries – Q17

Published on 15 June 2009 by bradley in TokuView

Executive Summary: A query like TPC-H Query 17 can be sped up by large factors by using straight_joins and clustering indexes. (This entry posted by Dave.) In a previous post, we wrote about queries like TPC-H query 2, and the use of straight_join to improve performance. This week, we consider Query 17, described by the [...]

8 comments | Continue Reading

Long Index Keys

Published on 01 June 2009 by bradley in TokuView

In this post we’ll describe a query that accrued significant performance advantages from using a relatively long index key. (This posting is by Zardosht and Bradley.) We ran across this query recently when interacting with a customer (who gave us permission to post this sanitized version of the story): SELECT name, Count(e2) AS CountOfe2 FROM [...]

12 comments | Continue Reading

The TokuDB storage engine for MySQL employs Fractal Tree technology. We’ve been planning to write a white paper explaining how fractal tree indexing works, but haven’t gotten to it yet. In the meantime, here are links to some academic papers that relate to our technology. Cache-Oblivious B-Trees by Michael A. Bender, Erik D. Demaine [...]

5 comments | Continue Reading

The talk I gave at the Percona Performance Conference at the MySQL Users Conference in April 2009 can be found at http://tokutek.com/presentations/kuszmaul-mysqluc-percona-09-slides.pdf. This talk provides some examples where covering indexes help, and then describes a performance model that can be used to understand and predict query performance. It covers clustering indexes (which are a kind [...]

0 comments | Continue Reading

High Anxiety Whenever You’re Near

Published on 26 April 2009 by bradley in TokuView

Every time I visit the Sun Santa Clara Campus, I’m reminded of Mel Brooks’s movie “High Anxiety”. The campus was known as The Great Asylum for the Insane in the 19th century, and even includes a tower. High Anxiety, whenever you’re near. High Anxiety, it’s you that I fear. I went to the MySQL Storage [...]

3 comments | Continue Reading

Improving TPC-H-like Queries – Q2

Published on 10 April 2009 by bradley in TokuView

(Posted by Bradley C. Kuszmaul and David Wells.) Executive Summary: A MySQL straight join can speed up a query that is very similar to TPC-H Q2 by a factor of 159. Recently, we began looking at TPC-H performance on MySQL. Our early tests yielded unexpectedly poor performance for MyISAM, InnoDB and the Tokutek [...]

0 comments | Continue Reading

iiBench with deletes

Published on 29 January 2009 by bradley in TokuView

We modified the iiBench benchmark to perform deletions as well as insertions, and compared InnoDB to Tokutek’s Fractal Tree™ storage engine, both running on MySQL 5.1. I’ll post the revised iiBench tarball soon. Here is what the performance looks like: The iiBench-with-deletions benchmark works as follows. The benchmark employs a fact table with an autoincremented [...]

2 comments | Continue Reading