I saw Mark Callaghan’s post and his graph showing miss rate as a function of cache size for MySQL running InnoDB. He compares the measured curve to two simple models:
A linear model where the miss rate is (1-C/D)/50, and
An inverse-proportional model where the miss rate is D/(1000C).
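To see how the two models differ, here is a small Python sketch that evaluates both. We read C as the cache size and D as the database size, in the same units; that reading, the function names, and the sample sizes are our assumptions for illustration, not part of Mark's post.

```python
# Two simple miss-rate models, evaluated side by side.
# Assumption: C = cache size, D = database size, both in the same units.

def linear_miss_rate(cache: float, data: float) -> float:
    """Linear model: (1 - C/D) / 50."""
    return (1 - cache / data) / 50


def inverse_miss_rate(cache: float, data: float) -> float:
    """Inverse-proportional model: D / (1000 C)."""
    return data / (1000 * cache)


if __name__ == "__main__":
    data = 100.0  # hypothetical database size (arbitrary units)
    for cache in (10.0, 25.0, 50.0, 100.0):
        print(f"C/D={cache / data:.2f}  "
              f"linear={linear_miss_rate(cache, data):.4f}  "
              f"inverse={inverse_miss_rate(cache, data):.4f}")
```

The models disagree most at the extremes: once the cache holds the whole database (C = D), the linear model predicts no misses at all, while the inverse-proportional model still predicts 0.001 and grows without bound as C shrinks.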
He seemed happy [...]
We’re supporting the OpenSQL Camp, which will be held in Portland on November 14.
One of my objectives for the camp is to make progress on a universal storage engine API, to make it possible to use the same storage engines in MySQL, PostgreSQL, Ingres, or any other database. I’m also looking forward [...]
Sorting a Terabyte in 197 seconds
I just returned from the 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), held in Calgary, where I gave a talk about my entry in the sorting contest. I sorted 1TB in 197s on a 400-node machine at MIT Lincoln Laboratory, a record that still stands today. [...]
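As a rough sanity check on these numbers, the following Python snippet computes the implied throughput. It assumes 1 TB means 10**12 bytes and counts the input only once, so it understates the total I/O, since a parallel sort reads and moves its data more than once.

```python
# Back-of-the-envelope throughput for a 1 TB sort in 197 s on 400 nodes.
# Assumption: 1 TB = 10**12 bytes (decimal terabyte), input counted once.
TOTAL_BYTES = 10**12
SECONDS = 197
NODES = 400

aggregate = TOTAL_BYTES / SECONDS  # input bytes per second, whole machine
per_node = aggregate / NODES       # input bytes per second, per node

print(f"aggregate: {aggregate / 1e9:.2f} GB/s")  # aggregate: 5.08 GB/s
print(f"per node:  {per_node / 1e6:.1f} MB/s")   # per node:  12.7 MB/s
```

That works out to about 5.08 GB/s of input across the machine, or roughly 12.7 MB/s per node.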
In this post I’m going to talk about how TokuDB’s implementation of auto increment works, and contrast it with the behavior of MyISAM and InnoDB. We feel that the TokuDB behavior is easier to understand, is more standards-compliant, and offers higher performance (especially when implemented with Fractal Tree indexes).
In TokuDB, each table can have an [...]
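One documented MyISAM behavior shows the kind of difference at stake. When an AUTO_INCREMENT column is the second column of a multiple-column index, MyISAM generates MAX(column) + 1 among the rows sharing the same prefix, and it can reuse a value after the largest row in a group is deleted. The Python sketch below models that rule next to a simple table-wide counter. The table-wide mode, the class name, and the method names are hypothetical stand-ins chosen for illustration; we are not claiming this is TokuDB's actual code.

```python
class AutoIncTable:
    """Toy table of (prefix, id) rows with an auto-increment id.

    per_prefix=True models MyISAM's documented rule for an AUTO_INCREMENT
    column that is the second column of a multiple-column index.
    per_prefix=False is a hypothetical table-wide counter used for contrast.
    """

    def __init__(self, per_prefix: bool) -> None:
        self.per_prefix = per_prefix
        self.rows: list[tuple[str, int]] = []
        self.high_water = 0  # largest value the table-wide counter has handed out

    def insert(self, prefix: str) -> int:
        if self.per_prefix:
            # MAX(id) + 1 among the rows that currently share this prefix.
            value = max((i for p, i in self.rows if p == prefix), default=0) + 1
        else:
            # Always larger than any value generated before, even after deletes.
            self.high_water += 1
            value = self.high_water
        self.rows.append((prefix, value))
        return value

    def delete(self, prefix: str, value: int) -> None:
        self.rows.remove((prefix, value))


myisam_style = AutoIncTable(per_prefix=True)
table_wide = AutoIncTable(per_prefix=False)
print([myisam_style.insert(p) for p in "aaba"])  # [1, 2, 1, 3]
print([table_wide.insert(p) for p in "aaba"])    # [1, 2, 3, 4]

# Delete the largest id in group "a", then insert into group "a" again.
myisam_style.delete("a", 3)
table_wide.delete("a", 4)
print(myisam_style.insert("a"))  # 3: the deleted value is reused
print(table_wide.insert("a"))    # 5: values are never reused
```

With per-prefix numbering, ids are only unique within a group and can come back after a delete; a single table-wide counter gives every row a distinct, ever-increasing id.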
Summary: An alternate approach, offered in response to our original post, provides excellent improvements for smaller databases, but clustered indexes offer better performance as database size increases. (This posting is by Dave.)
Jay Pipes suggested an alternate approach to improving MySQL’s performance on Query 17 over a TPC-H-like database.
Add the index (l_partkey, l_quantity) [...]
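Here is a runnable sketch of Query 17 with the suggested index, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The schema keeps only the columns Query 17 touches, the rows are made-up toy data, and the index name idx_partkey_quantity is hypothetical. It shows the shape of the query and the index, not MySQL's query plan or performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (p_partkey INTEGER PRIMARY KEY, p_brand TEXT, p_container TEXT);
    CREATE TABLE lineitem (l_partkey INTEGER, l_quantity INTEGER, l_extendedprice REAL);
    -- The suggested composite index. The correlated subquery reads only
    -- l_partkey and l_quantity, so this index covers it.
    CREATE INDEX idx_partkey_quantity ON lineitem (l_partkey, l_quantity);
""")
conn.executemany("INSERT INTO part VALUES (?, ?, ?)", [
    (1, "Brand#23", "MED BOX"),
    (2, "Brand#12", "MED BOX"),
])
conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?)", [
    (1, 1, 700.0), (1, 10, 100.0), (1, 10, 100.0), (1, 19, 100.0),
    (2, 1, 999.0),
])

# TPC-H Query 17: revenue from small-quantity orders of one brand/container.
QUERY_17 = """
    SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
    FROM lineitem, part
    WHERE p_partkey = l_partkey
      AND p_brand = 'Brand#23'
      AND p_container = 'MED BOX'
      AND l_quantity < (SELECT 0.2 * AVG(l_quantity)
                        FROM lineitem
                        WHERE l_partkey = p_partkey)
"""
(avg_yearly,) = conn.execute(QUERY_17).fetchone()
print(avg_yearly)  # 100.0
```

Only the Brand#23 part qualifies; its average quantity is 10, so only the line with quantity 1 (below 0.2 × 10 = 2) is summed, giving 700 / 7 = 100.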
Executive Summary: A query like TPC-H Query 17 can be sped up by large factors by using straight_joins and clustering indexes. (This entry posted by Dave.)
In a previous post, we wrote about queries like TPC-H Query 2 and the use of straight_join to improve performance.
This week, we consider Query 17, described by the TPC-H [...]
In this post we’ll describe a query that accrued significant performance advantages from using a relatively long index key. (This posting is by Zardosht and Bradley.)
We ran across this query recently when interacting with a customer (who gave us permission to post this sanitized version of the story):
SELECT name,
[...]
Yesterday, I (Zardosht) posted an entry introducing clustering indexes. Here, I elaborate on three differences between a clustering index and a covering index:
Clustering indexes make it possible to create indexes that would otherwise run up against MySQL’s limits on the maximum key length and the maximum number of columns in an index.
Clustering indexes simplify syntax, making them easier [...]
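A small Python sketch can make the first difference concrete. It models a covering index as an explicit column list that must include every column a query reads, checked against MySQL's limit of 16 columns per index, and a clustering index as something that carries the full row. The function names and the 20-column table are hypothetical, and only the column-count limit is modeled, not the key-length limit.

```python
MAX_INDEX_COLUMNS = 16  # MySQL's cap on the number of columns in one index


def covering_index_columns(key: list[str], query_columns: list[str]) -> list[str]:
    """Column list for a covering index: the key, then every other column the query reads."""
    columns = key + [c for c in query_columns if c not in key]
    if len(columns) > MAX_INDEX_COLUMNS:
        raise ValueError(
            f"covering index needs {len(columns)} columns; MySQL allows {MAX_INDEX_COLUMNS}"
        )
    return columns


def clustering_index_covers(query_columns: list[str], table_columns: list[str]) -> bool:
    """A clustering index stores the full row, so it covers any query on the table."""
    return set(query_columns) <= set(table_columns)


table = [f"c{i}" for i in range(20)]  # hypothetical 20-column table
print(covering_index_columns(["c0"], ["c0", "c3", "c7"]))  # ['c0', 'c3', 'c7']
print(clustering_index_covers(table, table))               # True
try:
    covering_index_columns(["c0"], table)  # a SELECT * would need all 20 columns
except ValueError as err:
    print(err)
```

The covering index has to name every column and fails once a query reads more than 16 of them, while a clustering index declared on just the key still makes every column available.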
In this posting I’ll describe TokuDB’s multiple clustering index feature. (This posting is by Zardosht.)
In general (not just for TokuDB), a clustered index or a clustering index is an index that stores all of the data for the rows. Quoting the MySQL 5.1 reference manual:
Accessing a row through the clustered index [...]
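To illustrate why storing the whole row matters, here is a toy Python model with sorted lists standing in for B-trees. An ordinary secondary index stores only (key, primary key), so each match costs a second lookup into the row data, which is how InnoDB secondary indexes work; a clustering index stores the row alongside the key. The table contents and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: primary key -> full row (name, city).
TABLE = {
    1: ("alice", "Boston"),
    2: ("bob", "Calgary"),
    3: ("carol", "Boston"),
    4: ("dave", "Portland"),
}

# Ordinary secondary index on city: sorted (city, primary key) pairs only.
secondary = sorted((row[1], pk) for pk, row in TABLE.items())

# Clustering index on city: sorted (city, primary key, full row) entries.
clustering = sorted((row[1], pk, row) for pk, row in TABLE.items())


def _range(index: list, city: str) -> list:
    """All index entries whose first field equals city (a range scan)."""
    lo = bisect_left(index, (city,))
    hi = bisect_right(index, (city, float("inf")))
    return index[lo:hi]


def rows_by_city_secondary(city: str) -> tuple[list, int]:
    """Scan the secondary index, then fetch each row by primary key."""
    rows = [TABLE[pk] for _, pk in _range(secondary, city)]
    return rows, len(rows)  # one extra primary-key lookup per match


def rows_by_city_clustering(city: str) -> tuple[list, int]:
    """Scan the clustering index; each entry already holds the full row."""
    return [row for _, _, row in _range(clustering, city)], 0


print(rows_by_city_secondary("Boston"))   # ([('alice', 'Boston'), ('carol', 'Boston')], 2)
print(rows_by_city_clustering("Boston"))  # ([('alice', 'Boston'), ('carol', 'Boston')], 0)
```

Both paths return the same rows, but the secondary-index path needs one extra lookup per matching row. On disk those lookups are typically random I/Os, which is where a clustering index can save the most.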
