SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

PBMS in Drizzle | Ramblings

For those not familiar with PBMS, it does two things: it provides a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3), and it provides an HTTP interface to get and store BLOBs.

This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).

This is awesome. How many times have you added a URL to your database table and then had to write APIs of some sort to go fetch that URL at read time, and write that URL somewhat atomically at write time?

Drizzle isn’t even “done” yet, and already the plugins are flying out of the community. The fact that this is a plugin, and won’t affect *anybody* who doesn’t want it, is why I’m confident Drizzle is moving in the right direction. I’m not sure why it has taken so long, but this feels like it’s doing for the RDBMS what Apache has done for HTTP serving: make it flexible and extensible, and folks will find interesting ways to use it.


July 8, 2010 at 2:36 pm

TokyoTyrant – MemcacheDB, but without the BDB?

This past April I was riding in a late-model, two-door rental car with an interesting trio for sure. On my right sat Patrick Galbraith, maintainer of DBD::mysql and author of the Federated storage engine. Directly in front of me, manning the steering wheel, was David Axmark, co-founder of MySQL. (For those of you keen on spatial description, you may have noted at this point that I was most likely seated in the back left seat of a car designed to be driven on the right side of the road. EOUF [end of useless fact].) Immediately to his right sat Brian Aker, of (most recently) Drizzle fame.

This was one of those conversations that I felt grossly unprepared for. It was the 2009 MySQL User’s conference, and Patrick and I had been hacking on DBD::drizzle for most of the day. We had it 98% of the way there and were in need of food, so we were joining the Drizzle dev team for gourmet pizza.

As we navigated from the Santa Clara conference center to Mountain View’s quaint downtown, Brian, Patrick, and I were discussing memcached stuff. I mentioned my idea, and subsequent implementation, of the Mogile+Memcached method for storing data more reliably in memcached. I knew in my head why we had chosen to read from all of the replica servers, not just the first one that worked, but in the moment I forgot. (The reason, btw, is that if one of the servers had missed a write for some reason, you might get out-of-date data.) I guess I was a little overwhelmed by Brian’s mountain of experience w/ memcached.
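To make the "read from all replicas" reasoning concrete, here is a minimal sketch in Python. It is not the actual Mogile+Memcached code: the replicas are plain dictionaries standing in for memcached servers, and the version number stored next to each value is an assumption added purely so the freshest copy can be identified. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical stand-in for a memcached server: key -> (version, value).
# The version stamp is an assumption of this sketch, not part of the original method.
Replica = Dict[str, Tuple[int, str]]


def write_all(replicas: List[Replica], key: str, version: int, value: str,
              skip: FrozenSet[int] = frozenset()) -> None:
    """Write to every replica except the indexes in `skip` (simulates a missed write)."""
    for index, replica in enumerate(replicas):
        if index not in skip:
            replica[key] = (version, value)


def read_first(replicas: List[Replica], key: str) -> Optional[str]:
    """Return the value from the first replica that answers, fresh or not."""
    for replica in replicas:
        if key in replica:
            return replica[key][1]
    return None


def read_all(replicas: List[Replica], key: str) -> Optional[str]:
    """Ask every replica and keep the copy with the highest version."""
    copies = [replica[key] for replica in replicas if key in replica]
    if not copies:
        return None
    return max(copies, key=lambda copy: copy[0])[1]


def demo() -> Tuple[Optional[str], Optional[str]]:
    replicas: List[Replica] = [{}, {}, {}]
    write_all(replicas, "user:1", 1, "old")
    # The first replica misses the second write.
    write_all(replicas, "user:1", 2, "new", skip=frozenset({0}))
    return read_first(replicas, "user:1"), read_all(replicas, "user:1")
```

In `demo()`, the first replica never sees the second write, so `read_first` hands back the stale copy while `read_all` finds the newer one. The cost is one request per replica on every read.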

Anyway, the next thing I mentioned was that we had also tried MemcacheDB with some success. Brian wasn’t exactly impressed with MemcacheDB, and immediately suggested that we should be using Tokyo Tyrant instead. I had heard of Tokyo Cabinet, the new hotness in local key/value storage and retrieval, but what is this Tyrant you speak of?

I’ve been playing with Tokyo Tyrant ever since, and advocating for its usage at Adicio. It’s pretty impressive. In addition to speaking the memcached protocol, it apparently speaks HTTP/WebDAV too. The ability to select hash, btree, and a host of other options is nice, though I’m sure some of these are available as obscure options to Berkeley DB as well.
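Speaking the memcached protocol means any memcached client can talk to the server unchanged. As a rough illustration, the sketch below builds and parses the two most basic messages of the memcached text protocol (`set` and a single-key `get`). The function names are made up for this example, no network code is included, and nothing here assumes how a particular server treats the flags or expiry fields.

```python
from typing import Optional


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a text-protocol 'set' request: header line, raw data, trailing CRLF."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key: str) -> bytes:
    """Build a text-protocol 'get' request for a single key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get(response: bytes) -> Optional[bytes]:
    """Parse the reply to a single-key 'get'. Returns the value, or None on a miss."""
    if response == b"END\r\n":
        return None  # a miss is just the terminator with no VALUE line
    header, _, rest = response.partition(b"\r\n")
    parts = header.split()
    # Expected header: VALUE <key> <flags> <bytes>
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    length = int(parts[3])
    return rest[:length]
```

A client would send the bytes from `build_set` or `build_get` over a TCP connection to wherever the server listens, and feed the reply to `parse_get`.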

Anyway, I was curious what performance was like, so I did some tests on my little Xen instance, and came up with pretty graphs.

[Graph: tokyotyrantvsmemcachedb1]

I used the excellent Brutis tool to run these benchmarks using the most interesting platform for me at the moment, which would be PHP with the PECL Memcache module.

These numbers were specifically focused on usage that is typical of MemcacheDB: a wide range of keys (in this case, 10000 is “wide” since the testing system is very small), not-small items (2k or so), and a low write:read ratio (1:50). I had the tests restart each daemon after each run, and these numbers are the average of 3 runs of each test.
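For anyone who wants to approximate that workload without Brutis, here is a small sketch that generates a comparable operation stream. It is not how Brutis works internally; the constants simply mirror the numbers above (10000 keys, roughly 2k items, one write per 50 reads), and the names are invented for this example.

```python
import random
from typing import List, Optional, Tuple

KEY_COUNT = 10_000       # "wide" key range for a very small test system
ITEM_SIZE = 2048         # not-small items, 2k or so
READS_PER_WRITE = 50     # write:read ratio of 1:50

Operation = Tuple[str, str, Optional[bytes]]


def make_ops(count: int, seed: int = 0) -> List[Operation]:
    """Return `count` operations as (verb, key, payload) tuples."""
    rng = random.Random(seed)
    ops: List[Operation] = []
    for _ in range(count):
        key = f"key{rng.randrange(KEY_COUNT)}"
        # One slot out of every 51 is a write, which gives 1 write per 50 reads.
        if rng.randrange(READS_PER_WRITE + 1) == 0:
            ops.append(("set", key, b"x" * ITEM_SIZE))
        else:
            ops.append(("get", key, None))
    return ops
```

Feeding the resulting list to any memcached-protocol client gives a workload of the same general shape, though timings will not be comparable to the Brutis runs.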

I also tried these from another Xen instance on the same LAN, and things got a lot slower. Not really sure why, as latency is in the sub-millisecond range, but maybe Xen’s networking just isn’t very fast. Either way, the numbers for each combination didn’t change much.

What I find interesting is that memcachedb in no-sync mode actually went faster than memcached. Of course, in no-sync mode, memcachedb is just throwing data at the disk. It doesn’t have to maintain LRU or slabs or anything.

Tokyo Tyrant was very consistent, and used *very* little RAM in all instances. I do recall reading that it compresses data. Maybe that’s a default? Anyway, Tokyo Tyrant was also the most CPU hungry of the bunch, so I have to assume having more cores might have resulted in much better results.

I’d like to get together a set of 3 or 4 machines to test multiple client threads, and replication as well. Will post that as part 2 when I pull it together. For now, it looks like.

In case anybody wants to repeat these tests, I’ve included the results, and the scripts used to generate them in this tarball.

– Additional info, 6/4/2009
Another graph that some might find interesting is this one detailing CPU usage. During all the tests, brutis used about 60% of the CPU available on the machine, so 40% is really 100%:

[Graph: tokyotyranttests_cpu]

This tells me that the CPU was the limiting factor for Tokyo Tyrant, and with a multi-core machine, we should see huge speed improvements. Stay tuned for those tests!
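The "40% is really 100%" adjustment can be written out as a tiny calculation. This sketch assumes the daemon and the benchmark client share one CPU and that the client's share stays at roughly 60% throughout; the function name is made up for illustration.

```python
def share_of_available_cpu(measured_pct: float, client_pct: float = 60.0) -> float:
    """Rescale a daemon's CPU reading to the CPU left over after the benchmark client."""
    available = 100.0 - client_pct  # what the daemon could use at most
    return 100.0 * measured_pct / available
```

With the default client share, a daemon reading of 40% maps to 100% of what was actually available to it, and a reading of 20% maps to 50%.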


June 4, 2009 at 6:40 am

Parallel MySQL replication?

It’s always been a dream of mine. I’ve posted about parallel replication on Drizzle’s mailing list before. I think when faced with the problem of a big, highly concurrent master, and scaling out reads simply with lower-cost slaves, this is going to be the only way to go.

So today I was really glad to see that somebody is trying out the idea. Seppo Jaakola from “Codership”, who I’ve never heard of before today, posted a link to an article on his blog about his experimentation with parallel replication slaves. The findings are pretty interesting.

I hope that he’ll be able to repeat his tests with a real-world setup. The software they’ve written seems to have the right idea. The biggest issue I have with the tests is that they were run on tiny hardware. Hyperthreading? Single disks? That’s not really the point of having parallel replication slaves.

The idea is that you have maybe a gigantic real time write server for OLTP. This beast may have lots of medium-power CPU cores, and an obscene amount of RAM, and a lot of battery backed write cache for writes.

Now you know that there are tons of reads that shouldn’t ever be done against this server. You drop a few replication slaves in, and you realize that you need a box with as much disk storage as your central server, and probably just as much write cache. Pretty soon scaling out those reads is just not very cost effective.

However, if you could have lots of CPU cores, and lots of cheap disks, you could dispatch these writes to be done in parallel, and you wouldn’t need expensive disk systems or lots of RAM for each slave.

So, the idea is not to make slaves faster in a 1:1 size comparison. It’s to make it easier for a cheap slave to keep up with a very busy, very expensive master.

I do see where another huge limiting factor is making sure things synchronize in commit order. I think that’s an area where a lot of time needs to be spent on optimization. The order should already be known, so that the committer thread is just waiting for the next one in line, and if the next 100 are already done it can just rip through them quickly rather than signaling each one that it can go. Something like this seems right:


id = first_commit_id();
/* wait_for_commit() blocks only if transaction `id` is not finished yet */
while (wait_for_commit(id)) {
    commit(id);
    id++;
}
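For the curious, here is a runnable sketch of that loop in Python. It is only an illustration of the idea, not Codership's implementation: "applying" and "committing" a transaction are reduced to bookkeeping, and the class and function names are invented for this example. Workers finish in arbitrary order, and a single committer commits strictly by id, sleeping only when the next id in line is not done yet.

```python
import random
import threading
from typing import List, Set


class OrderedCommitter:
    """Commits transaction ids strictly in order, whatever order workers finish in."""

    def __init__(self, first_id: int) -> None:
        self._next_id = first_id
        self._ready: Set[int] = set()
        self._cond = threading.Condition()
        self.committed: List[int] = []

    def mark_applied(self, txn_id: int) -> None:
        """Called by a worker once it has finished applying txn_id."""
        with self._cond:
            self._ready.add(txn_id)
            # Only wake the committer if this is the id it is waiting for.
            if txn_id == self._next_id:
                self._cond.notify()

    def run(self, last_id: int) -> None:
        """Commit every id up to last_id, in order."""
        with self._cond:
            while self._next_id <= last_id:
                while self._next_id not in self._ready:
                    self._cond.wait()
                # Ids that are already done are committed without any waiting.
                self._ready.remove(self._next_id)
                self.committed.append(self._next_id)  # stand-in for a real commit
                self._next_id += 1


def apply_in_parallel(count: int) -> List[int]:
    """Apply `count` transactions on separate threads in shuffled order."""
    committer = OrderedCommitter(first_id=1)
    ids = list(range(1, count + 1))
    random.shuffle(ids)
    commit_thread = threading.Thread(target=committer.run, args=(count,))
    workers = [threading.Thread(target=committer.mark_applied, args=(i,)) for i in ids]
    commit_thread.start()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    commit_thread.join()
    return committer.committed
```

Calling `apply_in_parallel(20)` returns the ids 1 through 20 in order no matter how the workers were scheduled. The workers never wait on each other; only the committer waits, and only for the single id it needs next.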

I applaud the efforts of Codership, and I hope they’ll continue the project and maybe ship something that will rock all our worlds.


June 2, 2009 at 7:08 pm