SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

Time for some ghetto monitoring

If you came here between April 28 and about an hour ago, you got a “couldn’t connect to database” error. Oops! Seems my limited memory EC2 instance got a little overwhelmed by php processes and decided the db server, drizzled, should die to make more room for PHP. Ooops! Time to drop pm.max_children.
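For a rough idea of how far pm.max_children needs to drop, a back-of-the-envelope sizing might look like the sketch below. The 692MB total is this instance's RAM; the amount reserved for drizzled and the OS, and the per-worker PHP footprint, are made-up placeholder numbers, not measurements, and the function name is hypothetical.

```python
def max_children(total_mb, reserved_mb, per_worker_mb):
    """Largest number of PHP workers that fits in the RAM left over
    after reserving memory for the database and the OS."""
    available = total_mb - reserved_mb
    if available <= 0 or per_worker_mb <= 0:
        return 0
    return available // per_worker_mb


# 692MB total; 300MB reserved and 40MB per worker are hypothetical values.
print(max_children(692, 300, 40))  # prints 9
```

Measure the real per-process resident size before trusting any number that comes out of this.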

I don’t have any monitoring set up for the site, so I only just now figured it out. Until I get proper monitoring, I’ve installed this fancy bit of duct-tape upstart magic:

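# Run once whenever any upstart job emits the "stopping" event
start on stopping
task
script
    # upstart sets $JOB to the name of the job that is stopping
    env | mail -s "$JOB is stopping!" me@myemail.com
end script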

What does this do? Well, it emails me whenever upstart gives up respawning something, or I manually stop a service.

It’s not monitoring. I need monitoring. But this is a nice little hack to prevent a regression while I figure that out.

May 2, 2011 at 4:54 pm Comments (0)

The 2011 O’Reilly Open Mysql Drizzle Maria Monty Percona Xtra Galera Xeround Tungsten Cloud Database Conference and Expo

Or, for short, the “2011 O’Reilly MySQL Users Conference & Expo”. Yes, that’s the short name of the conference that, thus far, has brought me nothing but good info, good times, and insight into one of the most interesting open source communities around.

MySQL has been at the core of a real revolution in the way data driven applications have exploded on the internet. It’s so easy to just install it, fire up PHP’s mysql driver, and boom, you’re saving and retrieving data. The *use* of MySQL has always been incredibly simple.

The politics has, at times, been confusing. Dual licensing was sort of an odd concept when MySQL AB was doing it “back in the day”. Nobody really understood how it worked or how they could sell something that was also “free”. But it worked out great for them. InnoDB got bought by Oracle and a lot of people thought “oh noes MySQL will have no transactional storage, Oracle will kill it.” Well, that’s about 180 degrees from what actually happened (R.I.P. Falcon).

So this year, with the oddness of Oracle not being the top sponsor at an event that had driven a lot of the innovation and collaboration in the MySQL world (ironically, choosing instead to spend their time and effort on a conference called “Collaborate”), I thought “wonderful, more politics”.

But as Brian Aker says in his “State of the ecosystem” post, it was quite the opposite. The absence of the commercial entity responsible for MySQL took a lot of the purely business focused discussion down to almost a whisper, while big ideas and big thinking seemed to be extremely prominent.

Drizzle had quite a few sessions, including my own about what we’ve done with Drizzle in Ubuntu. This is particularly interesting to me because Drizzle is mostly driven by a community effort, though most of the heavy lifting work up until now has been sponsored by Sun then Rackspace. It’s purely an idea of how a MySQL-like database should be written, and while it may be seeing limited production use now, the discussions were on how it can be used, what it does now, not where it’s going or who is going to pay for its development. It’s such a good idea, I’m pretty convinced users will drive it in much the same way Apache was driven by users wanting to do interesting things with HTTP.

I saw a lot of interesting ideas around replication put forth as well. Galera, Tungsten, and Xeround all seem to be trying to build on MySQL’s success with replication and NDB (a.k.a. MySQL Cluster). I really like that there are multiple takes on how to make a multi-master highly available / scalable system work. Getting all the people using and developing these things into one conference center is always pretty interesting to me.

The keynotes were especially interesting, as they were delivered by people who are sitting at the intersection of the old MySQL world and the new MySQL “ecosystem”. I missed Monty Widenius’s keynote but it strikes me that he is still leading the charge for a simple, scalable, powerful database system, proving that the core of MySQL is mostly unchanged. Mårten Mickos delivered a really interesting take on how MySQL was part of the last revolution in computing (LAMP) and how it may very well be a big part of the next revolution (IaaS, aka “the cloud”). Brian Aker reinforced that MySQL as a concept, and specifically, Drizzle, are just part of your Infrastructure (the I in IaaS).

Then on Thursday, Baron Schwartz blew the whole place up. Go, watch the video if you weren’t there, or haven’t seen it. Baron has always been insightful in his evaluation of the MySQL ecosystem. Maatkit came around when the community needed it, and on joining Percona I think he brought his clear thinking to Petr’s bold decision making at just the right time to help fuel their rise as one of the most respected consulting firms in the “WebScale” world. So when Baron got up and said that the database is still going to scale up, that MySQL isn’t going to lose to NoSQL or SomeSQL, but rather, that the infrastructure would adapt to the data requirements, it caught my attention, and got me nodding. And when he plainly called Oracle out for not supporting the conference, there was a hush over the crowd followed by a big sigh. It’s likely that those in attendance were the ones who understand that, and those who weren’t there were probably the ones who need to hear it. I’d guess by now they’ve seen the video or at least heard the call. Either way, thanks Baron for your insight and powerful thoughts.

This was my second MySQL Conference, and I hope it won’t be my last. The mix of users, developers, and business professionals has always struck me as quite unique, as MySQL sits at the intersection of a number of very powerful avenues. Let’s hope that O’Reilly decides to do it again, *and* let’s hope that Oracle gets on board as well.

April 27, 2011 at 5:49 pm Comments (0)

HTTP JSON AlsoSQL interface to Drizzle | Stewart Smith

HTTP JSON AlsoSQL interface to Drizzle | Ramblings. – This is what I’m talking about when I say Drizzle will be for MySQL what Apache was for HTTP. It’s hyper flexible and quite performant. Stewart is quite a gifted programmer, but look how easy it was to integrate a JSON library and libevent into the server on a whim.

As a sysadmin with LAMP shops, I always had to stop innovating around the MySQL part of it. Linux I could hack on, Apache I could hack on, and PHP/Perl/Python were built to be hacked on. But MySQL was always difficult beyond a few clever UDFs.

I’m waiting for someone to adopt Drizzle and really start running wild with the plugins. Should be interesting!

April 21, 2011 at 3:08 pm Comments (0)

Ubuntu and Drizzle — Run Drizzle on your Narwhal: OReilly MySQL Conference & Expo 2011 – OReilly Conferences, April 11 – 14, 2011, Santa Clara, CA

Ubuntu and Drizzle — Run Drizzle on your Narwhal: OReilly MySQL Conference & Expo 2011 – OReilly Conferences, April 11 – 14, 2011, Santa Clara, CA.

I gave a talk this week in Santa Clara at the MySQL Users Conference. I think it went pretty well and I got a lot of feedback from Ubuntu users about the positives of having Drizzle available in Universe. The slides are available at the link above.

April 15, 2011 at 9:36 pm Comments (0)

presenting “blog on a narwhal”

Since we’re just about at 11.04 beta2, I figured it’s high time I start using Ubuntu Server for my personal blog.

What? Almost a year at Canonical and my blog wasn’t on Ubuntu server? Well, for over 5 years now, a personal friend has provided me with a free Xen virtual machine to run my blog on. I migrated it off of Debian then, which was sad for me, but back then I was so focused on working I didn’t have time or resources to be picky, so I said OK.

Fast forward to now, I’ve been working on Ubuntu Server and getting ribbed by my co-workers about that “crappy CentOS xen box” they’d see me logged into.

Well, that’s all over now. I decided to marry all the new tech I’ve been playing with lately into one glorious blog migration.

The old blog was:

  • Xen domU
  • 500MB RAM allocated
  • 9GB storage
  • CentOS 5.5
  • Apache + mod_php (5.3.5 from IUS project)
  • MySQL 5.0.77
  • WordPress 3.0.5 manually installed single-site

The new hotness is:

  • EC2 t1.micro (it’s upgradable! ;)
  • 692MB RAM
  • 8GB EBS
  • nginx + php5-fpm (5.3.5 from natty)
  • Drizzle 2011.03.13 (wordpress-plugin 0.0.2)
  • WordPress 3.0.5 from natty in multisite mode

The steps to migrate weren’t altogether complicated. A bit of configuration for nginx to have it serve my PHP using php5-fpm, and copying most of wp-content over. Drizzle couldn’t have been more straightforward:

  • Install drizzle7-client from EPEL on CentOS vm
  • drizzledump blog database (drizzledump automatically converts mysql schemas to drizzle compatible ones)
  • load it into drizzle on Ubuntu server

WordPress still needs *some* help to use Drizzle. Much of this will be handled by the wordpress-drizzle package from my ppa (add-apt-repository ppa:clint-fewbar/drizzle), which filters DDL to change things like LONGTEXT to TEXT. Because Drizzle has done away with the eeeeevil of datetimes with 0000-00-00 as their date (a non-existent date), we need to change all instances of that to '0001-01-01'. In the future I’d like to see this abstracted out of WordPress even more so that it is more aware of the datetime fields and can use actual NULL values. I believe this can be done in the plugin by overloading the insert/update methods. I’ve begun working on that, but for now I’ll just have to keep patching wp-includes/post.php, which seems to be the main user of 0000-00-00 to denote a “draft” post.
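To make the two rewrites concrete, here is a minimal sketch of that kind of filter. It is written in Python purely for illustration; it is not the code in the wordpress-drizzle package, and the function name filter_for_drizzle is hypothetical.

```python
import re

# Matches MySQL's "zero date", with or without a trailing zero time.
ZERO_DATE = re.compile(r"0000-00-00( 00:00:00)?")


def filter_for_drizzle(sql):
    """Rewrite two MySQL-isms before a statement is sent to Drizzle.
    Illustrative only; the function name is an assumption."""
    # LONGTEXT becomes plain TEXT.
    sql = re.sub(r"\bLONGTEXT\b", "TEXT", sql, flags=re.IGNORECASE)
    # The zero date becomes the earliest real date, keeping any time part.
    sql = ZERO_DATE.sub(lambda m: "0001-01-01" + (m.group(1) or ""), sql)
    return sql


print(filter_for_drizzle("CREATE TABLE t (body longtext)"))
print(filter_for_drizzle("UPDATE wp_posts SET post_date_gmt = '0000-00-00 00:00:00'"))
```

A blind text substitution like this would also rewrite a post whose body happens to contain the string 0000-00-00, which is one more reason to prefer real NULL handling in the insert/update methods.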

We also have to alter the wp_posts table slightly. That’s because WordPress relies on MySQL’s broken “NOT NULL” producing an “empty string” in varchars. This ALTER gives the column an explicit empty-string default instead:

ALTER TABLE wp_posts MODIFY COLUMN post_mime_type VARCHAR(100) COLLATE utf8_general_ci DEFAULT '';

Anyway, goodbye CentOS, hello Ubuntu!

April 13, 2011 at 7:26 am Comments (0)

Drizzle, Maverick, PPA’s, and you

So, this week, Drizzle released its beta, which is really exciting. But at the same time, I decided to ask the Ubuntu MOTU to pull it out of Ubuntu 10.10 (a.k.a. maverick) entirely. The reasons may not be entirely obvious.

  • Licensing: There is some ambiguity on the licensing of certain non-critical source code in Drizzle that we weren’t certain Debian archive admins would accept. Since we like to follow Debian as closely as possible in Ubuntu, the MOTU sponsor we had was requesting that we upload into Debian first. Upon review of earlier packages, the Debian archive maintainers pointed out some ambiguities in the copyright documentation, and it turns out, there are some ambiguities still in the source code. These things take time to sort out, though I’m confident we’ve figured most, if not all, of them out.
  • Beta status: Drizzle released their first beta just yesterday. This is great, and would be a good release to have in Maverick, but it’s going to change *a lot* before the “elliot” milestone is released in early 2011. Monty Taylor assures me that they’re going to be ready to release before feature freeze of Natty (11.04). Until then, they’re going to be fixing bugs in the betas and releasing those fixes. In the face of that, it’s probably better to point people at a PPA that will have the latest bug fix release in it, and tools included to help debug/fix the release as well.
  • Quality: Even though it’s clear that Drizzle is beta to those following Drizzle closely, it may not be entirely clear to everyone. Beta versions make it into Ubuntu all the time, but being a database engine, I’m hesitant to have the casual user try it out. In 6 months, Drizzle will be at a stable release stage, and all users should be feeling pretty good about running on it. That seems like the right time to put it into Debian and Ubuntu.

So, what should you do if you want to run Drizzle on Maverick?

There are two package archives maintained by the Drizzle developers just for Ubuntu.

The PPA for drizzle development – This should have the latest stable release, and all of the build-depends to rebuild it.
The Drizzle Trunk PPA – This should have the latest daily build of drizzle from the source code repository, which may have fixes made since the last stable release.

Those links include instructions for adding the PPAs to your system. After that, just run:


apt-get install drizzle-server drizzle-client

And have fun!

Also, we’ll be discussing drizzle sometime at UDS-N in Orlando. So make sure to check the schedule out and join us (remotely or on site) if you want to chime in or hear what we’re going to do with the Narwhal and Drizzle.


September 30, 2010 at 7:12 pm Comments (0)

Drizzle7 Beta Released! now with MySQL migration! « LinuxJedis /dev/null

Drizzle7 Beta Released! now with MySQL migration! « LinuxJedis /dev/null.

Drizzle is a project that is near and dear to my heart.

To sum it up, Drizzle took all that was really good in MySQL, cut out all that was mediocre, and replaced some of it with really good stuff. The end product is, I think, something that is leaner, should be more stable, and definitely more flexible.

So go check out the beta! I guess I should use Andrew’s migration tool and see if I can migrate this blog to drizzle. :)


September 29, 2010 at 10:39 pm Comments (0)

PBMS in Drizzle | Ramblings

PBMS in Drizzle | Ramblings.

For those not familiar with PBMS it does two things: provide a place (not in the table) for BLOBs to be stored (locally on disk or even out to S3) and provide a HTTP interface to get and store BLOBs.

This means you can do really neat things such as have your BLOBs replicated, consistent and all those nice databasey things as well as easily access them in a scalable way (everybody knows how to cache HTTP).

This is awesome. How many times have you added a URL to your database table and then had to write APIs of some sort to go fetch that URL at read time, and write that URL somewhat atomically at write time?

Drizzle isn’t even “done” yet, and already the plugins are flying out of the community. The fact that this is a plugin, and won’t affect *anybody* who doesn’t want it, is why I’m confident Drizzle is moving in the right direction. I’m not sure why it has taken so long, but this feels like it’s doing for the RDBMS what Apache has done for HTTP serving… make it flexible and extensible, and folks will find interesting ways to use it.


July 8, 2010 at 2:36 pm Comments (0)

TokyoTyrant – MemcacheDB, but without the BDB?

This past April I was riding in a late model, 2 door rental car with an interesting trio for sure. On my right sat Patrick Galbraith, maintainer of DBD::mysql and author of the Federated storage engine. Directly in front of me manning the steering wheel (for those of you keen on spatial description, you may have noted at this point that it’s most likely I was seated in the back, left seat of a car which is designed to be driven on the right side of the road. EOUF [end of useless fact]), David Axmark, co-founder of MySQL. Immediately to his right sat Brian Aker, of (most recently) Drizzle fame.

This was one of those conversations that I felt grossly unprepared for. It was the 2009 MySQL User’s conference, and Patrick and I had been hacking on DBD::drizzle for most of the day. We had it 98% of the way there and were in need of food, so we were joining the Drizzle dev team for gourmet pizza.

As we navigated from the Santa Clara conference center to Mountain View’s quaint downtown, Brian, Patrick, and I were discussing memcached stuff. I mentioned my idea, and subsequent implementation of the Mogile+Memcached method for storing data more reliably in memcached. I knew in my head why we had chosen to read from all of the replica servers, not just the first one that worked, but I forgot (The reason, btw, is that if one of the servers had missed a write for some reason, you might get out-of-date data). I guess I was a little overwhelmed by Brian’s mountain of experience w/ memcached.
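The read-from-every-replica idea can be sketched roughly as follows. This is not the Mogile+Memcached implementation: the version counter stored next to each value is an assumption made here so the sketch has some way to tell fresh data from stale, and plain dicts stand in for memcached servers.

```python
def read_all_replicas(replicas, key):
    """Ask every replica for the key and return the freshest value.

    Each replica is a dict mapping key -> (version, value). The version
    counter is a hypothetical detail, not something from the original design.
    """
    best = None
    for replica in replicas:
        entry = replica.get(key)  # None if this replica missed the write
        if entry is not None and (best is None or entry[0] > best[0]):
            best = entry
    return None if best is None else best[1]


replicas = [
    {"user:1": (1, "old name")},  # missed the second write
    {"user:1": (2, "new name")},
    {},                           # came up empty
]
print(read_all_replicas(replicas, "user:1"))  # prints new name
```

Stopping at the first replica that answers would have returned the stale value here, which is the failure the paragraph above describes.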

Anyway, the next thing I mentioned was that we had also tried MemcacheDB with some success. Brian wasn’t exactly impressed with MemcacheDB, and immediately suggested that we should be using Tokyo Tyrant instead. I had heard of Tokyo Cabinet, the new hotness in local key/value storage and retrieval, but what is this Tyrant you speak of?

I’ve been playing with Tokyo Tyrant ever since, and advocating for its usage at Adicio. It’s pretty impressive. In addition to speaking memcached protocol, it apparently speaks HTTP/WEBDAV too. The ability to select hash, btree, and a host of other options is nice, though I’m sure some of these are available as obscure options to berkeleydb as well.

Anyway, I was curious what performance was like, so I did some tests on my little Xen instance, and came up with pretty graphs.

[Graph: Tokyo Tyrant vs. MemcacheDB benchmark results]

I used the excellent Brutis tool to run these benchmarks using the most interesting platform for me at the moment, which would be PHP with the pecl Memcache module.

These numbers were specifically focused on usage that is typical of MemcacheDB: a wide range of keys (in this case, 10000 is “wide” since the testing system is very small), not-small items (2k or so), and a low write:read ratio (1:50). I had the tests restart each daemon after each run, and these numbers are the average of 3 runs of each test.

I also tried these from another Xen instance on the same LAN, and things got a lot slower. Not really sure why, as latency is in the sub-millisecond range, but maybe Xen’s networking just isn’t very fast. Either way, the numbers for each combination didn’t change much.

What I find interesting is that memcachedb in no-sync mode actually went faster than memcached. Of course, in nosync mode, memcachedb is just throwing data at the disk. It doesn’t have to maintain LRU or slabs or anything.
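For anyone who wants to picture the traffic mix, a workload with those proportions could be generated like this. It is only a sketch of the shape of the load, not how Brutis works internally; the function name, the key format, and the exact 2048-byte item size are assumptions.

```python
import random


def workload(num_ops, num_keys=10000, value_size=2048, reads_per_write=50, seed=0):
    """Yield (op, key, value) tuples with roughly one write per 50 reads."""
    rng = random.Random(seed)
    for _ in range(num_ops):
        key = "key:%d" % rng.randrange(num_keys)
        # One chance in (reads_per_write + 1) of being a write.
        if rng.randrange(reads_per_write + 1) == 0:
            yield ("set", key, "x" * value_size)
        else:
            yield ("get", key, None)


ops = list(workload(51000))
writes = sum(1 for op, _, _ in ops if op == "set")
print(writes, len(ops) - writes)
```

Each tuple would then be replayed against memcached, MemcacheDB, or Tokyo Tyrant through whatever client library is being tested.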

Tokyo Tyrant was very consistent, and used *very* little RAM in all instances. I do recall reading that it compresses data. Maybe that’s a default? Anyway, Tokyo Tyrant also was the most CPU hungry of the bunch, so I have to assume having more cores might have resulted in much better results.

I’d like to get together a set of 3 or 4 machines to test multiple client threads, and replication as well. Will post that as part 2 when I pull it together. For now, it looks like.

In case anybody wants to repeat these tests, I’ve included the results, and the scripts used to generate them in this tarball.

– Additional info, 6/4/2009
Another graph that some might find interesting, is this one detailing CPU usage. During all the tests, brutis used about 60% of the CPU available on the machine, so 40% is really 100%:

[Graph: CPU usage during the Tokyo Tyrant tests]

This tells me that the CPU was the limiting factor for Tokyo Tyrant, and with a multi-core machine, we should see huge speed improvements. Stay tuned for those tests!


June 4, 2009 at 6:40 am Comments (0)

Parallel mysql replication?

It’s always been a dream of mine. I’ve posted about parallel replication on Drizzle’s mailing list before. I think when faced with the problem of a big, highly concurrent master, and scaling out reads simply with lower cost slaves, this is going to be the only way to go.

So today I was really glad to see that somebody is trying out the idea. Seppo Jaakola from “Codership”, who I’ve never heard of before today, posted a link to an article on his blog about his experimentation with parallel replication slaves. The findings are pretty interesting.

I hope that he’ll be able to repeat his tests with a real world setup. The software they’ve written seems to have the right idea. The biggest issue I have with the tests is that they were run on tiny hardware. Hyperthreading? Single disks? That’s not really the point of having parallel replication slaves.

The idea is that you have maybe a gigantic real time write server for OLTP. This beast may have lots of medium-power CPU cores, and an obscene amount of RAM, and a lot of battery backed write cache for writes.

Now you know that there are tons of reads that shouldn’t ever be done against this server. You drop a few replication slaves in, and you realize that you need a box with as much disk storage as your central server, and probably just as much write cache. Pretty soon scaling out those reads is just not very cost effective.

However, if you could have lots of CPU cores, and lots of cheap disks, you could dispatch these writes to be done in parallel, and you wouldn’t need expensive disk systems or lots of RAM for each slave.

So, the idea is not to make slaves faster in a 1:1 size comparison. It’s to make it easier for a cheap slave to keep up with a very busy, very expensive master.

I do see where another huge limiting factor is making sure things synchronize in commit order. I think that’s an area where a lot of time needs to be spent on optimization. The order should already be known so that the committer thread is just waiting for the next one in line, and if the next 100 are already done it can just rip through them quickly, not signal them that they can go. Something like this seems right:


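/* committer thread: commit strictly in the order the master did */
id = first_commit_id();
while (wait_for_commit(id)) {  /* returns at once if id is already done */
    commit(id);
    id++;
}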
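
As a slightly fuller illustration of that loop, here is a minimal Python sketch in which worker threads finish transactions in arbitrary order while a single committer drains them strictly by id. The class and method names are hypothetical, and appending to a list stands in for the real commit; this is not Codership's implementation.

```python
import random
import threading


class OrderedCommitter:
    """Workers report finished transactions; commits happen in id order."""

    def __init__(self, first_id):
        self._next_id = first_id
        self._ready = set()
        self._cond = threading.Condition()
        self.committed = []

    def mark_ready(self, txn_id):
        # Called by a worker once its transaction has been applied.
        with self._cond:
            self._ready.add(txn_id)
            self._cond.notify()

    def run(self, last_id):
        with self._cond:
            while self._next_id <= last_id:
                # Sleep only when the next id in line is not done yet.
                self._cond.wait_for(lambda: self._next_id in self._ready)
                # Rip through every consecutive finished id without waiting.
                while self._next_id in self._ready:
                    self._ready.remove(self._next_id)
                    self.committed.append(self._next_id)  # stand-in for commit(id)
                    self._next_id += 1


def demo(count=20):
    committer = OrderedCommitter(first_id=1)
    ids = list(range(1, count + 1))
    random.Random(0).shuffle(ids)  # workers finish out of order
    workers = [threading.Thread(target=committer.mark_ready, args=(i,)) for i in ids]
    for w in workers:
        w.start()
    committer.run(last_id=count)
    for w in workers:
        w.join()
    return committer.committed
```

Calling demo() returns the ids in ascending order regardless of the order the workers finished in. The committer never signals the workers; it only waits when the next transaction in line is missing.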

I applaud the efforts of Codership, and I hope they’ll continue the project and maybe ship something that will rock all our worlds.


June 2, 2009 at 7:08 pm Comments (0)