SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

Seth’s Blog: Validation is overrated

Seth Godin: Seth’s Blog: Validation is overrated.

“If you’re waiting for a boss or an editor or a college to tell you that you do good work, you’re handing over too much power to someone who doesn’t care nearly as much as you do.”

Just a bit of reminder that while feedback is great, getting it done is way better.


June 28, 2010 at 3:29 pm Comments (0)

Fast Moving Software Ignite talk at DevOps Day US 2010

Here is the PDF version of the Ignite format talk I gave at DevOps Day US 2010. Hopefully they’ll have the video of the ignite talks up soon.


June 26, 2010 at 6:31 am Comments (0)

Where did those numbers come from?


Did you ever hear a claim that sounded too bad to be true?
So this past Tuesday at Velocity 2010, Brett Piatt gave a workshop on the Cassandra database. I was seated in the audience and quite interested in everything he had to say about running Cassandra, given that I’ve been working on adding Cassandra and other scalable data stores to Ubuntu.

Then at one point, up popped a table that made me curious.

It looked a lot like this:

With a 50GB table:

            MySQL    Cassandra
writes      300ms    0.19ms
reads       250ms    1.6ms

Actually, it looked exactly like that, because it was copied from this page, which as of this writing is only available in its original form in Google’s cache.

The page linked has *no* explanation of this table. It’s basically just “OH DAAAAMN MySQL you got pwned”. But seriously, WTF?

I asked Brett where those numbers came from, and whether we could run the tests ourselves to compare our write performance to Cassandra’s. (By “our” I don’t mean MySQL’s, as that phrasing implies, but rather our measurements versus the Cassandra team’s.) Brett claimed ignorance and just referred me to the URL of the architecture page.

Ok, fair enough. I figured I should investigate more, so I asked in #cassandra on freenode. People pointed me to various other slide decks with the same table in them, but none with any explanation.

At some point, somebody rightly recognized that publishing these numbers with no plausible explanation is ridiculous, and removed them from the site. Another person did, in fact, offer a plausible explanation for where they came from.

Basically, with a 50GB table and small records, the B-Tree for the primary key (assuming you have one) is *giant*, and updating it can take 30+ disk seeks. At 10ms (meaning, HORRIBLE) per seek, that comes to 300ms per write. Contrast this with Cassandra, which can just append, requiring at most one seek.
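The arithmetic behind that explanation is simple enough to sketch; note that the seek count and latency here are the explanation’s assumptions, not measurements:

```python
seek_time_ms = 10       # pessimistic rotational disk seek latency
seeks_per_update = 30   # assumed seeks to update a giant B-Tree

mysql_write_ms = seeks_per_update * seek_time_ms
print(mysql_write_ms)   # 300
```

Which is exactly the 300ms figure from the table, so the number describes a worst-case seek-bound B-Tree update, not typical MySQL performance.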

So anyway, Cassandra team, thanks for the explanation, and kudos for righting this problem. Unfortunately the misinformation tends to be viral, so I’m sure there are people out there who will forever believe that MySQL takes 300ms to update a 50G table.


June 26, 2010 at 6:22 am Comments (0)

Personal schedule for Clint Byrum: Velocity 2010, Web Performance & Operations Conference – O’Reilly Conferences, June 22 – 24, 2010, Santa Clara, CA

Attention Stalkers: You’ll need to forge a badge to follow me around in these sessions, as I believe the conference is sold out. That is, unless you already registered.

Personal schedule for Clint Byrum: Velocity 2010, Web Performance & Operations Conference – O’Reilly Conferences, June 22 – 24, 2010, Santa Clara, CA.

ooops.. fixed the link to actually work if you’re not logged in to oreilly.com as ME


June 17, 2010 at 5:29 am Comments (0)

Ubuntu Server BoF at Velocity 2010 « Ubuntu Server Blog

I’ll be moderating this. Come by and we can rap about Ubuntu Server!

Ubuntu Server BoF at Velocity 2010 « Ubuntu Server Blog.


June 17, 2010 at 5:22 am Comments (0)

Red Dead Redemption – 360

Source: cnj.craigslist.org
    asks “any takers?”    
June 12, 2010 at 11:24 pm Comments (0)

Embedding libraries makes packagers sad pandas!

STOP THE INSANITY!

So, in my role at Canonical, I’ve been asked to package some of the hotter “web 2.0” and “cloud” server technologies for Ubuntu’s next release, 10.10, aka “Maverick Meerkat”.

While working on this, I’ve discovered something very frustrating, from a packaging point of view, that’s been going on with fast-moving open source projects. It would seem that rather than produce stable APIs that do not change, there is a preference to dump feature after feature into libraries and software products, and completely defenestrate API stability (or, as benblack from #cassandra on freenode calls it, API stasis).

So who cares, right? People who choose to use these pieces of software are digging their own grave, right? But that’s not really the whole story. We have a bit of a new world order when it comes to “Web 2.0”. There’s this new concept of “devops”. People may be on to something there. As “devops” professionals, we need to pick things that scale and ship *on time*. That may mean shunning the traditional methods of loose coupling defined by rigid APIs. Maybe instead, we should just build a new moderately loose coupling for each project. As crazy as it sounds, it’s working out for a lot of people. Though it may be leaving some ticking time bombs for security long term.

To provide a concrete example, I’ll pick on the CPAN module Memcached::libmemcached. This is basically just a Perl wrapper around the C library libmemcached. It seeks to provide Perl programmers with access to all of the fantastic features that libmemcached has to offer. The only trouble is, it supports only what libmemcached had to offer as of v0.31.

Now, that is normal for a wrapper. Wrappers are going to lag behind their wrapped component’s newer features. But they can generally take advantage of any behind-the-scenes improvement in a library, right? Just upgrade the shared library, and the system’s dynamic linker will find it, right? Well, that’s going to be tough, because there have been quite a few incompatible changes to the library since 0.31 was published last summer. And the API itself has grown massively, changing calls that are fundamental to its operation, if only slightly.

So, rather than be a slave to somebody upgrading the shared library and recompiling Memcached::libmemcached, the maintainers simply embedded version 0.31 of libmemcached in their software distribution. The build process just statically links their wrapper against it. Problem solved, right?

However, from a larger-scale software distributor’s standpoint, this presents us with a big problem. We have a lot of things depending on libmemcached, and we really only want to ship maybe one or two versions of this library (an old compatible version and the latest one). What happens when a software vulnerability is found and published affecting “every version of libmemcached prior to 0.41”? Now we have to patch not only the v0.40 that we shipped, but also the v0.31 inside Memcached::libmemcached. Even worse, what if we also had some Ruby code we wanted to ship? The Ruby wrapper for libmemcached has v0.37 embedded. So now we have to patch and test three versions of this library. This gets ugly quickly.

From Memcached::libmemcached’s point of view, they want something known to be working *for the project today*. The original authors have moved on to other things now that they’ve shipped the product, and the current maintainers are somewhat annoyed by the incompatibilities between 0.31 and 0.40, and don’t have much impetus to advance the library. Even if they did, the Perl code depending on it would have to be updated, since the API has changed so heavily.

Now, I am somewhat zen about “problems” and I like to stay solution focused, and present them as opportunities (HIPPIE ALERT!). So, what opportunity does this challenge present us, the packagers, the integrators, the distributors, with?

I think we have an opportunity to make things better for people using packaging, and people not using packaging. Rather than fighting the trends we see in the market, maybe we should embrace them. Right now, the control file of a Debian-format package (which Ubuntu uses) defines a field called “Depends”. This field tells the system, “Before you install this package, you must install these other packages to make it work”. This is how we can get working, complex software to install very easily with a command like ‘apt-get install foo’.

However, it gets more difficult to maintain when we start depending on specific versions. Suddenly we can’t upgrade “superfoo” because it relies on libbar v0.1, but libbar v0.2 is out and most other things depend on that version.

What if, however, we added a field: “Embeds: libbar=0.1”? This would tell us that this package includes its own private copy of libbar v0.1. When the maintenance problem occurs for libbar “versions prior to v0.3”, we can simply publish a test case that checks for the bad behavior in an automated fashion. If we see the bad behavior, we can then evaluate whether it is worth patching. Any gross incompatibilities with the test case will have to be solved manually, but even with the most aggressive API-breaking behavior, that can probably be reduced to two or three customizations.
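A control stanza carrying such a field might look like the sketch below, using the superfoo/libbar names from above; “Embeds” is the hypothetical field being proposed here, not an existing dpkg field:

```
Package: superfoo
Version: 1.0-1
Depends: libc6
Embeds: libbar=0.1
Description: example package that statically links its own copy of libbar
```

A security tracker could then grep the archive’s control data for “Embeds: libbar” to find every package carrying a vulnerable private copy.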

“But we’re already too busy patching software!”. This is a valid point that I don’t have a good answer for. However, the current pain we suffer in trying to package things from source is probably eating up a lot more time than backporting tests and fixes would. If we strive for test coverage and embedded software tracking, at least we can know where we’re *really* vulnerable, and not just assume. Meanwhile, we can know exactly which version of libraries are embedded in each software product, and so we can reliably notify users that they are vulnerable, allowing them to accept risks if they so desire.

Of course, this requires a tool that can find embedded software, and track it in these packages. I don’t think such a thing exists, but it would not be hard to write. If a dir contains 99% of the files from a given software package, then it can be suggested that it embeds said software package. If we can work that down to checking a list of 1000 md5sums against another list of 1000 md5sums, we should be able to suggest that we think we know what the software embeds, and sometimes even provide 100% certainty.
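As a sketch of how such a detection tool might start (the function names and the way “confidence” is scored are my own invention, not an existing tool):

```python
import hashlib
import os


def file_hashes(root):
    """Map md5 hex digest -> relative path for every file under root."""
    hashes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                hashes[hashlib.md5(f.read()).hexdigest()] = os.path.relpath(path, root)
    return hashes


def embed_confidence(package_dir, library_dir):
    """Fraction of the library's files found verbatim inside the package.

    A score near 1.0 suggests package_dir embeds a copy of the library;
    identical files at different paths still match, since we compare digests.
    """
    lib = file_hashes(library_dir)
    pkg = set(file_hashes(package_dir))
    if not lib:
        return 0.0
    return len(set(lib) & pkg) / len(lib)
```

In practice you would compare each package’s source tree against the known releases of commonly embedded libraries, flag anything scoring above, say, 0.99, and fall back to fuzzier matching for embedded copies that carry local patches.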

I look forward to fleshing out this idea in the coming months, as I see this happening more and more as we lower the barriers between developers and operations. Cloud computing has made it easy for a developer to simply say “give me 3 servers like that” and quickly deploy applications. I can see a lot of them deploying a lot more embedded software, and not really understanding what risk they’re taking. As Mathias Gug told me recently, this should help us spend more time being “fire inspectors” and less time being “fire fighters”.


June 12, 2010 at 6:19 am Comments (0)

Soldiers surprising their loved ones. [VIDEO]

Source: wimp.com
June 10, 2010 at 7:25 pm Comments (0)

Life Lessons with Jon Lajoie

Source: www.youtube.com
    lol    
June 10, 2010 at 3:32 am Comments (0)

« Older Posts