SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

Precise is coming

Almost 2 years ago, I stepped out of my comfort zone at a “SaaS” web company and joined the Canonical Server Team to work on Ubuntu Server development full time.

I didn’t really grasp what I had walked into, joining the team right after an LTS release. The 10.04 release was a monumental effort that spanned the previous 2 years. Call me a nerd if you want, but I get excited about a Free, unified desktop and server OS built entirely in the open, out of open source components, fully supported for 5 years on the server.

Winter, and the Precise Pangolin, are coming

And now, we’re about to do it again. Precise beta1 is looking really solid, and I am immensely proud to have been a tiny part of that.

So, here’s what we did on the server team that has led to Precise’s awesomeness:

Ubuntu 10.10 “Maverick Meerkat” – We helped get CEPH into Debian and Ubuntu for 10.10, which proved to be important as it gave users a way to try out CEPH. CEPH will ship in main, fully supported by Canonical, in 12.04, which is pretty exciting! This was also the first release to feature pieces of OpenStack.

Ubuntu 11.04 “Natty Narwhal” – Upstart got a lot better for server users in 11.04 with the addition of “override” files, and the shiny new Upstart Cookbook. We also finally figured out how to coordinate complicated boot sequences without having to rewrite upstart to track state. I wasn’t personally involved, but we also shipped the first really usable OpenStack release, “Cactus”.

Ubuntu 11.10 “Oneiric Ocelot” – It seems small, but we fixed boot-up race conditions caused by services which need their network interfaces to be up before they start. Upstart also landed full chroot support, so you can run a chroot with its own upstart services inside of it, which is important for some use cases. This release also featured the debut of Juju, which is a new way to deploy and manage network services and applications.

Ubuntu 12.04 “Precise Pangolin” – OpenStack Essex is huge: full Keystone integration, lots of new features, and lots of satellite projects. Juju has really grown into a useful project now (give it a spin!). We were also able to transition to MySQL 5.5, which was no small feat. The amount of automated continuous integration testing that has gone into the Precise cycle is staggering, and it continues to grow as test cases are added. We’ll never find all the bugs this way, but this time we’ve at least found many of them before they ever reached a stable release.

There’s so much more in each of these; it’s amazing how much has been improved and refined in Ubuntu Server in just two years.

I’m pumped. A new LTS is exciting for us in Ubuntu Development, as it refocuses the more conservative users on all the work we’ve been doing. I would love to hear any feedback from the greater community. This is going to be great!

March 1, 2012 at 10:47 pm Comments (0)

CloudCamp San Diego – Wake up and smell the Enterprise

I took a little trip down to San Diego yesterday to see what these CloudCamp events are all about. There are so many of them, all over the place, that I figured it was a good chance to take a look at what might be the “common man’s” view of the cloud. I spend so much time talking to people at a really deep level about what the cloud is, why we like it, why we hate it, etc. This “un-conference” was more about bringing a lot of that information, distilled, to business owners and professionals who need to learn more about “this cloud thing”.

The lightning talks were quite basic. The most interesting one was given by a former lawyer who now runs IT for a medium-sized law firm. Private cloud saves him money because he can now make a direct chargeback to a client when they are taking up storage and computing space. This also allows him to keep his infrastructure more manageable, because clients tend to give up resources more readily when there is a direct chargeback, as opposed to general service fees that try to cover this.

There was a breakout session about SQL vs. NoSQL. I joined and was shocked at how dominant the Microsoft representative was. She certainly tried to convince us “this isn’t about SQL Azure, it’s about SQL vs. NoSQL”, but it was pretty much about all the things that suck more than SQL Azure, and about not mentioning anything that might compete directly with it. I brought up things like Drizzle, Cassandra, HDFS, Xeround, MongoDB, and MogileFS. These were all swiftly moved past, and not written on the white board. Her focus was on how SimpleDB differs from Amazon RDS, and how Microsoft Azure has its own key/value/column store for their cloud. The room was overpowered into silence for the most part… there were about 20 developers and IT manager types in the room, and they had no idea how this was going to help them take advantage of IaaS or PaaS clouds. I felt the session was interesting, but ultimately, completely pwned by the Microsoft rep. She ended by showing off 3D effects in their Silverlight-based management tool. Anybody impressed deserves what they get, quite honestly.

One good thing that did come out of that session was the ensuing discussion, where I ended up talking with a gentleman from a local San Diego startup that was just acquired. This is a startup of 3 people that is 100% in Amazon EC2 on Ubuntu with PHP and MySQL. They have their services spread across 3 regions and were not affected at all by the recent outages in us-east-1. Their feeling on the SQL Azure folks is that it’s for people who have money to burn. He spends $3000 a month, and it is entirely on EC2 instances and S3/EBS storage. The audience was stunned that it was so cheap, and that it was so easy to scale up and down as they add and remove clients. He echoed something the MS rep said too: because their app was architected this way from the beginning, it is extremely cost effective, and they wouldn’t really save much money by leasing or owning servers instead of leasing instances, since they can calculate the costs and pass them directly on to the clients with this model, and their commitment is zero.

Later on, I proposed a breakout session on how repeatable your infrastructure is (basically, infrastructure as code). There was almost no interest, as this was a very business-oriented un-conference. The few people who attended were just using AMIs to do everything. When something breaks, they fix it with parallel-ssh. The one person who was using Windows in the cloud had no SSH, so fixing any system problem meant re-deploying his AMI over and over.

Overall, I thought it was interesting to see where the non-webops world is with knowledge of the cloud. I think the work we’re doing with Ensemble is really going to help people deploy open source applications into private and public clouds, so they don’t need 3D-enabled Silverlight interfaces to manage a simple database or a bug tracking system for their internal developers.

June 15, 2011 at 8:25 pm Comments (0)

Cars are so last century … but, so is Linux, right?

This past weekend, I attended the 2010 Los Angeles Auto Show. I’m not a huge car buff. I do think that BMWs are the bomb, and I like Honda’s common-sense vehicles, but really, I am NOT a car guy. However, I thought this was an interesting chance to take a look at an industry that, in my opinion, isn’t all that different from the one I’m in.

Now, that may surprise some. It’s pretty easy to think that I work for a super advanced company that has started a revolution and sits on the bleeding edge of innovation. I mean, at Canonical, we’re doing all kinds of amazing stuff with “the cloud” and building software that sometimes makes people’s jaws drop when they see it in action.

But really, I think we’re more like CODA. CODA has built what looks to be a sustainable, practical electric car. The car itself is not visually stunning, but the idea behind it is. Make an electric car that anyone can buy *and* use. Make it fun, and make sure the business is sustainable. But, in no way is CODA challenging the ideas and revisions that have worked for the 100+ years that the car industry has existed.

CODA is still putting a steering wheel, gas pedal, and gear shift in the cockpit for the driver. There are doors, wipers, lights, and probably floor mats. In much the same way, in Ubuntu, we’re still putting our software out there with the intention that, while it’s created differently and affords the user more capabilities, it is basically driven in much the same way as Windows 7 or OS X, mostly as a web, errrr, cloud terminal.

The exciting part is that for $3 of possibly more efficiently produced electricity, you can drive 100 miles. Even more exciting is that the CODA might actually compete with sensibly priced (but larger) Honda and Toyota sedans, rather than, like the Tesla cars, competing with Lexus and BMW.

Given this way of thinking, the auto show was extremely interesting. The electric car (open source?) has “arrived”, and the established players are buying the interesting enabling technology, like batteries (Android’s Linux kernel, Darwin for Mac, etc.), from companies like Tesla, and putting it in their established products.

Whether consumers care about either open source or electric cars is another story.. maybe the 2011 LA Auto Show will have an answer for me on at least one of them.


November 22, 2010 at 6:06 pm Comments (0)

Puppet Camp Report: Two very different days

I attended Puppet Camp in San Francisco this month, thanks to my benevolent employer Canonical’s sponsorship of the event.

It was quite an interesting ride. I’d consider myself an intermediate level puppet user, having only edited existing puppet configurations and used it for proof of concept work, not actual giant deployments. I went in large part to get in touch with users and potential users of Ubuntu Server to see what they think of it now, and what they want out of it in the future. Also Puppet is a really interesting technology that I think will be a key part of this march into the cloud that we’ve all begun.

The state of Puppet

This talk was given by Luke, and was a very frank discussion of where puppet is and where it should be going. He also discussed briefly where Puppet Labs fits into this. In brief, puppet is stable and growing. Upon taking a survey of puppet users, the overwhelming majority are sysadmins, which is no surprise. Debian and Ubuntu have equal share amongst survey respondents, but RHEL and CentOS dominate the playing field.

As for the future, there were a couple of things mentioned. Puppet needs some kind of messaging infrastructure, and it seems mCollective will be it. They’re not ready to announce anything, but it seems like a logical choice. There are also plans for centralized data services, to make the data puppet has available to other tools as well.

mCollective

Given by mCollective’s author, whose name escapes me, this was a live demo of what mCollective can do for you. It’s basically a highly scalable messaging framework that is not necessarily tied to puppet. You simply need to write an agent that will subscribe to your messages. Currently only ActiveMQ is supported, but it uses STOMP, so any queueing system that speaks STOMP should be able to use the same driver.

Once you have these agents consuming messages, you just have to get creative about what they can do. He currently has some puppet-focused agents and client code to pull data out of puppet and act accordingly. Ultimately, you could do much of this with something like Capistrano and parallel ssh, but this seems to scale well. One audience member boasted that they have over 1000 nodes using mCollective to perform tasks.
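To make the “any broker that speaks STOMP” point concrete, here is a rough Python sketch of the sort of agent loop involved. This is not mCollective code, just a bare-bones STOMP 1.0 subscriber over a raw socket; the broker address and destination name are made up for illustration.

    import socket

    HOST, PORT = "localhost", 61613          # assumed ActiveMQ STOMP listener
    DESTINATION = "/topic/puppet.commands"   # hypothetical destination name

    def frame(command, headers=None, body=""):
        """Build a STOMP 1.0 frame: command line, headers, blank line, body, NUL."""
        lines = [command] + ["%s:%s" % item for item in (headers or {}).items()]
        return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")

    sock = socket.create_connection((HOST, PORT))
    sock.sendall(frame("CONNECT"))           # no auth; assumes an open broker
    sock.sendall(frame("SUBSCRIBE", {"destination": DESTINATION, "ack": "auto"}))

    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
        while b"\x00" in buf:                # STOMP frames are NUL-terminated
            raw, buf = buf.split(b"\x00", 1)
            print(raw.decode("utf-8", "replace").strip())
            # a real agent would parse the frame body here and run a task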

The Un-Conference

Puppet Camp took the form of an “un conference”, where there were just a few talks, and a bunch of sessions based on what people wanted to talk about. I didn’t propose anything, as I did not come with an agenda, but I definitely was interested in a few of the topics:

Puppet CA

My colleague at Canonical, Mathias Gug, proposed a discussion of the puppet CA mechanics, and it definitely interested me. Puppet uses the PKI system to verify clients and servers. The default mode of operation is for a new client to contact the configured puppet master, and submit a “CSR” or “Certificate Signing Request” to it. The puppet master administrator then verifies that the CSR is from one of their hosts, and signs it, allowing both sides to communicate with some degree of certainty that the certificates are valid.

Well there’s another option, which is just “autosign”. This works great on a LAN where access is highly guarded, as it no longer requires you to verify that your machine submitted the CSR. However, if you have any doubts about your network security, this is dangerous. An attacker can use this access to download all of your configuration information, which could contain password hashes, hidden hostnames, and any number of other things that you probably don’t want to share.

When you add the cloud to this mix, it’s even more important that you not just trust any host. IaaS cloud instances come and go all the time, with different hostnames/IPs and properties. Mathias had actually proposed an enhancement to puppet to add a unique ID attribute for CSRs made in the cloud, but there was a problem with the ruby OpenSSL library that wouldn’t allow these attributes to be added to the certificate. We discussed possibly generating the certificate beforehand using the openssl binary, but this doesn’t look like it will work without code changes to Puppet. I am not sure where we’ll go from there.

Puppet Instrumentation

I’m always interested to see what people are doing to measure their success. I think a lot of times we throw up whatever graph or alert monitoring is pre-packaged with something, and figure we’ve done our part. There wasn’t a real consensus on what the important things to measure are. As usual, sysadmins who are running puppet are pressed for time, and measurement of their own processes often falls by the wayside under the pressure to measure everybody else.

Other stuff

There were a number of other sessions and discussions, but none that really jumped out at me. On the second day, an employee from Google’s IT department gave a talk about Google’s massive puppet infrastructure. He explained that it is only used for IT support, not production systems, though he wasn’t able to go into much more detail. Twitter also gave some info about how they use puppet for their production servers, and there was an interesting discussion about the line between code and infrastructure deployment. This stemmed from a question I asked about why they didn’t use their awesome BitTorrent-based “murder” code distribution system to deploy puppet rules. The end of that was “because murder is for code, and this is infrastructure”.

Cloud10/Awstrial

So this was actually the coolest part of the trip. Early on the second day, during the announcements, the (sometimes hilarious) MC, Deepak, mentioned that there would be a beginner puppet session later in the day. He asked that attendees to that session try to have a machine ready, so that the presenter, Dan Bode, could give them some examples to try out.

Some guys on the Canonical server team had been working on a project called “Cloud 10” for the release of Ubuntu 10.10, which was coming in just a couple of days. They had thrown together a Django app called awstrial that could be used to fire up EC2 or UEC images for free, for a limited period. The reason for this was to allow people to try Ubuntu Server 10.10 out for an hour on EC2. I immediately wondered, though: “Maybe we could just provide the puppet beginner class with instances to try out!”

Huzzah! I mentioned this to Mathias, and he and I started bugging our team members about getting this set up. That was at 9:00am. By noon, 3 hours later, the app had been installed on a fresh EC2 instance, a DNS pointer had been created pointing to said instance, and the whole thing had been tweaked to reference Puppet Camp and allow the users to have 3 hours instead of 55 minutes.

As lunch began, Mathias announced that users could go to “puppet.ec42.net” in a browser and use their Launchpad or Ubuntu SSO credentials to spawn an instance.

A while later, when the beginner class started, 25 users had signed on and started instances. Unfortunately, the instances died after 55 minutes due to a bug in the code, but ultimately, the users were able to poke around with these instances and try out stuff Dan was suggesting. This made Canonical look good, it made Ubuntu look good, and it definitely has sparked a lot of discussion internally about what we might do with this little web app in the future to ease the process of demoing and training on Ubuntu Server.

And what’s even more awesome about working at Canonical? This little web app, awstrial, is open source. Sweet, so anybody can help us out making it better, and even show us more creative ways to use it.
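For the curious, the EC2 side of what awstrial does boils down to something like the boto sketch below. This is emphatically not the awstrial code (that’s a Django app with real bookkeeping); the AMI id, keypair name, and region are placeholders, and credentials are assumed to come from the usual boto configuration.

    import time
    import boto.ec2

    TRIAL_MINUTES = 55

    conn = boto.ec2.connect_to_region("us-east-1")     # placeholder region
    reservation = conn.run_instances(
        "ami-00000000",              # placeholder Ubuntu Server 10.10 AMI id
        instance_type="t1.micro",
        key_name="trial-user-key",   # hypothetical keypair for the trial user
    )
    instance = reservation.instances[0]

    # Wait for the instance to come up and hand its address to the user...
    while instance.state != "running":
        time.sleep(5)
        instance.update()
    print("Trial instance ready at", instance.public_dns_name)

    # ...and reap it when the trial period is over.
    time.sleep(TRIAL_MINUTES * 60)
    conn.terminate_instances(instance_ids=[instance.id])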


October 21, 2010 at 4:54 pm Comments (0)

Balance Your Cloud

Seems like eons ago (just under 6 months..) when I joined Canonical, and hopped on a plane headed for Brussels and UDS-Maverick.

What a whirlwind, attending sessions, meeting the real rock stars of the Ubuntu world, and getting to know my super distributed team.

One of the sessions was based on a blueprint for load balancing in the cloud. The idea was that rather than rely on Amazon’s Elastic Load Balancer, you could build your own solution that you could possibly even move around between UEC, EC2, or even Rackspace clouds.

Well, it got a lower priority than some other stuff, so unfortunately many parts got dropped (like ELB-compatible CLI tools).

But, I managed to find the time to create a proof of concept for managing haproxy’s config file (perhaps my first real python project), and write up a HOWTO for using it.

Honestly, it’s not the best HOWTO I’ve ever written. It’s got a lot of stuff left out. But it should be enough to get most admins past the “tinker for a few hours” phase and into the “tinker for 40 minutes right before getting it working, then pass out on your keyboard at 4:00am” phase. I know that’s how far it got me…
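To give a flavor of what that proof of concept does, here is a minimal sketch of the idea (not the actual tool): render an haproxy backend stanza from a list of instances, so nodes can be added to or removed from the pool programmatically. The names and addresses are made up.

    def render_backend(name, servers, port=80):
        """Render an haproxy backend stanza; servers is a list of (host, ip)."""
        lines = ["backend %s" % name, "    balance roundrobin"]
        for host, ip in servers:
            lines.append("    server %s %s:%d check" % (host, ip, port))
        return "\n".join(lines) + "\n"

    if __name__ == "__main__":
        pool = [("web1", "10.0.0.11"), ("web2", "10.0.0.12")]
        print(render_backend("app", pool))
        # the real tool would write this into haproxy.cfg and reload haproxy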


October 5, 2010 at 7:14 am Comments (0)

Drizzle7 Beta Released! now with MySQL migration! « LinuxJedis /dev/null

Drizzle7 Beta Released! now with MySQL migration! « LinuxJedis /dev/null.

Drizzle is a project that is near and dear to my heart.

To sum it up, Drizzle took all that was really good in MySQL, cut out all that was mediocre, and replaced some of it with really good stuff. The end product is, I think, something that is leaner, should be more stable, and definitely more flexible.

So go check out the beta! I guess I should use Andrew’s migration tool and see if I can migrate this blog to drizzle. :)


September 29, 2010 at 10:39 pm Comments (0)

Where did those numbers come from?


Did you ever hear a claim that sounded too bad to be true?

So this past Tuesday at Velocity 2010, Brett Piatt gave a workshop on the Cassandra database. I was seated in the audience and quite interested in everything he had to say about running Cassandra, given that I’ve been working on adding Cassandra and other scalable data stores to Ubuntu.

Then at one point, up popped a table that made me curious.

It looked a lot like this:

With a 50GB table

              MySQL     Cassandra
    writes    300ms     0.19ms
    reads     250ms     1.6ms

Actually, it looked exactly like that, because it was copied from this page, which is, as of this point, only available in its original form in Google’s cache.

The page linked has *no* explanation of this table. It’s basically just “OH DAAAAMN MySQL you got pwned”. But seriously, WTF?

I asked Brett where those numbers came from, and whether we could run the tests ourselves to compare our write performance to Cassandra’s. By “our write performance” I don’t mean MySQL’s, as that phrasing implies, but rather our own measurements versus the Cassandra team’s. Brett claimed ignorance and just referred me to the URL of the architecture page.

Ok, fair enough. I figured I should investigate more, so I asked on #cassandra on freenode. People pointed me to various other slide decks with the same table in them, but none with any explanation.

At some point, somebody rightfully recognized that having these numbers with no plausible explanation is ridiculous, and removed them from the site. Another person did, in fact, offer a plausible explanation of where they may have come from.

Basically, with a 50GB table and small records, the B-Tree for the primary key (assuming you have one) is *giant*, and updating it can take 30+ disk seeks. At 10ms per seek (meaning, HORRIBLE), that works out to 300ms per write. Contrast that with Cassandra, which can just append, requiring at most one seek.
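In other words, the 300ms figure falls straight out of those two assumptions. Here’s the back-of-the-envelope arithmetic; the seek count and seek time are the assumptions from that explanation, not measurements, and this says nothing about where the 0.19ms Cassandra number comes from.

    seek_time_ms = 10    # a pessimistic rotational-disk seek
    btree_seeks = 30     # claimed seeks to update a huge primary-key B-Tree
    append_seeks = 1     # an append-only write needs at most one seek

    print("B-Tree update:     %d ms" % (btree_seeks * seek_time_ms))    # 300 ms
    print("Append-only write: %d ms" % (append_seeks * seek_time_ms))   # 10 ms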

So anyway, Cassandra team, thanks for the explanation, and kudos for righting this problem. Unfortunately the misinformation tends to be viral, so I’m sure there are people out there who will forever believe that MySQL takes 300ms to update a 50G table.


June 26, 2010 at 6:22 am Comments (0)

Embedding libraries makes packagers sad pandas!

STOP THE INSANITY!

So, in my role at Canonical, I’ve been asked to package some of the hotter “web 2.0” and “cloud” server technologies for Ubuntu’s next release, 10.10, aka “Maverick Meerkat”.

While working on this, I’ve discovered something very frustrating from a packaging point of view that’s been going on with fast-moving open source projects. It would seem that rather than produce stable APIs that do not change, there is a preference to dump feature after feature into libraries and software products, and completely defenestrate API stability (or, as benblack from #cassandra on freenode calls it, API stasis).

So who cares, right? People who choose to use these pieces of software are digging their own grave. Right? But that’s not really the whole story. We have a bit of a new world order when it comes to “Web 2.0”. There’s this new concept of “devops”. People may be on to something there. As “devops” professionals, we need to pick things that scale and ship *on time*. That may mean shunning the traditional methods of loose coupling defined by rigid APIs. Maybe instead, we should just build a new, moderately loose coupling for each project. As crazy as it sounds, it’s working out for a lot of people. Though it may be leaving some ticking time bombs for security long term.

To provide a concrete example, I’ll pick on the CPAN module Memcached::libmemcached. This is basically just a perl wrapper around the C library libmemcached. It seeks to provide perl programmers with access to all of the fantastic features that libmemcached has to offer. The only trouble is, it only supports what libmemcached had to offer as of v0.31.

Now, that is normal for a wrapper. Wrappers are going to lag behind their wrapped component’s newer features. But they can generally take advantage of any behind-the-scenes improvement in a library, right? Just upgrade the shared library, and the system’s dynamic linker will find it, right? Well, that’s going to be tough, because there have been quite a few incompatible changes to the library since 0.31 was published last summer. And the API itself has grown massively, changing calls that are fundamental to its operation, if only slightly.

So, rather than be slave to somebody upgrading the shared library and recompiling Memcached::libmemcached, the maintainers simply embedded version 0.31 of libmemcached in their software distribution. The build process just statically links their wrapper against it. Problem solved, right?

However, from a larger-scale software distributor’s standpoint, this presents us with a big problem. We have a lot of things depending on libmemcached, and we really only want to ship maybe 1 or 2 versions of this library (an old compatible version and the latest one). What happens when a software vulnerability is found and published affecting “every version of libmemcached prior to 0.41”? Now we have to patch not only the v0.40 that we shipped, but also the v0.31 inside Memcached::libmemcached. Even worse, what if we also had some ruby code we wanted to ship? The Ruby wrapper for libmemcached has v0.37 embedded. So now we have to patch and test three versions of this library. This gets ugly quickly.

From Memcached::libmemcached’s point of view, they want something known to be working *for the project today*. The original authors have moved on to other things now that they’ve shipped the product, and the current maintainers are somewhat annoyed by the incompatibilities between 0.31 and 0.40, and don’t have much impetus to advance the library. Even if they did, the perl code depending on it must be updated since the API changed so heavily.

Now, I am somewhat zen about “problems”; I like to stay solution-focused and present them as opportunities (HIPPIE ALERT!). So, what opportunity does this challenge present us, the packagers, the integrators, the distributors, with?

I think we have an opportunity to make things better for people using packaging, and people not using packaging. Rather than fighting the trends we see in the market, maybe we should embrace it. Right now, a debian format package (which Ubuntu uses) control file defines a field called “Depends”. This field tells the system “Before you install this package, you must install these other packages to make it work”. This is how we can get working, complex software to install very easily with a command like ‘apt-get install foo’.

However, it gets more difficult to maintain when we start depending on specific versions. Suddenly we can’t upgrade “superfoo” because it relies on libbar v0.1, but libbar v0.2 is out and most other things depend on that version.

What if, however, we added a field: “Embeds: libbar=0.1”? This would tell us that this package includes its own private copy of libbar v0.1. When the maintenance problem occurs for libbar “versions prior to v0.3”, we can simply publish a test case that tests for the bad behavior in an automated fashion. If we see this bad behavior, we can then evaluate whether it is worth patching. Any gross incompatibilities with the test case will have to be solved manually, but even with the most aggressive API-breaking behavior, that can probably be reduced to 2 or 3 customizations.

“But we’re already too busy patching software!”. This is a valid point that I don’t have a good answer for. However, the current pain we suffer in trying to package things from source is probably eating up a lot more time than backporting tests and fixes would. If we strive for test coverage and embedded software tracking, at least we can know where we’re *really* vulnerable, and not just assume. Meanwhile, we can know exactly which version of libraries are embedded in each software product, and so we can reliably notify users that they are vulnerable, allowing them to accept risks if they so desire.

Of course, this requires a tool that can find embedded software, and track it in these packages. I don’t think such a thing exists, but it would not be hard to write. If a dir contains 99% of the files from a given software package, then it can be suggested that it embeds said software package. If we can work that down to checking a list of 1000 md5sums against another list of 1000 md5sums, we should be able to suggest that we think we know what the software embeds, and sometimes even provide 100% certainty.
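Here is a rough Python sketch of what such a detection tool could look like: hash every file under an unpacked package tree and see what fraction of a known library release’s files appear verbatim inside it. The paths in the example are hypothetical, and a real tool would work from distro md5sum manifests rather than hashing a local tree.

    import hashlib
    import os

    def md5sums(root):
        """Map md5 hex digest -> relative path for every file under root."""
        sums = {}
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                sums[digest] = os.path.relpath(path, root)
        return sums

    def embed_score(package_tree, library_tree):
        """Fraction of the library's files found, byte for byte, in the package."""
        lib = md5sums(library_tree)
        pkg = md5sums(package_tree)
        if not lib:
            return 0.0
        return len(set(lib) & set(pkg)) / float(len(lib))

    if __name__ == "__main__":
        # hypothetical unpacked source trees
        score = embed_score("Memcached-libmemcached-0.43/src", "libmemcached-0.31")
        print("%.0f%% of libmemcached 0.31 appears to be embedded" % (score * 100))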

I look forward to fleshing out this idea in the coming months, as I see this happening more and more as we lower the barriers between developers and operations. Cloud computing has made it easy for a developer to simply say “give me 3 servers like that” and quickly deploy applications. I can see a lot of them deploying a lot more embedded software, and not really understanding what risk they’re taking. As Mathias Gug told me recently, this should help us spend more time being “fire inspectors” and less time being “fire fighters”.


June 12, 2010 at 6:19 am Comments (0)

YouTube – RSA Animate – Drive

May 17, 2010 at 6:56 pm Comments (0)