SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

So what is Ensemble anyway?

Have you heard of Ensemble? Are you excited about Cloud/Service Orchestration? What? Ok you’re not alone if you are scratching your head.

Ensemble is an implementation of a new idea that has been taking shape the last couple of years. Ever since Amazon hooked up a remote API to thousands of machines to provide access to their virtual infrastructure (and called it macaroni? err.. AWS), people have been dreaming up ways to take advantage of what is basically a robotic “NOC guy”. No longer do you have to pre-rack servers or call your vendor frantically to get servers sent next-day to your colo. Right?

Naturally, the system administrators who would normally be in charge of racking servers applied their existing tools to the job, with mixed success. Config management is really good at modelling identical hosts, but with virtual hosts instantly available, this left those thinking at a higher level wanting more. Chef in particular implemented a nice set of tools and functionality to allow this high-level “service” definition with its knife tool and simple ruby API.

But how easy are Chef’s cookbooks to share and use without modification? How easy are they to integrate with one another? Puppet has modules capable of similar functionality, and the recent integration of MCollective, plus Puppet Faces, has certainly added a lot of what Chef had to support this kind of application modelling. But again, the modules seem to require a lot of convention, assumption, and tweaking to be useful.

It’s my opinion that this is very much like the way tarballs+autoconf became the de facto standard for distributing free software. It was *so much* better than writing a Makefile by hand, and it achieved an enormous amount of portability, so developers adopted it rapidly. In fact, it is still the dominant way to distribute portable open source applications.

But at some point, the limitations of this became clear. There was a need for something more concise that could distribute both the source and binaries built for a platform. There was some limited early success with tarballs built by convention, but then enter RPM and DPKG. These included ways to express facts about software, like its dependencies, architecture, and the revisions made to it to work on the target platform. This allowed distributors of software to more easily maintain their systems, and enabled users to manage the software in their environments.

At that point, some smart guy figured out that we should be able to download and automatically configure all of the software needed for one application to work properly, just from its packaging information. To my mind, apt-get was my first experience with this, though FreeBSD ports authors may disagree there. Either way, this made it very easy for admins and users to install software without spending hours in the 7 levels of dependency hell.

In many ways, Service Orchestration is a way of bringing the benefits of packaging to the cloud. It should allow us to build out our cloud in a sane way, taking advantage of the knowledge that has been gained by others. For the bits that we need to finely tune, it should step aside and allow that without compromising the system.

Ensemble is an implementation of this idea, and Principia is a collection of “Formulas” for Ensemble. They are tightly coupled to Ubuntu, as they are in many ways meant to be the dpkg and apt-get for Ubuntu in the cloud.

It’s pretty easy to try out Ensemble and Principia on Ubuntu. Right now you’ll need an EC2 account with an access key set up, though we’re working on making this work with just your local machine for rapid development.

It’s been pointed out to me that the version of principia-tools that was available at the time of this writing didn’t include /usr/share/principia-tools/tests. I’ve uploaded a fixed version to the ensemble PPA, so if you tried these instructions and failed, please try updating principia-tools. If that fails, you can get the tests with bzr branch lp:principia-tools.


sudo add-apt-repository ppa:ensemble/ppa
sudo apt-get update
sudo apt-get install principia-tools
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxx
export AWS_ACCESS_KEY_ID=0123456789ABCDEF
ensemble bootstrap
principia getall /some/path/for/formulas
/usr/share/principia-tools/tests/mediawiki.sh /some/path/for/formulas
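
To check on the deployment afterwards (and to produce the YAML you’ll see below), there’s the status command:

ensemble status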

What does this give you? Well, it should give you a 7 node mediawiki cluster of t1.micro’s in the us-east-1 region of EC2. I just ran it, and now I have this:

machines:
  0: {dns-name: ec2-50-19-158-109.compute-1.amazonaws.com, instance-id: i-215dd84f}
  1: {dns-name: ec2-50-17-16-228.compute-1.amazonaws.com, instance-id: i-8d58dde3}
  2: {dns-name: ec2-72-44-49-114.compute-1.amazonaws.com, instance-id: i-9558ddfb}
  3: {dns-name: ec2-50-19-47-106.compute-1.amazonaws.com, instance-id: i-6d5bde03}
  4: {dns-name: ec2-174-129-132-248.compute-1.amazonaws.com, instance-id: i-7f5bde11}
  5: {dns-name: ec2-50-19-152-136.compute-1.amazonaws.com, instance-id: i-755bde1b}
  6: {dns-name: '', instance-id: i-4b5bde25}
services:
  demo-wiki:
    formula: local:mediawiki-62
    relations: {cache: wiki-cache, db: wiki-db, website: wiki-balancer}
    units:
      demo-wiki/0:
        machine: 2
        relations: {}
        state: null
      demo-wiki/1:
        machine: 6
        relations: {}
        state: null
  wiki-balancer:
    formula: local:haproxy-13
    relations: {reverseproxy: demo-wiki}
    units:
      wiki-balancer/0:
        machine: 4
        relations: {}
        state: null
  wiki-cache:
    formula: local:memcached-10
    relations: {cache: demo-wiki}
    units:
      wiki-cache/0:
        machine: 3
        relations: {}
        state: null
      wiki-cache/1:
        machine: 5
        relations: {}
        state: null
  wiki-db:
    formula: local:mysql-93
    relations: {db: demo-wiki}
    units:
      wiki-db/0:
        machine: 1
        relations: {}
        state: null

At the top you see the machines that ensemble spun up in EC2 in the ‘machines’ section. The numbers there correspond to the ‘machine: #’ in the service/units definitions below. If you look through, you’ll see above that wiki-balancer is machine 4, which has a hostname of ec2-174-129-132-248.compute-1.amazonaws.com. If you go to that hostname, once all relations are up (I like to use ‘watch ensemble status’ to see when this happens), you should see a working mediawiki. But not just a working mediawiki, a scalable one. If you want to pour on the traffic, spin up 3 more demo-wiki’s to handle the app server load:


ensemble add-unit demo-wiki
ensemble add-unit demo-wiki
ensemble add-unit demo-wiki

These will of course take a minute or two to spin up. Once they’re ready they’ll show up in the status output:

services:
  demo-wiki:
    formula: local:mediawiki-62
    relations: {cache: wiki-cache, db: wiki-db, website: wiki-balancer}
    units:
      demo-wiki/0:
        machine: 2
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/1:
        machine: 6
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/2:
        machine: 7
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/3:
        machine: 8
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/4:
        machine: 9
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started

How about a little test then? After I got to this point, I logged in as WikiSysop (change the password folks! it’s change-me) and imported the Wikipedia exports for “Ubuntu” and “EC2”. After that I used harvestman to spider the site and saved all the URLs in a file, urls.txt. Alright! Now let’s fire up *siege* from a machine outside the cluster, but in the same availability zone / security group (so at least we’re only dealing with EC2’s latency and not my net connection), and see if we can take this cluster down!
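
If you want to recreate urls.txt without harvestman, a rough stand-in using wget against the balancer’s hostname might look like this (purely an illustrative sketch, not the exact commands I ran):

# Hypothetical stand-in for the harvestman spidering step: crawl the balancer
# a couple of levels deep and collect every URL wget logs into urls.txt.
BALANCER=ec2-174-129-132-248.compute-1.amazonaws.com
wget --spider -r -l 2 -nv "http://$BALANCER/" 2>&1 \
  | grep -Eo "http://$BALANCER[^ ]*" \
  | sort -u > urls.txt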


$ siege -i -c 5 -f urls.txt
...
Transactions: 563 hits
Availability: 100.00 %
Elapsed time: 95.58 secs
Data transferred: 2.64 MB
Response time: 0.35 secs
Transaction rate: 5.89 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 2.04
Successful transactions: 544
Failed transactions: 0
Longest transaction: 13.54
Shortest transaction: 0.00

This is, btw, the best run I got out of t1.micro’s. Sometimes it would get quite ugly:


Transactions: 892 hits
Availability: 99.55 %
Elapsed time: 221.69 secs
Data transferred: 3.64 MB
Response time: 0.61 secs
Transaction rate: 4.02 trans/sec
Throughput: 0.02 MB/sec
Concurrency: 2.45
Successful transactions: 849
Failed transactions: 4
Longest transaction: 27.41
Shortest transaction: 0.00

Lets try the whole thing over with m1.small. First I edit ~/.ensemble/environments.yaml and add an override for the default-instance-type:


ensemble: environments

environments:
  sample:
    type: ec2
    default-instance-type: m1.small
    control-bucket: ensemble-12345678901234567890
    admin-secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Then I re-run the whole test:


Transactions: 290 hits
Availability: 98.98 %
Elapsed time: 81.79 secs
Data transferred: 0.78 MB
Response time: 0.53 secs
Transaction rate: 3.55 trans/sec
Throughput: 0.01 MB/sec
Concurrency: 1.89
Successful transactions: 277
Failed transactions: 3
Longest transaction: 1.50
Shortest transaction: 0.00

Oops! I forgot to add my 3 extra nodes. Note that these two m1.smalls are already almost keeping up. Now as I add them, I keep siege running. It’s pretty cool to watch the response times drop as nodes come online to carry some of the load.

Now with 5 m1.small’s:


Transactions: 273 hits
Availability: 100.00 %
Elapsed time: 54.27 secs
Data transferred: 0.99 MB
Response time: 0.47 secs
Transaction rate: 5.03 trans/sec
Throughput: 0.02 MB/sec
Concurrency: 2.38
Successful transactions: 260
Failed transactions: 0
Longest transaction: 19.92
Shortest transaction: 0.00

And with the concurrency raised from 5 to 10:
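
(Same siege invocation as before, just with a bigger -c:)

$ siege -i -c 10 -f urls.txt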


Transactions: 327 hits
Availability: 100.00 %
Elapsed time: 42.20 secs
Data transferred: 1.30 MB
Response time: 0.66 secs
Transaction rate: 7.75 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 5.12
Successful transactions: 318
Failed transactions: 0
Longest transaction: 25.51
Shortest transaction: 0.00

And now if we add 2 more, for a total of 7 nodes, concurrency of 10 gets even better:


Transactions: 531 hits
Availability: 100.00 %
Elapsed time: 53.37 secs
Data transferred: 1.75 MB
Response time: 0.44 secs
Transaction rate: 9.95 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 4.35
Successful transactions: 507
Failed transactions: 0
Longest transaction: 15.49
Shortest transaction: 0.00

And with 2 more (total of 9 units in demo-wiki serving the app):


Transactions: 354 hits
Availability: 100.00 %
Elapsed time: 34.41 secs
Data transferred: 1.23 MB
Response time: 0.41 secs
Transaction rate: 10.29 trans/sec
Throughput: 0.04 MB/sec
Concurrency: 4.22
Successful transactions: 337
Failed transactions: 0
Longest transaction: 11.45
Shortest transaction: 0.00

Anyway, this isn’t a Mediawiki benchmark. This is to show you how easy it is to scale up and down in response to load with Ensemble. We all know that scaling out works; these graphs show it nicely:

[Graphs: Response Time and Transactions per Second]

Notice how the transactions per second went up the whole time, but the response time rose drastically with the jump in concurrency. This is where you need to have the ability to scale quickly, and where, if you can live with the other limitations of EC2 or any other IaaS provider, the cloud should actually win you business, since better response time means more happy users.

Now that my siege is over, I can safely remove the unnecessary units one by one with ‘ensemble remove-unit demo-wiki/9’, and so on. There’s still a lot of room for sugar to be added. We could say “ensemble resize-service demo-wiki 5” and it might just pick 5 to keep and remove the rest, or add 3 to fulfill the request. There are also a ton of other ideas just bubbling up that are really exciting.
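
Just to illustrate the idea, here’s a back-of-the-envelope sketch of what that sugar could do today by wrapping the existing add-unit/remove-unit commands. To be clear, resize-service is not a real ensemble command, and this naive script just scrapes the status output:

#!/bin/sh
# Hypothetical "resize-service" sugar: grow or shrink a service to N units.
# Usage: resize-service.sh demo-wiki 5
SERVICE="$1"
TARGET="$2"
count_units() {
    # Count unit entries like "demo-wiki/0:" in the status output
    ensemble status | grep -c "  $SERVICE/[0-9]*:"
}
# Add units until we reach the target (assumes status reflects new units right away)...
while [ "$(count_units)" -lt "$TARGET" ]; do
    ensemble add-unit "$SERVICE"
done
# ...or remove the highest-numbered units until we are down to it.
while [ "$(count_units)" -gt "$TARGET" ]; do
    UNIT=$(ensemble status | grep -o "$SERVICE/[0-9]*" | sort -t/ -k2 -n | tail -1)
    ensemble remove-unit "$UNIT"
done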

Come say hi and hack on ensemble with us in #ubuntu-ensemble on Freenode, and on the mailing list.

June 3, 2011 at 6:53 pm Comments (0)

Time for some ghetto monitoring

If you came here between April 28 and about an hour ago, you got a “couldn’t connect to database” error. Oops! Seems my limited-memory EC2 instance got a little overwhelmed by PHP processes and decided the db server, drizzled, should die to make more room for PHP. Time to drop pm.max_children.
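
For the curious, that knob lives in the php-fpm pool configuration. A sketch of the fix (the path and the value of 4 are assumptions based on my setup; your packaging may put the pool config elsewhere):

# Cap the number of PHP workers so they can't crowd out drizzled again.
sudo sed -i 's/^pm.max_children = .*/pm.max_children = 4/' /etc/php5/fpm/pool.d/www.conf
sudo service php5-fpm restart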

I don’t have any monitoring setup for the site, so I just now figured it out. Until I get proper monitoring, I’ve installed this fancy bit of duct-tape upstart magic:

# Ghetto upstart hack: "start on stopping" fires this task whenever any other
# job emits a stopping event; $JOB carries the name of the job going down.
start on stopping
task
script
env | mail -s "$JOB is stopping!" me@myemail.com
end script

What does this do? Well, it emails me whenever upstart gives up respawning something, or I manually stop a service.

It’s not monitoring. I need monitoring. But this is a nice little hack to prevent a regression while I figure that out.

May 2, 2011 at 4:54 pm Comments (0)

The 2011 O’Reilly Open Mysql Drizzle Maria Monty Percona Xtra Galera Xeround Tungsten Cloud Database Conference and Expo

Or, for short, the “2011 O’Reilly MySQL Users Conference & Expo”. Yes, that’s the short name of the conference that, thus far, has brought me nothing but good info, good times, and insight into one of the most interesting open source communities around.

MySQL has been at the core of a real revolution in the way data-driven applications have exploded on the internet. It’s so easy to just install it, fire up PHP’s mysql driver, and boom, you’re saving and retrieving data. The *use* of MySQL has always been incredibly simple.

The politics has, at times, been confusing. Dual licensing was sort of an odd concept when MySQL AB was doing it “back in the day”. Nobody really understood how it worked or how they could sell something that was also “free”. But it worked out great for them. InnoDB got bought by Oracle and a lot of people thought “oh noes, MySQL will have no transactional storage, Oracle will kill it.” Well, we can see that’s about 180 degrees from what actually happened (R.I.P. Falcon).

So this year, with the oddness of Oracle not being the top sponsor at an event that had driven a lot of the innovation and collaboration in the MySQL world (ironically, choosing instead to spend their time and effort on a conference called “Collaborate”), I thought “wonderful, more politics”.

But as Brian Aker says in his “State of the ecosystem” post, it was quite the opposite. The absence of the commercial entity responsible for MySQL took a lot of the purely business focused discussion down to almost a whisper, while big ideas and big thinking seemed to be extremely prominent.

Drizzle had quite a few sessions, including my own about what we’ve done with Drizzle in Ubuntu. This is particularly interesting to me because Drizzle is mostly driven by a community effort, though most of the heavy lifting up until now has been sponsored by Sun, and then Rackspace. It’s purely an idea of how a MySQL-like database should be written, and while it may be seeing limited production use now, the discussions were about how it can be used and what it does now, not where it’s going or who is going to pay for its development. It’s such a good idea that I’m pretty convinced users will drive it in much the same way Apache was driven by users wanting to do interesting things with HTTP.

I saw a lot of interesting ideas around replication put forth as well. Galera, Tungsten, and Xeround all seem to be trying to build on MySQL’s success with replication and NDB (a.k.a. MySQL Cluster). I really like that there are multiple takes on how to make a multi-master highly available / scalable system work. Getting all the people using and developing these things into one conference center is always pretty interesting to me.

The keynotes were especially interesting, as they were delivered by people who are sitting at the intersection of the old MySQL world and the new MySQL “ecosystem”. I missed Monty Widenius’s keynote, but it strikes me that he is still leading the charge for a simple, scalable, powerful database system, proving that the core of MySQL is mostly unchanged. Mårten Mickos delivered a really interesting take on how MySQL was part of the last revolution in computing (LAMP) and how it may very well be a big part of the next revolution (IaaS, aka “the cloud”). Brian Aker reinforced that MySQL as a concept, and specifically Drizzle, are just part of your Infrastructure (the I in IaaS).

Then on Thursday, Baron Schwartz blew the whole place up. Go watch the video if you weren’t there or haven’t seen it. Baron has always been insightful in his evaluation of the MySQL ecosystem. Maatkit came around when the community needed it, and on joining Percona I think he brought his clear thinking to Peter’s bold decision making at just the right time to help fuel their rise as one of the most respected consulting firms in the “WebScale” world. So when Baron got up and said that the database is still going to scale up, that MySQL isn’t going to lose to NoSQL or SomeSQL, but rather that the infrastructure would adapt to the data requirements, it caught my attention and got me nodding. And when he plainly called Oracle out for not supporting the conference, there was a hush over the crowd followed by a big sigh. It’s likely that those in attendance were the ones who understand that, and those who weren’t there were probably the ones who need to hear it. I’d guess by now they’ve seen the video or at least heard the call. Either way, thanks Baron for your insight and powerful thoughts.

This was my second MySQL Conference, and I hope it won’t be my last. The mix of users, developers, and business professionals has always struck me as quite unique, as MySQL sits at the intersection of a number of very powerful avenues. Let’s hope that O’Reilly decides to do it again, *and* let’s hope that Oracle gets on board as well.

April 27, 2011 at 5:49 pm Comments (0)

presenting “blog on a narwhal”

Since we’re just about at 11.04 beta2, I figured it’s high time I start using Ubuntu Server for my personal blog.

What? Almost a year at Canonical and my blog wasn’t on Ubuntu server? Well, for over 5 years now, a personal friend has provided me with a free Xen virtual machine to run my blog on. I migrated it off of Debian then, which was sad for me, but back then I was so focused on working I didn’t have time or resources to be picky, so I said OK.

Fast forward to now, I’ve been working on Ubuntu Server and getting ribbed by my co-workers about that “crappy CentOS xen box” they’d see me logged into.

Well, that’s all over now. I decided to marry all the new tech I’ve been playing with lately into one glorious blog migration.

The old blog was:

  • Xen domU
  • 500MB RAM allocated
  • 9GB storage
  • CentOS 5.5
  • Apache + mod_php (5.3.5 from IUS project)
  • MySQL 5.0.77
  • WordPress 3.0.5 manually installed single-site

The new hotness is:

  • EC2 t1.micro (it’s upgradable! ;)
  • 692MB RAM
  • 8GB EBS
  • nginx + php5-fpm (5.3.5 from natty)
  • Drizzle 2011.03.13 (wordpress-plugin 0.0.2)
  • WordPress 3.0.5 from natty in multisite mode

The steps to migrate weren’t altogether complicated: a bit of configuration for nginx to have it serve my PHP using php5-fpm, and copying most of wp-content over. Drizzle couldn’t have been more straightforward (rough shell sketch after the list):

  • Install drizzle7-client from EPEL on CentOS vm
  • drizzledump blog database (drizzledump automatically converts mysql schemas to drizzle compatible ones)
  • load it into drizzle on Ubuntu server
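
In shell terms that boiled down to roughly this (host, user, and database name are made up for illustration):

# On the old CentOS box: dump the blog db with drizzledump, which rewrites
# the MySQL schema into Drizzle-compatible DDL as it goes.
drizzledump --host=localhost --user=blog --password blogdb > blogdb.sql

# On the new Ubuntu server: create the database and load the dump.
drizzle -e 'CREATE DATABASE blogdb'
drizzle blogdb < blogdb.sql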

WordPress still needs *some* help to use Drizzle. Much of this will be handled by the wordpress-drizzle package from my PPA (add-apt-repository ppa:clint-fewbar/drizzle), which filters DDL to change things like LONGTEXT to TEXT. Because Drizzle has done away with the eeeeevil of datetimes with 0000-00-00 as their date (a non-existent date), we need to change all instances of that to ‘0001-01-01’. In the future I’d like to see this abstracted out of wordpress even more, so that it is more aware of the datetime fields and can use actual NULL values. I believe this can be done in the plugin by overloading the insert/update methods. I’ve begun working on that, but for now I’ll just have to keep patching wp-includes/post.php, which seems to be the main user of 0000-00-00 to denote a “draft” post.
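
Until then, the patching amounts to a crude substitution along these lines (the path assumes the Ubuntu wordpress package location; adjust to wherever your wp-includes lives):

# Crude stop-gap: replace WordPress's zero "draft" datetime with one that
# Drizzle will accept. Needs re-applying after every wordpress package update.
sudo sed -i "s/0000-00-00 00:00:00/0001-01-01 00:00:00/g" \
    /usr/share/wordpress/wp-includes/post.php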

We also have to alter the wp_posts table slightly. That’s because wordpress relies on MySQL’s broken “NOT NULL” behavior producing an “empty string” in varchars. This ALTER does that:

ALTER TABLE wp_posts MODIFY COLUMN post_mime_type VARCHAR(100) COLLATE utf8_general_ci DEFAULT '';

Anyway, goodbye CentOS, hello Ubuntu!

April 13, 2011 at 7:26 am Comments (0)

Fewbar.com migrating to new server..

I’ve begun a migration of fewbar.com to a new box.. more details to follow… the site may be a little weird for about 1 hour while DNS propagates.

April 13, 2011 at 4:36 am Comments (0)

Puppet Camp Report: Two very different days

I attended Puppet Camp in San Francisco this month, thanks to my benevolent employer Canonical’s sponsorship of the event.

It was quite an interesting ride. I’d consider myself an intermediate level puppet user, having only edited existing puppet configurations and used it for proof of concept work, not actual giant deployments. I went in large part to get in touch with users and potential users of Ubuntu Server to see what they think of it now, and what they want out of it in the future. Also Puppet is a really interesting technology that I think will be a key part of this march into the cloud that we’ve all begun.

The state of Puppet

This talk was given by Luke, and was a very frank discussion of where puppet is and where it should be going. He also touched on where Puppet Labs fits into this. In brief, puppet is stable and growing. Upon taking a survey of puppet users, the overwhelming majority are sysadmins, which is no surprise. Debian and Ubuntu have equal share amongst survey respondents, but RHEL and CentOS dominate the playing field.

As for the future, there were a couple of things mentioned. Puppet needs some kind of messaging infrastructure, and it seems mCollective will be it. They’re not ready to announce anything, but it seems like a logical choice. There are also plans for centralized data services, so that the data puppet gathers can be made available to other tools as well.

mCollective

Given by mCollective’s author, whose name escapes me, this was a live demo of what mCollective can do for you. It’s basically a highly scalable messaging framework that is not necessarily tied to puppet. You simply need to write an agent that will subscribe to your messages. Currently only ActiveMQ is supported, but it uses STOMP, so any queueing system that speaks STOMP should be able to utilize the same driver.

Once you have these agents consuming messages, you just have to get creative about what they can do. He currently has some puppet-focused agents and client code to pull data out of puppet and act accordingly. Ultimately, you could do much of this with something like Capistrano and parallel ssh, but this seems to scale well. One audience member boasted that they have over 1000 nodes using mCollective to perform tasks.

The Un-Conference

Puppet Camp took the form of an “unconference”, where there were just a few talks and a bunch of sessions based on what people wanted to talk about. I didn’t propose anything, as I did not come with an agenda, but I was definitely interested in a few of the topics:

Puppet CA

My colleague at Canonical, Mathias Gug, proposed a discussion of the puppet CA mechanics, and it definitely interested me. Puppet uses the PKI system to verify clients and servers. The default mode of operation is for a new client to contact the configured puppet master, and submit a “CSR” or “Certificate Signing Request” to it. The puppet master administrator then verifies that the CSR is from one of their hosts, and signs it, allowing both sides to communicate with some degree of certainty that the certificates are valid.
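
For anyone who hasn’t done this dance, the manual signing workflow on the puppet master looks roughly like this (hostname invented; puppetca is the command from the puppet versions current at the time):

# List the CSRs waiting for a human to vouch for them, then sign the one
# you recognize as one of your own machines.
puppetca --list
puppetca --sign web01.example.com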

Well there’s another option, which is just “autosign”. This works great on a LAN where access is highly guarded, as it no longer requires you to verify that your machine submitted the CSR. However, if you have any doubts about your network security, this is dangerous. An attacker can use this access to download all of your configuration information, which could contain password hashes, hidden hostnames, and any number of other things that you probably don’t want to share.

When you add the cloud to this mix, it’s even more important that you not just trust any host. IaaS cloud instances come and go all the time, with different hostnames/IPs and properties. Mathias had actually proposed an enhancement to puppet to add a unique ID attribute for CSRs made in the cloud, but there was a problem with the ruby OpenSSL library that wouldn’t allow these attributes to be added to the certificate. We discussed possibly generating the certificate beforehand using the openssl binary, but this doesn’t look like it will work without code changes to Puppet. I am not sure where we’ll go from there.
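
The “generate it beforehand” idea amounts to something like the sketch below (the names are hypothetical); the sticking point is then convincing Puppet to use a key and CSR it didn’t generate itself:

# Pre-generate the client key and CSR outside of puppet, so that extra
# attributes (like a unique instance ID) could in theory be baked in here.
openssl genrsa -out i-1234abcd.pem 2048
openssl req -new -key i-1234abcd.pem \
    -subj "/CN=i-1234abcd.internal.example.com" -out i-1234abcd.csr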

Puppet Instrumentation

I’m always interested to see what people are doing to measure their success. I think a lot of times we throw up whatever graph or alert monitoring is pre-packaged with something, and figure we’ve done our part. There wasn’t a real consensus on what the important things to measure were. As usual, sysadmins who are running puppet are pressed for time, and often measurement of their own processes falls by the wayside under the pressure to measure everybody else.

Other stuff

There were a number of other sessions and discussions, but none that really jumped out at me. On the second day, an employee from Google’s IT department gave a talk about Google’s massive puppet infrastructure. He explained that it is only used for IT support, not production systems, though he wasn’t able to go into much more detail. Twitter also gave some info about how they use puppet for their production servers, and there was an interesting discussion about the line between code and infrastructure deployment. This stemmed from a question I asked about why they didn’t use their awesome bittorrent-based “murder” code distribution system to deploy puppet rules. The answer was “because murder is for code, and this is infrastructure”.

Cloud10/Awstrial

So this was actually the coolest part of the trip. Early on the second day, during the announcements, the (sometimes hilarious) MC, Deepak, mentioned that there would be a beginner puppet session later in the day. He asked that attendees of that session try to have a machine ready, so that the presenter, Dan Bode, could give them some examples to try out.

Some guys on the Canonical server team had been working on a project called “Cloud 10” for the release of Ubuntu 10.10, which was coming in just a couple of days. They had thrown together a django app called awstrial that could be used to fire up EC2 or UEC images for free, for a limited period. The reason for this was to allow people to try Ubuntu Server 10.10 out for an hour on EC2. I immediately wondered though.. “Maybe we could just provide the puppet beginner class with instances to try out!”

Huzzah! I mentioned this to Mathias, and he and I started bugging our team members about getting this setup. That was at 9:00am. By noon, 3 hours later, the app had been installed on a fresh EC2 instance, a DNS pointer had been created pointing to said instance, and the whole thing had been tweaked to reference puppet camp and allow the users to have 3 hours instead of 55 minutes.

As lunch began, Mathias announced that users could go to “puppet.ec42.net” in a browser and use their Launchpad or Ubuntu SSO credentials to spawn an instance.

A while later, when the beginner class started, 25 users had signed on and started instances. Unfortunately, the instances died after 55 minutes due to a bug in the code, but ultimately, the users were able to poke around with these instances and try out stuff Dan was suggesting. This made Canonical look good, it made Ubuntu look good, and it definitely has sparked a lot of discussion internally about what we might do with this little web app in the future to ease the process of demoing and training on Ubuntu Server.

And what’s even more awesome about working at Canonical? This little web app, awstrial, is open source. Sweet, so anybody can help us out making it better, and even show us more creative ways to use it.


October 21, 2010 at 4:54 pm Comments (0)

Balance Your Cloud

Seems like eons ago (just under 6 months..) when I joined Canonical, and hopped on a plane headed for Brussels and UDS-Maverick.

What a whirlwind, attending sessions, meeting the real rock stars of the Ubuntu world, and getting to know my super distributed team.

One of the sessions was based on a blueprint for load balancing in the cloud. The idea was that rather than relying on Amazon’s Elastic Load Balancer, you could build your own solution that you could possibly even move around between UEC, EC2, or even Rackspace clouds.

Well, it got a lower priority than some other stuff, so unfortunately many parts got dropped (like ELB-compatible CLI tools).

But, I managed to find the time to create a proof of concept for managing haproxy’s config file (perhaps my first real python project), and write up a HOWTO for using it.

Honestly, it’s not the best HOWTO I’ve ever written. It’s got a lot of stuff left out. But it should be enough to get most admins past the “tinker for a few hours” phase and into the “tinker for 40 minutes right before getting it working, then passing out on your keyboard at 4:00am” phase. I know that’s how far it got me..


October 5, 2010 at 7:14 am Comments (0)

Cloud Computing Security

Cloud Computing Security.

The linked presentation above came up in a discussion the other day on IRC about what to do with certificates and SSH host keys.

I hadn’t really thought about this. Sometimes it feels like once you put on your “somebody else is thinking about security” blinders, the world just starts moving faster and the ideas get more interesting. Unfortunately, at this high speed, I have to wonder if the impact may not be fatal for some heavy cloud (ab)users.

To “see what I’m on about”,  skip ahead to slide #66 to see the bits about random numbers.

I keep thinking back to the days where I would open up “pSSH” on my Palm Treo 650 and it would warn me “This device has no real random number capabilities, so the crypto is probably pretty sketchy, be careful.” Unfortunately, our ssh clients on cloud instances aren’t telling us that. Somebody needs to put “fix random seeding in the cloud” on their todo list. Oh wait, I just did.


July 7, 2010 at 3:52 pm Comments (0)

“Protecting “Cloud” Secrets with Grendel”

May 28, 2010 at 8:03 am Comments (0)

UDS Maverick – day2 highlights

  • btrfs – BTRFS is pretty awesome; with filesystem-level snapshotting and compression, it promises to make some waves on the server and on small devices. Unfortunately, it’s still marked as EXPERIMENTAL by its own developers, and there are known bugs. However, you can choose to play with it in Ubuntu 10.04, which should be helpful for people finding and submitting bugs so the developers can feel better about people using it. There is a desire to have it as the default filesystem for the next Ubuntu LTS release, which is pretty exciting.
  • Monitoring is too easy – Any time I see 10+ implementations of the same idea, I figure it’s probably something that is easy enough that people tend to write their own instead of searching for a solution. Monitoring and graphing seem to be in this category, with many solutions such as nagios, opennms, zenoss, munin, ganglia… the list goes on and on. We talked a lot about what to do in Ubuntu Server to make sure this is done well and makes sense, and basically ran out of time. The best part of the session, though, was that we decided to focus on solving the data collection problem first, so each server takes responsibility for itself, and then allow centralized aggregation at another level.
  • Server Community – There is some desire to have people test Ubuntu Server before a release, especially for the LTS releases. A beta program was proposed, but there is some doubt (my own included) that this will actually get people to test before the .0 release. Basically, I have to think that server admins aren’t interested in even trying something in an unstable state. They’ll take the .0 and build a new server rev, but they’re not going to go around upgrading stable servers. This needs more thought and discussion, definitely.

I’m sitting in the first session for Wednesday now, listening to a talk about the next 6 months of Ubuntu Enterprise Cloud and Eucalyptus development. Very exciting stuff!


May 12, 2010 at 7:57 am Comments (0)
