– Full Frontal Nerdity

Clint Byrum's Personal Stuff

Juju Community Survey

If you’ve heard of Juju (and chances are, if you’re reading my blog, you have!), take 5 minutes and go take our Juju Community Survey

November 6, 2012 at 10:22 pm Comments (0)

Nagios is from Mars, and MySQL is from Venus (Monitoring Part 2)

In my previous post about Nagios, I showed how the rich Nagios charm simplifies adding basic monitoring to Juju environments. But, we need more than that. Services know more about how to verify they are working than a monitoring system could ever guess.

So, for that, we have the ‘monitors’ interface. It is sparsely documented at the moment, as it is extremely new, but the idea is simple:

  • Providing charms define, generically, what monitoring systems should look at.
  • Requiring charms monitor whichever of those things they can, ignoring those they cannot.

This is defined through a YAML file. Here is the example.monitors.yaml included with the nagios charm:

# Version of the spec, mostly ignored but 0.3 is the current one
version: '0.3'
# Dict with just 'local' and 'remote' as parts
monitors:
    # local monitors need an agent to be handled. See nrpe charm for
    # some example implementations
    local:
        # procrunning checks for a running process named X (no path)
        procrunning:
            # Multiple procrunning can be defined, this is the "name" of it
            nagios3:
                min: 1
                max: 1
                executable: nagios3
    # Remote monitors can be polled directly by a remote system
    remote:
        # do a request on the HTTP protocol
        http:
            nagios:
                port: 80
                path: /nagios3/
                # expected status response (otherwise just look for 200)
                status: 'HTTP/1.1 401'
                # Use as the Host: header (the server address will still be used to connect() to)
        mysql:
            # Named basic check
            basic:
                username: monitors
                password: abcdefg123456

There are two main classes of monitors, local and remote, named for where they run relative to the service unit. Local monitors are intended to run inside the same machine/container as the service. Remote monitors are then, quite obviously, meant to run outside the machine/container. So, above, you see remote monitors for the mysql protocol and HTTP, and a local monitor to check whether processes are running.

The MySQL charm now includes some of these monitors:

version: '0.3'
monitors:
    local:
        procrunning:
            mysql:
                name: MySQL Running
                min: 1
                max: 1
                executable: mysqld
    remote:
        mysql:
            basic:
                user: monitors

The remote part is fed directly to Nagios, which knows how to monitor MySQL remotely, and so translates it into a check_mysql command. The local bits are ignored by Nagios. But when we also relate the subordinate NRPE charm to a MySQL service, we get an agent which does understand local. It actually converts those local monitors into remote monitors of type ‘nrpe’, which Nagios does understand. So, upon relating NRPE to Nagios, each subordinate unit feeds its unique NRPE monitors back to Nagios, and they are added to the target unit’s monitors.
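
As a rough illustration only (the exact keys NRPE publishes are an implementation detail of the charms, so treat every name below as an assumption, not the real wire format), the MySQL charm's local process check might be re-published to Nagios as something like:

```yaml
# Hypothetical re-published monitor: the 'nrpe' monitor type is described in
# the post, but the nested keys here are illustrative placeholders.
version: '0.3'
monitors:
    remote:
        nrpe:
            mysql:
                command: check_proc_mysqld
```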

Honestly, this all sounds very complicated. But luckily, you don’t really have to grasp it to take advantage of it in a charm. The whole point is this: all you need to do is write a monitors.yaml and add the monitors relation to your charm, with a joined hook like this:

# .. Anything you need to do to enable the monitoring host to access ports/users/etc goes here
relation-set monitors="$(cat monitors.yaml)" target-id=${JUJU_UNIT_NAME//\//-} target-address=$(unit-get private-address)
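
The only subtle bit in that hook is the ${JUJU_UNIT_NAME//\//-} expansion, which turns the unit name into a Nagios-friendly identifier by swapping slashes for dashes. Here it is in isolation (the unit name is just an example; in a real hook, juju sets JUJU_UNIT_NAME for you):

```shell
#!/bin/bash
# Simulate the hook environment; juju would normally set this variable.
JUJU_UNIT_NAME="mysql/0"

# ${VAR//pattern/replacement} is bash pattern substitution: replace every
# '/' with '-', since '/' is not a friendly character for a host identifier.
target_id=${JUJU_UNIT_NAME//\//-}

echo "$target_id"   # prints: mysql-0
```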

If you have local things you want to give to a monitoring agent, you can use the ‘local-monitors’ interface, which is basically the same as monitors, but only ever used in container scoped relations required by subordinate charms such as NRPE or collectd.
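
In metadata.yaml terms, a subordinate like NRPE would provide ‘monitors’ to Nagios while requiring a container-scoped ‘local-monitors’ relation from its principal. A sketch (the stanza layout is juju's standard charm metadata format, but the real NRPE charm's metadata may differ in detail):

```yaml
provides:
  monitors:
    interface: monitors
requires:
  local-monitors:
    interface: local-monitors
    scope: container
```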

Now you can easily provide monitors to any monitoring system. If Nagios doesn’t support what you want to monitor, it’s fairly easy to add support. And as more monitoring systems are charmed and have the monitors interface added, your charm will become more useful out of the box.

In the next post, which will wrap up this series on monitoring, I’ll talk about how to add monitors support to some other monitoring systems such as collectd, and also how to write a subordinate charm to communicate your monitors to an external monitoring service.

September 5, 2012 at 5:50 am Comments (0)

Juju and Nagios, sittin’ in a tree.. (Part 1)

Monitoring. Could it get any more nerdy than monitoring? Well, I think we can make monitoring cool again…


If you’re using Juju, Nagios is about to get a lot easier to leverage in your environment. Anyone who has ever tried to automate their Nagios configuration knows that it can be daunting. Nagios is so flexible and has so many options that it’s hard to get right even by hand; automating it requires even more thought. Part of this is because monitoring itself is a bit hard to genericize. There are lots of types of monitors. Nagios really focuses on two of these:

  • Service monitoring – Make a script that pretends to be a user and see if your synthetic monitor sees what you expect.
  • Resource monitoring – Look at the counters and metrics afforded to a user of a normal system.

The trick is, service monitoring wants to interrogate the real services from outside the machine, while resource monitoring wants to see things only visible with privileged access. This is why we have NRPE, or “Nagios Remote Plugin Executor” (and NSCA, and munin, but ignore those for now). NRPE is a little daemon that runs on a server and, when asked by Nagios, runs a nagios plugin script and returns the result. With this you get the privileged things, like how much RAM and disk space is used.

Normally, when you want to use Nagios, you need to sit down and figure out how to tell it to monitor all of your stuff. This involves creating generic objects, figuring out how to get your list of hosts into Nagios’s config files, and how to get the classifications for said hosts into Nagios. Does anybody trying to make sure their pager goes off when things are broken actually want to learn Nagios?

So, here’s how to get Nagios into your Juju environment. First, let’s assume you have deployed a stack of applications:

juju deploy mysql wikidb                # single MySQL db server
juju deploy haproxy wikibalancer        # and single haproxy load balancer
juju deploy -n 5 mediawiki wiki-app     # 5 app-server nodes to handle mediawiki
juju deploy memcached wiki-cache        # memcached
juju add-relation wikidb:db wiki-app:db # use wikidb service as r/w db for app
juju add-relation wiki-app wikibalancer # load balance wiki-app behind haproxy
juju add-relation wiki-cache wiki-app   # use wiki-cache service for wiki-app

This gives one a nice stack of services that is pretty common in most applications today, with a DB and cache for persistent and ephemeral storage and then many app nodes to scale the heavy lifting.

Now you have your app running, but what about when it breaks? How will you find out? Well this is where Nagios comes in:

juju deploy nagios                          # custom nagios charm
juju add-relation nagios wikidb             # monitor wikidb via nagios
juju add-relation nagios wiki-app           # ""
juju add-relation nagios wikibalancer       # ""

You should now have Nagios monitoring things. You can check it out by exposing it and then browsing to the hostname of the nagios instance at ‘http://x.x.x.x/nagios3’. You can find out the password for the ‘nagiosadmin’ user by catting a file that the charm leaves for this purpose:

juju ssh nagios/0 sudo cat /var/lib/juju/nagios.passwd

Now, the checks are very sparse at the moment. This is because we have used the generic monitoring interface which can just monitor the basic things (SSH, ping, etc). We can add some resource monitoring by deploying NRPE:

juju deploy nrpe                          # create a subordinate NRPE service
juju add-relation nrpe wikibalancer       # Put NRPE on wikibalancer
juju add-relation nrpe wiki-app           # Put NRPE on wiki-app
juju add-relation nrpe:monitors nagios:monitors # Tells Nagios to monitor all NRPEs

Now we will get memory stats, root filesystem, etc.

You may have noticed we left off wikidb; that is because you will see an ambiguous relation warning when you try this:

juju add-relation nrpe wikidb # Put NRPE on wikidb

ERROR Ambiguous relation 'nrpe mysql'; could refer to:
  'nrpe:general-info mysql:juju-info' (juju-info client / juju-info server)
  'nrpe:local-monitors mysql:local-monitors' (local-monitors client / local-monitors server)

This is because mysql has special support for specifying its own local monitors in addition to those in the usual basic group (more on this in part 2). To get around this, we use:

juju add-relation nrpe:local-monitors wikidb:local-monitors


This is a perfect example of how Juju’s encapsulation around services pays off for re-usability. By wrapping a service like Nagios in a charm, we can start to really develop a set of best practices for using that service and collaborate around making it better for everyone.

Of course, Chef and Puppet users can get this done with existing Nagios modules. Puppet, in particular, has really great Nagios support. However, I want to take a step back and explain why I think Juju has a place alongside those methods and will accelerate systems engineering in new directions.

While there is some level of encapsulation in the methods that Chef and Puppet put forth, they’re not fully encapsulated in the way they interact with other components in a Chef or Puppet system. In most cases, you still have to edit your own service configs to add specific Nagios integration. This works for the custom case, but it does not make it easy for users to collaborate on the way to deploy well-known systems. It will also be hard to swap out components for new, better methods as they emerge. Every time you mention Nagios in your code, you push Nagios deeper into your systems engineering.

With the method I’ve outlined above, any charmed service can be monitored for basic stats (including the 80 or so charms in the official charm store). You might ask, though: what about custom Nagios plugins, or specifying more elaborate but still somewhat generic service checks? That is all coming; I will show some examples in my next post. I will also go on to show how Nagios + NRPE can be replaced with collectd, or some other system, without changing the charms that have implemented rich monitoring support.

So, while this at least starts to bring the official Nagios charm up to par with configuration management’s rich Nagios support, it also sets the stage for replacing Nagios with other things. The key difference, as you’ll see in the next few parts, is that none of the charms have to mention “Nagios”. They just describe what to monitor, and Nagios, collectd, or whatever other system you have in place will find a way to interpret that and monitor it.

August 7, 2012 at 11:47 pm Comments (0)

Juju constraints unbinds your machines

This week, William “I code more than you will ever be able to” Reade announced that Juju has a new feature called ‘Constraints’.

This is really, really cool and brings juju into a new area of capability for deploying big and little sites.

To be clear, this allows you to abstract things pretty effectively.

Consider this:

juju deploy mysql --constraints mem=10G
juju deploy statusnet --constraints cpu=1

This will result in your mysql service being on an extra large instance, since that has 15GB of RAM. Your statusnet instances will be m1.smalls, since those have just 1 ECU.

Even cooler than this is now if you want a mysql slave in a different availability zone:

juju deploy mysql --constraints ec2-zone=a mysql-a
juju deploy mysql --constraints ec2-zone=b mysql-b
juju add-relation mysql-a:master mysql-b:slave
juju add-relation statusnet mysql-a

Now, if mysql-a goes down:

juju remove-relation statusnet mysql-a
juju add-relation statusnet mysql-b

Much more is possible, but this really does make juju even more compelling as a tool for simple, easy deployment. Edit: fixed ec2-zone to be the single character, per William’s feedback.

April 16, 2012 at 10:48 pm Comments (0)

Configurate your juju’s

I was reading Jorge’s Stomp Box earlier today, and somebody mentioned how it would be an even better trick if it were easier to configure juju quickly.

Ask and ye shall receive. I hacked a new sub-command into the experimental ‘juju-jitsu’ wrapper. I’ll let the scrape from my terminal do the talking. You can get it with:

bzr branch lp:juju-jitsu

And try it with:

juju setup-environment

clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ ./wrap-juju
Aliasing juju to /home/clint/src/juju-jitsu/juju-jitsu/juju-jitsu-wrapper...
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ juju setup-environment
Name for environment name : mybox
What provider do you want to use? (ec2,local) type [local]:
Default "series", a.k.a. release codename of Ubunt default-series [precise]:
local dir to store logs/directory structure/charm data-dir [~/.juju/data]:
data-dir: ~/.juju/data
default-series: precise
type: local

Do you want to

[s]ave this to /home/clint/.juju/environments.yaml
[d]iff with existing /home/clint/.juju/environments.yaml

[sdq]: d
diff: /home/clint/.juju/environments.yaml: No such file or directory
[sdq]: s
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ cat ~/.juju/environments.yaml
data-dir: ~/.juju/data
default-series: precise
type: local
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ juju setup-environment
Name for environment name : mycloud
What provider do you want to use? (ec2,local) type [local]: ec2
Default "series", a.k.a. release codename of Ubunt default-series [precise]:
S3 Bucket to store data in control-bucket [juju-jitsu-D8mzlogDmvPjTpASolJnXK6HxwAW8YA8]:
Zookeeper Secret admin-secret [qUJIUiwji-jiAN-XTSC1ztxUrm2XrYys]:
(AWS_SECRET_ACCESS_KEY) secret-key [xYxYxYxYxYxYxYxYxYxY/WvWvWvWvWvWv]:
Default Instance Type (m1.small, c1.medium, etc default-instance-type :
Default AMI default-image-id :
EC2 Region region :
EC2 URI ec2-uri :
S3 URI s3-uri :
data-dir: ~/.juju/data
default-series: precise
type: local
admin-secret: qUJIUiwji-jiAN-XTSC1ztxUrm2XrYys
control-bucket: juju-jitsu-D8mzlogDmvPjTpASolJnXK6HxwAW8YA8
default-series: precise
secret-key: xYxYxYxYxYxYxYxYxYxY/WvWvWvWvWvWv
type: ec2

Do you want to

[s]ave this to /home/clint/.juju/environments.yaml
[d]iff with existing /home/clint/.juju/environments.yaml

[sdq]: d
--- /home/clint/.juju/environments.yaml 2012-03-15 16:36:47.939298045 -0700
+++ /home/clint/.juju/.environments.yaml.QqwCt_ 2012-03-15 16:37:20.629394484 -0700
@@ -3,3 +3,10 @@
data-dir: ~/.juju/data
default-series: precise
type: local
+ mycloud:
+ admin-secret: qUJIUiwji-jiAN-XTSC1ztxUrm2XrYys
+ control-bucket: juju-jitsu-D8mzlogDmvPjTpASolJnXK6HxwAW8YA8
+ default-series: precise
+ secret-key: xYxYxYxYxYxYxYxYxYxY/WvWvWvWvWvWv
+ type: ec2
[sdq]: s
2012-03-15 16:37:23,799 juju-jitsu Backing up /home/clint/.juju/environments.yaml to /home/clint/.juju/environments.yaml.2
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ bzr info
Repository tree (format: 2a)
shared repository: /home/clint/src/juju-jitsu
repository branch: .

Related branches:
push branch: bzr+ssh://
parent branch: bzr+ssh://
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$
March 15, 2012 at 11:46 pm Comments (0)

Precise is coming

Almost 2 years ago, I stepped out of my comfort zone at a “SaaS” web company and joined the Canonical Server Team to work on Ubuntu Server development full time.

I didn’t really grasp what I had walked into, joining the team right after an LTS release. The 10.04 release was a monumental effort that spanned the previous 2 years. Call me a nerd if you want, but I get excited about a Free, unified desktop and server OS built entirely in the open, out of open source components, fully supported for 5 years on the server.

Winter, and the Precise Pangolin, are coming

And now, we’re about to do it again. Precise beta1 is looking really solid, and I am immensely proud to have been a tiny part of that.

So, what did we do on the server team that has led to precise’s awesomeness?

Ubuntu 10.10 “Maverick Meerkat” – We helped out with getting CEPH into Debian and Ubuntu for 10.10, which proved to be important, as it gave users a way to try out CEPH. CEPH will ship in main, fully supported by Canonical, in 12.04, which is pretty exciting! This was also the first release to feature pieces of OpenStack.

Ubuntu 11.04 “Natty Narwhal” – Upstart got a lot better for server users in 11.04 with the addition of “override” files, and the shiny new Upstart Cookbook. We also finally figured out how to coordinate complicated boot sequences without having to rewrite upstart to track state. I wasn’t personally involved, but we also shipped the first really usable OpenStack release, “cactus”.

Ubuntu 11.10 “Oneiric Ocelot” – It seems small, but we fixed boot-up race conditions caused by services which need their network interfaces to be up before they start. Upstart also landed full chroot support, so you can run a chroot with its own upstart services inside of it, which is important for some use cases. This release also featured the debut of Juju, which is a new way to deploy and manage network services and applications.

Ubuntu 12.04 “Precise Pangolin” – OpenStack Essex is huge. Full keystone integration, lots of new features, and lots of satellite projects. Juju has really grown into a useful project now (give it a spin!). We also were able to transition to MySQL 5.5, which was no small feat. The amount of automated continuous integration testing that has gone into the precise cycle is staggering, and continues to grow as test cases are added. We’ll never find all the bugs this way, but we’ve at least found many of them before they ever reached a stable release this time.

There’s so much more in each of these; it’s amazing how much has been improved and refined in Ubuntu Server in just 2 years.

I’m pumped. A new LTS is exciting for us in Ubuntu Development, as it refocuses the more conservative users on all the work we’ve been doing. I would love to hear any feedback from the greater community. This is going to be great!

March 1, 2012 at 10:47 pm Comments (0)

But will it scale? – Taking Limesurvey horizontal with juju…

One of the really cool things about using the cloud, and especially juju, is that it instantly enables things that oftentimes take a lot of thought to even try out in traditional environments. While I was developing some little PHP apps “back in the day”, I knew eventually they’d need to go to more than one server, but testing them for that meant, well, finding and configuring multiple servers. Even with VMs, I had to go allocate and configure one. Oops, out of time; throw it on one server, pray, move on to the next task.

This left a very serious question in my mind.. “When the time comes, will my app actually scale?”

Have I forgotten some huge piece to make sure it is stateless, or will it scale horizontally the way I intended it to? Things have changed though, and now we have the ability to start virtual machines via an API on several providers, and actually *test* whether our app will scale.

This brings us to our story. Recently, Nick Barcet created a juju charm for Limesurvey. This is a really cool little app that lets users create rich, multi-faceted surveys and invite the internet to vote on things, answer questions, etc. This is your standard “LAMP” application, and it seems written in a way that will allow it to scale out.

However, when Nick submitted the charm for the official juju charms collection, I wanted to see if it actually would scale the way I knew LAMP apps should. So, I fired up juju on ec2, threw in some haproxy, and related it to my limesurvey service, and then started adding units. This is incredibly simple with juju:

juju deploy --repository charms local:mysql
juju deploy --repository charms local:limesurvey
juju deploy --repository charms local:haproxy
juju add-relation mysql limesurvey
juju add-relation limesurvey haproxy
juju add-unit limesurvey
juju expose haproxy

Lo and behold, it didn’t scale. There were a few issues with the default recommendations of limesurvey that Nick had encoded into the charm. These were simple things, like assuming that the local hostname would be the hostname people use to access the site.

Once that was solved, some other scaling problems were immediately revealed. First on the ticket was that Limesurvey, by default, uses MyISAM for its storage engine in MySQL. This is a huge mistake, and I can’t imagine why *anybody* would use MyISAM in a modern application. MyISAM uses whole-table locking for both reads and writes, so whenever anything writes to any part of the table, all reads and writes must wait for it to finish. InnoDB, available since MySQL 4.0 and the default storage engine as of MySQL 5.5, doesn’t suffer from this problem, as it implements an MVCC model and row-level locks to allow concurrent reads and writes.
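
Converting an existing database out of MyISAM is mechanical. Something like the loop below generates the conversion statements, which you would then pipe into the mysql client; the table names here are placeholders, not Limesurvey's real schema (in practice you'd pull the list from information_schema.tables where engine='MyISAM'):

```shell
#!/bin/bash
# Emit one ALTER TABLE statement per table to convert it to InnoDB.
# The table list below is a placeholder for illustration only.
for t in lime_sessions lime_surveys lime_answers; do
    echo "ALTER TABLE \`$t\` ENGINE=InnoDB;"
done
# Pipe the output into e.g. `mysql limesurvey` to actually apply it.
```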

The MyISAM locks caused request timeouts when I pointed siege at the load balancer, because too many requests were stacking up waiting for updates to complete before even reading from the table. This is especially critical on something like the session storage that limesurvey does in the database, as it effectively meant that only one user can do anything at a time with the database.

Scalability testing in 10 minutes or less, with a server investment of about $1US. Who knew it could be this easy? Granted, I stopped at three app server nodes, and we didn’t even get to scaling out the database (something limesurvey doesn’t really have native support for). But these are things that are already solved and encoded in charms. Now we just have to suggest small app changes to let users take advantage of all those well-known best practices sitting in charms.

(check the bug comments for the results, I’d be interested if somebody wants to repeat the test).

So, in a situation where one needs to deploy now, and scale later, I think juju will prove quite useful. It should be on anybody’s radar who wants to get off the ground quickly.

December 23, 2011 at 1:41 am Comments (0)

Juju ODS Demo – The Home Version

A few weeks ago I gave a live demo during Canonical CEO Jane Silber’s keynote at the Essex OpenStack Conference, which was held in Boston October 4-7 (See my previous post for details of the conference and summit). The demo was meant to showcase our new favorite cloud technology at Canonical, juju. In order to do this, we deployed hadoop on top of our private OpenStack cloud (also deployed earlier in the week via juju and Ubuntu Orchestra) and fed it a “real” workload (a big giant chunk of data to sort) in less than 5 minutes.

I’ve had a few requests to explain how it works, so, here is a step by step on how to repeat said demo.

First, you need to setup juju to be able to talk to your cloud. The simplest way to do this is to sign up for an AWS account on Amazon, and get EC2 credentials (a secret key and a key ID is needed).

If you install juju in Ubuntu 11.10, or from the daily build PPA in any other release, you’ll get a skeleton environments.yaml just by running ‘juju’.

Once this is done, edit ~/.juju/environments.yaml to add your access-key: and secret-key:. Optionally, you can set them in AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment.
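
The result is an environments.yaml along these lines; every value below is a placeholder, and the keys follow the juju environment format of the time:

```yaml
# ~/.juju/environments.yaml -- all values are placeholders
environments:
  sample:
    type: ec2
    access-key: YOUR-AWS-ACCESS-KEY-ID
    secret-key: YOUR-AWS-SECRET-KEY
    control-bucket: juju-some-unique-bucket-name
    admin-secret: some-long-random-string
    default-series: oneiric
```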

Now, you need the “magic” bit that turns juju status changes into commands for the “gource” source code visualization tool. It’s available here:

(wgettable here)

You’ll also need to install the ‘gource’ visualization tool. I only tried this on Ubuntu 11.10, but it is available on other releases as well.

Make sure your desired target environment is either the only one in .juju/environments.yaml, or set to be the default with ‘default: xxxx’ at the root of the file. You need ‘juju status’ to return something meaningful (after bootstrap) for this to work.

Now, run this in its own terminal. Note that cof_orange_hex.png is part of the official Ubuntu logo packs, but I forget where I got it. You may omit that command line argument if you like, and a generic “person” image will be used.

python -u | gource --highlight-dirs \
--file-idle-time 1000000 \
--log-format custom \
--default-user-image cof_orange_hex.png \
--user-friction 0.5

This will not show anything until juju bootstrap is done and ‘juju status’ shows the machine 0 running. If you already have services deployed, it should build the tree rapidly.

So, next, if you haven’t done it already:

juju bootstrap

Once your instance starts up, you should see a gource window pop up and the first two bits, the bootstrap node and the machine 0 node, will be added.

Once this is done, you can just deploy/add-relation/etc. to your heart’s content.

To setup a local repo of charms, we did this:

mkdir charms
bzr init-repo charms/oneiric
cd charms/oneiric
bzr branch lp:~mark-mims/+junk/charm-hadoop-master hadoop-master
bzr branch lp:~mark-mims/+junk/charm-hadoop-slave hadoop-slave
bzr branch lp:~mark-mims/+junk/charm-ganglia ganglia

Those particular charms were specifically made for the demo, but most of the changes have been folded back into the main “charm collection”, so you can probably change lp:~mark-mims/+junk/charm- to lp:charm/.

You will also need a file in your current directory called ‘config.yaml’ with this content:

namenode:
  job_size: 100000
  job_maps: 10
  job_reduces: 10
  job_data_dir: in_one
  job_output_dir: out_one

These numbers heavily control how the job runs with 1 or 100 hadoop instances. If you want to spend a couple of bucks on Amazon and fire up 20 nodes, then raise job_maps and job_reduces to 100, and job_size to 10000000. Otherwise it’s over very fast!

We started the demo after bootstrap was already done, so the next step is to deploy Hadoop/HDFS and ganglia to keep an eye on the nodes as they came up.

juju deploy --repository . --config config.yaml hadoop-master namenode
juju deploy --repository . hadoop-slave datacluster
juju deploy --repository . ganglia jobmonitor
juju add-relation namenode datacluster
juju add-relation datacluster jobmonitor
juju expose jobmonitor

This should get you a tree in gource showing all of the machines, services, and relations that are setup.

You can scale out hadoop next with this command. Here I only create 4, but it could be 100, depending on how fast you need your data map/reduced.

for i in 1 2 3 4 ; do juju add-unit datacluster ; done

Finally, to start the teragen/terasort:

juju ssh namenode/0

$ sudo -u hdfs sh /usr/lib/hadoop/

You may also want to note the hostname of the machine assigned to the jobmonitor node so you can bring it up in a browser. You will be able to see it in ‘juju status’.

It’s worth noting that we had a fail rate of about 1 in 20 tries while practicing the demo because of this bug:

This causes the “juju expose jobmonitor” to fail, which means you may not be able to reach the ganglia instance. You can fix this by stopping/starting the provisioning agent on the bootstrap node. That is easier said than done, but it can be scripted. It’s fixed in juju’s trunk, so if you are using the daily build, not the distro version, you shouldn’t see that issue.

So once you’re done, you’ll probably want to get rid of all these nodes you’ve created. Juju has a tool that strips everything down that it has brought up, which can be dangerous if you have data on the nodes, so be careful!

juju destroy-environment

It does not have a ‘--force’ or ‘-y’, by design. Make sure to keep the gource running when you do this. Say ‘y’, and then enjoy the show at the end. :)

I’d be interested to hear from anybody who is brave enough to try this how their experience is!

October 22, 2011 at 12:03 am Comments (0)

OpenStack – an amoeba on a mission

According to NASA, 70% of the earth is covered by clouds. Apparently, at least 70% of our computing needs can be covered by clouds as well. That seems to be the shared belief by the rather large crowd that gathered in Boston last week for the Essex edition of the OpenStack Design Summit and subsequent OpenStack Conference.

The amount of energy and corporate investment in OpenStack is staggering when one considers that it didn’t exist 2 years ago, and didn’t really do much more than spawn VMs and store objects until this month’s Diablo release, which added some more capabilities but, from my point of view, mostly refined those abilities and set the stage for the future.

Attending as a member of the Ubuntu Server team and a Canonical employee was quite a gratifying experience. Ubuntu Server has been the platform of choice for OpenStack’s development, and that has definitely led to a lot of people running OpenStack on Ubuntu Server. It’s always nice to hear that your work is part of something greater.

On the surface, one might be concerned at a lack of vision in the OpenStack project. With so many competing interests, it may appear that it has no clear vision and is just growing toward the latest source of funding or food, much like an amoeba swallowing up its next meal. But the leadership of the project seems to understand that there is still a much greater mission here, that without intense focus the project will expend enormous energy and accomplish little more than falling a little less behind established players in the marketplace.

It’s a bit vindicating for one of my more intense current interests, Juju, that others close to this discussion, like OpenStackers, are thinking along the same lines. In talking with Puppet and Chef guys, and with people who are using the cloud, it’s clear to me that my hunch is right: Chef and Puppet are not really the same thing as Juju. The new project from Cisco, Donabe, seems to be thinking exactly like Juju, wanting to encapsulate and describe each service in what they call “Network Containers”. Also, I’m told the desires of the Neutronium PaaS project are pretty similar as well.

Ultimately we don’t think that the current limitations of known PaaS stacks are always worth the effort to integrate with them. We do want to have a lot of the same capabilities without having to duplicate all the effort to set them up. We want to be able to make use of well understood technologies without having to understand every detail of their deployment and configuration. If I want to make use of MySQL or memcached, I should understand how they work, but I shouldn’t have to duplicate the effort that others have had to put in to make them work.

Chef and Puppet have made some inroads into this by making such things highly repeatable and getting them all into source control. However, it’s my belief that their implementations limit the network effect they can have in building up a full set of sharable services. Juju, I think, will really be a boost to those who have invested a lot in solid config management, as that config management will be easy to chop up into Juju charms, and that will open up all the other existing charms for immediate use in such a shop.

Getting back to how this relates to OpenStack, it was also quite exhilarating to do a live keynote demo of Juju in all of its alpha glory. To raise the tightrope a little higher, it was driving OpenStack Diablo, which some might call beta-quality. We also got rid of the safety nets entirely, and had it running on top of Ubuntu 11.10 (pre-release). We had a few kinks through the week, but the awesome team I had around me was able to iron them all out and made both our CEO, Jane Silber, and me look very good up there. That includes my fellow server team members, the OpenStack developers, Canonical IS pro’s, the Juju dev team, and my main collaborator in the whole thing, Jorge Castro.

I hope to attend the next ODS, to see how much closer OpenStack is to completing its mission in 6 months. What is that mission currently? Quite simple, really: the mission is to figure out the mission.

October 9, 2011 at 6:07 am Comments (0)

CloudCamp San Diego – Wake up and smell the Enterprise

I took a little trip down to San Diego yesterday to see what these CloudCamp events are all about. There are so many of them, all over the place, that I figured it was a good chance to get a look at what might be the “common man’s” view of the cloud. I spend so much time talking to people at a really deep level about what the cloud is, why we like it, why we hate it, etc. This “un-conference” was more about bringing a lot of that information, distilled, to business owners and professionals who need to learn more about “this cloud thing”.

The lightning talks were quite basic. The most interesting one was given by a former lawyer who now runs IT for a medium-sized law firm. Private cloud saves him money because he can now make a direct charge back to a client when that client is taking up storage and computing space. This also keeps his infrastructure more manageable, because clients tend to give up resources more readily under direct chargeback than under general service fees that try to cover those costs.

There was a breakout session about SQL vs. NoSQL. I joined and was shocked at how dominant the Microsoft representative was. She certainly tried to convince us “this isn’t about SQL Azure, it’s about SQL vs. NoSQL”, but it was pretty much about all the things that suck more than SQL Azure, and about not mentioning anything that might compete directly with it. I brought up things like Drizzle, Cassandra, HDFS, Xeround, MongoDB, and MogileFS. These were all swiftly moved past, and not written on the whiteboard. Her focus was on how SimpleDB differs from Amazon RDS, and on Microsoft Azure’s own key/value/column store for their cloud. The room was overpowered into silence for the most part. There were about 20 developers and IT manager types in the room, and they had no idea how any of this was going to help them take advantage of IaaS or PaaS clouds. I felt the session was interesting, but ultimately, completely pwned by the Microsoft rep. She ended by showing off 3D effects in their Silverlight-based management tool. Anybody impressed deserves what they get, quite honestly.

One good thing that did come out of that session was the ensuing discussion, where I ended up talking with a gentleman from a local San Diego startup that was just acquired. This is a startup of 3 people that runs 100% in Amazon EC2 on Ubuntu with PHP and MySQL. They have their services spread across 3 regions and were not affected at all by the recent outages in us-east-1. His feeling on the SQL Azure folks is that it’s for people who have money to burn. He spends $3000 a month, entirely on EC2 instances and S3/EBS storage. The audience was stunned that it was so cheap, and that it was so easy to scale up and down as they add and remove clients. He echoed something the MS rep said too: because their app was architected this way from the beginning, it is extremely cost effective, and they wouldn’t really save much money by leasing or owning servers instead of leasing instances, since with this model they can calculate the costs and pass them directly on to the clients, and their commitment is zero.

Later on I proposed a breakout session on how repeatable your infrastructure is (basically, infrastructure as code). There was almost no interest, as this was a very business-oriented un-conference. The few people who attended were just using AMIs to do everything. When something breaks, they fix it with parallel-ssh. The one person who was using Windows in the cloud had no SSH, so fixing any system problem meant re-deploying his new AMI over and over.

Overall I thought it was interesting to see where the non-webops world is in its knowledge of the cloud. I think the work we’re doing with Ensemble is really going to help people deploy open source applications into private and public clouds, so they don’t need 3D-enabled Silverlight interfaces to manage a simple database or a bug tracking system for their internal developers.

June 15, 2011 at 8:25 pm Comments (0)

« Older Posts