SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

The bitter part of the Bittersweet news

Excellent.

Excellent!

"Vest over t-shirt pwns half-shirt, Bill!"

That is the word that I would use to describe the work done by my fellow engineers at Canonical over the past 2.5 years. However, it is time to move on.

Bill and Ted say "whu?"

"I think I'm gonna hurl, Bill"

It’s not an easy thing to move on from what is truly the best job I’ve ever had. However, it is time. I’ll discuss more here after my last day at Canonical, which will be very soon, December 5th. Suffice to say, I won’t disappear from Ubuntu, so stay tuned!

November 30, 2012 at 4:42 pm

Juju Community Survey

If you’ve heard of Juju (and chances are, if you’re reading my blog, you have!), take 5 minutes and go take our Juju Community Survey.

November 6, 2012 at 10:22 pm

Nagios is from Mars, and MySQL is from Venus (Monitoring Part 2)

In my previous post about Nagios, I showed how the rich Nagios charm simplifies adding basic monitoring to Juju environments. But, we need more than that. Services know more about how to verify they are working than a monitoring system could ever guess.

So, for that, we have the ‘monitors’ interface. It is extremely new and so still sparsely documented, but the idea is simple.

  • Providing charms define what monitoring systems should look at generically
  • Requiring charms monitor any of the things they can, ignoring those they cannot.

This is defined through a YAML file. Here is the example.monitors.yaml included with the nagios charm:

# Version of the spec, mostly ignored but 0.3 is the current one
version: '0.3'
# Dict with just 'local' and 'remote' as parts
monitors:
    # local monitors need an agent to be handled. See nrpe charm for
    # some example implementations
    local:
        # procrunning checks for a running process named X (no path)
        procrunning:
            # Multiple procrunning can be defined, this is the "name" of it
            nagios3:
                min: 1
                max: 1
                executable: nagios3
    # Remote monitors can be polled directly by a remote system
    remote:
        # do a request on the HTTP protocol
        http:
            nagios:
                port: 80
                path: /nagios3/
                # expected status response (otherwise just look for 200)
                status: 'HTTP/1.1 401'
                # Use as the Host: header (the server address will still be used to connect() to)
                host: www.fewbar.com
        mysql:
            # Named basic check
            basic:
                username: monitors
                password: abcdefg123456

There are two main classes of monitors: local and remote. This is in reference to the service unit’s location. Local monitors are intended to be run inside the same machine/container as the service. Remote monitors are then, quite obviously, meant to be run outside the machine/container. So, above, you see remote monitors for the mysql protocol and http, and a local monitor to see if processes are running.

The MySQL charm now includes some of these monitors:

version: '0.3'
monitors:
    local:
        procrunning:
            mysqld:
                name: MySQL Running
                min: 1
                max: 1
                executable: mysqld
    remote:
        mysql:
            basic:
                user: monitors

The remote part is fed directly to nagios, which knows how to monitor mysql remotely, and so translates it into a check_mysql command. The local bits are ignored by Nagios. But, when we also relate the subordinate charm NRPE to a MySQL service, then we’ll have an agent which understands local. It actually converts those into remote monitors of type ‘nrpe’ which Nagios does understand. So upon relating NRPE to Nagios, each subordinate unit feeds its unique NRPE monitors back to Nagios and they are added to the target units’ monitors.
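
To make that division of labour a bit more concrete, here is a rough Python sketch of it. It is not the code from the nagios or nrpe charms: the function names are made up, and the shape of the converted ‘nrpe’ entries (the `command` key and the `check_<type>_<name>` naming) is purely an assumption for illustration.

```python
# Illustrative sketch only; this is NOT the actual nagios/nrpe charm code.
MONITORS = {
    "version": "0.3",
    "monitors": {
        "local": {
            "procrunning": {
                "mysqld": {"name": "MySQL Running", "min": 1, "max": 1,
                           "executable": "mysqld"},
            },
        },
        "remote": {
            "mysql": {"basic": {"user": "monitors"}},
        },
    },
}


def remote_checks(monitors):
    """What a remote-only poller such as Nagios looks at; 'local' is ignored."""
    return monitors["monitors"].get("remote", {})


def local_to_nrpe(monitors):
    """What an agent such as NRPE might hand back to Nagios: every local
    monitor re-expressed as a remote monitor of type 'nrpe'.

    The 'command' key and its naming scheme are hypothetical.
    """
    nrpe = {}
    for check_type, checks in monitors["monitors"].get("local", {}).items():
        for name in checks:
            nrpe[name] = {"command": "check_%s_%s" % (check_type, name)}
    return {"nrpe": nrpe}


if __name__ == "__main__":
    print(remote_checks(MONITORS))
    print(local_to_nrpe(MONITORS))
```

The point of the sketch is only that the providing charm never has to know which of the two consumers is on the other end of the relation.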

Honestly, this all sounds very complicated. But luckily, you don’t have to really grasp it to take advantage of it in a charm. The whole point is this: All one needs to do is write a monitors.yaml, and add the monitors relation with this joined hook to your charm:

#!/bin/bash
# Anything needed to let the monitoring host reach this unit (ports, users, etc.) goes here.
# target-id is the unit name with every "/" replaced by "-" (e.g. mysql/0 becomes mysql-0).
relation-set monitors="$(cat monitors.yaml)" \
    target-id="${JUJU_UNIT_NAME//\//-}" \
    target-address="$(unit-get private-address)"
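
If the bash parameter expansion in that hook looks cryptic, the same thing written out in Python may help. This is just a sketch of what the hook computes; the helper names are invented for the example, and the addresses used to try it out are placeholders.

```python
def target_id(unit_name):
    # Same effect as the bash expansion in the hook:
    # every "/" in the unit name becomes "-".
    return unit_name.replace("/", "-")


def relation_set_args(unit_name, private_address, monitors_yaml):
    """Build the argument list the hook passes to relation-set.

    Hypothetical helper: the real hook simply calls relation-set from bash.
    """
    return [
        "relation-set",
        "monitors=" + monitors_yaml,
        "target-id=" + target_id(unit_name),
        "target-address=" + private_address,
    ]


if __name__ == "__main__":
    print(relation_set_args("mysql/0", "10.0.0.5", "version: '0.3'\n"))
```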

If you have local things you want to give to a monitoring agent, you can use the ‘local-monitors’ interface, which is basically the same as monitors, but only ever used in container scoped relations required by subordinate charms such as NRPE or collectd.

Now you can easily provide monitors to any monitoring system. If Nagios doesn’t support what you want to monitor, it’s fairly easy to add support. And as more monitoring systems are charmed and have the monitors interface added, your charm will be more useful out of the box.

In the next post, which will wrap up this series on monitoring, I’ll talk about how to add monitors support to some other monitoring systems such as collectd, and also how to write a subordinate charm to communicate your monitors to an external monitoring service.

September 5, 2012 at 5:50 am

Juju and Nagios, sittin’ in a tree.. (Part 1)

Monitoring. Could it get any more nerdy than monitoring? Well I think we can make monitoring cool again…

 

If you’re using Juju, Nagios is about to get a lot easier to leverage into your environment. Anyone who has ever tried to automate their Nagios configuration knows that it can be daunting. Nagios is so flexible and has so many options that it’s hard to get right when doing it by hand. Automating it requires even more thought. Part of this is because monitoring itself is a bit hard to generalize. There are lots of types of monitors. Nagios really focuses on two of these:

  • Service monitoring – Make a script that pretends to be a user and see if your synthetic monitor sees what you expect.
  • Resource monitoring – Look at the counters and metrics afforded a user of a normal system.

The trick is, the service monitoring wants to interrogate the real services from outside of the machine, while the resource monitoring wants to see things only visible with privileged access. This is why we have NRPE, or “Nagios Remote Plugin Executor” (and NSCA, and munin, but ignore those for now). NRPE is a little daemon that runs on a server and will run a Nagios plugin script, returning the result when asked by Nagios. With this you get those privileged things like how much RAM and disk space is used.

Normally when you want to use Nagios, you need to sit down and figure out how to tell it to monitor all of your stuff. This involves creating generic objects, figuring out how to get your list of hosts into Nagios’s config files, and how to get the classifications for said hosts into Nagios. Does anybody trying to make sure their pager goes off when things are broken actually want to learn Nagios?

So, here’s how to get Nagios in your Juju environment. First, let’s assume you have deployed a stack of applications.

juju deploy mysql wikidb                # single MySQL db server
juju deploy haproxy wikibalancer        # and single haproxy load balancer
juju deploy -n 5 mediawiki wiki-app     # 5 app-server nodes to handle mediawiki
juju deploy memcached wiki-cache        # memcached
juju add-relation wikidb:db wiki-app:db # use wikidb service as r/w db for app
juju add-relation wiki-app wikibalancer # load balance wiki-app behind haproxy
juju add-relation wiki-cache wiki-app   # use wiki-cache service for wiki-app

This gives one a nice stack of services that is pretty common in most applications today, with a DB and cache for persistent and ephemeral storage and then many app nodes to scale the heavy lifting.

Now you have your app running, but what about when it breaks? How will you find out? Well this is where Nagios comes in:

juju deploy nagios                          # custom nagios charm
juju add-relation nagios wikidb             # monitor wikidb via nagios
juju add-relation nagios wiki-app           # ""
juju add-relation nagios wikibalancer       # ""

You should now have nagios monitoring things. You can check it out by exposing it and then browsing to the address of the nagios instance at ‘http://x.x.x.x/nagios3’. You can find out the password for the ‘nagiosadmin’ user by catting a file that the charm leaves for this purpose:

juju ssh nagios/0 sudo cat /var/lib/juju/nagios.passwd

Now, the checks are very sparse at the moment. This is because we have used the generic monitoring interface which can just monitor the basic things (SSH, ping, etc). We can add some resource monitoring by deploying NRPE:

juju deploy nrpe                          # create a subordinate NRPE service
juju add-relation nrpe wikibalancer       # Put NRPE on wikibalancer
juju add-relation nrpe wiki-app           # Put NRPE on wiki-app
juju add-relation nrpe:monitors nagios:monitors # Tells Nagios to monitor all NRPEs

Now we will get memory stats, root filesystem, etc.

You may have noticed we left off wikidb. That is because juju will show you an ambiguous relation error when you try this:

juju add-relation nrpe wikidb # Put NRPE on wikidb

ERROR Ambiguous relation 'nrpe mysql'; could refer to:
  'nrpe:general-info mysql:juju-info' (juju-info client / juju-info server)
  'nrpe:local-monitors mysql:local-monitors' (local-monitors client / local-monitors server)

This is because mysql has special support to be able to specify its own local monitors in addition to those in the usual basic group (more on this in part 2). To get around this we use:

juju add-relation nrpe:local-monitors wikidb:local-monitors

 

This is a perfect example of how Juju’s encapsulation around services pays off for re-usability. By wrapping a service like Nagios in a charm, we can start to really develop a set of best practices for using that service and collaborate around making it better for everyone.

Of course, Chef and Puppet users can get this done with existing Nagios modules. Puppet, in particular, has really great Nagios support. However, I want to take a step back and explain why I think Juju has a place alongside those methods and will accelerate systems engineering in new directions.

While there is some level of encapsulation in the methods that Chef and Puppet put forth, they’re not fully encapsulated in the way that they interact with other components in a Chef or Puppet system. In most cases, you still have to edit your own service configs to add specific Nagios integration. This works for the custom case, but it does not make it easy for users to collaborate on the way to deploy well known systems. It will also be hard to swap out components for new, better methods as they emerge. Every time you mention Nagios in your code, you are pushing Nagios deeper into your system engineering.

With the method I’ve outlined above, any charmed service can be monitored for basic stats (including the 80 or so that are in the official charm store). You might ask, though: what about custom Nagios plugins, or specifying more elaborate but somewhat generic service checks? That is all coming. I will show some examples in my next post about this. I will also go on later to show how Nagios + NRPE can be replaced with collectd, or some other system, without changing the charms that have implemented rich monitoring support.

So, while this at least starts to bring the official Nagios charm up to par with configuration management’s rich Nagios ability, it also sets the stage for replacing Nagios with other things. The key difference here is that as you’ll see in the next few parts, none of the charms will have to mention “Nagios”. They’ll just describe what things to monitor, and Nagios, Collectd, or whatever other system you have in place will find a way to interpret that and monitor it.

August 7, 2012 at 11:47 pm

Juju constraints unbinds your machines

This week, William “I code more than you will ever be able to” Reade announced that Juju has a new feature called ‘Constraints’.

This is really, really cool and brings juju into a new area of capability for deploying big and little sites.

To be clear, this allows you to describe what a service needs in abstract terms, rather than naming a specific machine or instance type.

Consider this:

juju deploy mysql --constraints mem=10G
juju deploy statusnet --constraints cpu=1

This will result in your mysql service being on an extra large instance, since that type has 15GB of RAM and so satisfies the 10G minimum. Your statusnet instances will be m1.smalls, since that type has just 1 ECU.
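
Here is a small Python sketch of the kind of matching that implies. It is not juju’s provisioning code, and the table is deliberately tiny and illustrative (three EC2 types with their memory in GB and ECU counts); the assumption is simply that the smallest type satisfying every constraint wins.

```python
# (name, memory in GB, EC2 compute units), ordered smallest first.
# Illustrative subset only; not a complete or current list of instance types.
INSTANCE_TYPES = [
    ("m1.small", 1.7, 1),
    ("m1.large", 7.5, 4),
    ("m1.xlarge", 15.0, 8),
]


def pick_instance_type(mem=0.0, cpu=0):
    """Return the first (smallest) type with at least `mem` GB and `cpu` ECU."""
    for name, type_mem, type_cpu in INSTANCE_TYPES:
        if type_mem >= mem and type_cpu >= cpu:
            return name
    raise ValueError("no instance type satisfies mem=%s cpu=%s" % (mem, cpu))


if __name__ == "__main__":
    print(pick_instance_type(mem=10))  # the mysql example
    print(pick_instance_type(cpu=1))   # the statusnet example
```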

Even cooler, you can now put a mysql slave in a different availability zone:

juju deploy mysql --constraints ec2-zone=a mysql-a
juju deploy mysql --constraints ec2-zone=b mysql-b
juju add-relation mysql-a:master mysql-b:slave
juju add-relation statusnet mysql-a

Now if mysql-a goes down:

juju remove-relation statusnet mysql-a
juju add-relation statusnet mysql-b

Much and more is possible, but this really does make juju even more compelling as a tool for simple, easy deployment. Edit: fixed ec2-zone to be the single character, per William’s feedback.

April 16, 2012 at 10:48 pm

Configurate your juju’s

I was reading Jorge’s Stomp Box earlier today, and somebody mentioned how it would be an even better trick if it were easier to configure juju quickly.

Ask and ye shall receive. I hacked a new sub-command into the experimental ‘juju-jitsu’ wrapper. I’ll let the scrape from my terminal do the talking. You can get it with:


bzr branch lp:juju-jitsu

And try it with


juju-jitsu/wrap-juju
juju setup-environment

clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ ./wrap-juju
Aliasing juju to /home/clint/src/juju-jitsu/juju-jitsu/juju-jitsu-wrapper...
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ juju setup-environment
Name for environment name : mybox
What provider do you want to use? (ec2,local) type [local]:
Default "series", a.k.a. release codename of Ubunt default-series [precise]:
local dir to store logs/directory structure/charm data-dir [~/.juju/data]:
environments:
  mybox:
    data-dir: ~/.juju/data
    default-series: precise
    type: local

Do you want to

[s]ave this to /home/clint/.juju/environments.yaml
[d]iff with existing /home/clint/.juju/environments.yaml
[q]uit

[sdq]: d
diff: /home/clint/.juju/environments.yaml: No such file or directory
[sdq]: s
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ cat ~/.juju/environments.yaml
environments:
  mybox:
    data-dir: ~/.juju/data
    default-series: precise
    type: local
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ juju setup-environment
Name for environment name : mycloud
What provider do you want to use? (ec2,local) type [local]: ec2
Default "series", a.k.a. release codename of Ubunt default-series [precise]:
S3 Bucket to store data in control-bucket [juju-jitsu-D8mzlogDmvPjTpASolJnXK6HxwAW8YA8]:
Zookeeper Secret admin-secret [qUJIUiwji-jiAN-XTSC1ztxUrm2XrYys]:
(AWS_ACCESS_KEY_ID) access-key [XXXXXXXXXXXXXXXXXXXX]:
(AWS_SECRET_ACCESS_KEY) secret-key [xYxYxYxYxYxYxYxYxYxY/WvWvWvWvWvWv]:
Default Instance Type (m1.small, c1.medium, etc default-instance-type :
Default AMI default-image-id :
EC2 Region region :
EC2 URI ec2-uri :
S3 URI s3-uri :
environments:
  mybox:
    data-dir: ~/.juju/data
    default-series: precise
    type: local
  mycloud:
    access-key: XXXXXXXXXXXXXXXXXXXX
    admin-secret: qUJIUiwji-jiAN-XTSC1ztxUrm2XrYys
    control-bucket: juju-jitsu-D8mzlogDmvPjTpASolJnXK6HxwAW8YA8
    default-series: precise
    secret-key: xYxYxYxYxYxYxYxYxYxY/WvWvWvWvWvWv
    type: ec2

Do you want to

[s]ave this to /home/clint/.juju/environments.yaml
[d]iff with existing /home/clint/.juju/environments.yaml
[q]uit

[sdq]:
[sdq]: d
--- /home/clint/.juju/environments.yaml 2012-03-15 16:36:47.939298045 -0700
+++ /home/clint/.juju/.environments.yaml.QqwCt_ 2012-03-15 16:37:20.629394484 -0700
@@ -3,3 +3,10 @@
     data-dir: ~/.juju/data
     default-series: precise
     type: local
+  mycloud:
+    access-key: XXXXXXXXXXXXXXXXXXXX
+    admin-secret: qUJIUiwji-jiAN-XTSC1ztxUrm2XrYys
+    control-bucket: juju-jitsu-D8mzlogDmvPjTpASolJnXK6HxwAW8YA8
+    default-series: precise
+    secret-key: xYxYxYxYxYxYxYxYxYxY/WvWvWvWvWvWv
+    type: ec2
[sdq]: s
2012-03-15 16:37:23,799 juju-jitsu Backing up /home/clint/.juju/environments.yaml to /home/clint/.juju/environments.yaml.2
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$ bzr info
Repository tree (format: 2a)
Location:
shared repository: /home/clint/src/juju-jitsu
repository branch: .

Related branches:
push branch: bzr+ssh://bazaar.launchpad.net/+branch/juju-jitsu/
parent branch: bzr+ssh://bazaar.launchpad.net/+branch/juju-jitsu/
(juju-jitsu) clint@clint-MacBookPro:~/src/juju-jitsu/juju-jitsu$
March 15, 2012 at 11:46 pm

Precise is coming

Almost 2 years ago, I stepped out of my comfort zone at a “SaaS” web company and joined the Canonical Server Team to work on Ubuntu Server development full time.

I didn’t really grasp what I had walked into, joining the team right after an LTS release. The 10.04 release was a monumental effort that spanned the previous 2 years. Call me a nerd if you want, but I get excited about a Free, unified desktop and server OS built entirely in the open, out of open source components, fully supported for 5 years on the server.

Winter, and the Precise Pangolin, are coming

And now, we’re about to do it again. Precise beta1 is looking really solid, and I am immensely proud to have been a tiny part of that.

So, what did we do on the server team that has led to precise’s awesomeness?

Ubuntu 10.10 “Maverick Meerkat” – We helped out with getting CEPH into Debian and Ubuntu for 10.10, which proved to be important as it gave users a way to try out CEPH. CEPH will ship in main, and be fully supported by Canonical, in 12.04, which is pretty exciting! This was also the first release to feature pieces of OpenStack.

Ubuntu 11.04 “Natty Narwhal” – Upstart got a lot better for server users in 11.04 with the addition of “override” files, and the shiny new Upstart Cookbook. We also finally figured out how to coordinate complicated boot sequences without having to rewrite upstart to track state. I wasn’t personally involved, but we also shipped the first really usable OpenStack release, “cactus”.

Ubuntu 11.10 “Oneiric Ocelot” – It seems small, but we fixed boot-up race conditions caused by services which need their network interfaces to be up before they start. Upstart also landed full chroot support, so you can run a chroot with its own upstart services inside of it, which is important for some use cases. This release also featured the debut of Juju, which is a new way to deploy and manage network services and applications.

Ubuntu 12.04 “Precise Pangolin” – OpenStack Essex is huge. Full keystone integration, lots of new features, and lots of satellite projects. Juju has really grown into a useful project now (give it a spin!). We also were able to transition to MySQL 5.5, which was no small feat. The amount of automated continuous integration testing that has gone into the precise cycle is staggering, and continues to grow as test cases are added. We’ll never find all the bugs this way, but we’ve at least found many of them before they ever reached a stable release this time.

There’s so much more in each of these. It’s amazing how much has been improved and refined in Ubuntu Server in just 2 years.

I’m pumped. A new LTS is exciting for us in Ubuntu Development, as it refocuses the more conservative users on all the work we’ve been doing. I would love to hear any feedback from the greater community. This is going to be great!

March 1, 2012 at 10:47 pm

But will it scale? – Taking Limesurvey horizontal with juju…

One of the really cool things about using the cloud, and especially juju, is that it instantly enables things that often times take a lot of thought to even try out in traditional environments. While I was developing some little PHP apps “back in the day”, I knew eventually they’d need to go to more than one server, but testing them for that meant, well, finding and configuring multiple servers. Even with VMs, I had to go allocate one and configure it. Oops, I’m out of time, throw it on one server, pray, move to next task.

This left a very serious question in my mind.. “When the time comes, will my app actually scale?”

Have I forgotten some huge piece to make sure it is stateless, or will it scale horizontally the way I intended it to? Things have changed though, and now we have the ability to start virtual machines via an API on several providers, and actually *test* whether our app will scale.

This brings us to our story. Recently, Nick Barcet created a juju charm for Limesurvey. This is a really cool little app that lets users create rich, multi-faceted surveys and invite the internet to vote on things, answer questions, etc. This is your standard “LAMP” application, and it seems written in a way that will allow it to scale out.

However, when Nick submitted the charm for the official juju charms collection, I wanted to see if it actually would scale the way I knew LAMP apps should. So, I fired up juju on ec2, threw in some haproxy, and related it to my limesurvey service, and then started adding units. This is incredibly simple with juju:

juju deploy --repository charms local:mysql
juju deploy --repository charms local:limesurvey
juju deploy --repository charms local:haproxy
juju add-relation mysql limesurvey
juju add-relation limesurvey haproxy
juju add-unit limesurvey
juju expose haproxy

Lo and behold, it didn’t scale. There were a few issues with the default recommendations of limesurvey that Nick had encoded into the charm. These were simple things, like assuming that the local hostname would be the hostname people use to access the site.

Once that was solved, there were some other scaling problems immediately revealed. First on the ticket was that Limesurvey, by default, uses MyISAM for its storage engine in MySQL. This is a huge mistake, and I can’t imagine why *anybody* would use MyISAM in a modern application. MyISAM uses a “whole table” locking scheme for both reads and writes, so whenever anything writes to any part of the table, all reads and writes must wait for that to finish. InnoDB, available since MySQL 4.0, and the default storage engine for MySQL 5.5 and later, doesn’t suffer from this problem as it implements an MVCC model and row-level locks to allow concurrent reads and writes.

The MyISAM locks caused request timeouts when I pointed siege at the load balancer, because too many requests were stacking up waiting for updates to complete before even reading from the table. This is especially critical on something like the session storage that limesurvey does in the database, as it effectively meant that only one user can do anything at a time with the database.
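
For anyone wanting to try the same fix, one common way to move existing tables off MyISAM is an ALTER TABLE per table. The little Python sketch below only generates those statements; the table names are hypothetical (Limesurvey’s real ones may differ, so check SHOW TABLE STATUS on your own database first), and this is not necessarily how the charm itself ended up handling it.

```python
# Hypothetical table names, for illustration only.
MYISAM_TABLES = ["lime_sessions", "lime_surveys"]


def innodb_statements(tables):
    """Build one ALTER TABLE statement per table to switch it to InnoDB."""
    return ["ALTER TABLE `%s` ENGINE=InnoDB;" % table for table in tables]


if __name__ == "__main__":
    for statement in innodb_statements(MYISAM_TABLES):
        print(statement)
```

The output can be reviewed and then fed to the mysql client. Converting a large table rebuilds it, so expect that to take a while on a busy database.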

Scalability testing in 10 minutes or less, with a server investment of about $1US. Who knew it could be this easy? Granted, I stopped at three app server nodes, and we didn’t even get to scaling out the database (something limesurvey doesn’t really have native support for). But these are things that are already solved, and that have been encoded in charms already. Now we just have to suggest small app changes to allow users to take advantage of all those well-known best practices sitting in charms.

(check the bug comments for the results, I’d be interested if somebody wants to repeat the test).

So, in a situation where one needs to deploy now, and scale later, I think juju will prove quite useful. It should be on anybody’s radar who wants to get off the ground quickly.

December 23, 2011 at 1:41 am

‘service foo restart’ on Ubuntu 12.04 will “Do the right thing”

Just a quick note.. I was watching Artur Bergman’s rant about Full Stack awareness at Velocity Europe with glee, until I saw how we, the Ubuntu devs, had drawn his ire for breaking something so simple, ‘restart’.

I’ve heard this before, and I realized that it’s true! If the service is stopped, restart should ignore that and just start it; otherwise it should restart it as I requested.

So, first off, instead of:

/etc/init.d/apache2 restart

or (for upstart controlled services)

restart mysql

get in the habit of doing:

service apache2 restart

This command will do the right thing more often than not (including clearing out the environment for sysvinit jobs, which can help solve the “why does the service only work when I restart it manually”). And as of 12.04, it will start doing the right thing with restart too.
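
Spelled out as logic, the “right thing” for restart is tiny. The Python below is just a model of the intended behaviour, with a made-up function name; it is not how the service command or upstart are actually implemented.

```python
def plan_restart(is_running):
    """Return the actions a 'do the right thing' restart should perform.

    A stopped service is simply started instead of failing the restart.
    """
    if is_running:
        return ["stop", "start"]
    return ["start"]


if __name__ == "__main__":
    print(plan_restart(True))
    print(plan_restart(False))
```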

It’s not perfect, and there are still bugs in upstart, but the version of ‘sysvinit’ that I just uploaded to Precise Pangolin (the future 12.04 / LTS release of Ubuntu) at least gives scripters and sysadmins a chance at uniformity by making upstart jobs and sysvinit scripts do the same thing with restart. I don’t think we can backport this all the way to 10.04, so sorry for that. However, please, users, keep the rants coming.. we need them!

November 16, 2011 at 1:38 am

Juju ODS Demo – The Home Version

A few weeks ago I gave a live demo during Canonical CEO Jane Silber’s keynote at the Essex OpenStack Conference, which was held in Boston October 4-7 (See my previous post for details of the conference and summit). The demo was meant to showcase our new favorite cloud technology at Canonical, juju. In order to do this, we deployed hadoop on top of our private OpenStack cloud (also deployed earlier in the week via juju and Ubuntu Orchestra) and fed it a “real” workload (a big giant chunk of data to sort) in less than 5 minutes.

I’ve had a few requests to explain how it works, so, here is a step by step on how to repeat said demo.

First, you need to set up juju to be able to talk to your cloud. The simplest way to do this is to sign up for an AWS account on Amazon, and get EC2 credentials (a secret key and a key ID are needed).

If you install juju in Ubuntu 11.10, or from the daily build PPA in any other release, you’ll get a skeleton environments.yaml just by running ‘juju’.

Once this is done, edit ~/.juju/environments.yaml to add your access-key: and secret-key:. Optionally, you can set them in AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment.

Now, you need the “magic” bit that turns juju status changes into commands for the “gource” source code visualization tool. It’s available here:

http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/view/head:/misc/status2gource.py

(wgettable here)

http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/download/head:/status2gource.py-20110908235607-pfnddi4d114nl8qd-1/status2gource.py

You’ll also need to install the ‘gource’ visualization tool. I only tried this on Ubuntu 11.10, but it is available on other releases as well.

Make sure your desired target environment is either the only one in .juju/environments.yaml, or set to be the default with ‘default: xxxx’ at the root of the file. You need ‘juju status’ to return something meaningful (after bootstrap) for status2gource.py to work.
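
The rule for which environment gets used is simple enough to write down. This Python sketch mirrors the rule as described above, operating on an already-parsed environments.yaml; the function name is invented and this is not juju’s own code.

```python
def choose_environment(config):
    """Pick the environment name from a parsed environments.yaml dict.

    Uses the top-level 'default' key if present, otherwise the only
    environment defined; anything else is ambiguous.
    """
    envs = config.get("environments", {})
    if "default" in config:
        name = config["default"]
        if name not in envs:
            raise ValueError("default environment %r is not defined" % name)
        return name
    if len(envs) == 1:
        return next(iter(envs))
    raise ValueError("no default set and %d environments defined" % len(envs))


if __name__ == "__main__":
    print(choose_environment({"environments": {"mybox": {"type": "local"}}}))
```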

Now, in its own terminal, run the command below. Note that cof_orange_hex.png is part of the official Ubuntu logo packs, but I forget where I got it. You may omit that command-line argument if you like, and a generic “person” image will be used.

python -u status2gource.py | gource --highlight-dirs \
--file-idle-time 1000000 \
--log-format custom \
--default-user-image cof_orange_hex.png \
--user-friction 0.5 \
-

This will not show anything until juju bootstrap is done and ‘juju status’ shows the machine 0 running. If you already have services deployed, it should build the tree rapidly.
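
If you are curious what is flowing through that pipe: with `--log-format custom`, gource reads pipe-delimited lines of the form timestamp|username|type|file, with an optional colour field, where type is A, M or D. The Python sketch below formats one such line. It is not taken from status2gource.py; the function name and the path layout for services and units are assumptions for illustration.

```python
import time


def gource_line(timestamp, user, action, path, colour=None):
    """Format one entry in gource's pipe-delimited custom log format.

    action is 'A' (added), 'M' (modified) or 'D' (deleted).
    """
    if action not in ("A", "M", "D"):
        raise ValueError("action must be A, M or D")
    fields = [str(int(timestamp)), user, action, path]
    if colour is not None:
        fields.append(colour)
    return "|".join(fields)


if __name__ == "__main__":
    # Hypothetical layout: one "directory" per service, one "file" per unit.
    print(gource_line(time.time(), "juju", "A", "services/namenode/0"))
```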

So next, if you haven’t done it already:

juju bootstrap

Once your instance starts up, you should see a gource window pop up and the first two bits, the bootstrap node and the machine 0 node, will be added.

Once this is done, you can just deploy/add-relation/etc. to your heart’s content.

To set up a local repo of charms, we did this:

mkdir charms
bzr init-repo charms/oneiric
cd charms/oneiric
bzr branch lp:~mark-mims/+junk/charm-hadoop-master hadoop-master
bzr branch lp:~mark-mims/+junk/charm-hadoop-slave hadoop-slave
bzr branch lp:~mark-mims/+junk/charm-ganglia ganglia

Those particular charms were specifically made for the demo, but most of the changes have been folded back in to the main “charm collection”, so you can probably change lp:~mark-mims/+junk/charm- to lp:charm/.

You will also need a file in your current directory called ‘config.yaml’ with this content:

namenode:
  job_size: 100000
  job_maps: 10
  job_reduces: 10
  job_data_dir: in_one
  job_output_dir: out_one

These numbers heavily control how the job runs, whether with 1 or 100 hadoop instances. If you want to spend a couple of bucks in Amazon and fire up 20 nodes, then raise job_maps and job_reduces to 100, and job_size to 10000000. Otherwise it’s over very fast!

We started the demo after bootstrap was already done, so the next step is to deploy Hadoop/HDFS and ganglia to keep an eye on the nodes as they came up.

juju deploy --repository . --config config.yaml hadoop-master namenode
juju deploy --repository . hadoop-slave datacluster
juju deploy --repository . ganglia jobmonitor
juju add-relation namenode datacluster
juju add-relation datacluster jobmonitor
juju expose jobmonitor

This should get you a tree in gource showing all of the machines, services, and relations that are set up.

You can scale out hadoop next with this command. Here I only create 4, but it could be 100.. depending on how fast you need your data map/reduced.

for i in 1 2 3 4 ; do juju add-unit datacluster ; done

Finally, to start the teragen/terasort:

juju ssh namenode/0

$ sudo -u hdfs sh /usr/lib/hadoop/terasort.sh

You may also want to note the hostname of the machine assigned to the jobmonitor node so you can bring it up in a browser. You will be able to see it in ‘juju status’.

It’s worth noting that we had a fail rate of about 1 in 20 tries while practicing the demo because of this bug:

https://bugs.launchpad.net/juju/+bug/872378

This causes the “juju expose jobmonitor” to fail, which means you may not be able to reach the ganglia instance. You can fix this by stopping/starting the provisioning agent on the bootstrap node. That is easier said than done, but can be scripted. It’s fixed in juju’s trunk, so if you are using the daily build, not the distro version, you shouldn’t see that issue.

So once you’re done, you’ll probably want to get rid of all these nodes you’ve created. Juju has a tool that strips everything down that it has brought up, which can be dangerous if you have data on the nodes, so be careful!


juju destroy-environment

It does not have a ‘--force’ or ‘-y’, by design. Make sure to keep the gource running when you do this. Say ‘y’, and then enjoy the show at the end. :)

I’d be interested to hear from anybody who is brave enough to try this how their experience is!

October 22, 2011 at 12:03 am
