
Juju ODS Demo – The Home Version

A few weeks ago I gave a live demo during Canonical CEO Jane Silber’s keynote at the Essex OpenStack Conference, held in Boston October 4-7 (see my previous post for details of the conference and summit). The demo was meant to showcase our new favorite cloud technology at Canonical, juju. To do this, we deployed Hadoop on top of our private OpenStack cloud (itself deployed earlier in the week via juju and Ubuntu Orchestra) and fed it a “real” workload (a big giant chunk of data to sort) in less than 5 minutes.

I’ve had a few requests to explain how it works, so here is a step-by-step guide to repeating the demo.

First, you need to set up juju so it can talk to your cloud. The simplest way to do this is to sign up for an AWS account with Amazon and get EC2 credentials (you need an access key ID and a secret key).

If you install juju on Ubuntu 11.10, or from the daily build PPA on any other release, you’ll get a skeleton environments.yaml just by running ‘juju’.

Once this is done, edit ~/.juju/environments.yaml to add your access-key: and secret-key:. Optionally, you can set them as the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables instead.
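For reference, the file should end up looking roughly like this (a sketch from memory; the environment name and placeholder values here are mine, and the generated skeleton will include its own control-bucket and admin-secret):

default: sample
environments:
  sample:
    type: ec2
    access-key: YOUR-ACCESS-KEY-ID
    secret-key: YOUR-SECRET-ACCESS-KEY
    control-bucket: juju-some-unique-bucket-name
    admin-secret: some-long-random-string
    default-series: oneiric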

Now you need the “magic” bit that turns juju status changes into commands for the “gource” source code visualization tool. It’s available here:

http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/view/head:/misc/status2gource.py

(wgettable here)

http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/download/head:/status2gource.py-20110908235607-pfnddi4d114nl8qd-1/status2gource.py
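That is, you can grab it directly with:

wget http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/download/head:/status2gource.py-20110908235607-pfnddi4d114nl8qd-1/status2gource.py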

You’ll also need to install the ‘gource’ visualization tool. I only tried this on Ubuntu 11.10, but it is available on other releases as well.
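On Ubuntu it is one command away:

sudo apt-get install gource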

Make sure your desired target environment is either the only one in ~/.juju/environments.yaml, or is set as the default with ‘default: xxxx’ at the root of the file. You need ‘juju status’ to return something meaningful (after bootstrap) for status2gource.py to work.

Now, in its own terminal, run the command below. Note that cof_orange_hex.png is part of the official Ubuntu logo pack, though I forget exactly where I got it. You may omit that command-line argument if you like, and a generic “person” image will be used instead.

python -u status2gource.py | gource --highlight-dirs \
--file-idle-time 1000000 \
--log-format custom \
--default-user-image cof_orange_hex.png \
--user-friction 0.5 \
-

This will not show anything until juju bootstrap is done and ‘juju status’ shows machine 0 running. If you already have services deployed, it should build the tree rapidly.

So next, if you haven’t done it already:

juju bootstrap

Once your instance starts up, you should see a gource window pop up and the first two bits, the bootstrap node and the machine 0 node, will be added.

Once this is done, you can just deploy/add-relation/etc. to your heart’s content.

To set up a local repository of charms, we did this:

mkdir charms
bzr init-repo charms/oneiric
cd charms/oneiric
bzr branch lp:~mark-mims/+junk/charm-hadoop-master hadoop-master
bzr branch lp:~mark-mims/+junk/charm-hadoop-slave hadoop-slave
bzr branch lp:~mark-mims/+junk/charm-ganglia ganglia

Those particular charms were made specifically for the demo, but most of the changes have since been folded back into the main “charm collection”, so you can probably change lp:~mark-mims/+junk/charm- to lp:charm/.

You will also need a file in your current directory called ‘config.yaml’ with this content:

namenode:
  job_size: 100000
  job_maps: 10
  job_reduces: 10
  job_data_dir: in_one
  job_output_dir: out_one

These numbers largely control how the job runs, whether on 1 or 100 Hadoop instances. If you want to spend a couple of bucks on Amazon and fire up 20 nodes, then raise job_maps and job_reduces to 100 each, and job_size to 10000000. Otherwise it’s over very fast!
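For example, a scaled-up config.yaml for that kind of run would look like this (same keys as above, just bigger numbers):

namenode:
  job_size: 10000000
  job_maps: 100
  job_reduces: 100
  job_data_dir: in_one
  job_output_dir: out_one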

We started the demo after bootstrap was already done, so the next step is to deploy Hadoop/HDFS and ganglia to keep an eye on the nodes as they came up.

juju deploy --repository . --config config.yaml hadoop-master namenode
juju deploy --repository . hadoop-slave datacluster
juju deploy --repository . ganglia jobmonitor
juju add-relation namenode datacluster
juju add-relation datacluster jobmonitor
juju expose jobmonitor

This should get you a tree in gource showing all of the machines, services, and relations that are setup.

You can scale out Hadoop next with this command. Here I only create 4 new units, but it could be 100, depending on how fast you need your data mapped/reduced.

for i in 1 2 3 4 ; do juju add-unit datacluster ; done
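Or, if you do want to go bigger:

for i in $(seq 1 100) ; do juju add-unit datacluster ; done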

Finally, to start the teragen/terasort:

juju ssh namenode/0

$ sudo -u hdfs sh /usr/lib/hadoop/terasort.sh

You may also want to note the hostname of the machine assigned to the jobmonitor node so you can bring it up in a browser. You will be able to see it in ‘juju status’.
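The quickest way to dig it out is to grep the status output (the exact field name, dns-name or public-address, depends on your juju version):

juju status | grep -A 8 jobmonitor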

It’s worth noting that we had a failure rate of about 1 in 20 runs while practicing the demo, because of this bug:

https://bugs.launchpad.net/juju/+bug/872378

This causes ‘juju expose jobmonitor’ to fail, which means you may not be able to reach the ganglia instance. You can fix this by stopping and starting the provisioning agent on the bootstrap node. That is easier said than done, but it can be scripted. It’s fixed in juju’s trunk, so if you are using the daily build rather than the distro version, you shouldn’t see this issue.
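A rough sketch of that workaround (the upstart job name is a guess from memory, so verify it by listing /etc/init/ on the bootstrap node before relying on this):

# find machine 0's address in 'juju status', then bounce the agent
ssh ubuntu@MACHINE-0-DNS-NAME 'sudo restart juju-provision-agent'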

So once you’re done, you’ll probably want to get rid of all these nodes you’ve created. Juju has a command that tears down everything it has brought up, which can be dangerous if you have data on the nodes, so be careful!


juju destroy-environment

It does not have a ‘--force’ or ‘-y’, by design. Make sure to keep the gource running when you do this. Say ‘y’, and then enjoy the show at the end. :)

I’d be interested to hear from anybody brave enough to try this how it goes!

October 22, 2011 at 12:03 am

OpenStack – an amoeba on a mission

According to NASA, 70% of the earth is covered by clouds. Apparently, at least 70% of our computing needs can be covered by clouds as well. That, at least, seemed to be the belief shared by the rather large crowd that gathered in Boston last week for the Essex edition of the OpenStack Design Summit and the subsequent OpenStack Conference.

The amount of energy and corporate investment in OpenStack is staggering when one considers that it didn’t exist two years ago, and that until this month’s Diablo release it didn’t really do much more than spawn VMs and store objects. Diablo added some more capabilities, but from my point of view it mostly refined the existing ones and set the stage for the future.

Attending as a member of the Ubuntu Server team and a Canonical employee was quite a gratifying experience. Ubuntu Server has been the platform of choice for OpenStack’s development, and that has definitely led to a lot of people running OpenStack on Ubuntu Server. It’s always nice to hear that your work is part of something greater.

On the surface, one might be concerned about a lack of vision in the OpenStack project. With so many competing interests, it may appear that the project has no clear direction and is just growing toward the latest source of funding or food, much like an amoeba swallowing its next meal. But the leadership of the project seems to understand that there is still a much greater mission here, and that without intense focus the project will expend enormous energy and accomplish little more than falling a little less behind the established players in the marketplace.

It’s a bit vindicating for one of my more intense current interests, Juju, that others close to this discussion, like the OpenStackers, are thinking along the same lines. In talking with Puppet and Chef folks, and with people who are using the cloud, it’s clear to me that my hunch is right: Chef and Puppet are not really the same thing as Juju. The new project from Cisco, Donabe, seems to be thinking exactly like Juju, wanting to encapsulate and describe each service in what they call “Network Containers”. I’m also told the goals of the Neutronium PaaS project are pretty similar.

Ultimately, we don’t think the current limitations of the known PaaS stacks are always worth the effort of integrating with them. We do want a lot of the same capabilities, without having to duplicate all the effort of setting them up. We want to be able to make use of well understood technologies without having to understand every detail of their deployment and configuration. If I want to use MySQL or memcached, I should understand how they work, but I shouldn’t have to duplicate the effort others have already put in to make them work.

Chef and Puppet have made some inroads here by making such deployments highly repeatable and getting them into source control. However, it’s my belief that their implementations limit the network effect they can have in building up a full set of sharable services. Juju, I think, will be a real boost to those who have invested in solid config management: that config management will be easy to chop up into Juju charms, which in turn will open up all the other existing charms for immediate use in such a shop.

Getting back to how this relates to OpenStack, it was also quite exhilarating to do a live keynote demo of Juju in all of its alpha glory. To raise the tightrope a little higher, it was driving OpenStack Diablo, which some might call beta-quality. We also got rid of the safety nets entirely and ran it all on top of Ubuntu 11.10 (pre-release). We had a few kinks through the week, but the awesome team I had around me was able to iron them all out and made both our CEO, Jane Silber, and me look very good up there. That includes my fellow server team members, the OpenStack developers, the Canonical IS pros, the Juju dev team, and my main collaborator in the whole thing, Jorge Castro.

I hope to attend the next ODS to see how much closer OpenStack is to completing its mission in six months. What is that mission currently? Quite simple, really: the mission is to figure out the mission.

October 9, 2011 at 6:07 am