SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

Juju constraints unbinds your machines

This week, William “I code more than you will ever be able to” Reade announced that Juju has a new feature called ‘Constraints’.

This is really, really cool and brings juju into a new area of capability for deploying big and little sites.

To be clear, this allows you to abstract your hardware requirements pretty effectively.

Consider this:

juju deploy mysql --constraints mem=10G
juju deploy statusnet --constraints cpu=1

This will result in your mysql service landing on an extra large instance, since that is the smallest type with at least 10GB of RAM (it has 15GB). Your statusnet instances will be m1.smalls, since that type has just 1 ECU.
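Constraints can also be combined. If I recall the syntax correctly, multiple key=value pairs just go space-separated inside a single --constraints argument, so something like this should also work (treat it as a sketch, not gospel):

juju deploy mysql --constraints "mem=10G cpu=4"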

Even cooler than this is now if you want a mysql slave in a different availability zone:

juju deploy mysql --constraints ec2-zone=a mysql-a
juju deploy mysql --constraints ec2-zone=b mysql-b
juju add-relation mysql-a:master mysql-b:slave
juju add-relation statusnet mysql-a

Now, if mysql-a goes down:

juju remove-relation statusnet mysql-a
juju add-relation statusnet mysql-b

Much and more is possible, but this really does make juju even more compelling as a tool for simple, easy deployment. Edit: fixed ec2-zone to be the single character, per William’s feedback.

April 16, 2012 at 10:48 pm Comments (0)

But will it scale? – Taking Limesurvey horizontal with juju…

One of the really cool things about using the cloud, and especially juju, is that it instantly enables things that often times take a lot of thought to even try out in traditional environments. While I was developing some little PHP apps “back in the day”, I knew eventually they’d need to go to more than one server, but testing them for that meant, well, finding and configuring multiple servers. Even with VMs, I had to go allocate one and configure it. Oops, I’m out of time, throw it on one server, pray, move to next task.

This left a very serious question in my mind.. “When the time comes, will my app actually scale?”

Have I forgotten some huge piece to make sure it is stateless, or will it scale horizontally the way I intended it to? Things have changed though, and now we have the ability to start virtual machines via an API on several providers, and actually *test* whether our app will scale.

This brings us to our story. Recently, Nick Barcet created a juju charm for Limesurvey. This is a really cool little app that lets users create rich, multi-faceted surveys and invite the internet to vote on things, answer questions, etc. etc. This is your standard “LAMP” application, and it seems written in a way that will allow it to scale out.

However, when Nick submitted the charm for the official juju charms collection, I wanted to see if it actually would scale the way I knew LAMP apps should. So, I fired up juju on ec2, threw in some haproxy, and related it to my limesurvey service, and then started adding units. This is incredibly simple with juju:

juju deploy --repository charms local:mysql
juju deploy --repository charms local:limesurvey
juju deploy --repository charms local:haproxy
juju add-relation mysql limesurvey
juju add-relation limesurvey haproxy
juju add-unit limesurvey
juju expose haproxy

Lo and behold, it didn’t scale. There were a few issues with the default recommendations of limesurvey that Nick had encoded into the charm. These were simple things, like assuming that the local hostname would be the hostname people use to access the site.
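The fix is basically to tell the charm what hostname the site is actually served under instead of letting it guess. A minimal sketch, assuming the charm grows a config option for it (the option name here is hypothetical):

# Hypothetical config option: tell limesurvey its public hostname
juju set limesurvey hostname=survey.example.com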

Once that was solved, there were some other scaling problems immediately revealed. First on the ticket was that Limesurvey, by default, uses MyISAM for its storage engine in MySQL. This is a huge mistake, and I can’t imagine why *anybody* would use MyISAM in a modern application. MyISAM uses a “whole table” locking scheme for both reads and writes, so whenever anything writes to any part of the table, all reads and writes must wait for that to finish. InnoDB, available since MySQL 4.0, and the default storage engine for MySQL 5.5 and later, doesn’t suffer from this problem as it implements an MVCC model and row-level locks to allow concurrent reads and writes.
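Converting is not hard. A rough sketch from the shell, with placeholder credentials and an example table name (the real fix belongs in limesurvey and the charm, of course):

# Convert an existing MyISAM table to InnoDB (example table name)
mysql -u root -p limesurvey -e "ALTER TABLE lime_sessions ENGINE=InnoDB;"

# Check which engine each table is now using
mysql -u root -p limesurvey -e "SHOW TABLE STATUS;"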

The MyISAM locks caused request timeouts when I pointed siege at the load balancer, because too many requests were stacking up waiting for updates to complete before even reading from the table. This is especially critical for something like the session storage that limesurvey does in the database, as it effectively means that only one user can do anything at a time.
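For the curious, the load generation was nothing fancy; roughly this kind of siege run against the exposed haproxy unit (the hostname is a placeholder):

siege -c 20 -t 2M http://ec2-xx-xx-xx-xx.compute-1.amazonaws.com/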

Scalability testing in 10 minutes or less, with a server investment of about $1US. Who knew it could be this easy? Granted, I stopped at three app server nodes, and we didn’t even get to scaling out the database (something limesurvey doesn’t really have native support for). But these are things that are already solved, and that have been encoded in charms already. Now we just have to suggest small app changes to allow users to take advantage of all those well-known best practices sitting in charms.

(Check the bug comments for the results; I’d be interested if somebody wants to repeat the test.)

So, in a situation where one needs to deploy now, and scale later, I think juju will prove quite useful. It should be on anybody’s radar who wants to get off the ground quickly.

December 23, 2011 at 1:41 am Comments (0)

Juju ODS Demo – The Home Version

A few weeks ago I gave a live demo during Canonical CEO Jane Silber’s keynote at the Essex OpenStack Conference, which was held in Boston October 4-7 (See my previous post for details of the conference and summit). The demo was meant to showcase our new favorite cloud technology at Canonical, juju. In order to do this, we deployed hadoop on top of our private OpenStack cloud (also deployed earlier in the week via juju and Ubuntu Orchestra) and fed it a “real” workload (a big giant chunk of data to sort) in less than 5 minutes.

I’ve had a few requests to explain how it works, so, here is a step by step on how to repeat said demo.

First, you need to set up juju to be able to talk to your cloud. The simplest way to do this is to sign up for an AWS account on Amazon, and get EC2 credentials (a secret key and a key ID are needed).

If you install juju in Ubuntu 11.10, or from the daily build PPA in any other release, you’ll get a skeleton environments.yaml just by running ‘juju’.

Once this is done, edit ~/.juju/environments.yaml to add your access-key: and secret-key:. Optionally, you can set them in AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the environment.
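For reference, the environment variable route looks like this (the values are placeholders, obviously):

export AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx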

Now, you need the “magic” bit that turns juju status changes into commands for the “gource” source code visualization tool. It’s available here:

http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/view/head:/misc/status2gource.py

(wgettable here)

http://bazaar.launchpad.net/~clint-fewbar/juju/gource-output/download/head:/status2gource.py-20110908235607-pfnddi4d114nl8qd-1/status2gource.py

You’ll also need to install the ‘gource’ visualization tool. I only tried this on Ubuntu 11.10, but it is available on other releases as well.

Make sure your desired target environment is either the only one in .juju/environments.yaml, or set to be the default with ‘default: xxxx’ at the root of the file. You need ‘juju status’ to return something meaningful (after bootstrap) for status2gource.py to work.

Now, in its own terminal, run the command below. Note that cof_orange_hex.png is part of the official Ubuntu logo packs, though I forget where I got it. You may omit that command-line argument if you like, and a generic “person” image will be used.

python -u status2gource.py | gource --highlight-dirs \
--file-idle-time 1000000 \
--log-format custom \
--default-user-image cof_orange_hex.png \
--user-friction 0.5 \
-

This will not show anything until juju bootstrap is done and ‘juju status’ shows machine 0 running. If you already have services deployed, it should build the tree rapidly.

So next, if you haven’t done it already:

juju bootstrap

Once your instance starts up, you should see a gource window pop up and the first two bits, the bootstrap node and the machine 0 node, will be added.

Once this is done, you can just deploy/add-relation/etc. to your heart’s content.

To setup a local repo of charms, we did this:

mkdir charms
bzr init-repo charms/oneiric
cd charms/oneiric
bzr branch lp:~mark-mims/+junk/charm-hadoop-master hadoop-master
bzr branch lp:~mark-mims/+junk/charm-hadoop-slave hadoop-slave
bzr branch lp:~mark-mims/+junk/charm-ganglia ganglia

Those particular charms were specifically made for the demo, but most of the changes have been folded back into the main “charm collection”, so you can probably change lp:~mark-mims/+junk/charm- to lp:charm/.
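For example, grabbing the master charm from the main collection would then look something like this (assuming the branch exists under that namespace):

bzr branch lp:charm/hadoop-master hadoop-master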

You will also need a file in your current directory called ‘config.yaml’ with this content:

namenode:
  job_size: 100000
  job_maps: 10
  job_reduces: 10
  job_data_dir: in_one
  job_output_dir: out_one
These numbers heavily control how the job runs with 1 or 100 hadoop instances. If you want to spend a couple of bucks in Amazon and fire up 20 nodes, then raise job_maps to 100, job_reduces to 100, and job_size to 10000000. Otherwise it’s over very fast!
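For that bigger run, config.yaml would look roughly like this (same keys, just the larger numbers):

cat > config.yaml <<'EOF'
namenode:
  job_size: 10000000
  job_maps: 100
  job_reduces: 100
  job_data_dir: in_one
  job_output_dir: out_one
EOF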

We started the demo after bootstrap was already done, so the next step is to deploy Hadoop/HDFS and ganglia to keep an eye on the nodes as they come up.

juju deploy --repository . --config config.yaml hadoop-master namenode
juju deploy --repository . hadoop-slave datacluster
juju deploy --repository . ganglia jobmonitor
juju add-relation namenode datacluster
juju add-relation datacluster jobmonitor
juju expose jobmonitor

This should get you a tree in gource showing all of the machines, services, and relations that are setup.

You can scale out hadoop next with this command. Here I only create 4, but it could be 100.. depending on how fast you need your data map/reduced.

for i in 1 2 3 4 ; do juju add-unit datacluster ; done

Finally, to start the teragen/terasort:

juju ssh namenode/0

$ sudo -u hdfs sh /usr/lib/hadoop/terasort.sh

You may also want to note the hostname of the machine assigned to the jobmonitor node so you can bring it up in a browser. You will be able to see it in ‘juju status’.

It’s worth noting that we had a fail rate of about 1 in 20 tries while practicing the demo because of this bug:

https://bugs.launchpad.net/juju/+bug/872378

This causes the “juju expose jobmonitor” to fail, which means you may not be able to reach the ganglia instance. You can fix this by stopping and starting the provisioning agent on the bootstrap node. That is easier said than done, but it can be scripted. It’s fixed in juju’s trunk, so if you are using the daily build rather than the distro version, you shouldn’t see that issue.

So once you’re done, you’ll probably want to get rid of all these nodes you’ve created. Juju has a tool that strips everything down that it has brought up, which can be dangerous if you have data on the nodes, so be careful!


juju destroy-environment

It does not have a '--force' or '-y' option, by design. Make sure to keep the gource running when you do this. Say 'y', and then enjoy the show at the end. :)

I’d be interested to hear from anybody who is brave enough to try this how their experience is!

October 22, 2011 at 12:03 am Comments (0)

CloudCamp San Diego – Wake up and smell the Enterprise

I took a little trip down to San Diego yesterday to see what these CloudCamp events are all about. There are so many of them, all over the place, that I figured it was a good chance to take a look at what might be the “common man’s” view of the cloud. I spend so much time talking to people at a really deep level about what the cloud is, why we like it, why we hate it, etc. This “un-conference” was more about presenting a lot of that information, distilled, to business owners and professionals who need to learn more about “this cloud thing”.

The lightning talks were quite basic. The most interesting one was given by a former lawyer who now runs IT for a medium-sized law firm. Private cloud saves him money because he can now make a direct chargeback to a client when they are taking up storage and computing space. This also allows him to keep his infrastructure more manageable, because clients tend to give up resources more readily when there is a direct chargeback, as opposed to general service fees that try to cover this.

There was a breakout session about SQL vs. NoSQL. I joined and was shocked at how dominant the Microsoft representative was. She certainly tried to convince us “this isn’t about SQL Azure, it’s about SQL vs. NoSQL”, but it was pretty much about all the things that suck more than SQL Azure, and about not mentioning anything that might compete directly with it. I brought up things like Drizzle, Cassandra, HDFS, Xeround, MongoDB, and MogileFS. These were all swiftly moved past, and not written on the whiteboard. Her focus was on how SimpleDB differs from Amazon RDS, and how Microsoft Azure has its own key/value/column store for their cloud. The room was overpowered into silence for the most part; there were about 20 developers and IT manager types in the room, and they had no idea how this was going to help them take advantage of IaaS or PaaS clouds. I felt the session was interesting, but ultimately, completely pwned by the Microsoft rep. She ended by showing off 3D effects in their Silverlight-based management tool. Anybody impressed deserves what they get, quite honestly.

One good thing that did come out of that session was the ensuing discussion, in which I ended up talking with a gentleman from a local San Diego startup that was just acquired. This is a startup of 3 people that is 100% in Amazon EC2 on Ubuntu with PHP and MySQL. They have their services spread across 3 regions and were not affected at all by the recent outages in us-east-1. Their feeling on the SQL Azure folks is that it’s for people who have money to burn. He spends $3000 a month, and it is entirely on EC2 instances and S3/EBS storage. The audience was stunned that it was so cheap, and that it was so easy to scale up and down as they add and remove clients. He echoed something the MS folks said too: because their app was architected this way from the beginning, it is extremely cost effective, and they wouldn’t really save much money if they leased or owned servers instead of leasing instances, since they can calculate the costs and pass them directly on to the clients with this model, and their commitment is zero.

Later on I proposed a breakout session on how repeatable your infrastructure is (basically, infrastructure as code). There was almost no interest, as this was a very business-oriented un-conference. The few people who attended were just using AMIs to do everything. When something breaks, they fix it with parallel-ssh. The one person who was using Windows in the cloud had no SSH, so fixing any system problem meant re-deploying his AMI over and over.

Overall I thought it was interesting to see where the non-webops world is with knowledge of the cloud. I think the work we’re doing with Ensemble is really going to help people to deploy open source applications into private and public clouds so they don’t need 3D enabled silverlight interfaces to manage a simple database or a bug tracking system for their internal developers.

June 15, 2011 at 8:25 pm Comments (0)

So what is Ensemble anyway?

Have you heard of Ensemble? Are you excited about Cloud/Service Orchestration? What? OK, you’re not alone if you’re scratching your head.

Ensemble is an implementation of a new idea that has been taking shape the last couple of years. Ever since Amazon hooked up a remote API to thousands of machines to provide access to their virtual infrastructure (and called it macaroni? err.. AWS), people have been dreaming up ways to take advantage of what is basically a robotic “NOC guy”. No longer do you have to pre-rack servers or call your vendor frantically to get servers sent next-day to your colo. Right?

Naturally, the system administrators who would normally be in charge of racking servers applied their existing tools to the job, with mixed success. Config management is really good at modelling identical hosts. But with virtual hosts instantly available, this left those thinking at a higher level wanting more. Chef in particular implemented a nice set of tools and functionality to allow this high-level “service” definition with their knife tools and simple ruby API.

But how easy are Chef’s cookbooks to share and use without modification? How easy are they to integrate together? Puppet has modules that are also capable of similar functionality, and the recent integration of Mcollective, plus puppet Faces, has certainly added a lot of the same things Chef had to support this kind of application modelling. But again, the modules seem to require a lot of convention, assumption, and tweaking to be useful.

It’s my opinion that this is very much like the way tarballs+autoconf became the de-facto standard for distributing free software. It was *so much* better than writing a Makefile by hand, and it achieved an enormous amount of portability, so developers adopted it rapidly. In fact, it is still the dominant way to distribute portable open source applications.

But at some point, the limitations of this became clear. There was a need for something more concise, that could distribute both the source, and binaries, built for a platform. There was some limited early success with tarballs built by convention. But then, Enter RPM and DPKG. These included ways to express facts about software, like its dependencies, architecture, and the revisions made to it to work on the target platform. This allowed distributors of software to more easily maintain their systems, and enabled users to manage the software in their environments.

At that point, some smart guy figured out that we should be able to download and automatically configure all of the software needed for one application to work properly, just from its packaging information. To my mind, apt-get was my first experience with this, though FreeBSD ports authors may disagree there. Either way, this made it very easy for admins and users to install software without spending hours in the 7 levels of dependency hell.

In many ways, Service Orchestration is a way of bringing the benefits of packaging to the cloud. It should allow us to build out our cloud in a sane way, taking advantage of the knowledge that has been gained by others. For the bits that we need to finely tune, it should step aside and allow that without compromising the system.

Ensemble is an implementation of this idea, and Principia is a collection of “Formulas” for Ensemble. They are tightly coupled to Ubuntu, as they are in many ways meant to be the dpkg and apt-get for Ubuntu in the cloud.

It’s pretty easy to try out Ensemble and Principia on Ubuntu. Right now you’ll need an EC2 account with an access key set up, though we’re working on making this work with just your local machine for rapid development.

It’s been pointed out to me that the version of principia-tools that was available at the time of this writing didn’t include /usr/share/principia-tools/tests. I’ve uploaded a fixed version to the ensemble PPA, so if you tried these instructions and failed, please try updating principia-tools. If that fails, you can get the tests with bzr branch lp:principia-tools.


sudo add-apt-repository ppa:ensemble/ppa
sudo apt-get update
sudo apt-get install principia-tools
export AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxx
export AWS_ACCESS_KEY_ID=0123456789ABCDEF
ensemble bootstrap
principia getall /some/path/for/formulas
/usr/share/principia-tools/tests/mediawiki.sh /some/path/for/formulas

What does this give you? Well, it should give you a 7-node mediawiki cluster of t1.micros in the us-east-1 region of EC2. I just ran it, and now I have this:

machines:
  0: {dns-name: ec2-50-19-158-109.compute-1.amazonaws.com, instance-id: i-215dd84f}
  1: {dns-name: ec2-50-17-16-228.compute-1.amazonaws.com, instance-id: i-8d58dde3}
  2: {dns-name: ec2-72-44-49-114.compute-1.amazonaws.com, instance-id: i-9558ddfb}
  3: {dns-name: ec2-50-19-47-106.compute-1.amazonaws.com, instance-id: i-6d5bde03}
  4: {dns-name: ec2-174-129-132-248.compute-1.amazonaws.com, instance-id: i-7f5bde11}
  5: {dns-name: ec2-50-19-152-136.compute-1.amazonaws.com, instance-id: i-755bde1b}
  6: {dns-name: '', instance-id: i-4b5bde25}
services:
  demo-wiki:
    formula: local:mediawiki-62
    relations: {cache: wiki-cache, db: wiki-db, website: wiki-balancer}
    units:
      demo-wiki/0:
        machine: 2
        relations: {}
        state: null
      demo-wiki/1:
        machine: 6
        relations: {}
        state: null
  wiki-balancer:
    formula: local:haproxy-13
    relations: {reverseproxy: demo-wiki}
    units:
      wiki-balancer/0:
        machine: 4
        relations: {}
        state: null
  wiki-cache:
    formula: local:memcached-10
    relations: {cache: demo-wiki}
    units:
      wiki-cache/0:
        machine: 3
        relations: {}
        state: null
      wiki-cache/1:
        machine: 5
        relations: {}
        state: null
  wiki-db:
    formula: local:mysql-93
    relations: {db: demo-wiki}
    units:
      wiki-db/0:
        machine: 1
        relations: {}
        state: null

At the top you see the machines that ensemble spun up in EC2 in the ‘machines’ section. The numbers there correspond to the ‘machine: #’ in the service/units definitions below. If you look through, you’ll see above that wiki-balancer is machine 4, which has a hostname of ec2-174-129-132-248.compute-1.amazonaws.com. If you go to that hostname, once all relations are up (I like to use ‘watch ensemble status’ to see when this happens), you should see a working mediawiki. But not just a working mediawiki, a scalable one. If you want to pour on the traffic, spin up 3 more demo-wiki’s to handle the app server load:


ensemble add-unit demo-wiki
ensemble add-unit demo-wiki
ensemble add-unit demo-wiki

These will of course take a minute or two to spin up. Once they’re ready they’ll show up in the status output:

services:
  demo-wiki:
    formula: local:mediawiki-62
    relations: {cache: wiki-cache, db: wiki-db, website: wiki-balancer}
    units:
      demo-wiki/0:
        machine: 2
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/1:
        machine: 6
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/2:
        machine: 7
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/3:
        machine: 8
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started
      demo-wiki/4:
        machine: 9
        relations:
          cache: {state: up}
          db: {state: up}
          website: {state: up}
        state: started

How about a little test then? After I got to this point, I logged in as WikiSysop (change the password, folks! It’s change-me) and imported the Wikipedia exports for “Ubuntu” and “EC2”. After that I used harvestman to spider the site and then saved all the urls in a file, urls.txt. Alright! Now let’s fire up *siege* from a machine outside the cluster, but in the same availability zone / security group (so at least we’re only dealing with EC2’s latency and not my net connection), and see if we can take this cluster down!


$ siege -i -c 5 -f urls.txt
...
Transactions: 563 hits
Availability: 100.00 %
Elapsed time: 95.58 secs
Data transferred: 2.64 MB
Response time: 0.35 secs
Transaction rate: 5.89 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 2.04
Successful transactions: 544
Failed transactions: 0
Longest transaction: 13.54
Shortest transaction: 0.00

This is, btw, the best run I got out of t1.micro’s. Sometimes it would get quite ugly:


Transactions: 892 hits
Availability: 99.55 %
Elapsed time: 221.69 secs
Data transferred: 3.64 MB
Response time: 0.61 secs
Transaction rate: 4.02 trans/sec
Throughput: 0.02 MB/sec
Concurrency: 2.45
Successful transactions: 849
Failed transactions: 4
Longest transaction: 27.41
Shortest transaction: 0.00

Let’s try the whole thing over with m1.small. First I edit ~/.ensemble/environments.yaml and add an override for the default-instance-type:


ensemble: environments

environments:
  sample:
    type: ec2
    default-instance-type: m1.small
    control-bucket: ensemble-12345678901234567890
    admin-secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Then I re-run the whole test:


Transactions: 290 hits
Availability: 98.98 %
Elapsed time: 81.79 secs
Data transferred: 0.78 MB
Response time: 0.53 secs
Transaction rate: 3.55 trans/sec
Throughput: 0.01 MB/sec
Concurrency: 1.89
Successful transactions: 277
Failed transactions: 3
Longest transaction: 1.50
Shortest transaction: 0.00

Oops! I forgot to add my 3 extra nodes. Note that these two m1.smalls are already almost keeping up. Now as I add them, I keep siege running. It’s pretty cool to watch the response times drop as nodes come online to carry some of the load.

Now with 5 m1.small’s:


Transactions: 273 hits
Availability: 100.00 %
Elapsed time: 54.27 secs
Data transferred: 0.99 MB
Response time: 0.47 secs
Transaction rate: 5.03 trans/sec
Throughput: 0.02 MB/sec
Concurrency: 2.38
Successful transactions: 260
Failed transactions: 0
Longest transaction: 19.92
Shortest transaction: 0.00

And with higher concurrency raised from 5 to 10:
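(That is, the same siege invocation with just the concurrency flag raised:)

siege -i -c 10 -f urls.txt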


Transactions: 327 hits
Availability: 100.00 %
Elapsed time: 42.20 secs
Data transferred: 1.30 MB
Response time: 0.66 secs
Transaction rate: 7.75 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 5.12
Successful transactions: 318
Failed transactions: 0
Longest transaction: 25.51
Shortest transaction: 0.00

And now if we add 2 more, for a total of 7 nodes, concurrency of 10 gets even better:


Transactions: 531 hits
Availability: 100.00 %
Elapsed time: 53.37 secs
Data transferred: 1.75 MB
Response time: 0.44 secs
Transaction rate: 9.95 trans/sec
Throughput: 0.03 MB/sec
Concurrency: 4.35
Successful transactions: 507
Failed transactions: 0
Longest transaction: 15.49
Shortest transaction: 0.00

And with 2 more (total of 9 units in demo-wiki serving the app):


Transactions: 354 hits
Availability: 100.00 %
Elapsed time: 34.41 secs
Data transferred: 1.23 MB
Response time: 0.41 secs
Transaction rate: 10.29 trans/sec
Throughput: 0.04 MB/sec
Concurrency: 4.22
Successful transactions: 337
Failed transactions: 0
Longest transaction: 11.45
Shortest transaction: 0.00

Anyway, this isn’t a Mediawiki benchmark. This is to show you how easy it is to scale up and down in response to load with Ensemble. We all know that scaling out works, these graphs show it nicely:

(Graphs: Response Time, and Transactions per Second)

Notice how the transactions/second went up all the time, but the response time went up drastically with the jump in concurrency. This is where you need to have the ability to scale quickly, and where, if you can live with the other limitations of EC2 or any other IaaS provider, the cloud should actually win you business, since better response time means more happy users.

Now that my siege is over, I can safely remove the unnecessary units one by one with 'ensemble remove-unit demo-wiki/9', etc. etc. There’s still a lot of room for sugar to be added. We could say "ensemble resize-service demo-wiki 5" and it might just pick 5 to keep and remove the rest, or add 3 to fulfill the request. There are also a ton of other ideas just bubbling up that are really exciting.
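Scaling back down is just the reverse of the add-unit loop from earlier; a quick sketch:

for u in 9 8 7 6 5; do ensemble remove-unit demo-wiki/$u; done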

Come say hi and hack on ensemble with us on Freenode in #ubuntu-ensemble, and on the mailing list.

June 3, 2011 at 6:53 pm Comments (0)

presenting “blog on a narwhal”

Since we’re just about to 11.04 beta2, I figured its high time I start using Ubuntu Server for my personal blog.

What? Almost a year at Canonical and my blog wasn’t on Ubuntu server? Well, for over 5 years now, a personal friend has provided me with a free Xen virtual machine to run my blog on. I migrated it off of Debian then, which was sad for me, but back then I was so focused on working I didn’t have time or resources to be picky, so I said OK.

Fast forward to now, I’ve been working on Ubuntu Server and getting ribbed by my co-workers about that “crappy CentOS xen box” they’d see me logged into.

Well, that’s all over now. I decided to marry all the new tech I’ve been playing with lately into one glorious blog migration.

The old blog was:

  • Xen domU
  • 500MB RAM allocated
  • 9GB storage
  • CentOS 5.5
  • Apache + mod_php (5.3.5 from IUS project)
  • MySQL 5.0.77
  • WordPress 3.0.5 manually installed single-site

The new hotness is:

  • EC2 t1.micro (it’s upgradable! ;)
  • 692MB RAM
  • 8GB EBS
  • nginx + php5-fpm (5.3.5 from natty)
  • Drizzle 2011.03.13 (wordpress-plugin 0.0.2)
  • WordPress 3.0.5 from natty in multisite mode

The steps to migrate weren’t altogether complicated: a bit of configuration for nginx to have it serve my PHP using php5-fpm, and copying most of wp-content over. Drizzle couldn’t have been more straightforward (a rough sketch follows the list):

  • Install drizzle7-client from EPEL on CentOS vm
  • drizzledump blog database (drizzledump automatically converts mysql schemas to drizzle compatible ones)
  • load it into drizzle on Ubuntu server
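A rough sketch of those steps, assuming the clients keep their mysql-style options (database name and credentials are placeholders):

# On the old CentOS box: dump the blog database, converting the schema
drizzledump --host=localhost --user=blog --password blog > blog.sql

# On the new Ubuntu server: create the schema and load the dump
drizzle -e "CREATE SCHEMA blog;"
drizzle blog < blog.sql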

WordPress still needs *some* help to use Drizzle. Much of this will be handled by the wordpress-drizzle package from my ppa (add-apt-repository ppa:clint-fewbar/drizzle) which filters DDL to change things like LONGTEXT to TEXT. Because Drizzle has done away with the eeeeevil of datetimes with 0000-00-00 as their date (a non-existent date), we need to change all instances of that to '0001-01-01'. In the future I’d like to see this abstracted out of wordpress even more so that it is more aware of the datetime fields and can use actual NULL values. I believe this can be done in the plugin by overloading the insert/update methods. I’ve begun working on that, but for now I’ll just have to keep patching wp-includes/post.php, which seems to be the main user of 0000-00-00 to denote a “draft” post.
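Until that lands, the stopgap is a blunt search and replace along these lines (a sketch; the exact timestamp string may need adjusting):

sed -i "s/0000-00-00 00:00:00/0001-01-01 00:00:00/g" wp-includes/post.php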

We also have to alter the wp_posts table slightly. That’s because WordPress relies on MySQL’s broken “NOT NULL” behavior of producing an empty string in varchars. This ALTER makes the empty-string default explicit:

ALTER TABLE wp_posts MODIFY COLUMN post_mime_type VARCHAR(100) COLLATE utf8_general_ci DEFAULT '';

Anyway, goodbye CentOS, hello Ubuntu!

April 13, 2011 at 7:26 am Comments (0)

Puppet Camp Report: Two very different days

I attended Puppet Camp in San Francisco this month, thanks to my benevolent employer Canonical’s sponsorship of the event.

It was quite an interesting ride. I’d consider myself an intermediate level puppet user, having only edited existing puppet configurations and used it for proof of concept work, not actual giant deployments. I went in large part to get in touch with users and potential users of Ubuntu Server to see what they think of it now, and what they want out of it in the future. Also Puppet is a really interesting technology that I think will be a key part of this march into the cloud that we’ve all begun.

The state of Puppet

This talk was given by Luke, and was a very frank discussion of where puppet is and where it should be going. He discussed in brief where Puppet Labs fits into this as well. In brief, puppet is stable and growing. Upon taking a survey of puppet users, the overwhelming majority are sysadmins, which is no surprise. Debian and Ubuntu have equal share amongst survey respondents, but RHEL and CentOS dominate the playing field.

As for the future, there were a couple of things mentioned. Puppet needs some kind of messaging infrastructure, and it seems mCollective will be it. They’re not ready to announce anything, but it seems like a logical choice. There are also plans for centralized data services to make the data puppet has access to available to other things.

mCollective

Given by mCollective’s author, whose name escapes me, this was a live demo of what mCollective can do for you. Its basically a highly scalable messaging framework that is not necessarily tied to puppet. You simply need to write an agent that will subscribe to your messages. Currently only ActiveMQ is supported, but it uses STOMP, so any queueing system that uses STOMP should be able to utilize the same driver.

Once you have these agents consuming messages, it’s just a matter of getting creative about what they can do. He currently has some puppet-focused agents and client code to pull data out of puppet and act accordingly. Ultimately, you could do much of this with something like Capistrano and parallel ssh, but this seems to scale well. One audience member boasted that they have over 1000 nodes using mCollective to perform tasks.

The Un-Conference

Puppet Camp took the form of an “un conference”, where there were just a few talks, and a bunch of sessions based on what people wanted to talk about. I didn’t propose anything, as I did not come with an agenda, but I definitely was interested in a few of the topics:

Puppet CA

My colleague at Canonical, Mathias Gug, proposed a discussion of the puppet CA mechanics, and it definitely interested me. Puppet uses the PKI system to verify clients and servers. The default mode of operation is for a new client to contact the configured puppet master, and submit a “CSR” or “Certificate Signing Request” to it. The puppet master administrator then verifies that the CSR is from one of their hosts, and signs it, allowing both sides to communicate with some degree of certainty that the certificates are valid.

Well there’s another option, which is just “autosign”. This works great on a LAN where access is highly guarded, as it no longer requires you to verify that your machine submitted the CSR. However, if you have any doubts about your network security, this is dangerous. An attacker can use this access to download all of your configuration information, which could contain password hashes, hidden hostnames, and any number of other things that you probably don’t want to share.

When you add the cloud to this mix, its even more important that you not just trust any host. IaaS cloud instances come and go all the time, with different hostnames/IP’s and properties. Mathias had actually proposed an enhancement to puppet to add a unique ID attribute for CSR’s made in the cloud, but there was a problem with the ruby OpenSSL library that wouldn’t allow these attributes to be added to the certificate. We discussed possibly generating the certificate beforehand using the openssl binary, but this doesn’t look like it will work w/o code changes to Puppet. I am not sure where we’ll go from there.

Puppet Instrumentation

I’m always interested to see what people are doing to measure their success. I think a lot of times we throw up whatever graph or alert monitoring is pre-packaged with something, and figure we’ve done our part. There wasn’t a real consensus on what were the important things to measure. As usual, sysadmins who are running puppet are pressed for time, and often measurement of their own processes falls by the way side with the pressure to measure everybody else.

Other stuff

There were a number of other sessions and discussions, but none that really jumped out at me. On the second day, an employee from Google’s IT department gave a talk about Google’s massive puppet infrastructure. He explained that it is only used for IT support, not production systems, though he wasn’t able to go into much more detail. Twitter also gave some info about how they use puppet for their production servers, and there was an interesting discussion about the line between code and infrastructure deployment. This stemmed from a question I asked about why they didn’t use their awesome bittorrent-based “murder” code distribution system to deploy puppet rules. The answer was “because murder is for code, and this is infrastructure”.

Cloud10/Awstrial

So this was actually the coolest part of the trip. Early on the second day, during the announcements, the (sometimes hilarious) MC, Deepak, mentioned that there would be a beginner puppet session later in the day. He asked that attendees to that session try to have a machine ready, so that the presenter, Dan Bode, could give them some examples to try out.

Some guys on the Canonical server team had been working on a project called “Cloud 10” for the release of Ubuntu 10.10, which was coming in just a couple of days. They had thrown together a django app called awstrial that could be used to fire up EC2 or UEC images for free, for a limited period. The reason for this was to allow people to try Ubuntu Server 10.10 out for an hour on EC2. I immediately wondered though.. “Maybe we could just provide the puppet beginner class with instances to try out!”

Huzzah! I mentioned this to Mathias, and he and I started bugging our team members about getting this setup. That was at 9:00am. By noon, 3 hours later, the app had been installed on a fresh EC2 instance, a DNS pointer had been created pointing to said instance, and the whole thing had been tweaked to reference puppet camp and allow the users to have 3 hours instead of 55 minutes.

As lunch began, Mathias announced that users could go to “puppet.ec42.net” in a browser and use their Launchpad or Ubuntu SSO credentials to spawn an instance.

A while later, when the beginner class started, 25 users had signed on and started instances. Unfortunately, the instances died after 55 minutes due to a bug in the code, but ultimately, the users were able to poke around with these instances and try out stuff Dan was suggesting. This made Canonical look good, it made Ubuntu look good, and it definitely has sparked a lot of discussion internally about what we might do with this little web app in the future to ease the process of demoing and training on Ubuntu Server.

And what’s even more awesome about working at Canonical? This little web app, awstrial, is open source. Sweet! So anybody can help us out making it better, and even show us more creative ways to use it.


October 21, 2010 at 4:54 pm Comments (0)