SpamapS.org – Full Frontal Nerdity

Clint Byrum's Personal Stuff

Cars are so last century … but so is Linux, right?

This past weekend, I attended the 2010 Los Angeles Auto Show. I'm not a huge car buff. I do think that BMWs are the bomb, and I like Honda's common-sense vehicles, but really, I am NOT a car guy. However, I thought this was an interesting chance to take a look at an industry that, in my opinion, isn't all that different from the one I'm in.

Now, that may surprise some. It's pretty easy to think that I work for a super advanced company that has started a revolution and sits on the bleeding edge of innovation. I mean, at Canonical, we're doing all kinds of amazing stuff with "the cloud" and building software that sometimes makes people's jaws drop when they see it in action.

But really, I think we’re more like CODA. CODA has built what looks to be a sustainable, practical electric car. The car itself is not visually stunning, but the idea behind it is. Make an electric car that anyone can buy *and* use. Make it fun, and make sure the business is sustainable. But, in no way is CODA challenging the ideas and revisions that have worked for the 100+ years that the car industry has existed.

CODA is still putting a steering wheel, gas pedals, and gear shift in the cockpit for the driver. There are doors, wipers, lights, and probably floor mats. In much the same way, in Ubuntu, we're still putting our software out there with the intention that, while it's created differently and affords the user more capabilities, it is basically driven in much the same way as Windows 7 or OS X, mostly as a web, errrr, cloud terminal.

The exciting part is that for $3 of possibly more efficiently produced electricity, you can drive 100 miles. Even more exciting is that the CODA might actually compete with sensibly priced (but larger) Honda and Toyota sedans, rather than, like the Tesla cars, with Lexus and BMWs.

Given this way of thinking, the auto show was extremely interesting. The electric car (open source?) has "arrived", and the established players are buying the interesting enabling technology like batteries (Android's Linux kernel, Darwin for the Mac, etc.) from companies like Tesla, and putting it into their established products.

Whether consumers care about either open source or electric cars is another story… maybe the 2011 LA Auto Show will have an answer for me on at least one of them.


November 22, 2010 at 6:06 pm Comments (0)

Drizzle7 Beta Released! now with MySQL migration! « LinuxJedis /dev/null

Drizzle7 Beta Released! now with MySQL migration! « LinuxJedis /dev/null.

Drizzle is a project that is near and dear to my heart.

To sum it up, Drizzle took all that was really good in MySQL, cut out all that was mediocre, and replaced some of it with really good stuff. The end product is, I think, something that is leaner, should be more stable, and definitely more flexible.

So go check out the beta! I guess I should use Andrew's migration tool and see if I can migrate this blog to Drizzle. :)


September 29, 2010 at 10:39 pm Comments (0)

Love for my sponsors

No, not New Deal Tobacco & Candy Company, nor Nike or Pepsi (though I can't wait forever, guys, come on!)

No, I’m talking about my Debian and Ubuntu sponsors. Without you, all of my hard work would be sitting in a queue somewhere with no love.

You see, just because I work for Canonical doesn't mean I get an automatic berth in the Ubuntu Developer community, nor does it give me any clout with the Debian Developers. The beauty of the open source community is that it is, and probably always will be, a meritocracy. What have you done? What is your level of commitment? How well you can answer those questions at any given time defines how much people trust you, and therefore your level of autonomy and leadership.

So, folks like me who have just entered the fray in Ubuntu, and who only dabbled in Debian, must prove ourselves. And, to whom will we prove ourselves? Why, sponsor developers.

So, without further ado, these are a few of the sponsors who have made sure that my work has gotten out there in the past few weeks and months, and a few who have made sure that my *shoddy* work has not. THANKS GUYS!

  • Chuck Short (zul) – Ubuntu uploads of bug fixes and warranted critiques of half-assed PHP solutions.
  • Dustin Kirkland (kirkland) – Ubuntu uploads of bug fixes.
  • Scott Moser (smoser) – merging changes for cloud-utils and uploading to Ubuntu
  • Chris Cheney (ccheney) – Upload of gearman-interface source package into Debian (my first debian upload!)
  • Bernd Zeimetz (bzed) – Reviewed first gearman-interface package and convinced me to upload a proper gearman-interface package w/ swig bindings
  • Piotr Ożarowski (POX) – Educated me on finer points of Debian Python Policy
  • Thierry Carrez (ttx) – Upload of various bug fixes into Ubuntu, and sparing me “The look”
  • Matt Zimmerman (mdz) – Instruction on proper maintainer script procedures for memcached
  • Mathias Gug (mathiaz) – Endless attention to detail while reviewing my merge proposals, and “SNAILS!”
  • Thomas Goirand – Responsiveness on crusty old packages like libdbi
  • Kees Cook (kees) – MIR reviews for Ubuntu, and convincing me to get on the metro back to the hotel instead of facing the Prague deluge with my little 100Kč (about $5 US) umbrella
  • The people I've missed – I can't remember everyone, but thank you if you helped me, Ubuntu, and Debian out!

I'll try to do this more often, but I don't know if I can really single everyone out. It's amazing how many people work together so smoothly, despite the group above being spread out over, by my count, at least 7 countries and 5 time zones.


August 5, 2010 at 10:55 pm Comments (0)

Embedding libraries makes packagers sad pandas!

STOP THE INSANITY!

So, in my role at Canonical, I've been asked to package some of the hotter "web 2.0" and "cloud" server technologies for Ubuntu's next release, 10.10, aka "Maverick Meerkat".

While working on this, I've discovered something very frustrating from a packaging point of view that's been going on with fast-moving open source projects. It would seem that rather than produce stable APIs that do not change, there is a preference to dump feature after feature into libraries and software products, and completely defenestrate API stability (or as benblack from #cassandra on freenode calls it, API stasis).

So who cares, right? People who choose to use these pieces of software are digging their own grave. Right? But that's not really the whole story. We have a bit of a new world order when it comes to "Web 2.0". There's this new concept of "devops". People may be on to something there. As "devops" professionals, we need to pick things that scale and ship *on time*. That may mean shunning the traditional methods of loose coupling defined by rigid APIs. Maybe instead, we should just build a new moderately loose coupling for each project. As crazy as it sounds, it's working out for a lot of people. Though it may be leaving some ticking time bombs for security long term.

To provide a concrete example, I'll pick on the CPAN module Memcached::libmemcached. This is basically just a Perl wrapper around the C library libmemcached. It seeks to provide Perl programmers with access to all of the fantastic features that libmemcached has to offer. The only trouble is, it only supports what libmemcached had to offer as of v0.31.

Now, that is normal for a wrapper. Wrappers are going to lag behind their wrapped component's newer features. But they generally can take advantage of any behind-the-scenes improvement in a library, right? Just upgrade the shared library, and the system's dynamic linker will find it, right? Well, that's going to be tough, because there have been quite a few incompatible changes to the library since 0.31 was published last summer. And the API itself has grown massively, changing calls that are fundamental to its operation, if only slightly.

So, rather than be slave to somebody upgrading the shared library and recompiling Memcached::libmemcached, the maintainers simply embedded version 0.31 of libmemcached in their software distribution. The build process just statically links their wrapper against it. Problem solved, right?

However, from a larger scale software distributor's standpoint, this presents us with a big problem. We have a lot of things depending on libmemcached, and we really only want to ship maybe 1 or 2 versions (an old compatible version and the latest one) of this library. What happens when a software vulnerability is found and published affecting "every version of libmemcached prior to 0.41"? Now we have to patch not only the v0.40 that we shipped, but also the v0.31 inside Memcached::libmemcached. Even worse, what if we also had some Ruby code we wanted to ship? The Ruby wrapper for libmemcached has v0.37 embedded. So now we have to patch and test three versions of this library. This gets ugly quickly.

From Memcached::libmemcached’s point of view, they want something known to be working *for the project today*. The original authors have moved on to other things now that they’ve shipped the product, and the current maintainers are somewhat annoyed by the incompatibilities between 0.31 and 0.40, and don’t have much impetus to advance the library. Even if they did, the perl code depending on it must be updated since the API changed so heavily.

Now, I am somewhat zen about “problems” and I like to stay solution focused, and present them as opportunities (HIPPIE ALERT!). So, what opportunity does this challenge present us, the packagers, the integrators, the distributors, with?

I think we have an opportunity to make things better for people using packaging, and people not using packaging. Rather than fighting the trends we see in the market, maybe we should embrace them. Right now, a Debian-format package (which Ubuntu uses) has a control file that defines a field called "Depends". This field tells the system "before you install this package, you must install these other packages to make it work". This is how we can get working, complex software to install very easily with a command like 'apt-get install foo'.
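To make that concrete, a binary package stanza might look something like this (the package and library names are invented purely for illustration):

Package: superfoo
Depends: libbar0 (>= 0.1)
Description: hypothetical service that dynamically links against the system libbar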

However, it gets more difficult to maintain when we start depending on specific versions. Suddenly we can’t upgrade “superfoo” because it relies on libbar v0.1, but libbar v0.2 is out and most other things depend on that version.

What if, however, we added a field: "Embeds: libbar=0.1". This would tell us that this package includes its own private version of libbar v0.1. When the maintenance problem occurs for libbar "versions prior to v0.3", we can simply publish a test case that checks for the bad behavior in an automated fashion. If we see the bad behavior, we can then evaluate whether it is worth patching. Any gross incompatibilities with the test case will have to be solved manually, but even with the most aggressive API-breaking behavior, that can probably be reduced to 2-3 customizations.
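Sticking with the invented names from above, a package that bundles its own copy instead of depending on the system library might declare something like this. This is purely a sketch of the proposal, not a field dpkg actually understands today:

Package: superfoo
Depends: perl
Embeds: libbar=0.1
Description: hypothetical wrapper statically linked against its private copy of libbar 0.1

A security tracker could then search the archive for "Embeds: libbar" and know exactly which packages need the backported test and fix.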

“But we’re already too busy patching software!”. This is a valid point that I don’t have a good answer for. However, the current pain we suffer in trying to package things from source is probably eating up a lot more time than backporting tests and fixes would. If we strive for test coverage and embedded software tracking, at least we can know where we’re *really* vulnerable, and not just assume. Meanwhile, we can know exactly which version of libraries are embedded in each software product, and so we can reliably notify users that they are vulnerable, allowing them to accept risks if they so desire.

Of course, this requires a tool that can find embedded software, and track it in these packages. I don’t think such a thing exists, but it would not be hard to write. If a dir contains 99% of the files from a given software package, then it can be suggested that it embeds said software package. If we can work that down to checking a list of 1000 md5sums against another list of 1000 md5sums, we should be able to suggest that we think we know what the software embeds, and sometimes even provide 100% certainty.
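Such a scanner could start out very small. Here is a rough sketch of the md5sum-overlap idea in Python; the 99% threshold, directory layout, and names are all placeholders, not a finished tool:

import hashlib
import os
import sys

def md5sums(root):
    # Collect the md5sum of every file under root.
    sums = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), 'rb') as f:
                sums.add(hashlib.md5(f.read()).hexdigest())
    return sums

def embeds(suspect_dir, upstream_dir, threshold=0.99):
    # Suggest an embedded copy if nearly all upstream files appear verbatim.
    upstream = md5sums(upstream_dir)
    if not upstream:
        return False
    overlap = len(upstream & md5sums(suspect_dir)) / float(len(upstream))
    return overlap >= threshold

if __name__ == '__main__':
    suspect, upstream = sys.argv[1], sys.argv[2]
    if embeds(suspect, upstream):
        print("%s looks like it embeds %s" % (suspect, upstream))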

I look forward to fleshing out this idea in the coming months, as I see this happening more and more as we lower the barriers between developers and operations. Cloud computing has made it easy for a developer to simply say "give me 3 servers like that" and quickly deploy applications. I can see a lot of them deploying a lot more embedded software, and not really understanding what risk they're taking. As Mathias Gug told me recently, this should help us spend more time being "fire inspectors" and less time being "fire fighters".


June 12, 2010 at 6:19 am Comments (0)

Gearman K.O.'s MySQL to Solr replication

Ding ding ding… in this corner, wearing black shorts and a giant schema, we have over 11 million records in MySQL with a complex set of rules governing which must be searchable and which must not be. And in that corner, we have the contender, a kid from the back streets, outweighed and out-reached by all his opponents, but still victorious in the queue shootout, with just open source and 12 patch releases… written in C, it's Gearman!


I'm pretty excited today, as I'm preparing to go live with the first real, high-load application of Gearman that I've written. What is it, you say? Well, it is a simple trigger-based replicator from MySQL to Solr.

I should say (because I know some of my colleagues read this blog) that I don’t actually believe in this design. Replication using triggers seems fraught with danger. It totally makes sense if you have a giant application and can’t track down everywhere that a table is changed. However, if your app is simple and properly abstracted, hopefully you know the 1 or 2 places that write to the table.

I should also say that I really can’t reveal all of the details. The general idea is pretty simple. Basically we have a trigger that dumps a primary key into gearman via the gearman MySQL UDFs. The idea is just to tell a gearman worker “look at this record in that table”.
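The trigger end of that really is tiny. Something roughly like this, assuming the gman_servers_set and gman_do_background functions from the gearman MySQL UDFs, with the table, column, and worker function names made up:

-- one-time setup: tell the UDFs where gearmand lives
SELECT gman_servers_set('gearmanbox:4730');

-- fire a background job with just the primary key as the workload
CREATE TRIGGER listings_solr_sync AFTER UPDATE ON listings
FOR EACH ROW
  SET @ignore = gman_do_background('lookup_record', CAST(NEW.id AS CHAR));

Similar triggers would cover INSERT and DELETE, since MySQL wants one trigger per event.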

Once the worker picks it up, it applies some logic to the record: "should this be searchable or not?" If the answer is yes, the worker pushes the record into Solr. If not, the worker makes sure it is not in Solr.
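The real worker is PHP on top of the gearman pecl extension, but the shape of it is roughly this, sketched here in Python with the python-gearman client, and with the record lookup, business rules, and Solr calls all stubbed out:

import gearman  # python-gearman client, standing in for the PHP pecl worker actually in production

def load_record(record_id):
    # Stand-in for fetching the row by primary key.
    return {'id': record_id, 'status': 'active', 'hidden': False}

def should_be_searchable(record):
    # Stand-in for the real "should this be searchable" rules.
    return record['status'] == 'active' and not record['hidden']

def solr_add(record):
    # Stand-in for pushing or updating the document in Solr.
    print("indexing %s" % record['id'])

def solr_delete(record_id):
    # Stand-in for making sure the document is not in Solr.
    print("removing %s" % record_id)

def lookup_record(worker, job):
    # job.data is just the primary key that the trigger queued.
    record = load_record(job.data)
    if should_be_searchable(record):
        solr_add(record)
    else:
        solr_delete(record['id'])
    return 'done'

worker = gearman.GearmanWorker(['gearmanbox:4730'])
worker.register_task('lookup_record', lookup_record)
worker.work()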

This at least is pretty simple. The end result is a system where we can rebuild the search index in parallel using multiple CPUs (thank you to Solr/Lucene for being able to update indexes concurrently and efficiently, btw). This is done by pushing all of the records in the table into the queue at once.
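The rebuild itself is just a few lines of client code. Again a sketch with python-gearman, with the id source stubbed out:

import gearman

def all_record_ids():
    # Stand-in for "SELECT id FROM listings" against the real database.
    return range(1, 101)

client = gearman.GearmanClient(['gearmanbox:4730'])
for record_id in all_record_ids():
    # Background jobs queue up instantly and the workers chew through them in parallel.
    client.submit_job('lookup_record', str(record_id), background=True)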

Anyway, gearmand is performing like a champ, and libgearman and the gearman pecl module are doing great. I'm just really happy to see gearman rolled out in production, as I really do think it has that nice mix of simplicity and performance. I love the command-line client, which makes it easy to write scripts to inject things into queues or query workers. This allows me to access a worker like this:

$ gearman -h gearmanbox -f all_workers -s
Known Workers: 11

boxname_RealTimeUpdate_Queue_TriggerWorker_1 jobs=627366,restarts=0,memory_MB=4.27,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13311 jobs=304134,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700
boxname_RealTimeUpdate_Queue_Subject_13306 jobs=606126,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13314 jobs=576714,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13342 jobs=294846,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13347 jobs=376998,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_Subject_13359 jobs=470508,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700
boxname_RealTimeUpdate_Queue_Subject_13364 jobs=403182,restarts=0,memory_MB=7.03,lastcheckin=Tue, 23 Mar 2010 22:37:58 -0700
boxname_RealTimeUpdate_Property_SolrPublish_ jobs=219630,restarts=0,memory_MB=6.19,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Queue_TriggerWorker_2 jobs=393642,restarts=0,memory_MB=4.27,lastcheckin=Tue, 23 Mar 2010 22:37:59 -0700
boxname_RealTimeUpdate_Property_SolrBatchPub jobs=6,restarts=0,memory_MB=6.23,lastcheckin=Tue, 23 Mar 2010 22:37:28 -0700

Brilliant… no need for HTML or HTTP, just a nice, simple command-line interface.

I think gearman still has a ways to go. I'd really like to see some more administration features added to it. Deleting empty queues and quickly flushing all queues without restarting gearmand would be nice-to-haves. We'll see what happens going forward, but for now, thanks so much to the gearman team (especially Eric Day, who showed me gearman, and Brian Aker, for pushing hard to release v0.12).

w00t!


March 24, 2010 at 5:47 am Comments (0)

Bromine and Selenium – second and third most useful elements behind Oxygen

If you’re an engineer, you hate testing. Seriously, who likes doing what those mere mortal “users” do? We’re POWER users and we don’t need to use all those silly features on all those sites. Just look at Craigslist, clearly an engineer’s dream tool.

For web apps, testing actually isn't *that* hard. The client program (the browser) is readily available on every platform known to man, and web apps generally don't do much more than store and retrieve data in clever ways. So, it's not like we have to fire up a Large Hadron Collider to observe the effects of our web app.

Therein lies the problem though, as clicking around on web forms and entering the same email address, password, address, phone number, etc. etc., 100 times, is BORING.

Enter Selenium. This amazing little tool has been on the scene for a little while now, but it's just now getting some momentum. Click through to the website and watch "the magic", as they put it, but basically here's how it goes:

  • open their Firefox plugin and click 'record'
  • do something
  • click ‘record’ again.

Then just save this little test case to a file, and the next time you change anything that might relate to the series of clicks and data entries you just made, run this test again. There are all kinds of assertions you can make while you're doing something, like 'make sure the title is X' or 'make sure a link to Y exists'.

But wait, I could have done that with something like Test::More, PHPUnit, or lime. Where's the real benefit?

Well, because Selenium remotely controls your browser, all those gotchas regarding JavaScript and CSS incompatibilities can come into play here. That's because Selenium can control Internet Explorer, Firefox, *and* Safari. In fact, it can also control Opera and, according to their website, any browser that properly supports JavaScript.

This is really a nice evolutionary step for web shops, as tools like this have generally been OS-specific and cost a lot of money. Once again, open source software appears where a need becomes somewhat ubiquitous.

You can even take it a step further. The next thing that generally happens in a web dev shop when they get bigger than 20 or 30 people is they hire people who actually like testing. Well not really, but they dislike it *less* than software engineers. These are QA engineers. And they DO like things to be orderly and efficient.

Bromine is the answer for that. It's still pretty rough around the edges, but it gets the job done.

Again check out their website and watch the screencast, but basically it goes like this:

  • Write selenium tests as specified above
  • Upload tests to Bromine server
  • Attach tests to requirements
  • Run Selenium Remote Control on all required OS/browser version combinations (can you say VirtualBox?)
  • Run tests

Another nice thing about using Bromine is that you are now running your tests in a server-side language, not just the Selenium IDE, which is limited to the IDE's generated "Selenese" XML commands for tests. The IDE exports your basic test into PHP or Java, and then on the Bromine server you can do interesting things, like check an IMAP box for an email, run a backend process, or send an SMS.
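To give a flavor of that, here is roughly what one of these tests looks like once it leaves the IDE, sketched with the old Selenium RC Python client rather than the PHP or Java the IDE actually emits, and with the URL, locators, and expected title all made up:

from selenium import selenium  # the Selenium RC client

sel = selenium("localhost", 4444, "*firefox", "http://staging.example.com/")
sel.start()
sel.open("/login")
sel.type("id=email", "qa@example.com")
sel.type("id=password", "sekrit")
sel.click("id=submit")
sel.wait_for_page_to_load("30000")
assert sel.get_title() == "Welcome back"        # "make sure the title is X"
assert sel.is_element_present("link=Sign out")  # "make sure a link to Y exists"
# from here it is ordinary server-side code: check an IMAP box, kick off a backend job, send an SMS...
sel.stop()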

At first it may not seem like much, but eventually you end up with a multitude of useful tests for your web app that can be run all the time against development branches before release, catching many problems. Quality means happier users, which hopefully means loyal users who keep coming back.


November 3, 2009 at 1:48 am Comments (0)

SSH brute force protection – It's almost always already written

Every time I get my logwatch report and see the 20-40 daily brute-force attempts on it, I cringe. I've locked it down to a point, but ultimately I prefer convenience on some level. Limiting any one IP to 2 SSH connections every 5 minutes has annoyed me as many times as it has probably saved me. Preventing root from logging in is nice too.
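For reference, one common way to get that kind of limit, plus the root lockout, is the iptables "recent" match and a single sshd_config line. This is a sketch of the usual recipe, not necessarily the exact rules running here:

iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --name ssh --set
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --name ssh --update --seconds 300 --hitcount 3 -j DROP

# and in /etc/ssh/sshd_config:
PermitRootLogin no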

Ultimately though, I wanted a way to fight back against the brute forcers… to get a step ahead of them. From seeing the success of projects like Spamhaus and Project Honey Pot, I know that massive group collaboration works. Of course I started thinking about how I'd write it in my head. Every time… for months.

Well, once I let go of my egotistical desire to write it, I found this great project, DenyHosts, which does the same thing for the SSH brute-force scanners. I just installed it, and already it has added a few IPs to hosts.deny. Go download it, run it, and stop the annoying scanners!


August 23, 2009 at 4:49 pm Comments (0)