DivX, where did that momentum go?

Back in February DivX graced us with their web player plugin for Windows and Mac. Apparantly to little fanfare, I didn’t even notice at the time. I have always been a fan of DivX; the codec mind you, not the company. The quality is superior and encoding time shorter than any other option out there, not to mention a relatively sane DRM implementation. On the other hand, the company was born on the media wings of online tv and movie pirates, and it all started out as a Windows Media Video hack. Coming from a background where the involved individuals seem to prefer and trend toward transparency rather than tight-assed corporations, it always seemed rather odd to me the way DivX attempted to capitalize on their momentum. Notice that they have long since lost the support of online pirates who are now using the more transparent (open source) XviD codec extensively.

This post isn’t so much about DivX’s failure to gain real end-user traction in years past, it’s more an open question as to why they seem to be unable to meet market demands NOW. On2, who we first heard about when they open sourced their VP3 codec, whom the astute reader will know is the codec Ogg Theora is based upon. Somehow, these On2 fellows in the past couple of years have managed to get Macromedia and now Adobe eating out of their hands, which in turn has granted them 95% or better market penetration through the web browser (bundled with Flash 8). Now they can be seen launching products left and right aimed at corporate licensee’s such as the On2 Video Publisher. Dare I pose the question, DivX, you went through all the trouble of creating a browser plugin for playback, why didn’t you take it the extra 10 yards?

Open Source Flash Media Server: Red5

I have been following the development of red5 since very early on. It was the prime motivator behind flixn, as having a freely available flash media server would allow one to do things that were inconcievable previously due to the financial weight of licensing. Just to be clear I have never, not by any stretch of the imagination, been a proponent of flash. To be perfectly honest I am one of those geeks who will steer clear of a website entirely if flash is a hard requirement. That said, I would like to think that I am able to recognize the merits of a technology no matter how foul a taste it leaves in my mouth. Flixn exploits the heck out of one of those merits, there will be no punches pulled here.

Over the last couple of months flixn has gone from one of those little backburner projects that is slowly building steam to seeing near fulltime development by several individuals. As work has progressed it has become abundantly clear that no matter how far red5 has come in its short life it just wasn’t going to be ready for prime time on the same timetable as flixn.com. In our case, the final phases of development are currently underway, including a switch to Adobe’s Flash Media Server 2. It is quite unfortunate, but such is life. I wish to point out that this should not be taken the wrong way, red5 has proved itself to be extremely robust and stable as a development platform. The switch to FMS2 was not a light one by any means, and yes, that does mean we were very seriously considering a launch using red5.

As critical as I am of flash, I am extremely eager to see what the entrepreneurial web 2.0 crowd will do in the next 6-12 months as red5 becomes mature. That Adobe seems to have taken an interest in getting Flash9 out for Linux doesn’t hurt my feelings either.

PHP/AJAX file upload with progress bar, part 2

I made some good progress on implementing the server side of things as outlined in my prior post on this subject, but I got lazy as usual. I simply haven’t gotten around to finishing this up, so I decided to throw the code up and make a post about it in the hopes that someone will either badger me into completing it, or find some of what is already there useful. The code as of now implements what could be a reasonably useful (python asyn* inspired) set of utility classes for writing socket applications in PHP. Not that anyone would ever want to do anything like that, of course.


OpenBSD’s financial situation

In typical form, Theo is once again ranting about for-profit corporations not living up to his personal ethical standards. Wait, wait, wait.. WHAT? Right, exactly, let me spell it out for those of you that missed it. For-profit corporations typically do not, nor should they in general operate under the guidance of some high and mighty code of ethics. That’s just not what they do! If you expect them to, you are a lunatic, profit margins are everything. Granted, there are some that do and kudos to them.

Theo, your pet OpenBSD is in some ways pushing the envelope to the extreme when compared to other open source operating systems. OpenSSH/SSL, bgpd, pf to name a few. You’re a bright guy Theo, I know you are, get with your guys and figure out how to monetize those good bits. Selling CD’s and DVD’s isn’t the way. Selling support services is a step in the right direction, but isn’t going to get you too far. Offer custom development services, for customization, etc? Ok, ok, sure, not bad. But you can do better than that, it all depends on how far you want to take it. Oh wait, there’s a thought. $10,000/year subscription service that gets your corporation fed security patches 72 hours before everyone else? How about a $30/year subscription service for individuals that gets you a login to download new releases 2 weeks before everyone else? Oh, right, but then one guy would grab it and setup a torrent or something. Think about it this way, though. At least then you’ll have something legitimate to rant about.

The Complete FreeBSD

One month ago, on February 26, 2006, I strong-armed Greg “grog” Lehey over IRC into letting me provide him an account on a machine to act as a download mirror for his book, The Complete FreeBSD. This while I was drafting the article submission to Slashdot, just in case. The announcement that the book was now available for free under the Creative Commons Share-Alike license was made several days prior and syndicated on a number of other more niche geek news sites. After the /. article went live on the 27’th, Greg decided to make use of the mirror, and switched the primary download site to Evilcode.net. Shortly thereafter traffic leveled off at around 7Mbit and slowly tapered off over the next few days. This from a post that did not even hit the main page.

Now, one month later, Evilcode.net is still acting as the primary download mirror for The Complete FreeBSD. By my rough count it has consumed so far in the neighborhood of 75GB of transfer, which breaks down as, roughly mind you… 7500 copies of the PDF version, 3400 copies of the PostScript version and 1000 copies of the book sources. Not too shabby!

I would like to extend my most sincere thanks to grog not for just disseminating this valuable resource openly as he has done, but also for its existence in paper form in the first place. I still have fond memories of receiving and reading my copy of the second edition some years ago when I was a FreeBSD novice. Not only that, but for years of valuable contributions to the FreeBSD project. The developer community would not be the same without you Greg, you are one of the good guys.

Direct link to The Complete FreeBSD page on lemis.com

Receiving asynchronous notifications of database changes , part 2

As I trudge forward with this explanation please keep in mind that Epidemic is a proof of concept. I’m not saying that it wouldn’t work fine in production as-is, but odds are it will bring down your entire infrastructure AND club all of your pet baby seals.

Let’s quickly step through the code, starting with a simple entry point mail_server.py. After pulling in the required classes and whatnot, the first thing this piece of code does is instance SQLNotifyDispatch which inherits SQLNotify. These two classes form the core of the code that does all the heavy lifting we wanted to avoid in our frontend. It cannot even fathom what to do, however, without a little help. Rather than telling it what tables to watch, and also implementing code to take action when something happens to a watched table, the framework has been designed to allow one to simply implement the code that takes action, and let it inform SQLNotifyDispatch what exactly needs to be watched.

This brings us to mail/server.py, in which I will focus on the mail_users class. This class performs actions when modifications are made to the mail_users table. The classname does not necessarily need to mimic the table name, as can be seen if you look back at mail_server.py. First, the mail_users class is instanced, and then it is registered with the dispatch object we created before. You will notice the first argument to the Register method is ‘mail_users’, this is where the name of the table to watch is defined. Back to the mail_users class, now that we’ve let the Dispatch end of things know what table to watch, we need to let it know what columns we are interested in. This is where the GetCol method of the registered class comes in. As you can see its implementation is very simple, return [‘gid’, ‘dba’, ‘username’, ‘quota_bytes’]. This tells Dispatch that we are interested in changes to those 4 columns, changes to other columns in the table are fantastic, but we don’t need to be notified of them.

The real meat is contained in backend/sql.py Digging down through SQLNotifyDispatch and into SQLNotify, you will notice a number of static definitions in the SQLNotify class, such as TableCreate, Function, Trigger and Rule. This is the heart of the whole operation. When SQLNotify is told the table and columns to watch, it crafts a table to log changes to, and a function/trigger/rule trio specific to the table being watched that serves two purposes. The first, it ensures that any operations happening on the watched table get stuffed into the log table. Second, it fires off PostgreSQL’s NOTIFY, to let us (SQLNotify) know that a change has happened. That’s right, it lets “US” know, because once this is all registered with the database it is out of our hands. This means that SQLNotify only has to make these changes the first time it watches a database. It also means, and much more importantly, that no matter what happens to “US”, the application which setup the watching, and is acting on changes, no changes will be lost. The application could crash, no matter, as soon as we come back up, we can poke into the log tables and see what has happened while we were away. It is possible in some cases to receive notification of changes a bit later than one would like, but you will always, always know if changes were made.

When a change does happen, PostgreSQL fires off NOTIFY as dictated by a rule the SQLNotify class created for us. As a result of this our application is asynchronously notified of the change (that’s right, no polling!). When SQLNotifyDispatch receives one of these notifications, it calls the appropriate method in our worker class, INSERT(), UPDATE() or DELETE(), dependant upon what the operation was that happened in the database. As you can see in mail/server.py, those three methods do various operations, such as creating directories on the filesystem, setting up or changing quota’s, or deleting directories.

It is all a just a bit more complicated than what is typically done when this type of functionality is needed, I admit. My personal experience dictates, however, that this type of approach actually ends up being much simpler and easier to maintain down the road when one starts to grow an infrastructure.

Those who are tied to MySQL are not prevented from using a nearly identical solution. With the introduction of MySQL 5.0, it is all very possible save the asynchronous notification features. While elegant and conducive to good performance, are certainly not required. Keep in mind that polling log tables which are consistently pruned is in general going to be faster than checking a datestamp or even an indexed “updated” boolean column on a very large table.

Receiving asynchronous notifications of database changes

The following is directly pertaining to a PostgreSQL-based internet mail solution, but be not discouraged. Most, if not all, of the techniques can be applied elsewhere, and I intend to explain how.

For about six months in 2003 I worked for a company based out of Spearfish, SD, called Altaire Enterprises, Inc., a small floundering dialup ISP. This is where my first real experiences with PostgreSQL took place, up until this time I had been a die-hard user of MySQL for all of my database needs, whether it be as a backing store for a web application or otherwise. The largest project I took on while I was there was the implementation of a database backed mail system. That is to say, all mail accounts were entirely virtual, no system accounts, and all data associated with them was stored in PostgreSQL. It could just as easily have been LDAP, but it wasn’t, and that isn’t the point of this little ditty I’m writing now. Architecting the mail system was the easy part, as I found, plenty of mail applications are perfectly happy to talk to PostgreSQL. There are two hard parts. Performance and Management.

I will get to performance later. Management comprises more than one would think at first. Obviously, you need some sort of frontend or tools to add users, domains, etc. to your mail database. In this case it was a collection of C applications, rather than the typical web frontend, because they could be executed by the internal billing system (platypus). In consideration of the scope of this text, how the data is being entered is moot.

There is a flip-side to management, and specifically putting information into the database in this scenario. There are cases where you need to know when modifications are made to that data, so that you can perform operations on disk, for instance. Such operations may be setting quota’s, or creating a users Maildir if your MTA doesn’t handle that for you. The typical way to do this is to have your frontend perform that action as well, which is logical in a small installation, and is exactly the method used at Altaire. The C applications performed whatever on-disk operations were necessary.

What happens when your (web) fronted is hosted on a different server, though? I suppose you could export a set of web services, or similar, from the mailserver, allowing your frontend to connect to it and perform the necessary operations. What then if you have 10 mail servers? The frontend has to decide which one to connect to, and does its thing, fine, but what if one of those web services runners dies, what do you do? Write additional logic to record failures, and play them back later? Oh lords, there must be a better way. Yes, yes.. Of course there is. There is another method that has been used time and again, it is proven and reliable. My first major experience with it was during the time I helped architect ITMom.com, a web hosting provider, back in 99/00. You create an addition column in the tables that will require external operations, and when modifications happen, you toggle that field on with your frontend. You can then have a runner that polls those tables watching for modifications, and performing the correct operation when one is found. This works well, with a number of drawbacks. One, you are polling. Two, you are littering your carefully constructed, optimized and normalized schema with columns whose sole purpose is to notify external applications of changes. You could easily avoid this by logging changes to seperate tables, sure, but now we come to three. You are stilly relying on your frontend to dictate what actions the backend should take.

Would it not be better to just let the management application do what it should do best, and insulate it from the underlying technical details? This becomes even more important in an organization where the developer(s), database administrator(s) and system administrator(s) are all different people.

Enter Epidemic, a proof-of-concept framework to easily make such things possible.

To Be Continued…

PHP: Protecting your code (Zend Encoder/IonCube/SourceCop)

SourceCop Decoder

Personally, I have never found need to encode/encrypt/obfuscate any PHP. I do however know that there is a large audience of developers and/or organizations out there that do rely on such obfuscation to protect their works. Not being sure if it has hit the news or not, as I have been too busy of late to even open up my RSS aggregator to skim the headlines, know that there is at least one service in the wild that can successfully decode Zend Encoder and IonCube encoded files. It’s not perfect by any means, as it is reconstructing the code based on the opcodes, but it does return it in a format that is true to the original as far as execution and reasonably easy for a human to parse.

I wrote this little number the other day after running across a script I was wanting to use, in which one component was obviously dependant upon register_globals. My gosh, if I only had the code I could fix that! Fortunately it was obfuscated with an application called SourceCop, which provides very little in the way of protection. Come on guys, you could at least obfuscate the code itself first, munging whitespace, variable and function names. As it was, it took a mere 20 minutes to write a script that would replace an encoded file with a pristine copy of the original. At any rate, here is the script, do note that it was a quick hack and as such it may or may not work for you. It will also simply overwrite any SourceCop encoded files fed to it, so you will want to create a backup first, you have been warned.
Update: 2/23/2006, revised script

MySQL 5.0 standardized join syntax

I am sure the revised / SQL:2003 standardized join syntax in MySQL 5.0 is old news to many out there. My guess is they are in the minority, and most haven’t heard a thing about it. Some may have even upgraded only to be frustrated that their queries weren’t working as they should any longer. Here’s the skinny, taken directly from the MySQL manual.

Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. These changes make MySQL more compliant with standard SQL. However, they can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.

I would like to applaud MySQL AB on their latest release. Throughout its history, MySQL has been vastly out-gunned in terms of useful features by many other commercial and free databases. It has also taken a great deal of heat on many occasions due to its poor standards conformance in comparison to the other options on the market. With this release, even if they have not completely closed that gap, they have narrowed it by an impressive margin.

I do have a major gripe, however. Whilst the previously mentioned changes improves standards conformance and portability, it breaks a large enough percentage of MySQL-bound applications to warrant serious scrutiny. Apparently MySQL AB has forgotten that not the entire world is open source, and that many of us must every day maintain databases accessed by scripts and applications controlled by a third party or to which the source code is simply not available. This makes it rather impossible for any of us in such a situation to move those databases to servers running 5.0.

In my particular case, I was looking forward to leveraging triggers, stored procedures and views to reduce my administrative burden and deprecate a number of external scripts (hacks) that we use to transform data for use by other applications. The primary application sitting on this database is commercial and Zend encoded, so “fixing” the broken queries is simply not an option. Yes, we have talked to the vendor. I find it hard to believe that I am the only one in this situation.

Seriously now, how hard would it have been to add an option to enable the legacy behavior?

Update: 2/19/2006, offending commit

PHP/AJAX file upload with progress bar

Over the past couple of days I have been pondering adding some file upload functionality to the form classes I have been using for a bit over a year now. History repeats itself, again, time spent pondering instead of just getting on with the nitty gritty means I start thinking about ideal functionality. So, as I pondered how to go about sanely handling file uploads features started coming to mind, and one of them just wouldn’t go away. A semi-realtime inline file upload progress indicator. Well, that doesn’t sound so hard.

I spent some time with Google doing the requisite research to find that there are a number of stumbling blocks. The first being client-side, when a browser window/frame is busy pushing a file or files up the pipe, it seems that it is just that, busy. Which makes it a bit difficult to talk it into displaying updates. This seems to be pretty easily solved by pushing the file upload through a hidden iframe referenced by the target attribute on the form.

That certainly isn’t where the problems end. As luck would have it, not only is the browser happy to be working against us, so is PHP, in more ways than one.

When the execution unit handling the upload gets hit with the POST, it would seem that it likes to make itself busy as well. Ok, so no way to get the status of the file upload from the thread/process actually handling the upload. Apparently there are some patches against PHP to rectify this situation, but until they get committed and see a release they are unusable for most people. I am all for gratuitously hacking my own PHP install, but it seemed like there must be a better way.

I then stumbled across another method. Scan the upload_tmp_dir (PHP INI variable) for files of a known naming scheme, looking for the one with the latest timestamp. The current size of this file could be pushed back to the browser so that it could calculate the upload progress. This method is also not without its glaring faults. The probability of a race condition is too high for any kind of production use. Oh wait, scratch that, I’m starting to sound like a PHP developer, let me rephrase… There is an unavoidable possibility of a race condition, so this method cannot be used. Well… Wait a minute, there is an upload_tmp_dir variable. Why don’t we just generate some kind of unique form id to be passed back to us when we get the POST, then it should be possible to create a directory to have PHP put the file(s) in of a known name, eliminating our race, no? I suppose upload_tmp_dir being read-only is a bit of a stumbling block with that idea, considering we already decided hacks to the PHP source were out. Not to mention PHP probably isn’t going to let us set the variable before it gets busy processing that form data anyway.

Google led me to a couple more resources for accomplishing this throughout the course of my research, but they all involved an external non-PHP script to handle the upload and drop status information somewhere accessible. Unacceptable I say! There must be a way to do it with PHP alone!

I have theorized a method, implementation forthcoming. Here is a brief summary. Have an onSubmit handler frob a PHP script and retrieve a URL to apply to the action property of the form, said PHP script will have just launched a PHP-based very simple webserver. This webserver’s sole purpose in life is to eat POST’s and parse multipart form data. This same PHP script will update an accessible location with the status of the upload. The hidden iframe trick gets used to free up the window with the form in it. This window can now pull upload status via XMLHttpRequest and update a progress bar accordingly. This method also has the benefit of being able to degrade gracefully in the event that JavaScript is unavailable on the client. The default action URL can be implemented as a standard file upload handler.