Category Archives: Musings

Real world effects of changing rel canonical link element

In 2009, Google introduced a method website owners could use to disambiguate duplicate content. By specifying a rel=canonical link element in the head of the page, you give the search engine a hint as to which URL should be authoritative for the given content. Note that Google has indicated it considers this a hint, not a directive. The conditions under which the hint will be ignored are not known, but such conditions are presumed to exist.

Consider a simple example. Anyone who has purchased a home or property in the US is reasonably familiar with the Multiple Listing Service (MLS). Real estate agents add properties to the MLS, and the exact same information shows up on the website of every agent and agency. How does Google know which website(s) are authoritative for this information if it is the same on potentially thousands of websites? This is a contrived example of a real-world problem, and implementing a strategy around canonical link elements can help to ensure people end up where you want them to be. One strategy might be to get visitors to the website of the agency, rather than the individual agents.
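To make the agency strategy concrete, here is a minimal sketch of generating the link element that an agent's copy of a listing would carry. The URL and the helper name canonical_link are made up for illustration; only the rel=canonical element itself comes from the discussion above.

```python
from html import escape


def canonical_link(url: str) -> str:
    """Render a rel=canonical link element for the <head> of a page."""
    # escape() with quote=True makes the URL safe inside a double-quoted attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


# Hypothetical URLs: the agent's copy of a listing points at the agency's copy.
agency_listing = "https://agency.example.com/listings/12345"
tag_for_agent_page = canonical_link(agency_listing)
print(tag_for_agent_page)
```

Every duplicate page emits the same tag, so the search engine is nudged toward a single address no matter which copy it crawls first.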

That information is all well and good, in theory, but how does it actually work in practice?

A tale of two websites…

Recently there was a case where a series of several dozen guest blogs on an established website needed to be moved, removed, or somehow re-incorporated into the overall strategy. The established site and its mission had grown and changed; meanwhile, the blog series in question had grown less relevant to the overall goals of the site. But it was still good content that many people accessed and used as a reference!

It was decided the content wasn’t “hurting” anything and could remain, but it would be inaccessible via primary navigation routes and should, over the long term, be given a new home. The original author of the blogs was willing to give the content a new, permanent home on his own personal site. The author’s site did not yet exist, had no existing inbound links, and had zero authority with search engines: a blank slate!

Each blog post in question was re-posted on this new website, several dozen posts in total, a handful of which receive a reasonable amount of search engine traffic. The canonical links for the articles on the established site were then changed to reference these new pages on the formerly empty domain.

Google quickly adapted to the new “home address” of these pages, and within a matter of days, the new domain was seeing all the search engine impressions for these articles. After this quick adjustment over a period of a few days, the pattern held over the following month.

In the following graphic, a screenshot from the Google Search Console, you can clearly see the number of search engine impressions served by Google quickly ramped from 0 to in the neighborhood of 50 impressions per day.

[Screenshot: snip-search-console-canonical-change]

Here you can see the same data, over a slightly longer period, from the established site. The “new” site neatly stripped away around 10% of the organic search engine traffic from the established site.

[Screenshot: snip-search-console-canonical-change-source]

Most scenarios involving duplicate content management with the rel=canonical link element aren’t going to exactly match this one, so please take these results with a grain of salt. That said, it does clearly show the cause, effect, and timing surrounding changing the canonical links for established pages. It also clearly shows that Google pays attention to these canonical elements and can take fairly swift action on them.

In-language commenting

Lua is clever and uses --[[ … --]] for multi-line comments. The usefulness of this scheme is that the syntax used for closing a multi-line comment IS a comment, and no syntax error results if you comment out the opener by adding an extra -, which re-enables the commented-out section. The part that I don’t like is how single-line comments are specified, with --. Just IMO, FWIW, etc., I don’t think the same characters used for basic operators should be re-used for something like a comment. Really, shouldn’t a = a -- 2 result in a-4?

The same goes for C99/C++: how does using // to specify a comment really make sense? a //= 2 ? a ///= 2 ? /* */ -style comments are better, I suppose, because they are an obvious nop.

Ideally I think we would be using something that allows for the abuse that Lua’s multi-line comments give us, without overloading any of the basic operators provided by the language. “ perhaps?

“ Comment


Multi-line comment

“ “
“ Active multi-line comment?
a //= 37;

Maybe, I’m not sure.

There are also languages that allow comments to effectively serve as documentation that is preserved (potentially) at runtime. For example, Python treats a string literal that appears as the first statement of a module, class, or function definition as the documentation block (docstring) for that section. These documentation blocks can be accessed through runtime introspection, or used with appropriate tooling to generate documentation. This is a great approach in my experience, and perhaps -all- comments should be given the opportunity (optionally) to persist alongside the code for which they are intended.
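A small sketch of the Python behaviour described above. The Greeter class is invented for the example; __doc__ and inspect.getdoc are the standard introspection hooks.

```python
import inspect


class Greeter:
    """Say hello to people."""

    def greet(self, name):
        """Return a greeting for ``name``."""
        # An ordinary comment like this one is discarded by the compiler.
        return f"Hello, {name}!"


# The docstrings survive to runtime and are reachable through introspection.
print(Greeter.__doc__)
print(inspect.getdoc(Greeter.greet))
```

Note the asymmetry: the two docstrings persist with the objects they describe, while the # comment inside greet is gone by the time the code runs.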

If one is to take this approach, how does one reconcile comment specification with static string initialization, especially with regard to whitespace (esp. newline) handling? How does one specify to what block of code a comment refers when not at the beginning of a closure of some sort? Is there even any point in preserving these comments for any sort of runtime introspection, or should the goal be simply the production of documentation, à la Literate Programming?

Object inheritance musings

class Controller defines IController
class CProcess implements IController
class CThread implements IController
x = Controller(CThread).new()

interface IFDEvent
class Select implements IFDEvent
class Poll implements IFDEvent
class KEvent implements IFDEvent

class SocketServer extends FDEvent(Poll) [defines ISocketServer]

class HTTPServer extends SocketServer [defines IHTTPServer]
class FTPServer extends SocketServer [defines IFTPServer]

server = HTTPServer.new()
-or-
server = HTTPServer(SocketServer(KEvent)).new()

No? Where does this break down?
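One possible reading of the notation above, sketched in Python with class factories: calling SocketServer(KEvent) builds a class on top of the chosen backend, and omitting the argument falls back to a default. The backend attribute, the describe method, and the empty stand-in backends are all invented for the illustration; this is a guess at the intent, not a definitive translation.

```python
class Select:
    """Hypothetical event backend; stands in for a select()-based loop."""
    backend = "select"


class Poll:
    """Hypothetical event backend; stands in for a poll()-based loop."""
    backend = "poll"


class KEvent:
    """Hypothetical event backend; stands in for a kqueue-based loop."""
    backend = "kevent"


def SocketServer(event_impl=Poll):
    """Class factory: build a SocketServer class on top of the chosen backend."""

    class _SocketServer(event_impl):
        def describe(self):
            return f"{type(self).__name__} using {self.backend}"

    _SocketServer.__name__ = "SocketServer"
    return _SocketServer


def HTTPServer(socket_server=None):
    """Class factory: build an HTTPServer class on top of a SocketServer class."""
    base = socket_server if socket_server is not None else SocketServer()

    class _HTTPServer(base):
        protocol = "http"

    _HTTPServer.__name__ = "HTTPServer"
    return _HTTPServer


server = HTTPServer()()                         # default backend (Poll)
kq_server = HTTPServer(SocketServer(KEvent))()  # explicitly chosen backend
print(server.describe())
print(kq_server.describe())
```

One place this sketch gets awkward: every call to a factory produces a brand new class, so HTTPServer() is not HTTPServer(), and type checks between servers built separately will not line up unless the generated classes are cached.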

Senator Tim Johnson

I had neglected to shower that day. What was the point? I wasn’t looking to impress anyone; so far as I knew, we were to spend yet another day in D.C. perusing the vastness of the Smithsonian. Who in their right mind wants to waste time on a shower when their locomotive of a fourteen-year-old mind thinks it may be spending another dozen hours in the Air and Space Museum?

Visiting the halls of the Capitol building and congressional offices was awe-inspiring. So much so that said mind, once so focused on all things scientific, was derailed into a spiral of tumult over the state of affairs within our nation and without. Ten years and seven seasons of The West Wing later…

While the scope of the most pressing issues facing our nation is often daunting, events of the last year seem to have forced a new clarity of perception. Yet without this picture following me from one locale to the next, occasionally surfacing as some odd component or gizmo was fished from some box or another, politics would in all likelihood have been less than an afterthought throughout my life. Knowing full well that this very phrase has likely been emblazoned on hundreds or thousands of other pictures signed by this very same man, it has somehow emboldened me every time I have come across it. It has forced me to choke back the bile I taste when presented with a viewpoint on health care, education, or any other swelteringly hot topic that diverges significantly from my own.

Whether our chosen representatives realize it or not, it is the little things they do that really change lives, not just the votes they cast. Thank you, Tim, from the bottom of my heart, for even though we have never met, you have helped to expand my horizons.

JavaScript/File-based HTTP request logging

http://httpd.apache.org/docs/2.0/mod/mod_log_config.html

I just had the thought that it should be pretty feasible (if not trivial) to tie JavaScript-based request logging (like Mint and Analytics) to traditional file-based request logging, using cookies and/or headers with CustomLog in Apache, or similar in other httpds.
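A rough sketch of what the file-based half might look like, assuming the JavaScript tracker sets a cookie and Apache is told to log it. The cookie name visitor_id, the log format nickname, and the sample lines are all hypothetical; the %{...}C cookie directive is the one described in the mod_log_config documentation linked above.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) Apache configuration producing the lines parsed below:
#   LogFormat "%h %t \"%r\" %>s \"%{visitor_id}C\"" tied
#   CustomLog logs/tied_log tied
# "visitor_id" is a made-up cookie name that the JavaScript tracker would set.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) "(?P<visitor>[^"]*)"$'
)


def requests_by_visitor(lines):
    """Group request lines by the visitor id cookie; skip unparseable lines."""
    grouped = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        # A "-" here is assumed to mean the cookie was absent from the request.
        grouped[match.group("visitor")].append(match.group("request"))
    return dict(grouped)


sample = [
    '203.0.113.7 [10/Oct/2006:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 "abc123"',
    '203.0.113.7 [10/Oct/2006:13:55:37 -0700] "GET /logo.png HTTP/1.1" 200 "abc123"',
    '198.51.100.2 [10/Oct/2006:13:56:01 -0700] "GET /feed.xml HTTP/1.1" 200 "-"',
]
print(requests_by_visitor(sample))
```

With the same identifier on both sides, hits the JavaScript tracker never sees (images, feeds, clients without JavaScript) can at least be attributed to a visitor it does know about.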

The question being… Is this somehow useful? I think it potentially could be; I’m just not 100% sure how as yet.

Infinite monkey theorem

I was just considering that perhaps the Infinite monkey theorem should instead be the Infinite Doozer theorem. Not only would this allow one to conjure a much more humorous mental image, the Doozers are obviously much more industrious than monkeys and therefore much more suited to toiling away in front of typewriters. There is also the fact that an infinite number of Doozers with an infinite number of Doozer-sized typewriters would take up far less physical space than an infinite number of monkeys.

ORM: Metadata mapping

Using Metadata – Martin Fowler

Most of the time spent working on a site under exhibition went into the model or database layer. So before really getting into the rewrite I wanted to see what else was out there that could simplify this. Naturally I turned to Fowler’s Patterns of Enterprise Application Architecture and to ActiveRecord (Rails), Hibernate (Java), etc. One common element of every ORM solution I have run across is that they partially violate the DRY principle. Those that don’t, or that give you the option of completely defining your schema in a format native to the ORM, are not expressive enough to use the full power offered by the database.

This is when I remembered that PostgreSQL implements an extension to the SQL standard information schema that allows you to add an arbitrary comment to nearly any database object, tables and columns included. After some poking around I found that MySQL supports the same, although it is not nearly as well documented. SQLite does not share this non-standard extension.

So, my thought at present is to express whether relations are one-to-one, many-to-one or many-to-many directly in my table declarations. The only drawback I can see is a lack of portability, but when it comes to PHP, how many people use anything other than PostgreSQL or MySQL? SQLite should be used a great deal more, in my opinion, but in practice I do not believe it is. If at some point I must be portable to something other than the common case, well, in that event I guess I can always just do what all of the ORMs are already doing.
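As a sketch of how relation metadata kept in column comments might be read back, here is one possible annotation format and a parser for it. The "relation: ..." syntax, the table and column names, and the in-memory rows standing in for a catalog query are all assumptions made for the example, not an established convention.

```python
import re

# Hypothetical annotation format stored in a column comment, e.g. via PostgreSQL:
#   COMMENT ON COLUMN post.author_id IS 'relation: many-to-one author.id';
RELATION_RE = re.compile(
    r"relation:\s*(?P<kind>one-to-one|many-to-one|many-to-many)\s+"
    r"(?P<table>\w+)\.(?P<column>\w+)"
)


def parse_relation(comment):
    """Return (kind, table, column) from a column comment, or None if absent."""
    if not comment:
        return None
    match = RELATION_RE.search(comment)
    if match is None:
        return None
    return match.group("kind"), match.group("table"), match.group("column")


# Stand-in for rows an ORM would read back from the database catalog:
# (table, column, comment).
column_comments = [
    ("post", "author_id", "relation: many-to-one author.id"),
    ("post", "title", "Headline shown in listings"),
    ("post", "body", None),
]

relations = {
    (table, column): parse_relation(comment)
    for table, column, comment in column_comments
    if parse_relation(comment) is not None
}
print(relations)
```

The appeal is that the schema stays the single source of truth: ordinary human-readable comments pass through untouched, and only comments carrying the annotation turn into mapping metadata.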

References:
http://www.postgresql.org/docs/8.2/interactive/sql-comment.html
http://dev.mysql.com/doc/refman/5.0/en/information-schema.html
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html

DivX, where did that momentum go?

Back in February DivX graced us with their web player plugin for Windows and Mac, apparently to little fanfare; I didn’t even notice at the time. I have always been a fan of DivX: the codec, mind you, not the company. The quality is superior and the encoding time shorter than any other option out there, not to mention a relatively sane DRM implementation. On the other hand, the company was born on the media wings of online TV and movie pirates, and it all started out as a Windows Media Video hack. Given a background where the involved individuals seem to prefer and trend toward transparency rather than tight-assed corporations, the way DivX attempted to capitalize on their momentum always seemed rather odd to me. Notice that they have long since lost the support of online pirates, who are now using the more transparent (open source) XviD codec extensively.

This post isn’t so much about DivX’s failure to gain real end-user traction in years past; it’s more an open question as to why they seem to be unable to meet market demands NOW. Consider On2, whom we first heard about when they open sourced their VP3 codec, which the astute reader will know is the codec Ogg Theora is based upon. Somehow, in the past couple of years these On2 fellows have managed to get Macromedia and now Adobe eating out of their hands, which in turn has granted them 95% or better market penetration through the web browser (bundled with Flash 8). Now they can be seen launching products left and right aimed at corporate licensees, such as the On2 Video Publisher. Dare I pose the question: DivX, you went through all the trouble of creating a browser plugin for playback, so why didn’t you take it the extra 10 yards?

PHP/AJAX file upload with progress bar

Over the past couple of days I have been pondering adding some file upload functionality to the form classes I have been using for a bit over a year now. History repeats itself, again: time spent pondering instead of just getting on with the nitty gritty means I start thinking about ideal functionality. So, as I pondered how to go about sanely handling file uploads, features started coming to mind, and one of them just wouldn’t go away: a semi-realtime inline file upload progress indicator. Well, that doesn’t sound so hard.

I spent some time with Google doing the requisite research, only to find that there are a number of stumbling blocks. The first is client-side: when a browser window/frame is busy pushing a file or files up the pipe, it seems that it is just that, busy, which makes it a bit difficult to talk it into displaying updates. This seems to be pretty easily solved by pushing the file upload through a hidden iframe referenced by the target attribute on the form.

That certainly isn’t where the problems end. As luck would have it, not only is the browser happy to be working against us, so is PHP, in more ways than one.

When the execution unit handling the upload gets hit with the POST, it would seem that it likes to make itself busy as well. Ok, so no way to get the status of the file upload from the thread/process actually handling the upload. Apparently there are some patches against PHP to rectify this situation, but until they get committed and see a release they are unusable for most people. I am all for gratuitously hacking my own PHP install, but it seemed like there must be a better way.

I then stumbled across another method. Scan the upload_tmp_dir (PHP INI variable) for files of a known naming scheme, looking for the one with the latest timestamp. The current size of this file could be pushed back to the browser so that it could calculate the upload progress. This method is also not without its glaring faults. The probability of a race condition is too high for any kind of production use. Oh wait, scratch that, I’m starting to sound like a PHP developer, let me rephrase… There is an unavoidable possibility of a race condition, so this method cannot be used. Well… Wait a minute, there is an upload_tmp_dir variable. Why don’t we just generate some kind of unique form id to be passed back to us when we get the POST, then it should be possible to create a directory to have PHP put the file(s) in of a known name, eliminating our race, no? I suppose upload_tmp_dir being read-only is a bit of a stumbling block with that idea, considering we already decided hacks to the PHP source were out. Not to mention PHP probably isn’t going to let us set the variable before it gets busy processing that form data anyway.
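For illustration only, here is roughly what the directory-scanning approach amounts to, written in Python rather than PHP. The "php" file name prefix and the idea that the expected size is known up front (say, from the request's Content-Length) are assumptions; the comment marks the step where the race lives.

```python
import os
import tempfile


def newest_upload(tmp_dir, prefix="php"):
    """Return the path of the most recently modified file starting with prefix."""
    candidates = [
        os.path.join(tmp_dir, name)
        for name in os.listdir(tmp_dir)
        if name.startswith(prefix)
    ]
    candidates = [path for path in candidates if os.path.isfile(path)]
    if not candidates:
        return None
    # This is the racy step: with two simultaneous uploads, "newest" may be
    # somebody else's file.
    return max(candidates, key=os.path.getmtime)


def progress(path, expected_bytes):
    """Fraction of the upload written so far, clamped to the range 0..1."""
    if path is None or expected_bytes <= 0:
        return 0.0
    return min(os.path.getsize(path) / expected_bytes, 1.0)


# Demonstration with a fake, half-written upload in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    partial = os.path.join(tmp_dir, "phpA1b2C3")
    with open(partial, "wb") as handle:
        handle.write(b"x" * 512)
    found = newest_upload(tmp_dir)
    demo_progress = progress(found, expected_bytes=1024)
    demo_name = os.path.basename(found)

print(demo_name, demo_progress)
```

Nothing in newest_upload ties a file to a particular form submission, which is exactly why a per-upload identifier or directory looked attractive.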

Google led me to a couple more resources for accomplishing this throughout the course of my research, but they all involved an external non-PHP script to handle the upload and drop status information somewhere accessible. Unacceptable I say! There must be a way to do it with PHP alone!

I have theorized a method, implementation forthcoming. Here is a brief summary. Have an onSubmit handler frob a PHP script and retrieve a URL to apply to the action property of the form; said PHP script will have just launched a very simple PHP-based webserver. This webserver’s sole purpose in life is to eat POSTs and parse multipart form data. This same PHP script will update an accessible location with the status of the upload. The hidden iframe trick gets used to free up the window with the form in it. This window can now pull upload status via XMLHttpRequest and update a progress bar accordingly. This method also has the benefit of being able to degrade gracefully in the event that JavaScript is unavailable on the client. The default action URL can be implemented as a standard file upload handler.
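The heart of that idea is a receiver that reads the POST body itself, a chunk at a time, and records how far it has got somewhere a second request can see. Below is a minimal sketch of just that loop, in Python rather than PHP and with no real socket or multipart parsing; receive_upload, the status dictionary, and the "form-42" identifier are all hypothetical names.

```python
import io


def receive_upload(stream, total_bytes, upload_id, status, chunk_size=8192):
    """Read a request body in chunks, recording progress under upload_id."""
    received = bytearray()
    status[upload_id] = {"received": 0, "total": total_bytes, "done": False}
    while len(received) < total_bytes:
        chunk = stream.read(min(chunk_size, total_bytes - len(received)))
        if not chunk:
            break  # client went away before sending everything
        received.extend(chunk)
        # A separate status request would read this entry while we keep looping.
        status[upload_id]["received"] = len(received)
    status[upload_id]["done"] = len(received) == total_bytes
    return bytes(received)


# Simulate a 20000 byte POST body arriving on a socket-like stream.
status = {}
body = io.BytesIO(b"a" * 20000)
data = receive_upload(body, 20000, "form-42", status)
print(status["form-42"])
```

In a real deployment the status would have to live somewhere both processes can reach (a file, shared memory, a database row), and the XMLHttpRequest poller would turn received / total into the width of the progress bar.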