Real world effects of changing rel canonical link element

In 2009, Google introduced a method website owners could use to disambiguate duplicate content. By specifying a rel=canonical link element in the head of a page, you give the search engine a hint as to which URL should be authoritative for the given content. It should be noted that Google has indicated they consider this method a hint, not a directive. The conditions under which the hint will be ignored are not known, but such conditions are presumed to exist.

Consider a simple example: anyone who has purchased a home or property in the US is reasonably familiar with the Multiple Listing Service (MLS). Real estate agents add properties to the MLS, and the exact same information shows up on the website of every agent and agency. How does Google know which website(s) are authoritative for this information when it is the same on potentially thousands of websites? This is a contrived version of a real-world problem, and implementing a strategy around canonical link elements can help ensure people end up where you want them to be. One strategy might be to send visitors to the website of the agency, rather than those of the individual agents.
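As a concrete illustration of that strategy, here is roughly what the element might look like on an agent’s copy of a listing, pointing back at the agency’s copy. This is just a sketch emitted from a hypothetical PHP template; the URL and variable name are made up.

<?php
// Hypothetical template fragment for the <head> of an agent's listing page.
// $canonicalUrl points at the agency's copy of the same listing, telling the
// search engine which URL should be treated as authoritative for this content.
$canonicalUrl = 'https://www.example-agency.com/listings/123-main-st';
?>
<link rel="canonical" href="<?php echo htmlspecialchars($canonicalUrl, ENT_QUOTES); ?>" />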

That information is all well and good, in theory, but how does it actually work in practice?

A tale of two websites…

Recently there was a case where a series of several dozen guest blogs on an established website needed to be moved, removed, or somehow re-incorporated into the overall strategy. The established site and its mission had grown and changed; meanwhile, the blog series in question had become less relevant to the overall goals of the site. But it was still good content that many people accessed and used as a reference!

It was decided the content wasn’t “hurting” anything and could remain, but it would no longer be accessible via the primary navigation and should, over the long term, be given a new home. The original author of the blogs was willing to give the content a new, permanent home on his own personal site. The author’s site did not yet exist, had no existing inbound links, and had zero authority with search engines: a blank slate!

Each blog post in question was re-posted on this new website, several dozen posts in total, a handful of which receive a reasonable amount of search engine traffic. The canonical links for the articles on the established site were then changed to reference these new pages on the formerly empty domain.

Google quickly adapted to the new “home address” of these pages: within a matter of days, the new domain was seeing all of the search engine impressions for these articles, and the pattern held over the following month.

In the following graphic, a screenshot from Google Search Console, you can clearly see that the number of search impressions served by Google quickly ramped from zero to roughly 50 per day.

[Screenshot: Google Search Console impressions for the new domain (snip-search-console-canonical-change)]

Here you can see the same data, over a slightly longer period, from the established site. The “new” site neatly stripped away around 10% of the organic search engine traffic from the established site.

[Screenshot: Google Search Console impressions for the established source site (snip-search-console-canonical-change-source)]

Most scenarios involving duplicate content management with the rel=canonical link element aren’t going to match this one exactly, so please take these results with a grain of salt. That said, the case clearly shows the cause, effect, and timing involved in changing the canonical links for established pages. It also shows that Google pays attention to these canonical elements and can act on them fairly swiftly.

The Barack Obama Call Center


I’m an Obama supporter, but I’m also that guy who wants to put a knife in someone’s eye when he receives an unsolicited political call. Especially, ESPECIALLY, when it is an unsolicited, unprofessional political call. So, now… I’m really not sure what to say about this. Introducing: the Barack Obama call center! They are inviting me, you, and any other would-be supporter to go ahead and start calling to voice our support. Who would you be calling? Your friends? Oh no, they will supply you with scripts to read to the complete strangers whose names and numbers are being provided! … Great…

PowerDNS / PostgreSQL & Web Interfaces 2

After a bit to eat and drink, as well as a half hour of zOMG why is this not werking!?!?! (iptables), Supermaster/Superslave is operating famously. It seems to “just work”. No complaints thus far, which is, well… highly unusual for me, to put it lightly.


Feb 03 05:35:55 Received NOTIFY for flixn.com from 72.232.239.130 for which we are not authoritative
Feb 03 05:35:55 Created new slave zone 'flixn.com' from supermaster 72.232.239.130, queued axfr
Feb 03 05:35:55 gpgsql Connection succesful
Feb 03 05:35:55 No serial for 'flixn.com' found - zone is missing?
Feb 03 05:35:55 AXFR started for 'flixn.com', transaction started
Feb 03 05:35:55 AXFR done for 'flixn.com', zone committed

PowerDNS / PostgreSQL & Web Interfaces

http://evilcode.net/sjg/infernal/PowerDNS/SCHEMA/

I have been looking at PowerDNS for a while now, and after regular confirmation that it is in fact performing admirably over at DreamHost, I decided it was time to deploy it.

While PowerDNS is the least braindead DNS server I have ever come across, there were a couple of things that I was not 100% happy with, at least in terms of coupling it to a web frontend.

  • SOA records are stored space-delimited. This would hardly be a problem except that our serial is stored there. In its defense, PowerDNS has an alternate method of handling serials that is probably better in most circumstances. Even so, we would still have to break the record apart and put it back together again just to edit the minimum (default, in practice) TTL and so on (see the sketch after this list).
  • Record types are stored textually. Even when implemented as an enumerated value this still violates DRY, as you must re-state these values in your frontend code.
  • Everything must be represented fully qualified. This = FAIL from a normalization perspective.
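To make the first point concrete, here is a rough sketch of the break-apart-and-reassemble dance the frontend (or the domain logic) ends up doing just to touch one field. The field order follows a standard SOA record (primary nameserver, hostmaster, serial, refresh, retry, expire, minimum/default TTL); the function names are mine, not part of PowerDNS.

<?php
// Sketch only: edit the minimum TTL inside a space-delimited SOA content
// string and bump the serial while we are in there.
function soa_set_minimum_ttl($content, $minimumTtl)
{
    $parts = preg_split('/\s+/', trim($content));
    if (count($parts) !== 7) {
        throw new InvalidArgumentException('Unexpected SOA content: ' . $content);
    }
    $parts[6] = (string) $minimumTtl;        // the field we actually wanted to edit
    $parts[2] = soa_bump_serial($parts[2]);  // ...and the serial comes along for the ride
    return implode(' ', $parts);
}

function soa_bump_serial($serial)
{
    // Assumes the common YYYYMMDDnn convention for serials.
    $today = date('Ymd');
    if (strncmp($serial, $today, 8) === 0) {
        return (string) ($serial + 1);
    }
    return $today . '01';
}

// Usage: $row['content'] = soa_set_minimum_ttl($row['content'], 300);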

Here I have come up with a somewhat optimal schema from the point of view of my web interface, and I have tied it to PowerDNS’s preferred table structure via domain logic. This could have been handled in other ways of course, but I tend to like this one for a number of reasons.

  • First, the alternative is to add custom queries to the PowerDNS configuration file so that it understands whatever schema we might have in place; PowerDNS actually makes this very easy.
  • Another alternative would be to use dynamic (normal) views.

On to the benefits, some being quite minor.

  • Querying against serialized views will have performance benefits versus the two options above; this of course has to be weighed against the cost of maintaining the views.
  • As mentioned, PowerDNS has two methods of handling serials: it can use the serial in the SOA record, which we keep up to date with our domain logic, or it can scan every record in the zone to find the most recently updated one (if you maintain change_date). The former should logically be more performant, so that is the option we have implemented. This could have been handled either way in the domain logic; what matters is that we aren’t relying on our web frontend to keep our serials up to date.
  • Finally, and perhaps most importantly for debuggability, the data on masters and slaves “looks the same”.

To get you rolling, your PowerDNS configuration file need not be any more complicated than this:


launch=gpgsql
gpgsql-host=hostname
gpgsql-user=powerdns
gpgsql-password=password
gpgsql-dbname=pdnstest
daemon=yes

I haven’t tried slaving yet, but I suspect it will work without a hitch. Will update here when I do and when this rolls out.

ActionScript

This week I decided to toy with ActionScript/Flash a bit (for the first time, really). I’m using the FlashDevelop IDE, so it’s all free goodness, no shelling out $500 to Adobe. Anyway, I wrote an MP3 player that is devoid of any sort of Flash user interface and is completely controllable through JavaScript. It’s a mere 162 lines of ActionScript and weighs in at 2071 bytes as an SWF. It supports a wide range of operations (load, play, pause, stop, setvolume, getvolume, ispaused, getpauseoffset, getcurrentfile, getduration, getposition, getbytesloaded, getbytestotal, and getid3), as well as a number of asynchronous JavaScript callbacks (notifications) on various events: loadcomplete, playcomplete, and id3found. You can see it in action, with possibly the simplest UI possible, here: http://evilcode.net/sjg/player/.

The real question that I am trying to answer for myself is: does eliminating the Flash user interface somehow make it [Flash] more palatable?

JavaScript/File-based HTTP request logging

http://httpd.apache.org/docs/2.0/mod/mod_log_config.html

I just had the thought that it should be pretty feasible (if not trivial) to tie JavaScript-based request logging (like Mint or Analytics) to traditional file-based request logging, using cookies and/or headers together with CustomLog in Apache (or its equivalent in other HTTP servers).
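A minimal sketch of what I have in mind, assuming a first-party cookie named visitor_id is the glue; the cookie name and the log format line are made up for illustration:

<?php
// Sketch: issue (or re-use) an opaque visitor id cookie on every PHP response.
// The JavaScript tracker can read the same cookie from document.cookie, and
// Apache can write it into the access log with mod_log_config, e.g.:
//   LogFormat "%h %l %u %t \"%r\" %>s %b \"%{visitor_id}C\"" withvisitor
//   CustomLog logs/access_log withvisitor
// Joining the two data sets is then just a matter of matching on visitor_id.
if (!isset($_COOKIE['visitor_id'])) {
    $visitorId = md5(uniqid(mt_rand(), true)); // opaque, nothing personal in it
    setcookie('visitor_id', $visitorId, time() + 60 * 60 * 24 * 365, '/');
} else {
    $visitorId = $_COOKIE['visitor_id'];
}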

The question being… Is this somehow useful? I think it potentially could be; I’m just not 100% sure how as yet.

PHP/AJAX file upload with progress bar, part 3

Part 1, Part 2

I pulled this back out over the weekend and did a little work on it. The backend code is now totally functional, though I would consider it alpha quality. It just needs to be wrapped in a UI now; check it out and let me know what you think. Try the demo with JavaScript enabled first: once the upload completes you will see that the result is a mash-up that looks like PHP’s $_REQUEST/$_FILES. Hit the resulting URL again within 60 seconds and you will see the status side of things: total bytes, bytes received, and percent complete (this uses a GET request, since you don’t want to re-post the form data). Try it again with JavaScript disabled and behold the standard PHP handler.

Source
Demo
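For the curious, here is a rough guess at what the GET status side described above could look like. The key names and the temp-file bookkeeping are hypothetical, not the actual code; see the Source link above for the real implementation.

<?php
// Sketch only: report progress for an in-flight upload identified by $_GET['id'].
// Assumption: the receiving script periodically serializes its progress to a
// temp file keyed by upload id while it consumes the POST body.
function upload_status($uploadId)
{
    $file = sys_get_temp_dir() . '/upload-' . basename($uploadId) . '.status';
    if (!is_file($file) || time() - filemtime($file) > 60) {
        return array('error' => 'unknown or expired upload');
    }
    $status = unserialize(file_get_contents($file));
    $status['percent'] = $status['bytes_total'] > 0
        ? round(100 * $status['bytes_received'] / $status['bytes_total'], 1)
        : 0;
    return $status; // bytes_total, bytes_received, percent
}

header('Content-Type: text/plain');
print_r(upload_status(isset($_GET['id']) ? $_GET['id'] : ''));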

ZK, AJAX without JavaScript?

ZK Demo

ZK is a slick-looking presentation-layer toolkit that seems to at least be growing in popularity on SourceForge, though I do not know of any real sites using it. It looked very interesting at first glance, especially in conjunction with something like Hibernate. After some further digging, my second impression is that it suffers from roughly the same major flaw that afflicted the first iteration of Exhibition: it presents some simple ways to do complex things, but when it really comes down to it, the complexity is merely being moved elsewhere. Not only that, but the complexity is now exposed through a technology few people know or understand. Am I missing something here, or is ZK just refactoring the complexities of developing an application for the web?

XSLT shorthand for PHP

As I trudge toward Exhibition v2, I am taking the templating in an entirely new direction. Exhibition deals with XML, a lot of XML, and as a direct consequence there is a fairly substantial amount of XSLT. Noticing that XSLT has a very straightforward syntax, I decided to write a little preprocessor to save my fingers some walking later on. Most commands boil down to either “command” or “command required_argument” (the required argument usually being an XPath expression), so that’s exactly what I pruned the syntax down to.

<@ for-each atom @>

versus

<xsl:for-each select="atom">
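For illustration, a minimal sketch of that kind of preprocessor, assuming the shorthand only ever takes the two forms described above and that the argument always becomes a select attribute; the real implementation (xsltemplate.phps, linked below) handles more than this.

<?php
// Sketch: expand "<@ command @>" / "<@ command argument @>" shorthand into
// the corresponding <xsl:command select="argument"> elements.
function expand_shorthand($template)
{
    return preg_replace_callback(
        '/<@\s*([a-z-]+)(?:\s+(.+?))?\s*@>/i',
        function ($m) {
            if (!isset($m[2]) || $m[2] === '') {
                return '<xsl:' . $m[1] . '>';
            }
            return '<xsl:' . $m[1] . ' select="' . htmlspecialchars($m[2], ENT_QUOTES) . '">';
        },
        $template
    );
}

echo expand_shorthand('<@ for-each atom @>');
// <xsl:for-each select="atom">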

Check it out —

xsltemplate.phps

xsltemplate-test.tpl
xsltemplate-test.xml
xsltemplate-test.xsl
(examples stolen from W3Schools)