All posts by sjg

Tracking MachForm form submissions with Google Analytics

MachForm (self hosted) is a great tool for managing many different types of user submissions from visitors to your website. While WordPress has a great form option in Gravity Forms, MachForm is platform agnostic and has a number of integration options allowing it to coexist fairly well with almost any LAMP-based web deployment.

Since version 4, MachForm has allowed loading a custom JavaScript file, configurable on a per-form basis. This provides an excellent facility for tracking form submissions in Google Analytics. These events can then be used to create goals, etc.

This is easier than it sounds. The first step is adding the Google Analytics embed code for the website to a file (assuming you are using the default iframe embed mode of MachForm), minus the line that tracks a pageview. Since MachForm uses jQuery internally, we can use jQuery here to attach a handler to the form that sends our Google Analytics event when the form is submitted. The portion of the code that extracts the title of the form may differ depending on the MachForm version, the theme chosen, and so on.

(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-XXXXXXXXX-1', 'auto');

$(document).ready(function() {
    $('form').submit(function() {
        // Grab the form title to use as the event label
        var title = $('#form_container > h1 > a').html();
        ga('send', 'event', 'form', 'submit', title);

        // Delay the real submission briefly so the event has time to send;
        // calling the native submit() method will not re-trigger this handler
        var form = this.closest('form');
        setTimeout(function() {
            form.submit();
        }, 500);
        return false;
    });
});

Once this JavaScript is saved to a file and uploaded to your server, add the path to the file under Advanced Options for all the forms you wish to track and you are off to the races.

Magento Integer based SQL injection vulnerability in product parameter

Recently I was asked to look into a potential PCI compliance issue in Magento 1.7/1.8/1.9. The potential issue was uncovered by ControlScan. The summary was as follows:

Integer based SQL injection vulnerability in product parameter to /checkout/cart/add/uenc/<snip>,/product/<id>/
Risk: High (3)
Port: 80/tcp
Protocol: tcp
Threat ID: web_prog_sql_integer

Upon diving into the additional supplied information, it was almost immediately clear what the test was doing. It was performing a POST request against the URL: /checkout/cart/add/uenc/<snip>,/product/XYZ/
XYZ translates to a valid Magento product id. In the payload (POST’d multipart/form-data) that would get parsed into the PHP $_POST superglobal, an initial request passed product=XYZ, and a subsequent request passed product=XYZ-2.

The scan saw the same output returned for each request, and thus assumed the cart might be getting “duped” by the invalid XYZ-2.

Let’s take a look at the code which handles this submission (an AJAX-style action that adds a product to the cart). It is located in app/code/core/Mage/Checkout/controllers/CartController.php, starting around line 170, in the addAction public method. The take-away here is that the $params variable setup in addAction, as well as the product id discovery in _initProduct, both retrieve their data by calling $this->getRequest()->getParams(); this parameter data can come from any number of places, including the URL, GET, or POST. In this instance, the product variable is parsed out of the URL, and the product supplied via POST is never referenced. No wonder the output was the same: the URL was identical in both requests, and the modified POST data was never a factor.
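To see why, here is a hypothetical Python simplification of the getParams() merge behavior (the real implementation lives in the Zend request object; names and values here are purely illustrative): parameters parsed from the URL route take precedence over POST fields of the same name.

```python
# Illustrative simplification (not Magento's actual code): route/URL
# parameters win over POST fields with the same name.
def get_params(route_params, post_params):
    merged = dict(post_params)   # start with the POST data...
    merged.update(route_params)  # ...then let URL/route values override
    return merged

# First scan request: URL says product 123, POST says product=123
first = get_params({"product": "123"}, {"product": "123"})

# Second scan request: POST is tampered to product=123-2, URL unchanged
second = get_params({"product": "123"}, {"product": "123-2"})

# Both requests resolve to the same product id, hence identical responses
```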

If you simply want to tighten up your cart to get it to pass your PCI compliance scan, the following code will do that for you. Just replace the top part of addAction with the following, and be prepared to re-apply or discard this patch when you eventually upgrade.

public function addAction()
{
    $cart   = $this->_getCart();
    $params = $this->getRequest()->getParams();

    // Parse the raw request body to recover the product id as actually POSTed
    $postInput = file_get_contents("php://input");
    $postStrDataArr = explode("\n", $postInput);
    $postStrData = array_pop($postStrDataArr);
    parse_str($postStrData, $postData);

    // Reject the request if the POSTed product id disagrees with the URL,
    // or if the URL product id is not numeric
    if ((isset($postData['product']) && $postData['product'] != $params['product'])
            || !is_numeric($params['product'])) {
        throw new Exception('Invalid Product ID');
    }

    try {

This modification compares the parameter parsed via the URL with the parameter passed via POST and throws an Exception if the two do not match.

No doubt there is a better and more Magento-esque way to remedy this issue, but the above will work in a pinch.

Real world effects of changing rel canonical link element

In 2009, Google introduced a method website owners could use to disambiguate duplicate content. By specifying a rel=canonical link element in the header of the page you give the search engine a hint as to the URL which should be authoritative for the given content. It should be noted Google has indicated they consider this method a hint, and not a directive. The conditions under which the hint will be ignored are not known, but such conditions are presumed to exist.
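For reference, the hint itself is a single link element in the page head (the URL here is hypothetical):

```html
<link rel="canonical" href="https://www.example.com/authoritative-page/">
```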

Consider a simple example: anyone who has purchased a home or property in the US is reasonably familiar with the Multiple Listing Service (MLS). Real estate agents add properties to the MLS, and the exact same information shows up on the website of every agent and agency. How does Google know which website(s) are authoritative for this information if it is the same on potentially thousands of websites? This is a contrived example of a real-world problem, and implementing a strategy around canonical link elements can help ensure people end up where you want them to be. One strategy might be to send visitors to the website of the agency, rather than the individual agents.

That information is all well and good, in theory, but how does it actually work in practice?

A tale of two websites…

Recently there was a case where a series of several dozen guest blogs on an established website needed to be moved, removed, or somehow re-incorporated into the overall strategy. The established site and its mission had grown and changed, meanwhile, the blog series in question had grown less relevant to the overall goals of the site. But it was still good content that many people accessed and used as a reference!

It was decided the content wasn’t “hurting” anything and could remain, but it would be made inaccessible via primary navigation routes and should, over the long term, be given a new home. The original author of the blogs was willing to give the content a new, permanent home on his own personal site. The author’s site did not yet exist, had no existing inbound links, and zero authority with search engines — a blank slate!

Each blog post in question was re-posted on this new website, several dozen posts in total, a handful of which receive a reasonable amount of search engine traffic. The canonical links for the articles on the established site were then changed to reference these new pages on the formerly empty domain.

Google quickly adapted to the new “home address” of these pages, and within a matter of days the new domain was seeing all of the search engine impressions for these articles. That pattern held over the following month.

In the following graphic, a screenshot from Google Search Console, you can clearly see the number of search impressions served by Google quickly ramped from zero to roughly 50 per day.


Here you can see the same data, over a slightly longer period, from the established site. The “new” site neatly stripped away around 10% of the organic search engine traffic from the established site.


Most scenarios involving duplicate content management with the rel=canonical link element aren’t going to exactly match this one, so please take these results with a grain of salt. That said, it does clearly show the cause, effect, and timing surrounding changing the canonical links for established pages. It also clearly shows that Google pays attention to these canonical elements and can take fairly swift action on them.

New Sturgis Area Resource: Visit Sturgis

A new resource for visitors to the Sturgis area recently launched: a website called Visit Sturgis. The Visit Sturgis website contains a wealth of information visitors would have a hard time uncovering in a short trip without a local guide. The new website will be useful as a planning tool not only for those visiting Sturgis as a pre-planned destination, but also for ad-hoc visits. Sturgis sits directly in the heavily trafficked Interstate 90 corridor and as a result sees many thousands of visitors every year who are simply passing through.

The resources and information on the Visit Sturgis website will cover the usual suspects, such as information about lodging, restaurants, and local businesses of interest. The website will also contain a great diversity of information about local events, and little known avenues for recreation.

Sturgis plays host to many events in addition to the well known Sturgis Motorcycle Rally, these include the Sturgis Camaro Rally, Sturgis Mustang Rally, Tatanka Mountain Bike Race, Sturgis Gran Fondo, and many more.

There are also many miles of non-motorized single track trails accessible directly from town that cater to mountain bikers, hikers, trail runners, dog walkers, and horseback riders. See the recreation information on the Visit Sturgis website for more information about accessing these trails.

While this website may have been recently launched, it already contains valuable information that cannot be found elsewhere. Expect it to continue to grow into an ever more informative resource as time progresses.

New website

The new Black Hills Trails site is now live, although a bit sparse at the moment. Significant changes and additions are expected in the coming weeks and months.

The maps and mapping functionality have been given to the Black Hills Trails organization and incorporated into this new site. As the maps and mapping functionality are extended and enhanced, the original website will eventually be phased out altogether.

Generating aerial tiles from NAIP imagery

Work is underway on a mobile (iPhone/Android) trail mapping application for the Black Hills area, and one of the chief considerations is making it work offline. As any local can tell you, cellular service can be spotty in the Hills, even in town in some cases! The plan is to let folks optionally download the map data/tiles directly to their device using the MBTiles format.
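That choice is less exotic than it might sound: an MBTiles file is simply a SQLite database with a tiles table keyed by zoom level, column, and row. A minimal sketch of reading a single tile (the function name is my own):

```python
import sqlite3

def read_tile(path, z, x, y):
    """Fetch one tile blob from an MBTiles file (a plain SQLite database)."""
    con = sqlite3.connect(path)
    row = con.execute(
        "SELECT tile_data FROM tiles "
        "WHERE zoom_level = ? AND tile_column = ? AND tile_row = ?",
        (z, x, y)).fetchone()
    con.close()
    return row[0] if row else None
```

One wrinkle to be aware of: the MBTiles spec stores rows in TMS (flipped-Y) order, so a typical web map's y coordinate must be converted with y = 2**z - 1 - y before lookup.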

So you need to create your own aerial imagery tiles for a web slippy map project, do you? Before you dive too far down this rabbit hole, take a look at the MapQuest Open Aerial tiles; they are available for use under very liberal terms and are good quality. I was unable to use them for this project only because not all tiles were available at all the zoom levels I needed for my area.

Getting and processing the data

Almost every year the NAIP (National Agriculture Imagery Program) captures imagery of most of the country during the growing season. This high quality aerial imagery of the United States (the same imagery used by Google and other web mapping providers) is available for free download from the USDA Geospatial Data Gateway. It may be available for order in other formats, but the only option available for download in my area is an ESRI Shapefile / MrSID format. The MrSID format is a typically lossy image format designed for very high resolution imagery. Unfortunately I have not found many good inexpensive tools for working with MrSID files, so the first step in this process is converting to a format that is easier to deal with in terms of software support, in this case GeoTIFF. The GeoExpress Command Line Utilities published by LizardTech, available for free download at the time of this writing, are able to do this extraction for us with the following command:

mrsidgeodecode -wf -i Crook_2012/ortho_1-1_1n_s_wy011_2012_1.sid -o Crook_2012.tiff

In this example I am using imagery for Crook County, Wyoming. The -wf (world format) option to mrsidgeodecode seems to be important; it tells the tool to create a geo-referenced TIFF file.

Now that we have our imagery in the GeoTIFF format, we can use the open source GDAL/OGR command-line utilities to slice and dice the data. The commands used from here on out (nearblack, ogrinfo, gdalwarp, and gdaladdo) all ship with the GDAL/OGR libraries.

The next hurdle is that this raster imagery always has a border of not-quite-black pixels that needs to be pared off before multiple adjacent images (counties, in my case) can be used together. If your target tiles fall within one county (one MrSID file as downloaded from the USDA gateway), you probably do not need to worry about this.

nearblack -nb 5 -setalpha -of GTiff -o Crook_2012_NB.tiff Crook_2012.tiff

The -nb 5 option in effect tells nearblack how aggressive to be; this seemed to work for me, but your mileage may vary.

After trimming the edges we need to warp the GeoTIFF to our target projection. Essentially all web mapping uses the same projection, EPSG:3857. In my case I am creating tiles with TileMill, and its documentation specifies that GeoTIFFs should be in this projection. The only trick here is that you must supply the source projection: the GeoTIFF contains coordinate information, but it lost its projection along the way. Use the ogrinfo utility to first get a list of layers available in the shapefile you downloaded from the USDA.

ogrinfo Crook_2012/ortho_1-1_1n_s_wy011_2012_1.shp
INFO: Open of `Crook_2012/ortho_1-1_1n_s_wy011_2012_1.shp'
      using driver `ESRI Shapefile' successful.
1: ortho_1-1_1n_s_wy011_2012_1 (Polygon)

Then, you will need to get the information about that layer to find the original projection.

ogrinfo Crook_2012/ortho_1-1_1n_s_wy011_2012_1.shp ortho_1-1_1n_s_wy011_2012_1
INFO: Open of `Crook_2012/ortho_1-1_1n_s_wy011_2012_1.shp'
      using driver `ESRI Shapefile' successful.

Layer name: ortho_1-1_1n_s_wy011_2012_1
Geometry: Polygon
Feature Count: 12
Extent: (489299.300000, 4885056.470000) - (580705.680000, 4990601.000000)
Layer SRS WKT:

In this case it is “NAD_1983_UTM_Zone_13N”; you may have to Google around to find the corresponding EPSG number, which here is EPSG:26913. After all that, we can warp the GeoTIFF. The --config and -wm options here speed up gdalwarp by letting it use more RAM; you may want to play with these a bit to figure out what is fastest for you.

gdalwarp --config GDAL_CACHEMAX 300 -wm 300 -s_srs EPSG:26913 -t_srs EPSG:3857 -r bilinear -of GTiff -co TILED=yes Crook_2012_NB.tiff Crook_2012_NB_GoogleMercator.tiff

A person could at this point use gdal to merge multiple GeoTIFF’s together (if applicable) and then use the gdal2tiles script to generate tiles directly. In my case, my workflow already involves creating tiles with TileMill, so I opted for that route.

This next step is optional in theory but necessary in practice if you want to be able to preview the TIFF files in TileMill or other imaging software. It adds scaled-down overview versions of the GeoTIFF to the file itself for use at lower zoom levels.

gdaladdo --config GDAL_CACHEMAX 300 -r cubic Crook_2012_NB_GoogleMercator.tiff 2 4 8 16 32 64 128 256

Creating tiles

After all of that is done you can load all of your GeoTIFFs into TileMill and see how they look. I give each TileMill layer a class called “geotiff” and use the following style.

.geotiff {
  raster-opacity: 1;
  raster-scaling: lanczos; /* Best quality but slowest */
}

You can then export tiles using the standard TileMill process.

There are of course always extra considerations, such as output image size and quality. When generating map tiles of roadways and the like, the PNG format is very often the best choice, but for our aerial imagery we want to use JPEG. Below are two tiles, each shown at three different quality/compression levels; from left to right: 65%, 75%, 85%.


Here is what the resulting file size was for the entire area at each of the three quality levels.

3570838528 Mar  7 15:55 aerial_65.mbtiles
4288478208 Mar  6 18:00 aerial_75.mbtiles
5750595584 Mar  6 05:42 aerial_85.mbtiles
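For a rough sense of scale, those byte counts work out as follows (a quick Python check using the sizes listed above):

```python
# mbtiles file sizes from the listing above, keyed by JPEG quality
sizes = {65: 3570838528, 75: 4288478208, 85: 5750595584}

# Convert to GiB for readability
gib = {q: round(b / 2**30, 2) for q, b in sizes.items()}
# 65% -> 3.33 GiB, 75% -> 3.99 GiB, 85% -> 5.36 GiB

# Dropping JPEG quality from 85% to 65% saves roughly 38% of the space
savings = 1 - sizes[65] / sizes[85]
```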

Other considerations / Future improvements

Going through this process absolutely explodes the file size. The original NAIP imagery files for my working area are 33,743,711,602 bytes, or a little over 30GB. After converting to GeoTIFF and doing the processing mentioned above, the resulting size of the TIFFs is 952,212,023,220 bytes (closing in on 1TB). One way to greatly reduce this would be to use the JPEG-in-TIFF options that GDAL provides.

My biggest complaint at this point is that not all of the images are uniform with respect to color, brightness, and contrast. GDAL provides some options that could be used to adjust these from the command line, but it would be a very manual process and could take a long time if many iterations are required. I may look at adding some image filters to Mapnik (the mapping engine under TileMill) to enable specifying some simple Photoshop-style corrections.

Distances added to maps and descriptions

Distances (in miles) have been added to all of the maps and descriptions on the website. For trail networks without a specific route, such as the Victoria Network, the distance for each trail segment is listed at the end of the description of that segment. For trails that have a defined route, such as Victoria’s Secret, Victoria 15, and the Victoria Lollipop, the beginning and ending mileage for each segment is listed at the start of the description. For these trails, balloons are now also displayed on the map indicating mileage traveled at certain waypoints.

Victoria Lake Trail Network Update, Site Updates

The site got a new index page a number of days ago which may or may not be more aesthetically pleasing or easy to navigate than the previous iteration. We will continue to iterate on the design and functionality of the website in search of something better than what came before, suggestions are always appreciated.

In that vein, the Victoria Lake area has seen a large update. There are now individual trail pages for Victoria’s Secret, the Victoria 15 Loop, the Victoria Lollipop Loop and there is also a page for the Victoria Lake Trail Network that showcases all of the trails in the area.