Introspecting PostgreSQL and MySQL, starting out

Starting at the top, PostgreSQL and MySQL have wildly different concepts of what constitutes a schemata. In MySQL information_schema is a database that can be used directly or via cross-database queries. Whereas in PostgreSQL, you cannot perform cross-database queries because you connect to a single database. Here, the information_schema is contained within what PostgreSQL calls a schema, or a seperate namespace within a database.

Common Fields (schemata):
PostgreSQL: Always the name of the parent database
MySQL: Always NULL
PostgreSQL: Name of the schema
MySQL: Name of a database
PostgreSQL: Feature not available
MySQL: Default character set (utf8, latin1, etc.)

User (role) that owns the schema

Default character set collation (utf8_general_ci ...)

So, what we get out of all this is that for PostgreSQL, we connect to a database and schemata tells us what schema’s exist. In the case of MySQL, it tells us what databases exist. Of course application dependant, but in many cases you can treat these as the same thing.

Once we know all about our databases, or whether we wanted to know about them at all, the next thing we will probably want to dig into are tables.

Common Fields (tables):
PostgreSQL: Always the name of the parent database
MySQL: Always NULL
PostgreSQL: Name of the schema that contains the table
MySQL: Name of the database
Name of the table
'BASE TABLE' for a regular persistent table
'VIEW' for a view
PostgreSQL: 'LOCAL TEMPORARY' for a temporary table
MySQL: 'TEMPORARY' for a temporary table


To set comments on PostgreSQL tables:
COMMENT ON TABLE table_name IS 'This is a table comment';
To retrieve comments we must dig into the system information schema:
SELECT obj_description(
(SELECT c.oid FROM pg_catalog.pg_class c
WHERE c.relname='table_name'), 'pg_class')
AS comment;

Next, we’ll want to dig into information about the columns of those tables

Common Fields (tables):
PostgreSQL: Always the name of the parent database
MySQL: Always NULL
PostgreSQL: Name of the schema that contains the table
MySQL: Name of the database
Name of the table
Name of the column
Position of the column within the table, the ordering of the columns
Default (expression) for the column
Whether or not the column can be null, 'YES' or 'NO'
Name of the built-in type
'ARRAY' - see the element_types view
'USER-DEFINED' - identified in udt_name+
If domain-based, refers to the underlying
type and the domain is identified in domain_name+
Maximum length of character data if defined, otherwise NULL
MySQL: NULL if not defined

Schema the user-defined type is defined in
Name of the user-defined type

Schema the domain is defined in
Name of the domain

More specific column type than data_type (varchar(255)
'PRI' or blank ?

To set comments on PostgreSQL columns:
COMMENT ON COLUMN table_name.column_name IS 'This is a column comment';
To retrieve comments we must dig into the system information schema:
SELECT col_description(
(SELECT c.oid FROM pg_catalog.pg_class c
WHERE c.relname='table_name'),
(SELECT a.attnum
FROM pg_catalog.pg_class c,
pg_catalog.pg_attribute a
WHERE a.attrelid=c.oid
AND c.relname='table_name'
AND a.attname='column_name'));

ORM: Metadata mapping

Using Metadata – Martin Fowler

The most time spent working on a site under exhibition was the model or database layer. So before really getting into the rewrite I wanted to see what else was out there that could simplify this. Obviously I was turned to Fowler’s Patterns of Enterprise Application Architecture and ActiveRecord (rails), Hibernate (java) etc. One common element with every ORM solution I have run across is that they partially violate the DRY principle. Those that don’t, or that give you the option of completely defining your schema in a format native to the ORM are not expressive enough to use the full power offered by the database. This is when I remembered that PostgreSQL implements an extension to the SQL standard information schema that allows you to add an arbitrary comment to nearly any database object, tables and columns inclusive. After some poking around I found that MySQL supports the same, although it is not nearly as well documented. SQLite does not share this non-standard extension. So, my thought at current is to express whether relations are one-to-one, many-to-one or many-to-many directly in my table declarations. The only drawback I can see is a lack of portability, but when it comes to PHP, how many people use anything other than PostgreSQL or MySQL? SQLite should be used a great deal more, in my opinion, but in practice I do not believe it is. If at some point I must be portable to something other than the common case, well, in that event I guess I can always just do what all of the ORM’s are already doing.


MySQL 5.0 standardized join syntax

I am sure the revised / SQL:2003 standardized join syntax in MySQL 5.0 is old news to many out there. My guess is they are in the minority, and most haven’t heard a thing about it. Some may have even upgraded only to be frustrated that their queries weren’t working as they should any longer. Here’s the skinny, taken directly from the MySQL manual.

Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. These changes make MySQL more compliant with standard SQL. However, they can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.

I would like to applaud MySQL AB on their latest release. Throughout its history, MySQL has been vastly out-gunned in terms of useful features by many other commercial and free databases. It has also taken a great deal of heat on many occasions due to its poor standards conformance in comparison to the other options on the market. With this release, even if they have not completely closed that gap, they have narrowed it by an impressive margin.

I do have a major gripe, however. Whilst the previously mentioned changes improves standards conformance and portability, it breaks a large enough percentage of MySQL-bound applications to warrant serious scrutiny. Apparently MySQL AB has forgotten that not the entire world is open source, and that many of us must every day maintain databases accessed by scripts and applications controlled by a third party or to which the source code is simply not available. This makes it rather impossible for any of us in such a situation to move those databases to servers running 5.0.

In my particular case, I was looking forward to leveraging triggers, stored procedures and views to reduce my administrative burden and deprecate a number of external scripts (hacks) that we use to transform data for use by other applications. The primary application sitting on this database is commercial and Zend encoded, so “fixing” the broken queries is simply not an option. Yes, we have talked to the vendor. I find it hard to believe that I am the only one in this situation.

Seriously now, how hard would it have been to add an option to enable the legacy behavior?

Update: 2/19/2006, offending commit