Wednesday, June 09, 2004

Case sensible

I'm a Subversion user and therefore subscribed to the users mailing list. One of the topics that comes around a lot is the problem of case insensitivity. (Is there a special name for topics that come up regularly on mailing lists and forums, in which exactly the same arguments get rehearsed over and over until eventually somebody says "look, this was all in the archives from a month ago"? I propose "rehash", which has the advantage of also being a Gorillaz song.)

The problem with Subversion (well, not Subversion, as we shall see) is that it is cross-platform. It runs on a myriad of Unix operating systems, Mac OS X and Windows. Unix file systems are case sensitive. That is, file names that differ only in case, e.g. jeremyp.txt and JeremyP.txt, actually represent different files. On Windows and Mac OS X (in its default configuration), file names are case insensitive but case preserving, which is to say that jeremyp.txt and JeremyP.txt are the same file, but the operating system remembers what combination of upper and lower case you typed in when you named the file.
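To make "case insensitive but case preserving" concrete, here is a minimal Python sketch of a toy directory that behaves that way. The class name CasePreservingDirectory and its methods are made up for illustration, and str.casefold() is only a stand-in for whatever folding rules a real file system uses.

```python
class CasePreservingDirectory:
    """Toy model of a case-insensitive, case-preserving directory (hypothetical)."""

    def __init__(self):
        # Maps the folded name to (name as originally typed, contents).
        self._entries = {}

    def create(self, name, contents=""):
        key = name.casefold()  # assumption: simple Unicode case folding
        if key in self._entries:
            # A name differing only in case counts as the same file.
            raise FileExistsError(name)
        self._entries[key] = (name, contents)

    def read(self, name):
        # Lookup ignores case entirely.
        return self._entries[name.casefold()][1]

    def listing(self):
        # The listing shows names exactly as they were typed at creation.
        return [display for display, _ in self._entries.values()]


directory = CasePreservingDirectory()
directory.create("JeremyP.txt", "hello")
print(directory.read("jeremyp.txt"))  # found, despite the different case
print(directory.listing())            # original spelling is remembered
```

On a case-sensitive Unix file system the equivalent lookup for jeremyp.txt would simply fail, because the two spellings name different files.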

Case sensitivity causes Subversion a problem because, being cross-platform, it has to support case sensitivity for the Unix boys and girls who may stupidly give two different files names that differ only in case, e.g. foobar.c and foobar.C (note that some misguided people use an uppercase "C" as the file extension for C++ and a lowercase "c" for normal C files). If a Mac OS X person tries to check out a Subversion repository with foobar.c and foobar.C in it, the checkout will fail because when Subversion gets to the second file, it finds that it already exists and isn't prepared to overwrite it (quite correct IMHO).
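If you want to know in advance whether a set of paths would trip up a case-insensitive checkout, something like the following Python sketch would do. It is not how Subversion itself works; find_case_collisions is a hypothetical helper, the path list is invented, and str.casefold() only approximates the folding a real file system applies.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that differ only in case."""
    groups = defaultdict(list)
    for path in paths:
        # Paths with the same folded form would collide on a
        # case-insensitive file system.
        groups[path.casefold()].append(path)
    return [names for names in groups.values() if len(names) > 1]


# Invented example listing; a real one would come from the repository.
repo_paths = ["src/foobar.c", "src/foobar.C", "README", "src/main.c"]
print(find_case_collisions(repo_paths))
```

Running a check like this before committing is one way for the Unix people to avoid upsetting the Mac OS X and Windows people.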

The Unix file system is case sensitive probably because Thompson and Ritchie couldn't be bothered to write the extra few bytes of code required to do case insensitive compares. In real life this is a stupid way of doing things. Any human can figure out that JeremyP, jeremyp, JEREMYP and JeReMyP all refer to the same person and there is no reason that a computer can't. Even Windows, which is much derided by Linux fans, knows how to do names properly.

The ultimate stupidity of the Unix way of working (UNIX, Unix: same operating system, right?) is the humble URL. A URL looks a bit like this: http://somehost.mydomain/path/to/file.html. The somehost.mydomain part is case insensitive. This is because it is a DNS name and DNS is sensible. However, the path/to/file.html part is case sensitive because traditionally it was actually a Unix file (relative to what is now known as the document root). Here we have a specification that includes two different conventions for case sensitivity in one object. How lame is that!
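The two conventions show up as soon as you try to compare URLs. Here is a rough Python sketch, using the standard urllib.parse module, that treats the host as case insensitive and the path as case sensitive. The function same_url is a hypothetical helper, and it deliberately ignores other normalisation issues such as default ports and percent-encoding.

```python
from urllib.parse import urlsplit


def same_url(a, b):
    """Compare two URLs: scheme and host ignore case, the rest does not."""
    ua, ub = urlsplit(a), urlsplit(b)
    return (
        ua.scheme.lower() == ub.scheme.lower()
        # The DNS name is case insensitive.
        and (ua.hostname or "").lower() == (ub.hostname or "").lower()
        and ua.port == ub.port
        # The path and query are compared exactly, case included.
        and ua.path == ub.path
        and ua.query == ub.query
    )


print(same_url("http://SomeHost.MyDomain/path/to/file.html",
               "http://somehost.mydomain/path/to/file.html"))
print(same_url("http://somehost.mydomain/path/to/file.html",
               "http://somehost.mydomain/Path/To/File.html"))
```

The first pair differs only in the host and counts as the same URL; the second differs in the path and does not.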

Things were improving for a while. HTML tags are not case sensitive, but now it turns out that XML, and by extension XHTML, has case-sensitive tags. This is a backward step in my opinion.
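You can see the difference with Python's standard library parsers. This is just a small sketch: TagCollector, html_start_tags and xml_accepts are names I made up, while html.parser and xml.etree.ElementTree are the real modules.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records the start tags the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already converted to lower case.
        self.tags.append(tag)


def html_start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def xml_accepts(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(html_start_tags("<P>hello</p><p>again</P>"))  # P and p are the same tag
print(xml_accepts("<p>hello</p>"))                  # matching case is fine
print(xml_accepts("<P>hello</p>"))                  # mismatched case is an error
```

The HTML parser happily treats P and p as the same tag, whereas the XML parser rejects an element opened as P and closed as p.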

Mac OS X (my fave) shows that you can run Unix sensibly with a case insensitive file system, so I think we should rise up in arms against case sensitivity all over the world.

Sensible case, not case sensitive!
