Sunday, May 30, 2004

Exclusivity lost

Oh no!

I only signed up because I was the only person citing "spam" as an interest. Now I find there are three of us.

Friday, May 21, 2004

Operating System Stress

A mystery was cleared up for me by Luke the other day.

I went to University to do Comp Sci in the mid eighties. Of course, one of the tools a comp sci undergraduate needs is a computer. At York we had a DEC VAX 11/750 for the entire undergraduate population (I was probably one of the last intakes where the norm was a single computer with lots of terminals hanging off it). The operating system of choice was UNIX - BSD 4.2 to be exact.

With my limited experience of microcomputers before college, I soon fell in love with the UNIX experience. Its user interface seemed elegance itself, especially when compared to the clunkiness of a Commodore PET. Incidentally, by "user interface" I mean command line. GUIs were the future then.

One of the courses I took was "operating system design". In fact it is the course I enjoyed most for the whole time I was there. Operating systems are probably the most complex objects ever designed by humans and it was fantastic to be able to understand how they work. We learned about context switching and semaphores, virtual memory management, IO processing, file systems and many other things. Right at the end we talked about UNIX.

It was a disappointment.

You see we had been taught that a good operating system uses a layered approach. At the bottom layer you have the fundamental stuff which basically allows processes to run and switch the processor to other processes when the current one gets stuck or simply uses up too much time. You also have semaphores at this level which provide the low level process synchronisation. On top of this you put the memory management and scheduler and further layers provide IO and the file system and the user interface.

The UNIX approach turned out to be, well, agricultural. It had a big blob of code called the kernel. When a process used the kernel, the first thing that happened was that the interrupts were turned off, and they weren't switched back on again until the kernel code had exited. In the jargon, UNIX has a monolithic kernel. I could not believe that something that seemed so elegant at the user level could be so crude underneath. In fact, at the time, this approach was an advantage. UNIX was designed to run on small minicomputers like the PDP 11. The PDP 11 has a 64Kbyte address space, so "sophisticated" was not an adjective you had the luxury of applying to your software. Its simplicity also (in my opinion) contributed to the fact that it was ported to a lot of machine architectures. It was easy to port even if you didn't have access to the source code. That might sound strange, but UNIX isn't really an operating system, it's the definition of the interface between the big blobby kernel and the user programs. I might write a blog that expands on that on another occasion.

Let's fast forward a few years. Linux is now the most popular variety of Unix-like OS, at least on small computers. It's actually a prime example of what I mean about Unix being an interface definition. All the code on the kernel side of the interface was written by Linus Torvalds originally. Much of the code on the user side was contributed by various members of the GNU project. (This is why Richard Stallman insists on GNU/Linux being its proper name. In the context of this blog, all the GNU stuff is in userland - by Linux, I actually do mean the bit that Linus Torvalds wrote). Whatever SCO would have you believe, there is probably no UNIX source code in Linux, but it manages to be Unixy by having the same system calls.

Here's the mystery: one of the advantages of Linux in the educational environment is that you can get the source code. This means that, for an operating system course, you can actually dissect it to see what a real operating system actually does. At least that's what the Linuxheads say. In my course we didn't look at any real code: the principles of virtual memory management are easier to learn with a few good diagrams than 20,000 lines of poorly commented C code.

If you're going to dissect an OS for a comp sci course, Linux is a poor example because, fundamentally, it still has the same big blobby agricultural kernel idea as the one I was so disappointed by in 1986. Linux is based on an operating system called Minix, or so the legend goes. Minix was designed by a guy called Andrew Tanenbaum as a worked example for a book he wrote about operating system design. It's always been a mystery to me how people seem to revere him and his book, which has almost acquired classic status. After all, the Unix architecture is wrong, surely. It's only good for telling you how not to write an operating system.

The link at the bottom of this blog entry is to the story of how Linus wrote Linux from Andrew Tanenbaum's perspective and it shows how I was wrong about Minix. In a fundamental way, Linux was a step backwards from Minix. The kernel side of the interface does not have to be one big blobby piece.

(This blog was written on an Apple PowerBook with Mac OS X 10.3.3, which uses the Mach 3 microkernel (non-blobby), but "enhanced" to reblobify it - so I'm told).


Wednesday, May 19, 2004

Round the Island Race

I've signed up to crew for my boss, Andy, for the Round the Island Race this year. It's a race for sailing boats that goes anti-clockwise around the Isle of Wight. I reckon we have a good chance in the "Extremely heavy 44ft slow cruiser" class. The boat in question is a Beneteau 44, several years old but with a brand new sail.

In fact, it's not Andy who's entered but the guy he sold a half share in it to, and Andy and I are both going to be helping reduce the weight by drinking the lager in the fridge. Before you ask, once drunk the lager has to go somewhere and we don't keep a tank of used beer (as it were) on board. Swimming in the Solent is not a good plan.


Sunday, May 16, 2004

Spam, spam wonderful spam

I cleared 125 spam messages out of my inbox today. This is actually a reduction because it represents all spam received since last Friday. Yesterday's logs show 186 rejections.

A post on the BBC message boards prompted me to do some research on the etymology of SPAM (the canned meat). Everybody knows that the usage of spam to refer to junk e-mail comes from the Monty Python sketch, but where does SPAM the meat get its name from? It turns out that the answer is a bit complicated as you can see by following the link which is to the Wikipedia entry for SPAM.


Up the Arsenal


I've been an Arsenal fan since I was eight, when my best friend was given a new red and white tracksuit and we had to support a team that played in red and white. It's fortunate that he got his tracksuit before me because mine was blue and white, which would have made us both Chelsea fans. Lucky escape, huh?

I've been thinking about how the achievement of going a whole league season unbeaten ranks with (say) Man Ure's treble. Unfortunately, being honest, I'd much rather have lost a couple of league games and still be in the FA Cup and Champions League.

Congratulations to the team anyway. It's still a fine achievement especially when you consider that the squad is pretty thin at the moment and we spent almost nothing last summer.


Tuesday, May 11, 2004


Well. OK since I said this blog would be about spam, I guess I'd better say something about it.

I have an e-mail address from easynet which I obtained 10 years ago as a result of getting my first Internet account. This e-mail address currently gets quite a lot of spam. I cleared 90 messages out of my inbox this morning.

I don't really use the said e-mail address. I'm just keeping it for experimenting on spam. The ISP redirects it all to an account at which I then redirect to a sendmail server running on my firewall. This server is currently set to deny access to mail from senders with unresolvable domains and from senders at hotmail, and it also has a mail filter (which I wrote myself) that looks for the Easynet RBL warning header and rejects the message if that exists too. These three measures alone reject about 200 messages per day. Of course, there may be false positives, but nobody I know uses that address anymore.
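
For the curious, the three rejection rules can be sketched in a few lines of Python. This is only an illustration, not the sendmail configuration or the filter I actually run: the header name `X-RBL-Warning`, the function names and the `resolves` callback are all made up for the example, and the DNS lookup is passed in as a function so the logic can be tried without a network.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical header name: an assumption for this sketch, not the ISP's real one.
RBL_HEADER = "X-RBL-Warning"
# Sender domains refused outright.
BLOCKED_DOMAINS = {"hotmail.com"}


def sender_domain(envelope_from):
    """Return the lower-cased domain part of the envelope sender address."""
    _, addr = parseaddr(envelope_from)
    return addr.rpartition("@")[2].lower()


def reject_reason(envelope_from, raw_message, resolves):
    """Return a reason string if the message should be rejected, else None.

    `resolves` is a caller-supplied function (domain -> bool) standing in
    for a real DNS lookup.
    """
    domain = sender_domain(envelope_from)
    # Rule 1: the sender's domain must resolve.
    if not domain or not resolves(domain):
        return "unresolvable sender domain"
    # Rule 2: refuse senders at blocked domains.
    if domain in BLOCKED_DOMAINS:
        return "blocked sender domain"
    # Rule 3: refuse anything already flagged with the RBL warning header.
    if message_from_string(raw_message)[RBL_HEADER] is not None:
        return "RBL warning header present"
    return None
```

The rules are checked cheapest-information-first: the first two only need the envelope sender, while the third needs the message headers.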

The next stage is to write a Bayesian filter that plugs in to the sendmail milter API. I think that's some way off though.
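
To give an idea of what the Bayesian part might look like, here is a minimal naive Bayes sketch in Python. It is an assumption about the general technique, not the filter I intend to write: the class and method names are invented for the example, the tokenizer is deliberately crude, and it does not touch the milter API at all.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy naive Bayes classifier over word tokens (illustrative only)."""

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Crude tokenizer: lower-cased runs of letters, digits, $ and '.
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        """Record one message as 'spam' or 'ham'."""
        self.token_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, tokens, label, vocab):
        counts = self.token_counts[label]
        total = sum(counts.values())
        n_messages = sum(self.message_counts.values())
        # Log prior, with add-one smoothing so an untrained class is not -inf.
        score = math.log((self.message_counts[label] + 1) / (n_messages + 2))
        for token in tokens:
            if token in vocab:  # tokens never seen in training carry no evidence
                # Add-one (Laplace) smoothing on the per-class token frequency.
                score += math.log((counts[token] + 1) / (total + len(vocab)))
        return score

    def spam_probability(self, text):
        """Return the estimated probability (0..1) that `text` is spam."""
        tokens = self.tokenize(text)
        vocab = set(self.token_counts["spam"]) | set(self.token_counts["ham"])
        spam = self._log_score(tokens, "spam", vocab)
        ham = self._log_score(tokens, "ham", vocab)
        # Normalise in log space to avoid overflow on long messages.
        top = max(spam, ham)
        spam_weight = math.exp(spam - top)
        ham_weight = math.exp(ham - top)
        return spam_weight / (spam_weight + ham_weight)
```

A real filter would train on the rejected and accepted mail and reject a message when `spam_probability` exceeds some threshold, chosen high enough to keep false positives rare.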


Monday, May 10, 2004

Brain dead system

My first annoyance with this blogging system has happened. When trying to set my photograph URL, it refused to accept that a URL ending in ".JPG" is a JPEG image. It's funny how browsers since the year dot have been able to figure that out.
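
For what it's worth, the check it ought to be doing is not hard. A minimal sketch in Python, assuming the system only looks at the file extension (I have no idea what it really does; the function name is made up):

```python
import os
from urllib.parse import urlparse

JPEG_EXTENSIONS = {".jpg", ".jpeg"}


def looks_like_jpeg(url):
    """Guess from the URL path whether it names a JPEG, ignoring case."""
    path = urlparse(url).path  # drops any query string or fragment
    extension = os.path.splitext(path)[1].lower()
    return extension in JPEG_EXTENSIONS
```

Lower-casing the extension before comparing is the whole trick.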

First post

Well everybody else has got one, so I can't buck the trend can I?
