Thursday, 12 June 2008

Let's Get Canonical

I work with databases - I was a Database Administrator, now I'm a Database Architect. And I've always told the developers I work with that I'm responsible for the integrity of the database, not the integrity of the data.

But.

Incorrect data really annoys me. Especially the sort which has a canonical source, is only wrong because some moron mis-entered it, and then that wrong data has proliferated across the Internet. If you look down to the right, you'll see a selection of books from my collection on LibraryThing. Books are an excellent example of the kind of screwed-up data I mean. LibraryThing pulls its book data from several sources. And some of it is just plain wrong - misspelt, inaccurate, incorrect... And yet it would be easy enough to check. Just look at the book itself.

Frank Herbert did not write Threshold The Blue Angles Experience. He wrote Threshold: The Blue Angels Experience. The author of Tom Strong Book 6 is not "various" but Alan Moore and Chris Sprouse (well, they're the two that get top billing on the cover, although others did contribute).

It's not just books. It's CDs too. Whenever I buy a CD, I rip it to MP3s so I can listen to it at work and on my Yeep. And yet half the time I have to go and correct all the misspelt song titles. The Black League did not record a song called 'Better Angles (Of Our Nature)' but 'Better Angels (Of Our Nature)'.

It's not difficult to get it right. You don't see books in Waterstone's with misspelt titles. Or CDs in HMV or Zavvi like that.

In fact, I don't see why there can't be a single canonical source of such data - which would be the publishers, of course. It's in their interest to ensure it's correct. After all, how can you order a book or album if they've entered the title incorrectly? So why can't the publishers - the content providers themselves - publish correct data about their products, and allow free access to it by the likes of LibraryThing, Gracenote or last.fm? It's not that difficult...

4 comments:

Magpie said...

Do you not find comfort in the fact that we live in a fallible universe?
Sometimes we just have to be philosophical about it.

I'm fairly knowledgeable... for instance... about Japanese military history. So when I open up a book intended for a non-specialist reader which makes reference to this subject... I fully expect there to be mistakes - and I am seldom pleasantly surprised. Names wrong, places wrong, dates wrong, facts wrong. Almost always.
And it being my pet subject.. I can pick it to pieces.

And then I wonder about everything I don't know so well... is that riddled with errors as well?
Probably.

So what does one do?
Well... have another beer and enjoy grumbling about it.

By the way, I originally discovered your blog when searching for Terran Trade Authority books. Glad I wasn't the only one who remembered them.

Ian Sales said...

Getting something wrong is one thing. Being unable to transform data from one format or medium to another without introducing errors is an entirely different matter. I mean, how hard is it to type in the title of a book when you have the actual volume in front of you?

On the subject of the TTA books, were you aware there's a role-playing game based on them? See here.

Magpie said...

Yeah you have a point Ian.

Regarding the TTA content... That's fantastic. I'm not actively into RPGs but I have a liking for the background that comes with them. Thank you!

Anonymous said...

Most people don't care about spelling, I'm afraid. If they can read it that's all right for them.
But you're right, publishers ought to. And judging by the typos/errata I've seen in books published relatively recently, some book publishers don't even employ adequate proof readers - if any at all.

(Just don't get me started on the misuse of the apostrophe....)


Jack Deighton