jmac muses about an Internet book database

Musings regarding an Internet Book Database

Generalities

While knocking about with my new media journal toy, I realized that, while I have some nice, free resources in the form of the Internet Movie Database and the CDDB, there isn't quite an equivalent resource for books. Why isn't there a free, online, and comprehensive book database?

After performing some web searches and interviewing many of my online associates, it seems that the closest things the Web has to offer along these lines are Amazon and the U.S. Library of Congress's online card catalog. I like the latter in that it's noncommercial and more or less all-encompassing, including most everything to which an ISBN will stick. (Plus, there is some novelty to be had in seeing due dates for books that people have currently on loan. The concept of checking a physical book out of the LoC boggles me, for some reason -- I mean, it's the Library of Congress. I should expect every volume to be the most pristine example of all its existing copies, with perfectly set type of smooth, unbroken ink on gleaming leaves anathema to the public's grubby thumbs ruffling along their edges. Guess not.) Amazon, of course, represents the furthest thing from noncommercial, though it does have some neat features, in particular its deservedly famous "People who have bought this title have also bought..." link hanging under most every piece of media it sells.

Now, I could just link my books to one or the other, and be done with it, but this still doesn't strike me as an optimal solution, due to the limitations (and annoyances) inherent in one or the other. I really do, on reflection, want something like the imdb, for books. Granted, books as a medium don't lend themselves to quite the level of potential cross-referencing which makes the imdb so wonderful, simply because books tend to have far fewer names associated with them than films -- look up one movie, and you can spend hours threading around other accomplishments by this director, or this writer, or this actress who played Second Girl In Elevator, but with a book you'd have, what, the author, sometimes more than one, the year of publication, and maybe the publishing company for completeness' sake, but that's all. One could, however, make full use of a feature like the imdb's other big draw (for me, anyway), allowing all kinds of reviews, production factoids, and other interesting trivia. (Heck, in the case of really old works that have been assimilated by Project Gutenberg, you could even throw in the full text!)

Specifics

Here are some features I'd like to see, along with some implementation thoughts. I tried to keep both ease of use and security -- as well as spam-resistance -- in mind.

Title Information

Anyone can add titles, but each must be added thru its ISBN, with the data grabbed from elsewhere. Non-ISBNed titles can be added thru special arrangement; I wouldn't expect it to occur very often. Andrew Plotkin has already done this on a project of his (which has, no doubt, provided me with some of the inspiration for my current musings). He parsed Amazon pages to get title information by ISBN; I was thinking more along the lines of the LoC, simply because that institution, I figure, has a higher likelihood of knowing about out-of-print books. Now that I word it out, I will probably just make up a list of sites to check for information on a given title; if one fails, try the next. Anyway, the point is that it's eminently doable, none too difficult, and decent protection against those who would spam the site with bogus titles.
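
To make the "list of sites, try the next on failure" idea concrete, here is a rough Python sketch. Everything named in it is an assumption for illustration: `is_valid_isbn10`, `lookup_title`, and the idea that each site is wrapped in a function that takes an ISBN and returns a dict (or nothing). It only checks ten-digit ISBNs, and it does no actual page fetching or parsing.

```python
from typing import Callable, Iterable, Optional

# Hypothetical shape of a source: a function that takes an ISBN string and
# returns a dict of title data, or None if that site knows nothing about it.
Source = Callable[[str], Optional[dict]]


def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 check digit (weights 10 down to 1, sum divisible by 11)."""
    chars = isbn.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10  # "X" stands for ten, and only in the last place
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def lookup_title(isbn: str, sources: Iterable[Source]) -> Optional[dict]:
    """Ask each source in turn; return the first record found, else None."""
    if not is_valid_isbn10(isbn):
        return None  # bogus numbers never reach the outside sites
    for source in sources:
        try:
            record = source(isbn)
        except Exception:
            continue  # this site is down or its page changed; try the next
        if record:
            return record
    return None
```

The order of the list sets the preference, so putting a LoC-backed function ahead of an Amazon-backed one would match the leaning described above. The check-digit test is cheap, and it turns away most made-up numbers before any site gets asked.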

(The question arises: why not just grab the entirety of some other company's ISBN database, by crawling through every page their website could generate? Well, it would be a _lot_ of information, demanding more than a fair amount of disk space, I suspect, and I bet that, even if this becomes the most popular website in the world, less than half of that entire total would exist within the sphere of books that the site's visitors had actually read. Besides, a book's date of entry into the catalog would be one more datapoint to play with.)

Related Information

Anyone can add objective information relating to a book -- all that trivial but fun information I mentioned before -- but must provide a cite. One weakness I perceive with the imdb's method is that, for instance, I'll read that a famous director threatened suicide 10 times during the making of his masterpiece. That's quite a claim to make, and I really would like to know where they found this tidbit, without having to go hunt for it myself. I don't think it's too much to ask that all facts about a work one would like to share have some sort of backup, hyperlinked to an actual source if possible, but at the very least with some sort of pointer to a reference a person could, if so moved, reasonably locate -- or, if appropriate, a note that the information offered comes simply from personal observation.
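
The cite rule could be enforced at the point where a fact gets stored. This is only a sketch: the `Fact` class and its field names are made up here, and a real site would want more than this (checking that a link actually resolves, for one).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One piece of trivia about a book, with something to back it up."""
    isbn: str
    text: str
    source_url: Optional[str] = None    # a hyperlink, if one exists
    reference: Optional[str] = None     # otherwise, a pointer someone could follow
    personal_observation: bool = False  # or an honest "I noticed this myself"

    def __post_init__(self):
        has_link = bool((self.source_url or "").strip())
        has_reference = bool((self.reference or "").strip())
        if not (has_link or has_reference or self.personal_observation):
            raise ValueError(
                "a fact needs a link, a reference, or a personal-observation note"
            )
```

A submission with none of the three simply fails to construct, so an unsourced claim never makes it into the database.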

User ratings, and title-relating

Every title will have an interface allowing users to acknowledge that they've read it, giving them the option to rate it as well, perhaps also writing a review or adding some opinionated keywords to a pool. Data thus accumulated can be turned into a version of Amazon's "People who have bought this book..." link, except that you don't have to buy the book from any one place to participate. This might be the feature I'm looking forward to the most. We can show relationships between books that tend to be read by the same people, as well as, separately, books that are read by and also enjoyed by the same people. Amusingly, we can also easily build a display linking disliked books. Show me that on Amazon, I say to you.
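
The three displays (read together, liked together, disliked together) can all come out of one counting routine. Below is a minimal Python sketch of that. The 1-to-5 rating scale, the cutoffs for "liked" and "disliked", and the names `related_pairs`, `liked`, and `disliked` are all assumptions, not anything decided above.

```python
from collections import Counter
from itertools import combinations


def related_pairs(ratings, keep=lambda score: True):
    """Count how many users have each pair of books in common.

    ratings maps user -> {isbn: score}; a score of None means the user
    marked the book as read but did not rate it. keep decides which of a
    user's books count toward a pairing.
    """
    pairs = Counter()
    for books in ratings.values():
        chosen = sorted(isbn for isbn, score in books.items() if keep(score))
        for first, second in combinations(chosen, 2):
            pairs[(first, second)] += 1
    return pairs


def liked(score):
    # Assumed cutoff on an assumed 1-to-5 scale.
    return score is not None and score >= 4


def disliked(score):
    # Assumed cutoff on the same scale.
    return score is not None and score <= 2
```

Calling `related_pairs(ratings)` gives the plain "read by the same people" counts, `related_pairs(ratings, liked)` gives the enjoyed-together counts, and `related_pairs(ratings, disliked)` gives the amusing one. Sorting each user's books first means a pair is always stored in one order, so (A, B) and (B, A) never get counted separately.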

Other stuff

Would I want to bother with categories? Probably. Perhaps have a dmoz-like setup, with volunteer moderators able to lord it over them, and create new subcategories? Hrm.

I think a message board attached to each individual book might be fun.

It all really comes down to this: is this something I want to put the time into? I'll have to mull it over some more. Or wait until someone shows me where it's been done already, and I just haven't seen it.

Other Ideas

A "glue" site

When I first discussed the idea of a free Internet book database with a friend, and brought up the topic of the associative linking, he said he'd like to see this done across media -- so that one could not only see what other books the readers of a certain book have tended to also enjoy, but also what films they liked, and what kinds of music they listened to. After fleshing out all the ideas as I have above, it occurs to me that I could instead make a site that keeps no title data of its own, but is instead simply the keeper of connections between titles on the imdb, cddb, and Amazon/LoC. People would be able to enter in media by serial number and rate it, just as before, but now this feature alone, spread over a wider range, would become the main focus of the site.
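
A glue site along these lines would store nothing but identifiers and who liked what. Here is a small Python sketch of such a store, with the caveat that `MediaKey`, `GlueStore`, and the source labels are invented for illustration; how each outside site would actually be queried for display data is left out entirely.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaKey:
    """Points at a title on another site; no title data is kept here."""
    source: str  # illustrative labels such as "imdb", "cddb", "loc"
    serial: str  # whatever identifier that source uses


class GlueStore:
    def __init__(self):
        self._likes = defaultdict(set)  # user -> set of MediaKey

    def like(self, user, key):
        self._likes[user].add(key)

    def also_liked(self, key, other_source=None):
        """Return (MediaKey, count) pairs for items liked alongside key.

        Pass other_source to see only one medium, e.g. films liked by
        the readers of a given book.
        """
        counts = Counter()
        for keys in self._likes.values():
            if key not in keys:
                continue
            for other in keys:
                if other == key:
                    continue
                if other_source is None or other.source == other_source:
                    counts[other] += 1
        return counts.most_common()
```

Since a key is just a (source, serial) pair, adding another medium later means nothing more than agreeing on a new source label.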

The downside to this is that it gets away from what I originally wanted -- a super-comprehensive book database -- and that it might be getting into some grey areas with copyright. If I rely wholly on other sites for all my data requests, how will they feel about that? I can write my own non-web cddb client, and I doubt that the LoC would take offense to me parsing its pages, but I wonder about the imdb.

I could address the first of these two issues by making the full-featured book database anyway, and tacking the cross-media glue business on later. Or vice-versa.

