My Main Site Is Back
Much remains to be done. I'll try and post in more detail later today.
One thing I didn't quite figure out with WordPress before New Year's Day was how to upload a userpic for myself. It's not a critical issue, and I kept bumping it to the back of the "look into this" list--until this morning, when I realized that a commenter had a userpic. This is not LiveJournal, where thousands of people have their accounts all on one server and userpics are stored centrally. This is my own private instance of WordPress, installed on my own hosting service, with no blogs on it but mine. So wherethehell did that userpic come from? [Reminder to LiveJournal people: My LiveJournal blog is a mirror of my main WordPress site, which is what "here" means when I say "here."] Shortly thereafter, Julian Bucknall showed up in a comment, with his own userpic. At this point, I quit gnashing my teeth at Ubuntu for being atavistic (why isn't there a dialog in the admin menu tree somewhere for setting a search path? Huh? Huh? Why?) and did some digging.
Of course, something interesting is going on here. There's a Web service called Gravatar, which maintains small images (either photos or drawn art) intended to be used as personal avatars on blog comments and discussion forums. Each image is keyed by an MD5 hash of the image owner's email address. Blog or forum software (anything, actually) simply makes a request to gravatar.com with the hash, and it gets back an 80x80 image.
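For the curious, here is a minimal Python sketch of the request a blog engine builds. It assumes the convention Gravatar documents: trim whitespace and lowercase the address before hashing, then put the hex digest in the URL path and the pixel size in an `s` query parameter. The function names are illustrative only, not part of WordPress or any Gravatar library.

```python
import hashlib
from urllib.parse import urlencode

GRAVATAR_BASE = "https://www.gravatar.com/avatar/"

def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lowercased email address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

def gravatar_url(email: str, size: int = 80) -> str:
    """Build an avatar request URL; 's' is the requested size in pixels."""
    query = urlencode({"s": size})
    return f"{GRAVATAR_BASE}{gravatar_hash(email)}?{query}"

if __name__ == "__main__":
    # Hypothetical address; stray spaces and capitals don't change the hash.
    print(gravatar_url("  Someone@Example.com "))
```

Because only the hash goes into the page, a commenter's picture can show up on a WordPress install that never stored it, and without the address itself appearing in the page source.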
This works great--when it works, which is most but not all of the time.
I'm still scratching my head here. I can see my Gravatar image on Contra from every browser in the house except the instance of Firefox 2 here on my main machine. IE6 on this same box shows it, and on the other machines FF2 and every version of IE from 6 on show it. But FF2 on this box won't, except in the "Recent Comments" pane of the dashboard. Then, sure. Gotta make it complicated.
This does not compute. It's the same damned version of FF (2.0.0.20) that I have running everywhere in the house. I'm not big on plug-ins, and there's nothing peculiar about this install of XP. I do not see why viewing WordPress on this instance of Firefox would be any different from viewing it with any other instance of Firefox, and this one does see other people's gravatars over their comments. Just not mine.
Still stumped, and I'm posting this to see if any of you do not see my picture in the avatar block of any of my comments here on WordPress. Suggestions, of course, are welcome. I won't croak if I can't see my own gravatar as long as everybody else can, but things like this give cloud computing a bad name.
One final note, which boggles this old mind: Gravatar has a rating system. You can have G, PG, R, and X-rated gravatars. You heard me: X-rated gravatars. In an 80-pixel by 80-pixel block. Damn. I can't have a GUI dialog to set the Linux search path, and you can have an X-rated gravatar. Somebody's getting ripped here. Deciding who, I leave as an exercise for the reader.
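The rating works as a ceiling chosen by the site making the request. Assuming Gravatar's documented `r` parameter, a blog that asks for `pg` gets G and PG images, and anything rated higher is swapped for a default image. A sketch of that rule, with hypothetical helper names:

```python
from urllib.parse import urlencode

# Gravatar's four ratings, ordered from tamest to raciest.
RATINGS = ["g", "pg", "r", "x"]

def is_servable(image_rating: str, max_rating: str) -> bool:
    """True if an image with image_rating may be shown on a site
    that accepts avatars up to and including max_rating."""
    return RATINGS.index(image_rating.lower()) <= RATINGS.index(max_rating.lower())

def avatar_query(size: int = 80, max_rating: str = "g") -> str:
    """Query string for an avatar request; 'r' caps the allowed rating."""
    rating = max_rating.lower()
    if rating not in RATINGS:
        raise ValueError(f"unknown rating: {max_rating!r}")
    return urlencode({"s": size, "r": rating})
```

So `avatar_query(max_rating="PG")` yields `s=80&r=pg`, and an X-rated 80x80 masterpiece never reaches that page.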
Carol and I got back to Colorado Springs a few hours ago, and the suitcases haven't been emptied yet--in fact, they're in a pile in the corner of the bedroom and may not even be unlocked until tomorrow morning. But on the way home from the airport we picked up the puppies, who seem no worse for the wear, except for their tear-staining. We give them occasional doses of Tylan to treat the staining, but we don't expect the kennel people to keep up with that. So they're going to be red-eyed for a couple of weeks yet.
The priority today and tomorrow is to get ready for the big switchover from hand-edited Contra entries (something I've been doing for over ten years!) to WordPress. I did some testing of a free blog editor called Zoundry Raven while I was in Chicago, and it worked well enough for me to want to give it a shot in "production mode." This post is being edited in Raven, and if everything works correctly, it will post the same text and associated images to both LiveJournal and WordPress with one click and without a lot of screwing around. The images were an issue on my test post for December 23, and they may still be, but I'm running out of time to troubleshoot them this year, and I may have to fix'n'figger along the way if Glitch Happens. (And doesn't it always?)
The new URL for the WordPress-based Contra will be www.contrapositivediary.com, in case you haven't seen that yet. Come Friday, there will be no new posts on www.duntemann.com/Diary.htm, though links to all ten years' worth of archives will still be there, at least until I get them moved to the new domain. How far back I move the hand-edited archives into WordPress depends heavily on how much work it ends up being, and that remains an open issue.
Contra is moving to its own domain January 1, and will become a WordPress install as of that date. (Posts there now are all test posts and will be deleted before it goes live.) I've been studying WordPress and configuring the install to do what I need it to do, and although it's taken some time and some fooling-with, long-term it will save me a huge amount of effort compared to the hand-editing I've now done for over ten years.
One of the interesting features of WordPress is that it supports both tags and categories. A lot of people scratch their heads over that, but when I saw it I understood it immediately. Tags and categories both apply a text string to a post. The differences from a content management perspective are minor: Categories are predefined and applied via a drop-down list, but you create tags "on the fly" at post-time. You can use tags and categories interchangeably if you want, but using them together allows an interesting sort of two-axis classification of posts. One axis (best handled by tags) describes what a post is about: politics, religion, publishing, Linux, Wi-Fi, and so on. The other axis (best handled by categories) describes the shape of a post, in the sense of a literary form: idea pieces, reviews, rants, travelogs, memoir, and so on. The increase in precision is delicious: Not all posts about wine are reviews—I've done at least one wine rant and will probably do more, and wine travelogs are possible—but if you're more interested in reviews than in rants, selecting the "reviews" category and looking for the "wine" tag will get you exactly what you want.
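A toy model of the two-axis scheme in Python, assuming one category per post (WordPress itself allows several, so this is a simplification) and any number of tags. The post titles are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    title: str
    category: str                                # the "shape": one per post, from a fixed list
    tags: set[str] = field(default_factory=set)  # the "subject": free-form, several per post

def select(posts, category=None, tag=None):
    """Return posts matching a category and/or a tag (None means 'any')."""
    return [
        p for p in posts
        if (category is None or p.category == category)
        and (tag is None or tag in p.tags)
    ]

posts = [
    Post("A Cabernet worth the money", "Reviews", {"wine"}),
    Post("Why are wine prices so silly?", "Rants", {"wine"}),
    Post("A ham radio kit that works", "Reviews", {"ham radio"}),
]

# Reviews about wine: the rant and the radio review both drop out.
for p in select(posts, category="Reviews", tag="wine"):
    print(p.title)
```

Either axis alone returns two posts; only the intersection narrows it to the one you wanted.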
Both categories and tags work best when used sparingly. Five hundred tags each used once or twice are not merely less useful than keyword search (which is available in WordPress) but actively harmful, because after a while we forget what tags we've created and create new tags so similar to existing ones as to spawn serious search entropy. (I had this problem on LiveJournal more than once.)
Categories in particular should be few and distinct. I brainstormed with myself a few days ago, jotted down as many category identifiers as occurred to me, and then ruthlessly winnowed the list down to a predetermined limit of ten or fewer. The eight categories I settled on are these:
Daybook: Everyday activities; "Dear Diary:"
Ideas & Analysis: Commentary on news plus ideas and speculation
Memoir: My personal history
Odd Lots: Short items presented without much discussion
Rants: Complaints and other over-the-top material
Reviews: Evaluations of products or services
Travelogs: Where I went and what I saw/suffered/learned in going
Tutorials: How things work and how to do them
I also have a tags list that runs to a little over fifty right now, and includes all the expected keywords describing my many interests, like religion, publishing, ebooks, dogs, hardware, ham radio, psychology, and so on. I spent a sobering half an hour meditating on my accumulated tags list in LiveJournal and threw most of them out. I'm going to try to keep myself to fifty tags or fewer and don't expect a great deal of difficulty creating the list. (I'll post it once I consider it reliable.) This sort of thing is called a "controlled vocabulary" in information science circles, and the trick, of course, is to keep it controlled.
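One way to keep a controlled vocabulary controlled is to refuse any tag that isn't already on the list and suggest the nearest existing ones instead. Here's a minimal sketch using Python's standard `difflib`; the vocabulary is a hypothetical excerpt drawn from the interests named above, and the 0.75 similarity cutoff is an arbitrary assumption.

```python
import difflib

# Hypothetical excerpt from a controlled tag vocabulary.
VOCABULARY = {"religion", "publishing", "ebooks", "dogs", "hardware",
              "ham radio", "psychology"}

def check_tag(candidate: str, vocabulary=VOCABULARY, cutoff: float = 0.75):
    """Accept a tag only if it is already in the vocabulary.
    Otherwise return (False, close_matches) so a near-duplicate
    can be caught before it is created."""
    tag = candidate.strip().lower()
    if tag in vocabulary:
        return True, []
    suggestions = difflib.get_close_matches(tag, sorted(vocabulary), n=3, cutoff=cutoff)
    return False, suggestions
```

`check_tag("ebook")` comes back `False` with `"ebooks"` as a suggestion, which is exactly the sort of near-twin that otherwise creeps into a tag list.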
LiveJournal will continue to be a mirror. One unanswered question is whether I will attempt to import LiveJournal posts to WordPress. This apparently can be done, though I haven't tried it and understand that it could seriously mess up my newfound tag discipline—and require me to categorize several hundred posts. I may import but only selectively. Research continues.
Earlier this afternoon, I finally did something I'd been meaning to do for literally years: Configure a dedicated domain for ContraPositive Diary. It's done, and I've pointed contrapositivediary.com to the WordPress instance I created back in September on Fused Network. I'm still learning it, testing it and interviewing widgets and plug-ins, so although the domain and the blog are now live, there's still not much to see.
That will change on January 1. On that day I will stop editing Contra entries by hand (as I've done since 1998) and begin using WordPress. Entries from 1998-2008 will remain pure HTML and be accessible as such. I'm going to copy them from duntemann.com over to contrapositivediary.com, but the copies on duntemann.com will remain there until I kill the Sectorlink hosting account and move the domain over to Fused Network. I intend to keep my LiveJournal account, and use the LJXP crossposter plug-in to automatically cross-post anything I post on WordPress to LJ.
There's a lot of other stuff on duntemann.com that has to go somewhere. The duntemann.com domain is begging for a new index page anyway, and I'm working on how to organize it. I do know that my Maker material on electronics, telescopes, and kites will all be rewritten using CSS and placed under my junkbox.com index. I intend to install a new instance of the Gallery photo manager there, and move the Tech Projects portion of gallery.duntemann.com over to gallery.junkbox.com. Beyond that, well, I won't know until next year.
Some conceptual issues remain undecided; e.g., should I continue to group short link citations into larger Odd Lots entries, or just post them as I find them as individual entries? The way I do it now is an artifact of how I create Contra entries generally: I keep a text file in a window and add short items to it until I decide it's time to format them and post them as a group. That becomes unnecessary with WordPress, and I can streamline the whole process by just popping up Semagic (or something like it) and posting them Right Now instead of storing them locally until I have time to format them for uploading.
WordPress itself is an amazing thing. I'm still trying to figure out what all it can do, either by itself or with the jungle of plug-ins you can find for it. What I know it can do is save me time, which seems to be in shorter supply every year, and that, ultimately, is what the whole exercise is about.
I know I'm older than dirt. What still boggles me a little to think on is that I'm older than...blogging. Yes indeedy: Ten years ago today, I wrote the first entry for something I called VDM Diary. (VDM, of course, being Visual Developer Magazine, which I owned and edited until we shut it down in early 2000.) I had no idea what I was doing, and certainly had no idea that what I was doing would soon become a global phenomenon that would put whole newspapers in their graves and change the shape of information dissemination.
It's amusing to go scanning around the Web to read the heated arguments about who invented blogging. I'll pull an Al Gore here and say that I did. So did a number of other people. It's not like it's rocket science to take a literary form that goes back to at least 1660 and put it...on a Web server. Oh, the genius!
Actually, I'm even more like Al Gore in that I didn't invent blogging—I just like to say that I did. In truth, Lisa Marie Hafeli did, and she simply pestered me into implementing it. Lisa was my ad sales rep at VDM, and she wanted me to figure out how to get more product mentions associated with the magazine, so that she could get a little more credit with developer tools companies. We only had so many pages for reviews and news releases, but...how about talking about products online? How about just writing a little something every day or two about a product?
I remember her bringing up the idea at the beginning of 1998, and I thought about it for months before giving it a try. I had never kept a paper diary, though I wrote a lot of email and posted on forums, so I was used to writing in short pithy snippets. I was leery of pandering to advertisers, so I tried hard to avoid the appearance of just doing VDM Diary to work in product mentions. It was by intention that I sprinkled in little weirdnesses like the FBI's database of UFO sightings (June 17, 1998) and odd observations from my own work in technology, like how Word 97 irritatingly autoconverted the sequence ":)" to a smiley icon. I did the product mentions, but they didn't seem to make much difference in our ad sales efforts. So I branched out, adding personal observations on my own life, and by the middle of 1999 I was thoroughly hooked. Alas, that was about the time that VDM began imploding, and I was depressed for a solid year after Coriolis shuttered the magazine. (Coriolis itself didn't last much longer.) But even though I no longer had a magazine, by the middle of 2000 I re-established a Web diary on my own domain (duntemann.com) and have been doing it ever since.
ContraPositive is not the oldest blog still posting regularly. I think Lileks' Daily Bleat (which goes back to early 1997) has that honor, though if you know of any older ones still posting, please send a pointer. Bob Thompson's Daynotes Journal started up less than two weeks after Contra did, and is still going strong. Jerry Pournelle has been doing something with regular postings on his Web site for a very long time, but it's not organized like a diary, and it's very hard to figure out where everything is and how long it's been there. (This doesn't mean it's not worth reading.)
Interestingly, I've been told by a couple of people that what I do is not really a blog, and is actually more like a daily newspaper column. There's something to that. When I was a kid, I used to admire writers like Jack Mabley and Bert Bacharach (not his composer/musician son Burt) who wrote daily columns in the local newspapers. (Jack Mabley wrote a blog for a time when he was 90, until he passed away in 2006.) The energy that sustains Contra comes from a conviction learned from far better writers than I (like Gene Wolfe) that no matter what else they might do, writers should write something coherent every day. I usually manage that, though understand that I write on a lot of different projects, of which Contra is only one. Doing it daily isn't difficult. Being coherent, now, well...
In the last year or so, I've been doing fewer Contra posts and longer ones, and gathering shorter items (usually focusing on links) up into regular Odd Lots posts. I'm trying not to split my concentration too many ways on any given day (context changes are costly!) and if I'm working intensely on something like Degunking Essentials or Old Catholics, I tend not to work on Contra that same day. I have bookmark and email folders for items to address later on, and periodically go through them, deleting or archiving items once I've covered them here. The system works, and I'll use it until I think of something better.
As I've said here in a number of contexts, writing benefits the writer as well as the reader. It's good practice, it's discipline, it dissipates tension, and it's one way to stay current in the world. Having something coherent to say requires that you live an attentive life and remain curious about many different things, and the best way to learn something yourself is to explain it to someone else. Contra works for me. I hope it works for you. Thanks for reading, and stay tuned.
Mike Reith sent me a link to a nice article in the New York Times about the radical bad manners that prevail in the blogosphere. It's gotten bad enough that Jimmy Wales and Tim O'Reilly are trying to bring about the return of blogger civility by devising a Blogger's Code that draws on community guidelines posted some time back on BlogHer. Of course, guidelines by themselves won't work; people who claim they will are misunderstanding what's really going on here.
Back to that in a minute, or tomorrow if I run out of space and time. The immediate puzzle, to me at least, is why this is controversial at all. Incivility is not a free-speech issue; it's an immature-nitwits-throwing-tantrums issue. I deal with it on Contra in a number of ways, and have since I first started doing this in 1998. My one rule is simple: Be civil, or you'll be dumped into the shitcan where you belong. I do not allow unscreened anonymous comments on my LiveJournal mirror. I do not respond to angry emails, even to say something like "temper, temper!" because I know it won't do any good, and (as my mother sometimes said) it only encourages them.
I have not had to delete any signed (non-anonymous) comments on LiveJournal because (so far) I haven't gotten any really rude ones. This shouldn't surprise anybody too much. Anonymity is most of the problem. Not the whole problem, but most of it—and if we eliminated anonymous blog comments, the worst of the problem would just go away. I think that it would virtually eliminate the sorts of sociopathic comment attacks that totally freaked tech writer/blogger Kathy Sierra not long ago. (Kathy's situation is all the more remarkable because she blogs about programming languages, not George Bush. Some Guys Are Feeling Threatened, heh.)
I understand that screen names are not necessarily traceable, though if presented with proper warrants, the hosting organization can often be forced to cough up a sociopath's identity to law enforcement. I would go further and place the poster's IP address right there in the post, along with the precise time and date of the posting. A fair number of online forum systems do this, and those are the forums with the least nastiness. There's no need to pass laws, except perhaps to more crisply define what qualifies as actionable threats. If one blogging service allows users to configure anonymity options and another doesn't, the market will decide who's right.
That's a potential solution that's worth trying. The larger question is more difficult: Why is the blogosphere so filled with hate? I think I finally figured it out. I'll explain tomorrow.
By now, I think everybody in the world has heard about security guru Bruce Schneier's blog-based contest for "movie plot" terrorist scenarios. Even the New York Times picked up the story. To see the (now very long) list of scenarios, go to Bruce's blog for April 1, 2006. I ran over there as soon as I heard of it, but found that one of the first scenarios posted was the one I had thought of: A coordinated effort to light wildfires in American drought regions, especially California.
Seeing the list of scenarios gave me some chills, and second thoughts: Is this really a good idea? I was slow on the uptake (perhaps, as an SF writer, I can imagine things a little too vividly) but I understand now. Bruce is making a point: There are a gazillion ways to mount a terrorist attack, some of them apparently easy and not all of them suicide missions. However, for all that, it's been four and a half years since 9-11, and Islamic terrorists have not struck again.
Bruce's point is one I see no one else making: If they were coming, you'd think they'd have been here by now. I remember having nightmares over Richard A. Clarke's article in The Atlantic for January/February 2005. A fictional retrospective looking back from the year 2011, Clarke thought up a long laundry list of his own terrorist scenarios and laid them out with morbid clarity.
None of them have happened. I considered many of Clarke's scenarios unlikely at the time (and a few outrageously unlikely) but the article still scared the crap out of me. It's worth asking why we haven't been attacked. There are two points worth pondering:
The 9-11 plot wasn't so much brilliant as audacious—and lucky. It doesn't take a criminal genius to know that you can cause considerable damage with large objects full of explosive fuel moving very quickly. The wonder is that they got all the many moving parts to mesh without being detected. I'm pretty sure it only worked because we weren't paying much attention, and I'm even more sure that nothing remotely like it will ever happen again.
I'm out of time for today, but there's an additional factor or two that I'll take up tomorrow.
It's miserable to make money as a writer these days. The print outlets that once represented such a good market for technical copy are falling right and left. There are too many publishers fielding too many books for too few purchasers. The reasons for all this are complex, and while the Internet gets blamed for showering free info on people who used to be willing to pay for it, the truth is that personal computing is now a mature market. Although we didn't realize it at the time, we crossed a sort of threshold in 1999 or 2000: Computers and software became Good Enough. People stopped trading up their machines and applications every 18 months. A 2000-era PC is fast enough and expandable enough (USB ports were in every PC by then) that it can still be used in 2006, with the software of its own era, like Windows 2000 and Office 2000. What this means is that people have had plenty of time to learn the box and the stuff that's in it, and all the books they might need have long been bought. Furthermore, non-technical people have a well-known reluctance to change a system or configuration once they've gotten comfortable with it. The furious ramp-up of personal computer power that we saw in the 1990s is over.
This leaves writers in a pinch. As publishers compete for a shrinking market, royalty rates have dropped, sales totals have dropped, and money in hand is much less than it once was. So what are the options? One thing that has fascinated me in the past year or so is Google AdSense. The AdSense system is simple, and brilliant: You drop a frame in an appropriate place on a Web page, and the Google search engine fills it with ads that relate to the text in the page. When somebody clicks through to the advertiser site, you make a quarter.
It doesn't sound like much, but Web content is persistent: Unlike a magazine article that rises into view and then sinks out of sight in a few weeks, or a book that spends a few short months on bookstore shelves, Web content can be around for years and years. My pages on Tom Swift and Hi-Flier Kites have both been up for five or six years now. Short pop-culture articles like those might have fetched $150 in print magazines. To make $150 in five years, an article need bring in only $2.50 per month, which is ten ad clicks. My hosting logs tell me that my Tom Swift page gets a pretty consistent 550-600 views per month, so ten clicks works out to a click-through rate of a little under 2% on page views. Is this doable? I won't know for a while, but it doesn't seem impossible.
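The arithmetic in that paragraph, spelled out. The figures ($150 per article, five years, a quarter per click, 550-600 views per month) come straight from the text; the variable names are illustrative.

```python
print_fee = 150.00   # what a short pop-culture article might fetch in print
months = 5 * 12      # five years on the Web
per_click = 0.25     # "you make a quarter" per click-through

monthly_target = print_fee / months          # dollars needed per month
clicks_needed = monthly_target / per_click   # clicks needed per month

print(f"${monthly_target:.2f}/month = {clicks_needed:.0f} clicks/month")
for views in (550, 600):                     # observed monthly page views
    ctr = clicks_needed / views
    print(f"{views} views/month -> {ctr:.1%} click-through")
```

It prints about 1.8% at 550 views and 1.7% at 600, so ten clicks a month means a click-through rate just under 2%.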
One thing that helps is that the ads placed by the AdSense server have been eerily pertinent. See for yourself: The ads on my Tom Swift page have been things that kids' book readers and collectors would be interested in, including Hardy Boys, Nancy Drew, and other kid-nerd lit like Peter's Packets.
There are some understandable glitches, given that advertisers buy keywords and don't always do so with sufficient care. On my space-charge tubes page, I initially got an ad for Oreck vacuum cleaners. However, by this morning, all four ads were from companies selling vacuum tubes. My assembly language page gets 1500-1700 visits per month, and astonishingly, Google has been able to place at least three ads from people selling assembly language books, tools, and tutoring. (The fourth says "Assembly Operators Wanted: $10/hr." I guess EQU need not apply.)
I'm sure that much depends on the nature of the writing. Some topics just don't have a lot of potential for ads. I'm going to test this, by posting articles on topics like the biographies of eccentric popes. We'll see. On the other hand, I didn't think "assembly language" would be a phrase an advertiser would want, either.
I'm still placing ad frames on pages on my site here, and I have no data as yet. (My membership in AdSense is 36 hours old as I write this.) I'm not desperate for the money, but I'm very interested in whether the ad model can work for individual writers. I'll report back here from time to time and let you know how things go.
I had an interesting if kind of obvious idea the other day for a feature that blogging services should have but (as best I know) do not: RSS feeds filtered by tag. This would be especially useful for blogs like mine that cover a lot of ground. The blog server would generate an RSS feed containing only entries tagged with a string specified by the subscriber. For example, if you wanted to read Contra but only wanted to see my entries on ebook technology, you would subscribe to an RSS feed filtered on "ebooks," the tag I use for that purpose. A number of people have expressed interest in this sort of thing, and if LiveJournal added the feature I would certainly use it. I can't imagine that it would be that hard to do. (A reminder for newcomers: Contra is simultaneously published on duntemann.com and on LiveJournal, identical in content if not in format.)
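A sketch of what such a filter might do on the server side, assuming an RSS 2.0 feed in which each tag appears as a `<category>` element on its item (a common convention, though not one every blog engine follows). The function name and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

def filter_feed(rss_xml: str, tag: str) -> str:
    """Return a copy of an RSS 2.0 feed keeping only the items that
    carry a <category> equal to tag (case-insensitive)."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    wanted = tag.strip().lower()
    # findall() returns a list, so removing items while looping is safe.
    for item in channel.findall("item"):
        cats = {(c.text or "").strip().lower() for c in item.findall("category")}
        if wanted not in cats:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")

SAMPLE = """<rss version="2.0"><channel><title>Contra</title>
<item><title>Ebook formats</title><category>ebooks</category></item>
<item><title>New kite</title><category>kites</category></item>
</channel></rss>"""

print(filter_feed(SAMPLE, "ebooks"))
```

A subscriber would then point a reader at something like a feed URL with the tag as a parameter; how the server exposes that is a design choice the sketch leaves open.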
Some of you may not realize that Contra lives in two different places now: Since December 22 I've been posting entries to both www.duntemann.com/Diary.htm and to http://jeff-duntemann.livejournal.com. So far it's a big win, and once I digest enough S2 (the LiveJournal styling language) to be dangerous, I can make it look like anything I want. I don't like the LJ gray background, for example, and with some S2 smarts I can make it white. I also need to figure out how to put text blocks in the mostly-empty left column so I can index to the archives on LJ as I do on duntemann.com. I'm not going to spend the time moving seven years of archives from my own domain to LJ, and I need to be able to tell people where the rest of it is stored.
I guess I have some work to do. The world could use an LJ book, heh.
I've been trying out client-side LJ editors in recent weeks, and as best I can tell (things like this are always a little tentative) the winner is Semagic, an open-source project hosted on SourceForge. It does everything I need it to do, it seems reasonably robust, and it's free, of both cost and spyware/adware. A rolling history of the project can be found here.
LiveJournal itself (the server software) is open-source as well, and there are supposedly sites out there based on the LiveJournal codebase. I haven't spotted any yet; can anybody point me to a list? I've seen a few sites that seem to have a seamless interface to LJ, even though they're not LJ-based themselves, and this intrigues me too. What I guess I'd like is a map of the LJ universe, if such a thing exists. Pointers welcome.
Although I still think the blogging community should work together and create an optional controlled vocabulary for tagging, the real problem goes well beyond blogging and comes back to indexing the Web: We're classifying along the wrong axis. When used with some skill, good search engines like Google can tell us what a Web item is about. What's much harder to determine is what I call the "literary form": Is the Web item a FAQ? An online store? A blog? A tutorial? A forum? A photo gallery? This might be considered the "shape" of the item being sought. One has to be very clever to filter on the literary form, simply because there's nothing in the document that unambiguously says what the item is, as opposed to what it is about.
The most pressing need is to exclude online stores. I sometimes try to find good technical info on a technology gadget, only to find that I have to filter out 15,000 or more online stores that tell me nothing except that they carry the item in question.
Metadata frameworks exist that could handle this. The Dublin Core Metadata Initiative (DCMI) is the oldest and probably the best known, but it's little used outside of university circles, probably due to its complexity. In any event, it doesn't really have a spot for what I'm talking about. DCMI suggests a controlled vocabulary for "Type", but what they mean by "type" is type of data (still image, sound, text, etc.) rather than the intended purpose of some collection of data of various types. A blog, for example, can include text, still images, sound, and video. Each piece of the blog may be stored in a separate file and each file tagged with its DCMI Type, but nothing tells me that it's a blog.
I doubt we'd need more than twenty terms in a controlled vocabulary for "literary form." (We really need a distinct technical term for this; I'd like to use "genre" but that's considered a synonym for "type" in the DCMI definitions.) My suggested vocabulary follows, in alphabetical order:
There's some fuzziness here. Many blogs are in fact aggregators, but my definition limits "aggregator" to a Web page that provides an ongoing stream of pointers to other things, with short descriptions. Many larger aggregators include forums (Slashdot, Plastic). We have to do the best we can.
The list above is the "narrow interpretation" of literary form, in that these could be considered content templates. I had originally brainstormed a broader interpretation which included these vocabulary items:
Whether these are things of an entirely different nature is worth discussing. ("Tutorial" may belong to this second group as well. I'm still thinking.) Many blogs are virtually all opinion, and much history is in essay form.
Even with a controlled vocabulary agreed upon and in the can, there remains the problem of how to apply the category tags to a useful number of Web items. Nobody said it would be easy. I'm throwing all this out just to keep the subject in play. I have a couple of ideas of how to do this (drawing upon my ancient plan for world domination called Aardmarks) but I'm getting tired of the subject and want to spend some time on other things.
I first played with the Web in late 1993, and by early 1995, I was getting annoyed at the difficulty of finding things. I knew that the 2.0 version of the HTML markup language had a META tag, and I had the notion that one could use META to apply a classification marker to a Web page. Adding the marker to the content was trivial: Allow the user to select a category from a GUI tool, force the generated META tag onto the clipboard, and then drop it into any text editor with Ctrl-V.
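A rough Python sketch of both halves of that idea: generating the tag a helper tool would put on the clipboard, and reading it back out of a page. The META `name` value (`classification`) is a hypothetical choice, since no standard defines one for this purpose, and the clipboard step itself is left out.

```python
from html import escape
from html.parser import HTMLParser

META_NAME = "classification"  # hypothetical; not a standardized META name

def make_meta(category: str) -> str:
    """Generate the META tag a helper tool would hand to the clipboard."""
    return f'<meta name="{META_NAME}" content="{escape(category, quote=True)}">'

class MetaReader(HTMLParser):
    """Collect the content of every <meta name="classification"> tag."""

    def __init__(self):
        super().__init__()
        self.categories = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values can be None for bare attributes, hence "or".
        if (attrs.get("name") or "").lower() == META_NAME:
            self.categories.append(attrs.get("content") or "")

tag = make_meta("Business & Commerce")
reader = MetaReader()
reader.feed(f"<html><head>{tag}</head><body>...</body></html>")
print(tag)
print(reader.categories)
```

The ampersand is escaped on the way out and unescaped by the parser on the way back in, so category names can contain the usual punctuation.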
Big snag: No classification system. I had learned the Dewey Decimal Classification (DDC) in sixth grade, from a nun who was nearly as old as the DDC system itself. I knew why it was what it was, but I also knew that we didn't need to use numbers anymore. So I started looking around to see what other people had done in the classification field. Remarkably little turned up. I knew the Library of Congress Classification system (LCC) from college research, but I hated it. As Kyle McAbee pointed out in a note yesterday, you can't get your head around it. There are hundreds of categories in just its top two levels, and nobody but librarians ever commits even most of it to memory. The Sears List of Subject Headings is smaller than LCC but conceptually similar, and no more accessible to ordinary mortals.
The OMB has a little-known hierarchical classification system for industries called NAICS (North American Industry Classification System) that I like a lot, and in fact intended to absorb it whole into my nascent classification system. I never got quite that far, but my intent was to adopt it into a top-level category called Business & Commerce. (See below.)
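NAICS codes carry their hierarchy in their digits: a two-digit sector, three-digit subsector, four-digit industry group, five-digit NAICS industry, and six-digit national industry, so every prefix of a code names one of its ancestors. A minimal sketch of walking those prefixes; it ignores the few sectors that span a range of two-digit codes (Manufacturing is 31-33, for example), which a real lookup table would have to handle.

```python
# NAICS level names keyed by code length.
LEVELS = {2: "Sector", 3: "Subsector", 4: "Industry Group",
          5: "NAICS Industry", 6: "National Industry"}

def ancestry(code: str):
    """List (prefix, level name) pairs from the sector down to the full code."""
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError(f"not a NAICS-style code: {code!r}")
    return [(code[:n], LEVELS[n]) for n in range(2, len(code) + 1)]

for prefix, level in ancestry("541511"):
    print(f"{prefix:<6} {level}")
```

The appeal for a general classification scheme is that the code is its own path: no separate parent pointers are needed to climb the tree.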
Few people know that Peter Roget created a classification system for his Thesaurus, to make it easier to find similar words. It's in the book, and quite clever, though with categories like "Pushing, Throwing" (#903) it's clear that it wasn't intended to be a classification system for articles or papers, just for synonyms. I learned earlier today (thanks again to Kyle) that Thomas Jefferson had a three-category top-level hierarchy: Memory (i.e., history), Reason (philosophy), and Imagination (the arts). Alas, even he gave up, and began shelving his books by size. (Better that than by the color of the spine.)
So back in 1995, almost as a lark, I sat down and tried to think through what a hierarchical knowledge classification system would look like. I used the Windows folder hierarchy as a visual model, and eventually created a treeview-driven utility to browse a textual category hierarchy that I called the Knowledge Explorer. My goal was to create a system that would be accessible because it wouldn't have to be memorized: You could browse it like any tree-structured data, or use text search to look for individual category tags.
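The Knowledge Explorer itself isn't reproduced here, but its two access paths (browsing a textual hierarchy as a tree, and text search over the category tags) can be sketched in a few lines of Python. The indented-outline file format (two spaces per level), the sample categories, and the names `parse_outline` and `search` are assumptions for illustration, not the original program's design.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    name: str
    children: list["Category"] = field(default_factory=list)


def parse_outline(text: str, indent: int = 2) -> Category:
    """Turn an indented outline (one category per line) into a tree.

    Assumes each level is indented by exactly `indent` spaces.
    """
    root = Category("(root)")
    stack = [(-1, root)]  # (depth, node) pairs along the current path
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = Category(line.strip())
        # Climb back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root


def search(node: Category, term: str, path: tuple[str, ...] = ()) -> list[str]:
    """Return the full path of every category whose name contains `term`."""
    hits = []
    for child in node.children:
        child_path = path + (child.name,)
        if term.lower() in child.name.lower():
            hits.append(" > ".join(child_path))
        hits.extend(search(child, term, child_path))
    return hits


if __name__ == "__main__":
    # Illustrative sample only, not the author's actual hierarchy.
    sample = "Science\n  Physics\n    Particle physics\n  Biology\nTechnology\n  Computing\n"
    for hit in search(parse_outline(sample), "physics"):
        print(hit)
```

Because search returns full paths, a hit also shows the user where the tag sits in the tree, so nothing has to be memorized.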
I dove in. And I got addicted. For most of a year I studied the shape of human knowledge and the relationships of things to other things. It was great fun. I gave myself a broad if shallow education in things like dog breeds, world religions, and systems of government. And I created a category hierarchy, which eventually ran to 2,800 lines. As I thought it through, the number of top-level categories rose from six to seven and then to ten. I split a few up until I had twenty, then brought it back to ten:
Business & Commerce
Governance
Humanity
Mathematics
Philosophy
Reference
Regional
Religion
Science
Technology
The Web has spoken, and I'm coming around to the view that they (we) are right: not in any judgment on hierarchical classification, but about the axis along which we should be classifying. I'm out of time, and will have to continue tomorrow.
In yesterday's comments section of the LiveJournal incarnation of Contra, Bill Higgins pointed out that there are fundamental problems with the Dewey Decimal Classification System that go well beyond Conan the Librarian. He's right, and I don't want to be misread as a Dewey fanatic. I stand with most librarians in feeling that Dewey has had his day, though it's been more than a day: it's been almost 130 years. That said, I need to reiterate that the problems with DDC are the problems of one particular classification hierarchy, and not of the idea of hierarchical classification itself.
The DDC has legacy problems of a kind we can't even dream of in computing. I think it's fair to say that we didn't know very much in 1876 (we had no clue, for example, how the Sun generated its energy) and that we viewed what we did know entirely differently than we view the body of human knowledge today. Melvil Dewey's classification assumed a fairly aristocratic Protestant Christian view of the world. Sensitivity to aboriginal peoples wasn't on the radar. Non-Christian faiths were considered subordinate, and more cultural phenomena than genuine religions.
There were problems of the moment as well. In 1876, Spiritualism was in its heyday, and you couldn't spit and miss a medium. Melvil Dewey lumped paranormal phenomena in with philosophy and psychology, and there they remain, Dewey (the other Dewey, Philosopher John) and hooey, side by side.
This problem isn't unique to the DDC; the DDC has a bad case simply because it's so old. All classification schemes (not merely hierarchical ones) have to be able to adapt to changes in what we know and how we view it. And it's inevitable that people who assume an older version of the system will turn up some 404s. (But when's the last time you hit a 404, threw up your hands, and went home?)
The worst problem with the DDC is inherent in its design: the decimal-imposed "rule of ten" that allows no more than ten subcategories beneath every category. That's entirely artificial and often an extreme nuisance, but it was crucial to the DDC's original strategic mission: to allow marginally literate people to accurately reshelve books in public libraries. With a DDC call number embossed on a book's spine, a shelver would not even have to be able to read the book's title; all he or she would need to know is how to tell when one number is greater or less than another. That's a lot easier to teach than reading. It was also useful in libraries where many books were in languages other than English.
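The shelver's whole job reduces to comparing numbers, which a short Python sketch makes concrete. It treats each DDC class number as a plain decimal (using `decimal.Decimal` to avoid floating-point surprises) and ignores the author and Cutter portion of a full call number, which real shelving also uses; `shelf_order` is a hypothetical name.

```python
from decimal import Decimal


def shelf_order(class_numbers: list[str]) -> list[str]:
    """Sort DDC class numbers into shelf order by comparing them as decimals.

    Only the class number is considered; the rest of a real call number
    (author/Cutter marks, dates) is ignored in this sketch.
    """
    return sorted(class_numbers, key=Decimal)


if __name__ == "__main__":
    # 539.736 lands before 539.8 because .736 is less than .8
    print(shelf_order(["797.1224", "539.8", "539.736", "004"]))
```

Nothing in that comparison requires reading a word of the title, which was exactly the point.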
Just about ten years ago, I got passionate about this subject (the Web was new, there was no Google, and finding anything was as much luck as skill) and without realizing the implied megalomania, I set out to recast the DDC for the Web and build some GUI tools for classifying Web sites with automatically generated META tags. I failed, but I had more fun failing than I ever had succeeding at most things. More tomorrow.

Early yesterday afternoon, Bill Roper put me on to the fact that Glenn Reynolds had posted a review of The Cunning Blood on his Instapundit, the #7 blog on Technorati. It was the shortest review I think I've ever had for a piece of SF, but his 50 words sent the novel skittering up to #2,535 on Amazon by late evening, and up as high as #76 on the Amazon F&SF stackrank, right ahead of Left Behind. (You can't imagine how good that makes me feel!) The review is doubly valuable because (having read Instapundit off and on for some time) I think his readers will share some of the perspectives in the novel, which is definitely not San Francisco liberal. (Neither is it precisely conservative or libertarian. As a friend of mine told me once, "Jeff, you're just hard to figger.")
The big problem with LiveJournal (and again, it's not so much a problem as an omission) is a problem shared by all the other blogging services/utilities I've tested: There's no standard vocabulary for tagging. Everybody makes up a personal tagging vocabulary (idiotically called a "folksonomy," even though it takes more than one person to be a "folk") and just uses it. If everybody uses a different tag for nominally similar entries, who cares?
LiveJournal uses tagging correctly as far as it goes: You click on a tag at the end of a tagged entry, and the view changes to only those entries to which that same tag has been applied. If I tag a certain number of my entries with the word "filesharing," you can click on the tag in any of those entries and see all my entries tagged with the word "filesharing."
The problem comes up when I want to see what other people on LiveJournal have written about filesharing. Some people might also use the tag "filesharing." However, that would be simple good luck, as there's no master list of suggested tags anywhere. Another person writing about filesharing may tag pertinent entries as "file sharing," another as "peer to peer" or "P2P," or even "downloading."
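A short Python sketch shows why exact-match tag search misses most of those entries, and how even a tiny controlled vocabulary helps. The synonym table, the sample entries, and the function names are hypothetical; nothing here reflects how LiveJournal actually stores or searches tags.

```python
# Hypothetical controlled vocabulary: free-form variants -> one standard tag.
SYNONYMS = {
    "file sharing": "filesharing",
    "peer to peer": "filesharing",
    "p2p": "filesharing",
}

# Hypothetical entries from different people, each tagged their own way.
ENTRIES = [
    ("alice", "filesharing"),
    ("bob", "File Sharing"),
    ("carol", "peer-to-peer"),
    ("dave", "P2P"),
    ("erin", "downloading"),
]


def normalize(tag: str) -> str:
    """Lowercase, treat hyphens as spaces, collapse whitespace, then map synonyms."""
    key = " ".join(tag.lower().replace("-", " ").split())
    return SYNONYMS.get(key, key)


def find(entries: list[tuple[str, str]], tag: str, controlled: bool = True) -> list[str]:
    """Return the authors whose entry matches `tag`."""
    if not controlled:
        # Folksonomy-style: only an exact string match counts.
        return [who for who, t in entries if t == tag]
    want = normalize(tag)
    return [who for who, t in entries if normalize(t) == want]


if __name__ == "__main__":
    print(find(ENTRIES, "filesharing", controlled=False))  # ['alice']
    print(find(ENTRIES, "filesharing"))                    # ['alice', 'bob', 'carol', 'dave']
```

"Downloading" is deliberately left unmapped: it's broader than filesharing, and deciding where borderline tags like that belong is the job of whoever maintains the vocabulary.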
This is doubly peculiar because LiveJournal does have a standard vocabulary for tagging moods. You pull down the mood tag list, and declare that you're pleased or annoyed or thoughtful, and if you happen to be feeling phlegmatic or irenic or hysterical that particular day (these are not on the list, heh) you simply type them into the adjacent edit field.
So the machinery's there. If the standard tagging vocabulary isn't there, I suspect it's because a certain very small number of snot-nosed humanities types dislike standard (technically called controlled) vocabularies of any kind, especially those (like the venerable Dewey Decimal Classification system) arranged in hierarchies. Hierarchies are undemocratic, I guess, and asking people to look at a suggested list of standard tags smacks of fascism. (If you think I'm exaggerating, you clearly haven't moved in Humanities circles much in recent years.)
The objection can be made that swallowing the whole DDC hierarchy is impossible except by librarians, and that's true. (I have the entire 4-volume DDC Edition 21 on my shelf, which is six shelf-inches of small type. I read it sometimes when I'm trying to get sleepy. It works.) However, that's the magic of hierarchies: You don't have to use the whole thing. A classification hierarchy is a tool for expressing incremental specificity. There are ten fundamental classes in DDC, and each of those is divided into ten, and those into ten, and so on. The top-level ten classes represent very broad categories like religion, science, technology, and so on. Add a few decimal places, and the powers of ten let you express some pretty narrow categories very tersely. (Supercolliders may be found at DDC 539.736, and sport kayaking can be located at 797.1224.)
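The "incremental specificity" idea can be shown with a small Python function that expands a DDC number into the chain of broader classes above it. It's a simplification: it assumes one digit per level of the hierarchy, which holds for much of the DDC but not all of it, and the function name `ddc_hierarchy` is hypothetical.

```python
def ddc_hierarchy(number: str) -> list[str]:
    """Expand a DDC number such as "539.736" into its chain of broader classes.

    Simplifying assumption: each added digit is one step down the hierarchy.
    """
    whole, _, frac = number.partition(".")
    if len(whole) != 3 or not whole.isdigit() or not (frac == "" or frac.isdigit()):
        raise ValueError(f"not a DDC-style number: {number!r}")
    # Main class (e.g. 500), division (530), section (539), then one step per decimal digit.
    chain = [whole[0] + "00", whole[:2] + "0", whole]
    chain += [f"{whole}.{frac[:i]}" for i in range(1, len(frac) + 1)]
    # Numbers ending in zero (500, 530) repeat their parent; drop the duplicates.
    result: list[str] = []
    for cls in chain:
        if cls not in result:
            result.append(cls)
    return result


if __name__ == "__main__":
    for n in ("539.736", "797.1224"):
        print(" > ".join(ddc_hierarchy(n)))
```

For 539.736 this prints `500 > 530 > 539 > 539.7 > 539.73 > 539.736`. A tagger can stop at whatever depth makes sense, and a search at any level of the chain still reaches the narrow entry.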
But the lesson I draw from studying DDC and similar systems is that a relatively small standard vocabulary can get you very close in a global search, especially if the system allows the use of secondary tags chosen by the tagger for a specific entry. "Sports" should be a standard tag. "Kayaking" need not be. ("Water sports" could be a good compromise, if the standard tag set hasn't gotten too huge.)
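The two-tier scheme (a required tag from a small standard list, plus whatever secondary tags the author likes) is easy to model. This Python sketch is an illustration only: the five-tag vocabulary stands in for the eighty or so a real list would need, and the entry titles and names are made up.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-in for a real standard vocabulary (hypothetical, deliberately tiny).
STANDARD_TAGS = {"business", "religion", "science", "sports", "technology"}


@dataclass
class Entry:
    title: str
    standard: set[str]                                # must come from STANDARD_TAGS
    secondary: set[str] = field(default_factory=set)  # free-form, author's choice

    def __post_init__(self) -> None:
        unknown = self.standard - STANDARD_TAGS
        if unknown:
            raise ValueError(f"not in the standard vocabulary: {sorted(unknown)}")


def search(entries: list[Entry], tag: str, secondary: Optional[str] = None) -> list[str]:
    """Global search on a standard tag, optionally narrowed by a secondary tag."""
    return [
        e.title
        for e in entries
        if tag in e.standard and (secondary is None or secondary in e.secondary)
    ]


if __name__ == "__main__":
    entries = [
        Entry("Down the river", {"sports"}, {"kayaking", "water sports"}),
        Entry("Opening day", {"sports"}, {"baseball"}),
        Entry("Building a crystal set", {"technology"}, {"radio"}),
    ]
    print(search(entries, "sports"))              # both sports entries
    print(search(entries, "sports", "kayaking"))  # just the river trip
```

Validating only the standard tier keeps global search reliable while leaving the author free to be as specific as he or she likes in the secondary tier.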
I'm going to sniff around a little more and see if anyone has suggested a standard vocabulary for tagging blog entries. Eighty or a hundred tags would probably be enough; hell, LiveJournal's pull-down menu for moods holds 132 standard tags for moods alone. If I can't find a standard vocabulary for tagging I'll just invent one and post it here for discussion. I've studied cataloging and classification systems in depth and spent a couple of years creating a "knowledge explorer" category schema for tagging Web sites. That's a separate story that I'll take up here at some point.
The problem with "folksonomies" is that they work against community understanding. There is a common view of the universe that we all share, at least in approximation. (And human life could be defined as a very large number of approximations.) In a folksonomy, a tag means precisely what I choose it to mean (as Humpty would say if he were a blogger, which, if he were real, he certainly would be), and if nobody can do a global search on blog entries, well, what do I care? Blogging is all about me, after all.
As one who's been doing it since 1998, I'd like to suggest that blogging, if we must call it that, is not about individuals but about the global community of thought, within the context of our collective understanding of the universe. Searching that requires a controlled vocabulary for tagging. Sooner or later one is going to happen.
Some of you knew this already, but for a few weeks now I have been posting Contra entries to my account on LiveJournal, perhaps the most sophisticated blog-hosting service out there. I wasn't sure for a while that I would stick with it, but the more I probe the details of LiveJournal, the more I like it.
So what I'm going to do for a while is post both to this page and to LiveJournal. Unless something goes radically wrong with LiveJournal, I'll continue to post to it. However, I will also post here, at least until I figure out how to retain reliable archives of the material in the event that LiveJournal goes away.
LiveJournal provides RSS feeds, something I could do manually if I felt like spending the time on it, but time is a big problem for me right now. Syndication is one way to get a broader readership, which I would like to have. LiveJournal is also tied into blog-search and rating services like Technorati, and I'd like to get into that as well. There are a lot of questions about how to get my archive uploaded (keep in mind that I have well over 2,000 entries now, going back to 1998), but going forward, the advantages outweigh the drawbacks.
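For what it's worth, producing a basic feed by hand isn't much code. This Python sketch writes a minimal RSS 2.0 document using only the standard library; the URLs are placeholders and `build_rss` is a hypothetical helper, not anything LiveJournal uses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title: str, link: str, description: str, items: list[dict]) -> str:
    """Return a minimal RSS 2.0 feed as a string.

    Each item dict needs "title", "link", and a timezone-aware "date".
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        # RSS 2.0 expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(node, "pubDate").text = format_datetime(item["date"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    feed = build_rss(
        "Contra",
        "https://www.example.com/contra/",
        "Placeholder feed description",
        [{"title": "Sample entry", "link": "https://www.example.com/contra/1",
          "date": datetime(2006, 1, 2, 12, 0, tzinfo=timezone.utc)}],
    )
    print(feed)
```

The hard part isn't the XML; it's keeping the feed current with every post, which is the time LiveJournal saves.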
I have two quibbles with LiveJournal so far, one minor and one major. The major one will require an entry to itself. The minor one is simply the absurd limitations on user names. A user name may not contain spaces, and must be entirely in lower case. Good God, why? Are we still in the grip of that juvenile C programmer's tantrum against capital letters? And lest some nit ask, "Why would you want to do that?" (which means, "There isn't any reason you can't do that other than my own ego, so I have to turn the blame around to you so that I don't lose face") I will simply insist that "Jeff Duntemann" is how my name is spelled. I don't hide behind screen names. "jeff_duntemann" is a misspelling. Spaces are characters. Uppercase letters are characters. I am not e.e. cummings. (Nor e_e_cummings.)
Guys, this is just plain dumb.
I'll deal with the major quibble more thoughtfully, with some luck tomorrow.