Monday, February 14, 2011

Building the Writer's Knowledge Base—by Mike Fleming

Hannibal from the A-Team always loved it when a plan came together. Unfortunately, the Writer's Knowledge Base (WKB) didn't emerge from some well-thought-out plan. While I'm not a believer in destiny, I'll admit that sometimes it does seem like a real force, and the WKB could be an example. In this post I'll describe how the WKB came to be, how it works, and why I'm doing it.

When I started following writers on Twitter, I quickly realized that the excellent links they posted had the lifespan of a gnat. Actually, gnats live a lot longer. It seemed like a shame that the links had such a short shelf life. The actual page at the other end of the link is still there, of course, but finding it is a lot trickier when you don't have a human curator separating the good from the bad.

While writing this post I dug through my notes to find what I wrote about the idea I had for fixing the problem. I found a less-than-eloquent entry on September 30, 2010, that says:

"Monitor writers' tweets for links to writerly subjects especially on the craft of writing. Then, user could search for 'characterization' and get links to all kinds of articles."

While those two sentences clearly foreshadow the WKB as it is today, back in September it was just another idea in a bucket full of them. I suspected it was a good idea, but I decided to keep focusing on Hiveword, which is the fiction organizer I'm developing. In fact, the idea itself was intended to be part of Hiveword at some point. That's the context I was in at the time.

Now keep in mind that @elizabethscraig is one of the Twitterers I was following, and while she is not the only one to post links, I think we can all agree that she is by far the most prolific. So imagine my surprise when I saw her post on December 13th in which she was exasperated about the difficulty of making all of those great links findable.

Well.

The problem was that she had content and no technology, and I had technology and no content. Isn't that how Reese's peanut butter cups were born?

This smacks of destiny, I thought. So I slept on it and the next day sent Elizabeth an email outlining my proposed solution. After running a background check on me, she decided that together we could provide a compelling free service to writers everywhere. Bloggers would benefit, too, since they would have another source of traffic. There was little downside.

With Elizabeth on board, I set off to work. Going from concept to implementation took under a month of part-time work, since I have a day job. Part of the reason it was so fast was that I was able to leverage the platform I already had for Hiveword. Another reason is that I had an appendectomy in early January and the doctor said I should stay home for a week. How convenient. 40+ hours of work on the WKB. w00t!

Of course, telling you how it works would spoil some of the magic, no? I think you'll find that it's actually fairly mundane. But if you insist...

The WKB automatically checks Elizabeth's Twitter feed once an hour, pulls any tweets that are new since the last check, and stores them in a holding area in the database. Each day I manually process each link by copying and pasting the article content into the search engine component. The search engine indexes the content and makes it fast to search. That's where all the magic is, of course.
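For the curious, here is a rough sketch of what the "pull new tweets into a holding area" step could look like. It is not the WKB's actual code: the table layout, the function names, and the shape of the tweet records are all made up for illustration, and the part that talks to Twitter is left out entirely. The sketch assumes the tweets have already been fetched as a list of records with an `id` and a `text`.

```python
import re
import sqlite3

# Rough pattern for pulling the first link out of a tweet (illustrative only).
URL_PATTERN = re.compile(r"https?://\S+")


def init_holding_area(conn):
    """Create the hypothetical holding table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS holding ("
        "tweet_id INTEGER PRIMARY KEY, "
        "text TEXT, "
        "url TEXT, "
        "processed INTEGER DEFAULT 0)"
    )


def last_seen_id(conn):
    """Return the highest tweet id stored so far, or 0 if the table is empty."""
    row = conn.execute("SELECT MAX(tweet_id) FROM holding").fetchone()
    return row[0] or 0


def store_new_tweets(conn, tweets):
    """Store link-bearing tweets newer than the last stored one.

    `tweets` is assumed to be a list of dicts with "id" and "text" keys.
    Returns the number of rows added to the holding area.
    """
    cutoff = last_seen_id(conn)
    added = 0
    for tweet in sorted(tweets, key=lambda t: t["id"]):
        if tweet["id"] <= cutoff:
            continue  # already seen on an earlier run
        match = URL_PATTERN.search(tweet["text"])
        if match is None:
            continue  # no link, nothing to index later
        conn.execute(
            "INSERT INTO holding (tweet_id, text, url) VALUES (?, ?, ?)",
            (tweet["id"], tweet["text"], match.group(0)),
        )
        added += 1
    conn.commit()
    return added
```

A scheduler would call `store_new_tweets` once an hour with whatever the feed returned. Rows with `processed = 0` would then be the daily to-do list for the manual step.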

You might be wondering why I do the manual part when I could have the computer do it. I'll tell you, that's a mighty fine question given the number of links Elizabeth tweets! The answer is simple, though, and it's about search quality. If I index the entire page, the search engine component will consider everything on it, including the header, footer, sidebars, ads, comments, etc. That can obviously throw off the relevance score when you do a search. If bloggers would agree on a standard way of marking content, I could pull it automatically, but there's not enough consistency for me to do that.
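To make that concrete, here is a minimal sketch of what automatic extraction could look like if every blog wrapped its post body in the same marker. I'm assuming an HTML `<article>` element as that marker purely for illustration. Real blogs don't agree on this, which is exactly the problem, and the class and function names below are hypothetical.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect only the text found inside <article> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <article> elements we are currently inside
        self.chunks = []  # text fragments collected from the article body

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Headers, sidebars, and footers sit outside <article>, so depth is 0.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html):
    """Return the article text of a page, ignoring everything around it."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Feeding the search engine only the output of `extract_article_text` would keep words from sidebars and ads out of the index. On a page without the assumed marker it returns an empty string, which is one way to flag a link for manual processing.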

Then there was the fact that Elizabeth already had approximately 5,500 links on her Twitterific pages. Those links were just sitting there daring me to get them indexed. I'm pleased to say that I corralled those rascals but I didn't process them manually, of course. Rather, the entire page of each article was indexed which, unfortunately, has the drawbacks mentioned above. That's why you'll sometimes see strange snippets under a result. Sorry about that. However, there are now more than 6,000 articles in the WKB for you to learn from and enjoy.

Finding articles is rather easy because the interface is intentionally simple a la Google. Searching is an obvious way to find articles but you can find plenty of gems by trying the Random or Popular links. Random is self-explanatory and Popular would perhaps be better named "Top 100" since that's what it's really showing. Give them a try if you haven't already; I think you'll be pleasantly surprised.

You might also be wondering why I'm doing this. There are actually a bunch of reasons. For example, as a programmer I've benefited greatly from the work of others on the Internet who gave freely of their time and skill and I've wanted to contribute something back for a while now but hadn't been able to hit on the right thing. I always assumed it would be something for programmers but giving something to the writing community is like paying it forward. That works for me.

Also, the WKB amplifies the work that Elizabeth is doing for writers, so that's a win, too. As mentioned earlier, bloggers will get more recognition and traffic, and users of the WKB will hopefully learn something from their time spent using it. That means four distinct parties can benefit from the WKB -- how great is that?

I'm elated by the reception the WKB has gotten from the writing community and I'm pleased that so many get value from it. I enjoyed creating the WKB and of course it wouldn't be as useful as it is without Elizabeth's help. She does a huge amount of work digging up the content in the first place.

That said, the WKB is not done. Would you believe I have a bucket full of ideas for it? Stay tuned!

-----------------------

Mike Fleming is a software engineer who can't seem to get enough of his craft. Give him something to do by suggesting some features for the WKB. He also maintains the WKB's Facebook page which he considers a place for insiders to stay informed about WKB news and tips. You can also sign up for the Hiveword email list if you want to be notified when the fiction organizer is ready.