The Tar Pit: the first year

August 2, 2014 by Lucian Mogosanu

What's one more year?
A circle in the sky?
The future's in motion
You're free to fly

Last year on the 22nd of July I started this blog. It was only about the second time I had attempted to create a blank slate for me to imprint my thoughts on. I was wildly confused, but determined to go ahead with this attempt of mine, and now, here, this is what it has led to.

A year gone, a small yet robust foundation for what's about to come, a shitload of drafts unpublished for now. Yes, I am honest enough with myself to have a set of standards for self-publishing.

I will look back now and examine that which was: I wrote 38 posts comprising 33562 words. The shortest post was "The Tar Pit: about", at only 84 words. The longest post was "Type algebra: the semantic ambiguity of nested lists", at 2367 words. I averaged about 883.2 words per post, with a standard deviation of 445.2, denoting a rather big variation in the length of my articles. How or why, I will let the reader judge.
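For readers who want to reproduce this kind of summary, here is a minimal Python sketch of the arithmetic. It is not the bash script mentioned at the end of the post. The function name `length_stats` and the five word counts are made up for illustration, and since the post does not say whether the 445.2 figure is a population or a sample standard deviation, the sketch computes both.

```python
import statistics

def length_stats(word_counts):
    """Return (total, mean, population std dev, sample std dev) of post lengths."""
    total = sum(word_counts)
    mean = statistics.mean(word_counts)
    pop_sd = statistics.pstdev(word_counts)    # divides by n
    sample_sd = statistics.stdev(word_counts)  # divides by n - 1
    return total, mean, pop_sd, sample_sd

# Hypothetical word counts for five posts, NOT the blog's real data.
sample_counts = [84, 650, 900, 1200, 2367]
total, mean, pop_sd, sample_sd = length_stats(sample_counts)
print(f"{total} words, {mean:.1f} per post, "
      f"sd {pop_sd:.1f} (population) / {sample_sd:.1f} (sample)")

# The mean reported in the post follows directly from its two totals.
reported_mean = 33562 / 38
print(f"{reported_mean:.1f}")
```

With 38 posts the two standard deviations differ only slightly, so either convention leads to the same conclusion about the spread.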

About the content itself: since running a Markov chain parser on the posts would be overdoing it, I'll just look at the word frequencies. The most frequent word was "the", with 1689 appearances. The most frequent verb was "is", 469 appearances, 1066 if we count all conjugations and tenses of "to be", while the second most frequent verb was "have", with 148 appearances; I won't bother finding all its forms.
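A word-frequency count like the one above can be sketched in a few lines of Python. This is an illustration under assumptions, not the script behind the numbers: the tokenizing regular expression, the `BE_FORMS` set and the sample sentence are all hypothetical, and the post does not say which forms of "to be" went into the total of 1066.

```python
import re
from collections import Counter

# Assumed list of "to be" forms; the post does not specify which were counted.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}

def word_frequencies(text):
    """Count case-insensitive words, keeping apostrophes inside contractions."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

def count_be(freqs):
    """Sum the counts of every form listed in BE_FORMS."""
    return sum(freqs[form] for form in BE_FORMS)

# Hypothetical sample text standing in for the concatenated posts.
sample = "The cat is here. The dogs are there, and I was late. It's fine."
freqs = word_frequencies(sample)
print(freqs.most_common(1))
print(count_be(freqs))
```

Note that a contraction such as "it's" stays a single token here and is not counted as a form of "to be"; a different tokenizer would shift the totals.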

The most frequent pronoun was, expectedly, "I", 564 appearances, which makes me something of a self-centered bastard, but it can be forgiven, since I'm writing a personal blog, not a fictional novel. The most frequent nouns were "Haskell", 72 appearances, and "time", 67 appearances. The most frequent proper name was "Romania" (including "Romanian" and other variations), with 84 appearances, more than Haskell only because I've included all the variations. The most frequent adjective was "more", 166 appearances.

There were too many least frequent words to name them all here, but some of the least frequent verbs, with only one appearance each, were "abstracting" or "abuse" or "begs", depending on which one you like the most. The least frequent pronoun I could find was "each", with at most 9 appearances, since it might have been used as an adjective in some cases. And by the way, the least frequent adjectives were "absolute", "acceptable", "rigid" and so on.

One of the least frequently used nouns was "achievements", while the least frequent proper names are too many to be mentioned here. I will only name "Uhura" and "APL" as two of my favourites.

I've published about 5202 unique words, give or take a few hundred due to contractions and other such morpholinguistic phenomena. And that's about it. The bash script I used to extract this data is available for you to play with and cross-examine the results.
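The "give or take" on the unique word count comes down to how contractions are tokenized. As a rough Python sketch (the function `unique_words`, its `keep_contractions` flag and the sample sentence are hypothetical, not taken from the bash script), the same text yields different vocabulary sizes depending on whether "I've" is kept whole or split at the apostrophe:

```python
import re

def unique_words(text, keep_contractions=True):
    """Return the set of distinct lowercase words in text."""
    if keep_contractions:
        pattern = r"[a-z]+(?:'[a-z]+)*"  # "i've" is one word
    else:
        pattern = r"[a-z]+"              # "i've" becomes "i" and "ve"
    return set(re.findall(pattern, text.lower()))

# Hypothetical sample sentence, not text from the blog.
sample = "I've said it's fine, and it's what I've said."
whole = unique_words(sample, keep_contractions=True)
split = unique_words(sample, keep_contractions=False)
print(len(whole), len(split))
```

Neither choice is the correct one; it only matters that the same rule is applied throughout.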

Filed under: meta.