Today’s blog post is for the geeks. Seriously, if you don’t think frequency analysis for its own sake can be fun, you probably won’t enjoy this much.
Stan Carey has a blog post about the length of the chemical name of the largest known protein, considered as though it were a word. It takes three and a half hours to read aloud, so it would easily be the longest word in the English language were it not for the fact that it doesn’t count.
I decided to play around, so I took the chemical name, stripped the hyphens and whitespace from the raw text, and ran it through a character frequency analyser. This told me that the letter L occurs 14645 times, accounting for 22.9% of the text. At the low end, the letter D occurs a measly 238 times, which is just 0.4%. Letters not present at…
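For anyone who wants to try the same kind of count at home, here is a minimal sketch in Python of a character frequency analyser. It is not the tool used for the numbers above. The function name `char_frequencies`, the choice to fold everything to upper case, and the short sample string are all assumptions made for illustration; the sample is a made-up fragment, not the real chemical name.

```python
from collections import Counter


def char_frequencies(text):
    """Return (character, count, percent) rows, most frequent first."""
    # Drop hyphens and all whitespace, then fold case so that
    # 'l' and 'L' are counted together (case folding is an assumption).
    cleaned = "".join(
        ch for ch in text if ch != "-" and not ch.isspace()
    ).upper()
    total = len(cleaned)
    if total == 0:
        return []
    counts = Counter(cleaned)
    # most_common() orders by count, highest first.
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up fragment for illustration only, not the real name.
    sample = "leucyl-alanyl leucyl"
    for ch, n, pct in char_frequencies(sample):
        print(f"{ch}: {n} ({pct:.1f}%)")
```

Pasting the full name in place of `sample` should give a table in the same shape as the figures quoted above, though the exact percentages depend on how the raw text is cleaned first.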