## Entropy of English

I just finished dusting off some Matlab code that estimates the entropy of English character sequences from a text source. In my opinion, this is a good tool for teaching entropy rate. One could use the same idea to estimate the entropy rate of another language, or of another discrete-valued data source, such as numerical data or tweets. My code isn't particularly smart: its storage (and computation) grows as \$L^N\$, where \$L\$ is the number of characters considered and \$N\$ is the character sequence length. I'm sure someone more adept at programming could implement a more efficient version (perhaps with a hash table?). However, the code does work, and it computes the entropy for sequences of characters (I've tested up to length 4) from a given text file. I used Shakespeare's Romeo and Juliet, and found per-character entropies of 4.12, 3.73, 3.35, and 2.99 for \$N=\$ 1, 2, 3, and 4, respectively. Info on how this is done is in my lecture 4 notes from today's Advanced Random Processes class; the letter entropy Matlab code and the Shakespeare text are also posted.
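For anyone curious what the hash-table idea might look like, here is a minimal sketch in Python. It is not the posted Matlab code. It assumes that "per-character entropy" means the empirical entropy of overlapping length-N character blocks divided by N, measured in bits, and the function name `block_entropy_per_char` is made up for illustration. A `Counter` (a hash table) only stores the blocks that actually occur in the text, so storage scales with the number of distinct blocks observed rather than with all \$L^N\$ possible ones.

```python
from collections import Counter
from math import log2


def block_entropy_per_char(text, n):
    """Estimate the entropy of length-n character blocks, in bits per character.

    Assumption (not taken from the original Matlab code): the estimate is the
    empirical entropy of overlapping n-character blocks, divided by n.
    """
    if n < 1 or len(text) < n:
        raise ValueError("n must be at least 1 and no longer than the text")

    # Count every overlapping block of n characters; only observed blocks
    # get an entry in the hash table.
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())

    # Empirical entropy of the block distribution: sum of p * log2(1/p).
    block_entropy = sum((c / total) * log2(total / c) for c in counts.values())
    return block_entropy / n


if __name__ == "__main__":
    # Toy input: two equally likely characters give 1 bit per character.
    print(block_entropy_per_char("abab", 1))  # 1.0
```

To try it on a real text, read the file into a string first and apply whatever character filtering you like (for example, lowercasing and dropping punctuation) before calling the function for each N. The numbers will depend on those filtering choices, so they need not match the values quoted above.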