There are many ways to come at a problem, ranging from thorough analysis to simple heuristics and rules of thumb. I always like it when people get to an answer by looking at the problem slightly differently. I came across this example in von Baeyer’s book, Information.
Samuel Morse, of Morse code fame, wanted to develop the most efficient way to code letters so they could be transmitted quickly. The principle is pretty obvious: the most efficient code assigns short symbols to common letters and long symbols to rare ones. He then had to answer the question of what is common and what is rare. In what order of frequency do letters appear in English? One way to gather such statistics is to select a text and count the number of times each letter appears. This method works well for the three or four most common letters, but it becomes less reliable for the rarer ones, such as Q, X and Z, unless the reference text is very long. Besides, who wants to count letters in a 1000-page book? Morse’s pragmatic solution was a lot quicker: he walked into a newspaper office and counted the number of pieces of type in each compartment of the printer’s type box. Presumably decades of experience had reduced its contents to an efficient compromise between supply and demand. Since he found more Es than any other letter, E is represented by a single dot, followed by T with a single dash. X, Y and Z, on the other hand, whose compartments in the type box were relatively empty, drew four symbols each.
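For anyone who would rather let a computer do the counting Morse avoided, here is a minimal Python sketch of the text-counting approach. It is only an illustration under some assumptions: the function names (`letter_counts`, `rank_letters`, `average_code_length`) are made up for this post, and the table is the modern International Morse code, which is not identical to the code Morse originally devised.

```python
from collections import Counter

# International Morse code for the 26 letters (assumption: the modern
# international table, not Morse's original American code).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
}


def letter_counts(text):
    """Count how often each letter A-Z appears, ignoring case and non-letters."""
    return Counter(ch for ch in text.upper() if ch in MORSE)


def rank_letters(text):
    """Return the letters found in the text, most frequent first."""
    return [letter for letter, _ in letter_counts(text).most_common()]


def average_code_length(text):
    """Average number of dots and dashes per letter when the text is sent in Morse."""
    counts = letter_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(len(MORSE[ch]) * n for ch, n in counts.items()) / total


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog"
    print(rank_letters(sample))
    print(average_code_length(sample))
```

A short sample like the one above shows exactly the weakness described: a handful of sentences is enough to rank the commonest letters, but the ordering of rare letters such as Q, X and Z is mostly noise until the text gets very long. The average code length gives a rough feel for how well the short-codes-for-common-letters idea pays off on a given text.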