A while ago we noticed that our company has a surprising number of people with matching initials – whenever we were writing meeting minutes we would have to use an initial for the person’s middle name to distinguish them. Out of 18 people (as it was then) there we four pairs of matching initials e.g. there were two BB’s, two JS’s etc.
What are the chances of there being exactly 4 sets of matching initials in a population of 18 people?
This seemed to be quite unlikely, however when you look at the problem it is almost the same as the famous Birthday Problem – In a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. Our initial problem is similar except we have 26×26=676 possible combinations of initials instead of 365 days of the year. The same approach can be used to calculate the odds of there being a match in our company of 18 people. However I wanted to know what the chances of there being exactly 4 sets of matching initials and got stuck, at which point I sent an email around the company (a lot of engineers and scientists) and resorted to a brute force Monte Carlo model.
The approach is outlined below (John Somerville cracked the problem the same way). We generate a random number between 1 and 676, which defines the possible set of two initials, for each the 18 people. We then do a pair wise comparison to see if there is a match between people. In the example below there is a match between Person 10 and Person 3. We can then run a series of iterations and keep track of the number of times a single match, double match etc occurs.
After a run of 10,000 iterations we got the table below. There was about a one in five chance of a single match, but for four matches the probability was very low indeed about 0.01-0.03% (only ran the simulation a couple of times). Not very likely at all!
Another guy, Maccas, came up with an even better simulation that took account of the fact that not all initials are equally likely e.g. John Smith, JS, is more prevalent that the initials ZZ. Alas the file is too big to link to from here. Here is a link on Wikipedia to letter frequencies .
Closed Form Solution
Not happy with just getting the numerical output I waited for one of my more gifted colleagues to come up with a closed form solution. Dave did not disappoint and sent the following MATLAB expression
It is 1 in 52047
This can be written with prettier conventional symbols. The number seems higher than that suggested by the simulations.
If anyone else has a better approach, numerical or closed form, please feel free to suggest……