Matching Initials

December 13, 2008

Download the Excel Spreadsheet

A while ago we noticed that our company has a surprising number of people with matching initials: whenever we were writing meeting minutes we would have to use the initial of the person’s middle name to distinguish them. Out of 18 people (as it was then) there were four pairs of matching initials, e.g. there were two BB’s, two JS’s etc.

What are the chances of there being exactly 4 sets of matching initials in a population of 18 people?

This seemed to be quite unlikely; however, when you look at the problem it is almost the same as the famous Birthday Problem: in a group of 23 (or more) randomly chosen people, there is a more than 50% probability that some pair of them will have been born on the same day of the year. Our initials problem is similar, except we have 26×26=676 possible combinations of initials instead of 365 days of the year. The same approach can be used to calculate the odds of there being at least one match in our company of 18 people. However, I wanted to know what the chances were of there being exactly 4 sets of matching initials and got stuck, at which point I sent an email around the company (a lot of engineers and scientists) and resorted to a brute force Monte Carlo model.
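To make the "same approach" concrete, here is a minimal Python sketch of the standard birthday-problem calculation: multiply up the chance that everyone is different, then subtract from one. The function name is made up for this illustration, and it assumes every birthday (or every pair of initials) is equally likely, which is not true of real names.

```python
def prob_at_least_one_match(people: int, possibilities: int) -> float:
    """Chance that at least two of `people` share a value, assuming each of
    the `possibilities` values is equally likely (an idealisation)."""
    p_all_distinct = 1.0
    for i in range(people):
        # Person i+1 must avoid the i values already taken.
        p_all_distinct *= (possibilities - i) / possibilities
    return 1.0 - p_all_distinct


birthday = prob_at_least_one_match(23, 365)      # classic Birthday Problem
initials = prob_at_least_one_match(18, 26 * 26)  # our company of 18

print(f"23 people, 365 birthdays: {birthday:.3f}")
print(f"18 people, 676 initials:  {initials:.3f}")
```

Under these assumptions the chance of at least one match among 18 people comes out at roughly one in five, so a single match is nothing special; it is the four matches that need explaining.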

The approach is outlined below (John Somerville cracked the problem the same way). For each of the 18 people we generate a random number between 1 and 676, which stands for one of the possible pairs of initials. We then do a pairwise comparison to see if there is a match between people. In the example below there is a match between Person 10 and Person 3. We can then run a series of iterations and keep track of the number of times a single match, double match etc. occurs.
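The same idea can be sketched in a few lines of Python. This is not the spreadsheet itself, just an illustration under some assumptions: the function names are invented here, a tally is used in place of the explicit pairwise comparison, and "a match" is taken to mean a set of initials shared by two or more people (so a rare triple counts as one match).

```python
import random
from collections import Counter


def count_matched_initials(initials):
    """Number of distinct initials that are shared by two or more people."""
    tally = Counter(initials)
    return sum(1 for n in tally.values() if n >= 2)


def simulate(iterations=10_000, people=18, combos=26 * 26, seed=None):
    """Return a Counter mapping 'number of matches' to how often it occurred."""
    rng = random.Random(seed)
    results = Counter()
    for _ in range(iterations):
        # Each person gets a number from 1 to 676 standing for their initials.
        initials = [rng.randint(1, combos) for _ in range(people)]
        results[count_matched_initials(initials)] += 1
    return results


iterations = 10_000
results = simulate(iterations, seed=1)
for matches in sorted(results):
    print(f"{matches} match(es): {results[matches] / iterations:.2%}")
```

With only 10,000 iterations the rarer outcomes will appear a handful of times at most, so their percentages move around a lot from run to run; changing `seed` or raising `iterations` shows this quickly.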


After a run of 10,000 iterations we got the table below. There was about a one in five chance of a single match, but for four matches the probability was very low indeed, about 0.01-0.03% (I only ran the simulation a couple of times). Not very likely at all!


Another guy, Maccas, came up with an even better simulation that took account of the fact that not all initials are equally likely, e.g. John Smith, JS, is more prevalent than the initials ZZ. Alas the file is too big to link to from here. Here is a link on Wikipedia to letter frequencies.
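Maccas's file is not reproduced here, but the effect is easy to illustrate. The sketch below is an assumption-laden stand-in, not his model: the weights are placeholders (letter k of the alphabet gets weight 1/k) rather than real name statistics, the two initials are drawn independently, and the function names are invented for this example.

```python
import random

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Placeholder weights, NOT real letter frequencies: A=1, B=1/2, C=1/3, ...
PLACEHOLDER_WEIGHTS = [1 / (k + 1) for k in range(26)]


def random_initials(rng, weights):
    """Draw two letters independently according to the given weights."""
    first, last = rng.choices(LETTERS, weights=weights, k=2)
    return first + last


def match_rate(weights, iterations=5_000, people=18, seed=0):
    """Fraction of simulated groups where at least two people share initials."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        group = [random_initials(rng, weights) for _ in range(people)]
        if len(set(group)) < people:  # a repeated value means a match
            hits += 1
    return hits / iterations


uniform_rate = match_rate([1] * 26)
skewed_rate = match_rate(PLACEHOLDER_WEIGHTS)
print(f"all initials equally likely: {uniform_rate:.1%}")
print(f"placeholder skewed weights:  {skewed_rate:.1%}")
```

Whatever the real frequencies are, any unevenness makes matches more likely than the equal-likelihood model suggests; swapping in measured frequencies for `PLACEHOLDER_WEIGHTS` would show by how much.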


Closed Form Solution

Not happy with just getting the numerical output, I waited for one of my more gifted colleagues to come up with a closed form solution. Dave did not disappoint and sent the following MATLAB expression:

Billy,

It is 1 in 52047

C=26*26
for i=1:14
    Pbase(i)=(C-i+1)/C;
end;
c=0;
for i1=2:15
    for i2=4:16
        for i3=6:17
            for i4=8:18
                if (i1<i2)&(i2<i3)&(i3<i4)
                    c=c+1;
                    Prob(c)=((i1-1)/C)*((i2-3)/C)*((i3-5)/C)*((i4-7)/C)*prod(Pbase);
                end;
            end
        end
    end
end
a=sum(Prob)
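For anyone without MATLAB, here is a rough Python port of Dave's loops. The variable names and comments are my own reading of his code rather than anything he wrote: i1 to i4 are taken to be the positions of the people who complete each of the four pairs, and each factor counts how many earlier, still unpaired people that person could match.

```python
from math import prod

C = 26 * 26  # number of possible initials

# Chance that the 14 distinct sets of initials are all different (Dave's Pbase).
p_base = prod((C - i) / C for i in range(14))

# Sum over the positions i1 < i2 < i3 < i4 of the people completing each pair.
# The starting points replace the MATLAB 'if' that enforces the ordering.
total = 0
for i1 in range(2, 16):
    for i2 in range(max(4, i1 + 1), 17):
        for i3 in range(max(6, i2 + 1), 18):
            for i4 in range(max(8, i3 + 1), 19):
                # (position - 1) earlier people, minus those already paired up
                total += (i1 - 1) * (i2 - 3) * (i3 - 5) * (i4 - 7)

p_four_pairs = total / C**4 * p_base
print(f"probability: {p_four_pairs:.3e}  (about 1 in {1 / p_four_pairs:,.0f})")
```

This counts only outcomes with exactly four pairs and no three people sharing initials, and like the simulation it assumes all 676 initials are equally likely.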

 

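Dave’s expression can be written with prettier conventional symbols. His figure of 1 in 52,047 (about 0.002%) makes four matches rarer still than the 0.01-0.03% suggested by the simulations, although with only 10,000 iterations per run the simulated estimate is very rough.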
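One way to write it in conventional symbols is sketched below. It assumes, as Dave's loops appear to, that all 676 initials are equally likely and that "exactly four matches" means four pairs with nobody else sharing initials (no triples).

```latex
P(\text{exactly four pairs among 18 people})
  = \frac{18!}{2^{4}\,4!\,10!} \cdot \frac{676!}{662!\;676^{18}}
  \approx 1.92 \times 10^{-5}
  \approx \frac{1}{52\,047}
```

The first factor is the number of ways of choosing which of the 18 people form the four pairs (4,594,590). The second is the chance that one such arrangement actually occurs, with the four pairs and ten unpaired people holding 14 different sets of initials between them.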

If anyone else has a better approach, numerical or closed form, please feel free to suggest it...


A Collection of Random Clippings

December 7, 2008

“I pity Simplicio no less than I should some gentleman, who, having built a magnificent palace at great trouble and expense, employing myriads of artisans, and then seeing it threatened with ruin because of poor foundations, should attempt, in order to avoid the sad sight of walls destroyed, adorned as they were with so many lovely murals; or columns fall, which support the superb galleries, or gilded beams collapse, or doors, pediments and marble cornices, supplied at so much cost, spoiled – should attempt to prevent the collapse with chains, props, buttresses, iron bars and shores”. – Galileo’s Dialogue

“That is why, as soon as I reached an age that allowed me to escape the control of my teachers, I abandoned altogether the study of letters. And having decided to pursue only that knowledge which I might find in myself or in the great book of the world, I spent the rest of my youth travelling, visiting courts and armies, mixing with people of different character and rank, accumulating different experiences, putting myself to the test in situations in which I found myself by chance, and at all times giving due reflection to things as they presented themselves to me so as to derive some benefit from them. For it seemed to me that I could discover much more truth from the reasoning that we all make about things that affect us and will soon cause us harm if we misjudge them, than from the speculations in which a scholar engages in the privacy of his study, that have no consequence for him insofar as the further they are from common sense, the more he will be proud of them, because he has had to use so much more ingenuity and subtlety in the struggle to make them plausible”. – Descartes A Discourse on the Method

“And although logic really does contain many very true and excellent precepts, there are so many others mixed in with them that are either harmful or superfluous, that it is almost as difficult to separate the former from the latter as it is to extract a statue of Diana or Minerva from a rough block of marble”. – Descartes A Discourse on the Method

“Never accept anything to be true that I did not incontrovertibly know to be so; that is to say, carefully to avoid both prejudice and premature conclusions; and to include nothing in my judgements other than that which presented itself to my mind so clearly and distinctly, that I would have no occasion to doubt it. The second was to divide all the difficulties under examination into as many parts as possible, and as many as were required to solve them in the best way. The third was to conduct my thoughts in a given order, beginning with the simplest and most easily understood objects, and gradually ascending, as it were step by step, to the knowledge of the most complex; and positing an order even on those which do not have a natural order of precedence. The last was to undertake such enumerations and such general surveys that I would be sure to have left nothing out”. – Descartes A Discourse on the Method

“I imitated those travellers who, finding themselves lost in a forest, must not wander in circles first to one side and then the other, and still less stop in one place, but have to walk as straight as possible in one direction, and not alter course for weak reasons, even if it might only have been chance which led them to settle on the direction they had chosen; for by this means, even if they do not end up precisely where they want to be, they will eventually reach somewhere where they will most likely be better off than in the middle of a forest”. – Descartes A Discourse on the Method