Matching Initials

December 13, 2008

Download the Excel Spreadsheet

A while ago we noticed that our company has a surprising number of people with matching initials – whenever we were writing meeting minutes we would have to use an initial for the person’s middle name to distinguish them. Out of 18 people (as it was then) there we four pairs of matching initials e.g. there were two BB’s, two JS’s etc.

What are the chances of there being exactly 4 sets of matching initials in a population of 18 people?

This seemed to be quite unlikely, however when you look at the problem it is almost the same as the famous Birthday Problem – In a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. Our initial problem is similar except we have 26×26=676 possible combinations of initials instead of 365 days of the year. The same approach can be used to calculate the odds of there being a match in our company of 18 people. However I wanted to know what the chances of there being exactly 4 sets of matching initials and got stuck, at which point I sent an email around the company (a lot of engineers and scientists) and resorted to a brute force Monte Carlo model.

The approach is outlined below (John Somerville cracked the problem the same way). We generate a random number between 1 and 676, which defines the possible set of two initials, for each the 18 people. We then do a pair wise comparison to see if there is a match between people. In the example below there is a match between Person 10 and Person 3. We can then run a series of iterations and keep track of the number of times a single match, double match etc occurs.


After a run of 10,000 iterations we got the table below. There was about a one in five chance of a single match, but for four matches the probability was very low indeed about 0.01-0.03% (only ran the simulation a couple of times). Not very likely at all!


Another guy, Maccas, came up with an even better simulation that took account of the fact that not all initials are equally likely e.g. John Smith, JS, is more prevalent that the initials ZZ. Alas the file is too big to link to from here. Here is a link on Wikipedia to letter frequencies .


Closed Form Solution

Not happy with just getting the numerical output I waited for one of my more gifted colleagues to come up with a closed form solution. Dave did not disappoint and sent the following MATLAB expression

Billy,

 

It is 1 in 52047

 

C=26*26

for i=1:14  

    Pbase(i)=(C-i+1)/C;

end;

c=0;

for i1=2:15

    for i2=4:16

        for i3=6:17

            for i4=8:18

              if (i1<i2)&(i2<i3)&(i3<i4)

                       c=c+1;

                       Prob(c)=((i1-1)/C)*((i2-3)/C)*((i3-5)/C)*((i4-7)/C)*prod(Pbase);

              end;

            end

        end

    end

end

a=sum(Prob)

 

This can be written with prettier conventional symbols. The number seems higher than that suggested by the simulations.

If anyone else has a better approach, numerical or closed form, please feel free to suggest……


Conway’s Game of Life

June 28, 2008

The Game of Life is a cellular automaton devised by the British mathematician John Horton Conway in 1970. It is the best-known example of a cellular automaton – Wikipedia

The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, live or dead. Every cell interacts with its eight neighbours, which are the cells that are directly horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

  1. Any live cell with fewer than two live neighbours dies, as if by loneliness.
  2. Any live cell with more than three live neighbours dies, as if by overcrowding.
  3. Any live cell with two or three live neighbours lives, unchanged, to the next generation.
  4. Any dead cell with exactly three live neighbours comes to life.

Gospers glider gun from Wikipedia

Dowload Excel Game of Life

The Excel document uses macros to simulate the birth and death of the cells according to the rules outlined above. There is a macro that will seed the game with a random configuration of cells. You can then increment this a single generation at a time or run ‘life’, where you will see the evolution of the cells over one hundred generations. There are also some interesting configurations stored on other worksheets. You can cut and paste these into the start generation and run life from there. There are a numbert of interesting ‘creatures’ outlined in Martn Gardiner’s Scientific American article from 1970.


Here are two screenshots of the population when life is running.


The macro also records and plots the population change over time. The plots for xbar and ybar give the average location of all the cells – when looking as single creatures such as a ‘glider’ you can see the aggregate direction they take.


St. Petersburg Paradox

June 28, 2008

Extracted from Wikipedia article – The St. Petersburg paradox is a paradox related to probability theory and decision theory. It is based on a particular (theoretical) lottery game (sometimes called St. Petersburg Lottery) that leads to a random variable with infinite expected value, i.e. infinite expected payoff, but would nevertheless be considered to be worth only a very small amount of money.

In a game of chance, you pay a fixed fee to enter, and then a fair coin will be tossed repeatedly until a tail first appears, ending the game. The pot starts at 1 dollar and is doubled every time a head appears. You win whatever is in the pot after the game ends. Thus you win 1 dollar if a tail appears on the first toss, 2 dollars if on the second, 4 dollars if on the third, 8 dollars if on the fourth, etc. In short, you win 2k−1 dollars if the coin is tossed k times until the first tail appears.

What would be a fair price to pay for entering the game? To answer this we need to consider what would be the average payout: With probability 1/2, you win 1 dollar; with probability 1/4 you win 2 dollars; with probability 1/8 you win 4 dollars etc. The expected value is thus




This sum diverges to infinity, and so the expected win for the player of this game, at least in its idealized form, in which the casino has unlimited resources, is an infinite amount of money. This means that the player should almost surely come out ahead in the long run, no matter how much they pay to enter; while a large payoff comes along very rarely, when it eventually does it will typically far more than repay however much money they have already paid to play. According to the usual treatment of deciding when it is advantageous and therefore rational to play, you should therefore play the game at any price if offered the opportunity. Yet, in published descriptions of the paradox, e.g. (Martin, 2004), many people expressed disbelief in the result. Martin quotes Ian Hacking as saying “few of us would pay even $25 to enter such a game” and says most commentators would agree.

Download the St Petersburg Paradox Excel Model


The model simulates the coin flipping and payout for a streak of heads. You can get a feel for what you would pay to take the bet; you can run it with a Monte Carlo package to get the payoff distribution. The screenshots below show three different runs where there is a streak of 4, 5 and 6 heads with a payout of $15, $31 and $63 respectively.