Monty Hall -Again

February 16, 2009

If you ever want to get people of a mathematical bent shouting at each other you should try to get them to agree the solution to the Month Hall problem.

“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?”

There are lots of ways to approach the problem; I’ve just heard a new one from Hans Christian von Baeyer’s book, Information, which should convince any die hard “it makes no difference if you stick or switch” people.

“Imagine there are not three but, but a thousand curtains, and one car. Initially you pick, say, number 815 with a resigned shrug – realising that your chances of success are one in a thousand. The host (who knows precisely where the car is) now opens 998 empty cubicles. Not the one you have picked and not cubicle number 137. Now he asks politely: ‘Do you want to stick with your first guess, curtain number 815, or switch to curtain number 137?’”. What should you do? By changing the degree it makes it a lot more intuitive.

 


Dan Dennett TED Lecture on Memes

January 4, 2009


Matching Initials

December 13, 2008

Download the Excel Spreadsheet

A while ago we noticed that our company has a surprising number of people with matching initials – whenever we were writing meeting minutes we would have to use an initial for the person’s middle name to distinguish them. Out of 18 people (as it was then) there we four pairs of matching initials e.g. there were two BB’s, two JS’s etc.

What are the chances of there being exactly 4 sets of matching initials in a population of 18 people?

This seemed to be quite unlikely, however when you look at the problem it is almost the same as the famous Birthday Problem – In a group of 23 (or more) randomly chosen people, there is more than 50% probability that some pair of them will both have been born on the same day. Our initial problem is similar except we have 26×26=676 possible combinations of initials instead of 365 days of the year. The same approach can be used to calculate the odds of there being a match in our company of 18 people. However I wanted to know what the chances of there being exactly 4 sets of matching initials and got stuck, at which point I sent an email around the company (a lot of engineers and scientists) and resorted to a brute force Monte Carlo model.

The approach is outlined below (John Somerville cracked the problem the same way). We generate a random number between 1 and 676, which defines the possible set of two initials, for each the 18 people. We then do a pair wise comparison to see if there is a match between people. In the example below there is a match between Person 10 and Person 3. We can then run a series of iterations and keep track of the number of times a single match, double match etc occurs.


After a run of 10,000 iterations we got the table below. There was about a one in five chance of a single match, but for four matches the probability was very low indeed about 0.01-0.03% (only ran the simulation a couple of times). Not very likely at all!


Another guy, Maccas, came up with an even better simulation that took account of the fact that not all initials are equally likely e.g. John Smith, JS, is more prevalent that the initials ZZ. Alas the file is too big to link to from here. Here is a link on Wikipedia to letter frequencies .


Closed Form Solution

Not happy with just getting the numerical output I waited for one of my more gifted colleagues to come up with a closed form solution. Dave did not disappoint and sent the following MATLAB expression

Billy,

 

It is 1 in 52047

 

C=26*26

for i=1:14  

    Pbase(i)=(C-i+1)/C;

end;

c=0;

for i1=2:15

    for i2=4:16

        for i3=6:17

            for i4=8:18

              if (i1<i2)&(i2<i3)&(i3<i4)

                       c=c+1;

                       Prob(c)=((i1-1)/C)*((i2-3)/C)*((i3-5)/C)*((i4-7)/C)*prod(Pbase);

              end;

            end

        end

    end

end

a=sum(Prob)

 

This can be written with prettier conventional symbols. The number seems higher than that suggested by the simulations.

If anyone else has a better approach, numerical or closed form, please feel free to suggest……