Intuitive Bayesian methods for portfolio selection – Part II Bayes and Jeffrey

April 7, 2009

Bayes’ theorem in its common form describes the way in which one’s beliefs about observing ‘A’ are updated by having observed ‘B’. Bayes’ theorem relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:

P(A|B) = P(B|A) P(A) / P(B)


Each term in Bayes’ theorem has a conventional name:

P(A) is the prior probability or marginal probability of A. It is “prior” in the sense that it does not take into account any information about B.

P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.

P(B|A) is the conditional probability of B given A.

P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
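As a minimal numerical sketch (the figures here are made up for illustration, not taken from the text), the theorem is a one-liner:

```python
def bayes_posterior(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# hypothetical figures: P(A) = 0.3, P(B|A) = 0.9, P(B) = 0.45
bayes_posterior(0.9, 0.3, 0.45)  # ≈ 0.6
```

Observing B here raises the belief in A from the prior of 0.3 to a posterior of 0.6.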

Bayesian belief updating is the model we use for learning. We already use it, in effect, when we sit in meetings discussing the best options: each of us has modified our beliefs over time as new information arrived. The problem is that it is difficult for others to see what evidence corroborates those beliefs, which opens the door to our cognitive biases and simple heuristics.

Jeffrey’s Rule

During product selection and development we acquire and learn new information, which allows us to update our beliefs about how to make future investments. However, we know that some information is of higher quality than other information. Say two people make exactly the same statement, but one is a lead customer and the other is a stranger on the street: we know which statement is of higher quality, with higher information content. Bayes’ rule relies on learning a definitive new truth to revise our belief, yet most new knowledge we acquire during product development cannot be classed as definitively true; one customer may say one thing and another may say something totally different. Jeffrey’s rule allows us to deal with opinion, rumor and weakly supporting evidence.

We can formulate a partition of hypotheses, Ho and ~Ho

Ho = We will sell 10 products to customer x this year

We are at a trade show talking to a distributor who tells us he has heard that customer x is currently trialing our competitor’s products. We will call this new piece of evidence E

E = Customer x is currently trialing our competitor’s products

Before we had heard this we may have been quite bullish about the prospects of selling to customer x because we have had several meetings where they expressed interest and have been talking about using some demo equipment.

Pr(Ho)=0.8

However, if it is true that customer x is currently trialing the competitor’s products, then I figure that is bad news, as they have had to commit resources to testing and are further down the line with our competitors.

Pr(Ho/E)=0.1

If what I’ve heard is not true then I have no other reason to revise my prior belief

Pr(Ho/~E)=0.8

I represent my belief in light of the new rumor as Pr*, so that Pr*(Ho) stands for my belief in Ho in light of the new information E.

When talking to the distributor, he can’t remember who he heard it from but is pretty sure that he is right. I might assign a probability of 0.75 to the information being right.

Pr*(E)=0.75    Pr*(~E)=0.25

Jeffrey’s revision of Bayes’ rule is reminiscent of the rule of total probability:

Pr*(Ho) =Pr(Ho/E)Pr*(E)+Pr(Ho/~E)Pr*(~E)

Jeffrey tells us to conclude that Pr*(Ho)=0.275. Before we heard the rumor we thought it was quite probable that we would sell to customer x, but things are looking a bit more bleak.
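The whole calculation fits in a few lines of Python (a sketch, using the numbers from the example above):

```python
def jeffrey_update(p_h_given_e, p_h_given_not_e, p_star_e):
    """Jeffrey's rule: Pr*(Ho) = Pr(Ho/E)Pr*(E) + Pr(Ho/~E)Pr*(~E)."""
    return p_h_given_e * p_star_e + p_h_given_not_e * (1 - p_star_e)

# the trade-show rumor: Pr(Ho/E) = 0.1, Pr(Ho/~E) = 0.8, Pr*(E) = 0.75
jeffrey_update(0.1, 0.8, 0.75)  # ≈ 0.275
```

Note that standard Bayesian conditioning is the special case Pr*(E) = 1, i.e. the evidence is known to be true.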

Dashboard representation

We can put together a dashboard that allows a user to start with a prior belief and update using Jeffrey’s rule. Two sliders are used to input Pr(Ho/E) and Pr*(E). The numeric inputs are augmented with descriptive labels.

Examples

If we receive a new piece of information that definitively refutes our hypothesis, but we know the source is completely unreliable, then we have no reason to update our belief. For example, if a stranger in the street says he wouldn’t buy our chemical detection equipment, this has no relevance to, or impact on, my belief that the US Army will.

If we receive a new piece of information that we know is definitely true but that doesn’t add much support to our hypothesis, then our posterior belief will be unchanged. For example, two people from one company tell me the same piece of information separately. When I hear it from the first person I update my belief accordingly; when I hear it the second time it gives me no new knowledge, even though I believe the source completely.

Potential problems with the application of Jeffrey’s rule

Prior Belief

We can look at what happens if we start out with very different prior beliefs. If we are rationally updating with new evidence and agree on the impact and quality we should eventually converge on a common belief.

Evidence | Pr(Ho/~E) | Pr(Ho/E) | Pr*(E) | Updated
0        | 1.00      | 0.16     | 0.23   | 0.81
1        | 0.81      | 0.11     | 0.44   | 0.50
2        | 0.50      | 0.64     | 0.51   | 0.57
3        | 0.57      | 0.90     | 0.69   | 0.80
4        | 0.80      | 0.16     | 0.04   | 0.78
5        | 0.78      | 0.74     | 0.38   | 0.76
6        | 0.76      | 0.62     | 0.40   | 0.71

Table 1: Change in belief from a starting belief of 1

Evidence | Pr(Ho/~E) | Pr(Ho/E) | Pr*(E) | Updated
0        | 0.00      | 0.16     | 0.23   | 0.04
1        | 0.04      | 0.11     | 0.44   | 0.07
2        | 0.07      | 0.64     | 0.51   | 0.36
3        | 0.36      | 0.90     | 0.69   | 0.74
4        | 0.74      | 0.16     | 0.04   | 0.72
5        | 0.72      | 0.74     | 0.38   | 0.73
6        | 0.73      | 0.62     | 0.40   | 0.68

Table 2: Change in belief from a starting belief of 0

The tables above and graph below illustrate the sequential application of Jeffrey’s rule. We start with differing prior beliefs and update as new evidence arrives. The datasets for Pr(Ho/E) and Pr*(E) are randomly generated numbers between 0 and 1. We can see that after 3-4 pieces of evidence we start to converge on a common belief. While not rigorous, inspection of simulated cases supports the idea that beliefs will converge irrespective of the starting belief.
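The simulation behind the tables can be sketched as follows (my reconstruction: evidence values are random, shared by both parties, and Pr(Ho/~E) is taken to be the current belief, as in the tables):

```python
import random

def jeffrey_update(p_h_given_e, p_h_given_not_e, p_star_e):
    """Pr*(Ho) = Pr(Ho/E)Pr*(E) + Pr(Ho/~E)Pr*(~E)."""
    return p_h_given_e * p_star_e + p_h_given_not_e * (1 - p_star_e)

rng = random.Random(1)
a, b = 1.0, 0.0          # two very different starting beliefs
gaps = [abs(a - b)]
for _ in range(10):
    p_e, q = rng.random(), rng.random()  # shared impact Pr(Ho/E) and quality Pr*(E)
    a = jeffrey_update(p_e, a, q)        # Pr(Ho/~E) = current belief
    b = jeffrey_update(p_e, b, q)
    gaps.append(abs(a - b))
# each update multiplies the gap by (1 - Pr*(E)), so the two beliefs converge
```

The shrink factor (1 - Pr*(E)) is why agreement on the impact and quality of evidence drives convergence regardless of the priors.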

Applying the principle of insufficient reason to prior belief

What happens if we start with no evidence at all for a hypothesis? We may be inclined to say that there is nothing to choose between the alternatives, true or false, so they should be treated as equally probable - this is the principle of insufficient reason, or the principle of indifference. However, consider a simple example: I state the hypothesis, “your car is red”. Even without any evidence, it doesn’t seem that the partition “your car is red” and “your car is not red” should be assigned equal probability.

In most business examples I can think of, it is usually more likely for a specific hypothesis to be false: compare “this product will be successful” with “this product will fail”. There are usually many more ways to fail than to succeed. We may be happy to assign a personal probability to the prior belief rather than assuming indifference. However, this may give certain hypotheses an ‘easy ride’ without forcing us to find evidence to corroborate or falsify them. I prefer to operate under the maxim ‘guilty until proven innocent’: assume the hypothesis is false until proven otherwise. This forces me to find evidence so I can justify my belief position - just because I think it is obvious that something is true doesn’t mean that others do. If I already have a high prior belief it should be easy for me to find the supporting evidence. It also means that I will operate conservatively in the early stages, as my belief is ‘dragged down’ by the memory of the initial belief up to the point of convergence.

Order of discovery

It would also seem intuitively obvious that the order in which we uncover new evidence should make no difference to our eventual beliefs. We have generated 20 discrete pieces of evidence and updated belief at each stage. We have then reordered the evidence (re-sampling without replacement) and calculated the new belief trajectory. Interestingly, we can see marked differences in belief at the end of the process. The results are presented without further discussion, but this may pose a significant problem in the application of this belief updating methodology.
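The order effect shows up with as few as two pieces of evidence (hypothetical numbers, again taking Pr(Ho/~E) to be the current belief):

```python
def jeffrey_update(p_h_given_e, p_h_given_not_e, p_star_e):
    """Pr*(Ho) = Pr(Ho/E)Pr*(E) + Pr(Ho/~E)Pr*(~E)."""
    return p_h_given_e * p_star_e + p_h_given_not_e * (1 - p_star_e)

def final_belief(evidence, prior=0.5):
    """Apply Jeffrey's rule sequentially over (Pr(Ho/E), Pr*(E)) pairs."""
    p = prior
    for p_e, q in evidence:
        p = jeffrey_update(p_e, p, q)
    return p

# one strongly supporting and one strongly refuting piece of evidence
evidence = [(0.9, 0.8), (0.1, 0.8)]
final_belief(evidence)        # ≈ 0.244
final_belief(evidence[::-1])  # ≈ 0.756
```

Whichever piece of evidence arrives last dominates, because each update discounts the accumulated belief by (1 - Pr*(E)).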

The above re-sampling example assumes that we would actually assign the same ‘marginal belief change’ irrespective of the order of discovery. This may not be a valid assumption and we can look at an example from history. In 1818 Siméon Poisson deduced from Augustin Fresnel’s theory the necessity of a bright spot at the centre of the shadow of a circular opaque obstacle. With his counterintuitive result Poisson hoped to disprove the wave theory; however Dominique Arago experimentally verified the prediction and today the demonstration goes by the name “Poisson’s (or Arago’s) spot.” Since the spot occurs within the geometrical shadow, no particle theory of light could account for it, and its discovery in fact provided weighty evidence for the wave nature of light, much to Poisson’s chagrin. If I believed in the corpuscular theory of light I would be extremely surprised to see a Poisson spot. However once I have seen it and adjusted my belief accordingly, seeing it again would only have a very small impact on my belief; the new experiment contains very little information. This is the same as saying that the marginal belief change for a particular piece of evidence depends on my current belief and the history of how I arrived here. It doesn’t therefore seem valid to resample, as we deal with marginal change in belief, not absolute values as new evidence arrives.


Intuitive Bayesian methods for portfolio selection – Part I Background

April 7, 2009

Introduction

Disruptive platform technologies usually have a broad base of application. During early stage development, before there is a developed market, the selection of a particular product is usually a ‘high risk, low data’ decision. There are a large number of unknowns, both the known unknowns and the unknown unknowns; we seek to resolve these over time. In this type of situation it is difficult to make the initial portfolio selection decision, to effectively monitor the resolution of uncertainty, and to determine the ultimate ‘chance of success’ for the product.

Problems in portfolio selection and project monitoring

The portfolio selection process, even when highly structured, often reduces to persuasion by advocates and champions. When a lot of data is being presented it is easy to forget ‘how we arrived’ at a particular position, assigning a higher importance to things that we heard recently (or long ago, depending on how your mind works). Soaring rhetoric can outweigh sober analysis and dispassionate appraisal of risk. It can be difficult to judge the ‘quality’ of a piece of information, which may find itself as a lynchpin in an argument to take a particular course of action. With a lot of unknowns it can be difficult to formulate go/no-go metrics and not relax the criteria when you get to the decision point.

Cognitive biases

The field of behavioral economics examines some of the less rational beliefs of Homo economicus. Work by Tversky and Kahneman illustrates cases of overconfidence in our abilities, the desire to go with the herd, and a propensity for rolling rationalization. It is easy to imagine many of the well-known cognitive biases arising in portfolio selection processes.

Objectives

Develop a simple methodology and toolset that allows us to:

  1. Reduce complex business decisions to specific and testable hypotheses, which can be definitively refuted.
  2. Systematically revise our ‘belief’ in a hypothesis as we receive new information.
  3. Integrate new information of many types and forms, of varying degrees of ‘quality’.
  4. Maintain a history of how we arrived at a particular belief to provide an ‘audit trail’ or ‘memory’ to support future decisions and actions.
  5. Integrate and logically connect hypotheses to create a ‘belief network’ that supports complex decision making.
  6. Avoid cognitive biases and increase objectivity.

Logic and Probability

There are three main modes of argument: deduction, induction and abduction (inference to the best explanation, IBE). Inductive logic analyses risky arguments using probability ideas. There are, however, different interpretations of what ‘a probability’ is.

Frequentists talk about probabilities only when dealing with experiments that are random and well-defined. The probability of a random event denotes the relative frequency of occurrence of an experiment’s outcome, when repeating the experiment. Frequentists consider probability to be the relative frequency “in the long run” of outcomes.

Bayesians, however, assign probabilities to any statement whatsoever, even when no random process is involved. Probability, for a Bayesian, is a way to represent an individual’s degree of belief in a statement, given the evidence.

Logical Probability is thought of as a logical relation between a hypothesis and the evidence for it. J.M. Keynes and Rudolf Carnap both favored a logical theory of probability. Personal probabilities are a private matter: they are up to the individual, and anything goes so long as the basic rules of coherence are obeyed. Logical probability maintains that there are uniquely correct, uniquely rational judgments of the probability of a hypothesis in the light of evidence.

For the purposes of decision making in a business context there are very few cases where a frequentist approach can be used. We tend to use the Bayesian notion of probability, where belief allows us to make investment decisions.

It is plausible to connect personal degrees of belief and personal betting rates

You would not pay more than $1 to win $2 on the flip of a coin. If you have some domain specific business knowledge that allows you to exploit an opportunity, your betting rate would be markedly different from someone without that knowledge. During product development as uncertainty is resolved our beliefs are updated and we revise the level of investment we are willing to make. People have always used this ‘managerial flexibility’ and there is now a move to formalize this type of ‘real option’ thinking in investment and portfolio selection.

Verificationism and Falsifiability

There are two common problems in portfolio decision making: how do we extrapolate experience to the future, and how can we provide definitive go/no-go criteria when we do not know the problem well? The former is the problem of induction: the question of whether inductive reasoning leads to truth; that is, what justification there is for presupposing that a sequence of events in the future will occur as it always has in the past (for example, that the laws of physics will hold as they have always been observed to hold). If we cannot assume uniformity of nature for physical laws, we definitely cannot do so in a business context, where we know that the landscape changes very quickly.

Often a go/no-go criterion is framed in a way that allows it to get out of jail down the line. A criterion such as “show interest from a customer” is quite broad. If in a month’s time we hear the statement “Fred and Jeff seem quite interested”, this adds practically no new useful knowledge upon which to base a decision - “a difference that makes no difference is no difference”. It also allows us to introduce ad hoc revisions to ‘pass’ the criterion. If instead we set a criterion such as “one sale made by the end of the quarter”, then we have something that is definitively testable. This is a criterion that puts itself at risk, which can be refuted or falsified - falsification adds new knowledge, as it allows us to eliminate options and make definite investment decisions, i.e. don’t invest. Falsifiability was put forward as a solution to the problem of induction by Karl Popper.

This is related to the Logical Positivist verifiability theory of meaning: the meaning of a sentence consists in its method of verification. In other words, if a sentence or statement has no possible method of verification, it has no meaning. It is pointless to set a go/no-go goal such as “demonstrate our value proposition and facilitate end-to-end knowledge transfer”, as there is no possible way to test it, and it therefore falls into the category of a nonsensical statement (also known as bullshit bingo).


Game of Life Encryption

April 7, 2009

I have posted a previous model of Conway’s Game of Life that runs in Excel – click here to view. I’ve just been reading Daniel Dennett’s book, Freedom Evolves, which uses examples of cellular automata to help describe emergent behaviour. There are some interesting ideas and I’ve been having a play with the Game of Life. For any given point in time the future state is completely determined by the cells that are currently alive and the transition rules of the game. I’ve seeded life with a set of random cells and watched it run over a hundred generations; what is clear is that, given the end state, there are different ways in which we could have arrived there. In other words, I can’t just reverse time and the transition rules to get back to the original state.
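A minimal sketch of the transition rules (my own Python, assuming the standard birth-on-3, survive-on-2-or-3 rules) makes the irreversibility easy to see: an empty grid and a grid with a single live cell both step to the same empty grid, so the end state has more than one possible past.

```python
from collections import Counter

def step(live):
    """Advance one generation of Conway's Game of Life (live cells as (x, y) pairs)."""
    # count live neighbours of every cell adjacent to a live cell
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # birth on exactly 3 neighbours, survival on 2 or 3
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

blinker = {(0, 0), (1, 0), (2, 0)}
step(step(blinker)) == blinker            # the blinker oscillates with period 2
step({(0, 0)}) == step(set()) == set()    # two different states, one successor
```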

One-way functions, i.e. functions that are easy to compute in one direction and hard to invert, are used to encrypt messages (multiplication vs factoring). We could use the initial state of an automaton and the transition rules to encrypt messages in a computationally inexpensive manner.


Monty Hall -Again

February 16, 2009

If you ever want to get people of a mathematical bent shouting at each other, try to get them to agree on the solution to the Monty Hall problem.

“Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?”

There are lots of ways to approach the problem; I’ve just heard a new one from Hans Christian von Baeyer’s book, Information, which should convince any die-hard “it makes no difference if you stick or switch” people.

“Imagine there are not three, but a thousand curtains, and one car. Initially you pick, say, number 815 with a resigned shrug – realising that your chances of success are one in a thousand. The host (who knows precisely where the car is) now opens 998 empty cubicles. Not the one you have picked and not cubicle number 137. Now he asks politely: ‘Do you want to stick with your first guess, curtain number 815, or switch to curtain number 137?'”. What should you do? Scaling the problem up by this degree makes the answer a lot more intuitive.
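For anyone who still isn’t convinced, a quick simulation (a sketch in Python, not from the book) shows the always-switch player winning about two times in three:

```python
import random

def play(switch, rng):
    """One round of Monty Hall; returns True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # the host opens a goat door that the player didn't pick
    opened = next(d for d in doors if d != pick and d != car)
    if switch:
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car

rng = random.Random(0)
trials = 100_000
wins = sum(play(True, rng) for _ in range(trials))
# wins / trials comes out near 2/3; sticking instead gives about 1/3
```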

 


Dan Dennett TED Lecture on Memes

January 4, 2009

Matching Initials

December 13, 2008

Download the Excel Spreadsheet

A while ago we noticed that our company has a surprising number of people with matching initials – whenever we were writing meeting minutes we would have to use an initial for the person’s middle name to distinguish them. Out of 18 people (as it was then) there were four pairs of matching initials, e.g. there were two BBs, two JSs, etc.

What are the chances of there being exactly 4 sets of matching initials in a population of 18 people?

This seemed quite unlikely; however, when you look at the problem it is almost the same as the famous Birthday Problem: in a group of 23 (or more) randomly chosen people, there is a greater than 50% probability that some pair of them will have been born on the same day. Our initials problem is similar, except we have 26×26 = 676 possible combinations of initials instead of 365 days of the year. The same approach can be used to calculate the odds of there being a match in our company of 18 people. However, I wanted to know the chances of there being exactly 4 sets of matching initials and got stuck, at which point I sent an email around the company (a lot of engineers and scientists) and resorted to a brute force Monte Carlo model.
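As a quick sanity check (my own sketch, not from the original email thread), the birthday-style calculation with 676 “days” gives the chance of at least one match among 18 people:

```python
# chance that at least two of 18 people share initials,
# treating all 676 initial pairs as equally likely
p_all_distinct = 1.0
for i in range(1, 18):
    p_all_distinct *= (676 - i) / 676
p_match = 1 - p_all_distinct  # ≈ 0.204, about a one in five chance
```

This only answers the “at least one match” question; the “exactly 4 sets” question is what needed the Monte Carlo model below.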

The approach is outlined below (John Somerville cracked the problem the same way). We generate a random number between 1 and 676, which defines the possible set of two initials, for each of the 18 people. We then do a pairwise comparison to see if there is a match between people. In the example below there is a match between Person 10 and Person 3. We can then run a series of iterations and keep track of the number of times a single match, double match, etc. occurs.
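A minimal Python version of that Monte Carlo (my reconstruction of the approach, not the original Excel model):

```python
import random
from collections import Counter

def count_matching_sets(initials):
    """Number of distinct initial-pairs shared by two or more people."""
    return sum(1 for n in Counter(initials).values() if n >= 2)

def simulate(n_people=18, n_combos=676, n_iter=10_000, seed=0):
    """Tally how often 0, 1, 2, ... sets of matching initials occur."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_iter):
        sample = [rng.randrange(n_combos) for _ in range(n_people)]
        tally[count_matching_sets(sample)] += 1
    return tally

tally = simulate()
# a single match turns up in roughly one run in five; four matches are very rare
```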


After a run of 10,000 iterations we got the table below. There was about a one in five chance of a single match, but for four matches the probability was very low indeed, about 0.01-0.03% (we only ran the simulation a couple of times). Not very likely at all!


Another guy, Maccas, came up with an even better simulation that took account of the fact that not all initials are equally likely, e.g. John Smith, JS, is more prevalent than the initials ZZ. Alas the file is too big to link to from here. Here is a link on Wikipedia to letter frequencies.


Closed Form Solution

Not happy with just getting the numerical output I waited for one of my more gifted colleagues to come up with a closed form solution. Dave did not disappoint and sent the following MATLAB expression

Billy,

 

It is 1 in 52047

 

C = 26*26;   % number of possible two-initial combinations

% running product of distinct-draw probabilities for the unpaired people
for i = 1:14
    Pbase(i) = (C - i + 1)/C;
end

% sum over the possible positions of the four matching pairs
c = 0;
for i1 = 2:15
    for i2 = 4:16
        for i3 = 6:17
            for i4 = 8:18
                if (i1 < i2) && (i2 < i3) && (i3 < i4)
                    c = c + 1;
                    Prob(c) = ((i1-1)/C)*((i2-3)/C)*((i3-5)/C)*((i4-7)/C)*prod(Pbase);
                end
            end
        end
    end
end

a = sum(Prob)

 

This can be written with prettier conventional symbols. Note that 1 in 52047 corresponds to a probability of about 0.002%, which disagrees with the 0.01-0.03% suggested by the simulations.

If anyone else has a better approach, numerical or closed form, please feel free to suggest it…


A Collection of Random Clippings

December 7, 2008

“I pity Simplicio no less than I should some gentleman, who, having built a magnificent palace at great trouble and expense, employing myriads of artisans, and then seeing it threatened with ruin because of poor foundations, should attempt, in order to avoid the sad sight of walls destroyed, adorned as they were with so many lovely murals; or columns fall, which support the superb galleries, or gilded beams collapse, or doors, pediments and marble cornices, supplied at so much cost, spoiled – should attempt to prevent the collapse with chains, props, buttresses, iron bars and shores”. – Galileo’s Dialogue

“That is why, as soon as I reached an age that allowed me to escape the control of my teachers, I abandoned altogether the study of letters. And having decided to pursue only that knowledge which I might find in myself or in the great book of the world, I spent the rest of my youth travelling, visiting courts and armies, mixing with people of different character and rank, accumulating different experiences, putting myself to the test in situations in which I found myself by chance, and at all times giving due reflection to things as they presented themselves to me so as to derive some benefit from them. For it seemed to me that I could discover much more truth from the reasoning that we all make about things that affect us and will soon cause us harm if we misjudge them, than from the speculations in which a scholar engages in the privacy of his study, that have no consequence for him insofar as the further they are from common sense, the more he will be proud of them, because he has had to use so much more ingenuity and subtlety in the struggle to make them plausible”. – Descartes A Discourse on the Method

“And although logic really does contain many very true and excellent precepts, there are so many others mixed in with them that are either harmful or superfluous, that it is almost as difficult to separate the former from the latter as it is to extract a statue of Diana or Minerva from a rough block of marble”. – Descartes A Discourse on the Method

“Never accept anything to be true that I did not incontrovertibly know to be so; that is to say, carefully to avoid both prejudice and premature conclusions; and to include nothing in my judgements other than that which presented itself to my mind so clearly and distinctly, that I would have no occasion to doubt it. The second was to divide all the difficulties under examination into as many parts as possible, and as many as were required to solve them in the best way. The third was to conduct my thoughts in a given order, beginning with the simplest and most easily understood objects, and gradually ascending, as it were step by step, to the knowledge of the most complex; and positing an order even on those which do not have a natural order of precedence. The last was to undertake such enumerations and such general surveys that I would be sure to have left nothing out”. Descartes A Discourse on the Method

“I imitated those travellers who, finding themselves lost in a forest, must not wander in circles first to one side and then the other, and still less stop in one place, but have to walk as straight as possible in one direction, and not alter course for weak reasons, even if it might only have been chance which led them to settle on the direction they had chosen; for by this means, even if they do not end up precisely where they want to be, they will eventually reach somewhere where they will most likely be better off than in the middle of a forest”. – Descartes A Discourse on the Method