Intuitive Bayesian methods for portfolio selection – Part II: Bayes and Jeffrey

Bayes’ theorem in its common form describes the way in which one’s beliefs about observing ‘A’ are updated by having observed ‘B’. It relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:

P(A|B) = P(B|A) P(A) / P(B)
Each term in Bayes’ theorem has a conventional name:

P(A) is the prior probability or marginal probability of A. It is “prior” in the sense that it does not take into account any information about B.

P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from or depends upon the specified value of B.

P(B|A) is the conditional probability of B given A.

P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
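
A quick worked example shows how the pieces fit together. This is a minimal Python sketch with made-up numbers and hypothetical events: suppose A is “the deal closes” and B is “the prospect requested a demo”.

    # Bayes' theorem with illustrative, made-up numbers.
    p_a = 0.3          # prior Pr(A): a deal closes 30% of the time
    p_b_given_a = 0.9  # Pr(B|A): closed deals almost always requested a demo
    p_b = 0.5          # marginal Pr(B): half of all prospects request a demo

    p_a_given_b = p_b_given_a * p_a / p_b  # posterior Pr(A|B)
    print(p_a_given_b)                     # 0.54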

Bayesian belief updating is the model we use for learning. In effect we already use it when we sit in meetings discussing the best options: each of us has modified our beliefs over time as new information arrived. The problem is that it is difficult for others to see what evidence corroborates those beliefs, which opens the door to our cognitive biases and simple heuristics.

Jeffrey’s Rule

During product selection and development we acquire new information, which allows us to update our beliefs about how to make future investments. However, some information is of higher quality than other information: if two people make exactly the same statement, one a lead customer and the other a stranger on the street, we know which carries the higher information content. Bayes’ rule relies on learning a definitive new truth to revise our belief, but most new knowledge we acquire during product development cannot be classed as definitively true; one customer may say one thing and another may say something totally different. Jeffrey’s rule allows us to deal with opinion, rumor and weakly supporting evidence.

We can formulate a partition of hypotheses Ho and ~Ho:

Ho = We will sell 10 products to customer x this year

We are at a trade show talking to a distributor, who tells us he has heard that customer x is currently trialing our competitor’s products. We will call this new piece of evidence E:

E = Customer x is currently trialing our competitor’s products

Before hearing this we may have been quite bullish about the prospects of selling to customer x, because we have had several meetings where they expressed interest and have been talking about using some demo equipment:

Pr(Ho) = 0.8

However, if it is true that customer x is currently trialing the competitor’s products, that is bad news: they have committed resources to testing and are further down the line with our competitors.

Pr(Ho|E) = 0.1

If what I’ve heard is not true, then I have no other reason to revise my prior belief:

Pr(Ho|~E) = 0.8

I represent my belief in light of the new rumor as Pr*, so that Pr*(Ho) stands for my revised belief in Ho given the new information E.

The distributor can’t remember who he heard it from, but is pretty sure that he is right. I might therefore assign a probability of 0.75 to the information being correct:

Pr*(E) = 0.75    Pr*(~E) = 0.25

Jeffrey’s revision of Bayes’ rule is reminiscent of the rule of total probability:

Pr*(Ho) = Pr(Ho|E)Pr*(E) + Pr(Ho|~E)Pr*(~E)

Jeffrey tells us to conclude that Pr*(Ho) = 0.1 × 0.75 + 0.8 × 0.25 = 0.275. Before we heard the rumor we thought it quite probable that we would sell to customer x, but things are now looking bleaker.
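
As a minimal sketch of the calculation (in Python; the function name is my own, not part of any standard library), the worked example above becomes:

    def jeffrey_update(p_h_given_e, p_h_given_not_e, p_star_e):
        # Jeffrey's rule: Pr*(Ho) = Pr(Ho|E)Pr*(E) + Pr(Ho|~E)Pr*(~E)
        return p_h_given_e * p_star_e + p_h_given_not_e * (1.0 - p_star_e)

    # Bullish prior of 0.8, bad news if the rumor is true (0.1),
    # rumor believed with probability 0.75.
    print(jeffrey_update(p_h_given_e=0.1, p_h_given_not_e=0.8, p_star_e=0.75))  # 0.275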

Dashboard representation

We can put together a dashboard that allows a user to start with a prior belief and update it using Jeffrey’s rule. Two sliders are used to input Pr(Ho|E) and Pr*(E), and the numeric inputs are augmented with descriptive labels.
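
One way such a dashboard might be prototyped is sketched below, assuming a Jupyter notebook with ipywidgets installed; the labels and default values are illustrative only, not the original tool.

    from ipywidgets import FloatSlider, interact

    prior = 0.8  # Pr(Ho), the starting belief

    @interact(
        p_h_given_e=FloatSlider(min=0.0, max=1.0, step=0.01, value=0.1,
                                description='Pr(Ho|E)'),
        p_star_e=FloatSlider(min=0.0, max=1.0, step=0.01, value=0.75,
                             description='Pr*(E)'),
    )
    def dashboard(p_h_given_e, p_star_e):
        # Pr(Ho|~E) defaults to the prior: no reason to revise if E is false.
        updated = p_h_given_e * p_star_e + prior * (1.0 - p_star_e)
        print(f"Updated belief Pr*(Ho) = {updated:.3f}")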

Examples

If we receive a new piece of information that definitively refutes our hypothesis, but we know the source is completely unreliable, then we have no reason to update our belief. For example, if a stranger in the street says he wouldn’t buy our chemical detection equipment, this has no relevance to my belief that the US Army will.

If we receive a new piece of information that we know is definitely true but that doesn’t add much support to our hypothesis, then our posterior belief will be unchanged. For example, two people from one company tell me a piece of information separately. When I hear it from the first person I update my belief accordingly; when I hear it the second time it gives me no new knowledge, even though I believe the source completely.
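
Both edge cases drop straight out of the formula. A sketch using the jeffrey_update function from above (repeated here for completeness), with an assumed prior of 0.8:

    prior = 0.8

    def jeffrey_update(p_h_given_e, p_h_given_not_e, p_star_e):
        return p_h_given_e * p_star_e + p_h_given_not_e * (1.0 - p_star_e)

    # Definitive refutation from a completely unreliable source: we give the
    # claim no credence, Pr*(E) = 0, so the belief stays at the prior.
    print(jeffrey_update(p_h_given_e=0.0, p_h_given_not_e=prior, p_star_e=0.0))   # 0.8

    # Certainly true but uninformative: Pr(Ho|E) = Pr(Ho|~E) = prior, so even
    # with Pr*(E) = 1 the belief is unchanged.
    print(jeffrey_update(p_h_given_e=prior, p_h_given_not_e=prior, p_star_e=1.0))  # 0.8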

Potential problems with the application of Jeffrey’s rule

Prior Belief

We can look at what happens if we start out with very different prior beliefs. If we are rationally updating on new evidence, and agree on its impact and quality, we should eventually converge on a common belief.

Evidence   Pr(Ho|~E)   Pr(Ho|E)   Pr*(E)   Updated
0          1.00        0.16       0.23     0.81
1          0.81        0.11       0.44     0.50
2          0.50        0.64       0.51     0.57
3          0.57        0.90       0.69     0.80
4          0.80        0.16       0.04     0.78
5          0.78        0.74       0.38     0.76
6          0.76        0.62       0.40     0.71

Table 1: Change in belief from a starting belief of 1

Evidence   Pr(Ho|~E)   Pr(Ho|E)   Pr*(E)   Updated
0          0.00        0.16       0.23     0.04
1          0.04        0.11       0.44     0.07
2          0.07        0.64       0.51     0.36
3          0.36        0.90       0.69     0.74
4          0.74        0.16       0.04     0.72
5          0.72        0.74       0.38     0.73
6          0.73        0.62       0.40     0.68

Table 2: Change in belief from a starting belief of 0

The tables above (and the accompanying graph) illustrate the sequential application of Jeffrey’s rule: we start with differing prior beliefs and update as each new piece of evidence arrives. The values for Pr(Ho|E) and Pr*(E) are randomly generated numbers between 0 and 1, with the current belief carried forward as Pr(Ho|~E). We can see that after 3-4 pieces of evidence we start to converge on a common belief. While not rigorous, inspection of simulated cases supports the idea that beliefs will converge irrespective of the starting belief.
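
A rough simulation along these lines is sketched below in Python; like the tables, it carries the current belief forward as Pr(Ho|~E), and the random draws stand in for the impact and quality of each new piece of evidence.

    import random

    random.seed(1)
    # Shared evidence stream: (Pr(Ho|E), Pr*(E)) pairs drawn uniformly from [0, 1].
    evidence = [(random.random(), random.random()) for _ in range(20)]

    for prior in (1.0, 0.0):
        belief = prior
        for p_h_given_e, p_star_e in evidence:
            # Current belief is carried forward as Pr(Ho|~E).
            belief = p_h_given_e * p_star_e + belief * (1.0 - p_star_e)
        print(f"starting belief {prior:.1f} -> final belief {belief:.2f}")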

Applying the principle of insufficient reason to prior belief

What happens if we start with no evidence at all for a hypothesis? We may be inclined to say that there is nothing to choose between the alternatives, true or false, so they should be treated as equally probable: this is the principle of insufficient reason, or the principle of indifference. However, consider a simple example: I state the hypothesis “your car is red”. Even without any evidence, it doesn’t seem that the partition “your car is red” and “your car is not red” should be assigned equal probability.

In most business examples I can think of, it is usually more likely for a specific hypothesis to be false: compare “this product will be successful” with “this product will fail”. There are usually many more ways to fail than to succeed. We may be happy to assign a personal probability to the prior belief rather than assuming indifference. However, this may give certain hypotheses an ‘easy ride’ without forcing us to find evidence to corroborate or falsify them. I prefer to operate under the maxim ‘guilty until proven innocent’: assume the hypothesis is false until proven otherwise. This forces me to find evidence so that I can justify my belief position; just because I think something is obviously true doesn’t mean that others do. If I already have a high prior belief it should be easy for me to find the supporting evidence. It also means that I will operate conservatively in the early stages, as my belief is ‘dragged down’ by the memory of the initial belief up to the point of convergence.

Order of discovery

It would also seem intuitively obvious that the order in which we uncover new evidence should make no difference to our eventual beliefs. We generated 20 discrete pieces of evidence and updated the belief at each stage, then reordered the evidence (re-sampling without replacement) and calculated the new belief trajectory. Interestingly, there can be marked differences in belief at the end of the process. The results are presented without further discussion, but this may pose a significant problem for the application of this belief-updating methodology.
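
The re-ordering experiment can be reproduced with a small sketch, under the same assumptions as the convergence simulation above:

    import random

    random.seed(2)
    evidence = [(random.random(), random.random()) for _ in range(20)]

    def final_belief(stream, prior=0.0):
        belief = prior
        for p_h_given_e, p_star_e in stream:
            belief = p_h_given_e * p_star_e + belief * (1.0 - p_star_e)
        return belief

    for trial in range(5):
        shuffled = evidence[:]
        random.shuffle(shuffled)  # same evidence, different order of discovery
        print(f"ordering {trial}: final belief {final_belief(shuffled):.2f}")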

The above re-sampling example assumes that we would assign the same ‘marginal belief change’ irrespective of the order of discovery. This may not be a valid assumption, and we can look at an example from history. In 1818 Siméon Poisson deduced from Augustin Fresnel’s theory the necessity of a bright spot at the centre of the shadow of a circular opaque obstacle. With this counterintuitive result Poisson hoped to disprove the wave theory; however, Dominique Arago experimentally verified the prediction, and today the demonstration goes by the name “Poisson’s (or Arago’s) spot”. Since the spot occurs within the geometrical shadow, no particle theory of light could account for it, and its discovery in fact provided weighty evidence for the wave nature of light, much to Poisson’s chagrin. If I believed in the corpuscular theory of light I would be extremely surprised to see a Poisson spot. However, once I have seen it and adjusted my belief accordingly, seeing it again has only a very small impact on my belief; the repeat experiment contains very little information. This is the same as saying that the marginal belief change for a particular piece of evidence depends on my current belief and the history of how I arrived at it. It therefore doesn’t seem valid to resample, as we deal with marginal changes in belief, not absolute values, as new evidence arrives.
