While the time to the UK general election is roughly the same as the time to the end of the football season (see here), the data available for making predictions differ considerably. The biggest difference is that, for the football league tables, half of the games have already been played and those points are ‘in the bag’. For the election, nothing has been decided. A good place to start, however, is with opinion polls. The poll I used at the start of January (Guardian/ICM, 17th December) gave the following.

Conservatives – 28%

Labour – 33%

Lib Dem – 14%

UKIP – 14%

Green – 5%

Others – 6%

The question I want answered is: what is the likelihood of the Conservatives getting the most votes at the election? This is a very different question from who will win the most seats, who the prime minister will be, and so on. But, as with the football league, converting the data above into an answer to that question isn’t straightforward. Here’s how I approached it.

A typical opinion poll has a margin of error of about 3 percentage points (based only on the number of people sampled). The margin of error is calculated as follows, where p is the proportion of respondents supporting a party, q = 1 − p, and n is the sample size:

1.96 * (√((p.q)/n))

The important bit here is the 1.96: assuming a normal distribution, there is a 95% chance that the true value lies within the margin of error of the polled value. Under that assumption (and in the absence of other information), we can simulate scenarios with the standard deviation of the error set to 3/1.96, or about 1.53 percentage points.
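As a quick check on the arithmetic, here is a minimal Python sketch of the margin-of-error formula and the standard deviation derived from it. The sample size of 1,000 is an assumption for illustration (the post does not give the poll's sample size), and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a proportion p from a sample of n people.

    z = 1.96 corresponds to a 95% interval under a normal distribution.
    """
    q = 1.0 - p
    return z * math.sqrt(p * q / n)

# ASSUMPTION: 1,000 respondents; the actual sample size is not stated in the post.
n = 1000
labour_share = 0.33  # Labour's polled share as a proportion

moe = margin_of_error(labour_share, n)
print(f"Margin of error: {100 * moe:.1f} percentage points")

# Standard deviation used in the simulation, taking the margin of error as 3 points.
sd = 3 / 1.96
print(f"Simulation SD: {sd:.2f} percentage points")
```

With these assumed inputs the margin of error comes out at roughly 2.9 points, which is consistent with the usual rule of thumb of 3.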

So, to simulate the actual percentage of votes cast for any party, we draw a random number from a normal distribution whose mean is that party’s poll value above and whose SD is 3/1.96.

If we replicate this many times (10,000), we can calculate what we are actually interested in: the likelihood of the Conservatives getting the most votes. In practice, this means their percentage of the vote must be higher than Labour’s.

The simulation gives a probability of 2.73% for the Conservatives getting the most votes, compared to Labour’s 97.27%. Given the percentages in the opinion poll, no other party stands a chance.
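The unconstrained simulation described above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the R code used for the post, so it is not expected to reproduce the figures above exactly; the seed and the names `POLL`, `simulate_once` and `leader` are assumptions made for the example.

```python
import random

# Poll shares (percent) from the Guardian/ICM figures quoted above.
POLL = {
    "Conservatives": 28,
    "Labour": 33,
    "Lib Dem": 14,
    "UKIP": 14,
    "Green": 5,
    "Others": 6,
}
SD = 3 / 1.96    # standard deviation implied by a 3-point margin of error
N_RUNS = 10_000  # number of replicate runs

def simulate_once(rng):
    """Draw one simulated vote share per party, independently of the others."""
    return {party: rng.gauss(share, SD) for party, share in POLL.items()}

def leader(shares):
    """Return the party with the largest simulated share."""
    return max(shares, key=shares.get)

rng = random.Random(2015)  # arbitrary seed, fixed so the run is repeatable
wins = {party: 0 for party in POLL}
for _ in range(N_RUNS):
    wins[leader(simulate_once(rng))] += 1

for party, count in wins.items():
    print(f"{party}: {100 * count / N_RUNS:.2f}% of runs with the most votes")
```

Each party is drawn independently here, which is why the simulated shares in a given run need not add up to 100.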

However, the vote shares must add up to 100%, and the simulation above does not account for this. We can enforce it by still perturbing each party’s percentage by its own margin of error, but counting only those replicate runs where the total for all parties sums to 100. This gives a slightly different result: a 3.39% chance of the Conservatives getting the most votes, compared to 96.61% for Labour. The difference persists across multiple runs of this simulation, probably because an error in favour of the Conservatives is more likely (though not certain) to be offset by an error against Labour.
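One way to sketch the "sum to 100" restriction in Python is simple rejection sampling: draw all parties as before and discard any run whose total is not close enough to 100. The tolerance of 0.5 percentage points is an assumption, since the post does not say how closely the total had to match, and this is again an illustrative sketch rather than the R code behind the figures above.

```python
import random

# Poll shares (percent) from the Guardian/ICM figures quoted above.
POLL = {
    "Conservatives": 28,
    "Labour": 33,
    "Lib Dem": 14,
    "UKIP": 14,
    "Green": 5,
    "Others": 6,
}
SD = 3 / 1.96
N_KEPT = 10_000  # number of accepted runs to collect
TOLERANCE = 0.5  # ASSUMPTION: keep runs whose total is within 0.5 of 100

rng = random.Random(2015)  # arbitrary seed, fixed so the run is repeatable
kept = 0
attempts = 0
con_wins = 0
lab_wins = 0

while kept < N_KEPT:
    attempts += 1
    shares = {party: rng.gauss(share, SD) for party, share in POLL.items()}
    # Reject runs where the simulated shares do not total (roughly) 100%.
    if abs(sum(shares.values()) - 100) > TOLERANCE:
        continue
    kept += 1
    top = max(shares, key=shares.get)
    if top == "Conservatives":
        con_wins += 1
    elif top == "Labour":
        lab_wins += 1

print(f"Accepted {kept} of {attempts} simulated runs")
print(f"Conservatives most votes: {100 * con_wins / kept:.2f}%")
print(f"Labour most votes: {100 * lab_wins / kept:.2f}%")
```

Most runs are rejected under this rule, so many more than 10,000 draws are needed to collect 10,000 accepted ones; a tighter tolerance would reject even more.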

So, there we have it – values from the quantitative data which I can begin to use. There are a number of factors which need to be taken into account to give me the best possible prediction, and given that manifestos are not written yet, expert opinion may be very important here. How these get combined will be the subject of further posts.

R code for these simulations is given here.