# Predicting the Bitcoin Halvening - With Probabilities

Many people following Bitcoin have been discussing the upcoming *halvening* and want to know when it will occur, whether for financial reasons, out of curiosity, or for other reasons.

At the very least you might want to throw a party with other Bitcoin enthusiasts and need to know when to schedule it. In this article we look at different ways to get a handle on when it might happen.

## TL;DR

There is considerable variation in block production times, and we still have a long way to go, but the best estimate right now seems to be 2020-05-02 (a Saturday), give or take about six days at two standard deviations. Note: all times are in UTC. Also, do your own research.

As of this writing ...

| Quantity | Value |
|---|---|
| Last block | 589676 |
| Blocks to go | 40324 |
| Last block time | 2019-08-11T19:52:44 |
| 10 min block estimate | 2020-05-17T20:32:44 |
| Observed mean estimate | 2020-05-02T14:13:47 |
| Simulation estimate | |
| min | 2020-04-18T12:19:05 |
| -2 sd | 2020-04-26T21:39:49 |
| -1 sd | 2020-04-29T18:21:09 |
| mean | 2020-05-02T15:02:30 |
| +1 sd | 2020-05-05T11:43:51 |
| +2 sd | 2020-05-08T08:25:12 |
| max | 2020-05-15T20:23:02 |

Estimates are as of this writing (approximately Aug 11/12, 2019) and are based on block 589676. I'll update them in a few months to see how they change.

## Background

In a Proof of Work (PoW) system like Bitcoin (BTC), workers (miners) compete to earn rewards by solving a hashing puzzle (finding a block whose hash falls below a target value) and having their block accepted by the network. There is a lot of variability in how often blocks are created, both because of the random chance involved in searching for a solution and because the number of miners, and the hashpower they bring, changes over time.
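To make the "random chance" concrete, here is a toy sketch of a proof-of-work search. It is a simplification, not Bitcoin's real block format: the `toy_mine` helper and its string-plus-nonce input are hypothetical, and it uses a single SHA-256 where Bitcoin double-hashes an 80-byte binary header. The key property survives, though: every attempt succeeds independently with the same small probability, so the number of attempts needed is random.

```python
import hashlib

def toy_mine(header: str, target: int, start_nonce: int = 0):
    """Try nonces in order until SHA-256(header + nonce) is below target.

    Toy illustration only (hypothetical helper): real Bitcoin double-hashes a
    binary 80-byte block header rather than a string like this.
    Returns (winning_nonce, attempts_used).
    """
    nonce = start_nonce
    while True:
        digest = hashlib.sha256(f"{header}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, nonce - start_nonce + 1
        nonce += 1

# Each attempt succeeds with probability 2**244 / 2**256 = 1/4096.
TARGET = 2 ** 244
for header in ("block-a", "block-b", "block-c"):
    nonce, attempts = toy_mine(header, TARGET)
    print(f"{header}: found after {attempts} attempts")
```

Running it for a few different headers typically shows attempt counts that vary widely around the expected 4,096, even though the per-attempt odds never change.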

The system adjusts the mining difficulty every 2016 blocks (~2 weeks) in an attempt to maintain a 10-minute average interval between blocks. If blocks are being created too quickly (i.e., there is more hashpower than the current difficulty assumes), the difficulty is adjusted upwards. If blocks are taking too long, it is adjusted downwards.
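A simplified sketch of the retarget rule, assuming difficulty is tracked as a plain float (Bitcoin actually adjusts a compact-encoded 256-bit target, which is inversely proportional to difficulty). The factor-of-4 clamp in either direction mirrors Bitcoin's limit on how far a single adjustment can move; the function name is hypothetical.

```python
TARGET_SPACING_S = 10 * 60                               # protocol's target seconds per block
RETARGET_INTERVAL = 2016                                 # blocks between adjustments
TARGET_TIMESPAN_S = RETARGET_INTERVAL * TARGET_SPACING_S  # two weeks, in seconds

def next_difficulty(old_difficulty: float, actual_timespan_s: float) -> float:
    """Simplified Bitcoin-style retarget (hypothetical helper).

    Scales difficulty by how far the last period deviated from two weeks,
    clamping the measured timespan so one adjustment changes difficulty by
    at most a factor of 4 in either direction.
    """
    clamped = min(max(actual_timespan_s, TARGET_TIMESPAN_S / 4), TARGET_TIMESPAN_S * 4)
    return old_difficulty * TARGET_TIMESPAN_S / clamped

# A period of 2016 blocks averaging 9.45 minutes each came in faster than two weeks.
print(next_difficulty(1.0, RETARGET_INTERVAL * 9.45 * 60))
```

For example, a period that averaged 9.45-minute blocks would raise difficulty by about 10 / 9.45 ≈ 1.058.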

Additionally, in Bitcoin the block subsidy (the newly minted coins in each block's reward) is cut in half every 210,000 blocks. This is referred to colloquially as *the halvening*. The next halvening is due at block 630,000, and in the rest of this article we attempt to estimate the date and time when that will happen.
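A small sketch of the subsidy schedule, working in satoshis (1 BTC = 100,000,000 satoshis) so each halving is an exact integer right shift. The helper names are hypothetical; the logic follows the well-known rule of a 50 BTC starting subsidy halved every 210,000 blocks.

```python
HALVING_INTERVAL = 210_000
INITIAL_SUBSIDY_SATS = 50 * 100_000_000  # 50 BTC expressed in satoshis

def block_subsidy_sats(height: int) -> int:
    """Block subsidy at a given height: halves every 210,000 blocks."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:  # shifting a 64-bit value this far would leave nothing
        return 0
    return INITIAL_SUBSIDY_SATS >> halvings

def next_halving_height(height: int) -> int:
    """First block height of the next subsidy era after `height`."""
    return (height // HALVING_INTERVAL + 1) * HALVING_INTERVAL

last_block = 589_676
halving = next_halving_height(last_block)
print(halving, halving - last_block)  # 630000 40324
print(block_subsidy_sats(halving - 1) / 1e8, block_subsidy_sats(halving) / 1e8)  # 12.5 6.25
```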

## First Cut

At the time of this writing we are at block number 589,676, created at 2019-08-11T19:52:44, which means we have 40,324 blocks till the next halvening. If we assume 10 minute blocks we have about 280 days (40324 x 10 / (60 x 24)) till the halvening which turns out to be 2020-05-17T20:32:44.
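The first-cut arithmetic as a sketch using Python's `datetime`. The `project` helper is hypothetical and assumes every remaining block takes exactly the same number of minutes; timestamps are treated as naive UTC.

```python
from datetime import datetime, timedelta

LAST_BLOCK = 589_676
LAST_BLOCK_TIME = datetime(2019, 8, 11, 19, 52, 44)  # UTC
HALVING_BLOCK = 630_000

def project(last_time: datetime, blocks: int, minutes_per_block: float) -> datetime:
    """Naive projection: every remaining block takes exactly minutes_per_block."""
    return last_time + timedelta(minutes=blocks * minutes_per_block)

blocks_to_go = HALVING_BLOCK - LAST_BLOCK
print(f"{blocks_to_go} blocks, {blocks_to_go * 10 / (60 * 24):.2f} days")  # 40324 blocks, 280.03 days
print(project(LAST_BLOCK_TIME, blocks_to_go, 10).isoformat())  # 2020-05-17T20:32:44
```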

However, if we look at the last few months of blocks, we see that blocks were produced on average every 9.45 minutes, which means the halvening should come a little earlier than that. Doing the math with this observed average, we get 2020-05-02T14:13:47.
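A sketch of how the observed average could be computed from a window of block timestamps, assuming Unix-second timestamps in block order (the sample values below are made up). Using only the first and last timestamps keeps the estimate robust to the fact that Bitcoin block timestamps are not strictly increasing.

```python
from datetime import datetime, timedelta
from typing import Sequence

def mean_block_minutes(timestamps: Sequence[int]) -> float:
    """Average minutes per block over a window of block timestamps (Unix seconds).

    Only the first and last timestamps are used, so out-of-order
    intermediate timestamps do not distort the result.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two block timestamps")
    return (timestamps[-1] - timestamps[0]) / (len(timestamps) - 1) / 60

def project(last_time: datetime, blocks: int, minutes_per_block: float) -> datetime:
    """Hypothetical helper: add blocks * minutes_per_block to last_time."""
    return last_time + timedelta(minutes=blocks * minutes_per_block)

# Hypothetical timestamps; real ones would come from a node or block explorer.
example = [1_565_000_000, 1_565_000_540, 1_565_001_200, 1_565_001_701]
print(mean_block_minutes(example))
```

Because 9.45 is a rounded figure, plugging it straight into `project` lands a few hours earlier than 2020-05-02T14:13:47, a reminder that point estimates shift noticeably with small changes in the assumed mean.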

It is important to keep in mind that these are just point estimates. The odds of the block being created at exactly that time are vanishingly small. What they really mean is that the halvening will occur 'around' that time, maybe a little before or maybe a little after. How much before or after is what we'll examine in the rest of the article.

## Estimating The Block Time With An Exponential PDF

Blocks are created on an approximately 10-minute schedule. The number of blocks created in a given time frame (say, a day) can be modeled with a Poisson distribution, and the time between consecutive blocks can then be modeled with an Exponential distribution. We can use this Exponential distribution to simulate how long it would take to create N blocks.
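A quick simulation sketch of the Poisson/Exponential relationship, assuming a 600-second mean gap: draw exponential inter-block times, count how many blocks land in each day, and check that the daily counts have mean and variance both close to 144 (24 hours × 6 blocks per hour), since a Poisson distribution's mean equals its variance. The function name and seed are arbitrary.

```python
import random
import statistics

def blocks_per_day(n_days, mean_gap_s=600.0, seed=1):
    """Simulate exponential block gaps and count the blocks falling in each day."""
    rng = random.Random(seed)
    day_s = 86_400
    counts = [0] * n_days
    t = rng.expovariate(1 / mean_gap_s)  # time of the first block
    while t < n_days * day_s:
        counts[int(t // day_s)] += 1
        t += rng.expovariate(1 / mean_gap_s)  # add the next exponential gap
    return counts

counts = blocks_per_day(2_000)
print(statistics.mean(counts), statistics.variance(counts))  # both should be near 144
```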

This figure shows a histogram of block times from the last ~2 months and an exponential distribution fit to that data. The exponential distribution is parameterized by 'lambda' ~0.00176 blocks per second (or its inverse, called 'scale', which is ~566 seconds, about 9.4 minutes).
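For an exponential distribution, the maximum-likelihood estimate of the scale is simply the mean gap, and lambda is its reciprocal; with SciPy, `scipy.stats.expon.fit(gaps, floc=0)` returns the same scale. This sketch uses synthetic gaps in place of the real ~2 months of data. Real gaps computed from block timestamps can occasionally be zero or negative, so they may need cleaning before fitting.

```python
import random
import statistics

def fit_exponential(gaps_s):
    """Maximum-likelihood exponential fit to inter-block gaps (seconds).

    Returns (lam, scale): scale is the mean gap, lam = 1 / scale.
    """
    scale = statistics.fmean(gaps_s)
    return 1 / scale, scale

# Hypothetical data: synthetic gaps drawn with a true scale of 566 seconds.
rng = random.Random(42)
gaps = [rng.expovariate(1 / 566) for _ in range(9_000)]
lam, scale = fit_exponential(gaps)
print(f"lambda ~ {lam:.5f} per second, scale ~ {scale:.0f} s")
```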

Using this distribution, we can simulate one possible time until the halvening by drawing 40,324 block delays from it and summing them. Each simulation gives a different final halvening time, since on one run we could get lucky and have a lot of very fast blocks, or unlucky and have a string of slow blocks. If we repeat the simulation many times, we get an idea of the distribution of possible final outcomes.
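A direct sketch of this Monte Carlo procedure with the standard library, assuming the fitted scale of 566 seconds. The run count is small and hypothetical so that it finishes in a few seconds.

```python
import random
import statistics

def simulate_halving_seconds(n_blocks: int, scale_s: float, n_runs: int, seed: int = 0):
    """Monte Carlo sketch: each run sums n_blocks exponential block delays.

    Returns one total (seconds after the last known block) per run.
    """
    rng = random.Random(seed)
    rate = 1 / scale_s  # expovariate takes the rate (lambda), not the scale
    return [sum(rng.expovariate(rate) for _ in range(n_blocks)) for _ in range(n_runs)]

totals = simulate_halving_seconds(40_324, 566, n_runs=50)
days = [t / 86_400 for t in totals]
# Theory for a sum of n exponentials: mean n*scale (~264.2 days), sd sqrt(n)*scale (~1.3 days).
print(f"mean {statistics.fmean(days):.2f} days, sd {statistics.stdev(days):.2f} days")
```

A sum of n independent exponential delays with a common scale follows a Gamma distribution with shape n and that scale, so `rng.gammavariate(n_blocks, scale_s)` draws each total in a single call, which makes thousands of runs cheap.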

| Statistic | Date (UTC) |
|---|---|
| min | 2020-04-27T12:39:50 |
| -2 sd | 2020-04-29T23:35:21 |
| -1 sd | 2020-05-01T07:00:43 |
| mean | 2020-05-02T14:26:06 |
| +1 sd | 2020-05-03T21:51:29 |
| +2 sd | 2020-05-05T05:16:52 |
| max | 2020-05-07T22:02:08 |

Using this procedure, we see that the possible dates for the halvening form an approximately normal distribution around 2020-05-02T14:26:06, which is very close to the simple mean-based estimate.

Additionally, we get an estimate of other possible dates, and we can look at one or two standard deviations on either side to get ~68% and ~95% confidence ranges. Because the distribution is nearly symmetric, the mean is also close to the median, so we can read it as the date where there is roughly a 50% chance the halvening happens before it and a 50% chance it happens afterwards. This model also predicts a 95% chance that the halvening will happen after 2020-04-29 and before 2020-05-05.
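A sketch of turning simulated totals into the kind of date rows shown in the table above, assuming totals are seconds after the last known block (2019-08-11T19:52:44 UTC). For speed, the stand-in totals are Gamma draws, equivalent to summing 40,324 exponential delays; the seed and run count are arbitrary. The median is reported alongside the mean because the median is the exact 50/50 point, and the two should nearly coincide for this almost-symmetric distribution.

```python
import random
import statistics
from datetime import datetime, timedelta

LAST_BLOCK_TIME = datetime(2019, 8, 11, 19, 52, 44)  # UTC, block 589,676

def summarize(totals_s, last_time=LAST_BLOCK_TIME):
    """Map simulated seconds-until-halvening to min, mean +/- k sd, median and max dates."""
    def as_date(seconds):
        return (last_time + timedelta(seconds=seconds)).replace(microsecond=0)

    mu = statistics.fmean(totals_s)
    sd = statistics.stdev(totals_s)
    rows = {"min": as_date(min(totals_s))}
    for k in (-2, -1, 0, 1, 2):
        rows["mean" if k == 0 else f"{k:+d} sd"] = as_date(mu + k * sd)
    rows["median"] = as_date(statistics.median(totals_s))
    rows["max"] = as_date(max(totals_s))
    return rows

rng = random.Random(7)
totals = [rng.gammavariate(40_324, 566) for _ in range(10_000)]
for label, date in summarize(totals).items():
    print(f"{label:>6} | {date.isoformat()}")
```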

To read off other intervals we can look at the Empirical Cumulative Distribution Function (ECDF) instead of a histogram. We can be as confident as we like at the expense of a larger range of possible dates.
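A minimal ECDF sketch over simulated totals. The `ECDF` class here is hypothetical (libraries such as statsmodels offer ready-made versions); it reads off equal-tailed intervals at any confidence level and the probability that the halvening has happened by a given date. As before, the simulated totals are Gamma draws standing in for summed exponential delays.

```python
import bisect
import math
import random
from datetime import datetime, timedelta

LAST_BLOCK_TIME = datetime(2019, 8, 11, 19, 52, 44)  # UTC

class ECDF:
    """Empirical CDF over simulated seconds-until-halvening."""

    def __init__(self, samples):
        self.xs = sorted(samples)

    def prob_by(self, x):
        """Fraction of simulated outcomes at or before x."""
        return bisect.bisect_right(self.xs, x) / len(self.xs)

    def quantile(self, p):
        """Smallest sample whose ECDF value is at least p (0 <= p <= 1)."""
        i = max(0, math.ceil(p * len(self.xs)) - 1)
        return self.xs[i]

    def central_interval(self, confidence):
        """Equal-tailed interval holding roughly `confidence` of the outcomes."""
        tail = (1 - confidence) / 2
        return self.quantile(tail), self.quantile(1 - tail)

rng = random.Random(11)
ecdf = ECDF(rng.gammavariate(40_324, 566) for _ in range(10_000))
for c in (0.5, 0.8, 0.95, 0.99):
    lo, hi = ecdf.central_interval(c)
    print(f"{c:.0%}: {LAST_BLOCK_TIME + timedelta(seconds=lo):%Y-%m-%d %H:%M} "
          f"to {LAST_BLOCK_TIME + timedelta(seconds=hi):%Y-%m-%d %H:%M}")
cutoff = (datetime(2020, 5, 1) - LAST_BLOCK_TIME).total_seconds()
print(f"P(halvening before 2020-05-01) ~ {ecdf.prob_by(cutoff):.2f}")
```

Widening the confidence level widens the interval, which is the trade-off described above.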

But that's not all ...

## If One Is Good More Must Be Better

So, now we have a better estimate of when the halvening will happen. But we can still do 'better', keeping in mind that better here means understanding the range of possible dates, not getting a more precise but wrong answer.

Using the block time data for the last ~2 months, we previously fit a single exponential distribution and got a lambda value of 0.00176. But how confident are we in that? We know it is the 'best' fit, but could it reasonably have been slightly higher or slightly lower, and how would that affect our estimate?

Luckily, we have tools to help us get a better handle on lambda. Using PyMC3, we can build a Bayesian model and become fairly confident that lambda is in the range of 0.00173 to 0.0018, approximately normally distributed around our previous estimate.
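The PyMC3 model itself isn't shown here, so as a substitute this sketch uses a different, closed-form technique: with a Gamma prior on an exponential rate, the posterior is also Gamma (a conjugate update), so no MCMC sampling is needed. The Gamma(1, 600) prior and the synthetic data are assumptions for illustration. With n gaps the posterior standard deviation is roughly lambda / sqrt(n), so on the order of 9,000 blocks yields a 95% interval comparable in width to the range above.

```python
import math
import random

def exponential_rate_posterior(gaps_s, prior_shape=1.0, prior_rate=600.0):
    """Conjugate update for an exponential rate lambda (blocks per second).

    Prior: lambda ~ Gamma(prior_shape, rate=prior_rate). The default Gamma(1, 600)
    is a hypothetical weak prior worth about one 600 s (10 minute) block.
    Posterior: Gamma(prior_shape + n, rate=prior_rate + sum(gaps)).
    """
    return prior_shape + len(gaps_s), prior_rate + sum(gaps_s)

# Hypothetical data: synthetic gaps standing in for ~2 months of real block times.
rng = random.Random(3)
gaps = [rng.expovariate(1 / 566) for _ in range(9_000)]
shape, rate = exponential_rate_posterior(gaps)

post_mean = shape / rate
post_sd = math.sqrt(shape) / rate
# random.gammavariate takes a *scale* parameter, so pass 1 / rate.
draws = sorted(rng.gammavariate(shape, 1 / rate) for _ in range(20_000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"lambda ~ {post_mean:.5f} (sd {post_sd:.6f}), 95% interval {lo:.5f} to {hi:.5f}")
```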

We can now use those values to generate different exponential distributions (with parameters drawn from that normal distribution) which we can use to simulate many different scenarios to get a better understanding of the possible halvening times.
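A sketch of that two-level simulation: draw a rate, then draw the total time for 40,324 blocks at that rate. The rate standard deviation of 1.75e-5 is an assumption, chosen so that ±2 sd spans roughly 0.00173 to 0.0018; totals are drawn as Gamma variates for speed, which is equivalent to summing exponential delays.

```python
import random
import statistics

def simulate_with_rate_uncertainty(n_blocks, rate_mean, rate_sd, n_runs, seed=0):
    """Two-level Monte Carlo sketch.

    For each run: draw a rate from Normal(rate_mean, rate_sd), then draw the
    total time for n_blocks exponential delays at that rate. The total is drawn
    directly as Gamma(n_blocks, scale=1/rate), equivalent to summing n_blocks
    exponential samples but much faster.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        rate = rng.gauss(rate_mean, rate_sd)
        while rate <= 0:  # a normal draw could in principle be non-positive
            rate = rng.gauss(rate_mean, rate_sd)
        totals.append(rng.gammavariate(n_blocks, 1 / rate))
    return totals

# Hypothetical inputs: rate_sd = 0 reproduces the single fixed-lambda simulation.
fixed = simulate_with_rate_uncertainty(40_324, 0.00176, 0.0, n_runs=5_000, seed=1)
uncertain = simulate_with_rate_uncertainty(40_324, 0.00176, 1.75e-5, n_runs=5_000, seed=1)
for name, totals in (("fixed rate", fixed), ("uncertain rate", uncertain)):
    print(f"{name}: sd {statistics.stdev(totals) / 86_400:.2f} days")
```

Comparing the two printed spreads shows how much of the widening comes from uncertainty in lambda alone.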

| Statistic | Date (UTC) |
|---|---|
| min | 2020-04-18T12:19:05 |
| -2 sd | 2020-04-26T21:39:49 |
| -1 sd | 2020-04-29T18:21:09 |
| mean | 2020-05-02T15:02:30 |
| +1 sd | 2020-05-05T11:43:51 |
| +2 sd | 2020-05-08T08:25:12 |
| max | 2020-05-15T20:23:02 |

Now we see that our estimate is still centered around 2020-05-02, but the standard deviation is roughly twice as large as before (about 2.9 days versus 1.3 days), so we get a wider range. Our 95% confidence range is now 2020-04-26 to 2020-05-08. This is perhaps less satisfying in some ways, but it is often better to acknowledge and embrace our uncertainty than to go forward with a precise but inaccurate estimate.

## Conclusions

I hope you found this interesting, educational and useful, but remember it is not investment advice: do your own research, I'm not your lawyer, eat your vegetables, etc.

Don't hesitate to let me know if you have any comments, questions or ideas on how to improve this analysis.

Good luck with the party plans (and send me an invite :).

Thanks.