# Chapter 3 The Mathematics You Need to Know

## 3.1 Random variables

A *random variable* (https://en.wikipedia.org/wiki/Random_variable) is is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). As we’ll see shortly, random variables abound in daily fantasy sports.

A large part of the skill in DFS involves dealing with random variables. There isn’t room in this ebook for a complete discussion of probability theory, but there are certain parts a player absolutely must know to be successful.

### 3.1.1 Discrete random variables

There are two types of random variables, *discrete* and *continuous*. Discrete random variables usually represent one of a finite set of possibilities. For example, a roll of a pair of dice results in a total between 2 and 12.

A discrete random variable has a *probability mass function*, which specifies the probability for each of the possible outcomes. For example, for the pair of dice, the probability mass function is

probability(total = 2) = 1/36

probability(total = 3) = 2/36

probability(total = 4) = 3/36

probability(total = 5) = 4/36

probability(total = 6) = 5/36

probability(total = 7) = 6/36

probability(total = 8) = 5/36

probability(total = 9) = 4/36

probability(total = 10) = 3/36

probability(total = 11) = 2/36

probability(total = 12) = 1/36

### 3.1.2 Continuous random variables

A *continuous random variable* can take on any value, usually a real number. For example, the heights of NBA players measured in inches would specify a continuous random variable.

A continuous random variable has a probability density function. For example, the familiar standardized Gaussian “bell-shaped curve” (https://en.wikipedia.org/wiki/Normal_distribution) has the probability density function

\[N(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}}\]

### 3.1.3 Random variables we see in DFS

Some random variables we see in daily fantasy sports:

The number of fantasy points a player accrues in a game (continuous)

The total fantasy points a lineup scores in a contest (continuous)

The rank of a lineup among the entries in a contest (discrete)

Whether the lineup cashed or not: 1 if it did, 0 if it didn’t (discrete)

## 3.2 The Bernoulli and binomial distributions

The last entry in the list above - whether a lineup cashed or not - is an example of a *Bernoulli* random variable (https://en.wikipedia.org/wiki/Bernoulli_distribution). A Bernoulli random variable has two possible outcomes, which in games we usually refer to as “win” and “lose”.

To make calculations easier, we’ll use “1” for win and “0” for lose. The probability of a win is usually denoted by the letter *\(p\)*. The probability of a loss is usually denoted by the letter *\(q\)*; *\(p+q=1\)* and *\(q=1-p\)*.

Bernoulli variables aren’t very interesting; we wouldn’t just enter one lineup in one contest and walk away forever. So we need a random variable that models how many times we cash over a number of contests. And that’s a *binomial* random variable (https://en.wikipedia.org/wiki/Binomial_distribution).

A binomial random variable has an underlying Bernoulli random variable with parameters *\(p\)* and \(q\). We ask the question, “If we enter *\(N\)* contests, what’s the probability that we win none, one, two, and so on up to *\(N\)*?” And that’s the probability mass function for the binomial,

If we know *\(N\)* and we know *\(p\)*, we can compute the probability of winning exactly *\(k\)* contests out of *\(N\)* tries. That probability is

\[probability(wins=k)=\binom{N}{k}p^{k}q^{N-k}\]

where \(\binom{N}{k}\) is the number of combinations of *\(N\)* things taken *\(k\)* at a time. That’s interesting, but that doesn’t solve our problem. We know *\(N\)* - how many contests we entered. And we know *\(k\)* - how many of those contests we won. But we don’t know *\(p\)*. We need to know *\(p\)* to calculate expected values.

It turns out we can estimate *\(p\)* easily. The estimate of *\(p\)* is just

\[p_{est}=\frac{k}{N}\]

and

\[q_{est}=1-p_{est}\]

So if I entered 100 50/50 contests and cashed 60 of them, the estimate of *p* is 0.6 and the estimate of *q* is 0.4.

### 3.2.1 Confidence interval for *p*

Before we move on to expectations, there’s one more tool we’ll need. It turns out that not only can we estimate *\(p\)*, we can compute a *confidence interval* for *\(p\)* (https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval).

We want to say, “there’s a 95% probability that the value of *\(p\)* is between *\(p_{lower}\)* and *\(p_{upper}\)*”. As the Wikipedia article above notes, there are a number of options for doing this and all have certain limitations. For our purposes, the simplest one that we can copy and paste into a spreadsheet will do (https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Normal_approximation_interval). In the equations, *p _{est}* is the estimate of

*p*we computed above and

*q*.

_{est}= 1 - p_{est}\[p_{lower}=p_{est}-1.96\sqrt{\frac{1}{N}\cdot p_{est}\cdot q_{est}}\] \[p_{upper}=p_{est}+1.96\sqrt{\frac{1}{N}\cdot p_{est}\cdot q_{est}}\]

## 3.3 Expectations

Now that we have an estimate and a confidence interval for \(p\), we can estimate how much we expect to win or lose per dollar of entry fees. For a $1 50/50, we pay a dollar to enter. If we win, we get $1.80 back, so we win $0.80. If we lose, we lose the dollar. The estimated expectation per dollar is

\[EV_{est}=p_{est}\cdot0.8-q_{est}\]

In general, if *\(F\)* is the entry fee in dollars and *\(C\)* is the cash paid for a win in dollars, then

The winnings

*\(W\)*per dollar is*\((C-F)/F\)*The loss

*\(L\)*per dollar is \(F/F=1\)The win/loss ratio

*\(wlratio\)*is \(\frac{W}{L}=\frac{C-F}{F}\)

and the estimated expectation \(EV_{est}\) is

\[EV_{est}=p_{est}\cdot wlratio-q_{est}\]

with confidence interval

\[EV_{lower}=p_{lower}\cdot wlratio-q_{lower}\] \[EV_{upper}=p_{upper}\cdot wlratio-q_{upper}\]

In the spreadsheets, we’ll do this calculation for *\(p_{est}\)*, *\(p_{lower}\)* and \(p_{upper}\), generating a 95% confidence interval for \(EV\).

## 3.4 Unfavorable, fair and favorable games

We say a game is *unfavorable* if its \(EV\) is less than zero. We say it’s *fair* if it’s exactly zero and *favorable* if it’s greater than zero \[</span>@Epstein2014, chapter 3<span>\]. When we say “unfavorable”, this is what we mean: if we keep playing this game against *anyone*, eventually we’ll lose all our money to them.

In DFS, if \(EV\) is less than zero, our bankroll will eventually get wiped out, because we’re playing against both the site (the rake) and the other contestants. If the game is fair - our \(EV\) is exactly zero - in theory we can keep playing indefinitely and not get wiped out, but we also won’t grow our bankroll.

*So if we want to keep playing, and make money, we need a positive expectation - a favorable game.* In DFS, the only way we can do that in the absence of overlays is to have lineups that outscore enough of our competitors to cover the rake with our winnings.

In the following, to keep the calculations simple, we are going to limit ourselves to three types of contests with simple payout structures:

FanDuel 50/50s,

DraftKings 50/50s, and

DraftKings Triple-Ups.

Note that we will *not* be dealing with head-to-head contests! Why? Two reasons:

Opponent research takes too much time, and

The sample size is too small. In a head-to-head, we only get information about how our lineup stacks up against

*one*contestant out of the thousands who enter the contests.

For a 50/50, either FanDuel or DraftKings, \(wlratio\) is 0.8. For a DraftKings Triple-Up, \(wlratio\) is 2.0. We won’t be looking at FanDuel Triple-Ups because they’re multi-entry, which makes the math more complicated.