# Chapter 3 The Mathematics You Need to Know

## 3.1 Random variables

A *random variable* (https://en.wikipedia.org/wiki/Random_variable) is
a variable whose value is subject to variations due to chance
(i.e. randomness, in a mathematical sense). As we’ll see shortly, random
variables abound in daily fantasy sports.

A large part of the skill in DFS involves dealing with random variables. There isn’t room in this ebook for a complete discussion of probability theory, but there are certain parts a player absolutely must know to be successful.

### 3.1.1 Discrete random variables

There are two types of random variables, *discrete* and *continuous*.
Discrete random variables usually represent one of a finite set of
possibilities. For example, a roll of a pair of dice results in a total
between 2 and 12.

A discrete random variable has a *probability mass function*, which
specifies the probability for each of the possible outcomes. For
example, for the pair of dice, the probability mass function is

probability(total = 2) = 1/36

probability(total = 3) = 2/36

probability(total = 4) = 3/36

probability(total = 5) = 4/36

probability(total = 6) = 5/36

probability(total = 7) = 6/36

probability(total = 8) = 5/36

probability(total = 9) = 4/36

probability(total = 10) = 3/36

probability(total = 11) = 2/36

probability(total = 12) = 1/36
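The table above can be reproduced by brute force: list all 36 equally likely outcomes for a pair of fair six-sided dice and count how many give each total. The sketch below is only an illustration; the function name `two_dice_pmf` is ours, and it assumes both dice are fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """Return {total: probability} for the sum of two fair six-sided dice."""
    # Enumerate all 6 * 6 = 36 equally likely (die1, die2) outcomes.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(counts[total], 36) for total in sorted(counts)}


pmf = two_dice_pmf()
for total, prob in pmf.items():
    # Fraction reduces automatically, so 2/36 prints as 1/18.
    print(f"probability(total = {total}) = {prob}")

# A probability mass function must sum to exactly 1.
print("sum of probabilities:", sum(pmf.values()))
```

Using `Fraction` keeps the arithmetic exact, so the check that the probabilities sum to 1 has no rounding error.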

### 3.1.2 Continuous random variables

A *continuous random variable* can take on any value in a range of real
numbers. For example, the height of an NBA player measured in inches is
a continuous random variable.

A continuous random variable has a probability density function. For example, the familiar standardized Gaussian “bell-shaped curve” (https://en.wikipedia.org/wiki/Normal_distribution) has the probability density function

\[N(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}}\]
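A density is not itself a probability; probabilities come from the area under the curve between two points. As a rough sketch, the code below evaluates \(N(x)\) and uses the standard library's error function to get the area between two values. The function names are ours, chosen for illustration.

```python
import math


def standard_normal_pdf(x):
    """Density N(x) of the standardized Gaussian at x."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def standard_normal_prob(a, b):
    """Probability that a standardized Gaussian falls between a and b."""
    # Area under N(x) from a to b, written in terms of the error function.
    return 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))


print(standard_normal_pdf(0.0))           # height of the curve at its peak
print(standard_normal_prob(-1.96, 1.96))  # area within 1.96 of the center
```

The second value comes out very close to 0.95, which is why 1.96 is the usual multiplier for 95% intervals based on the Gaussian.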

### 3.1.3 Random variables we see in DFS

Some random variables we see in daily fantasy sports:

- The number of fantasy points a player accrues in a game (continuous)
- The total fantasy points a lineup scores in a contest (continuous)
- The rank of a lineup among the entries in a contest (discrete)
- Whether the lineup cashed or not: 1 if it did, 0 if it didn’t (discrete)

## 3.2 The Bernoulli and binomial distributions

The last entry in the list above - whether a lineup cashed or not - is
an example of a *Bernoulli* random variable
(https://en.wikipedia.org/wiki/Bernoulli_distribution). A Bernoulli
random variable has two possible outcomes, which in games we usually
refer to as “win” and “lose”.

To make calculations easier, we’ll use “1” for win and “0” for lose. The
probability of a win is usually denoted by the letter *\(p\)*. The
probability of a loss is usually denoted by the letter *\(q\)*; *\(p+q=1\)*
and *\(q=1-p\)*.

Bernoulli variables aren’t very interesting; we wouldn’t just enter one
lineup in one contest and walk away forever. So we need a random
variable that models how many times we cash over a number of contests.
And that’s a *binomial* random variable
(https://en.wikipedia.org/wiki/Binomial_distribution).

A binomial random variable has an underlying Bernoulli random variable
with parameters *\(p\)* and \(q\). We ask the question, “If we enter *\(N\)*
contests, what’s the probability that we win none, one, two, and so on
up to *\(N\)*?” The probability mass function for the binomial answers that question.

If we know *\(N\)* and we know *\(p\)*, we can compute the probability of
winning exactly *\(k\)* contests out of *\(N\)* tries. That probability is

\[probability(wins=k)=\binom{N}{k}p^{k}q^{N-k}\]

where \(\binom{N}{k}\) is the number of combinations of *\(N\)* things taken
*\(k\)* at a time. That’s interesting, but that doesn’t solve our problem.
We know *\(N\)* - how many contests we entered. And we know *\(k\)* - how
many of those contests we won. But we don’t know *\(p\)*. We need to know
*\(p\)* to calculate expected values.
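To make the binomial formula concrete, here is a minimal sketch that evaluates it directly. The function name `binomial_pmf` and the example numbers (10 contests, \(p=0.5\)) are ours and purely illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n contests, each won with probability p."""
    q = 1.0 - p
    # comb(n, k) is the number of combinations of n things taken k at a time.
    return comb(n, k) * p**k * q**(n - k)


n, p = 10, 0.5
for k in range(n + 1):
    print(f"probability(wins = {k}) = {binomial_pmf(k, n, p):.4f}")
```

With these inputs, the most likely outcome is 5 wins, at a probability of about 0.246, and the eleven probabilities sum to 1.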

It turns out we can estimate *\(p\)* easily. The estimate of *\(p\)* is just

\[p_{est}=\frac{k}{N}\]

and

\[q_{est}=1-p_{est}\]

So if I entered 100 50/50 contests and cashed 60 of them, the estimate
of *p* is 0.6 and the estimate of *q* is 0.4.

### 3.2.1 Confidence interval for *p*

Before we move on to expectations, there’s one more tool we’ll need. It
turns out that not only can we estimate *\(p\)*, we can compute a
*confidence interval* for *\(p\)*
(https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval).

We want to say, “there’s a 95% probability that the value of *\(p\)* is
between *\(p_{lower}\)* and *\(p_{upper}\)*”. As the Wikipedia article above
notes, there are a number of options for doing this and all have certain
limitations. For our purposes, the simplest one that we can copy and
paste into a spreadsheet will do
(https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Normal_approximation_interval).
In the equations, *\(p_{est}\)* is the estimate of *\(p\)* we computed above and *\(q_{est}=1-p_{est}\)*.

\[p_{lower}=p_{est}-1.96\sqrt{\frac{1}{N}\cdot p_{est}\cdot q_{est}}\]

\[p_{upper}=p_{est}+1.96\sqrt{\frac{1}{N}\cdot p_{est}\cdot q_{est}}\]
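The same calculation is easy to sketch outside a spreadsheet. The code below implements the normal approximation interval for the example of 60 cashes in 100 contests; the function name is ours, and like the formula it works best when \(N\) is reasonably large and \(p_{est}\) is not close to 0 or 1.

```python
import math


def normal_approx_interval(wins, contests, z=1.96):
    """Return (p_lower, p_est, p_upper) using the normal approximation.

    z = 1.96 corresponds to a 95% interval.
    """
    p_est = wins / contests
    q_est = 1.0 - p_est
    half_width = z * math.sqrt(p_est * q_est / contests)
    return p_est - half_width, p_est, p_est + half_width


p_lower, p_est, p_upper = normal_approx_interval(60, 100)
print(f"p_lower = {p_lower:.4f}, p_est = {p_est:.4f}, p_upper = {p_upper:.4f}")
```

For 60 cashes in 100 contests, the interval runs from roughly 0.504 to 0.696.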

## 3.3 Expectations

Now that we have an estimate and a confidence interval for \(p\), we can estimate how much we expect to win or lose per dollar of entry fees. For a $1 50/50, we pay a dollar to enter. If we win, we get $1.80 back, so we win $0.80. If we lose, we lose the dollar. The estimated expectation per dollar is

\[EV_{est}=p_{est}\cdot0.8-q_{est}\]

In general, if *\(F\)* is the entry fee in dollars and *\(C\)* is the cash
paid for a win in dollars, then

- The winnings *\(W\)* per dollar is *\((C-F)/F\)*
- The loss *\(L\)* per dollar is \(F/F=1\)
- The win/loss ratio *\(wlratio\)* is \(\frac{W}{L}=\frac{C-F}{F}\)

and the estimated expectation \(EV_{est}\) is

\[EV_{est}=p_{est}\cdot wlratio-q_{est}\]

with confidence interval

\[EV_{lower}=p_{lower}\cdot wlratio-q_{lower}\]

\[EV_{upper}=p_{upper}\cdot wlratio-q_{upper}\]

where \(q_{lower}=1-p_{lower}\) and \(q_{upper}=1-p_{upper}\).

In the spreadsheets, we’ll do this calculation for *\(p_{est}\)*,
*\(p_{lower}\)* and \(p_{upper}\), generating a 95% confidence interval for
\(EV\).
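As a sketch of that calculation, the code below carries the running example (60 cashes in 100 $1 50/50 contests paying $1.80) through to an expectation and its interval. The function names are ours; the formulas are the ones given in this section and the previous one.

```python
import math


def wl_ratio(entry_fee, cash_paid):
    """Win/loss ratio: net winnings per dollar of entry fee, (C - F) / F."""
    return (cash_paid - entry_fee) / entry_fee


def expected_value(p, wlratio):
    """Expectation per dollar of entry fees: p * wlratio - q, with q = 1 - p."""
    return p * wlratio - (1.0 - p)


def ev_with_interval(wins, contests, wlratio, z=1.96):
    """Return (EV_lower, EV_est, EV_upper) from the normal approximation for p."""
    p_est = wins / contests
    half_width = z * math.sqrt(p_est * (1.0 - p_est) / contests)
    p_values = (p_est - half_width, p_est, p_est + half_width)
    return tuple(expected_value(p, wlratio) for p in p_values)


ratio = wl_ratio(entry_fee=1.00, cash_paid=1.80)
ev_lower, ev_est, ev_upper = ev_with_interval(60, 100, ratio)
print(f"wlratio = {ratio:.2f}")
print(f"EV_lower = {ev_lower:.4f}, EV_est = {ev_est:.4f}, EV_upper = {ev_upper:.4f}")
```

With these inputs, the estimated expectation is about 8 cents per dollar, but the interval runs from roughly -0.09 to +0.25. Even a 60% cash rate over 100 contests does not rule out a negative expectation.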

## 3.4 Unfavorable, fair and favorable games

We say a game is *unfavorable* if its \(EV\) is less than zero. We say
it’s *fair* if it’s exactly zero and *favorable* if it’s greater than
zero (Epstein 2014, chap. 3). When we say
“unfavorable”, this is what we mean: if we keep playing this game
against *anyone*, eventually we’ll lose all our money to them.

In DFS, if \(EV\) is less than zero, our bankroll will eventually get wiped out, because we’re playing against both the site (the rake) and the other contestants. If the game is fair - our \(EV\) is exactly zero - in theory we can keep playing indefinitely and not get wiped out, but we also won’t grow our bankroll.

*So if we want to keep playing, and make money, we need a positive
expectation - a favorable game.* In DFS, the only way we can do that in
the absence of overlays is to have lineups that outscore enough of our
competitors to cover the rake with our winnings.

In the following, to keep the calculations simple, we are going to limit ourselves to three types of contests with simple payout structures:

- FanDuel 50/50s,
- DraftKings 50/50s, and
- DraftKings Triple-Ups.

Note that we will *not* be dealing with head-to-head contests! Why? Two
reasons:

1. Opponent research takes too much time, and
2. The sample size is too small. In a head-to-head, we only get information about how our lineup stacks up against *one* contestant out of the thousands who enter the contests.

For a 50/50, either FanDuel or DraftKings, \(wlratio\) is 0.8. For a DraftKings Triple-Up, \(wlratio\) is 2.0. We won’t be looking at FanDuel Triple-Ups because they’re multi-entry, which makes the math more complicated.
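One consequence of these ratios is worth spelling out. Setting \(EV=p\cdot wlratio-(1-p)\) to zero and solving for \(p\) gives the break-even win probability \(p=1/(1+wlratio)\). That formula is derived here from the definition of a fair game rather than quoted from the text. The sketch below applies it to the three contest types; the function names are ours.

```python
def break_even_p(wlratio):
    """Win probability at which EV = p * wlratio - (1 - p) equals zero."""
    return 1.0 / (1.0 + wlratio)


def classify_game(p, wlratio):
    """Label a game unfavorable, fair, or favorable from the sign of its EV."""
    ev = p * wlratio - (1.0 - p)
    if ev > 0:
        return "favorable"
    if ev < 0:
        return "unfavorable"
    return "fair"


# Win/loss ratios as stated in the text above.
contests = {
    "FanDuel 50/50": 0.8,
    "DraftKings 50/50": 0.8,
    "DraftKings Triple-Up": 2.0,
}

for name, ratio in contests.items():
    print(f"{name}: break-even p = {break_even_p(ratio):.4f}")

print(classify_game(0.60, 0.8))  # cashing 60% of 50/50s
print(classify_game(0.50, 0.8))  # cashing exactly half of 50/50s
```

Under these assumptions, a 50/50 needs a cash rate above about 55.6% to be favorable, and a Triple-Up needs one above one third. Cashing exactly half of our 50/50s is an unfavorable game because of the rake.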

### References

Epstein, R.A. 2014. *The Theory of Gambling and Statistical Logic, Revised Edition*. Elsevier Science.