# Ballot numbers

In this post I want to present the ballot numbers, what are they? well, as you may alredy guessed, they are related to elections somehow. How exactly? that’s what we’re going to find out. Let’s start with a quick warm-up:

### Warming up

“In an election with two candidates, thousand people participate. What is the probability that the election will end in a tie?” Think about what the answer should be – maybe it’s about 1 / 1000, or maybe 1 / 10 or 1 / 10,000, or maybe even none of those? Pick some number that you believe is the answer, and we’ll see right away if you intuition is good or not.

#### Finding the solution

Ok, before we’ll solve it, let’s understand what we are asked to do. We want to find the probability of an event. What is the event? that the election will end in tie, that is, both candidates will recieve 500 (=1000 / 2) votes. To get the probability we need to find the number of possible ways for the election to end up in a tie and divide it by the number off all the possible events:

P=\frac{\#\text{ both recieved 500 votes}}{\#\text{ all of the possible events}}

Great, now that we understand what we need to do, let’s solve the problem:

• Note: we assume that every participant can only vote once. Moreover, we assume each one of the participant is making a choice randomly, so no candidate has any sort of priority.

For the elections to end in a tie we need that both candidates will recieve 500 votes. But this is equivalent to one of them recieve 500 votes (since there are total of 1000 votes) How many events are there?

So there are 1000 participants, and we need to choose 500 people from them that will vote to the same candidate – every choice yields different event. Thus the answer is:

1000\choose500

Nice, so now we only need to find out how many events are there. Well, there are 1000 participants, each one of them of them has 2 options. We can line up the people in a row and label them – “the first participant, the second participant….” Can you recognize what kind of a selection is that?

This is a situation of a choice where order matters and repitions are allowed – there are:

2^{1000}

different events. Great, now we can find the probability:

P=\frac{1000\choose500}{2^{1000}}\thickapprox 0.025=\frac{1}{40}

What about that? The odds are not that low… that’s pretty cool right?

Ok, it’s time to move to a bit more complicated question:

Now I’m asking a different question. Suppose that the results of the elections were:

501:499

(Poor loser, he almost won it!) The question is, what are the chances that the winner was strictly ahead of the loser throughout the count?

For example, if there were only 6 participants and the result was:

AABAAB

Then A was strictly ahead throughout the elections. After one vote he led 1:0, after 2, he led 2:0. After three votes, he still led with 2:1, after the forth, 3:1, then 4:1 and finally 4:2.

However, if the sequence were:

ABABAA

Then A didn’t strictly led – after two votes there was a tie.

Ok, I’ll solve this problem for the general case – participant A recieved $a$ votes, participant B recieved $b$ votes, and $a>b$.

So in total, there were $a+b$ participants, and the number of events that participant A got $a$ votes (which is the same number as the number of events that participant B got $b$ votes) is:

a+b\choose a

Let’s denote by $S$ the set of all the possible sequences where participant A strictly led. Then the probabilty is:

P=\frac{|S|}{a+b\choose a}

The only thing left to figure out is the size of the set.

### The lattice

To find the size of $S$ we will use a geometric description of the problem – (don’t worry though, I won’t ask you to calculate the area of a triangle or something like that…).

I am going to look at the lattice $\mathbb{Z}^2$, which is the collection of all the points in the plane with integer coordinates:

\mathbb{Z}^2=\{(x,y):x,y\in\mathbb{Z}\}

We are going to look at lattice paths that starts at $(0,0)$ and ends at $(a,b)$ where we can only move on step right or up at a time. If we move right, we add one for the count of A, if we go up, we add one for the count of B. For exapmle:

Here the first vote was for A, then B got one vote, after that, A got 2 in a row, then B got two in a row, and finally A got one. Hence, this path descirbes the sequence:

ABAABBA

What path are we interested in? well our paths are exactly the ones that go only through points $(x,y)$ with $x>y$, since if $x\leq y$ then in that specific position, B led or they were tied.

In other words, we are intersted in paths that are always under the line $y=x$ and they never ‘touch’ the line:

So we need to count how many paths are there such as green one.

As it turns out, it’s easier to count the number of paths such as the red one, so we’ll do that (to get the number of the desired path we just substract the number of bad paths from the total number of paths)

There are two types of bad paths:

### Type one – bad from the beginning

Those are the paths that first go up to (0,1) – That is, B got the first vote, so it is obviously a bad one.

Here’s the trick: The number of such path is exactly the number of paths from $(0,1)$ to $(a,b)$. In those paths we need to walk $a$ steps right and $b-1$ steps up. Or, in an equivalent way, we need to choose $a$ times to go right out of $a+(b-1)$ choices. Thus, the number of such paths is:

a+b-1\choose a

An important remark: Note that such path will always intersect the line $y=x$. Why? since by out assumption $a>b$, thus the point $(a,b)$ is under the line $y=x$. Remember this fact, since it’s the key for counting the second type!

### Type 2 – starts good, but turns out to be bad

Now A has the lead first, but B may take the the lead later on, or they might be tied at some point.

We are going to do another trick here – sometehing a bit more complicated than what we’ve done in the first type of bad paths. Consider this path:

This is a classic example of bad path of the second type. I am now going to consider only the part of the path from the origin until it’s first intersection with the line $y=x$ and reflect this part along the line. This action yields a new path:

Would you look at that! This action yield a path from the first type. So we’ve found a way to translate paths of the second type to paths of the first type in a unique way – different second type paths yield different first type paths.

Is there a way to reverse this process? If I’ll give you a path of the first type and I want you to translate it to path of the second type, can you do it?

Well it depeneds on a simple fact – does every line of the first type intersects the line $y=x$? If it is, then of course we can reverse the action, we just do the same action – we consider only the part of the path from the origin until it’s first intersection with the line $y=x$ and reflect this part along the line.

Luckily, every line of the first type intersects the line $y=x$. That’s what I explained in the remark!

Great so this correspondence between the first type and the second type is one-to-one and onto. That is, we match every first type path to it’s unique second type path and vice versa.

Therefore, the number of second type path is exactly the same as paths of the same type, which is:

a+b-1\choose a

So in total, we have:

{a+b-1\choose a} +{a+b-1\choose a}=2{a+b-1\choose a}

Bad paths, hence, the number of good paths is:

|S|=\#\text{ total amount of paths} - \text{\# bad paths} = {a+b\choose a}-2{a+b-1\choose a}

Let’s simplify it (recall that ${x\choose y}=\frac{x!}{y!(x-y)!}$):

|S|= {a+b\choose a}-2{a+b-1\choose a}=\frac{(a+b)!}{a!b!}-2\frac{(a+b-1)!}{a!(b-1)!}  \\
= \frac{(a+b)!}{a!b!} (1-2\cdot \frac{b}{a+b})={a+b\choose b} (\frac{a+b-2b}{a+b})=\frac{a-b}{a+b}\cdot{a+b\choose b}

Ok, this is actually looking pretty nice, and numbers of such form have a name – they are called the ballot numbers . Now let’s calculate the probability:

P=\frac{|S|}{a+b\choose a}=\frac{\frac{a-b}{a+b}\cdot{a+b\choose b} }{a+b\choose a}=\frac{a-b}{a+b}

Wow! This is such a simple expression! How great is that?

If we substitute $a=501, b=499$ we’ll get that:

P=\frac{501-499}{501+499}=\frac{2}{1000}=0.002

And that’s a pretty low chance… So even though B lost the election, he probably led or was tied with A at some point of the election, so, good for him.

## What if we allow a tie?

Well, the process is pretty similar, now instead of the requierment of $x>y$ we only require that $x+1 > y$ (which is equivalent to [$x=y$ or $x> y$])

The case is really similar, however there is only one type of bad paths since from the origin you can't go above the line $y=x+1$ after one step.

The reflection of such bad lines along the line starts at $(-1,1)$ and ends, of course, at $(a,b)$. So from $(-1,1)$ we have to move $a+1$ steps right and $b-1$ steps up. This gives us a total of $a+b$ steps where we choose $a+1$ of them to be steps to the right. Therefore, the number of bad paths is:

a+b\choose a+1

And the total number of paths stays the same as before. Thus, the size of the set of the good paths is:

|S|=\#\text{ total amount of paths} - \text{\# bad paths} = {a+b\choose a}-{a+b\choose a+1}
=\frac{(a+b)!}{a!b!}-\frac{(a+b)!}{(a+1)!(b-1)!}=\frac{(a+b)!}{a!b!}(1-\frac{b}{a+1})
=\frac{(a+b)!}{a!b!}(\frac{a+1-b}{a+1})=\frac{a+1-b}{a+1}\cdot{a+b\choose a}

So, the probabilty is:

P=\frac{\frac{a+1-b}{a+1}\cdot{a+b\choose a}}{{a+b\choose a}}=\frac{a+1-b}{a+1}

Cool. If $a=501,b=499$ we get:

P=\frac{501+1-499}{501+1}=\frac{3}{502}\thickapprox 0.006

## Summary

What do you think about this solution? I think it's a really elegant one - we represent the question in a geomteric way and find a really nice solution with almost no effort!

I just want to mention one more thing before I'll end this post: Note that if $a=b=n$ where we allow tie (of course we'll allow, the result is a tie), we'll get that the number of events where A strictly led / was tied with B is:

\frac{n+1-n}{n+1}\cdot{n+n\choose n}=\frac{1}{n+1}\cdot{2n\choose n}

This number probably won't say anything to you now, but you will not believe how useful is this number. it is called the $n^{th}$ catalan number - $C_n$.

Why exactly is it so useful? We'll find out on the next post...