# Kirchhoff’s theorem

In the last post we’ve seen that the number of spanning trees in the complete graph $K_n$ is $n^{n-2}$. That’s a great result, however, I am not satisfied yet…

So we know how many spanning trees $K_n$ has – big deal! Why restrict ourseleves only to $K_n$? wouldn’t it be better to find the number of spanning trees of any graph? Of course it will, and that’s what we will do today.

With the help of Kirchhoff’s theorem, by the end of the post, we will be able to find the number of spanning trees in any graph.

This is not going to be so easy though… Linear algebra is going to be involved here! However, I find the proof that I am going to present here marvelous, and shows how such an unexpected things (such as *spoiler*: determinants) are going to be exactly what we were looking for.

So let’s not spend any more time and dive right in:

## Incidence Matrix & The Laplacian

If you recall, in the second post where I loaded tons of definitions, one of the definition of an incidence matrix. I have defined the incidence matrix $M(G)$ of a graph, as a $V\times E$ matrix where $V$ is the number of vertices and $E$ is the number of edges.

It was defined as:

M_{v,e}=\begin{cases}
2 & e\text{\text{ is a loop incident to }} v\\
1 & e\text{ is incident to }v\text{ but not a loop}\\
0 & \text{else}
\end{cases}

I am going to deal only with simple graphs (graphs with no loops), therefore, we can narrow down the options to get:

M_{v,e}=\begin{cases}

1 & e\text{ is incident to }v\\
0 & \text{else}
\end{cases}

Great, I am now going to generalize the definition since I will deal with a directed graph. . The new definition is:

M_{v,e}=\begin{cases}

1 & e\text{ leaves }v\\
-1 & e\text{ enters }v\\
0 & \text{else}
\end{cases}

Let’s see an example:

For this graph, the corresponding incidence matrix is:

\begin{array}{c}
v_{1}\\
v_{2}\\
v_{3}\\
v_{4}\\
v_{5}
\end{array}\overset{\begin{array}{cccc}
e_{1} & e_{2} & e_{3} & e_{4}\end{array}}{\begin{pmatrix}1 & 0 & 0 & 0\\
1 & 1 & 0 & 0\\
0 & 1 & 1 & 1\\
0 & 0 & 0 & 1\\
0 & 0 & 1 & 0
\end{pmatrix}}

Now Let’s add some directions:

Cool, now the incidence matrix is:

\begin{array}{c}
v_{1}\\
v_{2}\\
v_{3}\\
v_{4}\\
v_{5}
\end{array}\overset{\begin{array}{cccc}
e_{1}\  & \ e_{2}  & \ \ \ \ e_{3} & \ \ e_{4}\end{array}}{\begin{pmatrix}1 & 0 & 0 & 0\\
-1 & 1 & 0 & 0\\
0 & -1 & -1 & 1\\
0 & 0 & 0 & -1\\
0 & 0 & 1 & 0
\end{pmatrix}}

(Notice how the sum of each column is zero).

#### Meet the Laplacian

There is something that kind of bugs me in this matrix – it is not a square matrix, however, we can easilly create one from it, we can define:

C=M\cdot M^T \in M_{V\times V}(F)

Not only this matrix is squared, it is also symmetric. Let’s try to calculate it’s entries:

C_{vv}= (M\cdot M^T)_{vv}=\sum_{e}M_{ve}M^T_{ev}

This is just the definition of a matrix multiplication (recall that the columns represent the edges of the graph). Now what is $M^T_{ev}$? This is exactly $M_{ev}$. Therefore we get:

C_{vv}=\sum_{e}M_{ve}\cdot M_{ve}=\sum_e(M_{ve})^2

Now there are two cases:

• $e$ enters / leaves $v$, thus, $M_{ve}=\pm 1$ which implies that $(M_{ve})^2=1$.
• Otherwise, $M_{ve}=0$.

Notice what we have just found out: Every edge $e$ that is incident to $v$ increases $C_{vv}$ by one. Therefore, $C_{vv}$ is exactly the degree of $v$.

That’s really nice! Let’s see the general case now:

C_{vw}= (M\cdot M^T)_{vw}=\sum_{e}M_{ve}M^T_{ew}=\sum_{e}M_{ve}M_{we}

If $e$ is incident to both $v$ and $w$, then $M_{ve}=1,M_{we}=-1$ or $M_{ve}=-1,M_{we}=1$. Either way, their product is $-1$. Since the graph is simple, only one edge $e$ can be incident to both $v$ and $u$ – so we get that $C_{vw} = -1$.

On the other hand, $e$ is not incident to $v$ (then $M_{ve}=0)$) or it is not incident to $w$ (then $M_{ve}=0)$). Either way, their product is $0$.

Let’s summarize what we have got so far:

C_{vw}=\begin{cases}
\deg(v) & v=w\\
-1 & v\text{ is adjacent to}\\
0 & \text{else}
\end{cases}w

This matrix has a special name – It is called The Laplacian of $G$, and denoted by $L(G)$.

For example, in our graph:

The laplacian is:

L=\begin{array}{c}
v_{1}\\
v_{2}\\
v_{3}\\
v_{4}\\
v_{5}
\end{array}\overset{\begin{array}{ccccc}
v_{1}\ \ \  & v_{2}\ \ \  & v_{3}\ \ \  & v_{4}\ \  & v_{5}\end{array}}{\left(\begin{array}{ccccc}
1 & -1 & 0 & 0 & 0\\
-1 & 2 & -1 & 0 & 0\\
0 & -1 & 3 & -1 & -1\\
0 & 0 & -1 & 1 & 0\\
0 & 0 & -1 & 0 & 1
\end{array}\right)}

Great, we are now ready to present the theorem:

## The theorem

Suppose that $G$ is a conncected graph with $n$ vertices, then the number of spanning trees of G is:

\det (L(G)^{(i)})

For every $1\leq i \leq n$. Where $L(G)^{(i)}$ is a $n-1 \times n-1$ matrix, that obtained from deleting the $i^{th}$ row and the $i^{th}$ column from $L(G)$.

Pause here for a second to process what’s going on here. This is really, really not-trivial! What does a determinant has to do with spanning trees???

To summarize it before I’ll start proving the theorem – I’ll just say that I didn’t see it coming at all!

## The proof

The proof is going to be in parts, first I need to prove a theorem from linear algebra that is not so popular – most of the students won’t see it in a linear algebra class…

Then The proof statrs, I am going to use the theorem from linear algebra right away and then, magic happens!

Let’s do it:

### Cauchy–Binet formula

Suppose that $P\in M_{n\times m}, Q\in M_{m\times n}$. Thier product, $PQ\in M_{n\times n}$ is a square matrix, thus, it has a detrminant.

Cauchy–Binet formula states that:

\det(PQ)=\sum_{1\leq j_1<\dots< j_n\leq m}\det(P_{j_1j_2\dots j_n})\cdot\det(Q_{j_1j_2\dots j_n})

Wow, hold up for a second, this seems really complicated. Let’s explain it step-by-step.

The matrix $P_{j_1j_2\dots j_n}$ is just the $n\times n$ matrix:

\left(\begin{array}{cccc}
| & | &  & |\\
C_{j_{1}}(P) & C_{j_{2}}(P) & \dots & C_{j_{n}}(P)\\
| & | &  & |
\end{array}\right)

( $C_i(P)$ is the $i^{th}$ column of $P$)

Moreover, $Q_{j_1j_2\dots j_n}$is just the $n\times n$ matrix:

\left(\begin{array}{ccc}
- & R_{j_{1}}(Q) & -\\
- & R_{j_{2}}(Q) & -\\
& \vdots\\
- & R_{j_{n}}(Q) & -
\end{array}\right)

( $R_i(Q)$ is the $i^{th}$ row of $Q$)

What this formula says, is that in order to calculate the dertminant of $PQ$ you have to go over all the ways to pick $n$ ordered, different columns – $j_1,j_2,\dots j_k$, calculate the determinant of the sub-matrices $P_{j_1j_2\dots j_n},Q_{j_1j_2\dots j_n}$. Then, calculate thier product, and sum everything up.

Notice that if $n=m$ then there is only one way to choose $n$ ordered, different columns: $j_1=1,j_2=2\dots$ moreover, $P_{123...n}=P,Q_{123...n}=Q$. Thus we get the familiar identity:

\det(PQ)=\det (P)\cdot \det(Q)

Ok, let’s prove it – If you feel like the proof is getting kind of heavy, you can skip it and move on to the proof of the main theorem. You will still be able to understand the proof (you just won’t be able to justify the use of this formula) :

#### Proving the formula

By the definition of a determinant:

\det(PQ)=\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(PQ)_{i\sigma(i)}

Using matrix multiplication to get:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(PQ)_{i\sigma(i)}=\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})

Let’s focus on the product:

\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=(\sum_{j=1}^m P_{1j}\cdot Q_{j\sigma(1)})\cdot(\sum_{j=1}^m P_{2j}\cdot Q_{j\sigma(2)})\cdots(\sum_{j=1}^m P_{nj}\cdot Q_{j\sigma(n)})

The product has $n$ factors. Each factor is made up of a sum of $m$ elements. If we will simplify the product, we will get a sum made of expressions of the form:

P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}

Where $1\leq j_1,\dots, j_n \leq m$ (they can be the same).

##### Quick example

For example, if $m=3$, $n=2$, $j_1=3,j_2=1$ then and $\sigma(1)=2,\sigma(2)=1$:

P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}=P_{13}Q_{32}\cdot P_{22}Q_{21}

Is an element of the sum.

##### Back to the proof

From every element of the sum I can ‘pick’ whatever $1\leq j\leq m$ that I want. Therefore, the product is actually a sum, made from all the possible expressions $P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}$. We now have:

\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}
=\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

It may seem scary and all, but if you understand it – you realize that this is actually quite elegant.

We can now apply this expression in the determinant formula to get:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=
\sum_{\sigma\in S_n}\text{sgn}(\sigma)\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

The sums are finite – so there is no problem to change to order of summation. In addition, the expression $P_{1j_1}P_{2j_2}\cdots P_{nj_n}$ is independent of $\sigma$. Therefore, we get that the expression in the above equals to:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ \sum_{\sigma\in S_n}\text{sgn}(\sigma)\cdot Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

Notice that by definition:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\cdot Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}=\det(Q_{j_1j_2\dots j_n})

Therefore:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \det(Q_{j_1j_2\dots j_n})

Now, if $j_{k_1}=j_{k_2}$ for some $1\leq k_1,k_2\leq n$, then $Q_{j_1j_2\dots j_n}$ has two identical columns, which implies that it’s determinant is 0. If $n> m$, by the pigeonhole principle, there are always such $k_1,k_2$. Therefore, the sum in the above, which is the determinant, is zero.

That makes sense – If $n > m$ then $P,Q$ are at most of degree $m$ (why?) then so as their product. Thus, their product is not invertible (it’s rank is smaller than $n$). And that implies 0 determinant.

So from now on we shall assume that $n\leq m$.

##### Kind of a tricky part

From the argument in the above we can conclude that the sum is:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \det(Q_{j_1j_2\dots j_n})\ \ :( j_1\neq j_2\neq\dots\neq j_n)

(Again, if two j_k’s are the same, the detrminant is zero, thus it doesn’t affect the sum…) We can also assume that $j_1< j_2<\dots < j_n$ – However, then we would have to consider all the possible permutations of those numbers. For example, If we pick $j_1=1,j_2=2,j_3=3$ Then They can ‘appear’ in the sum in many ways:

• $P_{11}P_{22}P_{33}$
• $P_{11}P_{23}P_{32}$
• $P_{12}P_{21}P_{33}$
• $P_{12}P_{23}P_{31}$
• $P_{13}P_{21}P_{32}$
• $P_{13}P_{22}P_{31}$

Therefore, we can write the expression as:

\sum_{1\leq j_1< j_2,\dots< j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{{2j_{\tau(2)}}}\cdots P_{{nj_{\tau(n)}}}\ \cdot \det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}})


Recall that:

\det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}})

=\text{sgn}(\tau)\det(Q_{j_1j_2\dots j_n})

Thus:

\sum_{1\leq j_1< j_2,\dots< j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{{2j_{\tau(2)}}}\cdots P_{{nj_{\tau(n)}}}\ \cdot\det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}})
=

\sum_{1\leq j_1< j_2,\dots< j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{{2j_{\tau(2)}}}\cdots P_{{nj_{\tau(n)}}} \cdot \text{sgn}(\tau)\det(Q_{j_1j_2\dots j_n})=

\sum_{1\leq j_1< j_2,\dots< j_n\leq m}\overbrace{\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{{2j_{\tau(2)}}}\cdots P_{{nj_{\tau(n)}}} \cdot \text{sgn}(\tau))}^{\det(P_{j_1j_2\dots j_n})\ \ (*)}\cdot\det(Q_{j_1j_2\dots j_n})=
=\sum_{1\leq j_1< j_2,\dots< j_n\leq m}\det(P_{j_1j_2\dots j_n})\cdot\det(Q_{j_1j_2\dots j_n})

And that’s exactly what we wanted! (make sure you understand why (*) is true – This is basically the definition, though it might be a little confusing…)

## Proving Kirchhoff’s theorem

By definition, $L(G)=M\cdot M^T$. Therefore, $L(G)^{(i)}=(M\cdot M^T)^{(i)}$. However, $(M\cdot M^T)^{(i)}=M^i\cdot (M^i)^T$ where $M^i$ is a matrix that obtained from removing the $i^{th}$ row. Convine yourself why it is true – it follows immediately from the definition of matrix multiplication and the fact that the $i^{th}$ row of $M$ is the $i^{th}$ column of $M^T$.

We can now use Cauchy–Binet formula to get:

\det(L(G)^{(i)})=\sum_{j_1,\dots,j_{n-1}}\det(M_{j_1j_2\dots j_{n-1}}^i)\det((M^i)^T_{j_1j_2\dots j_{n-1}})

Notice that $(M^i)^T_{j_1j_2\dots j_{n-1}}=(M^i_{j_1j_2\dots j_{n-1}})^T$ Thus, $\det(M_{j_1j_2\dots j_{n-1}}^i)=\det((M^i)^T_{j_1j_2\dots j_{n-1}}$. So:

\det(L(G)^{(i)})=\sum_{j_1,\dots,j_{n-1}}(\det(M_{j_1j_2\dots j_{n-1}}^i))^2

Ok, the most important thing now is to understand that every choice of $j_1,\dots,j_{n-1}$ is in fact a choice of columns of $M_{j_1j_2\dots j_{n-1}}^i$ – which represent edges.

Therefore, we can treat any coice of $j_1,\dots,j_{n-1}$ as a coice of $n-1$ edges.

My goal now is to show that:

\det(M_{j_{1}j_{2}\dots j_{n-1}}^{i})=\begin{cases}
\pm1 & \text{if }j_{1},\dots,j_{n-1}\text{ span a tree}\\
0 & \text{else}
\end{cases}

If I will be able to prove so, this means that every choice of edges that span a tree will increase the determinant by 1. If not, then nothing changes.

The key here is to understand that if a choice of edges spans a tree, then we have found a spanning tree, and increased the determinant by one. The sum goes over all the possible choices of edges, therefore, if edges span a tree, the detrminant will ‘add the tree to the count’. And that’s it, that would prove the theorem!

So let’s do it.

#### The edges don’t span a tree

Suppose that $j_1,\dots,j_{n-1}$ don’t span a tree. Therefore, the graph is not connected (If it is connected, then it is a tree by definition). So we can observe at a component $A$ that exclude $i$.

The corresponding rows of this component add to $0$ – since we’ve seen that every column in $M$ adds up to 0, and since we have removed the $i^{th}$ row, the only chance that the sum of a column $e$ in $M^i_{j1\dots j_{n-1}}$ won’t be zero – is if $i$ is it’s endpoint, and we’ve picked $A$ such that $i\notin A$ (So if $i$ is an endpoint of $e$, then the corresponding rows of $A$ will be 0 at the corresponding column to $e$…)

So we have found a linear-dependent rows in $M^i_{j1\dots j_{n-1}}$, therefore, it has determinant zero.

#### The edges span a tree

Now suppose that $j_1,\dots, j_n$ span a tree.

For example. If we pick $i = 1$ and the graph spanned by the edges is:

Then, to get $M^i_{j_1\dots j_n}$ we need to remove that first row (and we already know what $M$ is for this graph (here):

M^i_{j_1\dots j_n}=\begin{array}{c}
v_{2}\\
v_{3}\\
v_{4}\\
v_{5}
\end{array}\overset{\begin{array}{cccc}
e_{1}\ \ \  & e_{2}\ \ \  & e_{3} & e_{4}\end{array}}{\left(\begin{array}{cccc}
-1 & 1 & 0 & 0\\
0 & -1 & -1 & 1\\
0 & 0 & 0 & -1\\
0 & 0 & 1 & 0
\end{array}\right)}

Ok, we now have a tree. Recall that a tree has at least two leaves, therefore we can pick a leaf $u_1\neq i$. By definition, it has degree one, thus it is incident to only one edge $f_1$. We shall now permute the rows / columns of the matrix to get a new matrix where $u_1,f_1$ are the first row / column.

In our example, we can pick $v_4$ as $u_1$, and the corresponding edge will be $e_4$. We now permute the matrix to get:

M_{1}=\begin{array}{c}
\bold{v_{4}}\\
v_{3}\\
v_{2}\\
v_{5}
\end{array}\overset{\begin{array}{cccc}
\bold{e_{4}}\ \ \  & e_{2}\ \ \  & e_{3} & e_{1}\end{array}}{\left(\begin{array}{cccc}
-1 & 0 & 0 & 0\\
1 & -1 & -1 &0\\
0 & 1 & 0 & -1\\
0 & 0 & 1 & 0
\end{array}\right)}

We now remove $u_1,f_1$ from the graph and we are left with another tree (removing a leaf results a tree). Again, we can pick $u_2,f_2$ similar to before, and permute the new matrix such that $u_2,f_2$ will be the second row / column. and keep doing so, until we’ve picked $u_{n-1},f_{n-1}$.

Notice how this process guarantees that on the diagonal line, the entries are $\pm 1$, and $u_i \notin f_j$ for $i – Thus, by the end of the process we will get a lower triangular matrix. Moreover, since we are just permuting rows and colums, we are only changing the detreminant by a factor of $\pm 1$.

In our example, the resulting matrix from the process is:

M_{3}=\begin{array}{c}
{v_{4}}\\
v_{5}\\
v_{3}\\
v_{2}
\end{array}\overset{\begin{array}{cccc}
{e_{4}}\ \ \  & e_{3}\ \ \  & e_{2} & e_{1}\end{array}}{\left(\begin{array}{cccc}
-1 & 0 & 0 & 0\\
0 & 1 &0 & 0\\
1 & 1 & -1 &0\\
0& 0& 1 & -1
\end{array}\right)}

The determinant of a lower triangular matrix is the product of the entries on the diagonal line, therefore, the determinant of the new matrix is $\pm 1$. Thus, the determinant of $M^i_{j_1\dots j_n}$ is $\pm 1$ as well. And that’s exactly what we wanted!

## Summary

Well that was pretty long, we had a lot preparations to do. However, the proof itself wasn’t that long at all, and that’s so great!

Notice how such a non-trivial question – finding number of spanning trees in any graph, gets such an unexpected solution!

And by the way, I am not the only who thinks that this is a beautiful proof, the proof is based on one from THE BOOK.

Just one more thing before I’ll end this post – Notice that if $G=K_n$, then the laplacian is;

L(K_n)=\left(\begin{array}{ccccc}
n-1 & -1 & \cdots & \cdots & -1\\
-1 & n-1 &  &  & \vdots\\
& -1 & \ddots\\
\vdots & \vdots &  & \ddots & -1\\
-1 & -1 & \cdots & -1 & n-1
\end{array}\right)

Pick any $1\leq i\leq n$ you want and calculate $\det(L(K_n)^{(i)})$. What will you get? My guess is $n^{n-2}$