In the last post we’ve seen that the number of spanning trees in the complete graph $K_n$ is $n^{n-2}$. That’s a great result; however, I am not satisfied yet…

So we know how many spanning trees $K_n$ has – big deal! Why restrict ourselves only to $K_n$? Wouldn’t it be better to find the number of spanning trees of **any** graph? Of course it would, and that’s what we will do today.

With the help of **Kirchhoff’s theorem**, by the end of the post, we will be able to find the number of spanning trees in any graph.

This is not going to be so easy though… **Linear algebra** is going to be involved here! However, I find the proof that I am going to present here **marvelous**: it shows how such unexpected things (*spoiler*: determinants) turn out to be exactly what we were looking for.

So let’s not spend any more time and dive right in:

## Incidence Matrix & The Laplacian

If you recall, in the second post, where I loaded tons of definitions, one of the definitions was that of an **incidence matrix**. I defined the incidence matrix of a graph as an $n\times m$ matrix $M$, where $n$ is the number of vertices and $m$ is the number of edges.

It was defined as:

M_{v,e}=\begin{cases} 2 & e\text{ is a loop incident to } v\\ 1 & e\text{ is incident to }v\text{ but not a loop}\\ 0 & \text{else} \end{cases}

I am going to deal only with **simple** graphs (graphs with no loops and no parallel edges); therefore, we can narrow down the options to get:

M_{v,e}=\begin{cases} 1 & e\text{ is incident to }v\\ 0 & \text{else} \end{cases}

Great, I am now going to generalize the definition, since I will deal with a **directed graph**. The new definition is:

M_{v,e}=\begin{cases} 1 & e\text{ leaves }v\\ -1 & e\text{ enters }v\\ 0 & \text{else} \end{cases}

Let’s see an example:

For this graph – the one with vertices $v_1,\dots,v_5$ and edges $e_1=\{v_1,v_2\}$, $e_2=\{v_2,v_3\}$, $e_3=\{v_3,v_5\}$, $e_4=\{v_3,v_4\}$ – the corresponding incidence matrix is:

\begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{cccc} e_{1} & e_{2} & e_{3} & e_{4}\end{array}}{\begin{pmatrix}1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}}

Now let’s add some directions: say $e_1\colon v_1\to v_2$, $e_2\colon v_2\to v_3$, $e_3\colon v_5\to v_3$ and $e_4\colon v_3\to v_4$.

Cool, now the incidence matrix is:

\begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{cccc} e_{1}\ & \ e_{2} & \ \ \ \ e_{3} & \ \ e_{4}\end{array}}{\begin{pmatrix}1 & 0 & 0 & 0\\ -1 & 1 & 0 & 0\\ 0 & -1 & -1 & 1\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0 \end{pmatrix}}

(Notice how the sum of each column is zero).
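If you want to experiment along, here is a minimal sketch in Python (with NumPy) that builds this incidence matrix and checks that every column sums to zero. The edge directions are the ones read off the matrix above, so treat the labels as my assumption:

```python
import numpy as np

# Directed edges of the example graph, as (tail, head) pairs:
# e1: v1->v2, e2: v2->v3, e3: v5->v3, e4: v3->v4
edges = [(1, 2), (2, 3), (5, 3), (3, 4)]
n = 5  # number of vertices

# M[v, e] = +1 if e leaves v, -1 if e enters v, 0 otherwise
M = np.zeros((n, len(edges)), dtype=int)
for j, (tail, head) in enumerate(edges):
    M[tail - 1, j] = 1
    M[head - 1, j] = -1

print(M)              # matches the matrix above
print(M.sum(axis=0))  # [0 0 0 0] -- every column sums to zero
```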

#### Meet the Laplacian

There is something that kind of bugs me in this matrix – it is not a **square** matrix. However, we can easily create one from it; we can define:

C=M\cdot M^T \in M_{V\times V}(F)

Not only is this matrix square, it is also **symmetric**. Let’s try to calculate its entries:

C_{vv}= (M\cdot M^T)_{vv}=\sum_{e}M_{ve}M^T_{ev}

This is just the definition of matrix multiplication (recall that the columns represent the **edges** of the graph). Now what is $M^T_{ev}$? This is exactly $M_{ve}$. Therefore we get:

C_{vv}=\sum_{e}M_{ve}\cdot M_{ve}=\sum_e(M_{ve})^2

Now there are two cases:

- $e$ enters or leaves $v$; then $M_{ve}=\pm 1$, which implies that $(M_{ve})^2=1$.
- Otherwise, $M_{ve}=0$, and so $(M_{ve})^2=0$.

Notice what we have just found out: every edge that is incident to $v$ increases $C_{vv}$ by one. Therefore, $C_{vv}$ is exactly the degree of $v$.

That’s really nice! Let’s now see the general case $v\neq w$:

C_{vw}= (M\cdot M^T)_{vw}=\sum_{e}M_{ve}M^T_{ew}=\sum_{e}M_{ve}M_{we}

If $e$ is incident to both $v$ and $w$, then either $M_{ve}=1$ and $M_{we}=-1$, or $M_{ve}=-1$ and $M_{we}=1$. Either way, their product is $-1$. Since the graph is simple, only **one** edge can be incident to both $v$ and $w$ – so we get that $C_{vw}=-1$.

On the other hand, if $e$ is not incident to $v$ (then $M_{ve}=0$) or it is not incident to $w$ (then $M_{we}=0$), their product is $0$, and the edge contributes nothing to the sum.

Let’s summarize what we have got so far:

C_{vw}=\begin{cases} \deg(v) & v=w\\ -1 & v\text{ is adjacent to }w\\ 0 & \text{else} \end{cases}

This matrix has a special name – it is called **the Laplacian** of $G$, and denoted by $L(G)$.

For example, in our graph:

The Laplacian is:

L=\begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{ccccc} v_{1}\ \ \ & v_{2}\ \ \ & v_{3}\ \ \ & v_{4}\ \ & v_{5}\end{array}}{\left(\begin{array}{ccccc} 1 & -1 & 0 & 0 & 0\\ -1 & 2 & -1 & 0 & 0\\ 0 & -1 & 3 & -1 & -1\\ 0 & 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 0 & 1 \end{array}\right)}
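Continuing the little Python sketch from before, we can verify numerically that $M\cdot M^T$ really gives this matrix, with the degrees on the diagonal:

```python
# Continuing the sketch above: the Laplacian is M times its transpose
L = M @ M.T
print(L)  # matches the Laplacian written above

# The diagonal entries are exactly the vertex degrees
degrees = np.abs(M).sum(axis=1)
assert (np.diag(L) == degrees).all()
```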

Great, we are now ready to present the theorem:

## The theorem

Suppose that $G$ is a connected graph with $n$ vertices. Then the number of spanning trees of $G$ is:

\det (L(G)^{(i)})

for every $1\leq i\leq n$, where $L(G)^{(i)}$ is the $(n-1)\times(n-1)$ matrix obtained by deleting the $i$-th row and the $i$-th column from $L(G)$.

Pause here for a second to process what’s going on. This is really, really non-trivial! What does a determinant have to do with spanning trees???

To summarize it before I start proving the theorem – I’ll just say that I didn’t see it coming at all!
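To make it concrete, here is the theorem at work in the Python sketch from before, run on our example graph. That graph is itself a tree, so it has exactly one spanning tree, and indeed every reduced Laplacian has determinant 1:

```python
# Delete the i-th row and column of L, then take the determinant
for i in range(5):
    L_i = np.delete(np.delete(L, i, axis=0), i, axis=1)
    print(i + 1, round(np.linalg.det(L_i)))  # prints 1 for every i
```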

## The proof

The proof is going to be in parts. First, I need to prove a theorem from linear algebra that is not so popular – most students won’t see it in a linear algebra class…

Then the proof starts: I am going to use the theorem from linear algebra right away, and then, magic happens!

Let’s do it:

### Cauchy–Binet formula

Suppose that $P$ is an $n\times m$ matrix and $Q$ is an $m\times n$ matrix. Their product $PQ$ is a square matrix; thus, it has a determinant.

Cauchy–Binet formula states that:

\det(PQ)=\sum_{1\leq j_1<\dots< j_n\leq m}\det(P_{j_1j_2\dots j_n})\cdot\det(Q_{j_1j_2\dots j_n})

Wow, hold up for a second, this seems really complicated. Let’s explain it step-by-step.

The matrix $P_{j_1j_2\dots j_n}$ is just the matrix:

\left(\begin{array}{cccc} | & | & & |\\ C_{j_{1}}(P) & C_{j_{2}}(P) & \dots & C_{j_{n}}(P)\\ | & | & & | \end{array}\right)

($C_j(P)$ is the $j$-th column of $P$)

Moreover, $Q_{j_1j_2\dots j_n}$ is just the matrix:

\left(\begin{array}{ccc} - & R_{j_{1}}(Q) & -\\ - & R_{j_{2}}(Q) & -\\ & \vdots\\ - & R_{j_{n}}(Q) & - \end{array}\right)

($R_j(Q)$ is the $j$-th row of $Q$)

What this formula says is that in order to calculate the determinant of $PQ$, you have to go over all the ways to pick $n$ ordered, different columns $1\leq j_1<j_2<\dots<j_n\leq m$, and calculate the determinants of the sub-matrices $P_{j_1\dots j_n}$ and $Q_{j_1\dots j_n}$. Then, calculate their product, and sum everything up.

Notice that if $n=m$, then there is only one way to choose $n$ ordered, different columns: $j_k=k$ for every $k$. Moreover, in that case $P_{j_1\dots j_n}=P$ and $Q_{j_1\dots j_n}=Q$. Thus we get the familiar identity:

\det(PQ)=\det (P)\cdot \det(Q)
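Before proving the formula, a quick brute-force sanity check may help. This sketch compares both sides for a random $2\times 4$ matrix $P$ and $4\times 2$ matrix $Q$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 4
P = rng.integers(-3, 4, size=(n, m))
Q = rng.integers(-3, 4, size=(m, n))

lhs = np.linalg.det(P @ Q)

# Sum over all increasing index sets j_1 < ... < j_n
rhs = 0.0
for cols in itertools.combinations(range(m), n):
    cols = list(cols)
    rhs += np.linalg.det(P[:, cols]) * np.linalg.det(Q[cols, :])

print(round(lhs), round(rhs))  # the two sides agree
```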

Ok, let’s prove it. If you feel like the proof is getting kind of heavy, you can skip it and move on to the proof of the main theorem. You will still be able to understand that proof (you just won’t be able to justify the use of this formula):

#### Proving the formula

By the definition of a determinant:

\det(PQ)=\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(PQ)_{i\sigma(i)}

Using matrix multiplication, we get:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(PQ)_{i\sigma(i)}=\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})

Let’s focus on the product:

\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=(\sum_{j=1}^m P_{1j}\cdot Q_{j\sigma(1)})\cdot(\sum_{j=1}^m P_{2j}\cdot Q_{j\sigma(2)})\cdots(\sum_{j=1}^m P_{nj}\cdot Q_{j\sigma(n)})

The product has $n$ factors. Each factor is made up of a sum of $m$ elements. If we expand the product, we will get a sum made of expressions of the form:

P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}

where $1\leq j_1,j_2,\dots,j_n\leq m$ (they can be the same).

##### Quick example

For example, if $n=2$, $m=3$ and $\sigma=(1\ 2)$ (so $\sigma(1)=2$ and $\sigma(2)=1$), then for $j_1=3$ and $j_2=2$:

P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}=P_{13}Q_{32}\cdot P_{22}Q_{21}

is an element of the sum.

##### Back to the proof

From every factor (each of which is a sum) I can ‘pick’ whatever term I want. Therefore, the product is actually a sum, made from all the possible expressions of the form above. We now have:

\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}

=\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

It may seem scary and all, but if you understand it – you realize that this is actually quite **elegant**.

We can now plug this expression into the determinant formula to get:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

The sums are finite – so there is no problem with changing the order of summation. In addition, the expression $P_{1j_1}P_{2j_2}\cdots P_{nj_n}$ is independent of $\sigma$. Therefore, we get that the expression above equals:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ \sum_{\sigma\in S_n}\text{sgn}(\sigma)\cdot Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

Notice that by definition:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\cdot Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}=\det(Q_{j_1j_2\dots j_n})

Therefore:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \det(Q_{j_1j_2\dots j_n})

Now, if $j_k=j_l$ for some $k\neq l$, then $Q_{j_1j_2\dots j_n}$ has two identical rows, which implies that its determinant is 0. If $n>m$, by the pigeonhole principle, there are always two such equal indices. Therefore, the sum above, which is the determinant of $PQ$, is zero.

That makes sense – if $n>m$, then $P$ and $Q$ are of rank at most $m$ (why?), and so is their product. Thus, their product is not invertible (its rank is smaller than $n$), and that implies a zero determinant.

So from now on we shall assume that $n\leq m$.

##### Kind of a tricky part

From the argument above we can conclude that the sum equals:

\sum_{\substack{1\leq j_1,j_2,\dots,j_n\leq m\\ j_1,\dots,j_n\text{ pairwise distinct}}}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \det(Q_{j_1j_2\dots j_n})

(Again, if two $j_k$’s are the same, the determinant is zero, so dropping those terms doesn’t affect the sum…) We would like to assume that $j_1<j_2<\dots<j_n$ – however, then we would have to consider all the possible permutations of those numbers. For example, if $n=2$ and we pick the columns $\{1,2\}$, then they can ‘appear’ in the sum in two ways: as $(j_1,j_2)=(1,2)$ and as $(j_1,j_2)=(2,1)$.

Therefore, we can write the expression as:

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}}\ \cdot \det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}})

Recall that:

\det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}}) =\text{sgn}(\tau)\det(Q_{j_1j_2\dots j_n})

Thus:

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}}\ \cdot\det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}}) =

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}} \cdot \text{sgn}(\tau)\det(Q_{j_1j_2\dots j_n})=

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\overbrace{\sum_{\tau\in S_n}\text{sgn}(\tau)\cdot P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}}}^{\det(P_{j_1j_2\dots j_n})\ \ (*)}\cdot\det(Q_{j_1j_2\dots j_n})=

=\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\det(P_{j_1j_2\dots j_n})\cdot\det(Q_{j_1j_2\dots j_n})

And that’s exactly what we wanted! (Make sure you understand why $(*)$ is true – it is basically the definition of the determinant, though it might be a little confusing…)

## Proving Kirchhoff’s theorem

By definition, $L(G)=M\cdot M^T$. Therefore, $L(G)^{(i)}=(M\cdot M^T)^{(i)}$. However, $(M\cdot M^T)^{(i)}=M^i\cdot (M^i)^T$, where $M^i$ is the $(n-1)\times m$ matrix obtained from $M$ by removing the $i$-th row. Convince yourself that this is true – it follows immediately from the definition of matrix multiplication and the fact that the $i$-th row of $M^T$ is the $i$-th **column** of $M$.

We can now use the Cauchy–Binet formula to get:

\det(L(G)^{(i)})=\sum_{1\leq j_1<\dots<j_{n-1}\leq m}\det(M_{j_1j_2\dots j_{n-1}}^i)\det((M^i)^T_{j_1j_2\dots j_{n-1}})

Notice that $(M^i)^T_{j_1\dots j_{n-1}}=\left(M^i_{j_1\dots j_{n-1}}\right)^T$. Thus, $\det((M^i)^T_{j_1\dots j_{n-1}})=\det(M^i_{j_1\dots j_{n-1}})$. So:

\det(L(G)^{(i)})=\sum_{1\leq j_1<\dots<j_{n-1}\leq m}(\det(M_{j_1j_2\dots j_{n-1}}^i))^2

Ok, the most important thing now is to understand that every choice of $j_1,\dots,j_{n-1}$ is in fact a choice of $n-1$ columns of $M^i$ – which represent **edges**.

Therefore, we can treat any choice of $j_1,\dots,j_{n-1}$ as a choice of $n-1$ edges.

My goal now is to show that:

\det(M_{j_{1}j_{2}\dots j_{n-1}}^{i})=\begin{cases} \pm1 & \text{if }j_{1},\dots,j_{n-1}\text{ span a tree}\\ 0 & \text{else} \end{cases}

If I am able to prove this, it means that every choice of $n-1$ edges that spans a tree contributes $(\pm 1)^2=1$ to the sum, and every other choice changes nothing.

The key here is to understand that if a choice of $n-1$ edges spans a tree, then we have found a spanning tree, which increases the determinant by one. The sum goes over all the possible choices of $n-1$ edges; therefore, whenever the chosen edges span a tree, the determinant ‘adds the tree to the count’. And that’s it, that would prove the theorem!
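Before diving into the two cases, here is a brute-force check of this claim on a small hypothetical graph (a 4-cycle with one chord, with edge directions I chose arbitrarily): every $(n-1)$-edge minor comes out $\pm 1$ or $0$, and the squared minors sum to $\det(L^{(1)})$:

```python
import itertools
import numpy as np

# A hypothetical test graph: the 4-cycle 1-2-3-4 plus the chord 1-3,
# with arbitrarily chosen directions, as (tail, head) pairs
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
n = 4

M = np.zeros((n, len(edges)), dtype=int)
for j, (tail, head) in enumerate(edges):
    M[tail - 1, j] = 1
    M[head - 1, j] = -1

Mi = np.delete(M, 0, axis=0)  # remove the row of v1 (i.e. i = 1)

total = 0
for cols in itertools.combinations(range(len(edges)), n - 1):
    d = round(np.linalg.det(Mi[:, list(cols)]))
    print(cols, d)            # each minor is +1 or -1 (a spanning tree) or 0
    total += d * d

L1 = np.delete(np.delete(M @ M.T, 0, axis=0), 0, axis=1)
print(total, round(np.linalg.det(L1)))  # both equal the tree count (8 here)
```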

So let’s do it.

#### The edges don’t span a tree

Suppose that $j_1,\dots,j_{n-1}$ don’t span a tree. Then the spanned graph is **not** connected (if a graph with $n$ vertices and $n-1$ edges is connected, it is a tree by definition). Since it is disconnected, we can look at a connected component that excludes $v_i$.

The rows corresponding to the vertices of this component add up to the zero row: we’ve seen that every column of $M$ adds up to 0, and since we have removed the $i$-th row, the only way a column of $M^i$ won’t sum to zero is if $v_i$ is one of its endpoints. But $v_i$ is not in our component, so for every chosen edge, either both of its non-zero entries ($+1$ and $-1$) lie in the component’s rows, or none of them do. Either way, each column sums to zero over those rows.

So we have found linearly dependent rows in $M^i_{j_1\dots j_{n-1}}$; therefore, its determinant is **zero**.

#### The edges span a tree

Now suppose that $j_1,\dots,j_{n-1}$ span a tree.

For example, suppose we pick $i=1$ and all four edges $e_1,e_2,e_3,e_4$ of our example graph, which span a tree:

Then, to get $M^1$ we need to remove the **first** row (and we already know what $M$ is for this graph, see above):

M^1_{j_1\dots j_{n-1}}=\begin{array}{c} v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{cccc} e_{1}\ \ \ & e_{2}\ \ \ & e_{3} & e_{4}\end{array}}{\left(\begin{array}{cccc} -1 & 1 & 0 & 0\\ 0 & -1 & -1 & 1\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0 \end{array}\right)}

Ok, we now have a tree. Recall that a tree has **at least two** leaves, therefore we can pick a leaf $u_1\neq v_i$. By definition, it has degree one, thus it is incident to only **one** edge $f_1$. We shall now permute the rows / columns of the matrix to get a new matrix where $u_1$ and $f_1$ are the first row / column.

In our example, we can pick $v_4$ as $u_1$, and the corresponding edge will be $e_4$. We now permute the matrix to get:

M_{1}=\begin{array}{c} \bold{v_{4}}\\ v_{3}\\ v_{2}\\ v_{5} \end{array}\overset{\begin{array}{cccc} \bold{e_{4}}\ \ \ & e_{2}\ \ \ & e_{3} & e_{1}\end{array}}{\left(\begin{array}{cccc} -1 & 0 & 0 & 0\\ 1 & -1 & -1 &0\\ 0 & 1 & 0 & -1\\ 0 & 0 & 1 & 0 \end{array}\right)}

We now remove $u_1$ from the graph, and we are left with another tree (removing a leaf from a tree results in a tree). Again, we can pick a leaf $u_2\neq v_i$ with its unique edge $f_2$, just like before, and permute the new matrix such that they will be the **second** row / column, and keep doing so until we’ve picked $u_{n-1}$ and $f_{n-1}$.

Notice how this process guarantees that the entries on the diagonal are $\pm 1$ (the entry of the leaf $u_k$ and its unique edge $f_k$), and that the entries above the diagonal are $0$: the edge $f_k$ is not incident to any of the previously removed leaves $u_1,\dots,u_{k-1}$. Thus, by the end of the process we will get a **lower triangular matrix**. Moreover, since we are just permuting rows and columns, we are only changing the determinant by a factor of $\pm 1$.

In our example, the matrix resulting from the process is:

M_{3}=\begin{array}{c} {v_{4}}\\ v_{5}\\ v_{3}\\ v_{2} \end{array}\overset{\begin{array}{cccc} {e_{4}}\ \ \ & e_{3}\ \ \ & e_{2} & e_{1}\end{array}}{\left(\begin{array}{cccc} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 1 & -1 & -1 & 0\\ 0 & 0 & 1 & -1 \end{array}\right)}

The determinant of a lower triangular matrix is the product of the entries on the diagonal; therefore, the determinant of the new matrix is $\pm 1$. Thus, the determinant of $M^i_{j_1\dots j_{n-1}}$ is $\pm 1$ as well. And that’s **exactly** what we wanted!
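The peeling process is completely mechanical, so here is a short sketch of it in Python, run on the example tree (with the edge directions assumed earlier). It recovers the row/column order $v_4,v_5,v_3,v_2$ / $e_4,e_3,e_2,e_1$ from above and prints a lower triangular matrix with $\pm 1$ on the diagonal:

```python
import numpy as np

# Leaf elimination on the example tree:
# e1: v1->v2, e2: v2->v3, e3: v5->v3, e4: v3->v4, with i = 1
edges = [(1, 2), (2, 3), (5, 3), (3, 4)]
rows = [2, 3, 4, 5]  # vertices of M^1 (v1 removed)

M = np.zeros((len(rows), len(edges)), dtype=int)
for j, (tail, head) in enumerate(edges):
    if tail in rows:
        M[rows.index(tail), j] = 1
    if head in rows:
        M[rows.index(head), j] = -1

# Repeatedly peel a leaf u != v_i together with its unique incident edge
remaining_v, remaining_e = set(rows), set(range(len(edges)))
row_order, col_order = [], []
while remaining_v:
    for v in sorted(remaining_v):
        incident = [j for j in remaining_e if v in edges[j]]
        if len(incident) == 1:  # v is a leaf of the current tree
            row_order.append(rows.index(v))
            col_order.append(incident[0])
            remaining_v.remove(v)
            remaining_e.remove(incident[0])
            break

T = M[np.ix_(row_order, col_order)]  # the permuted matrix
print(T)                             # lower triangular, +-1 on the diagonal
print(round(np.linalg.det(T)))       # -1, so det(M^1) = +-1 as claimed
```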

## Summary

Well, that was pretty long; we had a lot of preparations to do. However, the proof itself **wasn’t** that long at all, and that’s so great!

Notice how such a **non-trivial** question – finding the number of spanning trees in **any** graph – gets such an unexpected solution!

And by the way, I am not the only one who thinks that this is a beautiful proof – the proof is based on one from THE BOOK.

Just one more thing before I end this post – notice that if $G=K_n$, then the Laplacian is:

L(K_n)=\left(\begin{array}{ccccc} n-1 & -1 & \cdots & \cdots & -1\\ -1 & n-1 & & & \vdots\\ & -1 & \ddots\\ \vdots & \vdots & & \ddots & -1\\ -1 & -1 & \cdots & -1 & n-1 \end{array}\right)

Pick any $i$ you want and calculate $\det(L(K_n)^{(i)})$. What will you get? My guess is $n^{n-2}$…
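And if you’d rather let the computer spoil the answer, here is one last sketch that checks the guess for a few values of $n$:

```python
import numpy as np

for n in range(2, 8):
    L = n * np.eye(n, dtype=int) - 1  # n-1 on the diagonal, -1 elsewhere
    L_1 = L[1:, 1:]                   # delete the first row and column
    print(n, round(np.linalg.det(L_1)), n ** (n - 2))  # the two columns match
```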