In the last post we’ve seen that the number of spanning trees in the complete graph $K_n$ is $n^{n-2}$. That’s a great result; however, I am not satisfied yet…

So we know how many spanning trees $K_n$ has – big deal! Why restrict ourselves only to $K_n$? Wouldn’t it be better to find the number of spanning trees of **any** graph? Of course it would, and that’s what we will do today.

With the help of **Kirchhoff’s theorem**, by the end of the post, we will be able to find the number of spanning trees in any graph.

This is not going to be so easy though… **Linear algebra** is going to be involved here! However, I find the proof that I am going to present here **marvelous**: it shows how such unexpected things (*spoiler*: determinants) turn out to be exactly what we were looking for.

So let’s not spend any more time and dive right in:

## Incidence Matrix & The Laplacian

If you recall, in the second post, where I loaded tons of definitions, one of the definitions was that of an **incidence matrix**. I defined the incidence matrix of a graph as an $n\times m$ matrix $M$, where $n$ is the number of vertices and $m$ is the number of edges.

It was defined as:

M_{v,e}=\begin{cases} 2 & e\text{ is a loop incident to } v\\ 1 & e\text{ is incident to }v\text{ but not a loop}\\ 0 & \text{else} \end{cases}

I am going to deal only with **simple** graphs (graphs with no loops and no parallel edges); therefore, we can narrow down the options to get:

M_{v,e}=\begin{cases} 1 & e\text{ is incident to }v\\ 0 & \text{else} \end{cases}

Great, I am now going to generalize the definition, since I will deal with a **directed graph**. The new definition is:

M_{v,e}=\begin{cases} 1 & e\text{ leaves }v\\ -1 & e\text{ enters }v\\ 0 & \text{else} \end{cases}

Let’s see an example:

For this graph – the one with vertices $v_1,\dots,v_5$ and edges $e_1=\{v_1,v_2\}$, $e_2=\{v_2,v_3\}$, $e_3=\{v_3,v_5\}$, $e_4=\{v_3,v_4\}$ – the corresponding incidence matrix is:

\begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{cccc} e_{1} & e_{2} & e_{3} & e_{4}\end{array}}{\begin{pmatrix}1 & 0 & 0 & 0\\ 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix}}

Now let’s add some directions: say $e_1\colon v_1\to v_2$, $e_2\colon v_2\to v_3$, $e_3\colon v_5\to v_3$ and $e_4\colon v_3\to v_4$.

Cool, now the incidence matrix is:

\begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{cccc} e_{1}\ & \ e_{2} & \ \ \ \ e_{3} & \ \ e_{4}\end{array}}{\begin{pmatrix}1 & 0 & 0 & 0\\ -1 & 1 & 0 & 0\\ 0 & -1 & -1 & 1\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0 \end{pmatrix}}

(Notice how the sum of each column is zero).
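If you want to experiment along, here is a minimal sketch in Python (with NumPy) that builds this incidence matrix and checks that every column sums to zero. The edge directions are the ones read off the matrix above, so treat the labels as my assumption:

```python
import numpy as np

# Directed edges of the example graph, as (tail, head) pairs:
# e1: v1->v2, e2: v2->v3, e3: v5->v3, e4: v3->v4
edges = [(1, 2), (2, 3), (5, 3), (3, 4)]
n = 5  # number of vertices

# M[v, e] = +1 if e leaves v, -1 if e enters v, 0 otherwise
M = np.zeros((n, len(edges)), dtype=int)
for j, (tail, head) in enumerate(edges):
    M[tail - 1, j] = 1
    M[head - 1, j] = -1

print(M)              # matches the matrix above
print(M.sum(axis=0))  # [0 0 0 0] -- every column sums to zero
```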

#### Meet the Laplacian

There is something that kind of bugs me in this matrix – it is not a **square** matrix. However, we can easily create one from it; we can define:

C=M\cdot M^T \in M_{V\times V}(F)

Not only is this matrix square, it is also **symmetric**. Let’s try to calculate its entries:

C_{vv}= (M\cdot M^T)_{vv}=\sum_{e}M_{ve}M^T_{ev}

This is just the definition of matrix multiplication (recall that the columns represent the **edges** of the graph). Now what is $M^T_{ev}$? This is exactly $M_{ve}$. Therefore we get:

C_{vv}=\sum_{e}M_{ve}\cdot M_{ve}=\sum_e(M_{ve})^2

Now there are two cases:

- $e$ enters or leaves $v$; then $M_{ve}=\pm 1$, which implies that $(M_{ve})^2=1$.
- Otherwise, $M_{ve}=0$, and so $(M_{ve})^2=0$.

Notice what we have just found out: every edge that is incident to $v$ increases $C_{vv}$ by one. Therefore, $C_{vv}$ is exactly the degree of $v$.

That’s really nice! Let’s now see the general case $v\neq w$:

C_{vw}= (M\cdot M^T)_{vw}=\sum_{e}M_{ve}M^T_{ew}=\sum_{e}M_{ve}M_{we}

If $e$ is incident to both $v$ and $w$, then either $M_{ve}=1$ and $M_{we}=-1$, or $M_{ve}=-1$ and $M_{we}=1$. Either way, their product is $-1$. Since the graph is simple, only **one** edge can be incident to both $v$ and $w$ – so we get that $C_{vw}=-1$.

On the other hand, if $e$ is not incident to $v$ (then $M_{ve}=0$) or it is not incident to $w$ (then $M_{we}=0$), their product is $0$, and the edge contributes nothing to the sum.

Let’s summarize what we have got so far:

C_{vw}=\begin{cases} \deg(v) & v=w\\ -1 & v\text{ is adjacent to }w\\ 0 & \text{else} \end{cases}

This matrix has a special name – it is called **the Laplacian** of $G$, and denoted by $L(G)$.

For example, in our graph:

The Laplacian is:

L=\begin{array}{c} v_{1}\\ v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{ccccc} v_{1}\ \ \ & v_{2}\ \ \ & v_{3}\ \ \ & v_{4}\ \ & v_{5}\end{array}}{\left(\begin{array}{ccccc} 1 & -1 & 0 & 0 & 0\\ -1 & 2 & -1 & 0 & 0\\ 0 & -1 & 3 & -1 & -1\\ 0 & 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 0 & 1 \end{array}\right)}
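Continuing the little Python sketch from before, we can verify numerically that $M\cdot M^T$ really gives this matrix, with the degrees on the diagonal:

```python
# Continuing the sketch above: the Laplacian is M times its transpose
L = M @ M.T
print(L)  # matches the Laplacian written above

# The diagonal entries are exactly the vertex degrees
degrees = np.abs(M).sum(axis=1)
assert (np.diag(L) == degrees).all()
```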

Great, we are now ready to present the theorem:

## The theorem

Suppose that $G$ is a connected graph with $n$ vertices. Then the number of spanning trees of $G$ is:

\det (L(G)^{(i)})

for every $1\leq i\leq n$, where $L(G)^{(i)}$ is the $(n-1)\times(n-1)$ matrix obtained by deleting the $i$-th row and the $i$-th column from $L(G)$.

Pause here for a second to process what’s going on. This is really, really non-trivial! What does a determinant have to do with spanning trees???

To summarize it before I start proving the theorem – I’ll just say that I didn’t see it coming at all!
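To make it concrete, here is the theorem at work in the Python sketch from before, run on our example graph. That graph is itself a tree, so it has exactly one spanning tree, and indeed every reduced Laplacian has determinant 1:

```python
# Delete the i-th row and column of L, then take the determinant
for i in range(5):
    L_i = np.delete(np.delete(L, i, axis=0), i, axis=1)
    print(i + 1, round(np.linalg.det(L_i)))  # prints 1 for every i
```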

## The proof

The proof is going to be in parts. First, I need to prove a theorem from linear algebra that is not so popular – most students won’t see it in a linear algebra class…

Then the proof starts: I am going to use the theorem from linear algebra right away, and then, magic happens!

Let’s do it:

### Cauchy–Binet formula

Suppose that $P$ is an $n\times m$ matrix and $Q$ is an $m\times n$ matrix. Their product $PQ$ is a square matrix; thus, it has a determinant.

Cauchy–Binet formula states that:

\det(PQ)=\sum_{1\leq j_1<\dots< j_n\leq m}\det(P_{j_1j_2\dots j_n})\cdot\det(Q_{j_1j_2\dots j_n})

Wow, hold up for a second, this seems really complicated. Let’s explain it step-by-step.

The matrix $P_{j_1j_2\dots j_n}$ is just the matrix:

\left(\begin{array}{cccc} | & | & & |\\ C_{j_{1}}(P) & C_{j_{2}}(P) & \dots & C_{j_{n}}(P)\\ | & | & & | \end{array}\right)

($C_j(P)$ is the $j$-th column of $P$)

Moreover, $Q_{j_1j_2\dots j_n}$ is just the matrix:

\left(\begin{array}{ccc} - & R_{j_{1}}(Q) & -\\ - & R_{j_{2}}(Q) & -\\ & \vdots\\ - & R_{j_{n}}(Q) & - \end{array}\right)

($R_j(Q)$ is the $j$-th row of $Q$)

What this formula says is that in order to calculate the determinant of $PQ$, you have to go over all the ways to pick $n$ ordered, different columns $1\leq j_1<j_2<\dots<j_n\leq m$, and calculate the determinants of the sub-matrices $P_{j_1\dots j_n}$ and $Q_{j_1\dots j_n}$. Then, calculate their product, and sum everything up.

Notice that if $n=m$, then there is only one way to choose $n$ ordered, different columns: $j_k=k$ for every $k$. Moreover, in that case $P_{j_1\dots j_n}=P$ and $Q_{j_1\dots j_n}=Q$. Thus we get the familiar identity:

\det(PQ)=\det (P)\cdot \det(Q)
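Before proving the formula, a quick brute-force sanity check may help. This sketch compares both sides for a random $2\times 4$ matrix $P$ and $4\times 2$ matrix $Q$:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 4
P = rng.integers(-3, 4, size=(n, m))
Q = rng.integers(-3, 4, size=(m, n))

lhs = np.linalg.det(P @ Q)

# Sum over all increasing index sets j_1 < ... < j_n
rhs = 0.0
for cols in itertools.combinations(range(m), n):
    cols = list(cols)
    rhs += np.linalg.det(P[:, cols]) * np.linalg.det(Q[cols, :])

print(round(lhs), round(rhs))  # the two sides agree
```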

Ok, let’s prove it. If you feel like the proof is getting kind of heavy, you can skip it and move on to the proof of the main theorem. You will still be able to understand that proof (you just won’t be able to justify the use of this formula):

#### Proving the formula

By the definition of a determinant:

\det(PQ)=\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(PQ)_{i\sigma(i)}

Using matrix multiplication, we get:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(PQ)_{i\sigma(i)}=\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})

Let’s focus on the product:

\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=(\sum_{j=1}^m P_{1j}\cdot Q_{j\sigma(1)})\cdot(\sum_{j=1}^m P_{2j}\cdot Q_{j\sigma(2)})\cdots(\sum_{j=1}^m P_{nj}\cdot Q_{j\sigma(n)})

The product has $n$ factors. Each factor is made up of a sum of $m$ elements. If we expand the product, we will get a sum made of expressions of the form:

P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}

where $1\leq j_1,j_2,\dots,j_n\leq m$ (they can be the same).

##### Quick example

For example, if $n=2$, $m=3$ and $\sigma=(1\ 2)$ (so $\sigma(1)=2$ and $\sigma(2)=1$), then for $j_1=3$ and $j_2=2$:

P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}=P_{13}Q_{32}\cdot P_{22}Q_{21}

is an element of the sum.

##### Back to the proof

From every factor (each of which is a sum) I can ‘pick’ whatever term I want. Therefore, the product is actually a sum, made from all the possible expressions of the form above. We now have:

\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}Q_{j_1\sigma(1)}\cdot P_{2j_2}Q_{j_2\sigma(2)}\cdots P_{nj_n}Q_{j_n\sigma(n)}

=\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

It may seem scary and all, but if you understand it – you realize that this is actually quite **elegant**.

We can now plug this expression into the determinant formula to get:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\prod_{i=1}^n(\sum_{j=1}^m P_{ij}\cdot Q_{j\sigma(i)})=

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

The sums are finite – so there is no problem with changing the order of summation. In addition, the expression $P_{1j_1}P_{2j_2}\cdots P_{nj_n}$ is independent of $\sigma$. Therefore, we get that the expression above equals:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \ \sum_{\sigma\in S_n}\text{sgn}(\sigma)\cdot Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}

Notice that by definition:

\sum_{\sigma\in S_n}\text{sgn}(\sigma)\cdot Q_{j_1\sigma(1)}Q_{j_2\sigma(2)}\cdots Q_{j_n\sigma(n)}=\det(Q_{j_1j_2\dots j_n})

Therefore:

\sum_{1\leq j_1,j_2,\dots,j_n\leq m}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \det(Q_{j_1j_2\dots j_n})

Now, if $j_k=j_l$ for some $k\neq l$, then $Q_{j_1j_2\dots j_n}$ has two identical rows, which implies that its determinant is 0. If $n>m$, by the pigeonhole principle, there are always two such equal indices. Therefore, the sum above, which is the determinant of $PQ$, is zero.

That makes sense – if $n>m$, then $P$ and $Q$ are of rank at most $m$ (why?), and so is their product. Thus, their product is not invertible (its rank is smaller than $n$), and that implies a zero determinant.

So from now on we shall assume that $n\leq m$.

##### Kind of a tricky part

From the argument above we can conclude that the sum equals:

\sum_{\substack{1\leq j_1,j_2,\dots,j_n\leq m\\ j_1,\dots,j_n\text{ pairwise distinct}}}P_{1j_1}P_{2j_2}\cdots P_{nj_n}\ \cdot \det(Q_{j_1j_2\dots j_n})

(Again, if two $j_k$’s are the same, the determinant is zero, so dropping those terms doesn’t affect the sum…) We would like to assume that $j_1<j_2<\dots<j_n$ – however, then we would have to consider all the possible permutations of those numbers. For example, if $n=2$ and we pick the columns $\{1,2\}$, then they can ‘appear’ in the sum in two ways: as $(j_1,j_2)=(1,2)$ and as $(j_1,j_2)=(2,1)$.

Therefore, we can write the expression as:

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}}\ \cdot \det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}})

Recall that:

\det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}}) =\text{sgn}(\tau)\det(Q_{j_1j_2\dots j_n})

Thus:

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}}\ \cdot\det(Q_{j_{\tau(1)}j_{\tau(2)}\dots j_{\tau(n)}}) =

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\sum_{\tau\in S_n}P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}} \cdot \text{sgn}(\tau)\det(Q_{j_1j_2\dots j_n})=

\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\overbrace{\sum_{\tau\in S_n}\text{sgn}(\tau)\cdot P_{1j_{\tau(1)}}P_{2j_{\tau(2)}}\cdots P_{nj_{\tau(n)}}}^{\det(P_{j_1j_2\dots j_n})\ \ (*)}\cdot\det(Q_{j_1j_2\dots j_n})=

=\sum_{1\leq j_1<j_2<\dots<j_n\leq m}\det(P_{j_1j_2\dots j_n})\cdot\det(Q_{j_1j_2\dots j_n})

And that’s exactly what we wanted! (Make sure you understand why $(*)$ is true – it is basically the definition of the determinant, though it might be a little confusing…)

## Proving Kirchhoff’s theorem

By definition, $L(G)=M\cdot M^T$. Therefore, $L(G)^{(i)}=(M\cdot M^T)^{(i)}$. However, $(M\cdot M^T)^{(i)}=M^i\cdot (M^i)^T$, where $M^i$ is the $(n-1)\times m$ matrix obtained from $M$ by removing the $i$-th row. Convince yourself that this is true – it follows immediately from the definition of matrix multiplication and the fact that the $i$-th row of $M^T$ is the $i$-th **column** of $M$.

We can now use the Cauchy–Binet formula to get:

\det(L(G)^{(i)})=\sum_{1\leq j_1<\dots<j_{n-1}\leq m}\det(M_{j_1j_2\dots j_{n-1}}^i)\det((M^i)^T_{j_1j_2\dots j_{n-1}})

Notice that $(M^i)^T_{j_1\dots j_{n-1}}=\left(M^i_{j_1\dots j_{n-1}}\right)^T$. Thus, $\det((M^i)^T_{j_1\dots j_{n-1}})=\det(M^i_{j_1\dots j_{n-1}})$. So:

\det(L(G)^{(i)})=\sum_{1\leq j_1<\dots<j_{n-1}\leq m}(\det(M_{j_1j_2\dots j_{n-1}}^i))^2

Ok, the most important thing now is to understand that every choice of $j_1,\dots,j_{n-1}$ is in fact a choice of $n-1$ columns of $M^i$ – which represent **edges**.

Therefore, we can treat any choice of $j_1,\dots,j_{n-1}$ as a choice of $n-1$ edges.

My goal now is to show that:

\det(M_{j_{1}j_{2}\dots j_{n-1}}^{i})=\begin{cases} \pm1 & \text{if }j_{1},\dots,j_{n-1}\text{ span a tree}\\ 0 & \text{else} \end{cases}

If I am able to prove this, it means that every choice of $n-1$ edges that spans a tree contributes $(\pm 1)^2=1$ to the sum, and every other choice changes nothing.

The key here is to understand that if a choice of $n-1$ edges spans a tree, then we have found a spanning tree, which increases the determinant by one. The sum goes over all the possible choices of $n-1$ edges; therefore, whenever the chosen edges span a tree, the determinant ‘adds the tree to the count’. And that’s it, that would prove the theorem!
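Before diving into the two cases, here is a brute-force check of this claim on a small hypothetical graph (a 4-cycle with one chord, with edge directions I chose arbitrarily): every $(n-1)$-edge minor comes out $\pm 1$ or $0$, and the squared minors sum to $\det(L^{(1)})$:

```python
import itertools
import numpy as np

# A hypothetical test graph: the 4-cycle 1-2-3-4 plus the chord 1-3,
# with arbitrarily chosen directions, as (tail, head) pairs
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
n = 4

M = np.zeros((n, len(edges)), dtype=int)
for j, (tail, head) in enumerate(edges):
    M[tail - 1, j] = 1
    M[head - 1, j] = -1

Mi = np.delete(M, 0, axis=0)  # remove the row of v1 (i.e. i = 1)

total = 0
for cols in itertools.combinations(range(len(edges)), n - 1):
    d = round(np.linalg.det(Mi[:, list(cols)]))
    print(cols, d)            # each minor is +1 or -1 (a spanning tree) or 0
    total += d * d

L1 = np.delete(np.delete(M @ M.T, 0, axis=0), 0, axis=1)
print(total, round(np.linalg.det(L1)))  # both equal the tree count (8 here)
```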

So let’s do it.

#### The edges don’t span a tree

Suppose that $j_1,\dots,j_{n-1}$ don’t span a tree. Then the spanned graph is **not** connected (if a graph with $n$ vertices and $n-1$ edges is connected, it is a tree by definition). Since it is disconnected, we can look at a connected component that excludes $v_i$.

The rows corresponding to the vertices of this component add up to the zero row: we’ve seen that every column of $M$ adds up to 0, and since we have removed the $i$-th row, the only way a column of $M^i$ won’t sum to zero is if $v_i$ is one of its endpoints. But $v_i$ is not in our component, so for every chosen edge, either both of its non-zero entries ($+1$ and $-1$) lie in the component’s rows, or none of them do. Either way, each column sums to zero over those rows.

So we have found linearly dependent rows in $M^i_{j_1\dots j_{n-1}}$; therefore, its determinant is **zero**.

#### The edges span a tree

Now suppose that $j_1,\dots,j_{n-1}$ span a tree.

For example, suppose we pick $i=1$ and all four edges $e_1,e_2,e_3,e_4$ of our example graph, which span a tree:

Then, to get $M^1$ we need to remove the **first** row (and we already know what $M$ is for this graph, see above):

M^1_{j_1\dots j_{n-1}}=\begin{array}{c} v_{2}\\ v_{3}\\ v_{4}\\ v_{5} \end{array}\overset{\begin{array}{cccc} e_{1}\ \ \ & e_{2}\ \ \ & e_{3} & e_{4}\end{array}}{\left(\begin{array}{cccc} -1 & 1 & 0 & 0\\ 0 & -1 & -1 & 1\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0 \end{array}\right)}

Ok, we now have a tree. Recall that a tree has **at least two** leaves, therefore we can pick a leaf $u_1\neq v_i$. By definition, it has degree one, thus it is incident to only **one** edge $f_1$. We shall now permute the rows / columns of the matrix to get a new matrix where $u_1$ and $f_1$ are the first row / column.

In our example, we can pick $v_4$ as $u_1$, and the corresponding edge will be $e_4$. We now permute the matrix to get:

M_{1}=\begin{array}{c} \bold{v_{4}}\\ v_{3}\\ v_{2}\\ v_{5} \end{array}\overset{\begin{array}{cccc} \bold{e_{4}}\ \ \ & e_{2}\ \ \ & e_{3} & e_{1}\end{array}}{\left(\begin{array}{cccc} -1 & 0 & 0 & 0\\ 1 & -1 & -1 &0\\ 0 & 1 & 0 & -1\\ 0 & 0 & 1 & 0 \end{array}\right)}

We now remove $u_1$ from the graph, and we are left with another tree (removing a leaf from a tree results in a tree). Again, we can pick a leaf $u_2\neq v_i$ with its unique edge $f_2$, just like before, and permute the new matrix such that they will be the **second** row / column, and keep doing so until we’ve picked $u_{n-1}$ and $f_{n-1}$.

Notice how this process guarantees that the entries on the diagonal are $\pm 1$ (the entry of the leaf $u_k$ and its unique edge $f_k$), and that the entries above the diagonal are $0$: the edge $f_k$ is not incident to any of the previously removed leaves $u_1,\dots,u_{k-1}$. Thus, by the end of the process we will get a **lower triangular matrix**. Moreover, since we are just permuting rows and columns, we are only changing the determinant by a factor of $\pm 1$.

In our example, the matrix resulting from the process is:

M_{3}=\begin{array}{c} {v_{4}}\\ v_{5}\\ v_{3}\\ v_{2} \end{array}\overset{\begin{array}{cccc} {e_{4}}\ \ \ & e_{3}\ \ \ & e_{2} & e_{1}\end{array}}{\left(\begin{array}{cccc} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 1 & -1 & -1 & 0\\ 0 & 0 & 1 & -1 \end{array}\right)}

The determinant of a lower triangular matrix is the product of the entries on the diagonal; therefore, the determinant of the new matrix is $\pm 1$. Thus, the determinant of $M^i_{j_1\dots j_{n-1}}$ is $\pm 1$ as well. And that’s **exactly** what we wanted!
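The peeling process is completely mechanical, so here is a short sketch of it in Python, run on the example tree (with the edge directions assumed earlier). It recovers the row/column order $v_4,v_5,v_3,v_2$ / $e_4,e_3,e_2,e_1$ from above and prints a lower triangular matrix with $\pm 1$ on the diagonal:

```python
import numpy as np

# Leaf elimination on the example tree:
# e1: v1->v2, e2: v2->v3, e3: v5->v3, e4: v3->v4, with i = 1
edges = [(1, 2), (2, 3), (5, 3), (3, 4)]
rows = [2, 3, 4, 5]  # vertices of M^1 (v1 removed)

M = np.zeros((len(rows), len(edges)), dtype=int)
for j, (tail, head) in enumerate(edges):
    if tail in rows:
        M[rows.index(tail), j] = 1
    if head in rows:
        M[rows.index(head), j] = -1

# Repeatedly peel a leaf u != v_i together with its unique incident edge
remaining_v, remaining_e = set(rows), set(range(len(edges)))
row_order, col_order = [], []
while remaining_v:
    for v in sorted(remaining_v):
        incident = [j for j in remaining_e if v in edges[j]]
        if len(incident) == 1:  # v is a leaf of the current tree
            row_order.append(rows.index(v))
            col_order.append(incident[0])
            remaining_v.remove(v)
            remaining_e.remove(incident[0])
            break

T = M[np.ix_(row_order, col_order)]  # the permuted matrix
print(T)                             # lower triangular, +-1 on the diagonal
print(round(np.linalg.det(T)))       # -1, so det(M^1) = +-1 as claimed
```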

## Summary

Well, that was pretty long; we had a lot of preparations to do. However, the proof itself **wasn’t** that long at all, and that’s so great!

Notice how such a **non-trivial** question – finding the number of spanning trees in **any** graph – gets such an unexpected solution!

And by the way, I am not the only one who thinks that this is a beautiful proof – the proof is based on one from THE BOOK.

Just one more thing before I end this post – notice that if $G=K_n$, then the Laplacian is:

L(K_n)=\left(\begin{array}{ccccc} n-1 & -1 & \cdots & \cdots & -1\\ -1 & n-1 & & & \vdots\\ & -1 & \ddots\\ \vdots & \vdots & & \ddots & -1\\ -1 & -1 & \cdots & -1 & n-1 \end{array}\right)

Pick any $i$ you want and calculate $\det(L(K_n)^{(i)})$. What will you get? My guess is $n^{n-2}$…
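And if you’d rather let the computer spoil the answer, here is one last sketch that checks the guess for a few values of $n$:

```python
import numpy as np

for n in range(2, 8):
    L = n * np.eye(n, dtype=int) - 1  # n-1 on the diagonal, -1 elsewhere
    L_1 = L[1:, 1:]                   # delete the first row and column
    print(n, round(np.linalg.det(L_1)), n ** (n - 2))  # the two columns match
```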