Intro to Combinatorics

UWaterloo, Math 239, Spring 2020

"Symmetry is one of the most powerful tools in the mathematics arsenal."
David Jao

Tools for Enumeration

Main interest: counting the size of various sets.

Let us recall the elementary operations that can be performed with sets. Let A,BA, B be finite sets. We have the following operations:

UNION: AB={xxA or xB}A\cup B = \{x | x \in A \text{ or } x \in B\}

INTERSECTION: AB={xxA and xB}A\cap B = \{x | x\in A \text{ and } x \in B\}

In order to count the size of a set, one of the approaches used is to build that set using unions of smaller sets, whose sizes are known. It is important, therefore, to understand the relationship between the cardinality of a set and the cardinalities of the sets that it is built from.

Most readers should be aware of the following identity: AB=A+BAB|A\cup B| = |A| + |B| - |A\cap B| which is of utmost importance when it comes to counting the size of sets as it provides a relationship between the cardinality of a set ABA\cup B and the cardinalities A|A| and B|B|. A much simpler case arises when AB=A\cap B = \emptyset i.e. when ABA\cup B is a disjoint union- the identity reduces to AB=A+B|A\cup B| = |A| + |B|.

This identity can be extended to a more general one. Given sets A1,,AnA_1, \dots, A_n with AiAj=A_i\cap A_j = \emptyset for 1i<jn1 \leq i < j \leq n, we have A1An=j=1nAj|A_1\cup \dots \cup A_n| = \sum_{j=1}^n |A_j|. This is readily proven by induction on nn. In the case of n=2n=2, the identity holds, as seen above. Now, assuming that the identity holds for some n>2n > 2, we can verify it for case n+1n+1:

A1AnAn+1=(A1An)An+1=(A1An)+An+1(A1An)An+1=i=1nAi+An+1(A1An+1)(AnAn+1)=i=1n+1Ai=i=1n+1Ai\begin{aligned} |A_1\cup \dots \cup A_n \cup A_{n+1}| &= |(A_1\cup \dots \cup A_n) \cup A_{n+1}|\\ &= |(A_1\cup \dots \cup A_n)| + |A_{n+1}| - |(A_1\cup \dots \cup A_n)\cap A_{n+1}|\\ &= \sum_{i=1}^n |A_i| + |A_{n+1}| - |(A_1 \cap A_{n+1})\cup \dots \cup (A_n\cap A_{n+1})|\\ &= \sum_{i=1}^{n+1} |A_i| - |\emptyset| = \sum_{i=1}^{n+1} |A_i| \end{aligned}

This allows us to state one of the techniques used for problem solving in this course: Recognizing that a set can be expressed as a disjoint union of other sets. The general identity derived above proves to be helpful in this technique. Now, back to set operations:

CARTESIAN PRODUCT: A×B={(a,b)aA,bB}A\times B = \{(a, b) | a \in A, b \in B\}. This object has the property that A×B=AB|A \times B| = |A|\cdot |B|.

CARTESIAN POWER: Ak={(a1,,ak)a1A,,akA}A^k = \{(a_1, \dots, a_k) | a_1\in A, \dots, a_k\in A\}. This object can be expressed alternatively as the cartesian product of AA with itself, kk times and has the property that Ak=Ak|A^k| = |A|^k.

Example: Let S={0,1}k={(z1,,zk)z1,,zk{0,1}}S=\{0,1\}^k = \{(z_1, \dots, z_k) | z_1, \dots, z_k \in \{0,1\}\}. SS is called the set of kk-tuples of bits. Using the property of the cartesian product outlined above, we see that S={0,1}k={0,1}k=2k|S| =|\{0,1\}^k| = |\{0,1\}|^k = 2^k.

Example: Let S={(z1,z2,z3){0,1}3z1+z2+z31}S=\{(z_1, z_2, z_3)\in \{0,1\}^3 | z_1 + z_2 + z_3 \leq 1\}. Find S|S|. In this case, it may be easy to write out the elements of SS and count them but we shall use the method of expressing SS as a union of 2 disjoint sets to see how it works.

We can write S=ABS=A\cup B where A={(z1,z2,z3){0,1}3z1+z2+z3=0}A=\{(z_1,z_2,z_3)\in \{0,1\}^3 | z_1 + z_2 + z_3 = 0\} and B={(z1,z2,z3){0,1}3z1+z2+z3=1}B=\{(z_1,z_2,z_3)\in \{0,1\}^3 | z_1 + z_2 + z_3 = 1\}. So clearly, AB=A\cap B = \emptyset by the definition of AA and BB. This allows us to say that S=A+B|S| = |A| + |B|.

Now, we can count: A={(0,0,0)}A = \{(0,0,0)\} so A=1|A| = 1 and B={(1,0,0),(0,1,0),(0,0,1)}B = \{(1,0,0), (0,1,0), (0,0,1)\} so that B=3|B| = 3. Therefore, S=1+3=4|S| = 1 + 3 = 4.

Binomial Coefficients

These come up a lot when it comes to enumeration and counting problems so getting comfortable with the binomial coefficients is important.

Definition: Sk,n:={Ω{1,,n}Ω=k}S_{k,n} := \{\Omega \in \{1, \dots, n\} | |\Omega| = k\}. This will prove to be an important set definition once some of its properties have been uncovered. Sk,nS_{k,n} is the set of all subsets of {1,,n}\{1, \dots, n\} of size kk.

In general, this set can be used to understand many real-life situations e.g. the number of 5-card hands from a standard deck of 52 cards is S5,52|S_{5,52}|.

THEOREM: let 0kn0 \leq k \leq n. Then Sk,n=n(n1)(nk+1)k!|S_{k,n}| = \frac{n(n-1)\cdots(n - k + 1)}{k!}.

Proof: Let Lk,n:={(a1,,ak){1,,n}kaiaj for ij}L_{k,n} := \{(a_1,\dots,a_k)\in \{1,\dots,n\}^k | a_i\neq a_j \text{ for } i\neq j \}. Lk,nL_{k,n} is the set of all kk-tuples of distinct elements of {1,,n}\{1,\dots,n\}.

What is Lk,n|L_{k,n}|? Well, since the elements in the kk-tuples are distinct, there are nn possibilities for the first one, n1n-1 possibilities for the second one and so on. In general, for the jj-th element in the tuple, there n(j1)n-(j-1) possibilities. Therefore, we get that Lk,n=n(n1)(n(k1))|L_{k,n}| = n(n-1)\cdots (n-(k-1)).

Now, we shall analyse a relationship between Lk,nL_{k,n} and Sk,nS_{k,n} which will allow us to compute Sk,n|S_{k,n}|. Observe the following:

  • Each (a1,a2,,ak)Lk,n(a_1, a_2, \dots, a_k) \in L_{k,n} can be mapped to an element {a1,,ak}Sk,n\{a_1,\dots, a_k\} \in S_{k,n}. This mapping removes the ordering from the kk-tuple.
  • Each {a1,,ak}Sk,n\{a_1,\dots, a_k\} \in S_{k,n} is then the image of k!k! different elements in Lk,nL_{k,n}.

Using these 2 observations, we may see the following:

Lk,n=ΩSk,nk!=Sk,nk!|L_{k,n}| = \sum_{\Omega \in S_{k,n}} k! = |S_{k,n}|\cdot k!

    Sk,n=Lk,nk!\implies |S_{k,n}| = \frac{|L_{k,n}|}{k!}, as desired.

Now, we may move forward to define the following object:

Binomial Coefficient: (nk):=n(n1)(nk+1)k!\binom{n}{k} := \frac{n(n-1)\cdots(n - k + 1)}{k!} for 0kn0\leq k \leq n. Note that by this defintion, it can also be seen that (nk)=n!(nk)!k!\binom{n}{k} = \frac{n!}{(n-k)!k!}. This version of the definition immediately reveals the symmetrical nature of the binomial coeffcient: (nk)=(nnk)\binom{n}{k} = \binom{n}{n-k} because of the commutativity of multiplication in the denominator. For convenience and convention, we also define (nk)=0\binom{n}{k} = 0 for k>nk > n.

Now, let Ak,n={(z1,,zn){0,1}nz1++zn=k}A_{k,n} = \{(z_1,\dots,z_n)\in \{0,1\}^n | z_1 + \dots + z_n = k\} for 0kn0\leq k \leq n. What is Ak,n|A_{k,n}|? In previous proof, the size of Sk,nS_{k,n} was determined by analysing its relationship with Lk,nL_{k,n} whose size we could easily determine: we showed Lk,nL_{k,n} is k!k! times larger than Sk,nS_{k,n}, essentially proving a 1 to many mapping Sk,nLk,nS_{k,n} \mapsto L_{k,n}. This strategy can also be used to find Ak,n|A_{k,n}| with the only difference being that this time, the relationship is 1-1 with a set of size (nk)\binom{n}{k}. We conveniently already have such a set! - Sk,n=(nk)|S_{k,n}| = \binom{n}{k}, as previously shown.

THEOREM: Ak,n=(nk)|A_{k,n}| = \binom{n}{k}

Proof: Observe the following:

  • Each (z1,,zn)Ak,n(z_1,\dots, z_n) \in A_{k,n} has the propery that there must be kk positions
    which where the element in the nn-tuple is a 1. So each of these nn-tuples can be mapped
    to a subset Ω={j{1,,n}zj=1}Sk,n\Omega = \{j\in \{1,\dots,n\} | z_j = 1\} \in S_{k,n}.
  • Each subset ΩSk,n\Omega \in S_{k,n} can be mapped to an element (z1,,zk)Ak,n(z_1,\dots, z_k)\in A_{k,n}
    where zj=1    jΩz_j = 1 \iff j\in \Omega

This shows a 1-1 mapping: for each set in Sk,nS_{k,n}, there is one corresponding nn-tuple in Ak,nA_{k,n} and for every nn-tuple in Ak,nA_{k,n}, there is one corresponding set in Sk,nS_{k,n}. Therefore, Sk,n=Ak,n|S_{k,n}| = |A_{k,n}|, as desired.

THEOREM: Binomial Theorem: Let nn be a non-negative integer. Then (1+x)n=k=0n(nk)xk(1 + x)^n = \sum_{k=0}^n \binom{n}{k} x^k.


We can write 1+x=z{0,1}xz1 + x = \sum_{z\in\{0,1\}}x^z. Using this, we can do the following:

(1+x)n=(z{0,1}xz)n=(z1{0,1}xz1)(zn{0,1}xzn)=(z1,,zn){0,1}nxz1++zn\begin{aligned} (1 + x)^n &= (\sum_{z\in \{0,1\}} x^z)^n\\ &= (\sum_{z_1\in \{0,1\}} x^{z_1})\cdots (\sum_{z_n\in \{0,1\}} x^{z_n})\\ &= \sum_{(z_1,\dots, z_n)\in \{0,1\}^n} x^{z_1+\dots + z_n} \end{aligned}

At this point, what may help tie this problem up is the realisation that the set {0,1}n=A0,nA1,nAn,n\{0,1\}^n = A_{0,n}\cup A_{1,n}\cup \dots \cup A_{n,n}, a disjoint union. This is a useful decomposition because we already know about the size of An,kA_{n,k}. Then:

(1+x)n=(z1,,zn){0,1}nxz1++zn=k=0n((z1,,zn)Ak,nxz1++zn)=k=0n((z1,,zn)Ak,nxk)=k=0nAk,nxk=k=0n(nk)xk\begin{aligned} (1 + x)^n &= \sum_{(z_1,\dots, z_n)\in \{0,1\}^n} x^{z_1+\dots + z_n}\\ &= \sum_{k=0}^n (\sum_{(z_1,\dots, z_n)\in A_{k,n}} x^{z_1+\dots + z_n})\\ &= \sum_{k=0}^n (\sum_{(z_1,\dots, z_n)\in A_{k,n}} x^k)\\ &= \sum_{k=0}^n |A_{k,n}| x^k\\ &= \sum_{k=0}^n \binom{n}{k} x^k \end{aligned}

As desired.

Example: This binomial theorem can be useful when simplifying expressions. Here is a simple example: Simplify k=0n(nk)\sum_{k=0}^n \binom{n}{k}. At first glance, this looks quite similar to the formula of the binomial theorem, just with xx set to 1. Let's use this observation:

k=1n(nk)xk=(1+x)n\sum_{k=1}^n \binom{n}{k} x^k = (1 + x)^n

Setting x=1x = 1:

k=1n(nk)=2n\sum_{k=1}^n \binom{n}{k} = 2^n. And there we have it- a simplified expression.

Exercise: Show that 0jn2(n2j)=2n1\sum_{0\leq j \leq \frac{n}{2}} \binom{n}{2j} = 2^{n-1}. Basically, that the sum of the binomial coefficient (nk)\binom{n}{k} where kk is even is half of the sum of all of the terms. Again, we may use the binomial theorem here to do most of the heavy lifting. Realise, first that

0jn2(n2j)=12(k=0n(nk)+k=0n(nk)(1)k)\sum_{0\leq j\leq \frac{n}{2}} \binom{n}{2j} = \frac{1}{2}\left(\sum_{k=0}^n \binom{n}{k} + \sum_{k= 0}^n \binom{n}{k}(-1)^k\right)

Here, the expression in question has been replaced as an average of 2 terms, both derived from the binomial theorem. The second term in the RHS sum has the job of cancelling out all of the odd terms in the sum from the first term. Then, since the even terms will have been doubly added, the sum needs to be halved, making it an average.

Next, note that the second term in the RHS sum is just the binomial theorem for xx set to 1-1, giving us (1+(1))n=0(1+(-1))^n = 0. Therefore, we get:

0jn2(n2j)=12(2n+0)=122n=2n1\begin{aligned} \sum_{0\leq j \leq \frac{n}{2}} \binom{n}{2j} &= \frac{1}{2}\left(2^n + 0 \right)\\ &= \frac{1}{2} 2^n\\ &= 2^{n-1} \end{aligned}

As desired.

Pascal's Triangle: This is an object of interest in mathematics simply because of the number of properties that emerge from it. It is formed by arranging the binomial coefficients by nn and kk into a triangle.

Pascal's Triangle with binomial coefficients==
Pascal's Triangle with numeric values

One of the first things that one may notice is that every number in the triangle is a sum of the 2 numbers above it. For example, 10 is the sum of 4 and 6 above it. Therefore, it's easy to construct the triangle, row by row but more importantly, this observation can be conjenctured as follows:

(nk)=?(n1k1)+(n1k)\binom{n}{k} =? \binom{n-1}{k-1} + \binom{n-1}{k}

We'll prove this, and similar identities in the next section.

Combinatorial Proofs

  • Essentially, these are the kinds of proofs which rely on counting
  • A key method used is establishing identities by counting a set in two different

Example: 1kn\forall 1\leq k \leq n, (nk)=(n1k1)+(n1k)\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. As conjenctured from Pascal's triangle above.

Proof: We want to approach the solution by counting a set in two different ways. One way will give us the expression on the LHS and the other way will give us the expression for the RHS. Since these expressions are representing the cardinality of the same set, they must be equal.

Recall Sk,n={Ω{1,,n}Ω=k}S_{k,n} = \{\Omega \in \{1,\dots,n\}||\Omega| = k\} which has the property that Sk,n=(nk)|S_{k,n}| = \binom{n}{k}. This gives us the RHS expression. Now, we need to count the size of Sk,nS_{k,n} in a different way which would give us the LHS expression.

Let A={ΩSk,nnΩ}A=\{\Omega\in S_{k,n}| n\in \Omega\} and B={ΩSk,nnΩ}B=\{\Omega\in S_{k,n}| n\notin \Omega\}. Clearly, AB=A\cap B = \emptyset and Sk,n=ABS_{k,n} = A\cup B which, by our findings in the previous section, conventiently allows us to do Sk,n=A+B|S_{k,n}| = |A| + |B|.

Case k=1k=1: In this case, A={{n}}A=\{\{n\}\} and B={{1},,{n1}}B=\{\{1\},\dots, \{n-1\}\}. So that (n1)=S1,n=A+B=1+(n1)=(n10)+(n11)\binom{n}{1} = |S_{1,n}| = |A| + |B| = 1 + (n-1) = \binom{n-1}{0} + \binom{n-1}{1}.

Case k>1k>1: In this case, AA and BB can be viewed as follows:

A={{n}QQ{1,,n} and Q=k1}={{n}QQSk1,n1}\begin{aligned} A &= \{\{n\}\cup Q | Q \subseteq \{1,\dots, n\} \text{ and } |Q| = k-1\}\\ &= \{\{n\}\cup Q | Q\in S_{k-1, n-1}\} \end{aligned}

B={Ω{1,,n1}Ω=k}=Sk,n1\begin{aligned} B &= \{\Omega\subseteq \{1,\dots, n-1\} | |\Omega| = k\}\\ &= S_{k,n-1} \end{aligned}

Using this representation, we see that

Sk,n=A+B=Sk1,n1+Sk,n1=(n1k1)+(n1k)\begin{aligned} |S_{k,n}| &= |A| + |B|\\ &= |S_{k-1,n-1}| + |S_{k,n-1}|\\ &= \binom{n-1}{k-1} + \binom{n-1}{k} \end{aligned}

As desired.

This identity can also be proven similarly using the set Ak,nA_{k,n}.

Example: (Hockey Stick Identity). This is one which also arises from Pascal's triangle. In this case, for any number in Pascal's Triangle, the "hockey stick" of that number is the set of numbers formed by moving one row up and to the left, then following the diagonal of numbers from there to the right end of the triangle. This identity basically says that for a number in Pascal's triangle, the sum of all other numbers in its "hockey stick" is itself. Mathematically represented as:

(n+kn)=i=0k(n+i1n1)\binom{n+k}{n} = \sum_{i=0}^k \binom{n+i-1}{n-1} for n,k0n,k \geq 0.

We shall prove this by counting a set in two different ways. Take Sn,n+k={Ω{1,,n+k}Ω=n}S_{n,n+k} = \{\Omega\subseteq \{1,\dots,n+k\}||\Omega| = n\}. So that Sn,n+k=(n+kn)|S_{n,n+k}| = \binom{n+k}{n} giving us the LHS of the identity. Now, we need to count Sn,n+kS_{n,n+k} differently to get the RHS.

To start, we can partition Sn,n+kS_{n,n+k} as follows. Let Wi={ΩSn,n+klargest element of Wi is n+i}W_i = \{\Omega\in S_{n,n+k} | \text{largest element of }W_i\text{ is }n+i\} for 0ik0\leq i\leq k. So then we can represent Sn,n+k=W0WkS_{n,n+k} = W_0\cup \dots \cup W_k, a disjoint union. Before moving on, it is important to represent Wi={{n+i}QQSn1,n+i1}W_i=\{\{n+i\}\cup Q| Q\in S_{n-1, n+i-1}\}.

Sn,n+k=i=0kWi=i=0kSn1,n+i1=i=0k(n+i1n1)\begin{aligned} |S_{n,n+k}| &= \sum_{i=0}^k |W_i|\\ &= \sum_{i=0}^k |S_{n-1,n+i-1}|\\ &= \sum_{i=0}^k \binom{n+i-1}{n-1} \end{aligned}

As desired.


Bijections are important tools when it comes to combinatorial proof. They enable one to show that two sets have the same size and do so conveniently. Let us try to understand bijections. How does one know that two sets have equal size?

Example: Let S={1,,100}S=\{1,\dots,100\} be the set of integers between 1 and 100. Let T={1,4,9,16,,10000}T=\{1,4,9,16,\dots,10000\} be the set of perfect squares between 1 and 10000.

In this example, clearly, these two sets have the same number of elements. A pairing of 1 with 1, 2 with 4, 3 with 9, 4 with 16 and so on all the way to 100 with 10000 should make this clear. There are no uncounted elements and since there are 100 elements in SS, the fact that they can be paired uniquely with elements in TT (with none being left unpaired from TT or SS) shows that TT has size 100 too. Let us formalize this notion:

Definition: A bijection is a mapping f:STf:S\mapsto T satisfying:

  • x1,x2S(f(x1)=f(x2))    (x1=x2)\forall x_1,x_2\in S (f(x_1)=f(x_2)) \iff (x_1 = x_2)

This says that "For all x1,x2x_1,x_2 in SS, f(x1)=f(x2)f(x_1)=f(x_2) if and only if x1=x2x_1=x_2". A function satisfying this property is said to be one-to-one. This is because every element of SS maps to a unique element in TT under the action of ff.

  • yT(xS[f(x)=y])\forall y\in T (\exists x\in S [f(x) = y])

This says that "For every yy in TT, there exists an xx in SS which satisfies f(x)=yf(x) = y. A function satisfying this property is said to be onto. This is because the image of SS under the action of ff is the whole of TT i.e. every element in TT can be obtained as a result of the action of ff on some value in SS.

To test that a function is a bijection, it is sufficient to show that it is both one-to-one and onto as stated in the above 2 bullets.

Example (Cont'd.): We can now formalize the pairing that was discussed in the example above. Define a function f:STf:S\mapsto T like so: xS\forall x\in S, we have that f(x)=x2f(x)=x^2. Let us show that this function satisfies the above two properties to be called a bijection.

  • Verify one-to-one: If we have x1,x2Sx_1,x_2\in S such that f(x1)=f(x2)f(x_1) = f(x_2). This would mean that x12=x22x_1^2 = x_2^2. But since both x1,x2x_1,x_2 are members of SS, they must be positive. So taking the square root leaves us with x1=x2x_1 = x_2. Therefore, ff is one-to-one.

  • Verify onto: If we have yTy\in T, then by definition, it is a perfect square between 1 and 10000 therefore, its square root must lie between 1 and 100, which are the members of SS. Therefore, we can take x=ySx=\sqrt{y}\in S and then we get that f(x)=(x)2=yf(x) = (\sqrt{x})^2 = y. This shows that ff is onto.

Therefore, the function ff is a bijection. We still have to prove that this bijection actually gives us the result that the sets SS and TT have the same size.

THEOREM: Suppose S,TS,T are two finite sets and f:STf:S\mapsto T. Then:

  • If ff is one-to-one, then ST|S| \leq |T|
  • If ff is onto, then ST|S| \geq |T|
  • If ff is a bijection, then S=T|S| = |T|


  1. Suppose that ff is one-to-one. This means that every element of SS maps to a unique element in TT under the action of ff. Therefore, there must be at least S|S| elements in T|T| i.e. ST|S| \leq |T|.
  2. Suppose that ff is onto. Then, every element in TT is the image of some element in SS. Therefore, by the definition of a function (that an input will always produce the same output), this allows us to see that the size of SS is at least the size of TT i.e. ST|S| \geq |T|.
  3. If ff is a bijection, it is both one-to-one and onto. Therefore, we see that it must be the case that (ST)(ST)(|S| \leq |T|)\land (|S| \geq |T|), which leaves us with one option: S=T|S| = |T|.

As desired.

There is another method for showing that a function is a bijection. This one relies on the inverse of a function, which comes sort of naturally when speaking of bijective maps- you need a way to reverse the action of the function. Inverse functions will capture this idea and provide yet another tool for combinatorial proofs.

Inverse Functions

Definition An inverse of a function f:STf:S\mapsto T is a function
f1:TSf^{-1}: T \mapsto S which satisfies the following properties:

  • xS(f1(f(x))=x)\forall x\in S (f^{-1}(f(x)) = x).
    This says that for every element in SS, the action of ff on xx can be undone through the action of f1f^{-1}.
  • yT(f(f1(y))=y)\forall y\in T (f(f^{-1}(y)) = y). This says that for every element in TT, the action of f1f^{-1} can be undone through the action of ff.

This definition can be used to verify that a function is an inverse function. Now, let us the utility of inverse functions through this theorem:

THEOREM: A function f:STf:S\mapsto T has an inverse function if and only if it is a bijection.

Proof: (Forwards) Assume that ff has an inverse function
f1:TSf^{-1}:T\mapsto S. We need to show that ff is a bijection.

  • (Proving that ff is one-to-one) Take x1,x2Sx_1,x_2\in S such that f(x1)=f(x2)f(x_1)=f(x_2). Since f1f^{-1} is the inverse of ff, it reverses the action of ff, so we get

f1(f(x1))=f1(f(x2))    x1=x2f^{-1}(f(x_1)) = f^{-1}(f(x_2)) \implies x_1 = x_2

This shows that ff is one-to-one.

  • (Proving that ff is onto) Pick a yTy\in T. We need to find a value xSx\in S such that f(x)=yf(x) = y. Well, take x=f1(y)x = f^{-1}(y) and then:

f(x)=f(f1(y))=yf(x) = f(f^{-1}(y)) = y

This shows that ff is onto and, together with the above, that ff is a bijection.

(Backwards) Now, let us assume that ff is a bijection. Need to show that ff has an inverse function.

We can construct one as follows: given a yTy\in T, we can find an xSx\in S such that f(x)=yf(x)=y. This is possible because the assumption of bijection means that ff is onto. Keeping this in mind, we make the following definition:

Let f1:TSf^{-1}:T\mapsto S be defined for all yTy\in T, f1(y)=xf^{-1}(y) = x as stated in the passage above. To prove that f1f^{-1} is an inverse:

  • Pick an xSx\in S. Say y=f(x)y = f(x). Then: f1(y)=f1(f(x))=xf^{-1}(y) = f^{-1}(f(x)) = x.
    We see that f1f^{-1} reverses the action of ff.
  • Now pick a yTy\in T. Say x=f1(y)x=f^{-1}(y). Then: f(x)=f(f1(y))=yf(x) = f(f^{-1}(y)) = y

This completes the proof of the backwards direction and finally proves the theorem above.

Generating Series

Context: Let SS be a set. We define a "weight" function as a function w:S{0,1,2,3,}w:S\mapsto \{0,1,2,3,\dots\}.

Question: How many δS\delta \in S such that w(δ)=kw(\delta) = k?


  1. With S={a,b,c}S=\{a,b,c\} and w:S{0,1,2,3,}w:S\mapsto \{0,1,2,3,\dots\} defined by: w(a)=1w(a) = 1, w(b)=1w(b) = 1, w(c)=2w(c) = 2. There are two elements of SS with weight 1 and one element with weight 22. This is the kind of information that we wish to capture.
  2. If S={Ω{0,1,2,,n}w(Ω)=Ω}S=\{\Omega\subseteq \{0,1,2,\dots, n\}| w(\Omega) = |\Omega|\} (SS is the set of subsets of {1,2,,n}\{1,2,\dots,n\} with the weight of a subset being defined as the size of the subset). The number of elements with weight kk is easily obtained (from our previous work with Sk,nS_{k,n}) as (nk)\binom{n}{k}.

Definition: A generating function for a set SS with respect to a weight function w:S{0,1,2,}w:S\mapsto \{0,1,2,\dots\} encodes information regarding the counts of elements in SS with the weights {0,1,2,}\{0,1,2,\dots\}. It is denoted by ΦS(x)\Phi_{S}(x) and is defined as

ΦS(x)=δSxw(δ)\Phi_{S}(x) = \sum_{\delta \in S} x^{w(\delta)}

When one stares at this definition long enough, one realises that it is a way of keeping track of the set of weights that occur in SS, with respect to the weight function ww.

We can manipulate the above definition to make this clearer:

ΦS(x)=δSxw(δ)=k0((δS)(w(δ)=k))xk=k0akxk\begin{aligned} \Phi_{S}(x) &= \sum_{\delta \in S} x^{w(\delta)}\\ &= \sum_{k\geq 0}\left( \sum_{(\delta \in S) \land (w(\delta) = k)} \right) x^k\\ &= \sum_{k\geq 0} a_k x^k \end{aligned}

Where ak={δSw(δ)=k}a_k = |\{\delta \in S | w(\delta) = k\}|.

The coefficient of the xnx^n term tells us the number of elements in the set SS which have the weight nn (of course, with respect to the weight function that was used to compute the series).


  1. What is the generating series of the corresponding example 1 above?
    We compute: ΦS(x)=xw(a)+xw(b)+xw(c)=x1+x1+x2=2x+x2\Phi_S(x) = x^{w(a)} + x^{w(b)} + x^{w(c)} = x^1+x^1+x^2 = 2x + x^2. Note how even if the weight function were defined differently
    (like w(a)=w(c)=1w(a) = w(c) = 1 and w(b)=2w(b) = 2). The resulting generating series is the same. What this means is that there is information about the weights of elements of SS that is not preserved.
  2. For corresponding example 2 above, we have the generating function

ΦS(x)=k0{ΩSw(Ω)=k}xk=k0(nk)xk=k=0n(nk)xk=(1+x)n\begin{aligned} \Phi_S(x) &= \sum_{k\geq 0} |\{\Omega \in S | w(\Omega) = k\}|\cdot x^k\\ &= \sum_{k\geq 0} \binom{n}{k}x^k\\ &= \sum_{k=0}^n \binom{n}{k} x^k\\ &= (1+x)^n \end{aligned}

This may all be nice, but what is the point of encoding this information in a generating series?

Theorem: Suppose that SS is a finite set with the weight funtion
w:S{0,1,2,}w:S \mapsto \{0,1,2,\dots\}. Let ΦS\Phi_S be the generating series for SS with respect to ww. Then:

  1. S=ΦS(1)|S| = \Phi_S(1). This says that the size of the set SS is equal to its generating series evaluated at x=1x=1.
  2. δSw(δ)=dΦsdxx=1\sum_{\delta \in S} w(\delta) = \left.\frac{d\Phi_s}{dx}\right|_{x=1}. This says that the sum of the weights of the elements in SS is exactly equal to the first derivative of the generating series, evaluated at x=1x=1.

Let's see why this works:


  1. Here is a short proof for statement 1 above

ΦS(1)=δSxw(δ)x=1=δS1w(δ)=δS1=S\begin{aligned} \Phi_S(1) &= \left.\sum_{\delta \in S} x^{w(\delta)}\right|_{x=1}\\ &= \sum_{\delta\in S} 1^{w(\delta)} = \sum_{\delta \in S} 1 \\ &= |S| \end{aligned}
2. Similarly, here is a short proof for statement 2

dΦSdxx=1=δSw(δ)xw(δ)1x=1=δSw(δ)1w(δ)1=δSw(δ)\begin{aligned} \left.\frac{d\Phi_S}{dx}\right|_{x=1} &= \left.\sum_{\delta \in S} w(\delta) x^{w(\delta) - 1}\right|_{x=1}\\ &= \sum_{\delta \in S} w(\delta) 1^{w(\delta) - 1}\\ &= \sum_{\delta \in S} w(\delta)\\ \end{aligned}

Example: Consider a sequence of nn coin flips and suppose you win $1 with each time that you get heads after tails or tails after heads. Compute the expected winnings using generating series.

The above theorem allows us to get the sum of the weights of the elements in SS as the number of elements. Putting these together, we can get the average weight of an element in SS. Contextualizing to this example, we can benefit by defining a weight function in terms of the rewards and then compute the average as explained to get the expected winnings. Take a moment to contemplate how you would solve this before moving on...

We need to get a 'mathematical handle' on the occurrence of tails after heads and heads after tails. So, for a sequence of nn coin flips, we can concoct n1n-1 binary variables indicating whether the next flip in the sequence differs from the current one:

i[1,n1]\forall i\in [1,n-1] define zi={1,if (i+1)th outcome is different from ith0,otherwisez_i = \left\{\begin{array}{lr} 1, & \text{if }(i+1)\text{th outcome is different from }i\text{th}\\ 0, & \text{otherwise} \end{array}\right.

Using this string of binary variables z1,z2,,zn1z_1,z_2,\dots,z_{n-1}, one can derive useful information about the sequence of nn coin flips. We can put this all in a set S={(z1,z2,zn1)zj{0,1}}S=\{(z_1, z_2, \dots z_{n-1}) | z_j\in \{0,1\}\} with the weight function defined by w((z1,,zn1))=i=1n1ziw((z_1,\dots,z_{n-1})) = \sum_{i=1}^{n-1} z_i.

We can now view S={0,1}n1S = \{0,1\}^{n-1} and the weight function is essentially computing the number of ones in an entry from SS- each of which corresponds to a dollar reward (as per the rules of the game). Then we can compute ΦS(x)=i0aixi\Phi_S(x) = \sum_{i\geq 0} a_i x^{i}, where aia_i is the number of elements in SS which have weight ii. Fortunatly, we have alredy done this (see work on Ak,nA_{k,n})- we know that ai=(n1i)a_i = \binom{n-1}{i}. Therefore, we finally get ΦS(x)=(1+x)n1\Phi_S(x) = (1+x)^{n-1}, by the binomial theorem.

Now, we use the theorem above:

S=ΦS(1)=2n1|S| = \Phi_S(1) = 2^{n-1}

δSw(δ)=dΦSdxx=1=(n1)(1+x)n2x=1=(n1)2n2\begin{aligned} \sum_{\delta \in S} w(\delta) &= \left.\frac{d\Phi_S}{dx}\right|_{x=1}\\ &= \left.(n-1)(1+x)^{n-2}\right|_{x=1}\\ &= (n-1)2^{n-2} \end{aligned}

We can readily compute the expected winnings now:

Expected winnings=dΦSdxx=1S=(n1)2n22n1\begin{aligned} \text{Expected winnings} &= \frac{\left.\frac{d\Phi_S}{dx}\right|_{x=1}}{|S|}\\ &= \frac{(n-1)2^{n-2}}{2^{n-1}}\\ \end{aligned}

So far, we have only been considering the cases in which the set SS is finite. What happens to ΦS\Phi_S when SS is an infinite set? In this case, we have to make the following distinction:

  • If SS is finite, then ΦS(x)\Phi_S(x) is a polynomial
  • If SS is infinite, then ΦS(s)\Phi_S(s) is a formal power series (given that the coefficients aka_k are finite)

At this point, we introduce the concept of formal power series through which we shall obtain tools to deal with generating series.

Formal Power Series

A formal power series is an expression c0+c1x1+c2x2+c_0 + c_1x^1 + c_2x^2 + \dots where (c0,c1,c2,)(c_0, c_1, c_2, \dots) is a sequence of rational numbers (for our intents and purposes, the coefficients are rational numbers, but the important part is that they are finite). This is to concretize the notion of a generating series as a formal power series, and to then to apply the properties of formal power series in the realm of generating series.

Let C(x)C(x) be a formal power series (to be abbreviated as f.p.s). We can write C(x)=i=0cixiC(x) = \sum_{i=0}^\infty c_i x^i and define [xn]=cn[x^n] = c_n. So if C(x)=i=02ixiC(x) = \sum_{i=0}^\infty 2^i x^i, then [x0]C(x)=1[x^0]C(x) = 1, [x1]C(x)=2[x^1]C(x) = 2, [x2]C(x)=4[x^2]C(x) = 4 and so on. Let us now consider the operations which can be performed with formal power series.

Operations on Formal Power Series

Consider two formal power series A(x)=i=0aixiA(x) = \sum_{i=0}^\infty a_i x^i and B(x)=i=0bixiB(x) = \sum_{i=0}^\infty b_i x^i.

EQUALITY: ((A(x)=B(x))    (kN(ak=bk)))((A(x) = B(x)) \iff (\forall k \in \mathbb{N} (a_k = b_k))). This states that two formal power series are considered to be equal if and only if the coefficients aka_k and bkb_k are the same for all natural number kk.

ADDITION: A(x)+B(x)=k=0(ak+bk)xkA(x) + B(x) = \sum_{k=0}^\infty (a_k+b_k)x^k. This is a valid definition for addition since the resulting series is also a f.p.s- the coefficients (ak+bk)(a_k+b_k) are all finite rational numbers (by the declaration that AA and BB are formal power series above).

MULTIPLICATION: Defined as follows:

A(x)B(x)=j=0k=0ajbkxj+k=n=0(j=0najbnj)xn\begin{aligned} A(x)B(x) &= \sum_{j=0}^\infty \sum_{k=0}^\infty a_jb_k x^{j+k}\\ &= \sum_{n=0}^\infty \left( \sum_{j=0}^n a_j b_{n-j} \right)x^n\\ \end{aligned}

Note that we can identify the term [xn]A(x)B(x)=j=0najbnj[x^n]A(x)B(x) = \sum_{j=0}^n a_j b_{n-j} from the last expression above. This term is also a rational number since it is a finite sum of product of pairs of rational numbers.

Example: Let A(x)=(1+x)mA(x) = (1+x)^m and B(x)=k=0m2kxkB(x) = \sum_{k=0}^m 2^k x^k. Let us compute [xn]A(x)B(x)[x^n]A(x)B(x).

We know, from the binomial theorem that A(x)=j=0m(mj)xjA(x) = \sum_{j=0}^m \binom{m}{j}x^j. So [xn]A(x)=(mj)[x^n]A(x) = \binom{m}{j}. It is also clear from the summand of B(x)B(x) that [xn]B(x)=2n[x^n]B(x)=2^n. Putting these two together:

[xn]A(x)B(x)=i=0n[xi]A(x)[xni]B(x)=i=0n(mi)2ni=2ni=0n(mi)2i\begin{aligned} [x^n]A(x)B(x) &= \sum_{i=0}^n [x^i]A(x)[x^{n-i}]B(x) \\ &= \sum_{i=0}^n \binom{m}{i}2^{n-i} \\ &= 2^n \sum_{i=0}^n \binom{m}{i}2^{-i} \\ \end{aligned}

Now, notice what happens when n>mn>m- the term (mi)\binom{m}{i} will be zero for values of i>mi>m sum would effiectively run from 1 to mm.

2ni=0m(mi)2i=2ni=0m(mi)(12)i=2n(1+12)m=2nm3m\begin{aligned} 2^n\sum_{i=0}^m \binom{m}{i} 2^{-i} &= 2^n\sum_{i=0}^m \binom{m}{i} \left(\frac{1}{2}\right)^i\\ &= 2^n \left( 1+\frac{1}{2} \right)^m\\ &= 2^{n-m}3^m \end{aligned}

So we have that [xn]A(x)B(x)=2nm3m[x^n]A(x)B(x)=2^{n-m}3^m.

Example (Multiplicative Shift): This example illustrates how for all k,n0k,n\geq 0, the term [xn]xkA(x)[x^n]x^k A(x), is similar to a shift in indices. Let A(x)=j0ajxjA(x)=\sum_{j\geq 0}a_j x^j.

[xn]xkA(x)=[xn]xk(j0ajxj)=[xn]j=0ajxj+k  , let l=k+j=[xn]l=kalkxl={ank  , if nk00  , if nk<0\begin{aligned} [x^n] x^k A(x) &= [x^n] x^k \left( \sum_{j\geq 0} a_j x^j \right)\\ &= [x^n] \sum_{j= 0}^\infty a_j x^{j+k} \;\text{, let }l = k + j\\ &= [x^n] \sum_{l=k}^\infty a_{l-k} x^l\\ &= \begin{cases} a_{n-k} &\;\text{, if }n-k\geq 0\\ 0 &\;\text{, if }n-k <0 \end{cases} \end{aligned}

This is a rule/identity that will be used time and time again.

Definition (Formal Power Series Inverses): Let A(x),B(x)A(x),B(x) be formal power series satisfying A(x)B(x)=1A(x)B(x)=1, then one says that B(x)B(x) is the inverse of A(x)A(x) and writes B(x)=A(x)1B(x)=A(x)^{-1} or B(x)=1A(x)B(x)=\frac{1}{A(x)}.

Example: Let A(x)=1xA(x)=1-x, B(x)=i=0xkB(x)=\sum_{i=\geq 0} x^k.

We can write A(x)A(x), using the binomial theorem, as
A(x)=(1x)1=k=01(1k)(x)kA(x)=(1-x)^1=\sum_{k=0}^1 \binom{1}{k} (-x)^k. Using this, it can be seen that [xn]A(x)=(1)n(1n)[x^n]A(x) = (-1)^n\binom{1}{n} and this forms the sequence (1,1,0,0,0,)(1,-1,0,0,0,\dots).

Now, using the multiplication rule:

an=[xn]A(x)B(x)=j=0n[xj]A(x)[xnj]B(x)=j=0n(1)j(1j)(1)={0 if n01 if n=0\begin{aligned} a_n = [x^n]A(x)B(x) &= \sum_{j = 0}^n [x^j]A(x) [x^{n-j}]B(x)\\ &= \sum_{j = 0}^n (-1)^j\binom{1}{j} (1)\\ &= \begin{cases} 0 &\text{ if }n\geq 0\\ 1 &\text{ if }n=0 \end{cases} \end{aligned}

The only term with non-zero coefficent in A(x)B(x)A(x)B(x) is the x0x^0 term (the constant term), which has coefficent 1. Therfore, we see that A(x)B(x)=1A(x)B(x) =1 ans as such we can write i=0xi=11x\sum_{i=0}^\infty x^i = \frac{1}{1-x}, using the notation in the definiton of inverses in formal power series.

Theorem (Negative Binomial Series): As an extension of the example above, we have the following theorem: for all k=1,2,3,k=1,2,3,\dots we have that

1(1x)k=n=0(n+k1k1)xn\begin{aligned} \frac{1}{(1-x)^k} &= \sum_{n=0}^\infty \binom{n+k-1}{k-1} x^n \end{aligned}

Proof: The proof of this theorem is provided using induction on kk.

In the base case when k=1k=1, we check:

[xn]1(1x)1=[xn]j=0(j+1111)xj[xn]11x=[xn]j=0(j0)xj[xn]i=0xi=[xn]j=0xj1=1\begin{aligned} [x^n]\frac{1}{(1-x)^1} &= [x^n]\sum_{j=0}^\infty \binom{j+1-1}{1-1} x^j\\ [x^n]\frac{1}{1-x} &= [x^n]\sum_{j=0}^\infty \binom{j}{0} x^j\\ [x^n]\sum_{i=0}^\infty x^i &= [x^n]\sum_{j=0}^\infty x^j\\ 1 &= 1 \end{aligned}

The base case holds. Now, onto the inductive step where it is assumed that the theorem holds for some kk. Now, to show that it holds for k+1k+1.

1(1x)k+1=1(1x)k1(1x)=(n=0(n+k1k1)xn)(i=0xi)\begin{aligned} \frac{1}{(1-x)^{k+1}} &= \frac{1}{(1-x)^k} \frac{1}{(1-x)}\\ &= \left( \sum_{n=0}^\infty \binom{n+k-1}{k-1} x^n \right)\cdot \left( \sum_{i=0}^\infty x^i \right) \end{aligned}

Now that we have 1(1x)k+1\frac{1}{(1-x)^{k+1}} expressed as a product of two power series, we can use the multiplication rule to compute the coefficient for the xnx^n term.

[xn]1(1x)k+1=[xn](j=0(j+k1k1)xj)(i=0xi)=j=0n(j+k1k1)   by the Multiplication Rule=(n+kk)   by the Hockey Stick Identity\begin{aligned} [x^n]\frac{1}{(1-x)^{k+1}} &= [x^n]\left( \sum_{j=0}^\infty \binom{j+k-1}{k-1} x^j \right)\cdot \left( \sum_{i=0}^\infty x^i \right)\\ &= \sum_{j=0}^n \binom{j+k-1}{k-1} \;\text{ by the Multiplication Rule}\\ &= \binom{n+k}{k} \;\text{ by the Hockey Stick Identity} \end{aligned}

The inductive step holds too and therefore so does the theorem for all k=1,2,k=1,2,\dots. This completes the proof. \blacksquare

This theorem is also an important one and should be familiarized with as it can help with quick conversions to and from power series.

NOTE: There are conditions under which formal power series have inverses. Here's an example highlighting a case when the f.p.s. has no inverse. Consider A(x)=xA(x)=x, which is claimed to have no inverse. Let us proove this.

The proof provided here is by contradiction. Suppose A(x)A(x) did have an inverse. Then, there would exist another f.p.s. B(x)=i0bixiB(x)=\sum_{i\geq 0}b_i x^i such that A(x)B(x)=xB(x)=1A(x)B(x)=xB(x)=1. This looks like a good place to use the "Multiplicative shift" mentioned before to get:

[xn]A(x)B(x)=[xn]xB(x)={bn1   if n00   if n=0\begin{aligned} [x^n]A(x)B(x) &= [x^n]xB(x)\\ &= \begin{cases} b_{n-1} &\;\text{ if }n\geq 0\\ 0 &\;\text{ if }n=0 \end{cases} \end{aligned}

Note that if B(x)B(x) is an inverse for A(x)A(x), then [x0]A(x)B(x)=1[x^0]A(x)B(x)=1, but with the result above, we see that [x0]A(x)B(x)=0[x^0]A(x)B(x)=0, which is a contradiction. The starting assumption, that A(x)A(x) has an inverse must have been wrong then. Therefore, A(x)=xA(x)=x has no inverse.

Theorem (Condition for Inverses): An f.p.s. A(x)A(x) has an inverse iff
[x0]A(x)0[x^0]A(x)\neq 0.

Proof: [Forwards] First, assume that [x0]A(x)=an=0[x^0]A(x)=a_n=0. Now, we need to show that A(x)A(x) has no inverse (This is proving the contrapositive which is logically equivalent).

Say B(x)B(x) is another f.p.s. with [x0]B(x)=b0[x^0]B(x)=b_0.
Let us check [x0]A(x)B(x)=a0b0=0[x^0]A(x)B(x) = a_0b_0 = 0 since a0=0a_0 = 0. This means that A(x)B(x)A(x)B(x) can never be equal to 11, no matter the f.p.s. B(x)B(x). Therefore, A(x)A(x) has no inverse. This proves that if A(x)A(x) has an inverse then a00a_0\neq 0.

[Backwards] Now, suppose that [x0]A(x)0[x^0]A(x)\neq 0. We need to show that A(x)A(x) has an inverse. To do this, we let B(x)=i0bixiB(x)=\sum_{i\geq 0}b_i x^i and show that A(x)B(x)=1A(x)B(x)=1 has a single solution. We can equivalenty solve for:

[xn]A(x)B(x)=j=0najbnj\begin{aligned} [x^n]A(x)B(x) &= \sum_{j=0}^n a_j b_{n-j} \end{aligned}

If n=0n=0, then [x0]A(x)B(x)=a0b0[x^0]A(x)B(x) = a_0b_0. Comparing this to the [x0][x^0] coefficient on the Right Hand Side (RHS) of the equation A(x)B(x)=1A(x)B(x)=1, we get that a0b0=1a_0b_0 = 1, therefore b0=1a0b_0 = \frac{1}{a_0}. This is a valid quotient since a0a_0 is non-zero by assumption.

If n>0n>0, then [xn]A(x)B(x)=a0bn+j=1najbnj[x^n]A(x)B(x) = a_0b_n + \sum_{j=1}^n a_jb_{n-j}. All coefficents on the RHS of the equation are zero (except the constant term) so we can solve for bnb_n from above as bn=1a0j=1najbnjb_n = \frac{-1}{a_0}\sum_{j=1}^n a_jb_{n-j}