Intro to Combinatorics

UWaterloo, Math 239, Spring 2020

"Symmetry is one of the most powerful tools in the mathematics arsenal."
David Jao


Tools for Enumeration

Main interest: counting the size of various sets.

Let us recall the elementary operations that can be performed with sets. Let A,BA, B be finite sets. We have the following operations:

UNION: AB={xxA or xB}A\cup B = \{x | x \in A \text{ or } x \in B\}

INTERSECTION: AB={xxA and xB}A\cap B = \{x | x\in A \text{ and } x \in B\}

In order to count the size of a set, one of the approaches used is to build that set using unions of smaller sets, whose sizes are known. It is important, therefore, to understand the relationship between the cardinality of a set and the cardinalities of the sets that it is built from.

Most readers should be aware of the following identity: AB=A+BAB|A\cup B| = |A| + |B| - |A\cap B| which is of utmost importance when it comes to counting the size of sets as it provides a relationship between the cardinality of a set ABA\cup B and the cardinalities A|A| and B|B|. A much simpler case arises when AB=A\cap B = \emptyset i.e. when ABA\cup B is a disjoint union- the identity reduces to AB=A+B|A\cup B| = |A| + |B|.

This identity can be extended to a more general one. Given sets A1,,AnA_1, \dots, A_n with AiAj=A_i\cap A_j = \emptyset for 1i<jn1 \leq i < j \leq n, we have A1An=j=1nAj|A_1\cup \dots \cup A_n| = \sum_{j=1}^n |A_j|. This is readily proven by induction on nn. In the case of n=2n=2, the identity holds, as seen above. Now, assuming that the identity holds for some n>2n > 2, we can verify it for case n+1n+1:

A1AnAn+1=(A1An)An+1=(A1An)+An+1(A1An)An+1=i=1nAi+An+1(A1An+1)(AnAn+1)=i=1n+1Ai=i=1n+1Ai\begin{aligned} |A_1\cup \dots \cup A_n \cup A_{n+1}| &= |(A_1\cup \dots \cup A_n) \cup A_{n+1}|\\ &= |(A_1\cup \dots \cup A_n)| + |A_{n+1}| - |(A_1\cup \dots \cup A_n)\cap A_{n+1}|\\ &= \sum_{i=1}^n |A_i| + |A_{n+1}| - |(A_1 \cap A_{n+1})\cup \dots \cup (A_n\cap A_{n+1})|\\ &= \sum_{i=1}^{n+1} |A_i| - |\emptyset| = \sum_{i=1}^{n+1} |A_i| \end{aligned}

This allows us to state one of the techniques used for problem solving in this course: Recognizing that a set can be expressed as a disjoint union of other sets. The general identity derived above proves to be helpful in this technique. Now, back to set operations:

CARTESIAN PRODUCT: A×B={(a,b)aA,bB}A\times B = \{(a, b) | a \in A, b \in B\}. This object has the property that A×B=AB|A \times B| = |A|\cdot |B|.

CARTESIAN POWER: Ak={(a1,,ak)a1A,,akA}A^k = \{(a_1, \dots, a_k) | a_1\in A, \dots, a_k\in A\}. This object can be expressed alternatively as the cartesian product of AA with itself, kk times and has the property that Ak=Ak|A^k| = |A|^k.

Example: Let S={0,1}k={(z1,,zk)z1,,zk{0,1}}S=\{0,1\}^k = \{(z_1, \dots, z_k) | z_1, \dots, z_k \in \{0,1\}\}. SS is called the set of kk-tuples of bits. Using the property of the cartesian product outlined above, we see that S={0,1}k={0,1}k=2k|S| =|\{0,1\}^k| = |\{0,1\}|^k = 2^k.

Example: Let S={(z1,z2,z3){0,1}3z1+z2+z31}S=\{(z_1, z_2, z_3)\in \{0,1\}^3 | z_1 + z_2 + z_3 \leq 1\}. Find S|S|. In this case, it may be easy to write out the elements of SS and count them but we shall use the method of expressing SS as a union of 2 disjoint sets to see how it works.

We can write S=ABS=A\cup B where A={(z1,z2,z3){0,1}3z1+z2+z3=0}A=\{(z_1,z_2,z_3)\in \{0,1\}^3 | z_1 + z_2 + z_3 = 0\} and B={(z1,z2,z3){0,1}3z1+z2+z3=1}B=\{(z_1,z_2,z_3)\in \{0,1\}^3 | z_1 + z_2 + z_3 = 1\}. So clearly, AB=A\cap B = \emptyset by the definition of AA and BB. This allows us to say that S=A+B|S| = |A| + |B|.

Now, we can count: A={(0,0,0)}A = \{(0,0,0)\} so A=1|A| = 1 and B={(1,0,0),(0,1,0),(0,0,1)}B = \{(1,0,0), (0,1,0), (0,0,1)\} so that B=3|B| = 3. Therefore, S=1+3=4|S| = 1 + 3 = 4.

Binomial Coefficients

These come up a lot when it comes to enumeration and counting problems so getting comfortable with the binomial coefficients is important.

Definition: Sk,n:={Ω{1,,n}Ω=k}S_{k,n} := \{\Omega \in \{1, \dots, n\} | |\Omega| = k\}. This will prove to be an important set definition once some of its properties have been uncovered. Sk,nS_{k,n} is the set of all subsets of {1,,n}\{1, \dots, n\} of size kk.

In general, this set can be used to understand many real-life situations e.g. the number of 5-card hands from a standard deck of 52 cards is S5,52|S_{5,52}|.

THEOREM: let 0kn0 \leq k \leq n. Then Sk,n=n(n1)(nk+1)k!|S_{k,n}| = \frac{n(n-1)\cdots(n - k + 1)}{k!}.

Proof: Let Lk,n:={(a1,,ak){1,,n}kaiaj for ij}L_{k,n} := \{(a_1,\dots,a_k)\in \{1,\dots,n\}^k | a_i\neq a_j \text{ for } i\neq j \}. Lk,nL_{k,n} is the set of all kk-tuples of distinct elements of {1,,n}\{1,\dots,n\}.

What is Lk,n|L_{k,n}|? Well, since the elements in the kk-tuples are distinct, there are nn possibilities for the first one, n1n-1 possibilities for the second one and so on. In general, for the jj-th element in the tuple, there n(j1)n-(j-1) possibilities. Therefore, we get that Lk,n=n(n1)(n(k1))|L_{k,n}| = n(n-1)\cdots (n-(k-1)).

Now, we shall analyse a relationship between Lk,nL_{k,n} and Sk,nS_{k,n} which will allow us to compute Sk,n|S_{k,n}|. Observe the following:

  • Each (a1,a2,,ak)Lk,n(a_1, a_2, \dots, a_k) \in L_{k,n} can be mapped to an element {a1,,ak}Sk,n\{a_1,\dots, a_k\} \in S_{k,n}. This mapping removes the ordering from the kk-tuple.
  • Each {a1,,ak}Sk,n\{a_1,\dots, a_k\} \in S_{k,n} is then the image of k!k! different elements in Lk,nL_{k,n}.

Using these 2 observations, we may see the following:

Lk,n=ΩSk,nk!=Sk,nk!|L_{k,n}| = \sum_{\Omega \in S_{k,n}} k! = |S_{k,n}|\cdot k!

    Sk,n=Lk,nk!\implies |S_{k,n}| = \frac{|L_{k,n}|}{k!}, as desired.

Now, we may move forward to define the following object:

Binomial Coefficient: (nk):=n(n1)(nk+1)k!\binom{n}{k} := \frac{n(n-1)\cdots(n - k + 1)}{k!} for 0kn0\leq k \leq n. Note that by this defintion, it can also be seen that (nk)=n!(nk)!k!\binom{n}{k} = \frac{n!}{(n-k)!k!}. This version of the definition immediately reveals the symmetrical nature of the binomial coeffcient: (nk)=(nnk)\binom{n}{k} = \binom{n}{n-k} because of the commutativity of multiplication in the denominator. For convenience and convention, we also define (nk)=0\binom{n}{k} = 0 for k>nk > n.

Now, let Ak,n={(z1,,zn){0,1}nz1++zn=k}A_{k,n} = \{(z_1,\dots,z_n)\in \{0,1\}^n | z_1 + \dots + z_n = k\} for 0kn0\leq k \leq n. What is Ak,n|A_{k,n}|? In previous proof, the size of Sk,nS_{k,n} was determined by analysing its relationship with Lk,nL_{k,n} whose size we could easily determine: we showed Lk,nL_{k,n} is k!k! times larger than Sk,nS_{k,n}, essentially proving a 1 to many mapping Sk,nLk,nS_{k,n} \mapsto L_{k,n}. This strategy can also be used to find Ak,n|A_{k,n}| with the only difference being that this time, the relationship is 1-1 with a set of size (nk)\binom{n}{k}. We conveniently already have such a set! - Sk,n=(nk)|S_{k,n}| = \binom{n}{k}, as previously shown.

THEOREM: Ak,n=(nk)|A_{k,n}| = \binom{n}{k}

Proof: Observe the following:

  • Each (z1,,zn)Ak,n(z_1,\dots, z_n) \in A_{k,n} has the propery that there must be kk positions
    which where the element in the nn-tuple is a 1. So each of these nn-tuples can be mapped
    to a subset Ω={j{1,,n}zj=1}Sk,n\Omega = \{j\in \{1,\dots,n\} | z_j = 1\} \in S_{k,n}.
  • Each subset ΩSk,n\Omega \in S_{k,n} can be mapped to an element (z1,,zk)Ak,n(z_1,\dots, z_k)\in A_{k,n}
    where zj=1    jΩz_j = 1 \iff j\in \Omega

This shows a 1-1 mapping: for each set in Sk,nS_{k,n}, there is one corresponding nn-tuple in Ak,nA_{k,n} and for every nn-tuple in Ak,nA_{k,n}, there is one corresponding set in Sk,nS_{k,n}. Therefore, Sk,n=Ak,n|S_{k,n}| = |A_{k,n}|, as desired.

THEOREM: Binomial Theorem: Let nn be a non-negative integer. Then (1+x)n=k=0n(nk)xk(1 + x)^n = \sum_{k=0}^n \binom{n}{k} x^k.

Proof:

We can write 1+x=z{0,1}xz1 + x = \sum_{z\in\{0,1\}}x^z. Using this, we can do the following:

(1+x)n=(z{0,1}xz)n=(z1{0,1}xz1)(zn{0,1}xzn)=(z1,,zn){0,1}nxz1++zn\begin{aligned} (1 + x)^n &= (\sum_{z\in \{0,1\}} x^z)^n\\ &= (\sum_{z_1\in \{0,1\}} x^{z_1})\cdots (\sum_{z_n\in \{0,1\}} x^{z_n})\\ &= \sum_{(z_1,\dots, z_n)\in \{0,1\}^n} x^{z_1+\dots + z_n} \end{aligned}

At this point, what may help tie this problem up is the realisation that the set {0,1}n=A0,nA1,nAn,n\{0,1\}^n = A_{0,n}\cup A_{1,n}\cup \dots \cup A_{n,n}, a disjoint union. This is a useful decomposition because we already know about the size of An,kA_{n,k}. Then:

(1+x)n=(z1,,zn){0,1}nxz1++zn=k=0n((z1,,zn)Ak,nxz1++zn)=k=0n((z1,,zn)Ak,nxk)=k=0nAk,nxk=k=0n(nk)xk\begin{aligned} (1 + x)^n &= \sum_{(z_1,\dots, z_n)\in \{0,1\}^n} x^{z_1+\dots + z_n}\\ &= \sum_{k=0}^n (\sum_{(z_1,\dots, z_n)\in A_{k,n}} x^{z_1+\dots + z_n})\\ &= \sum_{k=0}^n (\sum_{(z_1,\dots, z_n)\in A_{k,n}} x^k)\\ &= \sum_{k=0}^n |A_{k,n}| x^k\\ &= \sum_{k=0}^n \binom{n}{k} x^k \end{aligned}

As desired.

Example: This binomial theorem can be useful when simplifying expressions. Here is a simple example: Simplify k=0n(nk)\sum_{k=0}^n \binom{n}{k}. At first glance, this looks quite similar to the formula of the binomial theorem, just with xx set to 1. Let's use this observation:

k=1n(nk)xk=(1+x)n\sum_{k=1}^n \binom{n}{k} x^k = (1 + x)^n

Setting x=1x = 1:

k=1n(nk)=2n\sum_{k=1}^n \binom{n}{k} = 2^n. And there we have it- a simplified expression.

Exercise: Show that 0jn2(n2j)=2n1\sum_{0\leq j \leq \frac{n}{2}} \binom{n}{2j} = 2^{n-1}. Basically, that the sum of the binomial coefficient (nk)\binom{n}{k} where kk is even is half of the sum of all of the terms. Again, we may use the binomial theorem here to do most of the heavy lifting. Realise, first that

0jn2(n2j)=12(k=0n(nk)+k=0n(nk)(1)k)\sum_{0\leq j\leq \frac{n}{2}} \binom{n}{2j} = \frac{1}{2}\left(\sum_{k=0}^n \binom{n}{k} + \sum_{k= 0}^n \binom{n}{k}(-1)^k\right)

Here, the expression in question has been replaced as an average of 2 terms, both derived from the binomial theorem. The second term in the RHS sum has the job of cancelling out all of the odd terms in the sum from the first term. Then, since the even terms will have been doubly added, the sum needs to be halved, making it an average.

Next, note that the second term in the RHS sum is just the binomial theorem for xx set to 1-1, giving us (1+(1))n=0(1+(-1))^n = 0. Therefore, we get:

0jn2(n2j)=12(2n+0)=122n=2n1\begin{aligned} \sum_{0\leq j \leq \frac{n}{2}} \binom{n}{2j} &= \frac{1}{2}\left(2^n + 0 \right)\\ &= \frac{1}{2} 2^n\\ &= 2^{n-1} \end{aligned}

As desired.

Pascal's Triangle: This is an object of interest in mathematics simply because of the number of properties that emerge from it. It is formed by arranging the binomial coefficients by nn and kk into a triangle.

Pascal's Triangle with binomial coefficients==
Pascal's Triangle with numeric values

One of the first things that one may notice is that every number in the triangle is a sum of the 2 numbers above it. For example, 10 is the sum of 4 and 6 above it. Therefore, it's easy to construct the triangle, row by row but more importantly, this observation can be conjenctured as follows:

(nk)=?(n1k1)+(n1k)\binom{n}{k} =? \binom{n-1}{k-1} + \binom{n-1}{k}

We'll prove this, and similar identities in the next section.

Combinatorial Proofs

  • Essentially, these are the kinds of proofs which rely on counting
  • A key method used is establishing identities by counting a set in two different
    ways.

Example: 1kn\forall 1\leq k \leq n, (nk)=(n1k1)+(n1k)\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. As conjenctured from Pascal's triangle above.

Proof: We want to approach the solution by counting a set in two different ways. One way will give us the expression on the LHS and the other way will give us the expression for the RHS. Since these expressions are representing the cardinality of the same set, they must be equal.

Recall Sk,n={Ω{1,,n}Ω=k}S_{k,n} = \{\Omega \in \{1,\dots,n\}||\Omega| = k\} which has the property that Sk,n=(nk)|S_{k,n}| = \binom{n}{k}. This gives us the RHS expression. Now, we need to count the size of Sk,nS_{k,n} in a different way which would give us the LHS expression.

Let A={ΩSk,nnΩ}A=\{\Omega\in S_{k,n}| n\in \Omega\} and B={ΩSk,nnΩ}B=\{\Omega\in S_{k,n}| n\notin \Omega\}. Clearly, AB=A\cap B = \emptyset and Sk,n=ABS_{k,n} = A\cup B which, by our findings in the previous section, conventiently allows us to do Sk,n=A+B|S_{k,n}| = |A| + |B|.

Case k=1k=1: In this case, A={{n}}A=\{\{n\}\} and B={{1},,{n1}}B=\{\{1\},\dots, \{n-1\}\}. So that (n1)=S1,n=A+B=1+(n1)=(n10)+(n11)\binom{n}{1} = |S_{1,n}| = |A| + |B| = 1 + (n-1) = \binom{n-1}{0} + \binom{n-1}{1}.

Case k>1k>1: In this case, AA and BB can be viewed as follows:

A={{n}QQ{1,,n} and Q=k1}={{n}QQSk1,n1}\begin{aligned} A &= \{\{n\}\cup Q | Q \subseteq \{1,\dots, n\} \text{ and } |Q| = k-1\}\\ &= \{\{n\}\cup Q | Q\in S_{k-1, n-1}\} \end{aligned}

B={Ω{1,,n1}Ω=k}=Sk,n1\begin{aligned} B &= \{\Omega\subseteq \{1,\dots, n-1\} | |\Omega| = k\}\\ &= S_{k,n-1} \end{aligned}

Using this representation, we see that

Sk,n=A+B=Sk1,n1+Sk,n1=(n1k1)+(n1k)\begin{aligned} |S_{k,n}| &= |A| + |B|\\ &= |S_{k-1,n-1}| + |S_{k,n-1}|\\ &= \binom{n-1}{k-1} + \binom{n-1}{k} \end{aligned}

As desired.

This identity can also be proven similarly using the set Ak,nA_{k,n}.

Example: (Hockey Stick Identity). This is one which also arises from Pascal's triangle. In this case, for any number in Pascal's Triangle, the "hockey stick" of that number is the set of numbers formed by moving one row up and to the left, then following the diagonal of numbers from there to the right end of the triangle. This identity basically says that for a number in Pascal's triangle, the sum of all other numbers in its "hockey stick" is itself. Mathematically represented as:

(n+kn)=i=0k(n+i1n1)\binom{n+k}{n} = \sum_{i=0}^k \binom{n+i-1}{n-1} for n,k0n,k \geq 0.

We shall prove this by counting a set in two different ways. Take Sn,n+k={Ω{1,,n+k}Ω=n}S_{n,n+k} = \{\Omega\subseteq \{1,\dots,n+k\}||\Omega| = n\}. So that Sn,n+k=(n+kn)|S_{n,n+k}| = \binom{n+k}{n} giving us the LHS of the identity. Now, we need to count Sn,n+kS_{n,n+k} differently to get the RHS.

To start, we can partition Sn,n+kS_{n,n+k} as follows. Let Wi={ΩSn,n+klargest element of Wi is n+i}W_i = \{\Omega\in S_{n,n+k} | \text{largest element of }W_i\text{ is }n+i\} for 0ik0\leq i\leq k. So then we can represent Sn,n+k=W0WkS_{n,n+k} = W_0\cup \dots \cup W_k, a disjoint union. Before moving on, it is important to represent Wi={{n+i}QQSn1,n+i1}W_i=\{\{n+i\}\cup Q| Q\in S_{n-1, n+i-1}\}.

Sn,n+k=i=0kWi=i=0kSn1,n+i1=i=0k(n+i1n1)\begin{aligned} |S_{n,n+k}| &= \sum_{i=0}^k |W_i|\\ &= \sum_{i=0}^k |S_{n-1,n+i-1}|\\ &= \sum_{i=0}^k \binom{n+i-1}{n-1} \end{aligned}

As desired.

Bijections

Bijections are important tools when it comes to combinatorial proof. They enable one to show that two sets have the same size and do so conveniently. Let us try to understand bijections. How does one know that two sets have equal size?

Example: Let S={1,,100}S=\{1,\dots,100\} be the set of integers between 1 and 100. Let T={1,4,9,16,,10000}T=\{1,4,9,16,\dots,10000\} be the set of perfect squares between 1 and 10000.

In this example, clearly, these two sets have the same number of elements. A pairing of 1 with 1, 2 with 4, 3 with 9, 4 with 16 and so on all the way to 100 with 10000 should make this clear. There are no uncounted elements and since there are 100 elements in SS, the fact that they can be paired uniquely with elements in TT (with none being left unpaired from TT or SS) shows that TT has size 100 too. Let us formalize this notion:

Definition: A bijection is a mapping f:STf:S\mapsto T satisfying:

  • x1,x2S(f(x1)=f(x2))    (x1=x2)\forall x_1,x_2\in S (f(x_1)=f(x_2)) \iff (x_1 = x_2)

This says that "For all x1,x2x_1,x_2 in SS, f(x1)=f(x2)f(x_1)=f(x_2) if and only if x1=x2x_1=x_2". A function satisfying this property is said to be one-to-one. This is because every element of SS maps to a unique element in TT under the action of ff.

  • yT(xS[f(x)=y])\forall y\in T (\exists x\in S [f(x) = y])

This says that "For every yy in TT, there exists an xx in SS which satisfies f(x)=yf(x) = y. A function satisfying this property is said to be onto. This is because the image of SS under the action of ff is the whole of TT i.e. every element in TT can be obtained as a result of the action of ff on some value in SS.

To test that a function is a bijection, it is sufficient to show that it is both one-to-one and onto as stated in the above 2 bullets.

Example (Cont'd.): We can now formalize the pairing that was discussed in the example above. Define a function f:STf:S\mapsto T like so: xS\forall x\in S, we have that f(x)=x2f(x)=x^2. Let us show that this function satisfies the above two properties to be called a bijection.

  • Verify one-to-one: If we have x1,x2Sx_1,x_2\in S such that f(x1)=f(x2)f(x_1) = f(x_2). This would mean that x12=x22x_1^2 = x_2^2. But since both x1,x2x_1,x_2 are members of SS, they must be positive. So taking the square root leaves us with x1=x2x_1 = x_2. Therefore, ff is one-to-one.

  • Verify onto: If we have yTy\in T, then by definition, it is a perfect square between 1 and 10000 therefore, its square root must lie between 1 and 100, which are the members of SS. Therefore, we can take x=ySx=\sqrt{y}\in S and then we get that f(x)=(x)2=yf(x) = (\sqrt{x})^2 = y. This shows that ff is onto.

Therefore, the function ff is a bijection. We still have to prove that this bijection actually gives us the result that the sets SS and TT have the same size.

THEOREM: Suppose S,TS,T are two finite sets and f:STf:S\mapsto T. Then:

  • If ff is one-to-one, then ST|S| \leq |T|
  • If ff is onto, then ST|S| \geq |T|
  • If ff is a bijection, then S=T|S| = |T|

Proof:

  1. Suppose that ff is one-to-one. This means that every element of SS maps to a unique element in TT under the action of ff. Therefore, there must be at least S|S| elements in T|T| i.e. ST|S| \leq |T|.
  2. Suppose that ff is onto. Then, every element in TT is the image of some element in SS. Therefore, by the definition of a function (that an input will always produce the same output), this allows us to see that the size of SS is at least the size of TT i.e. ST|S| \geq |T|.
  3. If ff is a bijection, it is both one-to-one and onto. Therefore, we see that it must be the case that (ST)(ST)(|S| \leq |T|)\land (|S| \geq |T|), which leaves us with one option: S=T|S| = |T|.

As desired.

There is another method for showing that a function is a bijection. This one relies on the inverse of a function, which comes sort of naturally when speaking of bijective maps- you need a way to reverse the action of the function. Inverse functions will capture this idea and provide yet another tool for combinatorial proofs.

Inverse Functions

Definition An inverse of a function f:STf:S\mapsto T is a function
f1:TSf^{-1}: T \mapsto S which satisfies the following properties:

  • xS(f1(f(x))=x)\forall x\in S (f^{-1}(f(x)) = x).
    This says that for every element in SS, the action of ff on xx can be undone through the action of f1f^{-1}.
  • yT(f(f1(y))=y)\forall y\in T (f(f^{-1}(y)) = y). This says that for every element in TT, the action of f1f^{-1} can be undone through the action of ff.

This definition can be used to verify that a function is an inverse function. Now, let us the utility of inverse functions through this theorem:

THEOREM: A function f:STf:S\mapsto T has an inverse function if and only if it is a bijection.

Proof: (Forwards) Assume that ff has an inverse function
f1:TSf^{-1}:T\mapsto S. We need to show that ff is a bijection.

  • (Proving that ff is one-to-one) Take x1,x2Sx_1,x_2\in S such that f(x1)=f(x2)f(x_1)=f(x_2). Since f1f^{-1} is the inverse of ff, it reverses the action of ff, so we get

f1(f(x1))=f1(f(x2))    x1=x2f^{-1}(f(x_1)) = f^{-1}(f(x_2)) \implies x_1 = x_2

This shows that ff is one-to-one.

  • (Proving that ff is onto) Pick a yTy\in T. We need to find a value xSx\in S such that f(x)=yf(x) = y. Well, take x=f1(y)x = f^{-1}(y) and then:

f(x)=f(f1(y))=yf(x) = f(f^{-1}(y)) = y

This shows that ff is onto and, together with the above, that ff is a bijection.

(Backwards) Now, let us assume that ff is a bijection. Need to show that ff has an inverse function.

We can construct one as follows: given a yTy\in T, we can find an xSx\in S such that f(x)=yf(x)=y. This is possible because the assumption of bijection means that ff is onto. Keeping this in mind, we make the following definition:

Let f1:TSf^{-1}:T\mapsto S be defined for all yTy\in T, f1(y)=xf^{-1}(y) = x as stated in the passage above. To prove that f1f^{-1} is an inverse:

  • Pick an xSx\in S. Say y=f(x)y = f(x). Then: f1(y)=f1(f(x))=xf^{-1}(y) = f^{-1}(f(x)) = x.
    We see that f1f^{-1} reverses the action of ff.
  • Now pick a yTy\in T. Say x=f1(y)x=f^{-1}(y). Then: f(x)=f(f1(y))=yf(x) = f(f^{-1}(y)) = y

This completes the proof of the backwards direction and finally proves the theorem above.

Generating Series

Context: Let SS be a set. We define a "weight" function as a function w:S{0,1,2,3,}w:S\mapsto \{0,1,2,3,\dots\}.

Question: How many δS\delta \in S such that w(δ)=kw(\delta) = k?

Examples:

  1. With S={a,b,c}S=\{a,b,c\} and w:S{0,1,2,3,}w:S\mapsto \{0,1,2,3,\dots\} defined by: w(a)=1w(a) = 1, w(b)=1w(b) = 1, w(c)=2w(c) = 2. There are two elements of SS with weight 1 and one element with weight 22. This is the kind of information that we wish to capture.
  2. If S={Ω{0,1,2,,n}w(Ω)=Ω}S=\{\Omega\subseteq \{0,1,2,\dots, n\}| w(\Omega) = |\Omega|\} (SS is the set of subsets of {1,2,,n}\{1,2,\dots,n\} with the weight of a subset being defined as the size of the subset). The number of elements with weight kk is easily obtained (from our previous work with Sk,nS_{k,n}) as (nk)\binom{n}{k}.

Definition: A generating function for a set SS with respect to a weight function w:S{0,1,2,}w:S\mapsto \{0,1,2,\dots\} encodes information regarding the counts of elements in SS with the weights {0,1,2,}\{0,1,2,\dots\}. It is denoted by ΦS(x)\Phi_{S}(x) and is defined as

ΦS(x)=δSxw(δ)\Phi_{S}(x) = \sum_{\delta \in S} x^{w(\delta)}

When one stares at this definition long enough, one realises that it is a way of keeping track of the set of weights that occur in SS, with respect to the weight function ww.

We can manipulate the above definition to make this clearer:

ΦS(x)=δSxw(δ)=k0((δS)(w(δ)=k))xk=k0akxk\begin{aligned} \Phi_{S}(x) &= \sum_{\delta \in S} x^{w(\delta)}\\ &= \sum_{k\geq 0}\left( \sum_{(\delta \in S) \land (w(\delta) = k)} \right) x^k\\ &= \sum_{k\geq 0} a_k x^k \end{aligned}

Where ak={δSw(δ)=k}a_k = |\{\delta \in S | w(\delta) = k\}|.

The coefficient of the xnx^n term tells us the number of elements in the set SS which have the weight nn (of course, with respect to the weight function that was used to compute the series).

Examples:

  1. What is the generating series of the corresponding example 1 above?
    We compute: ΦS(x)=xw(a)+xw(b)+xw(c)=x1+x1+x2=2x+x2\Phi_S(x) = x^{w(a)} + x^{w(b)} + x^{w(c)} = x^1+x^1+x^2 = 2x + x^2. Note how even if the weight function were defined differently
    (like w(a)=w(c)=1w(a) = w(c) = 1 and w(b)=2w(b) = 2). The resulting generating series is the same. What this means is that there is information about the weights of elements of SS that is not preserved.
  2. For corresponding example 2 above, we have the generating function

ΦS(x)=k0{ΩSw(Ω)=k}xk=k0(nk)xk=k=0n(nk)xk=(1+x)n\begin{aligned} \Phi_S(x) &= \sum_{k\geq 0} |\{\Omega \in S | w(\Omega) = k\}|\cdot x^k\\ &= \sum_{k\geq 0} \binom{n}{k}x^k\\ &= \sum_{k=0}^n \binom{n}{k} x^k\\ &= (1+x)^n \end{aligned}

This may all be nice, but what is the point of encoding this information in a generating series?

Theorem: Suppose that SS is a finite set with the weight funtion
w:S{0,1,2,}w:S \mapsto \{0,1,2,\dots\}. Let ΦS\Phi_S be the generating series for SS with respect to ww. Then:

  1. S=ΦS(1)|S| = \Phi_S(1). This says that the size of the set SS is equal to its generating series evaluated at x=1x=1.
  2. δSw(δ)=dΦsdxx=1\sum_{\delta \in S} w(\delta) = \left.\frac{d\Phi_s}{dx}\right|_{x=1}. This says that the sum of the weights of the elements in SS is exactly equal to the first derivative of the generating series, evaluated at x=1x=1.

Let's see why this works:

Proof:

  1. Here is a short proof for statement 1 above

ΦS(1)=δSxw(δ)x=1=δS1w(δ)=δS1=S\begin{aligned} \Phi_S(1) &= \left.\sum_{\delta \in S} x^{w(\delta)}\right|_{x=1}\\ &= \sum_{\delta\in S} 1^{w(\delta)} = \sum_{\delta \in S} 1 \\ &= |S| \end{aligned}
2. Similarly, here is a short proof for statement 2

dΦSdxx=1=δSw(δ)xw(δ)1x=1=δSw(δ)1w(δ)1=δSw(δ)\begin{aligned} \left.\frac{d\Phi_S}{dx}\right|_{x=1} &= \left.\sum_{\delta \in S} w(\delta) x^{w(\delta) - 1}\right|_{x=1}\\ &= \sum_{\delta \in S} w(\delta) 1^{w(\delta) - 1}\\ &= \sum_{\delta \in S} w(\delta)\\ \end{aligned}

Example: Consider a sequence of nn coin flips and suppose you win $1 with each time that you get heads after tails or tails after heads. Compute the expected winnings using generating series.

The above theorem allows us to get the sum of the weights of the elements in SS as the number of elements. Putting these together, we can get the average weight of an element in SS. Contextualizing to this example, we can benefit by defining a weight function in terms of the rewards and then compute the average as explained to get the expected winnings. Take a moment to contemplate how you would solve this before moving on...

We need to get a 'mathematical handle' on the occurrence of tails after heads and heads after tails. So, for a sequence of nn coin flips, we can concoct n1n-1 binary variables indicating whether the next flip in the sequence differs from the current one:

i[1,n1]\forall i\in [1,n-1] define zi={1,if (i+1)th outcome is different from ith0,otherwisez_i = \left\{\begin{array}{lr} 1, & \text{if }(i+1)\text{th outcome is different from }i\text{th}\\ 0, & \text{otherwise} \end{array}\right.

Using this string of binary variables z1,z2,,zn1z_1,z_2,\dots,z_{n-1}, one can derive useful information about the sequence of nn coin flips. We can put this all in a set S={(z1,z2,zn1)zj{0,1}}S=\{(z_1, z_2, \dots z_{n-1}) | z_j\in \{0,1\}\} with the weight function defined by w((z1,,zn1))=i=1n1ziw((z_1,\dots,z_{n-1})) = \sum_{i=1}^{n-1} z_i.

We can now view S={0,1}n1S = \{0,1\}^{n-1} and the weight function is essentially computing the number of ones in an entry from SS- each of which corresponds to a dollar reward (as per the rules of the game). Then we can compute ΦS(x)=i0aixi\Phi_S(x) = \sum_{i\geq 0} a_i x^{i}, where aia_i is the number of elements in SS which have weight ii. Fortunatly, we have alredy done this (see work on Ak,nA_{k,n})- we know that ai=(n1i)a_i = \binom{n-1}{i}. Therefore, we finally get ΦS(x)=(1+x)n1\Phi_S(x) = (1+x)^{n-1}, by the binomial theorem.

Now, we use the theorem above:

S=ΦS(1)=2n1|S| = \Phi_S(1) = 2^{n-1}

δSw(δ)=dΦSdxx=1=(n1)(1+x)n2x=1=(n1)2n2\begin{aligned} \sum_{\delta \in S} w(\delta) &= \left.\frac{d\Phi_S}{dx}\right|_{x=1}\\ &= \left.(n-1)(1+x)^{n-2}\right|_{x=1}\\ &= (n-1)2^{n-2} \end{aligned}

We can readily compute the expected winnings now:

Expected winnings=dΦSdxx=1S=(n1)2n22n1\begin{aligned} \text{Expected winnings} &= \frac{\left.\frac{d\Phi_S}{dx}\right|_{x=1}}{|S|}\\ &= \frac{(n-1)2^{n-2}}{2^{n-1}}\\ \end{aligned}

So far, we have only been considering the cases in which the set SS is finite. What happens to ΦS\Phi_S when SS is an infinite set? In this case, we have to make the following distinction:

  • If SS is finite, then ΦS(x)\Phi_S(x) is a polynomial
  • If SS is infinite, then ΦS(s)\Phi_S(s) is a formal power series (given that the coefficients aka_k are finite)

At this point, we introduce the concept of formal power series through which we shall obtain tools to deal with generating series.

Formal Power Series

A formal power series is an expression c0+c1x1+c2x2+c_0 + c_1x^1 + c_2x^2 + \dots where (c0,c1,c2,)(c_0, c_1, c_2, \dots) is a sequence of rational numbers (for our intents and purposes, the coefficients are rational numbers, but the important part is that they are finite). This is to concretize the notion of a generating series as a formal power series, and to then to apply the properties of formal power series in the realm of generating series.

Let C(x)C(x) be a formal power series (to be abbreviated as f.p.s). We can write C(x)=i=0cixiC(x) = \sum_{i=0}^\infty c_i x^i and define [xn]=cn[x^n] = c_n. So if C(x)=i=02ixiC(x) = \sum_{i=0}^\infty 2^i x^i, then [x0]C(x)=1[x^0]C(x) = 1, [x1]C(x)=2[x^1]C(x) = 2, [x2]C(x)=4[x^2]C(x) = 4 and so on. Let us now consider the operations which can be performed with formal power series.

Operations on Formal Power Series

Consider two formal power series A(x)=i=0aixiA(x) = \sum_{i=0}^\infty a_i x^i and B(x)=i=0bixiB(x) = \sum_{i=0}^\infty b_i x^i.

EQUALITY: ((A(x)=B(x))    (kN(ak=bk)))((A(x) = B(x)) \iff (\forall k \in \mathbb{N} (a_k = b_k))). This states that two formal power series are considered to be equal if and only if the coefficients aka_k and bkb_k are the same for all natural number kk.

ADDITION: A(x)+B(x)=k=0(ak+bk)xkA(x) + B(x) = \sum_{k=0}^\infty (a_k+b_k)x^k. This is a valid definition for addition since the resulting series is also a f.p.s- the coefficients (ak+bk)(a_k+b_k) are all finite rational numbers (by the declaration that AA and BB are formal power series above).

MULTIPLICATION: Defined as follows:

A(x)B(x)=j=0k=0ajbkxj+k=n=0(j=0najbnj)xn\begin{aligned} A(x)B(x) &= \sum_{j=0}^\infty \sum_{k=0}^\infty a_jb_k x^{j+k}\\ &= \sum_{n=0}^\infty \left( \sum_{j=0}^n a_j b_{n-j} \right)x^n\\ \end{aligned}

Note that we can identify the term [xn]A(x)B(x)=j=0najbnj[x^n]A(x)B(x) = \sum_{j=0}^n a_j b_{n-j} from the last expression above. This term is also a rational number since it is a finite sum of product of pairs of rational numbers.

Example: Let A(x)=(1+x)mA(x) = (1+x)^m and B(x)=k=0m2kxkB(x) = \sum_{k=0}^m 2^k x^k. Let us compute [xn]A(x)B(x)[x^n]A(x)B(x).

We know, from the binomial theorem that A(x)=j=0m(mj)xjA(x) = \sum_{j=0}^m \binom{m}{j}x^j. So [xn]A(x)=(mj)[x^n]A(x) = \binom{m}{j}. It is also clear from the summand of B(x)B(x) that [xn]B(x)=2n[x^n]B(x)=2^n. Putting these two together:

[xn]A(x)B(x)=i=0n[xi]A(x)[xni]B(x)=i=0n(mi)2ni=2ni=0n(mi)2i\begin{aligned} [x^n]A(x)B(x) &= \sum_{i=0}^n [x^i]A(x)[x^{n-i}]B(x) \\ &= \sum_{i=0}^n \binom{m}{i}2^{n-i} \\ &= 2^n \sum_{i=0}^n \binom{m}{i}2^{-i} \\ \end{aligned}

Now, notice what happens when n>mn>m- the term (mi)\binom{m}{i} will be zero for values of i>mi>m sum would effiectively run from 1 to mm.

2ni=0m(mi)2i=2ni=0m(mi)(12)i=2n(1+12)m=2nm3m\begin{aligned} 2^n\sum_{i=0}^m \binom{m}{i} 2^{-i} &= 2^n\sum_{i=0}^m \binom{m}{i} \left(\frac{1}{2}\right)^i\\ &= 2^n \left( 1+\frac{1}{2} \right)^m\\ &= 2^{n-m}3^m \end{aligned}

So we have that [xn]A(x)B(x)=2nm3m[x^n]A(x)B(x)=2^{n-m}3^m.

Example (Multiplicative Shift): This example illustrates how for all k,n0k,n\geq 0, the term [xn]xkA(x)[x^n]x^k A(x), is similar to a shift in indices. Let A(x)=j0ajxjA(x)=\sum_{j\geq 0}a_j x^j.

[xn]xkA(x)=[xn]xk(j0ajxj)=[xn]j=0ajxj+k  , let l=k+j=[xn]l=kalkxl={ank  , if nk00  , if nk<0\begin{aligned} [x^n] x^k A(x) &= [x^n] x^k \left( \sum_{j\geq 0} a_j x^j \right)\\ &= [x^n] \sum_{j= 0}^\infty a_j x^{j+k} \;\text{, let }l = k + j\\ &= [x^n] \sum_{l=k}^\infty a_{l-k} x^l\\ &= \begin{cases} a_{n-k} &\;\text{, if }n-k\geq 0\\ 0 &\;\text{, if }n-k <0 \end{cases} \end{aligned}

This is a rule/identity that will be used time and time again.

Definition (Formal Power Series Inverses): Let A(x),B(x)A(x),B(x) be formal power series satisfying A(x)B(x)=1A(x)B(x)=1, then one says that B(x)B(x) is the inverse of A(x)A(x) and writes B(x)=A(x)1B(x)=A(x)^{-1} or B(x)=1A(x)B(x)=\frac{1}{A(x)}.

Example: Let A(x)=1xA(x)=1-x, B(x)=i=0xkB(x)=\sum_{i=\geq 0} x^k.

We can write A(x)A(x), using the binomial theorem, as
A(x)=(1x)1=k=01(1k)(x)kA(x)=(1-x)^1=\sum_{k=0}^1 \binom{1}{k} (-x)^k. Using this, it can be seen that [xn]A(x)=(1)n(1n)[x^n]A(x) = (-1)^n\binom{1}{n} and this forms the sequence (1,1,0,0,0,)(1,-1,0,0,0,\dots).

Now, using the multiplication rule:

an=[xn]A(x)B(x)=j=0n[xj]A(x)[xnj]B(x)=j=0n(1)j(1j)(1)={0 if n01 if n=0\begin{aligned} a_n = [x^n]A(x)B(x) &= \sum_{j = 0}^n [x^j]A(x) [x^{n-j}]B(x)\\ &= \sum_{j = 0}^n (-1)^j\binom{1}{j} (1)\\ &= \begin{cases} 0 &\text{ if }n\geq 0\\ 1 &\text{ if }n=0 \end{cases} \end{aligned}

The only term with non-zero coefficent in A(x)B(x)A(x)B(x) is the x0x^0 term (the constant term), which has coefficent 1. Therfore, we see that A(x)B(x)=1A(x)B(x) =1 ans as such we can write i=0xi=11x\sum_{i=0}^\infty x^i = \frac{1}{1-x}, using the notation in the definiton of inverses in formal power series.

Theorem (Negative Binomial Series): As an extension of the example above, we have the following theorem: for all k=1,2,3,k=1,2,3,\dots we have that

1(1x)k=n=0(n+k1k1)xn\begin{aligned} \frac{1}{(1-x)^k} &= \sum_{n=0}^\infty \binom{n+k-1}{k-1} x^n \end{aligned}

Proof: The proof of this theorem is provided using induction on kk.

In the base case when k=1k=1, we check:

[xn]1(1x)1=[xn]j=0(j+1111)xj[xn]11x=[xn]j=0(j0)xj[xn]i=0xi=[xn]j=0xj1=1\begin{aligned} [x^n]\frac{1}{(1-x)^1} &= [x^n]\sum_{j=0}^\infty \binom{j+1-1}{1-1} x^j\\ [x^n]\frac{1}{1-x} &= [x^n]\sum_{j=0}^\infty \binom{j}{0} x^j\\ [x^n]\sum_{i=0}^\infty x^i &= [x^n]\sum_{j=0}^\infty x^j\\ 1 &= 1 \end{aligned}

The base case holds. Now, onto the inductive step where it is assumed that the theorem holds for some kk. Now, to show that it holds for k+1k+1.

1(1x)k+1=1(1x)k1(1x)=(n=0(n+k1k1)xn)(i=0xi)\begin{aligned} \frac{1}{(1-x)^{k+1}} &= \frac{1}{(1-x)^k} \frac{1}{(1-x)}\\ &= \left( \sum_{n=0}^\infty \binom{n+k-1}{k-1} x^n \right)\cdot \left( \sum_{i=0}^\infty x^i \right) \end{aligned}

Now that we have 1(1x)k+1\frac{1}{(1-x)^{k+1}} expressed as a product of two power series, we can use the multiplication rule to compute the coefficient for the xnx^n term.

[xn]1(1x)k+1=[xn](j=0(j+k1k1)xj)(i=0xi)=j=0n(j+k1k1)   by the Multiplication Rule=(n+kk)   by the Hockey Stick Identity\begin{aligned} [x^n]\frac{1}{(1-x)^{k+1}} &= [x^n]\left( \sum_{j=0}^\infty \binom{j+k-1}{k-1} x^j \right)\cdot \left( \sum_{i=0}^\infty x^i \right)\\ &= \sum_{j=0}^n \binom{j+k-1}{k-1} \;\text{ by the Multiplication Rule}\\ &= \binom{n+k}{k} \;\text{ by the Hockey Stick Identity} \end{aligned}

The inductive step holds too and therefore so does the theorem for all k=1,2,k=1,2,\dots. This completes the proof. \blacksquare

This theorem is also an important one and should be familiarized with as it can help with quick conversions to and from power series.

NOTE: There are conditions under which formal power series have inverses. Here's an example highlighting a case when the f.p.s. has no inverse. Consider A(x)=xA(x)=x, which is claimed to have no inverse. Let us proove this.

The proof provided here is by contradiction. Suppose A(x)A(x) did have an inverse. Then, there would exist another f.p.s. B(x)=i0bixiB(x)=\sum_{i\geq 0}b_i x^i such that A(x)B(x)=xB(x)=1A(x)B(x)=xB(x)=1. This looks like a good place to use the "Multiplicative shift" mentioned before to get:

[xn]A(x)B(x)=[xn]xB(x)={bn1   if n00   if n=0\begin{aligned} [x^n]A(x)B(x) &= [x^n]xB(x)\\ &= \begin{cases} b_{n-1} &\;\text{ if }n\geq 0\\ 0 &\;\text{ if }n=0 \end{cases} \end{aligned}

Note that if B(x)B(x) is an inverse for A(x)A(x), then [x0]A(x)B(x)=1[x^0]A(x)B(x)=1, but with the result above, we see that [x0]A(x)B(x)=0[x^0]A(x)B(x)=0, which is a contradiction. The starting assumption, that A(x)A(x) has an inverse must have been wrong then. Therefore, A(x)=xA(x)=x has no inverse.

Theorem (Condition for Inverses): An f.p.s. A(x)A(x) has an inverse iff
[x0]A(x)0[x^0]A(x)\neq 0.

Proof: [Forwards] First, assume that [x0]A(x)=an=0[x^0]A(x)=a_n=0. Now, we need to show that A(x)A(x) has no inverse (This is proving the contrapositive which is logically equivalent).

Say B(x)B(x) is another f.p.s. with [x0]B(x)=b0[x^0]B(x)=b_0.
Let us check [x0]A(x)B(x)=a0b0=0[x^0]A(x)B(x) = a_0b_0 = 0 since a0=0a_0 = 0. This means that A(x)B(x)A(x)B(x) can never be equal to 11, no matter the f.p.s. B(x)B(x). Therefore, A(x)A(x) has no inverse. This proves that if A(x)A(x) has an inverse then a00a_0\neq 0.

[Backwards] Now, suppose that [x0]A(x)0[x^0]A(x)\neq 0. We need to show that A(x)A(x) has an inverse. To do this, we let B(x)=i0bixiB(x)=\sum_{i\geq 0}b_i x^i and show that A(x)B(x)=1A(x)B(x)=1 has a single solution. We can equivalenty solve for:

[xn]A(x)B(x)=j=0najbnj\begin{aligned} [x^n]A(x)B(x) &= \sum_{j=0}^n a_j b_{n-j} \end{aligned}

If n=0n=0, then [x0]A(x)B(x)=a0b0[x^0]A(x)B(x) = a_0b_0. Comparing this to the [x0][x^0] coefficient on the Right Hand Side (RHS) of the equation A(x)B(x)=1A(x)B(x)=1, we get that a0b0=1a_0b_0 = 1, therefore b0=1a0b_0 = \frac{1}{a_0}. This is a valid quotient since a0a_0 is non-zero by assumption.

If n>0n>0, then [xn]A(x)B(x)=a0bn+j=1najbnj[x^n]A(x)B(x) = a_0b_n + \sum_{j=1}^n a_jb_{n-j}. All coefficents on the RHS of the equation are zero (except the constant term) so we can solve for bnb_n from above as bn=1a0j=1najbnjb_n = \frac{-1}{a_0}\sum_{j=1}^n a_jb_{n-j}, which can be solved iteratively for some desidered value of nn. This completes the proof. \blacksquare

The kind of solution obtained for bnb_n in the above theorem is called a linear recurrence because bnb_n is defined as a linear combination of previous bib_i terms (for 0i<n0\leq i < n). We can use this in solving equations of formal power series using inverses. This is studied now.

Theorem: Let A(x)A(x) and C(x)C(x) be f.p.s. If [x0]A(x)0[x^0]A(x)\neq 0 the there is a formal power series B(x)B(x) such that A(x)B(x)=C(x)A(x)B(x) = C(x).

Proof: This short and simple proof builds off of the conditions for inverses. Since [x0]A(x)0[x^0]A(x) \neq 0, we know that it has an invserse. Say the inverse of A(x)A(x) is B(x)B'(x) i.e. A(x)B(x)=1A(x)B'(x)=1. Now, let B(x)=B(x)C(x)B(x)=B'(x)C(x). This gives us that A(x)B(x)=(A(x)B(x))C(x)=C(x)A(x)B(x) = (A(x)B'(x))C(x) = C(x). This completes the proof. \blacksquare

Example: Let A(x)=5+x21xA(x) = \frac{5+x^2}{1-x}. Compute [xn]A(x)[x^n]A(x) for all n0n\geq 0. Note that in this example, if one considers A(x)A(x) to be quotient of polynomials, the one in the denominator has degree lower than the polynomial in the numerator.

First, start by writing A(x)=(11x)(5+x2)A(x) = \left(\frac{1}{1-x} \right)\left(5+x^2\right). Then:

[xn]A(x)=[xn](11x)(5+x2)=k=0n([xk]11x)([xnk]5+x2)=k=0n[xnk]5+x2={5  if n{0,1}6  if n2\begin{aligned} [x^n]A(x) &= [x^n]\left(\frac{1}{1-x} \right)\left(5+x^2\right)\\ &= \sum_{k=0}^n \left( [x^k] \frac{1}{1-x}\right)\left([x^{n-k}]5+x^2\right)\\ &= \sum_{k=0}^n [x^{n-k}]5+x^2\\ &= \begin{cases} 5 &\;\text{if }n\in\{0,1\}\\ 6 &\;\text{if }n\geq 2 \end{cases} \end{aligned}

Example: Let A(x)=2+7x3x21x2+x3A(x) = \frac{2+7x-3x^2}{1-x^2+x^3}. Note that in this example, if one considers A(x)A(x) to be a quotient of polynomials the the polynomial in the denominator has higher degree than the one in the numerator. In this case, we derive a recurrence relationship which can be used to compute [xn]A(x)[x^n]A(x) for all n0n\geq 0.

Write A(x)=k0akxkA(x) = \sum_{k\geq 0} a_k x^k so that [xn]A(x)=an[x^n]A(x)=a_n. Now, rewrite the original expression for A(x)A(x) as (1x2+x3)A(x)=2+7x3x2(1-x^2+x^3)A(x) = 2+7x-3x^2. This process works by essentially comparing coefficients from the Left Hand Side (LHS) and RHS of the re-arranged expression for A(x)A(x).

Starting with the constant term. Since the LHS and RHS are equal, we have that [x0](1x2+x3)A(x)=[x0]2+7x3x2[x^0](1-x^2+x^3)A(x) = [x^0]2+7x-3x^2. When the expressions for the coefficients are computed, we get that a0=2a_0 = 2.

Next, the linear term: [x1](1x2+x3)A(x)=[x1]2+7x3x2[x^1](1-x^2+x^3)A(x) = [x^1]2+7x-3x^2. Computing the expressions on each side of the equation, we get a1(1)a0(0)=7a_1(1) - a_0(0) = 7. We get a1=7a_1 = 7.

Next, the quadratic term: [x2](1x2+x3)A(x)=[x2]2+7x3x2[x^2](1-x^2+x^3)A(x) = [x^2]2+7x-3x^2. Computing the expressions on each side of the equation, we get a2+a1(0)a0=3a_2+a_1(0)-a_0 = -3. Evaluating using the previous computations, we get a2=1a_2 = -1.

Now, we derive a general recurrence using xnx^n term, where n3n\geq 3: [xn](1x2+x3)A(x)=[xn]2+7x3x2[x^n](1-x^2+x^3)A(x) = [x^n]2+7x-3x^2. Evaluating each side gives anan2+an3=0a_n - a_{n-2} + a_{n-3} = 0. This gives us the expression for the general ana_n as an=an2an3a_n = a_{n-2} - a_{n-3} for n3n\geq 3.

The expression ana_n as an=an2an3a_n = a_{n-2} - a_{n-3} for n3n\geq 3 is called a recurrence relation. But note that to compute ana_n, one needs to know an2a_{n-2} and an3a_{n-3}. Therefore, for the recurrence relation to be of any use, there must be a set of initial conditions provided. In this case, they were computed to be a0=2,a1=7a_0 = 2, a_1 = 7 and a2=1a_2 = -1. From this, it is possible to compute any ana_n iteratively.

Notes: We began to derive a general recurrence relation when we reached the cubic (or higher) term. This was because the x3x^3 term (and all after it) for the expression on the RHS have zero coefficient.

Also note how the a0a_0 term is non-zero.

Now, more generally, if there are f.p.s. A(x)=i0aixiA(x)= \sum_{i\geq 0}a_i x^i with a00a_0 \neq 0 and C(x)=i0cixiC(x) = \sum_{i\geq 0}c_i x^i. By a theorem, we know that there exists a B(x)B(x) such that A(x)B(x)=C(x)A(x)B(x)=C(x). We know that B(x)=C(x)A(x)B(x) = \frac{C(x)}{A'(x)}, where 1A(x)\frac{1}{A(x)} is the inverse of A(x)A(x). We can now derive a general linear recurrence relation defining the coefficients of B(x)B(x).

Theorem: Let A(x),C(x)A(x),C(x) be formal power series with a00a_0 \neq 0. Let B(x)=C(x)A(x)B(x)=\frac{C(x)}{A(x)}. Then, bn=[xn]B(x)b_n = [x^n]B(x) is determined by the linear recurrence:

bn=1a0(cnj=0n1anjbj)b_n = \frac{1}{a_0}\left( c_n - \sum_{j=0}^{n-1} a_{n-j}b_j \right)

Proof: Again this proof is short and straight-forward: To prove the recurrence, we start by comparing coefficients of the LHS and RHS i.e. checking [xn]A(x)B(x)=[xn]C(x)[x^n] A(x)B(x) = [x^n]C(x). We get:

cn=j=0najbnjcn=j=0nanjbjcn=a0bn+j=0n1anjbj\begin{aligned} c_n &= \sum_{j=0}^n a_j b_{n-j}\\ c_n &= \sum_{j=0}^n a_{n-j} b_{j}\\ c_n &= a_0b_n + \sum_{j=0}^{n-1} a_{n-j} b_{j}\\ \end{aligned}

Rearranging the last equation reveals the desired recurrence:

bn=1a0(cnj=0n1anjbj)b_n = \frac{1}{a_0}\left( c_n - \sum_{j=0}^{n-1} a_{n-j}b_j \right)

COMPOSITION: This is yet another operation defined on formal power series. Let A(x)=j0ajxjA(x) = \sum_{j\geq 0}a_j x^j and B(x)=j0bjxjB(x) = \sum_{j\geq 0}b_j x^j be formal power series. If b0=0b_0=0 then we define:

A(B(x))=j0ajB(x)j\begin{aligned} A(B(x)) &= \sum_{j\geq 0} a_j B(x)^j \end{aligned}

In order to verify that composition of formal series is well defined, we check that the result of composing A(x),B(x)A(x),B(x) as defined above is a formal power series too.

[xn]A(B(x))=j0aj[xn]B(x)j\begin{aligned} [x^n]A(B(x)) &= \sum_{j\geq 0} a_j [x^n]B(x)^j \end{aligned}

At this stage, note that the logic from multiplicative shift can be applied here to the term [xn]B(x)j=[xn](b1x+b2x2+)j[x^n]B(x)^j = [x^n](b_1x+b_2x^2+\dots)^j. If j>nj>n, then there are no terms with xnx^n left (all terms have exponent xnx^n or higher). Therefore, the sum can be reduced to run from 00 to nn instead.

[xn]A(B(x))=j=0naj[xn]B(x)j<\begin{aligned} [x^n]A(B(x)) &= \sum_{j= 0}^n a_j [x^n]B(x)^j\\ &< \infty \end{aligned}

The coefficient in finite because aja_j and [xn]B(x)j[x^n]B(x)^j are all rational and there is a finite count of summands. Therefore, A(B(x))A(B(x)) is a formal power series and the operation of composition is well defined.

Note: Here is an example when the condition that b0=0b_0=0 is necessary. Consider the composition of A(x)=i0xiA(x)=\sum_{i\geq 0}x^i and B(x)=1B(x)=1. Then, we get that A(B(x))=j01A(B(x))=\sum_{j\geq 0} 1 \rightarrow \infty. It can cause such problems.

Now, we shall look at using composition of formal power series to compute inverses.

Theorem: Let A(x),B(x)A(x),B(x) be formal power series such that [x0]A(x)0[x^0]A(x)\neq 0 and [x0]B(x)=0[x^0]B(x)=0. Then:

[A(B(x))]1=A1(B(x))\begin{aligned} [A(B(x))]^{-1} = A^{-1}(B(x)) \end{aligned}

Proof: Write A(x)=i0aixiA(x)=\sum_{i\geq 0}a_i x^i. Since a0=0a_0=0, we know that A(x)A(x) has an inverse. Let A1(x)=i0αixiA^{-1}(x)=\sum_{i\geq 0}\alpha_i x^i. Since [x0]B(x)=0[x^0]B(x)=0 we know that it can be composed into other formal power series namely A(x)A(x) and A1(x)A^{-1}(x). Let us now consider the product of A(x)A(x) and A1(x)A^{-1}(x), when composed with B(x)B(x):

A(B(x))A1(B(x))=(j0aj[B(x)]j)(j0αj[B(x)]j)=AA1(B(x))  by the Multiplicative Rule=1\begin{aligned} A(B(x))A^{-1}(B(x)) &= \left( \sum_{j\geq 0}a_j[B(x)]^j \right) \left( \sum_{j\geq 0}\alpha_j[B(x)]^j \right)\\ &= AA^{-1}(B(x)) \;\text{by the Multiplicative Rule}\\ &= 1 \end{aligned}

This shows us that A(B(x))A(B(x)) is the invserse of A1(B(x))A^{-1}(B(x)) since their product is 1. Therefore, we get the desired result. \blacksquare

Example: Computing [xn](1+3x)2(1x)1[x^n](1+3x)^{-2}(1-x)^{-1}. The key idea here is to first express the term (1+3x)2(1+3x)^{-2} as a composition of formal power series, then use the theorem above.

Let A(x)=3xA(x)=-3x. So we get that (1+3x)2=(1A(x))2(1+3x)^{-2}=(1-A(x))^{-2}. This is good because the negative binomial series can be used to obtain a series representation.

1(1+3x)2=1(1A(X))2=n0(n+11)(A(x))n   by the Negative Binomial Series=n0(n+1)(3x)n=n0(3)n(n+1)xn\begin{aligned} \frac{1}{(1+3x)^{2}} &= \frac{1}{(1-A(X))^{2}}\\ &= \sum_{n\geq 0} \binom{n+1}{1} (A(x))^n \;\text{ by the Negative Binomial Series}\\ &= \sum_{n\geq 0} (n+1)(-3x)^n\\ &= \sum_{n\geq 0} (-3)^n(n+1)x^n \end{aligned}

This shows us that [xn](1+3x)2=(3)n(n+1)[x^n](1+3x)^{-2} = (-3)^n(n+1).

[xn](1+3x)2(1x)1=j=0n[xj](1+3x)2[xnj](1x)1=j=0n[xj](1+3x)2=j=0n(3)j(j+1)\begin{aligned} [x^n] (1+3x)^{-2} (1-x)^{-1} &= \sum_{j=0}^n [x^j](1+3x)^{-2} [x^{n-j}](1-x)^{-1}\\ &= \sum_{j=0}^n [x^j](1+3x)^{-2}\\ &= \sum_{j=0}^n (-3)^j(j+1) \end{aligned}

We now have a suite of tools related to formal power series which we shall employ in our study of generating series- to which we return now...

... Back to Generating Series

Now, we study some additional properties of generating series.

Sum Lemma Let S=ABS=A\cup B where AB=A\cap B = \emptyset.
Let w:S{0,1,}w:S\mapsto\{0,1,\dots\}. Then

ΦS(x)=ΦA(x)+ΦB(x)\Phi_S(x)=\Phi_A(x)+\Phi_B(x)

Proof:

ΦS(x)=δSxw(δ)=δAxw(δ)+δBxw(δ)=ΦA(x)+ΦB(x)\begin{aligned} \Phi_S(x) &= \sum_{\delta\in S} x^{w(\delta)}\\ &= \sum_{\delta\in A}x^{w(\delta)} + \sum_{\delta\in B}x^{w(\delta)}\\ &= \Phi_A(x) + \Phi_B(x)\\ \blacksquare \end{aligned}
$

The sum lemma for generating series can be generalised to:

Generalization of Sum Lemma: Let S=A1A2AnS=A_1\cup A_2\cup \dots A_n where
AiAj=A_i\cap A_j = \emptyset for 1ijn1\leq i\neq j\leq n. Also let
w:S{0,1,2,}w:S\mapsto \{0,1,2,\dots\}. Then

ΦS(x)=i=0nΦAi(x)\Phi_S(x) = \sum_{i=0}^n \Phi_{A_i}(x)

Proof: This is proven by induction on nn. The base case is trivial. Assume, inductively that the generalization is true for some nn. Now we have to show that it holds for n+1n+1. So we have S=A1A2AnAn+1S=A_1\cup A_2\cup \dots A_n\cup A_{n+1} and where AiAj=A_i\cap A_j = \emptyset for all 1ijn+11\leq i\neq j \leq n+1.

S=A1A2AnAn+1=(A1A2An)An+1\begin{aligned} S &= A_1\cup A_2\cup \dots A_n\cup A_{n+1}\\ &= (A_1\cup A_2\cup \dots A_n)\cup A_{n+1} \end{aligned}

Since SS is a disjoint union of (A1A2An)(A_1\cup A_2\cup \dots A_n) and An+1A_{n+1}, we can use the sum lemma:

ΦS(x)=Φ(A1A2An)(x)+ΦAn+1(x)=i=0nΦAi(x)+ΦAn+1(x)  by Inductive Hypothesis for case n=i=0n+1ΦAi(x)\begin{aligned} \Phi_S(x) &= \Phi_{(A_1\cup A_2\cup \dots A_n)}(x) + \Phi_{A_{n+1}}(x)\\ &= \sum_{i=0}^n \Phi_{A_i}(x) + \Phi_{A_{n+1}}(x) \;\text{by Inductive Hypothesis for case }n\\ &= \sum_{i=0}^{n+1} \Phi_{A_i}(x)\\ \blacksquare \end{aligned}

Product Lemma: Let A,BA,B be sets with weight functions
α:A{0,1,2,}\alpha:A\mapsto \{0,1,2,\dots\} and β:B{0,1,2,}\beta:B\mapsto\{0,1,2,\dots\}
respectively. Define a weight function w:A×B{0,1,2,}w:A\times B\mapsto \{0,1,2,\dots\} by:

w((a,b))=α(a)+β(b)w((a,b))=\alpha(a) + \beta(b) for all (a,b)A×B(a,b)\in A\times B

Then:

ΦA×B(x)=ΦA(x)ΦB(x)\Phi_{A\times B}(x) = \Phi_{A}(x)\Phi_B(x)

Proof:

ΦA×B=δ=(a,b)A×Bxw((a,b))=δA×Bxα(a)+β(b)=aAbBxα(a)+β(b)=(aAxα(a))(bBxβ(b))=ΦA(x)ΦB(x)\begin{aligned} \Phi_{A\times B} &= \sum_{\delta=(a,b)\in A\times B} x^{w((a,b))}\\ &= \sum_{\delta\in A\times B} x^{\alpha(a)+\beta(b)}\\ &= \sum_{a\in A}\sum_{b\in B} x^{\alpha(a)+\beta(b)}\\ &= \left( \sum_{a\in A}x^{\alpha(a)} \right)\cdot \left( \sum_{b\in B}x^{\beta(b)} \right)\\ &= \Phi_A(x)\Phi_B(x)\\ \blacksquare \end{aligned}

Integer Compositions

Definition: A composition of an integer nn is a tuple of positive integers (a1,,ak)(a_1,\dots, a_k) such that i=1kai=n\sum_{i=1}^k a_i = n. We say that this composition of nn has kk parts. Note that a composition is a tuple and therefore order matters! Furthermore, note that the parts in a integer composition are strictly positive integers and therefore 0 is not a valid part.

Also, by convention, n=0n=0 has one composition- the empty composition ()().

The goal is to answer questions of the form: "How many integer compositions of nn have some property PP?" The property PP could be anything of interest. Here are some examples:

  • P=P= the number of parts is kk
  • P=P= the parts are even
  • P=P= the parts are congruent to 1 modulo 3.

The general strategy to solve such problems is as follows:

  1. Define a set, say SS, which contains all of the intger compositions with the property PP.
  2. Define an appropriate weight function ww, such that the solution to the question of interest is given by [xn]ΦS(x)[x^n]\Phi_S(x).
  3. Compute the generating series of SS, with respect to the weight function ww, ΦS(x)\Phi_S(x).
  4. Finally, compute [xn]ΦS(x)[x^n]\Phi_S(x).

The toolbox that will allow us to perform these steps includes the following:

  • Sum lemma: if AB=A\cap B = \emptyset, then ΦAB(x)=ΦA(x)+ΦB(x)\Phi_{A\cup B}(x)=\Phi_A(x) + \Phi_B(x)
  • Product lemma: with the condition of the weight functions satisfied, one has ΦA×B(x)=ΦA(x)ΦB(x)\Phi_{A\times B}(x) = \Phi_A(x)\Phi_B(x)
  • Power shifts: [xn]xkA(x)={[xnk]A(x) if nk0 if n<k[x^n]x^kA(x) = \begin{cases} [x^{n-k}]A(x) &\text{ if }n\geq k\\ 0 &\text{ if }n<k \end{cases}
  • Geometric series: i=0xi=11x\sum_{i=0}^\infty x^i = \frac{1}{1-x}
  • Composition of geometric series: If [x0]A(x)=0[x^0]A(x) = 0, then i=0(A(x))i=11A(x)\sum_{i=0}^\infty (A(x))^i = \frac{1}{1-A(x)}
  • Negative binomal series: 1(1x)k=n=0(n+k1k1)xn\frac{1}{(1-x)^k}=\sum_{n=0}^\infty \binom{n+k-1}{k-1} x^n

Example: Compute the number of compositions of nn with kk parts.

First, we need a set which contains all the integer compositions with kk parts. Let S=NkS=N^k, where N={1,2,3,}N=\{1,2,3,\dots\}.

Then, define an appropriate weight function. Let w:S{0,1,2,}w:S\mapsto \{0,1,2,\dots\} be defined as follows: for all X=(x1,,xk)SX=(x_1,\dots,x_k)\in S, w(X)=i=0kxiw(X)=\sum_{i=0}^k x_i.

With these definitions in place, the answer to the "number of compositions of nn which have kk parts" is just [xn]ΦS(x)[x^n]\Phi_S(x). We now compute it.

ΦS(x)=ΦNk(x)=(ΦN(x))kProduct Lemma=(j1xj)j=xk(1x)k\begin{aligned} \Phi_S(x) &= \Phi_{N^k}(x)\\ &= \left( \Phi_N(x) \right)^k &\text{Product Lemma}\\ &= \left( \sum_{j\geq 1} x^j \right)^j\\ &= \frac{x^k}{(1-x)^k} \end{aligned}

Now, using the multiplicative shift rule, one sees that:

[xn]ΦS(x)={0 if n<k(n1k1) if nk\begin{aligned} [x^n]\Phi_S(x) &= \begin{cases} 0 &\text{ if } n< k\\ \binom{n-1}{k-1} &\text{ if } n\geq k \end{cases} \end{aligned}

The last case evaluates to the binomial coefficient given because of the negative binomial theorem.

Binary Strings

This is another object which can be studied very interestingly with the concepts of generating series. First, some definitions regarding binary strings:

Definition: A binary string (a.k.a. {0,1}\{0,1\}-string) is a strings consisting of only ones and zeroes. The length of a binary string is defined to be the total number of ones and zeroes in the string.

Definition: bb is a substring of ss if s=abcs=abc for some binary strings a,ca,c (which could possibly be empty strings). The expression abcabc will become clear when concatenation is defined soon.

Definition: A block in a binary string is a maximal, non-empty substring which consists of only ones or only zeroes.

Definition: Concatenation of two binary strings a,ba,b is the string abab.

Definition: If AA and BB are sets of binary strings, we define
AB={abaA and bB}AB=\{ab|a\in A \text{ and }b\in B\}. So ABAB is the set consisting of all the concatenations of a string in AA with a string in BB.

Also, AkA^k is defined to be AAAA\cdot A\cdot \dots \cdot A, kk times. Note that this notation is overloaded with the Cartesian product for regular sets.

Definition: For any set AA of binary strings, define the star operation as follows:

A={ϵ}AA2=k=0Ak\begin{aligned} A^* &= \{\epsilon\}\cup A \cup A^2 \cup \dots \\ &= \cup_{k=0}^\infty A^k \end{aligned}

where ϵ\epsilon is defined to be the empty string (which has length 0).

With this star operation, it is possible to define the set of all binary strings as {0,1}\{0,1\}^*. It also allows us to create more specific sets of strings (which, as in the case of integer compositions is the first step in solving a combinatorial problem involving binary strings). For example, the set {0}{00}\{0\}\{00\}^* contains all of the strings of odd length containing only
zeroes.

The goal is to be able to answer questions of the form "How many binary strings of length nn have property PP?". Again, property PP could be anything of interest in the context of binary strings:

  • P=P= no adjacent ones
  • P=P= exactly kk occurrences of the substring 110110
  • P=P= no occurrences of the substring 111111.

The strategy is very similar to the one used for integer compositions:

  1. Devise a set SS of all binary strings which have the desired property PP
  2. Let w:S{0,1,2,}w:S\mapsto \{0,1,2,\dots\} be a weight function which, for all σS\sigma \in S, w(σ)=w(\sigma)= length of σ\sigma.
  3. The desired answer will then be [xn]ΦS(x)[x^n]\Phi_S(x)

While the strategy may be the same as the one for integer compositions, binary strings have some nuances which require that the set of tools (specifically the product and sum lemmas) to be readjustment to handle the nuances. In particular, one may not immediately see how the product lemma may be used for binary strings as there is no operation of Cartesian products, just concatenation and these two are not the same.

Consider the two sets of binary strings A={010,01}A = \{ 010, 01 \} and B={0,00}B = \{ 0, 00 \}. The Cartesian product A×BA\times B has 4 elements. But observe the set of strings in the set AB={0100,10100,101}AB = \{ 0100, 10100, 101 \}. Note that the product lemma does not hold in this case: ΦAB(x)=x3+x4+x5\Phi_{AB}(x) = x^3 + x^4 + x^5 and this is not the product of ΦA(x)=x2+x3\Phi_A(x) = x^2 + x^3 and ΦB(x)=x+x2\Phi_B(x) = x + x^2. In particular, one cannot draw a bijection between ABAB and A×BA\times B- there are two elements in A×BA\times B, namely (010,0)(010, 0) and (01,00)(01, 00) which map to the same string "0100" in ABAB. To deal with this, we introduce the concept of ambiguity:

Definition: Let AA and BB be sets of binary strings. The expression ABAB is ambiguous if there exist a1,a2Aa_1,a_2 \in A and b1,b2Bb_1, b_2 \in B such that a1b1=a2b2a_1b_1 = a_2b_2.

The expression ABA\cup B is ambiguous if ABA\cap B \neq \emptyset.

An expression involving concatenations and unions of sets of binary strings is ambiguous if any of the constituent concatenations or unions of sets is ambiguous.

An expression is said to be unambiguous iff it is not ambiguous.

Using this definition, for example, the set AkA^k is ambiguous if there exist a1,,ak,a1,akAa_1, \dots, a_k, a_1', \dots a_k' \in A such that (a1,,ak)(a1,,ak)(a_1,\dots, a_k)\neq (a_1', \dots, a_k') and a1ak=a1aka_1\dots a_k = a_1'\dots a_k'.

Example: Show that the expression {0}{00}\{ 0 \} \{ 00 \}^* is unambiguous.

First, we show that the star operation is unambiguous. The set {00}={ϵ}{00}{00}2\{ 00 \} = \{ \epsilon \} \cup \{ 00 \} \cup \{ 00 \}^2 \cup \dots. Firstly, these are clearly disjoint unions. Next, note that the set {00}k\{ 00 \}^k is unambiguous for all k0k\geq 0; we have that {00}k={00002k zeros}\{ 00 \}^k = \{ \underbrace{00\dots 00}_\text{2k zeros} \}.

Next, the concatenation with {0}\{ 0 \} is unambiguous. If we have two strings 0b=0b0b = 0b', then clearly b=bb = b'. So the expression {0}{00}\{ 0 \} \{ 00 \}^* is unambiguous.

Now, here are some very useful unambiguous expressions for the set of all binary strings. They are important because they can be "refined" to smaller unambiguous sets.

  • {0,1}\{ 0, 1 \}. This one is trivial.
  • {0}({1}{0})\{ 0 \}^* \left( \{ 1 \} \{ 0 \}^* \right)^*. In this expression, a binary
    string is split by the ones in the string.
  • {0}({1}{1}{0}{0}){1}\{ 0 \}^* \left( \{ 1 \} \{ 1 \}^* \{ 0 \} \{ 0 \}^* \right)^* \{ 1 \}^*. This expression is called the block decomposition of the set of binary strings. A binary string here is viewed as a block of ones followed by a block of zeros, with a possible starting block of zeros and a possible ending block of ones.

Note: One may find it important to know that by the laws of symmetry, it is perfectly acceptable and valid to switch the ones and zeros in the above ambiguous expressions and end up with a valid unambiguous expression.

Depending on the situation, one of these expressions will be easier to manipulate to reduce it into a set of binary strings with desired properties. For example, in a question specifying conditions on the blocks of ones and the blocks of zeros, the block decomposition may be a good place to start.

Example: Derive an expression for the set of all binary strings with no 3 consecutive zeros.

Since this question only deals with the blocks of zeros, we can use the second one of the unambiguous decompositions and refine it down to an expression for the set of all binary strings with no 3 consecutive zeros.

In the expression {0}({1}{0})\{ 0 \}^* \left( \{ 1 \} \{ 0 \}^* \right)^*, the parts of concern are the ones with zeros. The set {0}={ϵ,0,00,}\{ 0 \}^* = \{ \epsilon, 0, 00, \dots \} needs to be reduced down to the set {ϵ,0,00}\{ \epsilon, 0, 00 \}.

Replacing {0}\{ 0 \}^* with this reduced set in the unambiguous expression above results in the following set {ϵ,0,00}({1}{ϵ,0,00})\{ \epsilon, 0, 00 \}^* \left( \{ 1 \} \{ \epsilon, 0, 00 \} \right)^* which is an unambiguous expression for the set of all binary strings which have no 3 consecutive zeros.

Equipped with this concept of unambiguity, one modification needs to be made before computing generating series for sets of binary strings. This is that the expression for the set SS needs to be unambiguous in order to sanely compute its generating series. Here are some theorems to aid with this goal:

Theorem: Let AA and BB be sets of binary strings.

  1. If ABAB is unambiguous, then ΦAB(x)=ΦA(x)ΦB(x)\Phi_{AB}(x) = \Phi_A(x) \Phi_B(x). This is
    called the product lemma for binary strings
  2. If ABA\cup B is unabiguous, then ΦAB(x)=ΦA(x)+ΦB(x)\Phi_{A\cup B}(x) = \Phi_A(x) + \Phi_B(x).
    This is called the sum lemma for binary strings.
  3. If AA^* is unambiguous, then ΦA(x)=11ΦA(x)\Phi_{A^*}(x) = \frac{1}{1 - \Phi_A(x)}. This
    is called the star rule.

Proof:

Let l(x)l(x) be the function which returns the length of the binary string xx. This is the default weight function for binary strings, unless otherwise specified.

  1. In this case, note that an element σAB\sigma \in AB is unambiguously a concatenation of some element aAa\in A with a bBb\in B.
    ΦAB(x)=σABxl(σ)=aA and bBxl(ab)=aA and bBxl(a)+l(b)=(aAxl(a))(bBxl(b))=ΦA(x)ΦB(x)\begin{aligned} \Phi_{AB}(x) &= \sum_{\sigma \in AB} x^{l(\sigma)}\\ &= \sum_{a\in A \text{ and } b\in B} x^{l(ab)}\\ &= \sum_{a\in A \text{ and } b\in B} x^{l(a) + l(b)}\\ &= \left( \sum_{a\in A} x^{l(a)} \right) \left( \sum_{b\in B} x^{l(b)} \right)\\ &= \Phi_A(x) \Phi_B(x) \end{aligned}

  2. This case is pretty simple as ABA\cup B is unambiguous iff it is a disjoint union. Therefore the sum lemma can be used.

    ΦAB(x)=ΦA(x)+ΦB(x)\Phi_{A\cup B}(x) = \Phi_A(x) + \Phi_B(x)

  3. We rewrite A=k=0AkA^* = \cup_{k=0}^\infty A^k. Since AiAj=A^i \cap A^j = \emptyset for all 0ij0 \leq i \neq j, the sum lemma can be applied. Also, since each AkA^k is unambiguous for all k0k\geq 0, the product lemma may be applied.

    ΦA(x)=k=0ΦAk(x)Sum lemma=k=0[ΦA(x)]kProduct lemma\begin{aligned} \Phi_{A^*}(x) &= \sum_{k=0}^\infty \Phi_{A^k}(x) &\text{Sum lemma}\\ &= \sum_{k=0}^\infty \left[ \Phi_A(x) \right]^k &\text{Product lemma} \end{aligned}

    Now, this last term looks like a composition with the geometric series, but to be sure, we have to verify that [x0]ΦA(x)=0[x^0] \Phi_A(x) = 0. Note that A0={ϵ}A^0 = \{ \epsilon \} and that AjAi=A^j \cap A^i = \emptyset for all 0ij0\leq i\neq j by the unambiguity of AA^*. This means that ϵA\epsilon \notin A and so there is no string in AA with length 0, so the constant term of ΦA(x)\Phi_A(x) must be 0- so this is indeed a valid composition. Finally, we get:

    ΦA(x)=k=0[ΦA(x)]k=11ΦA(x)\begin{aligned} \Phi_{A^*}(x) &= \sum_{k=0}^\infty \left[ \Phi_A(x) \right]^k \\ &= \frac{1}{1-\Phi_A(x)} \end{aligned}

Example: Compute the number of binary strings of length nn with no substring 1111.

First, we need an unambiguous expression for the set of all binary strings with no substring 1111. We can use symmetry and reverse one of the unambiguous expressions for the set of all binary strings to get a good starting point. We reverse the ones and zeros in the unambiguous expression {0}({1}{0})\{ 0 \}^* \left( \{ 1 \} \{ 0 \}^* \right)^* to get {1}({0}{1})\{ 1 \}^* \left( \{ 0 \} \{ 1 \}^* \right)^*. Now, we refine this expression down so that every block of ones does not contain the substring 11. This gives the expression S={ϵ,1}({0}{ϵ,1})S = \{ \epsilon, 1 \} \left( \{ 0 \} \{ \epsilon, 1 \} \right)^*.

Now, proceeding to compute the generating series of the set SS with respect to string length as the weight function. Note that the product and star rules may be applied because the expression for SS above is unambiguous.

ΦS(x)=Φ{ϵ,1}(x)Φ({0}{ϵ,1})(x)Product lemma=Φ{ϵ,1}(x)1Φ{0}{ϵ,1}(x)Star rule=Φ{ϵ,1}(x)1Φ{0}(x)Φ{ϵ,1}(x)Product rule=1+x1x(1+x)=1+x1xx2\begin{aligned} \Phi_{S}(x) &= \Phi_{\{ \epsilon, 1 \}}(x) \Phi_{\left( \{ 0 \} \{ \epsilon, 1 \} \right)^*}(x) &\text{Product lemma}\\ &= \frac{ \Phi_{\{ \epsilon, 1 \}}(x) }{ 1 - \Phi_{\{ 0 \} \{ \epsilon, 1 \}}(x)} &\text{Star rule}\\ &= \frac{ \Phi_{\{ \epsilon, 1 \}}(x) }{ 1 - \Phi_{\{ 0 \}}(x) \Phi_{\{ \epsilon, 1 \}}(x)} &\text{Product rule}\\ &= \frac{ 1 + x }{ 1 - x(1 + x) } \\ &= \frac{ 1 + x }{ 1 - x - x^2 } \end{aligned}

With this expression, note that we can do the following simplification:

[xn]ΦA(x)=[xn]11xx2+[xn]x1xx2=[xn]11xx2+[xn1]11xx2=fn+fn1\begin{aligned} [x^n] \Phi_A(x) &= [x^n] \frac{1}{ 1 - x - x^2 } + [x^n] \frac{x}{ 1 - x - x^2}\\ &= [x^n]\frac{1}{ 1 - x - x^2 } + [x^{n-1}]\frac{1}{ 1 - x - x^2 }\\ &= f_n + f_{n-1} \end{aligned}

where fn=[xn]11xx2f_n = [x^n]\frac{1}{ 1 - x - x^2 }. This is actually part of the Fibonacci sequence. Computing the starting terms of fnf_n is easy by comparing coefficients. We get f0=f1=1f_0 = f_1 = 1. Putting all of this together, we finally get that:

[xn]ΦS(x)={1 if n=0fn+fn1 if n1\begin{aligned} [x^n]\Phi_S(x) &= \begin{cases} 1 &\text{ if } n = 0\\ f_{n} + f_{n - 1} &\text{ if } n\geq 1 \end{cases} \end{aligned}

This makes intutive sense. When n=0n=0, there is clearly only one string with length 0, which does not contain the substring 1111, namely the empty string. Then, there are two strings of length 1 which do not contain the string 1111 and so on.

Recursive Decompositions

Sometimes, recursive definitions can help to make computation of generating series easier. Note that recursive decompositions are just expressions and the rules that have so far been used to compute generating series will still be used with them, just with collection of like terms to get the final series.

Here is an example. Let us derive a recursive decomposition for the set of all binary strings, say SS. One way to approach this is by selecting an arbitrary string in SS and analysing it to see what could come next. In this case, any binary string could start with either a 11 or a 00 and it could be followed by a sequence of ones or zeros i.e. another binary string. Or, the binary string could simply be the empty string. This gives us the simple recursive definition for the set of all binary strings as S={ϵ}{0,1}SS = \{ \epsilon \} \cup \{ 0, 1 \}S.

Now, let us try to compute the generating series of this set SS with this representation. Note that we can still use the product lemma here as the concatenation is clearly unambiguous and the sum lemma because the union is disjoint.

ΦS(x)=Φ{ϵ}(x)+Φ{0,1}S(x)Sum lemma=1+Φ{0,1}(x)ΦS(x)Product lemma=1+2xΦS(x)\begin{aligned} \Phi_S(x) &= \Phi_{ \{ \epsilon \} }(x) + \Phi_{ \{ 0, 1 \} S }(x) &\text{Sum lemma}\\ &= 1 + \Phi_{ \{ 0, 1 \} }(x) \Phi_S(x) &\text{Product lemma}\\ &= 1 + 2x \Phi_S(x) \end{aligned}

By collecting like terms, one ends up with the generating series: ΦS(x)=112x\Phi_S(x) = \frac{1}{1-2x}.

Example: Another example is an expression for the set of all binary strings
with no substring 1111. Let the set of all such strings be SS.

Firstly, a string in SS is either the empty string, starts with a one or starts with a zero. Let A0A_0 be the set of all binary strings which start with a zero. And let A1A_1 be the set of all binary strings which start with a one. So S={ϵ}A0A1S = \{ \epsilon \} \cup A_0 \cup A_1. Note that this is a disjoint union of sets and so the sun lemma is applicable.

Here are some observations that will make life easier. Firstly, an element in A0A_0 is such that it begins with a 00 and then a sequence of ones and zeros follow i.e. another binary string. Therefore, we get the expression A0={0}SA_0 = \{ 0 \} S. Similarly, an element in A1A_1 is either the string 11 or the string 11 concatenated to the beginning of every string in A0A_0. More precisely, A1={1}{1}A0A_1 = \{ 1 \} \cup \{ 1 \} A_0. Now, we can use these expressions in order to compute the generating series of the set SS.

ΦS(x)=Φ{ϵ}(x)+Φ{0}S(x)+Φ{1}(x)+Φ{1}{0}S(x)Sum lemma=Φ{ϵ}(x)+Φ{0}(x)ΦS(x)+Φ{1}(x)+Φ{1}(x)Φ{0}(x)ΦS(x)Product lemma=1+xΦS(x)+x+x2ΦS(x)\begin{aligned} \Phi_S(x) &= \Phi_{ \{ \epsilon \} }(x) + \Phi_{ \{ 0 \} S }(x) + \Phi_{ \{ 1 \} }(x) + \Phi_{ \{ 1 \} \{ 0 \} S }(x) &\text{Sum lemma}\\ &= \Phi_{ \{ \epsilon \} }(x) + \Phi_{ \{ 0 \} }(x) \Phi_S(x) + \Phi_{ \{ 1 \} }(x) + \Phi_{ \{ 1 \} }(x) \Phi_{ \{ 0 \} }(x) \Phi_{ S }(x) &\text{Product lemma}\\ &= 1 + x\Phi_S(x) + x + x^2\Phi_S(x) \end{aligned}

By collecting like terms and solving for ΦS(x)\Phi_S(x) yields

ΦS(x)=1+x1xx2\begin{aligned} \Phi_S(x) &= \frac{ 1 + x }{ 1 - x - x^2 } \end{aligned}

This is the same answer as the one obtained when using unambiguous expressions.

Coefficients of Rational Functions

This is a new tool that can be used when solving combinatorial problems such as the ones being dealt above. Specifically, these are tools for computing [xn]f(x)g(x)[x^n] \frac{ f(x) }{ g(x) } where f,gf,g are polynomials.

A starting point will be the method of partial fractions. In specific cases, it can be used to split a rational polynomial to a form where its coefficients are easy to compute using composition with geometric series or the negative binomial series.

The method of partial fractions will be generalized to rational fractions of the form f(x)g(x)\frac{ f(x) }{ g(x) } where f,gf,g are polynomials and deg(f)<deg(g)\deg( f ) < \deg( g ) and [x0]g(x)=0[x^0] g(x) = 0.

Notes: Here are some points before proceeding:

  • Without loss of generality (WLOG), we can assume that [xn]g(x)=1[x^n] g(x) = 1 since it is non-zero- this is fine because it can be scaled to 1 anyway by multiplying the numerator and denominator by a constant.

  • By the fundamental theorem of algebra, gg can be factorized over the complex numbers as g(x)=(α1x)e1(αkx)ekg(x) = (\alpha_1 - x)^{e_1} \dots (\alpha_k - x)^{e_k}. Note that the multiplicities e1,,ek1e_1, \dots, e_k \geq 1 and i=0kei=deg(g)\sum_{i=0}^k e_i = \deg( g ). Also, the values α1,,αkC{0}\alpha_1, \dots, \alpha_k \in \mathbf{C}\setminus \{ 0 \}- they must be non-zero complex numbers since the constant term of gg is non-zero.

    Furthermore, if we let ri=αi1r_i = \alpha_i^{-1} for 1ik1\leq i \leq k, Since the constant term of gg is effectively 11, we can re-write it as:
    g(x)=(1r1x)e1(1rkx)ekg(x) = ( 1 - r_1 x )^{e_1} \dots ( 1 - r_k x )^{e_k}

Now, we generalize the theorem of partial fractions. This will be just the statement and not the proof (I may include the proof at a later time).

Theorem [ Partial Fractions ]: Suppose f,gf,g are polynomials with deg(f)<deg(g)\deg(f) < \deg( g ) and g(x)=(1r1x)e1(1rkx)ekg(x) = ( 1 - r_1 x )^{e_1} \dots ( 1 - r_k x )^{e_k} with r1,,rkr_1, \dots, r_k distinct complex numbers and e1,,eke_1, \dots, e_k are integers greater that one which add up to the degree of gg. Then:

f(x)g(x)=C1,1(1r1x)+C1,2(1r1x)2++C1,e1(1r1x)e1+C2,1(1r2x)+C2,2(1r2x)2++C1,e2(1r2x)e2+Ck,1(1rkx)+Ck,2(1rkx)2++Ck,ek(1rkx)ek\begin{aligned} \frac{ f(x) }{ g(x) } &= \frac{C_{1,1}}{(1-r_1x)} + \frac{C_{1,2}}{(1-r_1x)^2} + \dots + \frac{C_{1,e_1}}{(1-r_1x)^{e_1}} \\ &+ \frac{C_{2,1}}{(1-r_2x)} + \frac{C_{2,2}}{(1-r_2x)^2} + \dots + \frac{C_{1,e_2}}{(1-r_2x)^{e_2}} \\ &\vdots \\ &+ \frac{C_{k,1}}{(1-r_kx)} + \frac{C_{k,2}}{(1-r_kx)^2} + \dots + \frac{C_{k,e_k}}{(1-r_kx)^{e_k}} \end{aligned}

for complex coefficients {C1,1,,Ck,ek}\{ C_{1,1}, \dots, C_{k, e_k} \}.

This is useful because it allows one to reduce a quotient of polynomial to a state where it can be dealt with using the negative binomial theorem. For example, the quotient A(x)=2+2x2(12x)(17x)A(x) = \frac{ 2 + 2x^2 }{ (1-2x)(1-7x) } can be reduced to a much more friendly form:

A(x)=112x1(12x)2+417x\begin{aligned} A(x) &= \frac{-1}{ 1 - 2x } - \frac{1}{ ( 1 - 2x )^2 } + \frac{4}{ 1 - 7x } \end{aligned}

In this form, one can easily compute the coefficient of the xnx^n using the negative binomial series for each individual term. This results in the following answer: [xn]A(x)=2n(n2)+47n[x^n] A(x) = 2^n ( -n -2 ) + 4\cdot 7^n.

Theorem: Suppose that f,gf,g are polynomials with deg(f)<deg(g)\deg(f) < \deg(g) and gg is of the form g(x)=(1r1x)e1(1rkx)ekg(x) = (1-r_1x)^{e_1} \dots (1-r_kx)^{e_k}. Then, there exist polynomials p1.,pkp_1. \dots, p_k such that deg(pi)ei\deg( p_i ) \leq e_i for 1ik1\leq i \leq k and [xn]f(x)g(x)=p1(x)r1n++pk(x)rkn[x^n] \frac{f(x)}{g(x)} = p_1(x)r_1^n + \dots + p_k(x)r_k^n.

Proof: This comes pretty simply using the following claim, applied to the generalised partial fractions theorem.

Claim: Let c,rCc,r \in \mathbf{C} and j1j\geq 1. Then there is a degree j1j-1 polynomial qq such that [xn]c(1rx)j=q(n)rn[x^n] \frac{ c }{ (1-rx)^j } = q(n)r^n.

Proof of claim: We can use the negative binomial series here to expand the formal power series into a summation and then extract its coefficient as follows:

[xn]c(1rx)j=[xn]ck0(k+j1j1)(rx)j=c(n+j1j1)rn=crn(n+j1)(n+j2)(n+1)(j1)!\begin{aligned} [x^n] \frac{c}{(1-rx)^j} &= [x^n] c \sum_{k\geq 0} \binom{k + j - 1}{j - 1} (rx)^j\\ &= c \binom{n+j-1}{j-1} r^n\\ &= cr^n \frac{(n+j-1)(n+j-2)\dots (n+1)}{(j-1)!} \end{aligned}

Note that the numerator has the form of a polynomial in xx evaluated at a value nn and the denominator is not at all dependent on nn. This motivates us to define q(x)=c(j1)!(x+j1)(x+j2)(x+1)q(x) = \frac{c}{(j-1)!}(x+j-1)(x+j-2)\dots (x+1), which is indeed a degree j1j-1 polynomial. This completes the proof of the claim.

Now, using this claim with each row in the generalized form of the partial fractions expression, the theorem now holds. For a row ii in the partial fractions expression, one has eie_i such polynomials from the claim- each of them will be multiplied by an rinr_i^n term and these can be factored to get rir_i multiplied by the sum of eie_i polynomials, of which the maixmum degree is ei1e_i -1 by the claim above. Let this sum of polynomials be the polynomial pi(x)p_i(x). This completes the proof of the theorem.

Now, this theorem gives us another way of quickly computing the coefficient of the xnx^n term in a quotient of polynomials. One knows the form that the quotient will take, so what remains is to plug whatever available information exists in the context into this theorem and then attempt to compute the coefficients of the polynomials p1,,pkp_1, \dots, p_k.