"Symmetry is one of the most powerful tools in the mathematics arsenal." David Jao
Tools for Enumeration
Main interest: counting the size of various sets.
Let us recall the elementary operations that can be performed with sets. Let A,B be finite sets. We have the following operations:
UNION: A∪B={x∣x∈A or x∈B}
INTERSECTION: A∩B={x∣x∈A and x∈B}
In order to count the size of a set, one of the approaches used is to build that set using unions of smaller sets, whose sizes are known. It is important, therefore, to understand the relationship between the cardinality of a set and the cardinalities of the sets that it is built from.
Most readers should be aware of the following identity: ∣A∪B∣=∣A∣+∣B∣−∣A∩B∣ which is of utmost importance when it comes to counting the size of sets as it provides a relationship between the cardinality of a set A∪B and the cardinalities ∣A∣ and ∣B∣. A much simpler case arises when A∩B=∅ i.e. when A∪B is a disjoint union- the identity reduces to ∣A∪B∣=∣A∣+∣B∣.
This identity can be extended to a more general one. Given sets A1,…,An with Ai∩Aj=∅ for 1≤i<j≤n, we have ∣A1∪⋯∪An∣=∑j=1n∣Aj∣. This is readily proven by induction on n. In the case of n=2, the identity holds, as seen above. Now, assuming that the identity holds for some n>2, we can verify it for case n+1:
This allows us to state one of the techniques used for problem solving in this course: Recognizing that a set can be expressed as a disjoint union of other sets. The general identity derived above proves to be helpful in this technique. Now, back to set operations:
CARTESIAN PRODUCT: A×B={(a,b)∣a∈A,b∈B}. This object has the property that ∣A×B∣=∣A∣⋅∣B∣.
CARTESIAN POWER: Ak={(a1,…,ak)∣a1∈A,…,ak∈A}. This object can be expressed alternatively as the cartesian product of A with itself, k times and has the property that ∣Ak∣=∣A∣k.
Example: Let S={0,1}k={(z1,…,zk)∣z1,…,zk∈{0,1}}. S is called the set of k-tuples of bits. Using the property of the cartesian product outlined above, we see that ∣S∣=∣{0,1}k∣=∣{0,1}∣k=2k.
Example: Let S={(z1,z2,z3)∈{0,1}3∣z1+z2+z3≤1}. Find ∣S∣. In this case, it may be easy to write out the elements of S and count them but we shall use the method of expressing S as a union of 2 disjoint sets to see how it works.
We can write S=A∪B where A={(z1,z2,z3)∈{0,1}3∣z1+z2+z3=0} and B={(z1,z2,z3)∈{0,1}3∣z1+z2+z3=1}. So clearly, A∩B=∅ by the definition of A and B. This allows us to say that ∣S∣=∣A∣+∣B∣.
Now, we can count: A={(0,0,0)} so ∣A∣=1 and B={(1,0,0),(0,1,0),(0,0,1)} so that ∣B∣=3. Therefore, ∣S∣=1+3=4.
Binomial Coefficients
These come up a lot when it comes to enumeration and counting problems so getting comfortable with the binomial coefficients is important.
Definition: Sk,n:={Ω∈{1,…,n}∣∣Ω∣=k}. This will prove to be an important set definition once some of its properties have been uncovered. Sk,n is the set of all subsets of {1,…,n} of size k.
In general, this set can be used to understand many real-life situations e.g. the number of 5-card hands from a standard deck of 52 cards is ∣S5,52∣.
THEOREM: let 0≤k≤n. Then ∣Sk,n∣=k!n(n−1)⋯(n−k+1).
Proof: Let Lk,n:={(a1,…,ak)∈{1,…,n}k∣ai=aj for i=j}. Lk,n is the set of all k-tuples of distinct elements of {1,…,n}.
What is ∣Lk,n∣? Well, since the elements in the k-tuples are distinct, there are n possibilities for the first one, n−1 possibilities for the second one and so on. In general, for the j-th element in the tuple, there n−(j−1) possibilities. Therefore, we get that ∣Lk,n∣=n(n−1)⋯(n−(k−1)).
Now, we shall analyse a relationship between Lk,n and Sk,n which will allow us to compute ∣Sk,n∣. Observe the following:
Each (a1,a2,…,ak)∈Lk,n can be mapped to an element {a1,…,ak}∈Sk,n. This mapping removes the ordering from the k-tuple.
Each {a1,…,ak}∈Sk,n is then the image of k! different elements in Lk,n.
Using these 2 observations, we may see the following:
∣Lk,n∣=∑Ω∈Sk,nk!=∣Sk,n∣⋅k!
⟹∣Sk,n∣=k!∣Lk,n∣, as desired.
Now, we may move forward to define the following object:
Binomial Coefficient: (kn):=k!n(n−1)⋯(n−k+1) for 0≤k≤n. Note that by this defintion, it can also be seen that (kn)=(n−k)!k!n!. This version of the definition immediately reveals the symmetrical nature of the binomial coeffcient: (kn)=(n−kn) because of the commutativity of multiplication in the denominator. For convenience and convention, we also define (kn)=0 for k>n.
Now, let Ak,n={(z1,…,zn)∈{0,1}n∣z1+⋯+zn=k} for 0≤k≤n. What is ∣Ak,n∣? In previous proof, the size of Sk,n was determined by analysing its relationship with Lk,n whose size we could easily determine: we showed Lk,n is k! times larger than Sk,n, essentially proving a 1 to many mapping Sk,n↦Lk,n. This strategy can also be used to find ∣Ak,n∣ with the only difference being that this time, the relationship is 1-1 with a set of size (kn). We conveniently already have such a set! - ∣Sk,n∣=(kn), as previously shown.
THEOREM: ∣Ak,n∣=(kn)
Proof: Observe the following:
Each (z1,…,zn)∈Ak,n has the propery that there must be k positions
which where the element in the n-tuple is a 1. So each of these n-tuples can be mapped
to a subset Ω={j∈{1,…,n}∣zj=1}∈Sk,n.
Each subset Ω∈Sk,n can be mapped to an element (z1,…,zk)∈Ak,n
where zj=1⟺j∈Ω
This shows a 1-1 mapping: for each set in Sk,n, there is one corresponding n-tuple in Ak,n and for every n-tuple in Ak,n, there is one corresponding set in Sk,n. Therefore, ∣Sk,n∣=∣Ak,n∣, as desired.
THEOREM: Binomial Theorem: Let n be a non-negative integer. Then (1+x)n=∑k=0n(kn)xk.
Proof:
We can write 1+x=∑z∈{0,1}xz. Using this, we can do the following:
At this point, what may help tie this problem up is the realisation that the set {0,1}n=A0,n∪A1,n∪⋯∪An,n, a disjoint union. This is a useful decomposition because we already know about the size of An,k. Then:
Example: This binomial theorem can be useful when simplifying expressions. Here is a simple example: Simplify ∑k=0n(kn). At first glance, this looks quite similar to the formula of the binomial theorem, just with x set to 1. Let's use this observation:
∑k=1n(kn)xk=(1+x)n
Setting x=1:
∑k=1n(kn)=2n. And there we have it- a simplified expression.
Exercise: Show that ∑0≤j≤2n(2jn)=2n−1. Basically, that the sum of the binomial coefficient (kn) where k is even is half of the sum of all of the terms. Again, we may use the binomial theorem here to do most of the heavy lifting. Realise, first that
∑0≤j≤2n(2jn)=21(∑k=0n(kn)+∑k=0n(kn)(−1)k)
Here, the expression in question has been replaced as an average of 2 terms, both derived from the binomial theorem. The second term in the RHS sum has the job of cancelling out all of the odd terms in the sum from the first term. Then, since the even terms will have been doubly added, the sum needs to be halved, making it an average.
Next, note that the second term in the RHS sum is just the binomial theorem for x set to −1, giving us (1+(−1))n=0. Therefore, we get:
0≤j≤2n∑(2jn)=21(2n+0)=212n=2n−1
As desired.
Pascal's Triangle: This is an object of interest in mathematics simply because of the number of properties that emerge from it. It is formed by arranging the binomial coefficients by n and k into a triangle.
=
One of the first things that one may notice is that every number in the triangle is a sum of the 2 numbers above it. For example, 10 is the sum of 4 and 6 above it. Therefore, it's easy to construct the triangle, row by row but more importantly, this observation can be conjenctured as follows:
(kn)=?(k−1n−1)+(kn−1)
We'll prove this, and similar identities in the next section.
Combinatorial Proofs
Essentially, these are the kinds of proofs which rely on counting
A key method used is establishing identities by counting a set in two different
ways.
Example: ∀1≤k≤n, (kn)=(k−1n−1)+(kn−1). As conjenctured from Pascal's triangle above.
Proof: We want to approach the solution by counting a set in two different ways. One way will give us the expression on the LHS and the other way will give us the expression for the RHS. Since these expressions are representing the cardinality of the same set, they must be equal.
Recall Sk,n={Ω∈{1,…,n}∣∣Ω∣=k} which has the property that ∣Sk,n∣=(kn). This gives us the RHS expression. Now, we need to count the size of Sk,n in a different way which would give us the LHS expression.
Let A={Ω∈Sk,n∣n∈Ω} and B={Ω∈Sk,n∣n∈/Ω}. Clearly, A∩B=∅ and Sk,n=A∪B which, by our findings in the previous section, conventiently allows us to do ∣Sk,n∣=∣A∣+∣B∣.
Case k=1: In this case, A={{n}} and B={{1},…,{n−1}}. So that (1n)=∣S1,n∣=∣A∣+∣B∣=1+(n−1)=(0n−1)+(1n−1).
Case k>1: In this case, A and B can be viewed as follows:
A={{n}∪Q∣Q⊆{1,…,n} and ∣Q∣=k−1}={{n}∪Q∣Q∈Sk−1,n−1}
This identity can also be proven similarly using the set Ak,n.
Example: (Hockey Stick Identity). This is one which also arises from Pascal's triangle. In this case, for any number in Pascal's Triangle, the "hockey stick" of that number is the set of numbers formed by moving one row up and to the left, then following the diagonal of numbers from there to the right end of the triangle. This identity basically says that for a number in Pascal's triangle, the sum of all other numbers in its "hockey stick" is itself. Mathematically represented as:
(nn+k)=∑i=0k(n−1n+i−1) for n,k≥0.
We shall prove this by counting a set in two different ways. Take Sn,n+k={Ω⊆{1,…,n+k}∣∣Ω∣=n}. So that ∣Sn,n+k∣=(nn+k) giving us the LHS of the identity. Now, we need to count Sn,n+k differently to get the RHS.
To start, we can partition Sn,n+k as follows. Let Wi={Ω∈Sn,n+k∣largest element of Wi is n+i} for 0≤i≤k. So then we can represent Sn,n+k=W0∪⋯∪Wk, a disjoint union. Before moving on, it is important to represent Wi={{n+i}∪Q∣Q∈Sn−1,n+i−1}.
Bijections are important tools when it comes to combinatorial proof. They enable one to show that two sets have the same size and do so conveniently. Let us try to understand bijections. How does one know that two sets have equal size?
Example: Let S={1,…,100} be the set of integers between 1 and 100. Let T={1,4,9,16,…,10000} be the set of perfect squares between 1 and 10000.
In this example, clearly, these two sets have the same number of elements. A pairing of 1 with 1, 2 with 4, 3 with 9, 4 with 16 and so on all the way to 100 with 10000 should make this clear. There are no uncounted elements and since there are 100 elements in S, the fact that they can be paired uniquely with elements in T (with none being left unpaired from T or S) shows that T has size 100 too. Let us formalize this notion:
Definition: A bijection is a mapping f:S↦T satisfying:
∀x1,x2∈S(f(x1)=f(x2))⟺(x1=x2)
This says that "For all x1,x2 in S, f(x1)=f(x2) if and only if x1=x2". A function satisfying this property is said to be one-to-one. This is because every element of S maps to a unique element in T under the action of f.
∀y∈T(∃x∈S[f(x)=y])
This says that "For every y in T, there exists an x in S which satisfies f(x)=y. A function satisfying this property is said to be onto. This is because the image of S under the action of f is the whole of T i.e. every element in T can be obtained as a result of the action of f on some value in S.
To test that a function is a bijection, it is sufficient to show that it is both one-to-one and onto as stated in the above 2 bullets.
Example(Cont'd.): We can now formalize the pairing that was discussed in the example above. Define a function f:S↦T like so: ∀x∈S, we have that f(x)=x2. Let us show that this function satisfies the above two properties to be called a bijection.
Verify one-to-one: If we have x1,x2∈S such that f(x1)=f(x2). This would mean that x12=x22. But since both x1,x2 are members of S, they must be positive. So taking the square root leaves us with x1=x2. Therefore, f is one-to-one.
Verify onto: If we have y∈T, then by definition, it is a perfect square between 1 and 10000 therefore, its square root must lie between 1 and 100, which are the members of S. Therefore, we can take x=y∈S and then we get that f(x)=(x)2=y. This shows that f is onto.
Therefore, the function f is a bijection. We still have to prove that this bijection actually gives us the result that the sets S and T have the same size.
THEOREM: Suppose S,T are two finite sets and f:S↦T. Then:
If f is one-to-one, then ∣S∣≤∣T∣
If f is onto, then ∣S∣≥∣T∣
If f is a bijection, then ∣S∣=∣T∣
Proof:
Suppose that f is one-to-one. This means that every element of S maps to a unique element in T under the action of f. Therefore, there must be at least ∣S∣ elements in ∣T∣ i.e. ∣S∣≤∣T∣.
Suppose that f is onto. Then, every element in T is the image of some element in S. Therefore, by the definition of a function (that an input will always produce the same output), this allows us to see that the size of S is at least the size of T i.e. ∣S∣≥∣T∣.
If f is a bijection, it is both one-to-one and onto. Therefore, we see that it must be the case that (∣S∣≤∣T∣)∧(∣S∣≥∣T∣), which leaves us with one option: ∣S∣=∣T∣.
As desired.
There is another method for showing that a function is a bijection. This one relies on the inverse of a function, which comes sort of naturally when speaking of bijective maps- you need a way to reverse the action of the function. Inverse functions will capture this idea and provide yet another tool for combinatorial proofs.
Inverse Functions
Definition An inverse of a function f:S↦T is a function f−1:T↦S which satisfies the following properties:
∀x∈S(f−1(f(x))=x).
This says that for every element in S, the action of f on x can be undone through the action of f−1.
∀y∈T(f(f−1(y))=y). This says that for every element in T, the action of f−1 can be undone through the action of f.
This definition can be used to verify that a function is an inverse function. Now, let us the utility of inverse functions through this theorem:
THEOREM: A function f:S↦T has an inverse function if and only if it is a bijection.
Proof: (Forwards) Assume that f has an inverse function f−1:T↦S. We need to show that f is a bijection.
(Proving that f is one-to-one) Take x1,x2∈S such that f(x1)=f(x2). Since f−1 is the inverse of f, it reverses the action of f, so we get
f−1(f(x1))=f−1(f(x2))⟹x1=x2
This shows that f is one-to-one.
(Proving that f is onto) Pick a y∈T. We need to find a value x∈S such that f(x)=y. Well, take x=f−1(y) and then:
f(x)=f(f−1(y))=y
This shows that f is onto and, together with the above, that f is a bijection.
(Backwards) Now, let us assume that f is a bijection. Need to show that f has an inverse function.
We can construct one as follows: given a y∈T, we can find an x∈S such that f(x)=y. This is possible because the assumption of bijection means that f is onto. Keeping this in mind, we make the following definition:
Let f−1:T↦S be defined for all y∈T, f−1(y)=x as stated in the passage above. To prove that f−1 is an inverse:
Pick an x∈S. Say y=f(x). Then: f−1(y)=f−1(f(x))=x.
We see that f−1 reverses the action of f.
Now pick a y∈T. Say x=f−1(y). Then: f(x)=f(f−1(y))=y
This completes the proof of the backwards direction and finally proves the theorem above.
Generating Series
Context: Let S be a set. We define a "weight" function as a function w:S↦{0,1,2,3,…}.
Question: How many δ∈S such that w(δ)=k?
Examples:
With S={a,b,c} and w:S↦{0,1,2,3,…} defined by: w(a)=1, w(b)=1, w(c)=2. There are two elements of S with weight 1 and one element with weight 2. This is the kind of information that we wish to capture.
If S={Ω⊆{0,1,2,…,n}∣w(Ω)=∣Ω∣} (S is the set of subsets of {1,2,…,n} with the weight of a subset being defined as the size of the subset). The number of elements with weight k is easily obtained (from our previous work with Sk,n) as (kn).
Definition: A generating function for a set S with respect to a weight function w:S↦{0,1,2,…} encodes information regarding the counts of elements in S with the weights {0,1,2,…}. It is denoted by ΦS(x) and is defined as
ΦS(x)=∑δ∈Sxw(δ)
When one stares at this definition long enough, one realises that it is a way of keeping track of the set of weights that occur in S, with respect to the weight function w.
We can manipulate the above definition to make this clearer:
The coefficient of the xn term tells us the number of elements in the set S which have the weight n (of course, with respect to the weight function that was used to compute the series).
Examples:
What is the generating series of the corresponding example 1 above?
We compute: ΦS(x)=xw(a)+xw(b)+xw(c)=x1+x1+x2=2x+x2. Note how even if the weight function were defined differently
(like w(a)=w(c)=1 and w(b)=2). The resulting generating series is the same. What this means is that there is information about the weights of elements of S that is not preserved.
For corresponding example 2 above, we have the generating function
This may all be nice, but what is the point of encoding this information in a generating series?
Theorem: Suppose that S is a finite set with the weight funtion w:S↦{0,1,2,…}. Let ΦS be the generating series for S with respect to w. Then:
∣S∣=ΦS(1). This says that the size of the set S is equal to its generating series evaluated at x=1.
∑δ∈Sw(δ)=dxdΦs∣∣x=1. This says that the sum of the weights of the elements in S is exactly equal to the first derivative of the generating series, evaluated at x=1.
Let's see why this works:
Proof:
Here is a short proof for statement 1 above
ΦS(1)=δ∈S∑xw(δ)∣∣x=1=δ∈S∑1w(δ)=δ∈S∑1=∣S∣
2. Similarly, here is a short proof for statement 2
Example: Consider a sequence of n coin flips and suppose you win $1 with each time that you get heads after tails or tails after heads. Compute the expected winnings using generating series.
The above theorem allows us to get the sum of the weights of the elements in S as the number of elements. Putting these together, we can get the average weight of an element in S. Contextualizing to this example, we can benefit by defining a weight function in terms of the rewards and then compute the average as explained to get the expected winnings. Take a moment to contemplate how you would solve this before moving on...
We need to get a 'mathematical handle' on the occurrence of tails after heads and heads after tails. So, for a sequence of n coin flips, we can concoct n−1 binary variables indicating whether the next flip in the sequence differs from the current one:
∀i∈[1,n−1] define zi={1,0,if (i+1)th outcome is different from ithotherwise
Using this string of binary variables z1,z2,…,zn−1, one can derive useful information about the sequence of n coin flips. We can put this all in a set S={(z1,z2,…zn−1)∣zj∈{0,1}} with the weight function defined by w((z1,…,zn−1))=∑i=1n−1zi.
We can now view S={0,1}n−1 and the weight function is essentially computing the number of ones in an entry from S- each of which corresponds to a dollar reward (as per the rules of the game). Then we can compute ΦS(x)=∑i≥0aixi, where ai is the number of elements in S which have weight i. Fortunatly, we have alredy done this (see work on Ak,n)- we know that ai=(in−1). Therefore, we finally get ΦS(x)=(1+x)n−1, by the binomial theorem.
So far, we have only been considering the cases in which the set S is finite. What happens to ΦS when S is an infinite set? In this case, we have to make the following distinction:
If S is finite, then ΦS(x) is a polynomial
If S is infinite, then ΦS(s) is a formal power series (given that the coefficients ak are finite)
At this point, we introduce the concept of formal power series through which we shall obtain tools to deal with generating series.
Formal Power Series
A formal power series is an expression c0+c1x1+c2x2+… where (c0,c1,c2,…) is a sequence of rational numbers (for our intents and purposes, the coefficients are rational numbers, but the important part is that they are finite). This is to concretize the notion of a generating series as a formal power series, and to then to apply the properties of formal power series in the realm of generating series.
Let C(x) be a formal power series (to be abbreviated as f.p.s). We can write C(x)=∑i=0∞cixi and define [xn]=cn. So if C(x)=∑i=0∞2ixi, then [x0]C(x)=1, [x1]C(x)=2, [x2]C(x)=4 and so on. Let us now consider the operations which can be performed with formal power series.
Operations on Formal Power Series
Consider two formal power series A(x)=∑i=0∞aixi and B(x)=∑i=0∞bixi.
EQUALITY: ((A(x)=B(x))⟺(∀k∈N(ak=bk))). This states that two formal power series are considered to be equal if and only if the coefficients ak and bk are the same for all natural number k.
ADDITION: A(x)+B(x)=∑k=0∞(ak+bk)xk. This is a valid definition for addition since the resulting series is also a f.p.s- the coefficients (ak+bk) are all finite rational numbers (by the declaration that A and B are formal power series above).
Note that we can identify the term [xn]A(x)B(x)=∑j=0najbn−j from the last expression above. This term is also a rational number since it is a finite sum of product of pairs of rational numbers.
Example: Let A(x)=(1+x)m and B(x)=∑k=0m2kxk. Let us compute [xn]A(x)B(x).
We know, from the binomial theorem that A(x)=∑j=0m(jm)xj. So [xn]A(x)=(jm). It is also clear from the summand of B(x) that [xn]B(x)=2n. Putting these two together:
Example (Multiplicative Shift): This example illustrates how for all k,n≥0, the term [xn]xkA(x), is similar to a shift in indices. Let A(x)=∑j≥0ajxj.
[xn]xkA(x)=[xn]xk(j≥0∑ajxj)=[xn]j=0∑∞ajxj+k, let l=k+j=[xn]l=k∑∞al−kxl={an−k0, if n−k≥0, if n−k<0
This is a rule/identity that will be used time and time again.
Definition (Formal Power Series Inverses): Let A(x),B(x) be formal power series satisfying A(x)B(x)=1, then one says that B(x) is the inverse of A(x) and writes B(x)=A(x)−1 or B(x)=A(x)1.
Example: Let A(x)=1−x, B(x)=∑i=≥0xk.
We can write A(x), using the binomial theorem, as A(x)=(1−x)1=∑k=01(k1)(−x)k. Using this, it can be seen that [xn]A(x)=(−1)n(n1) and this forms the sequence (1,−1,0,0,0,…).
Now, using the multiplication rule:
an=[xn]A(x)B(x)=j=0∑n[xj]A(x)[xn−j]B(x)=j=0∑n(−1)j(j1)(1)={01 if n≥0 if n=0
The only term with non-zero coefficent in A(x)B(x) is the x0 term (the constant term), which has coefficent 1. Therfore, we see that A(x)B(x)=1 ans as such we can write ∑i=0∞xi=1−x1, using the notation in the definiton of inverses in formal power series.
Theorem (Negative Binomial Series): As an extension of the example above, we have the following theorem: for all k=1,2,3,… we have that
(1−x)k1=n=0∑∞(k−1n+k−1)xn
Proof: The proof of this theorem is provided using induction on k.
Now that we have (1−x)k+11 expressed as a product of two power series, we can use the multiplication rule to compute the coefficient for the xn term.
[xn](1−x)k+11=[xn](j=0∑∞(k−1j+k−1)xj)⋅(i=0∑∞xi)=j=0∑n(k−1j+k−1) by the Multiplication Rule=(kn+k) by the Hockey Stick Identity
The inductive step holds too and therefore so does the theorem for all k=1,2,…. This completes the proof. ■
This theorem is also an important one and should be familiarized with as it can help with quick conversions to and from power series.
NOTE: There are conditions under which formal power series have inverses. Here's an example highlighting a case when the f.p.s. has no inverse. Consider A(x)=x, which is claimed to have no inverse. Let us proove this.
The proof provided here is by contradiction. Suppose A(x) did have an inverse. Then, there would exist another f.p.s. B(x)=∑i≥0bixi such that A(x)B(x)=xB(x)=1. This looks like a good place to use the "Multiplicative shift" mentioned before to get:
[xn]A(x)B(x)=[xn]xB(x)={bn−10 if n≥0 if n=0
Note that if B(x) is an inverse for A(x), then [x0]A(x)B(x)=1, but with the result above, we see that [x0]A(x)B(x)=0, which is a contradiction. The starting assumption, that A(x) has an inverse must have been wrong then. Therefore, A(x)=x has no inverse.
Theorem (Condition for Inverses): An f.p.s. A(x) has an inverse iff [x0]A(x)=0.
Proof: [Forwards] First, assume that [x0]A(x)=an=0. Now, we need to show that A(x) has no inverse (This is proving the contrapositive which is logically equivalent).
Say B(x) is another f.p.s. with [x0]B(x)=b0.
Let us check [x0]A(x)B(x)=a0b0=0 since a0=0. This means that A(x)B(x) can never be equal to 1, no matter the f.p.s. B(x). Therefore, A(x) has no inverse. This proves that if A(x) has an inverse then a0=0.
[Backwards] Now, suppose that [x0]A(x)=0. We need to show that A(x) has an inverse. To do this, we let B(x)=∑i≥0bixi and show that A(x)B(x)=1 has a single solution. We can equivalenty solve for:
[xn]A(x)B(x)=j=0∑najbn−j
If n=0, then [x0]A(x)B(x)=a0b0. Comparing this to the [x0] coefficient on the Right Hand Side (RHS) of the equation A(x)B(x)=1, we get that a0b0=1, therefore b0=a01. This is a valid quotient since a0 is non-zero by assumption.
If n>0, then [xn]A(x)B(x)=a0bn+∑j=1najbn−j. All coefficents on the RHS of the equation are zero (except the constant term) so we can solve for bn from above as bn=a0−1∑j=1najbn−j, which can be solved iteratively for some desidered value of n. This completes the proof. ■
The kind of solution obtained for bn in the above theorem is called a linear recurrence because bn is defined as a linear combination of previous bi terms (for 0≤i<n). We can use this in solving equations of formal power series using inverses. This is studied now.
Theorem: Let A(x) and C(x) be f.p.s. If [x0]A(x)=0 the there is a formal power series B(x) such that A(x)B(x)=C(x).
Proof: This short and simple proof builds off of the conditions for inverses. Since [x0]A(x)=0, we know that it has an invserse. Say the inverse of A(x) is B′(x) i.e. A(x)B′(x)=1. Now, let B(x)=B′(x)C(x). This gives us that A(x)B(x)=(A(x)B′(x))C(x)=C(x). This completes the proof. ■
Example: Let A(x)=1−x5+x2. Compute [xn]A(x) for all n≥0. Note that in this example, if one considers A(x) to be quotient of polynomials, the one in the denominator has degree lower than the polynomial in the numerator.
Example: Let A(x)=1−x2+x32+7x−3x2. Note that in this example, if one considers A(x) to be a quotient of polynomials the the polynomial in the denominator has higher degree than the one in the numerator. In this case, we derive a recurrence relationship which can be used to compute [xn]A(x) for all n≥0.
Write A(x)=∑k≥0akxk so that [xn]A(x)=an. Now, rewrite the original expression for A(x) as (1−x2+x3)A(x)=2+7x−3x2. This process works by essentially comparing coefficients from the Left Hand Side (LHS) and RHS of the re-arranged expression for A(x).
Starting with the constant term. Since the LHS and RHS are equal, we have that [x0](1−x2+x3)A(x)=[x0]2+7x−3x2. When the expressions for the coefficients are computed, we get that a0=2.
Next, the linear term: [x1](1−x2+x3)A(x)=[x1]2+7x−3x2. Computing the expressions on each side of the equation, we get a1(1)−a0(0)=7. We get a1=7.
Next, the quadratic term: [x2](1−x2+x3)A(x)=[x2]2+7x−3x2. Computing the expressions on each side of the equation, we get a2+a1(0)−a0=−3. Evaluating using the previous computations, we get a2=−1.
Now, we derive a general recurrence using xn term, where n≥3: [xn](1−x2+x3)A(x)=[xn]2+7x−3x2. Evaluating each side gives an−an−2+an−3=0. This gives us the expression for the general an as an=an−2−an−3 for n≥3.
The expression an as an=an−2−an−3 for n≥3 is called a recurrence relation. But note that to compute an, one needs to know an−2 and an−3. Therefore, for the recurrence relation to be of any use, there must be a set of initial conditions provided. In this case, they were computed to be a0=2,a1=7 and a2=−1. From this, it is possible to compute any an iteratively.
Notes: We began to derive a general recurrence relation when we reached the cubic (or higher) term. This was because the x3 term (and all after it) for the expression on the RHS have zero coefficient.
Also note how the a0 term is non-zero.
Now, more generally, if there are f.p.s. A(x)=∑i≥0aixi with a0=0 and C(x)=∑i≥0cixi. By a theorem, we know that there exists a B(x) such that A(x)B(x)=C(x). We know that B(x)=A′(x)C(x), where A(x)1 is the inverse of A(x). We can now derive a general linear recurrence relation defining the coefficients of B(x).
Theorem: Let A(x),C(x) be formal power series with a0=0. Let B(x)=A(x)C(x). Then, bn=[xn]B(x) is determined by the linear recurrence:
bn=a01(cn−∑j=0n−1an−jbj)
Proof: Again this proof is short and straight-forward: To prove the recurrence, we start by comparing coefficients of the LHS and RHS i.e. checking [xn]A(x)B(x)=[xn]C(x). We get:
Rearranging the last equation reveals the desired recurrence:
bn=a01(cn−∑j=0n−1an−jbj)
COMPOSITION: This is yet another operation defined on formal power series. Let A(x)=∑j≥0ajxj and B(x)=∑j≥0bjxj be formal power series. If b0=0 then we define:
A(B(x))=j≥0∑ajB(x)j
In order to verify that composition of formal series is well defined, we check that the result of composing A(x),B(x) as defined above is a formal power series too.
[xn]A(B(x))=j≥0∑aj[xn]B(x)j
At this stage, note that the logic from multiplicative shift can be applied here to the term [xn]B(x)j=[xn](b1x+b2x2+…)j. If j>n, then there are no terms with xn left (all terms have exponent xn or higher). Therefore, the sum can be reduced to run from 0 to n instead.
[xn]A(B(x))=j=0∑naj[xn]B(x)j<∞
The coefficient in finite because aj and [xn]B(x)j are all rational and there is a finite count of summands. Therefore, A(B(x)) is a formal power series and the operation of composition is well defined.
Note: Here is an example when the condition that b0=0 is necessary. Consider the composition of A(x)=∑i≥0xi and B(x)=1. Then, we get that A(B(x))=∑j≥01→∞. It can cause such problems.
Now, we shall look at using composition of formal power series to compute inverses.
Theorem: Let A(x),B(x) be formal power series such that [x0]A(x)=0 and [x0]B(x)=0. Then:
[A(B(x))]−1=A−1(B(x))
Proof: Write A(x)=∑i≥0aixi. Since a0=0, we know that A(x) has an inverse. Let A−1(x)=∑i≥0αixi. Since [x0]B(x)=0 we know that it can be composed into other formal power series namely A(x) and A−1(x). Let us now consider the product of A(x) and A−1(x), when composed with B(x):
A(B(x))A−1(B(x))=(j≥0∑aj[B(x)]j)(j≥0∑αj[B(x)]j)=AA−1(B(x))by the Multiplicative Rule=1
This shows us that A(B(x)) is the invserse of A−1(B(x)) since their product is 1. Therefore, we get the desired result. ■
Example: Computing [xn](1+3x)−2(1−x)−1. The key idea here is to first express the term (1+3x)−2 as a composition of formal power series, then use the theorem above.
Let A(x)=−3x. So we get that (1+3x)−2=(1−A(x))−2. This is good because the negative binomial series can be used to obtain a series representation.
(1+3x)21=(1−A(X))21=n≥0∑(1n+1)(A(x))n by the Negative Binomial Series=n≥0∑(n+1)(−3x)n=n≥0∑(−3)n(n+1)xn
The sum lemma for generating series can be generalised to:
Generalization of Sum Lemma: Let S=A1∪A2∪…An where Ai∩Aj=∅ for 1≤i=j≤n. Also let w:S↦{0,1,2,…}. Then
ΦS(x)=∑i=0nΦAi(x)
Proof: This is proven by induction on n. The base case is trivial. Assume, inductively that the generalization is true for some n. Now we have to show that it holds for n+1. So we have S=A1∪A2∪…An∪An+1 and where Ai∩Aj=∅ for all 1≤i=j≤n+1.
S=A1∪A2∪…An∪An+1=(A1∪A2∪…An)∪An+1
Since S is a disjoint union of (A1∪A2∪…An) and An+1, we can use the sum lemma:
ΦS(x)■=Φ(A1∪A2∪…An)(x)+ΦAn+1(x)=i=0∑nΦAi(x)+ΦAn+1(x)by Inductive Hypothesis for case n=i=0∑n+1ΦAi(x)
Product Lemma: Let A,B be sets with weight functions α:A↦{0,1,2,…} and β:B↦{0,1,2,…}
respectively. Define a weight function w:A×B↦{0,1,2,…} by:
Definition: A composition of an integer n is a tuple of positive integers (a1,…,ak) such that ∑i=1kai=n. We say that this composition of n has k parts. Note that a composition is a tuple and therefore order matters! Furthermore, note that the parts in a integer composition are strictly positive integers and therefore 0 is not a valid part.
Also, by convention, n=0 has one composition- the empty composition ().
The goal is to answer questions of the form: "How many integer compositions of n have some property P?" The property P could be anything of interest. Here are some examples:
P= the number of parts is k
P= the parts are even
P= the parts are congruent to 1 modulo 3.
The general strategy to solve such problems is as follows:
Define a set, say S, which contains all of the intger compositions with the property P.
Define an appropriate weight function w, such that the solution to the question of interest is given by [xn]ΦS(x).
Compute the generating series of S, with respect to the weight function w, ΦS(x).
Finally, compute [xn]ΦS(x).
The toolbox that will allow us to perform these steps includes the following:
Sum lemma: if A∩B=∅, then ΦA∪B(x)=ΦA(x)+ΦB(x)
Product lemma: with the condition of the weight functions satisfied, one has ΦA×B(x)=ΦA(x)ΦB(x)
Power shifts: [xn]xkA(x)={[xn−k]A(x)0 if n≥k if n<k
Geometric series: ∑i=0∞xi=1−x1
Composition of geometric series: If [x0]A(x)=0, then ∑i=0∞(A(x))i=1−A(x)1
Now, using the multiplicative shift rule, one sees that:
[xn]ΦS(x)={0(k−1n−1) if n<k if n≥k
The last case evaluates to the binomial coefficient given because of the negative binomial theorem.
Binary Strings
This is another object which can be studied very interestingly with the concepts of generating series. First, some definitions regarding binary strings:
Definition: A binary string (a.k.a. {0,1}-string) is a strings consisting of only ones and zeroes. The length of a binary string is defined to be the total number of ones and zeroes in the string.
Definition: b is a substring of s if s=abc for some binary strings a,c (which could possibly be empty strings). The expression abc will become clear when concatenation is defined soon.
Definition: A block in a binary string is a maximal, non-empty substring which consists of only ones or only zeroes.
Definition: Concatenation of two binary strings a,b is the string ab.
Definition: If A and B are sets of binary strings, we define AB={ab∣a∈A and b∈B}. So AB is the set consisting of all the concatenations of a string in A with a string in B.
Also, Ak is defined to be A⋅A⋅⋯⋅A, k times. Note that this notation is overloaded with the Cartesian product for regular sets.
Definition: For any set A of binary strings, define the star operation as follows:
A∗={ϵ}∪A∪A2∪…=∪k=0∞Ak
where ϵ is defined to be the empty string (which has length 0).
With this star operation, it is possible to define the set of all binary strings as {0,1}∗. It also allows us to create more specific sets of strings (which, as in the case of integer compositions is the first step in solving a combinatorial problem involving binary strings). For example, the set {0}{00}∗ contains all of the strings of odd length containing only
zeroes.
The goal is to be able to answer questions of the form "How many binary strings of length n have property P?". Again, property P could be anything of interest in the context of binary strings:
P= no adjacent ones
P= exactly k occurrences of the substring 110
P= no occurrences of the substring 111.
The strategy is very similar to the one used for integer compositions:
Devise a set S of all binary strings which have the desired property P
Let w:S↦{0,1,2,…} be a weight function which, for all σ∈S, w(σ)= length of σ.
The desired answer will then be [xn]ΦS(x)
While the strategy may be the same as the one for integer compositions, binary strings have some nuances which require that the set of tools (specifically the product and sum lemmas) to be readjustment to handle the nuances. In particular, one may not immediately see how the product lemma may be used for binary strings as there is no operation of Cartesian products, just concatenation and these two are not the same.
Consider the two sets of binary strings A={010,01} and B={0,00}. The Cartesian product A×B has 4 elements. But observe the set of strings in the set AB={0100,10100,101}. Note that the product lemma does not hold in this case: ΦAB(x)=x3+x4+x5 and this is not the product of ΦA(x)=x2+x3 and ΦB(x)=x+x2. In particular, one cannot draw a bijection between AB and A×B- there are two elements in A×B, namely (010,0) and (01,00) which map to the same string "0100" in AB. To deal with this, we introduce the concept of ambiguity:
Definition: Let A and B be sets of binary strings. The expression AB is ambiguous if there exist a1,a2∈A and b1,b2∈B such that a1b1=a2b2.
The expression A∪B is ambiguous if A∩B=∅.
An expression involving concatenations and unions of sets of binary strings is ambiguous if any of the constituent concatenations or unions of sets is ambiguous.
An expression is said to be unambiguous iff it is not ambiguous.
Using this definition, for example, the set Ak is ambiguous if there exist a1,…,ak,a1′,…ak′∈A such that (a1,…,ak)=(a1′,…,ak′) and a1…ak=a1′…ak′.
Example: Show that the expression {0}{00}∗ is unambiguous.
First, we show that the star operation is unambiguous. The set {00}={ϵ}∪{00}∪{00}2∪…. Firstly, these are clearly disjoint unions. Next, note that the set {00}k is unambiguous for all k≥0; we have that {00}k={2k zeros00…00}.
Next, the concatenation with {0} is unambiguous. If we have two strings 0b=0b′, then clearly b=b′. So the expression {0}{00}∗ is unambiguous.
Now, here are some very useful unambiguous expressions for the set of all binary strings. They are important because they can be "refined" to smaller unambiguous sets.
{0,1}. This one is trivial.
{0}∗({1}{0}∗)∗. In this expression, a binary
string is split by the ones in the string.
{0}∗({1}{1}∗{0}{0}∗)∗{1}∗. This expression is called the block decomposition of the set of binary strings. A binary string here is viewed as a block of ones followed by a block of zeros, with a possible starting block of zeros and a possible ending block of ones.
Note: One may find it important to know that by the laws of symmetry, it is perfectly acceptable and valid to switch the ones and zeros in the above ambiguous expressions and end up with a valid unambiguous expression.
Depending on the situation, one of these expressions will be easier to manipulate to reduce it into a set of binary strings with desired properties. For example, in a question specifying conditions on the blocks of ones and the blocks of zeros, the block decomposition may be a good place to start.
Example: Derive an expression for the set of all binary strings with no 3 consecutive zeros.
Since this question only deals with the blocks of zeros, we can use the second one of the unambiguous decompositions and refine it down to an expression for the set of all binary strings with no 3 consecutive zeros.
In the expression {0}∗({1}{0}∗)∗, the parts of concern are the ones with zeros. The set {0}∗={ϵ,0,00,…} needs to be reduced down to the set {ϵ,0,00}.
Replacing {0}∗ with this reduced set in the unambiguous expression above results in the following set {ϵ,0,00}∗({1}{ϵ,0,00})∗ which is an unambiguous expression for the set of all binary strings which have no 3 consecutive zeros.
Equipped with this concept of unambiguity, one modification needs to be made before computing generating series for sets of binary strings. This is that the expression for the set S needs to be unambiguous in order to sanely compute its generating series. Here are some theorems to aid with this goal:
Theorem: Let A and B be sets of binary strings.
If AB is unambiguous, then ΦAB(x)=ΦA(x)ΦB(x). This is
called the product lemma for binary strings
If A∪B is unabiguous, then ΦA∪B(x)=ΦA(x)+ΦB(x).
This is called the sum lemma for binary strings.
If A∗ is unambiguous, then ΦA∗(x)=1−ΦA(x)1. This
is called the star rule.
Proof:
Let l(x) be the function which returns the length of the binary string x. This is the default weight function for binary strings, unless otherwise specified.
In this case, note that an element σ∈AB is unambiguously a concatenation of some element a∈A with a b∈B. ΦAB(x)=σ∈AB∑xl(σ)=a∈A and b∈B∑xl(ab)=a∈A and b∈B∑xl(a)+l(b)=(a∈A∑xl(a))(b∈B∑xl(b))=ΦA(x)ΦB(x)
This case is pretty simple as A∪B is unambiguous iff it is a disjoint union. Therefore the sum lemma can be used.
ΦA∪B(x)=ΦA(x)+ΦB(x)
We rewrite A∗=∪k=0∞Ak. Since Ai∩Aj=∅ for all 0≤i=j, the sum lemma can be applied. Also, since each Ak is unambiguous for all k≥0, the product lemma may be applied.
Now, this last term looks like a composition with the geometric series, but to be sure, we have to verify that [x0]ΦA(x)=0. Note that A0={ϵ} and that Aj∩Ai=∅ for all 0≤i=j by the unambiguity of A∗. This means that ϵ∈/A and so there is no string in A with length 0, so the constant term of ΦA(x) must be 0- so this is indeed a valid composition. Finally, we get:
ΦA∗(x)=k=0∑∞[ΦA(x)]k=1−ΦA(x)1
Example: Compute the number of binary strings of length n with no substring 11.
First, we need an unambiguous expression for the set of all binary strings with no substring 11. We can use symmetry and reverse one of the unambiguous expressions for the set of all binary strings to get a good starting point. We reverse the ones and zeros in the unambiguous expression {0}∗({1}{0}∗)∗ to get {1}∗({0}{1}∗)∗. Now, we refine this expression down so that every block of ones does not contain the substring 11. This gives the expression S={ϵ,1}({0}{ϵ,1})∗.
Now, proceeding to compute the generating series of the set S with respect to string length as the weight function. Note that the product and star rules may be applied because the expression for S above is unambiguous.
where fn=[xn]1−x−x21. This is actually part of the Fibonacci sequence. Computing the starting terms of fn is easy by comparing coefficients. We get f0=f1=1. Putting all of this together, we finally get that:
[xn]ΦS(x)={1fn+fn−1 if n=0 if n≥1
This makes intutive sense. When n=0, there is clearly only one string with length 0, which does not contain the substring 11, namely the empty string. Then, there are two strings of length 1 which do not contain the string 11 and so on.
Recursive Decompositions
Sometimes, recursive definitions can help to make computation of generating series easier. Note that recursive decompositions are just expressions and the rules that have so far been used to compute generating series will still be used with them, just with collection of like terms to get the final series.
Here is an example. Let us derive a recursive decomposition for the set of all binary strings, say S. One way to approach this is by selecting an arbitrary string in S and analysing it to see what could come next. In this case, any binary string could start with either a 1 or a 0 and it could be followed by a sequence of ones or zeros i.e. another binary string. Or, the binary string could simply be the empty string. This gives us the simple recursive definition for the set of all binary strings as S={ϵ}∪{0,1}S.
Now, let us try to compute the generating series of this set S with this representation. Note that we can still use the product lemma here as the concatenation is clearly unambiguous and the sum lemma because the union is disjoint.
By collecting like terms, one ends up with the generating series: ΦS(x)=1−2x1.
Example: Another example is an expression for the set of all binary strings
with no substring 11. Let the set of all such strings be S.
Firstly, a string in S is either the empty string, starts with a one or starts with a zero. Let A0 be the set of all binary strings which start with a zero. And let A1 be the set of all binary strings which start with a one. So S={ϵ}∪A0∪A1. Note that this is a disjoint union of sets and so the sun lemma is applicable.
Here are some observations that will make life easier. Firstly, an element in A0 is such that it begins with a 0 and then a sequence of ones and zeros follow i.e. another binary string. Therefore, we get the expression A0={0}S. Similarly, an element in A1 is either the string 1 or the string 1 concatenated to the beginning of every string in A0. More precisely, A1={1}∪{1}A0. Now, we can use these expressions in order to compute the generating series of the set S.
By collecting like terms and solving for ΦS(x) yields
ΦS(x)=1−x−x21+x
This is the same answer as the one obtained when using unambiguous expressions.
Coefficients of Rational Functions
This is a new tool that can be used when solving combinatorial problems such as the ones being dealt above. Specifically, these are tools for computing [xn]g(x)f(x) where f,g are polynomials.
A starting point will be the method of partial fractions. In specific cases, it can be used to split a rational polynomial to a form where its coefficients are easy to compute using composition with geometric series or the negative binomial series.
The method of partial fractions will be generalized to rational fractions of the form g(x)f(x) where f,g are polynomials and deg(f)<deg(g) and [x0]g(x)=0.
Notes: Here are some points before proceeding:
Without loss of generality (WLOG), we can assume that [xn]g(x)=1 since it is non-zero- this is fine because it can be scaled to 1 anyway by multiplying the numerator and denominator by a constant.
By the fundamental theorem of algebra, g can be factorized over the complex numbers as g(x)=(α1−x)e1…(αk−x)ek. Note that the multiplicities e1,…,ek≥1 and ∑i=0kei=deg(g). Also, the values α1,…,αk∈C∖{0}- they must be non-zero complex numbers since the constant term of g is non-zero.
Furthermore, if we let ri=αi−1 for 1≤i≤k, Since the constant term of g is effectively 1, we can re-write it as: g(x)=(1−r1x)e1…(1−rkx)ek
Now, we generalize the theorem of partial fractions. This will be just the statement and not the proof (I may include the proof at a later time).
Theorem [ Partial Fractions ]: Suppose f,g are polynomials with deg(f)<deg(g) and g(x)=(1−r1x)e1…(1−rkx)ek with r1,…,rk distinct complex numbers and e1,…,ek are integers greater that one which add up to the degree of g. Then:
This is useful because it allows one to reduce a quotient of polynomial to a state where it can be dealt with using the negative binomial theorem. For example, the quotient A(x)=(1−2x)(1−7x)2+2x2 can be reduced to a much more friendly form:
A(x)=1−2x−1−(1−2x)21+1−7x4
In this form, one can easily compute the coefficient of the xn using the negative binomial series for each individual term. This results in the following answer: [xn]A(x)=2n(−n−2)+4⋅7n.
Theorem: Suppose that f,g are polynomials with deg(f)<deg(g) and g is of the form g(x)=(1−r1x)e1…(1−rkx)ek. Then, there exist polynomials p1.…,pk such that deg(pi)≤ei for 1≤i≤k and [xn]g(x)f(x)=p1(x)r1n+⋯+pk(x)rkn.
Proof: This comes pretty simply using the following claim, applied to the generalised partial fractions theorem.
Claim: Let c,r∈C and j≥1. Then there is a degree j−1 polynomial q such that [xn](1−rx)jc=q(n)rn.
Proof of claim: We can use the negative binomial series here to expand the formal power series into a summation and then extract its coefficient as follows:
Note that the numerator has the form of a polynomial in x evaluated at a value n and the denominator is not at all dependent on n. This motivates us to define q(x)=(j−1)!c(x+j−1)(x+j−2)…(x+1), which is indeed a degree j−1 polynomial. This completes the proof of the claim.
Now, using this claim with each row in the generalized form of the partial fractions expression, the theorem now holds. For a row i in the partial fractions expression, one has ei such polynomials from the claim- each of them will be multiplied by an rin term and these can be factored to get ri multiplied by the sum of ei polynomials, of which the maixmum degree is ei−1 by the claim above. Let this sum of polynomials be the polynomial pi(x). This completes the proof of the theorem.
Now, this theorem gives us another way of quickly computing the coefficient of the xn term in a quotient of polynomials. One knows the form that the quotient will take, so what remains is to plug whatever available information exists in the context into this theorem and then attempt to compute the coefficients of the polynomials p1,…,pk.