Skip to content

Lower limit for sorting algorithms based on comparison

How fast can we sort n elements?

Until now, we covered some algorithms which base their decision on the comparisons between keys but we had no assumption regarding the input's nature.

So what is the inferior limit, when it comes to these algorithms?

  • A quick and wrong answer would be that the best time would be Ω(n), but this is wrong
Theorem: Any comparison sorting algorithm must perform Ω(nlog(n)) comparisons in the worst case {Thesis Inferior limit is Ω(nlog(n))Hypothesis All elements in input are distinct

1 - Decision Tree as a representation

Let us use a decision tree to represent the comparisons done by a comparison sorting algorithm, given a generic input:

inflimsort

We notice that:

  1. Every internal node is labeled from i to j, with i,j1,,n
    • It means we need to compare ai and aj
    • L-subtree, includes the following comparisons if aiaj;
    • R-subtree, includes the following comparisons if ai>aj;
  2. Every leaf represents a permutation <π1,,pin>
    • Such that aπ1aπ2aπn
      • A permutation such that every element is sorted by some criteria;

Given any comparison sorting algorithm, and some input of size n, we can construct a decision tree (binary tree Y/N) for each n, that models all possible permutations and comparisons:

  • Given any comparison sorting algorithm we can build a tree for every n
    • Each tree maps all the possible execution paths
  • The execution time T(n), which equals the count of comparisons, is the length of a path in the tree
    • The worst execution time is given by the height of the tree Tworst(n)=height(T)=h

The inferior limit on the heights of all the decisions trees (where each permutation appears like a leaf) behaves like the inferior limit of the execution time of any comparison sorting algorithm

Since it is a binary tree, we can use its height to find out the number of permutations which are exactly the number of leaves in the tree!

2 - How many leaves f() can a decision tree, like the one we just mentioned, have?

2.1 - f(T)n!

Every permutation must appear at least once, trivially

2.2 Lemma: a binary tree of height h has at most 2h leaves

Demonstration

Induction on h:

  1. Base Case: h=0, a tree with just one node which is both root and leaf
    • f(T)=f(h=0)=1
  2. Inductive Hypothesis: We assume this is true for any tree of height k<h
  3. Inductive Step: We show it for h
    • Tree with one child
      • f(T) is equal to f(T1) which has height h1, f(T)=f(T1)
      • Then, by Inductive Hypothesis, f(T)=f(T1)2h12hf(T)
    • Tree with both children
      • f(T)=f(T1)+f(T2), the leaves of the tree T equals the sum of its children's leaves
      • Let h1,h2<h be the height of the children
      • Then, by Inductive Hypothesis,
        • f(T1)2h1f(T2)2h2
        • f(T)=f(T1)+f(T2)2h1+2h222max(h1,h2)2h
        • 22max(h1,h2)=21+max(h1,h2)h=1+max(h1,h2)
        • 21+max(h1,h2)=2h
        • The property is demonstrated as f(T)2h

3: Theorem

Any comparison sorting algorithm requires Tworst(n)=Ω(nlog(n)) comparisons

Demonstration

We need to determine the height of a binary decision tree where each permutation appears as a leaf. Let us consider a decision tree with height h and l leaves which represents the sorting process of a comparison sorting algorithm given n elements:

2hln!log(2h)log(n!)hlog(n!)

Stirling's approximation for n!

n!2πn(ne)n(1+Θ(1n))

If n is sufficiently large, we can get the dominant term and substitute it with n

hlog(n!)log((ne)n)nlog(n/e)=n(log(n)log(e))hnlog(n)h=Ω(nlog(n))

Corollary

Heapsort and Mergesort are asymptotically efficient comparison sorting algorithms.