
Quick Sort

| Operation | Method | Time | Adaptive? | In-Place? | Stable? | Online? |
| --- | --- | --- | --- | --- | --- | --- |
| Quick sort | Divide et Impera, Partitioning | $T_b(n)=\Theta(n\log(n))$, $T_{avg}(n)=O(n\log(n))$, $T_w(n)=\Theta(n^2)$ | N | Y | N | N |

Idea

Quick sort belongs to a class of algorithms which use a Divide-et-Impera approach.


The Algorithm - Divide et Impera

  1. Divide: after choosing a pivot element from $A$ such that $\text{pivot}(A) = A[q]$, the vector is partitioned into two sub-vectors $A[p \dots q-1]$ and $A[q+1 \dots r]$ (which can be empty), such that
    1. $\forall A[i] \in A[p \dots q-1] \Rightarrow A[i] \le A[q]$
    2. $\forall A[j] \in A[q+1 \dots r] \Rightarrow A[j] \ge A[q]$
    3. $A[q]$ is the pivot, the element used to create the partitions; every element that is smaller or larger is shifted into one of the sub-arrays (so that the conditions are respected).
  2. Impera: sort the sub-vectors recursively through quicksort, unless the input is trivially small ($0$–$1$ elements).
  3. Merge: there is no merge phase, because the algorithm sorts in place.
```python
def quick_sort(A, p, r):
    if p < r:
        q = partition(A, p, r)
        quick_sort(A, p, q - 1)
        quick_sort(A, q + 1, r)

def partition(A, p, r):
    x = A[r]                # last element as pivot
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:       # swap if the invariant holds true
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]  # put pivot back in the right position
    return i + 1
```

Invariant

$$\text{INV} \begin{cases} x = A[r] \text{ is always true} \\ \forall k \in [p, i] \Rightarrow A[k] \le x \\ \forall k \in [i+1, j-1] \Rightarrow A[k] > x \\ p \le j \le r \\ p-1 \le i \le j-1 \end{cases}$$

We can confirm this holds true at all times:

  1. Initialization
  2. Preservation
  3. Conclusion:
    • At the end of the for block, $j = r$
    • This means: $p \le r \le r$ and $p-1 \le i \le r-1$.
    • Furthermore, the last two lines of code in the partition function insert the pivot in the right position by swapping it with the leftmost element larger than $x$.

$$\text{INV}[r/j] \begin{cases} x = A[r] \text{ is always true} \\ \forall k \in [p, i] \Rightarrow A[k] \le x \\ \forall k \in [i+1, r-1] \Rightarrow A[k] > x \\ p \le r \le r \\ p-1 \le i \le r-1 \end{cases}$$
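The initialization/preservation/conclusion argument can also be checked at runtime. Below is a sketch of the Lomuto partition with the invariant asserted on every iteration (`partition_checked` is our hypothetical name, not part of the notes):

```python
def partition_checked(A, p, r):
    """Lomuto partition that asserts the loop invariant at each step."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        # invariant: A[p..i] <= x and A[i+1..j-1] > x
        assert all(A[k] <= x for k in range(p, i + 1))
        assert all(A[k] > x for k in range(i + 1, j))
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]   # place the pivot between the two regions
    return i + 1

A = [2, 8, 7, 1, 3, 5, 6, 4]
q = partition_checked(A, 0, len(A) - 1)
print(q, A)   # 3 [2, 1, 3, 4, 7, 5, 6, 8]
```

If any iteration violated the invariant, the assertions would raise immediately.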

[Figure: quicksort partition invariant]

Time Complexity

$$T_{\text{partition}}(n) = \Theta(n) = \begin{cases} O(1) \text{ assignments} \\ \Theta(n) \text{ for the for block}, \quad r - p + 1 = n \end{cases}$$

And

$$T_{\text{quicksort}}(n) = \begin{cases} c & n \le 1 \\ T(k) + T(n-k-1) + \Theta(n) & n > 1 \end{cases}$$

Where

  • $k$ is the count of elements of the first sub-array
  • $n-k-1$ is the count of elements of the second sub-array (excluding the pivot, hence the $-1$)

$T_{\text{quicksort}}(n)$ depends on how the sub-arrays are partitioned:

$$T_{\text{quicksort}}(n) = \begin{cases} T_{\text{worst}}(n) \in \Theta(n^2) \\ T_{\text{best}}(n) \in \Theta(n\log(n)) \\ T_{\text{average-constant}}(n) \in O(n\log(n)) \\ T_{\text{average-nonconstant}}(n) \in \Theta(n\log(n)) \end{cases}$$

Demonstration - Worst Case

In the worst case the sub-arrays are highly unbalanced

$T_{\text{worst}}(n) = T(n-1) + T(0) + \Theta(n)$, since $|sub_1| = n-1$, $|sub_2| = 0$

[Figure: worst-case quicksort recursion tree]

$$\begin{aligned} T_{\text{worst}}(n) &= T(n-1) + c + \Theta(n) \\ &= T(n-1) + \Theta(n) = T(n-1) + cn \\ &= T(0) + \sum_{i=1}^{n} c \cdot i = T(0) + c \sum_{i=1}^{n} i = T(0) + c\,\frac{n(n+1)}{2} = \Theta(n^2) \end{aligned}$$
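We can observe this quadratic cost empirically. The sketch below (our instrumentation, not part of the notes) counts key comparisons when quicksort runs on an already-sorted vector, which always produces the unbalanced $|sub_1| = n-1$, $|sub_2| = 0$ split:

```python
def count_comparisons(A):
    """Run Lomuto quicksort on A and return the number of key comparisons."""
    count = 0

    def partition(p, r):
        nonlocal count
        x = A[r]
        i = p - 1
        for j in range(p, r):
            count += 1          # one key comparison per loop iteration
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        return i + 1

    def qs(p, r):
        if p < r:
            q = partition(p, r)
            qs(p, q - 1)
            qs(q + 1, r)

    qs(0, len(A) - 1)
    return count

n = 200
sorted_cost = count_comparisons(list(range(n)))  # sorted input: worst case
print(sorted_cost == n * (n - 1) // 2)           # True
```

On sorted input each partition only peels off the pivot, so the comparison count is exactly $\sum_{i=1}^{n-1} i = n(n-1)/2$.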

Demonstration - Best Case

In the best case, the sub-arrays each contain about half of the total elements:

$T_{\text{best}}(n) = 2T(n/2) + \Theta(n)$, since $|sub_1| = n/2$, $|sub_2| = (n/2) - 1$

By the Master Theorem

$$f(n) = \Theta(n^{\log_b(a)}) = \Theta(n), \text{ with } a = 2,\ b = 2 \Rightarrow \text{2nd case} \Rightarrow T_{\text{best}}(n) = \Theta(n\log(n))$$

Demonstration - Average Case (constant)

In the average case, the sub-arrays split in a constant ratio, for example $9:1$:

$$T_{\text{average-constant}}(n) = T(n/10) + T(9n/10) + cn$$

By the Occurrences Tree method we find that:

[Figure: occurrences tree for the 9:1 split]

So:

  1. The shortest root-to-leaf path (always following the $1/10$ branch) has length $\log_{10}(n)$
  2. The longest root-to-leaf path (always following the $9/10$ branch) has length $\log_{10/9}(n)$
    1. Which is where the recursion stops
  3. The maximum cost of a level is $cn$
    1. Every level has cost exactly $cn$ until depth $\log_{10}(n)$; below that, $cn$ is only an upper bound on the cost of a level, since some branches have already terminated

Then:

$$T_{\text{average-constant}}(n) \le cn \log_{10/9}(n) \Rightarrow T_{\text{average-constant}}(n) = O(n\log(n))$$

From this we can generalize that:

$$T(n) = T(\alpha n) + T(n(1-\alpha)) + cn, \text{ where } 0 < \alpha < 1,\ c > 0 \Rightarrow T(n) = \Theta(n\log(n))$$

Demonstration - Average Case (non-constant)

Consider the average case where lucky (balanced) and unlucky (unbalanced) partitions keep alternating:

  • We assume that the input's permutations are i.i.d.

$$T_{\text{average-nonconstant}}(n) = \begin{cases} T_{\text{lucky}}(n) \Rightarrow L(n) = 2U(n/2) + \Theta(n) \\ T_{\text{unlucky}}(n) \Rightarrow U(n) = L(n-1) + \Theta(n) \end{cases}$$

Then:

$$\begin{aligned} L(n) &= 2\left(L\left(\tfrac{n}{2}-1\right) + \Theta\left(\tfrac{n}{2}\right)\right) + \Theta(n) = 2L\left(\tfrac{n}{2}-1\right) + 2\,\Theta\left(\tfrac{n}{2}\right) + \Theta(n) \\ &= 2L\left(\tfrac{n}{2}-1\right) + \Theta(n) = \Theta(n\log(n)) \end{aligned}$$

Avoiding Worst Case: Randomized Version

Instead of choosing $A[r]$ as pivot, we use a random pivot in $A[p \dots r]$

  • We assume all the keys are distinct
```python
import random

def randomized_partition(A, p, r):
    i = random.randint(p, r)   # random index between p and r (inclusive)
    A[i], A[r] = A[r], A[i]    # move the random pivot to the last position
    return partition(A, p, r)  # then partition as usual

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```
$$T_{avg}(n) = \Theta(n\log(n)), \quad T_w(n) = \Theta(n^2)$$

Pros:

  1. $T(n)$ does not depend on the input's order
  2. No assumption on the input's distribution is needed
  3. No specific input can force the worst case
  4. Only the random number generator can produce the worst case
  5. In practice it is 3 to 4 times faster than the normal version

Optimization - Insertionsort on small vectors

Using a threshold $M$, with $5 \le M \le 25$, below which insertion sort takes over from quicksort helps to improve the average case.

Case 1: we can sort each sub-array with insertion sort as soon as its size falls below the threshold

```python
def quicksort(A, p, r):
    if r - p <= M:
        insertion_sort(A, p, r)   # sort the small sub-array directly
        return
    # ... standard quicksort recursion ...
```

Case 2: or we can stop the recursion on small sub-arrays and finish with a single insertion sort pass at the end

```python
def quicksort(A, p, r):
    if r - p <= M:
        return   # leave small sub-arrays unsorted
    # ... standard quicksort recursion ...

def sort(A, p, r):
    quicksort(A, p, r)        # result: a partially sorted vector
    insertion_sort(A, p, r)   # we sort the rest with insertion sort
```
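Putting Case 2 together into a runnable sketch; the `insertion_sort` and `partition` helpers and the cutoff `M = 10` are our assumptions, consistent with the fragments above:

```python
import random

M = 10  # cutoff threshold; values between 5 and 25 are typical

def insertion_sort(A, p, r):
    for i in range(p + 1, r + 1):
        key = A[i]
        j = i - 1
        while j >= p and A[j] > key:   # shift larger elements right
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key

def partition(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def hybrid_sort(A):
    def qs(p, r):
        if r - p <= M:
            return               # leave small runs for the final pass
        q = partition(A, p, r)
        qs(p, q - 1)
        qs(q + 1, r)
    qs(0, len(A) - 1)
    insertion_sort(A, 0, len(A) - 1)  # one pass finishes the nearly-sorted vector

data = [random.randrange(1000) for _ in range(500)]
hybrid_sort(data)
print(data == sorted(data))   # True
```

Insertion sort is fast on nearly-sorted data, so the single final pass is cheap: after the truncated quicksort, every element is at most a small block away from its final position.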

Optimization 2 - Median as pivot

Using the median of three sampled elements as the pivot for quicksort means:

  1. Choosing the median out of three elements inside an unsorted vector:
    • A leftmost element
    • A rightmost element
    • A center element
  2. Swapping it with A[r]
  3. Applying the algorithm
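A minimal sketch of steps 1 and 2; the helper name `median_of_three` is ours, and afterwards the usual `partition(A, p, r)` would be applied (step 3):

```python
def median_of_three(A, p, r):
    """Choose the median of A[p], A[mid], A[r] and swap it into A[r]."""
    mid = (p + r) // 2
    # sort the three sampled (value, index) pairs; the middle one is the median
    trio = sorted([(A[p], p), (A[mid], mid), (A[r], r)])
    m = trio[1][1]              # index of the median value
    A[m], A[r] = A[r], A[m]     # step 2: swap it with A[r]

A = [9, 1, 5, 3, 2]
median_of_three(A, 0, len(A) - 1)
print(A[-1])   # 5, the median of the samples 9, 5, 2
```

Sampling three positions makes a pathologically bad pivot (the minimum or maximum) impossible, which is why sorted inputs no longer trigger the worst case.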

Optimization 3 - Dutch Flag (Tri-Partition)

When we find duplicates, not even randomizing the choice of the pivot can help much.

Instead of dividing the vector in 2 parts, we divide it in 3 parts:

  1. A partition with elements $< x$
  2. A partition with elements $= x$
  3. A partition with elements $> x$

This slightly changes the invariant, but the main idea stays the same.

Partition:

  1. Permutation of the elements in $A[p \dots r]$
  2. Returns $q$ and $t$, with $p \le q \le t \le r$
    1. $\forall A[i] \in A[q \dots t] \Rightarrow A[q] = A[q+1] = \dots = A[t]$
    2. $\forall A[i] \in A[p \dots q-1] \Rightarrow A[i] < A[q]$
    3. $\forall A[i] \in A[t+1 \dots r] \Rightarrow A[i] > A[q]$
  3. $T(n) = \Theta(r - p)$
```python
def partition(A, p, r):
    x = A[r]
    min_, eq, max_ = p, p, r   # min_/max_ mirror the min/max indices of the invariant

    # Theta(r - p)
    while eq < max_:
        if A[eq] < x:
            A[min_], A[eq] = A[eq], A[min_]
            eq += 1
            min_ += 1
        elif A[eq] == x:
            eq += 1
        else:
            max_ -= 1
            A[max_], A[eq] = A[eq], A[max_]

    A[r], A[max_] = A[max_], A[r]   # put the pivot next to the "= x" block
    return min_, max_               # the pair (q, t)

def quicksort(A, p, r):
    if p < r:
        q, t = partition(A, p, r)
        quicksort(A, p, q - 1)
        quicksort(A, t + 1, r)
```

Invariant

$$\text{INV} \begin{cases} x = A[r] \text{ is always true} \\ \forall k \in [p, min) \Rightarrow A[k] < x \\ \forall k \in [min, eq) \Rightarrow A[k] = x \\ \forall k \in [max, r) \Rightarrow A[k] > x \\ p \le min \le eq \le max \le r \end{cases}$$

We obtain something like this:

```
| < x | = x |  ?  | > x | x |
 p     min   eq    max     r
```

We can confirm this holds true at all times:

  1. Initialization
  2. Preservation
  3. Conclusion: when the execution ends, we have $eq = max$
    1. The last two lines swap the pivot $A[r]$ with the first element larger than $x$
    2. We obtain the desired partition

$$\text{INV}[max/eq] \begin{cases} x = A[r] \text{ is always true} \\ \forall k \in [p, min) \Rightarrow A[k] < x \\ \forall k \in [min, max) \Rightarrow A[k] = x \\ \forall k \in [max, r) \Rightarrow A[k] > x \\ p \le min \le max \le r \end{cases}$$

The result is:

```
| < x | = x | > x | x |
 p     min   max     r
```

Complexity

If all the elements are equal:

$$T(n) = \Theta(n), \quad n = r - p + 1$$
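To see this $\Theta(n)$ behaviour concretely, the sketch below (our instrumentation of the Dutch-flag scheme above, with a call counter added) shows that an all-equal input finishes after a single partition call instead of $n$ recursive ones:

```python
def partition3(A, p, r):
    """Dutch-flag tri-partition, same scheme as above."""
    x = A[r]
    lt, eq, gt = p, p, r
    while eq < gt:
        if A[eq] < x:
            A[lt], A[eq] = A[eq], A[lt]
            lt += 1
            eq += 1
        elif A[eq] == x:
            eq += 1
        else:
            gt -= 1
            A[gt], A[eq] = A[eq], A[gt]
    A[r], A[gt] = A[gt], A[r]
    return lt, gt

def quicksort3(A, p, r, counter):
    if p < r:
        counter[0] += 1          # count partition calls
        q, t = partition3(A, p, r)
        quicksort3(A, p, q - 1, counter)
        quicksort3(A, t + 1, r, counter)

A = [7] * 1000
calls = [0]
quicksort3(A, 0, len(A) - 1, calls)
print(calls[0])   # 1: the whole array becomes the "= x" block in one pass
```

Both recursive calls receive empty ranges, since the entire vector ends up inside the $[q, t]$ block of elements equal to the pivot.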

Conclusion

Pro:

  • In-place
  • The worst case is very rare; in practice it is very efficient
  • Randomizing the pivot is a solution for when the input is sorted
  • Tri-partition improves the complexity when the input contains many duplicates

Con:

  • Worst case $T_w(n) = \Theta(n^2)$
  • Not stable
  • Not adaptive: slow when the input is already sorted (ascending/descending) or all the elements have equal values

Extra Credits