56.2k views
1 vote
Recall the problem of finding the number of inversions. As in the text, we are given a sequence of numbers a1, . . . , an, which we assume are all distinct, and we define an inversion to be a pair i < j such that aj < ai . We motivated the problem of counting inversions as a good measure of how different two orderings are. However, one might feel that this measure is too sensitive. Lets call a pair a significant inversion if i < j and 2aj < ai .

(a) Describe a O(n log n) algorithm to count the number of significant inversions between two orderings. Make clear how is it that your algorithm would achieve the given O(n log n) execution time bound.

User Antonyoni
by
5.0k points

1 Answer

4 votes

Answer:

The algorithm is very similar to that of counting inversions. The only change is that here we separate the counting of significant inversions from the merge-sort process.

Step-by-step explanation:

Algorithm:

Let A = (a1, a2, . . . , an).

Function CountSigInv(A[1...n])

if n = 1 return 0; //base case

Let L := A[1...floor(n/2)]; // Get the first half of A

Let R := A[floor(n/2)+1...n]; // Get the second half of A

//Recurse on L. Return B, the sorted L,

//and x, the number of significant inversions in $L$

Let B, x := CountSigInv(L);

Let C, y := CountSigInv(R); //Do the counting of significant split inversions

Let i := 1;

Let j := 1;

Let z := 0;

// to count the number of significant split inversions while(i <= length(B) and j <= length(C)) if(B[i] > 2*C[j]) z += length(B)-i+1; j += 1; else i += 1;

//the normal merge-sort process i := 1; j := 1;

//the sorted A to be output Let D[1...n] be an array of length n, and every entry is initialized with 0; for k = 1 to n if B[i] < C[j] D[k] = B[i]; i += 1; else D[k] = C[j]; j += 1; return D, (x + y + z);

Runtime Analysis: At each level, both the counting of significant split inversions and the normal merge-sort process take O(n) time, because we take a linear scan in both cases. Also, at each level, we break the problem into two subproblems and the size of each subproblem is n/2. Hence, the recurrence relation is T(n) = 2T(n/2) + O(n). So in total, the time complexity is O(n log n).

Correctness Proof: We will prove a stronger statement that on any input A our algorithm produces correctly the sorted version of A and the number of significant inversions in A. And, we will prove it by induction. Base Case: When n = 1, the algorithm returns 0, which is obviously correct.

Inductive Case: Suppose that the algorithm works correctly for 1, 2, . . . , n βˆ’ 1. We have to show that the algorithm also works for n. By the induction hypothesis, B, x, C, y are correct. We need to show that the counting of significant split inversions and the normal merge-sort process are both correct. In other words, we are to show that z is the correct number of significant split inversions, and that D is correctly sorted.

First, we prove that z is correct. Due to the fact that B and C are both sorted by the induction hypothesis, for every C[j], where j = 1, . . . , length(C), we can always find the smallest i such that B[i] > 2C[j]. Therefore, B[1], B[2], . . . , B[i βˆ’ 1] are all smaller than or equal to 2C[j], and B[i], B[i + 1], . . . , B[length(B)] are all greater than 2C[j]. Thus, length(B) βˆ’ i + 1 is the correct significant split inversion count for pairs involving C[j]. After counting the number of correct significant split inversions for all C[j], where j = 1, . . . , length(C), z is the correct number of significant split inversions. Second, we show that D is the sorted version of A.

This is actually the proof of merge-sort. Each time, we append one element from B or C to D. At each iteration k, since B and C are both sorted by the induction hypothesis, min(B[i], C[j]), the one to be appended to D, is also the smallest number among all the numbers that have not been added to D yet.

As a result, every time we append the smallest remaining element (of B and C) to D, which makes D a sorted array

User Janis Peisenieks
by
5.7k points