Introduction
TimSort was created by optimizing MergeSort for real-world inputs that already contain sorted (or reverse-sorted) runs. Its average and worst-case complexity are O(n log n), and its best case is O(n). TimSort is also a stable sort. The idea is to first partition the array into already-sorted runs and then merge those runs, which looks like MergeSort's merge step, but with extra optimizations for descending runs and large-scale data.
Optimization ideas for merge sort
There are several ways to optimize merge sort:
Like quicksort, you can switch to insertion sort or selection sort for small subarrays to avoid the overhead of recursive calls, as in the sketch below.
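A minimal sketch of this cutoff, assuming a conventional top-down merge sort; the CUTOFF constant and the insertionSort() helper are illustrative names, not from the original:

// Hypothetical cutoff; the exact value is tuned empirically
private static final int CUTOFF = 16;

static void sort(int[] a, int lo, int hi, int[] aux) {
    if (hi - lo < CUTOFF) {
        // Small subarray: insertion sort avoids further recursive calls
        insertionSort(a, lo, hi);
        return;
    }
    int mid = lo + (hi - lo) / 2;
    sort(a, lo, mid, aux);
    sort(a, mid + 1, hi, aux);
    merge(a, lo, mid, hi, aux);
}

static void insertionSort(int[] a, int lo, int hi) {
    for (int i = lo + 1; i <= hi; i++) {
        int v = a[i];
        int j = i - 1;
        while (j >= lo && a[j] > v) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = v;
    }
}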
Before merge() is called, you can check whether a[mid] is less than or equal to a[mid+1]. If so, there is no need to merge: the array is already ordered. The reason is simple. Since the two subarrays are already sorted, a[mid] is the maximum of the first subarray and a[mid+1] is the minimum of the second, so when a[mid] <= a[mid+1] the whole range is ordered. A sketch follows.
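A minimal sketch of this check, shown as a variant of the hypothetical sort() above:

static void sort(int[] a, int lo, int hi, int[] aux) {
    if (hi <= lo) return;
    int mid = lo + (hi - lo) / 2;
    sort(a, lo, mid, aux);
    sort(a, mid + 1, hi, aux);
    // Both halves are sorted: a[mid] is the maximum of the left half and
    // a[mid + 1] is the minimum of the right half, so if a[mid] <= a[mid + 1]
    // the whole range is already ordered and no merge is needed.
    if (a[mid] <= a[mid + 1]) return;
    merge(a, lo, mid, hi, aux);
}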
To save the time spent copying elements to the auxiliary array, the roles of the original array and the auxiliary array can be swapped at each level of the recursive call (see the sketch below).
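A minimal sketch of this role swap, in the style of the well-known variant where aux starts as a copy of a and each level sorts from src into dst, so the per-level copy disappears (names are illustrative):

static void sort(int[] a) {
    int[] aux = a.clone();            // aux starts as an exact copy of a
    sort(aux, a, 0, a.length - 1);    // sort from aux into a
}

// Sorts src[lo..hi] and leaves the result in dst[lo..hi].
static void sort(int[] src, int[] dst, int lo, int hi) {
    if (hi <= lo) return;
    int mid = lo + (hi - lo) / 2;
    sort(dst, src, lo, mid);          // roles swap: the children write into src
    sort(dst, src, mid + 1, hi);
    merge(src, dst, lo, mid, hi);     // merge the two sorted halves of src into dst
}

static void merge(int[] src, int[] dst, int lo, int mid, int hi) {
    int i = lo, j = mid + 1;
    for (int k = lo; k <= hi; k++) {
        if      (i > mid)          dst[k] = src[j++];
        else if (j > hi)           dst[k] = src[i++];
        else if (src[j] < src[i])  dst[k] = src[j++];
        else                       dst[k] = src[i++];
    }
}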
The merge loop in merge() normally has to check whether i or j has crossed its boundary, that is, whether one of the two halves has been exhausted. Another optimization removes this exhaustion check entirely: copy the second half of a[] into aux[] in descending order, and then merge from the two ends toward the middle, so the largest element of each half acts as a sentinel for the other. For the halves {1,2,3} and {2,3,5}, the first half is copied as usual and the second is copied back to front, so aux[] ends up as {1,2,3,5,3,2}. The disadvantage of this method is that it makes merge sort unstable. The code implementation is as follows:
void merge(int[] a, int lo, int mid, int hi, int[] aux) {
    // Copy the first half as usual
    for (int k = lo; k <= mid; k++) {
        aux[k] = a[k];
    }
    // Copy the second half in reverse order
    for (int k = mid + 1; k <= hi; k++) {
        aux[k] = a[hi - k + mid + 1];
    }
    // Merge from the two ends toward the middle; no exhaustion checks needed
    int i = lo, j = hi;
    for (int k = lo; k <= hi; k++) {
        if (aux[i] <= aux[j]) a[k] = aux[i++];
        else                  a[k] = aux[j--];
    }
}
TimSort's steps
Partition
The idea of partitioning is to scan the array once and take each maximal stretch of consecutive elements that is either ascending (for an ascending sort, a non-descending sequence) or strictly descending (strictness is required to preserve the stability of the sort) as a partition (a run). If a run is descending, its elements are reversed in place. For example
1, 2, 3, 6, 4, 5, 8, 6, 4 The partitioning results are
[1,2,3,6],[4,5,8],[6,4]
Then the descending run is reversed
[1,2,3,6],[4,5,8],[4,6]
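A minimal sketch of this run detection and reversal, loosely modeled on the countRunAndMakeAscending idea (the inclusive bounds and helper names here are assumptions, not the JDK signatures):

// Returns the length of the run starting at lo (hi is the last valid index).
// A strictly descending run is reversed in place so every run ends up ascending.
static int countRunAndMakeAscending(int[] a, int lo, int hi) {
    int runHi = lo + 1;
    if (runHi > hi) return 1;                        // only one element left
    if (a[runHi++] < a[lo]) {                        // strictly descending run
        while (runHi <= hi && a[runHi] < a[runHi - 1]) runHi++;
        reverseRange(a, lo, runHi - 1);
    } else {                                         // non-descending (ascending) run
        while (runHi <= hi && a[runHi] >= a[runHi - 1]) runHi++;
    }
    return runHi - lo;
}

static void reverseRange(int[] a, int lo, int hi) {
    while (lo < hi) {
        int t = a[lo]; a[lo++] = a[hi]; a[hi--] = t;
    }
}

Calling this repeatedly over {1, 2, 3, 6, 4, 5, 8, 6, 4} yields runs of lengths 4, 3, and 2 as shown above, with the final descending run {6, 4} reversed to {4, 6}.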
Merge
Consider an extreme example where the run lengths are 10000, 10, 1000, 10, 10, and 10. We would rather merge two runs of 10 into 20 first, then 20 and 1000 into 1020, and so on. If we simply merged from left to right, every merge would combine the 10000-element run with a much smaller one, touching all of those elements again each time, which is too expensive. So TimSort uses a strategy to optimize the order of merges; a rough cost comparison is sketched below.
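A rough, hypothetical cost model (not TimSort code): merging two adjacent runs of lengths m and n touches about m + n elements, so the two merge orders for the run lengths above compare as follows:

static long mergeLeftToRight(int[] runs) {
    long total = 0;
    long merged = runs[0];
    for (int i = 1; i < runs.length; i++) {
        merged += runs[i];   // every merge re-touches the big accumulated run
        total += merged;
    }
    return total;
}

public static void main(String[] args) {
    int[] runs = {10000, 10, 1000, 10, 10, 10};
    System.out.println("left to right:    " + mergeLeftToRight(runs));  // 54110
    // Merging the small neighbouring runs first:
    // (10+10) + (20+10) + (30+1000) + (1030+10) + (1040+10000)
    long smallFirst = 20 + 30 + 1030 + 1040 + 11040;
    System.out.println("small runs first: " + smallFirst);              // 13160
}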
Example
Taking ComparableTimSort.sort() in Java as an example, a stack of runs is used to decide when pending runs should be merged.
if (nRemaining < MIN_MERGE) {
    int initRunLen = countRunAndMakeAscending(a, lo, hi);
    binarySort(a, lo, hi, lo + initRunLen);
    return;
}
If the range has fewer than MIN_MERGE (32) elements, it is sorted directly with binary insertion sort after the initial run is found.
int minRun = minRunLength(nRemaining);
do {
    // Find the start of the next run, reversing it first if it is descending
    int runLen = countRunAndMakeAscending(a, lo, hi);

    // Make sure every run pushed onto the run stack is at least minRun long.
    // If the current run is too short, extend it with the following elements
    // and sort the extended range with binary insertion sort.
    if (runLen < minRun) {
        int force = nRemaining <= minRun ? nRemaining : minRun;
        binarySort(a, lo, lo + force, lo + runLen);
        runLen = force;
    }

    // Push the run onto the run stack
    ts.pushRun(lo, runLen);

    // Decide whether to merge, starting from the top of the stack, until the
    // following invariants hold:
    // 1. runLen[i - 3] > runLen[i - 2] + runLen[i - 1]
    // 2. runLen[i - 2] > runLen[i - 1]
    ts.mergeCollapse();

    // Advance past this run
    lo += runLen;
    nRemaining -= runLen;
} while (nRemaining != 0);

// Merge all remaining runs to complete the sort
assert lo == hi;
ts.mergeForceCollapse();
assert ts.stackSize == 1;
Now look at a more important function:
/**
 * Looking at the top three runs on the stack: if the third run from the top is no
 * longer than the sum of the top two, merge the second run from the top with the
 * shorter of its two neighbours; otherwise, if the second run from the top is no
 * longer than the top run, merge those two. Repeat until neither condition holds.
 */
private void mergeCollapse() {
    while (stackSize > 1) {
        int n = stackSize - 2;
        if (n > 0 && runLen[n - 1] <= runLen[n] + runLen[n + 1]) {
            // Merge run n with the shorter of its two neighbours
            if (runLen[n - 1] < runLen[n + 1])
                n--;
            mergeAt(n);
        } else if (runLen[n] <= runLen[n + 1]) {
            mergeAt(n);
        } else {
            break; // Invariants are established
        }
    }
}