Introduction to cardinality sorting and Java language implementation

Author：Eve Cole Update Time：2025-08-07 06:00:03

Basic ideas

RadixSort is developed based on bucket sorting, and both sorting are advanced implementations of allocating sorting. The basic idea of DistributiveSort: The sorting process does not require comparing keywords, but implements sorting through the "distribution" and "collect" processes. Their time complexity can reach a linear order: O(n).

Cardinality sorting is a stable sorting algorithm, but has certain limitations:

1. Keywords can be decomposed.
2. The number of keywords recorded is smaller, and it is better if they are dense. 3. If it is a number, it is best to be unsigned, otherwise the corresponding mapping complexity will be increased. You can first sort them separately.

Let’s take a look at bucket sorting (RadixSort).

Bucket sorting is also called box sorting (BinSort). Its basic idea is to set up several buckets, scan the records to be sorted R[0], R[1],…, R[n-1], load all records with keywords in a certain range into the kth bucket (allocated), and then connect the heads and tails of each non-empty bucket in sequence (collect) according to the sequence number.

For example, to sort a shuffled 52 playing cards by points A<2<…<J<Q<K, 13 "buckets" need to be set. When sorting, each card is placed in the corresponding bucket by points, and then the buckets are connected in sequence to get a deck of cards arranged in incremental order of points.

In bucket sorting, the number of buckets depends on the value range of the keyword. Therefore, bucket sorting requires that the keyword type is finite, otherwise unlimited buckets may be required.

Generally speaking, it is unpredictable how many records with the same keywords are stored in each bucket, so the type of bucket should be designed as a linked list.

To ensure that the sorting is stable, the connections during the packing and collection process must be carried out according to the first-in-first-out principle.

For bucket sorting, the time of the allocation process is O(n); the time of the collection process is O(m) (a linked list is used to store the input to be sorted records) or O(m+n). Therefore, the time for bucket sorting is O(m+n). If the order of magnitude of the number of buckets m is O(n), the time of bucket sorting is linear, that is, O(n).

Most of the time complexity of the major sorting algorithms mentioned above is O(n2), and some sorting algorithms are O(nlogn). But barrel sorting can achieve the time complexity of O(n). However, the disadvantages of bucket sorting are: first of all, the space complexity is relatively high and the additional overhead is required. Sorting has two arrays of space overhead, one stores the array to be sorted, and the other is the so-called bucket. For example, the value to be sorted is from 0 to m-1, then m buckets are required, and this bucket array requires at least m space. Secondly, the elements to be sorted must be within a certain range, etc.

Cardinality sorting is an improvement to bucket sorting, which makes "bucket sorting" suitable for larger sets of element values rather than improving performance.

Let’s use the example of playing cards to illustrate. A card consists of two keywords: suit (peach<heart<mei<square) + face value (A<2<3<4<...<K). If the size of a card is determined by the suit first, and the cards of the same suit are determined by the number. We have two algorithms to solve this problem.

That is, if the suits are different, no matter what the face value is, the card with a lower suit color is smaller than the suit color with a higher suit color. Only in the same suit color, the size relationship is determined by the size of the face value. This is the ordering of multiple key codes.

To get sorting results, we discuss two sorting methods.

Method 1: First sort the suits and colors and divide them into 4 groups, namely plum blossom group, square group, red heart group, and black heart group. Then sort each group by face value, and finally, connect the 4 groups together.

Method 2: First, give 13 number groups (No. 2, 3, ..., A) according to 13 face values, put the cards into the corresponding number groups in order according to the face values, and divide them into 13 piles. Then, give 4 numbering groups (Plum Blossoms, Squares, Red Hearts, Black Hearts), take out the cards in Group 2 and put them into the corresponding group of color, and then take out the cards in Group 3 and put them into the corresponding group of color,... In this way, all the 4 color groups are ordered according to the face value, and then, connect the 4 color groups in sequence.

Multi-key code sorting is sorted in order from the main key code to the second key code or from the second key to the main key code, and is divided into two methods:

The most significant bit priority (MostSignificantDigitfirst) method, referred to as the MSD method:

1) First sort and group them by k1, divide the sequence into several subsequences. In the records of the same group of sequences, the key codes k1 are equal.

2) Then sort each group into subgroups by k2. After that, continue to sort the following key codes in this way until each subgroup is sorted by the most secondary key code kd.

3) Then connect the groups to obtain an ordered sequence. The method introduced in sorting cards by suits and face values is the MSD method.

The Least SignificantDigitfirst method, referred to as the LSD method:

1) Start sorting from kd, then sort kd-1, repeating in turn until it is divided into the smallest subsequence according to k1.

2) Finally, connect each sub-sequence to obtain an ordered sequence. The second method introduced in sorting cards by suit and face value is the LSD method.

A single keyword of a numeric or character type can be regarded as a multi-keyword composed of multiple digits or multiple characters. At this time, the "allocation-collect" method can be used to sort it. This process is called the cardinality sorting method, where the number of possible values for each number or character is called the cardinality. For example, the base of suits of playing cards is 4 and the base of face value is 13. When sorting out playing cards, you can organize them by the style first or by the face value first. When sorting according to the colors, first divide it into 4 stacks (distributions) in the order of red, black, square and flowers, then stack them together (collect) in this order, and then divide them into 13 stacks (distributions) in the order of face value, and then stack them together (collect) in this order. In this way, secondary allocation and collection can arrange the playing cards in an orderly manner.

During the "allocation-collect" process, the stability of the sorting needs to be ensured.

The idea of cardinality sorting is to bucket each group of keywords in the data to be sorted in sequence. For example, the following sequence to be sorted:

135, 242, 192, 93, 345, 11, 24, 19

We divide the single, ten and hundreds of digits of each value into three keywords, for example:

135->k1 (single digits)=5, k2 (ten digits)=3, k3 (hundred digits)=1.

Then start from the lowest single bit (starting from the last keyword), all the k1 keywords of all data are bucket-allocated (because, each number is 0-9, so the bucket size is 10), and then output the data in the bucket in sequence to obtain the following sequence.

(11), (242, 192), (93), (24), (135, 345), (19) (Sorting starting from the most keyword, ignoring the hundred and ten digits, and allocating according to the numbers on the single digits)

Then the above sequence is allocated for k2, and the output sequence is:

(11, 19), (24), (135), (242, 345), (192, 93) (refer to the most keyword to sort the second keyword, ignore the hundred and single digits, and allocate according to the number on the ten digits)

Finally, for k3's bucket allocation, the output sequence is:

(011, 019, 024, 093), (135, 192), (242), (345) (refer to the second keyword to sort the highest keyword, ignore the ten and single digits, and allocate them according to the numbers on the hundred digits)

The sorting is complete.

Java implementation

 public void radixSort(){ int max = array[0]; for(int i=0;i<array.length;i++){ //Find the maximum value in the array if(array[i]>max){ max = array[i]; } } int keysNum = 0; //The number of keywords, we use single digits, ten digits, and hundreds... as keywords, so the number of keywords is the maximum digit while(max>0){ max /= 10; keysNum++; } List<ArrayList<Integer>> buckets = new ArrayList<ArrayList<Integer>>(); for(int i=0;i<10;i++){ //The possible number for each digit is 0~9, so set 10 buckets buckets.add(new ArrayList<Integer>()); //The bucket is composed of ArrayList<Integer>} for(int i=0;i<keysNum;i++){ //Start with the most recent keyword, assign it according to the keywords in turn for(int j=0;j<array.length;j++){ //Scan all array elements and assign elements to the corresponding bucket//Take out the number on the i+1th digit corresponding to the element, such as 258. Now you need to take out the number on the ten digits, 258%100=58,58/10=5 int key =array[j]%(int)Math.pow(10, i+1)/(int)Math.pow(10, i); buckets.get(key).add(array[j]); //Put this element into a bucket with the keyword key} //After allocating, copy the elements in the bucket back to the array int counter = 0; //Element counter for(int j=0;j<10;j++){ ArrayList<Integer> bucket =buckets.get(j); //Bucket with keyword j while(bucket.size()>0){ array[counter++] = bucket.remove(0); //Copy the first element in the bucket to the array and remove } } System.out.print("Thread"+(i+1)+"Wheel sort:"); display(); } }

The printing results are as follows:

Algorithm Analysis

At first glance, the execution efficiency of cardinality sort seems to be good and it is unbelievable that all you have to do is copy the original data items from the array to the linked list and then copy them back. If there are 10 data items, there are 20 replications, repeating the process for each bit. Assuming that 5-digit numbers are sorted, 20*5=100 copies are required. If there are 100 data items, then there are 200*5=1000 copies. The number of copies is proportional to the number of data items, that is, O(n). This is the most efficient sorting algorithm we have seen.

Unfortunately, the more data items, the longer keywords are needed. If the data items are increased by 10 times, the keyword must be added by one (one more round of sorting). The number of copies and the number of data items are proportional to the length of the keyword, and it can be considered that the keyword length is the logarithm of N. Therefore, in most cases, the execution efficiency of cardinal sorting is reversed to O(N*logN), which is almost the same as fast sorting.

Summarize

The above is the entire content of this article about the introduction to cardinality sorting and Java language implementation. I hope it will be helpful to everyone. Interested friends can continue to refer to other related topics on this site. If there are any shortcomings, please leave a message to point it out.