Bucket Sort Algorithm

shanthanu kota
5 min readMar 24, 2021

--

In the Bucket Sorting technique, the data items are distributed in a set of buckets. Each bucket can hold a similar type of data. After distributing, each bucket is sorted using another sorting algorithm. After that, all elements are gathered on the main list to get the sorted form

What is Bucket Sort ?

Bucket sort is a comparison sort algorithm that operates on elements by dividing them into different buckets and then sorting these buckets individually. Each bucket is sorted individually using a separate sorting algorithm or by applying the bucket sort algorithm recursively. Bucket sort is mainly useful when the input is uniformly distributed over a range.

Assume one has the following problem in front of them:

One has been given a large array of floating point integers lying uniformly between the lower and upper bound. This array now needs to be sorted. A simple way to solve this problem would be to use another sorting algorithm such as Merge sort, Heap Sort or Quick Sort. However, these algorithms guarantee a best case time complexity of O(NlogN). However, using bucket sort, the above task can be completed in O(N) time. Let’s have a closer look at it.

Consider one needs to create an array of lists, i.e of buckets. Elements now need to be inserted into these buckets on the basis of their properties. Each of these buckets can then be sorted individually using Insertion Sort. Consider the pseudo code to do so:

void bucketSort(float[] a,int n)
{
for(each floating integer 'x' in n)
{
insert x into bucket[n*x];
}
for(each bucket)
{
sort(bucket);
}
}

The complexity of the Bucket Sort Technique

  • Time Complexity: O(n + k) for best case and average case and O(n²) for the worst case.
  • Space Complexity: O(nk) for worst case

Input and Output

Input:
A list of unsorted data: 0.25 0.36 0.58 0.41 0.29 0.22 0.45 0.79 0.01 0.69
Array before Sorting: 0.25 0.36 0.58 0.41 0.29 0.22 0.45 0.79 0.01 0.69
Output:
Array after Sorting: 0.01 0.22 0.25 0.29 0.36 0.41 0.45 0.58 0.69 0.79

Algorithm

bucketSort(array, size)

Input − An array of data, and the total number in the array

Output − The sorted Array

Example

Begin
for i := 0 to size-1 do
insert array[i] into the bucket index (size * array[i])
done

for i := 0 to size-1 do
sort bucket[i]
done

for i := 0 to size -1 do
gather items of bucket[i] and put in array
done
End
#include<iostream>
#include<vector>
#include<algorithm>
using namespace std;

void display(float *array, int size) {
for(int i = 0; i<size; i++)
cout << array[i] << " ";
cout << endl;
}

void bucketSort(float *array, int size) {
vector<float> bucket[size];
for(int i = 0; i<size; i++) { //put elements into different buckets
bucket[int(size*array[i])].push_back(array[i]);
}

for(int i = 0; i<size; i++) {
sort(bucket[i].begin(), bucket[i].end()); //sort individual vectors
}

int index = 0;
for(int i = 0; i<size; i++) {
while(!bucket[i].empty()) {
array[index++] = *(bucket[i].begin());
bucket[i].erase(bucket[i].begin());
}
}
}

int main() {
int n;
cout << "Enter the number of elements: ";
cin >> n;
float arr[n]; //create an array with given number of elements
cout << "Enter elements:" << endl;

for(int i = 0; i<n; i++) {
cin >> arr[i];
}

cout << "Array before Sorting: ";
display(arr, n);
bucketSort(arr, n);

cout << "Array after Sorting: ";
display(arr, n);
}

Output

Enter the number of elements: 10
Enter elements:
0.25 0.36 0.58 0.41 0.29 0.22 0.45 0.79 0.01 0.69
Array before Sorting: 0.25 0.36 0.58 0.41 0.29 0.22 0.45 0.79 0.01 0.69
Array after Sorting: 0.01 0.22 0.25 0.29 0.36 0.41 0.45 0.58 0.69 0.79

Example

Let’s take an array A[]={0.78,0.26,0.72,0.17,0.39}. Total number of elements are n=10. So we need to create 5 buckets.
maximum element(max)=0.78
Using the formula: bi = (n)*arr[i]/(max+1), we find out bucket index of each element of the array A[].
For eg: bucket index of 0th element is bi= floor((5)(0.78)/(0.78+1))= 2
Similarly,bucket index of 1st element is bi=0
bucket index of 2nd element is bi=1
bucket index of 3rd element is bi=0
bucket index of 4th element is bi=1
Now, since buckets have been created we need to concatenate all the buckets in the order to get the sorted array.

Pseudocode

Step 1: function bucketSort(array, n) is
Step 1: buckets ← new array of n empty lists
Step 2: M ← the maximum key value in the array
Step 3: for i = 1 to length(array) do
insert array[i] into buckets[floor(array[i] / M * n)]
Step 4: for i = 1 to n do
sort(buckets[i])
Step 5: return the concatenation of buckets[1], …., buckets[k]

Complexity

Worst case time complexity:Θ(n^2)
Average case time complexity:Θ(n+k)
Best case time complexity:Θ(n+k)
Space complexity:Θ(n+k)

where n is the number of elements to be sorted and k is the number of buckets

For an upper bound on the worst-case cost, it’s sufficient to show that it can’t be worse. Assuming that insertion sort takes ≤cn2 steps on n elements, consider the sum

n∑i=1c|Bi|2∑i=1nc|Bi|2

it is an upper bound on the cost of sorting all the buckets. For an upper bound on the worst case for bucket sort, maximize this function subject to ∑|Bi|=n (and add the remaining cost, which is O(n) for all inputs).

For a lower bound on the worst-case cost, we have to find an infinite class of actual inputs and show that their cost behaves as claimed. [0,…,0] serves to show an Ω(n2) lower bound.

Applications

Constructing Histograms
One common computation in data visualization and analysis is computing a histogram. For example, n students might be assigned integer scores in some range, such as 0 to 100, and are then placed into ranges or “buckets” based on these scores.

Comparison with other sorting algorithms

Bucket sort can be seen as a generalization of counting sort; in fact, if each bucket has size 1 then bucket sort degenerates to counting sort. The variable bucket size of bucket sort allows it to use O(n) memory instead of O(M) memory, where M is the number of distinct values; in exchange, it gives up counting sort’s O(n + M) worst-case behavior.

Bucket sort with two buckets is effectively a version of quicksort where the pivot value is always selected to be the middle value of the value range. While this choice is effective for uniformly distributed inputs, other means of choosing the pivot in quicksort such as randomly selected pivots make it more resistant to clustering in the input distribution.

The n-way mergesort algorithm also begins by distributing the list into n sublists and sorting each one; however, the sublists created by mergesort have overlapping value ranges and so cannot be recombined by simple concatenation as in bucket sort. Instead, they must be interleaved by a merge algorithm. However, this added expense is counterbalanced by the simpler scatter phase and the ability to ensure that each sublist is the same size, providing a good worst-case time bound.

Top-down radix sort can be seen as a special case of bucket sort where both the range of values and the number of buckets is constrained to be a power of two. Consequently, each bucket’s size is also a power of two, and the procedure can be applied recursively. This approach can accelerate the scatter phase, since we only need to examine a prefix of the bit representation of each element to determine its bucket.

--

--

shanthanu kota
shanthanu kota

No responses yet