Sunday, November 15, 2015

Sorting Techniques

Sorting Techniques and their Importance 

Any modern day computing involves sorting the data in one fashion or the other, based on the some sort of parameter such as price, rating, location or date. This places sorting as the first place where we attempt to "reConnect" the software industry with the techniques that are defined by computer science.

Simple Sorting involves techniques that takes up a amount of time that is proportional to the square of the number of data elements that the technique is trying to sort. For a reasonably sized data set, simple sorting techniques offer the advantage of simplicity and in-place sorting at the cost of efficiency.

In other words, if there are N number of elements, then it takes $N^2$ amount of time, as a proportion to sort those elements. It is not the literal value of $N^2$ that matters, rather than the rate of its growth relative to the rate of growth of input N.

Translating this to actual computer terms, an average modern day computer has a speed of at least 1 G Hz, which is $10^9$ operations per second. Comparison of two values is one such operation that can be performed in 1/$10^9$

However, since real world comparisons involve more than simple numbers, and involve multiple comparisons (ex. comparing two distinct names ), let us assume that one such comparison involves 10000, or $10^4$ simple operations.

So, 1 comparison = $10^4$ simple operations

A modern PC can perform $10^5$ such comparisons per second.

On such a computer system that takes 1/$10^5$ sec to compare 2 items,  to sort a array of 5000 objects , a simple sorting techniques described below require an time proportional to :

$${(5000^2)} x {1}/{10^5} = {(25 x 10^6) x {1}/{10^5} = 250 secs = 4.16 minutes }$$


Since the efficiency is measured as the rate of growth of time consumed, as the input data (number of elements that are being sorted) increases,  let us calculate the time it takes  to sort a array of 50000 objects , using the simple sorting techniques similar to the calculation above

$${(50000^2)} x {1}/{10^5} = {(25 x 10^8) x {1}/{10^5} = 25000 secs = 416 minutes = 7 hours }$$

As demonstrated above, it quickly becomes evident, that for higher values of N, having an efficiency of $N^2$ is extremely detrimental to performance, unless one has 7 hours to burn !

However, it also demonstrates that for a reasonable size data set simplest of sorting techniques can be effectively employed to achieve the results in a acceptable timely fashion.

Now. what are the sort techniques are the simplest of sort techniques ? They are


  • Bubble sort, where the biggest value bubbles to the end of list.
  • Selection sort, where the smallest value is selected and placed at the start of the list
  • Insertion sort, where the value is rightly inserted in a sorted list to the left of a marker.

One way of comparing these 3 techniques is measuring the the time it took to measure how long these 3 techniques sort 100,000 randomly generated numbers. Here are the results:



Technique
Time to sort 100k random values (in msec)
Time to sort 100k random values (in sec)
Insertion sort
1865msec
1 sec
Selection Sort
8892 msec
8 sec
Bubble Sort
20904 msec
20 sec


Following are the results, for the above 3 techniques when we employ them to sort 100K numbers that are in reverse order:


Technique
Time to sort 100k reverse values (in msec)
Time to sort 100k reverse values (in sec)
Insertion sort
3607msec
3 sec
Selection Sort
7464 msec
7 sec
Bubble Sort
12044 msec
12 sec




Following are the results, for the above 3 techniques when we employ them to sort 100K numbers that are already sorted:

Technique
Time to sort 100k sorted values (in msec)
Time to sort 100k sorted values (in sec)
Insertion sort
4ms
0 sec
Selection Sort
2182 msec
2 sec
Bubble Sort
7088 msec
7 sec


So based on all of the above results, we can see that Insertion sort is consistently better performing among all of the 3 techniques, followed by Selection Sort and bubble sort in that order. 

VR180 to 3D SBS Converter