Sunday, December 27, 2015

Insertion Sort

Insertion Sort

Insertion sort involves "inserting" unsorted elements, one at a time, into the appropriate position in the sorted set.

Initially, we assume that everything to the left of the second element (the element at position 1) belongs to the sorted set, which at this point is just the first element (the element at position 0), and that the second element is the one to be inserted into that sorted set at the correct position.

Visualizing it:

       Position →    0     1     2     3     4     5
       Values        8  |  2    16     5     1     4

This second element (the element at position 1) is referred to as the chosen/marked element, and is compared against the elements in the sorted set.

If the marked element is smaller than the element in the sorted set it is compared against, then that element from the sorted set is copied to the position to its right. So, for example, a compared sorted element at position 1 is copied to position 2.

If, however, the marked element is bigger than the element in the sorted set that it is being compared with, then the marked element is inserted into the position vacated by the element that was previously shifted (not the position of the element it is currently being compared against). So a marked element that is found to be greater than the element at position 1 is inserted at position 2.

If M stands for the marked element, P for the potential position where the marked element may be inserted (because its value is greater than the value of the element at position P-1, or because P is 0), and a yellow line divides the unsorted part (to the right of the line) from the sorted part (to the left of the line), then watching the animation below can help with understanding how insertion sort works:



After M is inserted at one of the positions, we repeat the steps with M set to the next unsorted element (the next element to the right of the yellow line in the above animation), until no elements are left in the unsorted part, or in other words until no elements remain on the right side of the yellow line.

Since the marked unsorted element is the first element of the unsorted set, it is essentially compared first with the element just before it, which is the last element of the sorted set. If the marked element is found to be bigger than the last element of the sorted set, then nothing needs to be done, and the next element in the unsorted set is marked for comparison.

However, if the marked element is found to be smaller than the last element in the sorted set, then that last element is copied to the position of the marked element (effectively moving it right), and the marked element is then compared against the second-to-last element of the sorted set. These steps are repeated until we either encounter an element in the sorted set that is smaller than the marked element, or we reach the first element of the sorted set. If we reach the first position of the sorted set, we insert our marked element there, since it is the smallest of them all.
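To make that concrete, here is a minimal sketch of this compare-and-shift loop in Java (the names data, i, j and marked are my own, assuming an int array with the marked element at index i):

    int marked = data[i];          // the marked element M
    int j = i - 1;                 // the last element of the sorted set
    while (j >= 0 && data[j] > marked) {
        data[j + 1] = data[j];     // copy the larger sorted element one position to the right
        j--;                       // compare M against the next sorted element to the left
    }
    data[j + 1] = marked;          // insert M into the position that opened up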

Each step involves comparing the marked element with an element in the sorted set and moving that element if needed.

The steps involved in brief are:


  1. Choose the next unsorted element.
  2. Insert it into the appropriate position in the sorted set using comparisons.
  3. Repeat steps 1 and 2 until there are no more elements in the unsorted set. 
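Putting these steps together, a complete insertion sort could look roughly like the Java sketch below (my own naming and layout, assuming an int array; the actual program is covered in the second video):

    static void insertionSort(int[] data) {
        // The element at position 0 starts out as the sorted set,
        // so the marking begins from position 1.
        for (int i = 1; i < data.length; i++) {
            int marked = data[i];              // step 1: choose the next unsorted element
            int j = i - 1;
            while (j >= 0 && data[j] > marked) {
                data[j + 1] = data[j];         // shift larger sorted elements to the right
                j--;
            }
            data[j + 1] = marked;              // step 2: insert it at the correct position
        }
    }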


To summarize the algorithm, watch my video below about the workings of InsertionSort:



To convert this InsertionSort algorithm into a program, watch my video below about Writing an InsertionSort Program:




Tuesday, December 15, 2015

Selection Sort


Selection Sort

Selection sort is a technique that involves repeatedly "selecting" the next smallest element from the set of remaining unsorted elements in the total data set, and adding it to the end of a sorted set made up of all the previously found elements (all of which are smaller than the element we are adding).

As you can imagine, initially all the elements of the total data set are unsorted, and we have to assume one of them to be the smallest. Naturally, the first element of this unsorted set (which, when we start, is simply the total data set) is a good place to make that assumption. That leads to our step 1:
  • Assume the element at the first unsorted position is the next smallest element that needs to be found, and remember its position as well. This is the initial assumption.
Now, obviously, there is no guarantee that the element at the first unsorted position is actually the next smallest element in the unsorted part of the total data set. We need to validate that assumption by comparing our assumed next smallest element with the rest of the elements in the unsorted part, which leads us to step 2:
  • Compare your assumed next smallest element with the rest of the elements, one at a time.
So what if we find another element in the unsorted part of the total data set that is smaller than our assumed element? Well, simple. Just update our assumption (variable) of the next smallest element and its position.

But what if you find another one? Well, just update your assumption again. Keep updating your assumption (variable) of the next smallest element and its position every time you find an element smaller than your currently assumed next smallest element, until you reach the end of the list. Now, let us formalize this as our step 3:
  • For every element smaller than your assumed next smallest element that you encounter before the end of the unsorted part of the total data set, update your assumption of the next smallest element and its position. 
Alright, now you have reached the end of the unsorted part of the total data set, and you have found the next smallest element. So what next? Put the next smallest element found in the unsorted part in the position of your initial assumption, and move the element that was previously in that initial position to where the next smallest element was actually found. This will be our step 4.
  • Swap the next smallest element found (no longer assumed) with the initially assumed smallest element in the total data set.
At this point we have one element, the smallest element of the entire set, sitting at the start of the set and initializing our sorted set (which grows from the start, or left part, of the total data set). So it is time to find the next smallest element from our unsorted set to add to our sorted set.
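As a rough sketch, steps 1 through 4 for a single pass could be written in Java as below (assuming an int array data whose unsorted part begins at index start; the variable names are my own):

    int smallestPos = start;                          // step 1: assume the first unsorted element is the smallest
    for (int j = start + 1; j < data.length; j++) {   // step 2: compare it with the rest, one at a time
        if (data[j] < data[smallestPos]) {
            smallestPos = j;                          // step 3: update the assumption and its position
        }
    }
    int temp = data[start];                           // step 4: swap it into the first unsorted position
    data[start] = data[smallestPos];
    data[smallestPos] = temp;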

All that is left at this point is to repeat all of the above steps with the assumption that the first element of the unsorted part of the total data set (in this case the second element of the total data set) is the smallest element.

This animation shows the steps in action (green arrow is comparisons, red is swaps):



Very often I find that if we can understand how the core steps of an algorithm interact with a very small data set, it becomes much easier to understand the whole algorithm.



So watch the demo below to understand the basic concept of the SelectionSort algorithm:





How many times do we have to repeat this ?

The obvious answer is: until there are no more unsorted elements left in the total data set. But since we know the algorithm finds the next smallest element with every iteration, once the sorted part of the total data set reaches the element just before the last one, we can be assured that the last remaining element is already in its correct place, and there is no need to search any further. So that is a good point to stop our search for the next smallest element.
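In code, that stopping point shows up as the outer loop running only up to the second-to-last position. A minimal Java sketch of the whole technique, with my own naming (not the exact program from the video), could look like this:

    static void selectionSort(int[] data) {
        // Stop at the second-to-last position: once everything before the last
        // element is sorted, the last remaining element is already where it belongs.
        for (int start = 0; start < data.length - 1; start++) {
            int smallestPos = start;                  // assume the first unsorted element is the smallest
            for (int j = start + 1; j < data.length; j++) {
                if (data[j] < data[smallestPos]) {
                    smallestPos = j;                  // found a smaller one, so update the assumption
                }
            }
            int temp = data[start];                   // swap it into the first unsorted position
            data[start] = data[smallestPos];
            data[smallestPos] = temp;
        }
    }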

To summarize the algorithm, watch my video below about the workings of SelectionSort:



To convert this SelectionSort algorithm into a program, watch my video below about Writing a SelectionSort Program:



Saturday, December 5, 2015

Bubble Sort

Bubble Sort

Bubble sort derives its name from the way the biggest elements in the data set "bubble" up to the end of the list, similar to how bubbles rise up in water.

The process involves comparing each element with the one immediately to the right of it and swapping them if the one on the left happens to be bigger than the one on the right, starting with the first element. You then repeat the step with the second element, the third element, and so on until you reach the end of the list. We will call this single trip from the first element to the last element a "pass".

To summarize the core of our algorithm, please take a look at the steps below:

When our current index (= ci) is pointing to an element at position k that is larger than the element at position k+1:

                               compare
     Large element at position k     Element at position k+1
                  ↑
            current index

Then, we swap them: the large element at position k moves to position k+1, and the element that was at position k+1 moves to position k.

                              swap
     Element at position k     Large element at position k+1
                ↑
          current index

Then, we move our current index to point to position k+1 and compare that element with whatever is next to it, at position k+2:

                                      compare
     Large element at position k+1     Large/small element at position k+2
                   ↑
           current index moved

However, when our current index is pointing to an element at position k that is smaller than the element at position k+1:

                              compare
     Element at position k     Large element at position k+1
                ↑
          current index

Then, we simply move our current index to point to position k+1 and compare that element with whatever is next to it, at position k+2:

                                      compare
     Large element at position k+1     Large/small element at position k+2
                   ↑
           current index moved



So, the steps involved for the current index ci pointing to position k are:

  1. Compare the element at position k with the one next to it at position k+1, where k ranges from 0 up to the position just before the last element.
  2. If the value at position k is bigger than the value at position k+1, swap the value at position k with the value at position k+1.
  3. Now, set your ci to k+1.
  4. Repeat steps 1 through 3 until you reach the end of the list.
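As a sketch, a single pass over an int array data could be written in Java as below (my own naming; lastUnsorted is the position just before the sorted set, which for the first pass is the last position of the array):

    // One pass: ci walks from position 0 up to lastUnsorted,
    // swapping neighbours whenever the one on the left is bigger.
    for (int ci = 0; ci < lastUnsorted; ci++) {
        if (data[ci] > data[ci + 1]) {
            int temp = data[ci];         // swap the pair
            data[ci] = data[ci + 1];
            data[ci + 1] = temp;
        }
    }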

At the end of the first pass, you will end up with the biggest element of the data set at the end of the list: with each swap in the first pass we positioned the bigger of the two elements to the right, and then moved it right again whenever we compared it with the element to its right and found it to be bigger. So at the end of the first pass, the biggest element of the data set has "bubbled" up to the end of the list.

This bubbled-up biggest element at the last position will now be the first element of a subset of elements that we will refer to as the "sorted" set going forward.

Now, we repeat the pass, starting with the first element, and ending with the element right before the last element.

It will be of great help if you have a look at the animation below to understand the workings better (green is a comparison, red is a swap, and the yellow line separates the sorted part, to the right of the line, from the unsorted part, to the left):





Now, why are we stopping at the element before the last element ?

Well, why would you want to do anything with the last element, which we have already established is the biggest element of the set?

So at the end of the second pass, we will obviously end up with the second biggest element at a position towards the end of the list. We will call these two elements part of the sorted set. In other words, at the end of the second pass, the second biggest element of the data set has "bubbled" up to the end of the list, next to the biggest element, and become part of the so-called sorted set. This new second-biggest element is now the first element of the sorted set, ahead of the previously found biggest element.

Naturally, it should occur to us now that if we repeat the pass enough times, that is until the second element from the left is part of the sorted set, we have ourselves a sorted set.

This obviously means that we have to remember the position of the first element in the sorted set, and stop the sorting when that position is 1 (since arrays start at position 0, element 1 is in the second position). At that point the element at position 0 will be the smallest of all, since the sorted set holds all the bigger elements found so far, and we will have a fully sorted set in our hands.
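Putting the passes together, a complete bubble sort could be sketched in Java as below (my own naming, not the exact program from the video; sortedStart tracks the first position of the sorted set):

    static void bubbleSort(int[] data) {
        // sortedStart is the first position of the sorted set at the right end.
        // We stop once it reaches position 1, because position 0 must then hold the smallest value.
        for (int sortedStart = data.length - 1; sortedStart >= 1; sortedStart--) {
            // One pass over the unsorted part: bubble the biggest remaining
            // element up to position sortedStart.
            for (int ci = 0; ci < sortedStart; ci++) {
                if (data[ci] > data[ci + 1]) {
                    int temp = data[ci];
                    data[ci] = data[ci + 1];
                    data[ci + 1] = temp;
                }
            }
        }
    }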

Nothing can beat a demonstration of it, so here we go:



The steps involved in the conversion of the above technique into a program are explained in the video below:







Sunday, November 15, 2015

Sorting Techniques

Sorting Techniques and their Importance 

Almost all modern-day computing involves sorting data in one fashion or another, based on some parameter such as price, rating, location or date. This makes sorting the first place where we attempt to "reConnect" the software industry with the techniques defined by computer science.

Simple sorting involves techniques that take an amount of time proportional to the square of the number of data elements being sorted. For a reasonably sized data set, simple sorting techniques offer the advantage of simplicity and in-place sorting, at the cost of efficiency.

In other words, if there are N elements, the time it takes to sort them is proportional to $N^2$. It is not the literal value of $N^2$ that matters, but rather the rate of its growth relative to the rate of growth of the input N.

Translating this to actual computer terms, an average modern-day computer has a speed of at least 1 GHz, which is $10^9$ operations per second. A comparison of two values is one such operation, and it can be performed in $1/10^9$ of a second.

However, since real-world comparisons involve more than simple numbers, and often involve multiple comparisons (e.g. comparing two distinct names), let us assume that one such comparison involves 10000, or $10^4$, simple operations.

So, 1 comparison = $10^4$ simple operations

A modern PC can perform $10^5$ such comparisons per second.

On such a computer system, which takes $1/10^5$ sec to compare 2 items, sorting an array of 5000 objects using the simple sorting techniques described below requires a time proportional to:

$$5000^2 \times \frac{1}{10^5} = (25 \times 10^6) \times \frac{1}{10^5} = 250 \text{ secs} \approx 4.16 \text{ minutes}$$


Since efficiency is measured as the rate of growth of the time consumed as the input data (the number of elements being sorted) increases, let us calculate the time it takes to sort an array of 50000 objects using the simple sorting techniques, similar to the calculation above:

$$50000^2 \times \frac{1}{10^5} = (25 \times 10^8) \times \frac{1}{10^5} = 25000 \text{ secs} \approx 416 \text{ minutes} \approx 7 \text{ hours}$$

As demonstrated above, it quickly becomes evident that for higher values of N, a running time that grows as $N^2$ is extremely detrimental to performance, unless one has 7 hours to burn!

However, it also demonstrates that for a reasonably sized data set, the simplest of sorting techniques can be effectively employed to achieve results in an acceptable, timely fashion.

Now, what are the simplest of sorting techniques? They are:


  • Bubble sort, where the biggest value bubbles to the end of list.
  • Selection sort, where the smallest value is selected and placed at the start of the list.
  • Insertion sort, where the value is rightly inserted in a sorted list to the left of a marker.

One way of comparing these 3 techniques is to measure how long each of them takes to sort 100,000 randomly generated numbers. Here are the results:



Technique          Time to sort 100k random values
                   (in msec)        (in sec)
Insertion sort     1865 msec        1 sec
Selection Sort     8892 msec        8 sec
Bubble Sort        20904 msec       20 sec


Following are the results for the above 3 techniques when we employ them to sort 100K numbers that are in reverse order:


Technique          Time to sort 100k reverse values
                   (in msec)        (in sec)
Insertion sort     3607 msec        3 sec
Selection Sort     7464 msec        7 sec
Bubble Sort        12044 msec       12 sec




Following are the results for the above 3 techniques when we employ them to sort 100K numbers that are already sorted:

Technique          Time to sort 100k sorted values
                   (in msec)        (in sec)
Insertion sort     4 msec           0 sec
Selection Sort     2182 msec        2 sec
Bubble Sort        7088 msec        7 sec


So, based on all of the above results, we can see that Insertion sort consistently performs the best among the 3 techniques, followed by Selection sort and Bubble sort, in that order.
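For reference, a timing comparison like the one above can be set up along the lines of the Java sketch below (assuming the insertionSort, selectionSort and bubbleSort methods sketched in the individual posts are pasted into the same class; the numbers in the tables came from my own runs, not from this exact snippet):

    import java.util.Arrays;
    import java.util.Random;

    public class SortTimings {
        public static void main(String[] args) {
            // 100k random numbers; each technique gets its own identical copy.
            int[] randomValues = new Random().ints(100_000).toArray();

            int[] a = Arrays.copyOf(randomValues, randomValues.length);
            long t = System.currentTimeMillis();
            insertionSort(a);       // from the Insertion Sort post
            System.out.println("Insertion sort: " + (System.currentTimeMillis() - t) + " msec");

            int[] b = Arrays.copyOf(randomValues, randomValues.length);
            t = System.currentTimeMillis();
            selectionSort(b);       // from the Selection Sort post
            System.out.println("Selection sort: " + (System.currentTimeMillis() - t) + " msec");

            int[] c = Arrays.copyOf(randomValues, randomValues.length);
            t = System.currentTimeMillis();
            bubbleSort(c);          // from the Bubble Sort post
            System.out.println("Bubble sort: " + (System.currentTimeMillis() - t) + " msec");
        }

        // insertionSort, selectionSort and bubbleSort are the methods sketched in the individual posts.
    }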
