Saturday, January 2, 2016

Measuring the Efficiency of an Algorithm

Algorithms, despite the scary-sounding word, are nothing more than a series of steps or actions performed to accomplish a certain task in the world of computers.

As in the real world, each task can be accomplished by different sets of steps, with each set differing in time or ease of accomplishment. For instance, if you know a certain location or mall has a coffee shop somewhere, and you want to get there, there are two sets of steps you can take to accomplish that:

  1. The first set of steps involves the obvious: walk or drive to that location and wander around looking for the coffee shop. Or,
  2. The second set of steps involves using a GPS app on your phone, or asking someone familiar with that location where the coffee shop is, and going there directly.

Now, the first set of steps obviously requires more time, unless you happen to be super lucky. It does not guarantee that you will be efficient every time (every time you employ it for a different location, that is), and in most cases it requires a lot more time than the second set of steps, where you walk straight to the shop thanks to your GPS. Same task, different steps.

Similarly, accomplishing a task using different algorithms involves different sets of steps, which in most cases vary in efficiency, much like searching around for a coffee shop vs. using GPS. It therefore becomes vital that we have some technique to measure the efficiency of different algorithms, so we can employ the right one for the right situation. This is the reason why we should measure efficiency at all.

However, in some scenarios, different algorithms, like different sets of steps, won't yield very different results at all. For instance, finding a coffee shop among 3 shops down the block won't take much longer whether you walk around looking for it or use your GPS.

It will be much different, however, if you are looking for a coffee shop on a relatively big city block with hundreds of shops. If it is not obvious already that haplessly searching around would be terribly inefficient, a friend will sooner or later advise us to use our GPS, or look it up for us. This points us to the fact that an algorithm's efficiency is truly only measured by how well it can handle large data.

So any measure of an algorithm needs to be expressed in terms of how well it handles growth of the input data, similar to how using a GPS to find a coffee shop holds up in a mega city vs. a small town.

At the very least, measuring the efficiency of an algorithm involves counting the number of steps that need to be performed to accomplish the result. For instance, if we are looking for a value in an array:

int[] arr = new int[]{12, 3, 4, 7, 8, 3, 3, 9, 6};
int lookingFor = 6;
// Compare each element, one at a time, until a match is found
for (int i = 0; i < arr.length; i++)
{
    if (lookingFor == arr[i])
    {
        System.out.println("Found " + lookingFor + " in " + (i + 1) + " steps");
        break; // stop once the value is found
    }
}

The above code executes and finds "6" in 9 attempts, which is the size of the array. It compares each element against "6" until it finds it at the very end of the array; this is the worst case, since had we been looking for "12" instead, a single step would have sufficed.

So if we employ this algorithm to find a value "v" in an array of "n" elements, it can be said that this algorithm has an efficiency of O(n). Big O is a convention used to express algorithm efficiency, very similar to function notation such as f(x) in mathematics.
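To make the contrast concrete, here is a minimal sketch of my own (not from any library): a direct array access is O(1), because it is a single step no matter how many elements the array holds, while the search above has to scale with the array's size.

int[] small = new int[]{12, 3, 4, 7, 8, 3, 3, 9, 6};
int[] huge = new int[1_000_000];

// O(1): a direct access is one step either way, regardless of array size
int a = small[4];
int b = huge[999_999];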


Different algorithms have different efficiency values, with some better than others for the same task. Here are some of the common efficiency values, along with the most common scenarios that result in them.

Efficiency and common scenarios:

  - O(1): Anything that completes in constant time, no matter what the size of the data is.
  - O(n): Anything that requires, at most, one full pass over all the data elements.
  - O($n^2$): Anything that requires going over the data multiple times (multiples of n). Typically this involves nested loops, where the inner loop parses all the data elements for each iteration of the outer loop, which itself iterates n times, where n is the size of the data.
  - O($log_2 n$): Anything that involves repeatedly dividing the data into two halves to find the result, such as binary search on a sorted array or in a binary search tree, often written as recursive calls that discard half the data each time (see the sketch after this list).
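As an illustration of that last row, here is a minimal binary search sketch (a standard technique; this particular code is my own, not taken from a library). Each comparison discards half of the remaining elements, so a sorted array of about 1,000,000 entries needs at most around 20 comparisons:

// Binary search on a sorted array: O(log n)
static int binarySearch(int[] sorted, int lookingFor)
{
    int low = 0;
    int high = sorted.length - 1;
    while (low <= high)
    {
        int mid = low + (high - low) / 2; // middle of the remaining range
        if (sorted[mid] == lookingFor)
        {
            return mid;
        }
        else if (sorted[mid] < lookingFor)
        {
            low = mid + 1;  // discard the lower half
        }
        else
        {
            high = mid - 1; // discard the upper half
        }
    }
    return -1; // not found
}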

In addition, there are other common-sense rules, such as ignoring lower-order terms of n and constant factors, both of which become insignificant for large sets of data.
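For example, a method that makes two separate full passes over the data performs 2n steps, but we still call it O(n), since the constant 2 becomes irrelevant as n grows. A small sketch of my own to illustrate:

// Two sequential passes = 2n steps, but still O(n), not O(2n)
static int range(int[] arr)
{
    int max = Integer.MIN_VALUE;
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < arr.length; i++)  // first pass: n steps
    {
        if (arr[i] > max) max = arr[i];
    }
    for (int i = 0; i < arr.length; i++)  // second pass: n steps
    {
        if (arr[i] < min) min = arr[i];
    }
    return max - min;
}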

I will expand more on this in the future, with examples of how to measure efficiency and comparisons between different techniques.
