## Friday, March 11, 2011

### Algorithms And Data Structures

Alright, so I've decided that I'll try to start writing up some lessons on Algorithms and Data Structures to help with my studies. Hopefully those of you who are new to programming or would like to learn more about algorithms and performance can learn from these.

I'll start by saying that in order to understand what's going on you'll have to have a decent understanding of how to write code (arrays, primitive data types, etc.). Anyway, lets get to business...

Now you may be wondering, why should I study algorithms and performance?

Well you see, performance is often what draws the line between what is feasible and what is impossible.

Algorithms help us understand scalability

And Algorithmic mathematics provide us a language for talking about program behavior

Now that that little introduction is out of the way... Lets discuss Sorting.

What is sorting?

Basically you take an input, a sequence of numbers a1,a2,...,an , and output a permutation such that the first number in the sequence is less than the next.

Example:

Input: 8 2 4 9 3 6

Output: 2 3 4 6 8 9

Lets examine Insertion Sort with the following pseudocode

Code:

Insertion-Sort(A,n)

for j = 2 to n

do key = A[j]

i = j-1

while i > 0 and A[i] > key

do A[i+1] = A[i]

i = i-1

A[i+1] = key

That was simple, but how does it work?

Here's how:

Code:

Step 1: 8 2 4 9 3 6 (Insert 2 and front shift rest right)

Step 2: 2 8 4 9 3 6 (Insert 4 after 2 and shift rest right)

Step 3: 2 4 8 9 3 6 (No swap)

Step 4: 2 4 8 9 3 6 (Insert 3 after 2 and shift rest right)

etc.

Final Step: 2 3 4 6 8 9 (Success)

Okay that's cool, it works, lets examine the running time.

Running time depends on the input. Obviously, an already sorted sequence is easier to sort.

Short sequences are easier to sort than long ones

We need to try and find the upper bound on the running time to guarantee our performance.

Generally we analyze only worst-case or average-case, best case is never used.

So what is insertion sorts worst-case time? In general we ignore machine-dependent constants. Otherwise it's impossible to compare algorithms.

We try to analyze the growth as the size approaches infinity. This is called Asymptotic analysis.

From an engineering standpoint, when analyzing time complexities we drop low-order terms and ignore leading constants.

Eg. 3n^3 + 90n^2 - 5n + 6054 = O(n^3)

When n is large enough an O(n^2) algorithm will ALWAYS beat an O(n^3) algorithm.

Going back to insertion sort.

In the worst case, where input is reverse sorted, the time complexity is of O(n^2).

This holds to be the same for the average case as well (dropping low order terms, leading constants)

So now you should be wondering... Is this fast?

For small n (small sequences), moderately so

For large n, definitely not.

Okay that's great, but if that's not fast for large sequences what do I use instead?

Okay, well lets check out Merge Sort.

Code:

T(n) |MERGE-SORT A[1 . . n]

O(1) | 1. If n = 1, done.

2T(n/2)| 2. Recursively sort A[ 1 . . ⎡n/2⎤ ] and A[ ⎡n/2⎤+1 . . n ] .

O(n) | 3. “Merge” the 2 sorted lists.

I wont go into details as to how this works right now. But this picture should basically describe how it works.

This is linear time. How many times does it do this? Well as you can see it recursively splits the array into two smaller arrays of half the size.

Basically what we have to do is solve:

Code:

T(n) = 2T(n/2) + cn, where c > 0 is constant

T(n) = 2(T(n/4) + cn/2) + cn ....

T(n) = 2(2(T(n/8) + cn/4) + cn/2) + cn

...

We end up with a recursion tree that looks like this

What we get is that merge sort as an upper bound of O(nlg(n)).

Is this good? Well yes. As n gets large nlg(n) grows more slowly than n^2

Asymptotically merge sort beats insertion sort on the worst case.

In general merge sort beats insertion sort for n > 30~ or so.

Get A Top 5 Google Ranking In Under 30 Days!

Subscribe to:
Post Comments (Atom)

## 0 comments:

Post a Comment