Chapt 3.1. Experimental studies

We are interested in the desgin of "good" data structures and algorithms

Data Structure: a systematic way of organising and accessing data

Algorithm: A step-by-step procedure for performing some task in a finite amount of time.

Experimental studies

for time import time
start_time = time()
# run the algorithm
end_time = time()
elaqpsed = end time - start_time

We are interested in the general dependence of running time on the size and structure of the input.

Challenges of experimental analysis:

Goal of experimental analysis:

Counting prime operations:

Perform an analysis directly on a high-level description of the algorithm. Define a set of primitive operations such as the following:

Henceforth, to capture the growth of an algorithms's running time, we will associate a function f(n)f(n) that characterises the number of primitive operations performed, relative to the input size nn.

Focus on worst-case input

Recursion

Begin with the following four example of the use of recursion, providing an python implementation for each:

  1. The factorial function
  2. An English rules
  3. Binary search
  4. File system for a computer, in which directories can be nested arbitrarily deep within other directories

The factorial function

n!={1if xn=0n(n1)(n1)21if x1 n! = \begin{cases} 1 & \text{if } x \in n = 0\\ n \cdot (n-1)\cdot (n-1) \cdot \cdot \cdot 2 \cdot 1 & \text{if } x \ge 1 \end{cases}

There is a natural recursive definition for the factorial function. Observe that 5!=54321=54!5!=5\cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5\cdot 4!. Thus, the recrusive defintion can be formalised as

n!={1if xn=0n(n1)!if x1 n! = \begin{cases} 1 & \text{if } x \in n = 0\\ n \cdot (n-1)! & \text{if } x \ge 1 \end{cases}

Recursive implementation of the Factorial function

def factorial(n):
    if n == 1:
        return 1
    else:
        return n * factorial(n-1)

Drawing the English Rulers

Although it is possible to draw such a ruler with iteration, the task is considerably easier with iteration

Python Implementation

def draw_line(tick_length, tick_label=''):
"""Draw one line with given tick length (followed by optional label)"""
line = '-' * tick_length
if tick_label:
    line += ' ' + tick_label

def draw_interval(center_length):
    """ Draw tick length based upon a central tick length."""
    if center_length > 0:       # stop when length drops to 0
        draw_interval(center_length - 1)    # recursively draw top ticks
        draw_line(center_length)            # draw center tick
        draw_interval(center_length - 1)    # recursively draw bottom ticks

def draw_ruler(num_inches, major length):
    """Draw English ruler with given numbr of inches, major tick length"""
    draw_line(major_length, '0')        # draw 0 inch line
    for j in range(1, 1+num_inches):
        draw_interval(major_length - 1)     # draw interior tick for inch
        draw_line(major_length, str(j))     # draw in j line and label

The execution of the recursive draw_interval function can be visualised using a recursion trace.


This algorithm is used to efficiently locate a target value within a sorted sequence of nn elements.

  1. If the target equals data[mid], return the result
  2. If target < data[mid], recur the first half of the sequence
  3. If target > data[mid], then recur the second half of the sequence

Whereas a sequential search runs in O(n)O(n) time, binary search runs in O(logn)O(\log n) time

""" Return true us target is found in indicated portion of a Python list
The search only considers the portion from data[low] to data[high] inclusive
"""
def binary_search(data, target, low, high):
    if low > high:
        return False
    else:
        mid = (low+ high) // 2
        if target == data[mid]:
            #recur on the portion left of the middle
            return binary_search(data, target, low, mid-1)
        else:
            #recur on the portion right of the middle
            return binary search(data, target, mid + 1, high)

File Systems

A file system consists of a top-level directory, and the contents of this directory consists of files and other directories, which in turn contain files and other directories, and so on. The operating system allow directories to be nested arbitrarily deep

Algorithm DiskUsage(path):
    Input: A string designating a path to a file-system entry
    Output: The cumulative disk space used by that entry and any nested entries
    total = size(path)
    if path represents a directory then
        for each child stored within directory path do
            total = total + DiskUsage(child)
    return total

Analysing Recursive Algorithms

For each recursive algorithm, we will account for each operation that is performed based upon the particular Activation of the function that manages the flow of control at the time it is executed

Factorial algorithm

Drawing an English ruler

Recursion run Amok

An inefficent recursion for Computing Fibonacci Numbers

Fibonacci:

F0=0F_{0} = 0
F1=1F_{1} = 1
Fn=Fn2+Fn1,n>1F_{n} = F_{n-2} + F_{n-1}, n>1

A direct implementation based on the algorithm would be as follows:

def bad_fibonacci(n):
    """Return the nth Fibonacci number"""
    if n <= 1:
        return n
    else:
        return bad_fibonacci(n-1) + bad_fibonacci(n-2)

Computing the nth Fibonacci number in this way requires an exponential number of calls to the function. The number of calls cnc_{n} for input size nn is cn>2n/2c_{n}>2^{n/2}, which mean that bad_fibonacci(n) makes a number of calls that is exponential in nn.

We can compute FnF_{n} much more efficiently in which invocation only makes one recursive call. Rather than having the function return a single value, we define a recursive function that returns a pair of consecutive fibonacci numbers.

def good_fibonacci(n):
    """ Return pair of Fibonacci numbers, F(n) and F(n-1)"""
    if n <= 1:
        return (n, 0)
    else:
        (a, b) = good_fibonacci(n-1)
        return (a+b, a)

The execution of good_fibonacci(n) table O(n)O(n) time.

4.4 Further examples of Recursion

Linear recursion

Example: Summing the elements of a sequence recursively

The following computes the sum of a sequence

def linear_sum(S, n)
""" Compute the sum of a sequence of the first n numbers of Sequence S"""
    if n == 0:
        return 0
    else:
        return linear_sum(S, n-1) + S[n-1]

Binary recursion

Revisit the problem of summing nn elements of s sequence SS, of numbers. With two or more elements, we can recursively compute the sum of the first half, and the sum of the second half, and ad these together as shown in the following:

def binary_sum(S, start, stop):
    """ Return the sum of the numbers in implicit slice S[start:stop]. """
    if start >= stop:
        return 0
    elif start == stop - 1:
        return S[start]
        else: 
            mid = (start + stop) // 2
            return binary_sum(S, start, mid) + binary_sum(S, mid, stop)

The size of the range is divided in half for each recursive call, and uses logn\log n amount of additional space, compared to nn space used in the linear_sum function. However, the running time of binary_sum is O(n)O(n), as there are 2n12n-1 function calls, each requiring constant time.

Multiple recursion

The recursion of analysing the disk space usage of a file system is an example of multiple recursion, because the number of recursive calls made during one invocation was equal to the number of entries within a given directory of a file system.

Designing Recursive algorithms

An algorithm that uses recursion typically has the following form:

Eliminating tail recursion

Some forms of recursion can be eliminated without any use of auxillary memory. A notable form is known as tail recursion.

def binary_search(data, target):
    """return True if target is found in the given python list"""
    low = 0
    high = len(data) - 1
    while low <= high:
        mid = low+high // 2
        if target == data[mid]:
            return True
        elif target < data[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return False

Where we made a recursive call binary_search(data, target, low, mid-1) in the original version, we replace high = mid - 1 in our new version and continue our iteration.