Thursday, November 27, 2014

Suffix Array

A suffix array is a sorted array of all the suffixes of a given string. It does the same thing which suffix tree does.
Advantage of suffix array over suffix trees include improved space requirements, simpler linear time construction algorithms and improved cache locality.

Building Suffix Array - Method 1: Make an array of all suffixes and then sort the array.

Implementation:

struct Suffix
{
        int index;
        char* suff;
};

bool compareSuffixes(const Suffix& s1, const Suffix& s2)
{
        return std::strcmp(s1.suff, s2.suff) < 0 ? true : false;
}

int* buildSuffixArray(char* str, int len)
{
        Suffix* suffixes = new Suffix[len];
        for(int i = 0; i < len; ++i)
        {
                suffixes[i].index = i;
                suffixes[i].suff = str + i;
        }
        std::sort(suffixes, suffixes + len, compareSuffixes);
        int *suffixArr = new int[len];
        for(int i = 0; i < len; ++i)
                suffixArr[i] = suffixes[i].index;
        return suffixArr;

}

Time complexity: O(n^2logn)

Building Suffix Array - Method 2: The idea is to use the fact that strings that are to be sorted are suffixes of a single string. The algorithm is mainly based on maintaining the order of the string’s suffixes sorted by their 2^k long prefixes.

Implementation:

struct Suffix
{
        int index;
        int rank[2];
};

bool compareSuffix(const Suffix& s1, const Suffix& s2)
{
        return s1.rank[0] == s2.rank[0] ? (s1.rank[1] < s2.rank[1] ? true : false) : (s1.rank[0] < s2.rank[0] ? true : false);
}

int* buildSuffixArray(char* text, int len)
{
        Suffix* suffixes = new Suffix[len];
        for(int i = 0; i < len; ++i)
        {
                suffixes[i].index = i;
                suffixes[i].rank[0] = text[i] - 'a';
                suffixes[i].rank[1] = ((i+1) < len) ? (text[i+1] - 'a') : -1;
        }

        std::sort(suffixes, suffixes + len, compareSuffix);

        int* index = new int[len];

        for(int k = 4; k < 2*len; k = k*2)
        {
                int rank = 0, prev_rank = suffixes[0].rank[0];
                suffixes[0].rank[0] = rank;
                index[suffixes[0].index] = 0;

                for(int i = 1; i < len; ++i)
                {
                        if(suffixes[i].rank[0] == prev_rank && suffixes[i].rank[1] == suffixes[i-1].rank[1])
                        {
                                prev_rank = suffixes[i].rank[0];
                                suffixes[i].rank[0] = rank;
                        }
                        else
                        {
                                prev_rank = suffixes[i].rank[0];
                                suffixes[i].rank[0] = ++rank;
                        }
                        index[suffixes[i].index] = i;
                }
                for(int i = 0; i < len; ++i)
                {      
                        int nextIndex = suffixes[i].index + k/2;
                        suffixes[i].rank[1] = (nextIndex < len) ? suffixes[index[nextIndex]].rank[0]: -1;
                }

                std::sort(suffixes, suffixes + len, compareSuffix);

        }

        int *suffixArr = new int[len];
        for(int i = 0; i < len; ++i)
                suffixArr[i] = suffixes[i].index;
        return suffixArr;

}


Searching a Pattern: Now we have a sorted array of all the suffixes, we can use binary search to search pattern in the text.

int search(char *text, char *pattern, int *suffArr)
{
        int len = std::strlen(text);
        int patrnLen = std::strlen(pattern);
        int left = 0, right = len - 1;
        while(left <= right)
        {
                int mid = left + (right - left) / 2;
                int result = std::strncmp(pattern, text + suffArr[mid], patrnLen);
                if(result == 0)
                        return suffArr[mid];
                if(result < 0)
                        right = mid - 1;
                else
                        left = mid + 1;
        }
        return -1;

}

Time complexity: O(mlogn)

________________________________________________

Random Access List

An interesting persistent data structure that combines the singly linked list with the binary tree is Random-Access List. This data structure allows for random access of its items as well as adding and removing items from the beginning of the list. 
                  It is structured as a Singly Linked List of Completely Balanced Binary Trees. The advantage of this data structure is that it allows access, insertion, and removal of the head of the list in O(1) time as well as provides logarithmic performance in randomly accessing its items. Following is a Random-Access list with 13 items --


Insertion: When a node is added to the list, the first two root nodes (if they exist) are checked to see if they both have the same height. If so, the new node is made the parent of the first two nodes; the current head of the list is made the left child of the new node, and the second root node is made the right child. If the first two root nodes do not have the same height, the new node is simply placed at the beginning of the list and linked to the next tree in the list.

Removal: To remove the head of the list, the root node at the beginning of the list is removed, with its left child becoming the new head and its right child becoming the root of the second tree in the list. The new head of the list is right linked with the next root node in the list. Just see the following figure to get clear picture --


Random Access: The algorithm for finding a node at a specific index is in two parts: in the first part, we find the tree in the list that contains the node we're looking for. In the second part, we descend into the tree to find the node itself. The following algorithm is used to find a node in the list at a specific index:
  1. Let I be the index of the node we're looking for. Set T to the head of the list where T will be our reference to the root node of the current tree in the list we're examining.
  2. If I is equal to 0, we've found the node we're looking for; terminate algorithm. Else if I is greater than or equal to the number of nodes in T, subtract the number of nodes in T from I and set T to the root of the next tree in the list and repeat step 2. Else if I is less than the number of nodes in T, go to step 3.
  3. Set S to the number of nodes in T divided by 2 (the fractional part of the division is ignored. For example, if the number of nodes in the current subtree is 3, S will be 1).
  4. If I is less than S, subtract 1 from I and set T to T's left child. Else subtract (S + 1) from I and set T to T's right child.
  5. If I is equal to 0, we've found the node we're looking for; terminate algorithm. Else go to step 3.
Following image illustrates how to find the 10th item in the list --

Algorithm for rest of the operations are straight forward. I don't think, we need to discuss.

______________________________________

Wednesday, November 26, 2014

Pubmatic Question: Longest Bitonic subsequence

Given an array arr[0 … n-1] containing n positive integers, a subsequence of arr[] is called Bitonic if it is first increasing, then decreasing. Write a function that takes an array as argument and returns the length of the longest bitonic subsequence.

Solution:
int lbs(int *arr, int len)
{
        if(arr == NULL || len == 0)
                return 0;

        int *lis = new int[len];
        for(int i = 0; i < len; ++i)
                lis[i] = 1;

        for(int i = 1; i < len; ++i)
        {
                for(int j = 0; j < i; ++j)
                {
                        if(arr[i] > arr[j] && lis[i] < lis[j] + 1)
                                lis[i] = lis[j] + 1;
                }
        }

        int *lds = new int[len];
        for(int i = 0; i < len; ++i)
                lds[i] = 1;

        for(int i = len-2; i >= 0; --i)
        {
                for(int j = len-1; j > i; --j)
                {
                        if(arr[i] > arr[j] && lds[i] < lds[j] + 1)
                                lds[i] = lds[j] + 1;
                }
        }

        int max = lis[0] + lds[0] - 1;

        for(int i = 1; i < len; ++i)
        {
                if(max < lis[i] + lds[i] - 1)
                        max = lis[i] + lds[i] - 1;
        }       

        return max;
}

Pubmatic Question: Bitonic subarray with maximum length

Given an array A[0 … n-1] containing n positive integers, a subarray A[i … j] is bitonic if there is a k with i <= k <= j such that A[i] <= A[i + 1] ... <= A[k] >= A[k + 1] >= .. A[j – 1] > = A[j]. Write a function that takes an array as argument and returns the length of the maximum length bitonic subarray.

Solution:
int maxBitonicSubArrayLen(int *arr, int len)
{
        if(arr == NULL || len == 0)
                return 0;
        bool isDecreasing = false;
        int max = 1, count = 1;
        for(int i = 0; i < len - 1; ++i)
        {
                if(arr[i] <= arr[i+1])
                {
                        if(isDecreasing)
                        {
                                count = 1;
                                isDecreasing = false;
                        }
                        ++count;
                }
                else
                {
                        if(!isDecreasing)
                                isDecreasing = true;
                        ++count;
                }
                if(max < count)
                        max = count;
        }
        return max;
}