Tuesday, December 22, 2020

Remove Duplicate Letters with lexicographical order

Problem: Given a string, remove duplicate letters so that every letter appears once and only once. Make sure the result is the smallest in lexicographical order among all possible results.

Note that input string consists of lowercase English letters only.

Example:

Input: s = "bacbcdc"
Output: "abcd"
Input: s = "cbacdcbc"
Output: "acdb"


Approach: This problem is similar to the problem of removing characters from string except now we need to return the string in sorted order as much as possible. To maintain a lexicographical order we can use a stack. 

Here is our algorithm:

  1. Foreach ch in s
    1. IF ch is already in stack 
      1. No need to do anything, just GoTo the next character  
    2. WHILE stack is not empty AND character at top of stack > ch AND character at top of stack will not occur later in string
      1. Pop from stack
    3. PUSH ch to stack
  2. Make a string of all the characters in the stack and return.

This algorithm will work correctly but it is inefficient for now. Let's take expensive operation one by one and make it efficient:

1.1 IF ch is already in stack: If we go with simple approach of looking at all the value, it will take linear time and its not acceptable so we maintain a hash to check what all the characters are in stack. Now with hash we can do it in constant time. Obviously whenever we push/pop a character to/from stack, we will add/remove it into/from hash too. 

1.2. character at top of stack will not occur later in string: The other 2 conditions are anyway can be checked in O(1) but for this condition we need to look further into the input string and see if the same character does appear in the string later or not. Again it's a linear time operation and is not acceptable. To make it efficient, we can pre process the input string and record the last index in advance of each character in the string so we can always check if last index of top of the stack is greater than the current index or not to efficiently check if character at top of stack will not occur later in string.

 That's it!


Implementation in C#:

        public static string RemoveDuplicateLettersWithLexicographicalOrder(string s)

        {

            if (s?.Length <= 1)

            {

                return s;

            }

            // Maintaining last index in advance to check if a char is does appear later in the string or not

            int[] lastIndex = new int[26];

            for (int i = 0; i < s.Length; ++i)

            {

                lastIndex[s[i] - 'a'] = i;

            }

            // To maintain lexicographical order using a stack

            Stack<char> lexiStack = new Stack<char>();

            // Hash to quickly tell if a char is already in the stack or not

            bool[] charInStack = new bool[26];

            for (int i = 0; i < s.Length; ++i)

            {

                if (charInStack[s[i] - 'a'])

                {

                    continue;

                }

                // Maintaining lexicographical order if top of stack is there later in the string

                while (lexiStack.Count > 0 && lexiStack.Peek() > s[i] && lastIndex[lexiStack.Peek() - 'a'] > i)

                {

                    charInStack[lexiStack.Pop() - 'a'] = false;

                }

                lexiStack.Push(s[i]);

                charInStack[s[i] - 'a'] = true;

            }

            string result = string.Empty;

            while(lexiStack.Count > 0)

            {

                result = lexiStack.Pop() + result;

            }

            return result;

        }


Complexity: O(n)

No comments:

Post a Comment