Tuesday, December 22, 2020

Maximum Product of Word Lengths with no common characters

Problem: Given a string array words, find the maximum value of length(word[i]) * length(word[j]) where the two words do not share any common letters. Each word consists only of lowercase English letters. If no such pair of words exists, return 0.

Example:

Input: ["b","ab","abc","d","cd"]
Output: 4
Explanation: The two words can be "ab", "cd".


Approach: There are multiple ways of solving this problem: 

1. One brute force approach is to take every pair of words and check whether the two words have any common characters. If they don't, we calculate the product of their lengths and, if it is larger than the maximum product calculated so far, we update the maximum. That's it. This works, but the time complexity is O(n^2 * m^2), where n is the total number of words and m is the length of a word (for simplicity I have assumed all words have the same length m). This is really very inefficient.
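A minimal sketch of this brute force (the class and helper names here are just for illustration):

```csharp
using System;

public static class BruteForce
{
    // Compares every character of a against every character of b: O(m^2) per pair.
    private static bool HasCommonChar(string a, string b)
    {
        foreach (char ca in a)
            foreach (char cb in b)
                if (ca == cb)
                    return true;
        return false;
    }

    public static int MaxProduct(string[] words)
    {
        int maxProduct = 0;
        // Try every pair of words: O(n^2) pairs.
        for (int i = 0; i < words.Length - 1; ++i)
            for (int j = i + 1; j < words.Length; ++j)
                if (!HasCommonChar(words[i], words[j]))
                    maxProduct = Math.Max(words[i].Length * words[j].Length, maxProduct);
        return maxProduct;
    }
}
```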

2. Another way is to maintain a hash from each character to the set of indices of the words that contain it. This tells us, for any given character (the key), all the words in which it appears, and it greatly helps us identify whether two words have any characters in common.

Now, as in the brute force, we take every pair of words, say words[i] and words[j]. Using the hash generated above, we look at each character of words[i] and check whether index j exists in that character's set. If no character of words[i] maps to j, the words share no characters, so we take the product of the lengths of words[i] and words[j] and update the max product in case it is less than the currently calculated product.

This approach is good and is much more efficient than brute force. Let's see its time complexity:

  1. The 1st loop to create the hash takes O(n * m).
  2. The 2nd (nested) loop to check every pair takes O(n^2 * m).
So the overall complexity is O(n * m) + O(n^2 * m) ~= O(n^2 * m). Yes, there is an improvement here, but can we do even better?

3. Let's see how to get a much more efficient algorithm for this problem. The problem with the above approach is that the hash we are creating is not enough to efficiently tell whether two words have common characters: we still need to traverse every character of words[i] to see whether index j exists in that character's set, and this is the bottleneck here.

Let's use the fact that the input strings can only contain lowercase English characters. That means the total number of distinct characters can't be more than 26. If we use one bit to represent each character, we need only 26 bits to record every character a word contains. That means for the hashing step (step 1 in the 2nd approach) we can use an int and set its bits according to the characters present in each word; basically we create a bitmap (an int) for each input word.

Now for step 2, where we check whether words[i] and words[j] have common characters: if bitmap[i] & bitmap[j] is 0, they have no common characters, so we can calculate the product of the lengths of words[i] and words[j] and update the max product accordingly. The common-character check is now done in constant time.
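As a quick illustration of the bit trick (the class and method names here are hypothetical helpers, not part of the solution below): "ab" sets bits 0 and 1 (binary 11 = 3) and "cd" sets bits 2 and 3 (binary 1100 = 12), so their AND is 0 and they share no characters, while "ab" and "abc" give 3 & 7 = 3, which is non-zero.

```csharp
public static class BitmapDemo
{
    // Build the 26-bit character mask for a lowercase word:
    // bit k is set if the letter ('a' + k) occurs in the word.
    public static int ToBitmap(string word)
    {
        int mask = 0;
        foreach (char ch in word)
            mask |= 1 << (ch - 'a');
        return mask;
    }
}
```

With this, `BitmapDemo.ToBitmap("ab") & BitmapDemo.ToBitmap("cd")` is 0, confirming in one operation that the words are disjoint.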

Implementation in C#:

Approach 2:

        public static int MaxProduct(string[] words)
        {
            // Note: words?.Length <= 1 evaluates to false when words is null,
            // so a null array must be checked explicitly.
            if (words == null || words.Length <= 1)
            {
                return 0;
            }
            // Map each character to the set of indices of the words containing it.
            Dictionary<char, HashSet<int>> charToWordsHash = new Dictionary<char, HashSet<int>>();
            for (int i = 0; i < words.Length; ++i)
            {
                foreach (char ch in words[i])
                {
                    if (!charToWordsHash.ContainsKey(ch))
                    {
                        charToWordsHash.Add(ch, new HashSet<int>());
                    }
                    charToWordsHash[ch].Add(i);
                }
            }
            int maxProduct = 0;
            for (int i = 0; i < words.Length - 1; ++i)
            {
                for (int j = i + 1; j < words.Length; ++j)
                {
                    bool isCommonChar = false;
                    // If any character of words[i] also appears in words[j],
                    // index j will be in that character's set.
                    foreach (char ch in words[i])
                    {
                        if (charToWordsHash[ch].Contains(j))
                        {
                            isCommonChar = true;
                            break;
                        }
                    }
                    if (!isCommonChar)
                    {
                        maxProduct = Math.Max(words[i].Length * words[j].Length, maxProduct);
                    }
                }
            }
            return maxProduct;
        }


Approach 3:

        public static int MaxProduct(string[] words)
        {
            // Note: words?.Length <= 1 evaluates to false when words is null,
            // so a null array must be checked explicitly.
            if (words == null || words.Length <= 1)
            {
                return 0;
            }
            // One 26-bit mask per word: bit k is set if ('a' + k) occurs in the word.
            int[] bitmap = new int[words.Length];
            for (int i = 0; i < words.Length; ++i)
            {
                foreach (char ch in words[i])
                {
                    bitmap[i] |= (1 << (ch - 'a'));
                }
            }
            int maxProduct = 0;
            for (int i = 0; i < words.Length - 1; ++i)
            {
                for (int j = i + 1; j < words.Length; ++j)
                {
                    // No overlapping bits means no common characters.
                    if ((bitmap[i] & bitmap[j]) == 0)
                    {
                        maxProduct = Math.Max(words[i].Length * words[j].Length, maxProduct);
                    }
                }
            }
            return maxProduct;
        }


Complexity: O(n * m + n^2) for approach #3 — O(n * m) to build the bitmaps and O(n^2) for the constant-time pairwise checks.
