Sunday, June 2, 2024

[Uber][LeetCode] Substring with Concatenation of All Words

Problem: You are given a string s and an array of strings words. All the strings of words are of the same length.

A concatenated string is a string that exactly contains all the strings of any permutation of words concatenated.

  • For example, if words = ["ab","cd","ef"], then "abcdef", "abefcd", "cdabef", "cdefab", "efabcd", and "efcdab" are all concatenated strings. "acdbef" is not a concatenated string because it is not the concatenation of any permutation of words.

Return an array of the starting indices of all the concatenated substrings in s. You can return the answer in any order.

Example:

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: The substring starting at 0 is "barfoo". It is the concatenation of ["bar","foo"] which is a permutation of words. The substring starting at 9 is "foobar". It is the concatenation of ["foo","bar"] which is a permutation of words.
Input: s = "barfoofoobarthefoobarman", words = ["bar","foo", "the"]
Output: [6,9,12]
Explanation: The substring starting at 6 is "foobarthe". It is the concatenation of ["foo","bar","the"]. The substring starting at 9 is "barthefoo". It is the concatenation of ["bar","the","foo"]. The substring starting at 12 is "thefoobar". It is the concatenation of ["the","foo","bar"]
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: There is no concatenated substring.


Approach: Let's start with the simplest approach. We can take every substring of 's' of size equal to the LENGTH(words) * LENGTH(words[0]) and then see if evey word of words is in the substring or not. If yes we can push the start index of substring into result otherwise we move the start index by 1.

Now we need to see how we can do it efficiently. To do it efficiently, we can use hash maps. Basically we can store every word of words in a hash map as key and their frequency as value. We can have another hash map which will track the all the substring of size LENGTH(words[0]) in the current window. 

Current window is s[i ... i + LENGTH(words) * LENGTH(words[0])] where  i = 0 .. n - LENGTH(words) * LENGTH(words[0])

Rest you can understand by just looking at the implementation.


Implementation in C#:

    public IList<int> FindSubstring(string s, string[] words)
    {
        List<int> result = new List<int>();
        int length = s?.Length ?? 0;
        int numOfWords = words?.Length ?? 0;
        if (length == 0 || numOfWords == 0)
        {
            return result;
        }
        int sizeOfWord = words[0].Length;
        int sizeOfWords = sizeOfWord * numOfWords;
        if (length < sizeOfWords)
        {
            return result;
        }
        var wordsMap = this.BuildWordsMap(words);
        for (int i = 0; i <= length - sizeOfWords; ++i)
        {
            var tempMap = new Dictionary<string, int>();
            int count = numOfWords;
            for (int j = i; j < i + sizeOfWords; j += sizeOfWord)
            {
                string currWord = s.Substring(j, sizeOfWord);
                if (!wordsMap.ContainsKey(currWord))
                {
                    break;
                }
                if (!tempMap.ContainsKey(currWord))
                {
                    tempMap[currWord] = 0;
                }
                ++tempMap[currWord];
                if (tempMap[currWord] > wordsMap[currWord])
                {
                    break;
                }
                --count;
            }
            if (count == 0)
            {
                result.Add(i);
            }
        }
        return result;
    }

    private Dictionary<string, int> BuildWordsMap(string[] words)
    {
        var wordsMap = new Dictionary<string, int>();
        foreach(var word in words)
        {
            if (!wordsMap.ContainsKey(word))
            {
                wordsMap[word] = 0;
            }
            ++wordsMap[word];
        }
        return wordsMap;
    }


Complexity: O(n - k * k) where n is length of s, and k is LENGTH(words) * LENGTH(words[0]

No comments:

Post a Comment