Saturday, February 13, 2021

[Google Question] Repeated Substring Pattern

Problem: Given a non-empty string check if it can be constructed by taking a substring of it and appending multiple copies of the substring together. Basically we need to verify if the given string can be constructed by concatenation of one of substrings multiple times.

Example:

Input: "abcabc"
Output: True
Explanation: It's the substring "abc" twice.


Approach: Let's first discuss the simple idea to solve this problem. Repeated substring string should look like:

  • substr1substr1

and the other one will look like:

  • substr1substr2

Right. Let's double the strings basically create a string by concatenating the given string twice. The strings will look like:

  • substr1substr1substr1substr1
  • substr1substr2substr1substr2

Let's remove the first and last characters from the string:

  • ubstr1substr1substr1substr
  • ubstr1substr2substr1substr

Just by looking at the above strings say 'doubleStrings' we can tell in case of repeated substring string, input string will be a substring of its doubleString. Otherwise not. Here is our algorithm:

  • doubleString = s + s
  • doubleString = doubleString.Substr(1, 2 * length - 1)
  • RETURN doubleString.Contains(s);

The complexity of above solution will be O(n^2) where is n is the length of input string. We can use KMP here to implement contains method in linear time.

However we can use KMP here to solve this question without doing all the above process. We will preprocess the original string to get the longest proper prefix which is also a suffix but we won't execute match(obviously we don't have any pattern to match). 

Say the length of the longest prefix/suffix is len, we can say that the length of the repeated sequence should be n - len. To verify it we should check if n - len is a divisor of n or not. Let's see it using example:

abcabcabc : len = 6, n = 9, repeated sequence length should be 9 - 6 = 3. 9 % 3 = 0 so we can say it is a repeated substring string.

abcdab: len = 2, n = 6. repeated sequence length should be 6 - 2 = 4. 6 % 4 != 0 so we can say it is not a repeated substring string.

Please run more examples and then you will understand what is happening here.


Implementation in C#:

        public bool RepeatedSubstringPattern(string s)

        {

            int n = s.Length;

            int[] lps = new int[n];

            int length = 0;

            int i = 1;

            while(i < n)

            {

                if (s[i] == s[length])

                {

                    ++length;

                    lps[i] = length;

                    ++i;

                }

                else

                {

                    if (length > 0)

                    {

                        length = lps[length - 1];

                    }

                    else

                    {

                        lps[i] = 0;

                        ++i;

                    }

                }

            }

            return length != 0 && n % (n - length) == 0;

        }


Complexity: O(n)

No comments:

Post a Comment