Friday, February 19, 2021

[Google Question] Encode and Decode Strings

Problem: Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.

Approach: One obvious approach is to have a Non-ASCII character as a separator. While encoding the string we can join all the strings with a Non-ASCII separator. While decoding the string we can split based on the same separator and return the array. 

The only challenge here is how to differentiate between array containing only empty string and an empty array while decoding. What we can do is while encoding we can add the length of the array as string in the input string list (at first or last) and then we can apply the join operation. While decoding after splitting the input string, we can read the length first (first/last element of array). In this way we can differentiate between empty array and array containing empty string.

The above solution works but we have dependency that the separator character should not be in the original string. I know we are using Non-ASCII character but still it might appear in the input string so we might need to think of a generic solution. 

We can use approach which is similar to HTTP v1.1 Chunked Transfer Encoding. First add the length of the chunk first in four bytes and then add the original chunk so in our case while encoding we first add the length of the string (4 character string) and then we add the string to output. With this approach here are the steps we are going to follow:

Encoding:

  • For each string in input string array
  • Compute the 4 character length string
  • append length string to output
  • append string to output

Decoding:

  • i = 0
  • i < length of input string
  • Convert  input_string[i...i + 3] (4 characters) to integer to get the length.
  • i = i + 4
  • Add inputString[i...i + length - 1] to output
  • i = i + length

That's all!


Implementation in C#:

With Non-ASCII Separator:

 public class Codec 

{

    public string encode(IList<string> strs) 

    {

        List<string> list = strs.ToList();

        list.Add(strs.Count.ToString());

        return string.Join((char)257, list);

    }


    public IList<string> decode(string s) {

        List<string> strs = s.Split(((char)257).ToString()).ToList();

        if (int.Parse(strs.Last()) == 0)

        {

            return new List<string>();

        }

        strs.RemoveAt(strs.Count - 1);

        return strs;

    }

}


Generic Solution:

public class Codec 

{

    public string encode(IList<string> strs) 

    {

        string encodedStr = string.Empty;

        foreach(string str in strs)

        {

            encodedStr += this.GetLengthString(str);

            encodedStr += str;

        }

        return encodedStr;

    }

    

    private string GetLengthString(string str)

    {

        int length = str.Length;

        char[] bytes = new char[4];

        for (int i = 3; i >= 0; --i)

        {

            bytes[3-i] = (char)((length >> (i * 8)) & 0xff);

        }

        return new string(bytes);

    }


    public IList<string> decode(string s) 

    {

        int i = 0;

        List<string> decodedStrings = new List<string>();

        while (i < s.Length)

        {

            int chunkLength = this.GetChunkLength(s.Substring(i, 4));

            i += 4;

            decodedStrings.Add(s.Substring(i, chunkLength));

            i += chunkLength;

        }

        return decodedStrings;

    }

    

       

    private int GetChunkLength(string byteStr)

    {

        int length = 0;

        foreach (char byteChar in byteStr)

        {

           length = (length << 8) | (int)byteChar;

        }

        return length;

    }

}


Complexity: O(n) for both approaches

No comments:

Post a Comment