Monday, January 25, 2021

[LeetCode] Minimum Genetic Mutation

Problem: A gene string can be represented by an 8-character long string, with choices from "A", "C", "G", "T".

Suppose we need to investigate a mutation (from "start" to "end"), where ONE mutation is defined as ONE single character changed in the gene string.

For example, "AACCGGTT" -> "AACCGGTA" is 1 mutation.
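To make the definition concrete, here is a minimal sketch of a check for whether two genes are exactly one mutation apart. The names GeneCheck and IsOneMutation are illustrative only; they are not part of the problem statement or of the solution further down.

```csharp
public static class GeneCheck
{
    // Returns true when a and b have the same length and differ in exactly one position.
    public static bool IsOneMutation(string a, string b)
    {
        if (a == null || b == null || a.Length != b.Length)
        {
            return false;
        }

        int diff = 0;
        for (int i = 0; i < a.Length; ++i)
        {
            if (a[i] != b[i] && ++diff > 1)
            {
                return false; // more than one character changed
            }
        }

        return diff == 1;
    }
}
```

With this sketch, IsOneMutation("AACCGGTT", "AACCGGTA") is true, while comparing "AACCGGTT" with "AAACGGTA" gives false because two positions differ.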

Also, there is a given gene "bank", which records all the valid gene mutations. A gene must be in the bank to make it a valid gene string.

Now, given 3 things - start, end, bank, your task is to determine the minimum number of mutations needed to mutate from "start" to "end". If there is no such mutation, return -1.

Note:

  1. Starting point is assumed to be valid, so it might not be included in the bank.
  2. If multiple mutations are needed, all mutations in the sequence must be valid.
  3. You may assume the start and end strings are not the same.

Example:

start: "AACCGGTT"
end:   "AACCGGTA"
bank: ["AACCGGTA"]

return: 1

start: "AACCGGTT"
end:   "AAACGGTA"
bank: ["AACCGGTA", "AACCGCTA", "AAACGGTA"]

return: 2

start: "AAAAACCC"
end:   "AACCCCCC"
bank: ["AAAACCCC", "AAACCCCC", "AACCCCCC"]

return: 3


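Approach: This problem is similar to Word Ladder, where we need to find the length of the shortest transformation sequence from beginWord to endWord. This problem is simpler because the substitution characters are limited to A, C, G and T, so every one-character neighbor of a gene can be generated directly. The only lookup we need is a set of the bank genes, which also serves as the visited set. We will use BFS here too: each BFS level corresponds to one more mutation, so the first level at which we reach "end" gives the minimum count.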
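The neighbor generation that BFS relies on can be sketched on its own as follows. This is only an illustration under assumed names (GeneNeighbors and OneStepMutations are hypothetical); it produces every string that is one substitution away from a gene, without checking the bank.

```csharp
using System.Collections.Generic;

public static class GeneNeighbors
{
    private static readonly char[] GeneChars = { 'A', 'C', 'G', 'T' };

    // Yields every string that differs from gene in exactly one position,
    // using only the characters A, C, G and T.
    public static IEnumerable<string> OneStepMutations(string gene)
    {
        char[] chars = gene.ToCharArray();
        for (int i = 0; i < chars.Length; ++i)
        {
            char original = chars[i];
            foreach (char c in GeneChars)
            {
                if (c == original)
                {
                    continue; // same character, not a mutation
                }

                chars[i] = c;
                yield return new string(chars);
            }

            chars[i] = original; // restore before moving to the next position
        }
    }
}
```

For an 8-character gene this yields 8 * 3 = 24 candidates. BFS then keeps only the candidates that are present in the bank and have not been visited yet.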


Implementation in C#:

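        // Requires: using System.Collections.Generic;
        public static int MinMutation(string start, string end, string[] bank)
        {
            // Bank genes not visited yet; removing a gene marks it as visited.
            HashSet<string> bankSet = new HashSet<string>(bank);
            bankSet.Remove(start);
            Queue<string> queue = new Queue<string>();
            queue.Enqueue(start);
            char[] geneChars = new char[] { 'A', 'C', 'G', 'T' };
            // Becomes 0 for the level that holds start, then grows by one per BFS level.
            int count = -1;
            while (queue.Count > 0)
            {
                ++count;
                int size = queue.Count;
                // Process exactly the genes that are count mutations away from start.
                for (int i = 0; i < size; ++i)
                {
                    string currStr = queue.Dequeue();
                    if (currStr == end)
                    {
                        return count;
                    }
                    char[] currStrArr = currStr.ToCharArray();
                    // Try every other character at every position.
                    for (int j = 0; j < currStrArr.Length; ++j)
                    {
                        char origChar = currStrArr[j];
                        for (int k = 0; k < geneChars.Length; ++k)
                        {
                            if (origChar == geneChars[k])
                            {
                                continue;
                            }
                            currStrArr[j] = geneChars[k];
                            string newUpdatedString = new string(currStrArr);
                            // Only valid (in the bank) and unvisited genes are enqueued.
                            if (bankSet.Contains(newUpdatedString))
                            {
                                queue.Enqueue(newUpdatedString);
                                bankSet.Remove(newUpdatedString);
                            }
                        }
                        // Restore the character before moving to the next position.
                        currStrArr[j] = origChar;
                    }
                }
            }
            // The queue ran dry without reaching end (this includes end not being in the bank).
            return -1;
        }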


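Complexity: O(n * 8 * 4) = O(32 * n) =~ O(n), where n is the number of genes in the bank. Each gene is enqueued at most once, and for each dequeued gene we try at most 4 characters at each of the 8 positions. Every gene has a fixed length of 8, so building and hashing a candidate string is also constant time.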
