Friday, October 4, 2024

[LeetCode] Snapshot Array

Problem: Implement a SnapshotArray that supports the following interface:

  • SnapshotArray(int length) initializes an array-like data structure with the given length. Initially, each element equals 0.
  • void set(index, val) sets the element at the given index to be equal to val.
  • int snap() takes a snapshot of the array and returns the snap_id: the total number of times we called snap() minus 1.
  • int get(index, snap_id) returns the value at the given index, at the time we took the snapshot with the given snap_id

Example:

Input: ["SnapshotArray","set","snap","set","get"]
[[3],[0,5],[],[0,6],[0,0]]
Output: [null,null,0,null,5]
Explanation: 
SnapshotArray snapshotArr = new SnapshotArray(3); // set the length to be 3
snapshotArr.set(0,5);  // Set array[0] = 5
snapshotArr.snap();  // Take a snapshot, return snap_id = 0
snapshotArr.set(0,6);
snapshotArr.get(0,0);  // Get the value of array[0] with snap_id = 0, return 5

Constraints:

  • 1 <= length <= 5 * 10^4
  • 0 <= index < length
  • 0 <= val <= 10^9
  • 0 <= snap_id < (the total number of times we call snap())
  • At most 5 * 10^4 calls will be made to set, snap, and get.


Approach: First thing in the problem we can notice that we can't save full array, every time we take the snapshot as the length of the array could be 5 * 10 ^ 4 and snap can be called 5 * 10 ^ 4 times.

What we can do is we only store the updates which happened in the current snapshot. In this way for every index we will have a list of snapshot id and the value updates so now when we try to get the value given the snapshot we can do following:

  • If no modification i.e. list of updates at the input index will be  null or if no snapshot is taken i.e. current snapshot id is still 0 then we can simply return 0.
  • Else we find the upper bound of snap_id using binary search and return the stored value.

That's all!


Implementation in C#:

public class SnapshotArray
{
    public SnapshotArray(int length)
    {
        this.snapshots = new List<Tuple<int, int>>[length];
        this.currSnapId = 0;
    }
   
    public void Set(int index, int val)
    {
        if (this.snapshots[index] == null)
        {
            this.snapshots[index] = new List<Tuple<int, int>>();
        }
        int length = this.snapshots[index].Count;
        if (length > 0 &&
            this.snapshots[index][length - 1].Item1 == this.currSnapId)
        {
            this.snapshots[index].RemoveAt(length - 1);
        }
        this.snapshots[index].Add(new Tuple<int, int>(this.currSnapId,
                                                      val));
    }
   
    public int Snap()
    {
        ++this.currSnapId;
        return this.currSnapId - 1;
    }
   
    public int Get(int index, int snap_id)
    {
        // No modification or no snapshot taken
        if (this.snapshots[index] == null || this.currSnapId == 0)
        {
            return 0;
        }
        int snapIndex = this.FindSnapIndex(this.snapshots[index], snap_id);
        return snapIndex == -1 ?
               0 :
               this.snapshots[index][snapIndex].Item2;
    }

    private int FindSnapIndex(List<Tuple<int, int>> snapIds, int snapId)
    {
        int start = 0, end = snapIds.Count - 1;
        if (snapId >= snapIds[end].Item1)
        {
            return end;
        }
        while (start <= end)
        {
            int mid = start + (end - start) / 2;
            int currSnap = snapIds[mid].Item1;
            if (currSnap == snapId)
            {
                return mid;
            }
            else if (currSnap < snapId)
            {
                start = mid + 1;
            }
            else
            {
                end = mid - 1;
            }
        }
        return end;
    }

    private List<Tuple<int, int>>[] snapshots;
    private int currSnapId;
}

Complexity: SnapshotArray: O(1), Set: O(1), Snap: O(1), Get: O(logn) 

[LeetCode] Time Based Key-Value Store

Problem: Design a time-based key-value data structure that can store multiple values for the same key at different time stamps and retrieve the key's value at a certain timestamp.

Implement the TimeMap class:

  • TimeMap() Initializes the object of the data structure.
  • void set(String key, String value, int timestamp) Stores the key key with the value value at the given time timestamp.
  • String get(String key, int timestamp) Returns a value such that set was called previously, with timestamp_prev <= timestamp. If there are multiple such values, it returns the value associated with the largest timestamp_prev. If there are no values, it returns "".

Example:

Input
["TimeMap", "set", "get", "get", "set", "get", "get"]
[[], ["foo", "bar", 1], ["foo", 1], ["foo", 3], ["foo", "bar2", 4], ["foo", 4], ["foo", 5]]
Output
[null, null, "bar", "bar", null, "bar2", "bar2"]

Explanation
TimeMap timeMap = new TimeMap();
timeMap.set("foo", "bar", 1);  // store the key "foo" and value "bar" along with timestamp = 1.
timeMap.get("foo", 1);         // return "bar"
timeMap.get("foo", 3);         // return "bar", since there is no value corresponding to foo at timestamp 3 and timestamp 2, then the only value is at timestamp 1 is "bar".
timeMap.set("foo", "bar2", 4); // store the key "foo" and value "bar2" along with timestamp = 4.
timeMap.get("foo", 4);         // return "bar2"
timeMap.get("foo", 5);         // return "bar2"

Constraints:

  • 1 <= key.length, value.length <= 100
  • key and value consist of lowercase English letters and digits.
  • 1 <= timestamp <= 10^7
  • All the timestamps timestamp of set are strictly increasing.
  • At most 2 * 10^5 calls will be made to set and get.


Approach: This is a binary search problem where we need to find the timestamp which is equal to or just smaller than the given timestamp. 

Given the constraint that timestamps are coming in strictly increasing order, we don't have to sort or maintain a sorted data structure. We can simply use a list.


Implementation in C#:

public class TimeMap
{
    public TimeMap()
    {
        this.kvMap = new Dictionary<string, List<Tuple<int, string>>>();
    }
   
    public void Set(string key, string value, int timestamp) {
        if (!this.kvMap.ContainsKey(key))
        {
            this.kvMap[key] = new List<Tuple<int, string>>();
        }
        this.kvMap[key].Add(new Tuple<int, string>(timestamp, value));
    }
   
    public string Get(string key, int timestamp)
    {
        if (!kvMap.ContainsKey(key))
        {
            return "";
        }
        int index = this.GetIndex(kvMap[key], timestamp);
        return index == -1 ?
               "" :
               kvMap[key][index].Item2;
    }

    private int GetIndex(List<Tuple<int, string>> sortedTimestamps,
                         int timestamp)
    {
        int start = 0, end = sortedTimestamps.Count - 1;
        if (start > end || timestamp < sortedTimestamps[0].Item1)
        {
            return -1;
        }
        while (start <= end)
        {
            int mid = start + (end - start) / 2;
            int currTimestamp = sortedTimestamps[mid].Item1;
            if (currTimestamp == timestamp)
            {
                return mid;
            }
            if (currTimestamp < timestamp)
            {
                start = mid + 1;
            }
            else
            {
                end = mid - 1;
            }
        }
        return end;
    }

    private Dictionary<string, List<Tuple<int, string>>> kvMap;
}

Complexity: Set - O(1), Get - O(logn)

Thursday, October 3, 2024

[LeetCode] Count Negative Numbers in a Sorted Matrix

Problem: Given a m x n matrix grid which is sorted in non-increasing order both row-wise and column-wise, return the number of negative numbers in grid.

Example:

Input: grid = [[4,3,2,-1],[3,2,1,-1],[1,1,-1,-2],[-1,-1,-2,-3]]
Output: 8
Explanation: There are 8 negatives number in the matrix.
Input: grid = [[3,2],[1,0]]
Output: 0

Approach: Given we have a matrix which is sorted row wise and column wise, we can use the approach which we have used in the problem of searching an element in row wise and column wise sorted matrix.

Here, I am going to start from bottom left corner as the matrix is sorted in descending order but we can solve this problem starting from top right corner too.


Implementation in C#:

    public int CountNegatives(int[][] grid)
    {
        int rows = grid?.Length ?? 0;
        if (rows == 0)
        {
            return 0;
        }
        int cols = grid[0].Length;
        int row = rows - 1, col = 0;
        int totalNegNums = 0;
        while (row >= 0 && col < cols)
        {
            if (grid[row][col] < 0)
            {
                totalNegNums += (cols - col);
                --row;
            }
            else
            {
                ++col;
            }
        }
        return totalNegNums;
    }


Complexity: O(m + n)

[LeetCode] Search Insert Position

Problem: Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.

Example:

Input: nums = [1,3,5,6], target = 5
Output: 2
Input: nums = [1,3,5,6], target = 2
Output: 1
Input: nums = [1,3,5,6], target = 7
Output: 4

Constraints:

  • 1 <= nums.length <= 10^4
  • -10^4 <= nums[i] <= 10^4
  • nums contains distinct values sorted in ascending order.
  • -10^4 <= target <= 10^4


Approach: The problem here is just to find the target itself or just greater element in the array so if you see this is a binary search problem with following changes:

  1. If nums[mid] < target then obviously we need to go to the right side i.e. start = mid + 1
  2. Else end = mid
  3. Return end.


Implementation in C#:

    public int SearchInsert(int[] nums, int target)
    {
        int start = 0, end = nums.Length - 1 ;
        if (target < nums[start])
        {
            return start;
        }
        if (target > nums[end])
        {
            return end + 1;
        }
        while (start < end)
        {
            int mid = start + (end - start) / 2;
            if (nums[mid] < target)
            {
                start = mid + 1;
            }
            else
            {
                end = mid;
            }
        }    
        return end;
    }


Complexity: O(log(n))

[LeetCode] Find Smallest Letter Greater Than Target

Problem: You are given an array of characters letters that is sorted in non-decreasing order, and a character target. There are at least two different characters in letters.

Return the smallest character in letters that is lexicographically greater than target. If such a character does not exist, return the first character in letters

Example:

Input: letters = ["c","f","j"], target = "a"
Output: "c"
Explanation: The smallest character that is lexicographically greater than 'a' in letters is 'c'
Input: letters = ["c","f","j"], target = "c"
Output: "f"
Explanation: The smallest character that is lexicographically greater than 'c' in letters is 'f'.
Input: letters = ["x","x","y","y"], target = "z"
Output: "x"
Explanation: There are no characters in letters that is lexicographically greater than 'z' so we return letters[0].

Constraints:

  • 2 <= letters.length <= 10^4
  • letters[i] is a lowercase English letter.
  • letters is sorted in non-decreasing order.
  • letters contains at least two different characters.
  • target is a lowercase English letter.


Approach: This is a simple binary search problem with following changes:

  1. If letters[mid] <= target then obviously we will search on the right side so start becomes mid + 1.
  2. Else we go to the left to see there is a smaller letter which is greater than target so end becomes mid.
  3. In the end we return letters[end]


Implementation in C#:

    public char NextGreatestLetter(char[] letters, char target)
    {
        int start = 0, end = letters.Length - 1;
        if (target < letters[start] || target >= letters[end])
        {
            return letters[start];
        }
        while (start < end)
        {
            int mid = start + (end - start) / 2;
            if (letters[mid] <= target)
            {
                start = mid + 1;
            }
            else
            {
                end = mid;
            }
        }
        return letters[end];
    }


Complexity: O(log(n))

Monday, August 19, 2024

[LeetCode] 2 Keys Keyboard

Problem: There is only one character 'A' on the screen of a notepad. You can perform one of two operations on this notepad for each step:

  • Copy All: You can copy all the characters present on the screen (a partial copy is not allowed).
  • Paste: You can paste the characters which are copied last time.

Given an integer n, return the minimum number of operations to get the character 'A' exactly n times on the screen.

Example:

Input: n = 3
Output: 3
Explanation: Initially, we have one character 'A'.
In step 1, we use Copy All operation.
In step 2, we use Paste operation to get 'AA'.
In step 3, we use Paste operation to get 'AAA'.
Input: n = 1
Output: 0


Approach: We are going to use bottom-up DP here. We can define a 1D array where table[i] tells the minimum number of steps required to display i A's. Obviously table[n] will be our answer.

Now the main logic here is to get the relations between table[i] and table[i-1]...table[1], right?  Let's understand it; let's say we had j A's on the screen where 1 <= j < i. Now we can say we can copy these j A's and paste it exactly (i - j) / j times to display i A's so total number of operations will be:

f(i) = f(j) + 1 + (i - j) / j

Why? Let's see:

  • f(j) - Number of steps to display j A's
  • 1 - Copy All
  • (i - j) / j - Number of pastes.

Now let's simplify this equation:

f(i) = f(j) + 1 + (i - j) / j

=> f(i) = f(j) + 1 + (i/j) - (j/j)

=> f(i) = f(j) + 1 + (i/j) - 1

=> f(i) = f(j) + i / j

So here is our final relation:

f(i) = f(j) + i / j

We also need to make sure that i should be divisible by j as it's kind of repeated pastes. Now we just need to get the minimum of all such j's where 1 <= j < i and i is divisible by j.

That's all!


Implementation in C#:

    public int MinSteps(int n)
    {
        if (n <= 1)
        {
            return 0;
        }
        int[] table = new int[n + 1];
        for (int i = 2; i <= n; ++i)
        {
            table[i] = int.MaxValue;
            for (int j = 1; j <= i / 2; ++j)
            {
                if (i % j != 0)
                {
                    continue;
                }
                table[i] = Math.Min(table[i], table[j] + i / j);
            }
        }
        return table[n];
    }


Complexity: O(n^2)

Thursday, August 15, 2024

Design Stack overflow

Problem: Design a highly scalable public discussion forum like reddit, quora or Stack Overflow. Here are the features that needs to be supported:

  • Post questions / news to public
  • Comments on existing posts
  • Upvote/downvote posts or comments
  • Feed of most popular posts.


Requirements:

Functional requirements:

  1. This is browser only application.
  2. A logged in user only can post, comment and vote.
  3. A post contains following:
    • Title
    • Tags
    • Body containing text and image
  4. Any user (logged in) can comment on any post.
  5. Comments are shown as list in descending order by time.
  6. Use can delete his/her own comments or posts.
  7. Any user can vote any post or comment.
  8. A user's home page contains the top popular posts in the last 24 hours where:
    • Popularity = Upvotes - Downvotes

Non functional requirements:

  1. Scalabilty: Millions of daily users.
  2. Performance: 500 ms response time 99 percentile.
  3. Availability: Priority to availability over consistency as it is ok if the user won't see the latest data.
  4. Durability: Have posts and comments till the user doesn't delete it.


Design: Now that the requirements are clear. We will start with our design. 

Step 1:  API design:

We will first try to come up with the Rest APIs as per our functional requirements. For this we need to first identify what are the entities in our system and try to map those to URIs

  • Users: 
    • /users
    • /users/{user-id}
  • Posts:
    • /posts
    • /posts/{post-id}
  • Images
    • /posts/{post-id}/images
    • /posts/{post-id}/images/{image-id}
    • /posts/{post-id}/comments/{comment-id}/images
    • /posts/{post-id}/comments/{comment-id}/images/{image-id}
  • Comments
    • /posts/{post-id}/comments
    • /posts/{post-id}/images/{comment-id}
  • Votes
    • /posts/{post-id}/vote
    • /posts/{post-id}/comments/{comment-id}/vote
Let's see how the post looks like: (response of GET)

{
    post_id: string
    title: string
    tags: List of strings
    user_id: string
    upvotes: int
    downvotes: int
    body: json
}

Let's see how the comment looks like: (response of GET):

{
    post_id: string
    comment_id: string
    body: json
    user_id:
    upvotes: int
    downvotes: int
}

Now that we know the entities and URIs, its time to assign the HTTP method which wll ultimately results ino our final APIs:
  • Users:
    • POST /users/ - Create/signup a new user
    • POST /users/login - Login a user
  • Posts:
    • POST /posts - Create a new post 
    • GET /posts - View posts. Response of this requires pagination as it can contain even 1 million posts so the parameters can be:
      • limit
      • offset
      • user_id (optional)
    • GET /posts/post-id - View a post
    • DELETE /posts/post-id - Delete a post
  • Comments:
    • POST /posts/{post-id}/comments - Create new comment
    • GET /posts/{post-id}/comments - View post's list of comments
    • GET /posts/{post-id}/comment/{comment-id} - View a comment
    • DELETE /posts/{post-id}/comment/{comment-id} - Delete a comment
  • Votes:
    • POST /posts/{post-id}/vote - Upvote/downvote a post
    • POST /posts/{post-id}/comments/{comment-id}/vote - Upvote/downvote a comment
  • Images:
    • POST /posts/{post-id}/images - Upload a image to a post body
    • GET /posts/{post-id}/images/{image-id} - Get an image of post
    • POST /posts/{post-id}/comments/{comment-id}/images - Upload a image to a comment body
    • GET /posts/{post-id}/comments/{comment-id}/images/{image-id} - Get an image of a comment

Step 2: Mapping functional requirements to architectural diagram:

1. Browser only application: For this we will have web app service which will serve static web pages.


2. User Sign up / login: We will have a user service and this service have its own DB to store user info which we can choose as SQL DB as it's going to be structured data and also its not going to be huge.



3.  Create/Delete a post: Again for this we will have another microservice say post service which will have it's own DB. As these number of posts are going to be huge and also post schema, we may want to vary time to time, I am going to use Nosql DB for this.

Given we can upload an image in the post, we can use any blob store to save the imges so if the post contains the image, a request will go to web app service which will upload the image to blob store and then take the image-id and put it in the post body and send a request posts service to save the post

While viewing the post, first we will get the post content using the posts service and then when the browser sees the image url, it can directly fetch from the blob store using the url.





4. Posting comments / Deleting comments: 
5. Comments are shown in list in descending order of posting time:
We can use a different microservice Comments with it's own DB but we can merge it with Post service and use the same DB. As you see comments are tightly attached to post and also their schema is almost same except comments will have a post id attached to it. We also need to add a timestamp fields as we need to show comments in sorted order.

We can opt for any approach but I am choosing to have one micro service for posts and comments and calling it Post and Comment service.

If you see we are not satisfying FR 6: Deleting posts and comments too.


7. Upvote or downvote a post or comment: For this functionality, we will have a different service say Votes service with its on DB. Now we have a choice here:

We can just maintain a schema like folllows:

Post Id 

count

Comment Id 

count 


But the problem with the above schema is 
  • We can't restrict user to just vote once for a single post/comment.
  • FR 7: Can't get popular posts of last 24 hours.
That means we can't go wth the above schema. Hence here is the schema which I am proposing:

Post_id 

User_id 

Vote (+1 / -1) 

Timestamp 

Comment_id 

User_id 

Vote (+1 / -1) 

Timestamp 

 

With this we can at least achieve the functinalties including FR 7 too. We will see the performance part when we will address non functional requirements. 



8: Home page contains the top popular posts: This is the most trickest functional requirement of this design. If you see the voting data is with Votes service and post content is with Posts service. Getting popular posts when the api call has been made could be very expensive. As this list is there in the home page, we can't risk of having delays in the page load.

To solve this issue we are going to use CQRS pattern. We are now going to introduce a Ranking microservice which will pull voting data from Votes service and post content from Post service. It then sort this data based on ranking and save this data into its own nosql DB. Now whenever the call for popular posts comes, it will be redirected to ranking service.

There are some consideration here:
  • We know that the sliding window here is 24 hours so rankng service can query only the posts which are kind of active in this window.
  • Given we don't have to show always the most recent data, we can use batch processing in the ranking service as this operation is heavy.
So here is what ranking service will do at a regular interval:
  • Get the active votes from voting service for last 24 hours.
  • Group the votes (upvotes - downvotes) by post_id. 
  • Sort the post_ids in descending order according to votes.
  • Save it in the DB.



So above is the final design of our product which is satisfying all the functional requirements. Now let's move to non functional requirements.


Step 3: Mapping non functional requirements to architectural diagram: 

1. Scalablity: Given we are talkng about millions of active users, a single instance of service won't work. Obviously we need to have multiple instances of every services and a load balancer to balance the traffic.

Another scalability issue here is the large data of posts and comments so we are going to shard the Post and comments data.

For posts we can use hash based sharding with hashing on post_id. I know this can be a bottlenect if we have to show all the posts from a particular user as posts might be distributed among multiple shards but this is not even our functional requirement. If it is added then we can still serve it and with little tweaks we can serve it effficiently.

For comments we can't shard based on comment_id as in general the comments will be fetched according to post_id and if you see if we go with comment_id, we might end up retrieving comments from different shards which is a big performance issue. 

So what we can do is we can shard the comments based on post_id using hash based sharding. This will work well but it will create problem when a post become popular it starts having too many comments then what will happen that a particular shard will become too big and can become the scalability issue and also performance bottleneck.

To handle such issue we can use a compound key (post_id, comment_id) and we can apply range based sharding. If we use it, mostly we will get data from one shard or at most two shards.

In this way we can handle the scalability.



2. Performance: There are multiple ways to make this system performant.
  • Image load time can be greater while fetching from the blob store. To avoid that we can use CDNs where we can store images of popular posts. We can also use CDNs to serve the static html pages too.
  • We can use cache to store the most popular posts and it's content in order to avoid traffic on GET requests of posts which are ultimately be directed to Posts service or Ranking service. Even the cache is not up to date, it is fine as we have chosen availability over consistency so eventual consistency is just fine.
  • Index on post_id for post collection and compound index(post_id, comment_id) on comment collection really can help on speeding up the Get requests. We can do the similar indexing on other DBs too.
  • Now another problem is, for a post/comment to load, we also need to show the upvotes and downvotes of the post or comment but votes data are there in Votes service DB so to load the post we need to call two microservice. We can take following steps to make it faster to fetch all post data or comment data including votes from one microservice only:
    • Two fields will be added to post and comments collection schema:
      • upvotes_count
      • downvotes_count
    • Queue the voting event from Votes service to Posts service so post service worker will fetch the events and can bulk update the DB with upvotes and downvotes count.
I think we are good with the performance point of view and now our system looks like following:



3. Availability: To achieve the availability we can replicate the databases and our services. We can also have our system running and replicated in different regions. This will also boost the performance.

4. Durability: We are already achiving the durability when we replicated the data accross regions. We can still periodically backed up our data to cheaper storage like S3 in order to have backups. This can also help us to cleanup old data from DBs to boost the performance if required.

Now that we have handled our every functional and non function requirements. Here is our final design:




That's all for this system design problem!