Friday, October 18, 2024

Design Instagram

Problem: Design a highly scalable image-sharing social media platform like Instagram.

Requirements:

1. Functional Requirements:

  1. Only registered users can access the platform. They need to provide the following info while registering:
    • Mandatory: first name, last name, email, phone number, profile image, password
    • Optional: age, sex, location, interests etc.
  2. Users can share/post only images
    • We need to design it so that it can be extended to videos or text.
  3. Search for a user using different attributes
  4. Unidirectional relationship: User A following User B does not mean that User B also follows User A.
  5. Load a timeline of the latest images posted by the people they follow, sorted by recency in descending order
 

2. Non-functional Requirements:

  1. Scalability:
    • ~1-2 billion active users
    • 100-500 million visits / day
    • Each user uploads ~1 image / day
    • Each image size ~2 MB.
    • Data processing volume: ~1 PB / day
  2. Availability: Prioritize availability over consistency, as it is okay if a user does not see the latest data. We are targeting 99.99% here.
  3. Performance:
    • Response time < 500 ms at the 99th percentile.
    • Timeline load time < 1000 ms at the 99th percentile.
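
As a rough sanity check on the data volume figure above, here is a back-of-the-envelope calculation. It assumes (this is my assumption, not part of the requirements) that only the users who visit on a given day upload, one image each, and it uses decimal units (1 PB = 10^15 bytes).

```python
# Back-of-the-envelope estimate of the raw image volume uploaded per day.
# Assumption: only users who visit on a given day upload, one image each.
MB = 10**6
PB = 10**15

IMAGE_SIZE_BYTES = 2 * MB
UPLOADS_PER_USER_PER_DAY = 1


def daily_upload_volume_pb(daily_visitors: int) -> float:
    """Return the raw upload volume per day in petabytes."""
    total_bytes = daily_visitors * UPLOADS_PER_USER_PER_DAY * IMAGE_SIZE_BYTES
    return total_bytes / PB


low = daily_upload_volume_pb(100_000_000)   # 100 million visits / day
high = daily_upload_volume_pb(500_000_000)  # 500 million visits / day
print(f"{low:.1f} PB/day to {high:.1f} PB/day")
```

Under these assumptions the upper end of the visit range is what produces the ~1 PB / day figure.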


Design:

Step 1: API design:

For the API design, let's first translate our functional requirements into a sequence diagram, which should look as follows:


Looking at the above, we can easily identify the entities and URIs:

  • Users
    • /users
    • /users/{user-id}
  • Images
    • /images
    • /images/{image-id}
  • Search
    • /search
Now let's add the HTTP methods:

  • Users 
    • POST /users - Create user / register user
    • POST /users/login - Login a user
    • POST /users/{user-id}/follow - Follow a user
    • DELETE /users/{user-id}/follow - Unfollow a user
    • GET /users/{user-id}/timeline - Get the user's timeline. This requires pagination.
  • Images
    • POST /images - Post an image
    • GET /images/{image-id} - Get an image
  • Search
    • GET /search - Search for a user
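
Since the timeline endpoint requires pagination, here is a minimal sketch of one way to do it with a cursor. The `limit` and `before` parameters and the `get_timeline_page` function are hypothetical names chosen for illustration; the sketch also assumes timestamps are unique per timeline, otherwise a tie-breaker such as the post id would be needed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Post:
    post_id: str
    timestamp: int  # e.g. epoch seconds


def get_timeline_page(
    posts_newest_first: List[Post],
    limit: int = 20,
    before: Optional[int] = None,
) -> Tuple[List[Post], Optional[int]]:
    """Return one page plus the cursor for the next page (None when exhausted).

    `before` is the timestamp of the last post the client has already seen.
    """
    if before is not None:
        candidates = [p for p in posts_newest_first if p.timestamp < before]
    else:
        candidates = posts_newest_first
    page = candidates[:limit]
    has_more = len(candidates) > limit
    next_cursor = page[-1].timestamp if has_more else None
    return page, next_cursor
```

A client would call this first without `before`, then keep passing the returned cursor back until it receives `None`.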


Step 2: Mapping functional requirements to architectural diagram:

Before we move to our main functional requirements, I want to add a service, say "Web app service", to serve the HTML pages, since we support desktop web browsers too.

1. Registering user to our platform:

We will have a User service with its own DB, which will handle login and registration of users. The User service database will have the following fields as per FR#1:

  • user_id
  • user_name
  • first_name
  • last_name
  • email
  • password
  • phone_number
  • profile_image_url
  • and some optional fields like age, sex, interests, location etc.
As you can see, we have many optional fields that can change over time: some may be removed and others added. That means our schema is fluid, which is why we are going with a NoSQL document DB. With ~1-2 billion users, schema changes on a fixed-schema DB would become a considerable overhead.

In the schema we store the profile image URL instead of the profile image itself because DBs are not optimized for blob storage. Instead, we store the profile image in a blob store or object store and save its URL in our DB.
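
To illustrate the fluid schema, here is a small sketch of how user documents could be built so that the mandatory fields are enforced while optional fields vary per user. The `make_user_document` helper is a hypothetical name, and a plain dict stands in for a document in whichever document DB is chosen.

```python
from typing import Any, Dict

MANDATORY_FIELDS = (
    "user_id", "user_name", "first_name", "last_name",
    "email", "password", "phone_number", "profile_image_url",
)


def make_user_document(mandatory: Dict[str, str], **optional: Any) -> Dict[str, Any]:
    """Build a user document; optional fields are stored only when provided."""
    missing = [f for f in MANDATORY_FIELDS if f not in mandatory]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    doc: Dict[str, Any] = dict(mandatory)
    # Each document carries only the optional fields that user supplied,
    # so adding or dropping an optional field needs no schema migration.
    doc.update({k: v for k, v in optional.items() if v is not None})
    return doc
```

Two users can then have differently shaped documents, for example one with `age` and `location` and another with only `interests`. Note that the image itself never appears in the document, only `profile_image_url`.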

Once registration is completed, the client will get a user id and an auth token, which it will use for further operations.



2. Post an image:

Let's have a Post service for this. We are not calling it an Image service so that we can extend it to any type of post. This service will have its own DB, which contains the following fields:

  • post_id
  • user_id
  • post_url
  • timestamp
  • post_type - This will be image by default but can be extended to video etc.
Since this is a fixed schema, I am going to use a SQL DB for storing this info. Here is the flow of an image upload:
  1. The user sends a request to the Post service to share an image.
  2. The Post service sends the image to the object store.
  3. The object store returns the URL of the uploaded image.
  4. The Post service saves the image metadata and URL in the DB.
  5. The Post service returns a confirmation to the user.
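
The upload flow above can be sketched roughly as follows. Everything here is a stand-in for illustration: `InMemoryObjectStore` plays the role of the object store, a dict plays the role of the SQL table, and the `objects.example.com` base URL is made up.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict


class InMemoryObjectStore:
    """Stand-in for a blob/object store: stores bytes, returns a URL."""

    def __init__(self, base_url: str = "https://objects.example.com") -> None:
        self._blobs: Dict[str, bytes] = {}
        self._base_url = base_url

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex
        self._blobs[key] = data
        return f"{self._base_url}/{key}"


@dataclass
class PostRecord:
    post_id: str
    user_id: str
    post_url: str
    timestamp: float
    post_type: str = "image"  # default type; could later be "video" etc.


class PostService:
    def __init__(self, object_store: InMemoryObjectStore) -> None:
        self._object_store = object_store
        self._posts: Dict[str, PostRecord] = {}  # stand-in for the SQL table

    def create_post(self, user_id: str, data: bytes, post_type: str = "image") -> PostRecord:
        post_url = self._object_store.put(data)  # steps 2-3: upload, get URL
        record = PostRecord(uuid.uuid4().hex, user_id, post_url, time.time(), post_type)
        self._posts[record.post_id] = record     # step 4: save metadata + URL
        return record                            # step 5: confirmation

    def get_post(self, post_id: str) -> PostRecord:
        return self._posts[post_id]
```

The DB only ever sees the metadata and `post_url`; the image bytes go to the object store.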


3. Search users using different attributes:

We could use the same User service to search for users, but its DB is not optimized for search. As we don't know in advance which attributes are searchable or what kind of search we are going to support, it's better to use a different service, say a Search service, with a DB optimized for search, such as Elasticsearch or another Lucene-based DB.

Whenever a new user is added or a user's record is updated, we can queue the change to the Search service, which will then update its DB.
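
Here is a toy sketch of that queue-driven indexing. It is not how Elasticsearch works internally; a tiny in-memory inverted index stands in for the search DB, `queue.Queue` stands in for the message queue, and the `SearchService` class with its method names is hypothetical.

```python
import queue
from collections import defaultdict
from typing import Dict, Set


class SearchService:
    def __init__(self) -> None:
        self.events = queue.Queue()  # User service enqueues user records here
        self._index: Dict[str, Set[str]] = defaultdict(set)  # token -> user_ids
        self._docs: Dict[str, dict] = {}                     # last indexed record

    @staticmethod
    def _tokens(user: dict) -> Set[str]:
        # Index every string attribute except the id itself.
        return {v.lower() for k, v in user.items()
                if k != "user_id" and isinstance(v, str)}

    def process_pending(self) -> int:
        """Drain the queue and apply each create/update to the index."""
        processed = 0
        while True:
            try:
                user = self.events.get_nowait()
            except queue.Empty:
                return processed
            user_id = user["user_id"]
            old = self._docs.get(user_id)
            if old is not None:  # update: drop tokens from the stale record
                for token in self._tokens(old):
                    self._index[token].discard(user_id)
            self._docs[user_id] = user
            for token in self._tokens(user):
                self._index[token].add(user_id)
            processed += 1

    def search(self, term: str) -> Set[str]:
        return set(self._index.get(term.lower(), set()))
```

Because indexing happens asynchronously, a freshly registered user only becomes searchable after the event is processed, which fits the availability-over-consistency requirement.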




4. Follow/Unfollow user:

We could create another service to handle follow/unfollow activity, but I think that is unnecessary. We can use the existing User service; we just need to create another collection with the following fields:

  1. follower_user_id
  2. target_user_id 
  3. target_user_name
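
A minimal sketch of that collection, with an in-memory dict standing in for the real one. The `FollowStore` class and its method names are hypothetical; the point is that each record is a directed edge, so following is one-way.

```python
from typing import Dict, List, Tuple


class FollowStore:
    def __init__(self) -> None:
        # (follower_user_id, target_user_id) -> target_user_name
        self._edges: Dict[Tuple[str, str], str] = {}

    def follow(self, follower_user_id: str, target_user_id: str, target_user_name: str) -> None:
        # Backs POST /users/{user-id}/follow; re-following is a no-op overwrite.
        self._edges[(follower_user_id, target_user_id)] = target_user_name

    def unfollow(self, follower_user_id: str, target_user_id: str) -> None:
        # Backs DELETE /users/{user-id}/follow; unfollowing twice is harmless.
        self._edges.pop((follower_user_id, target_user_id), None)

    def following(self, follower_user_id: str) -> List[str]:
        return sorted(t for (f, t) in self._edges if f == follower_user_id)

    def followers(self, target_user_id: str) -> List[str]:
        return sorted(f for (f, t) in self._edges if t == target_user_id)
```

If A follows B, then `following("A")` contains B while `following("B")` stays empty until B follows A separately.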



5. Loading the timeline of the user:

This is the most complex problem. Within the current design, here is how we can achieve it:

  • The client sends the request to the User service.
  • The User service gets all the users whom the user is following.
  • It then queries the Post service for the posts of all those users, sorted by timestamp, with a page size of 20-50.
  • It sends one page of posts, sorted by timestamp, to the client.
  • The client can then download the images from the object store using the post_urls.
This will solve the problem, but it is very inefficient. I know we are not addressing the non-functional requirements yet, but we should still think about performance.

To make it efficient, we will use the CQRS pattern. We will have a new service called the Timeline service. It will have an efficient key-value DB where the key is the user_id and the value is the list of post records of all the users followed by that user. Every post record will have {post_url, user_id, timestamp}.

Now here is what we do with this new service:

  • The Post service will queue new posts, containing post_url, user_id and timestamp, to the Timeline service.
  • The Timeline service will take the user_id and get its followers from the User service.
  • It then adds this post to the front of each follower's list of post records. It can remove the oldest posts if a list has more records than we need to show in the timeline.
With this new service, our flow for showing the timeline becomes straightforward:
  • The client requests the timeline from the Timeline service.
  • The Timeline service returns the list of post records from its key-value DB, using the requesting user's user_id as the key.
  • The client can then download the images from the object store using post_url.
This will be eventually consistent, but that's okay as per our non-functional requirements. It will be much more efficient than the earlier design.
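
The write and read paths of the Timeline service can be sketched like this. It is a simplified in-memory model: a dict of bounded deques stands in for the key-value DB, the `followers_of` callable stands in for the call to the User service, and `max_entries` is a hypothetical cap. It also assumes posts arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Iterable, List, NamedTuple


class PostRecord(NamedTuple):
    post_url: str
    user_id: str
    timestamp: int


class TimelineService:
    def __init__(self, followers_of: Callable[[str], Iterable[str]], max_entries: int = 500) -> None:
        self._followers_of = followers_of
        # user_id -> newest-first post records; a full deque drops its oldest
        # entry (the right end) when a new one is pushed on the left.
        self._timelines: Dict[str, Deque[PostRecord]] = defaultdict(
            lambda: deque(maxlen=max_entries)
        )

    def on_new_post(self, post: PostRecord) -> None:
        """Write path: push the post to the front of every follower's list."""
        for follower_id in self._followers_of(post.user_id):
            self._timelines[follower_id].appendleft(post)

    def get_timeline(self, user_id: str, limit: int = 20) -> List[PostRecord]:
        """Read path: a single key lookup, no joins across services."""
        timeline = self._timelines.get(user_id)
        return list(timeline)[:limit] if timeline else []
```

All the expensive work happens when a post is written, so reading a timeline is just one lookup by user_id.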





Step 3: Mapping nonfunctional requirements to architectural diagram:


1. Scalability: 

We have the following scalability requirements here:

  • Number of users: Data for 1-2 billion users will be huge, so we can't rely on just one DB instance; we have to shard the User service DB. We can shard using a hashing technique:
    • User DB: Shard on the basis of user_id.
    • Follower DB: Shard based on target_user_id, as our main use case is getting a user's followers, which is a call from the Timeline service.
  • For search we are already using Elasticsearch, and we can shard it too.
  • Number of visits: To support this number of visits, we have to run multiple instances of the different services behind a load balancer.
  • Number of posts: This has two parts:
    1. Data in the DB: This will also be huge, so we have to shard the Post DB as well as the Timeline DB. The Post DB can be sharded using post_id and the Timeline DB using user_id.
    2. Image data: As per the requirements, we need to save petabytes of data because we are receiving uncompressed images. Not only will these images take lots of space, but they are also not optimized for viewing on a mobile device or in a browser. We can introduce an async image processing pipeline, like AWS Serverless Image Handler, to compress and resize these images, which can bring this down from petabytes to terabytes.
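
As an illustration of hash-based sharding, here is a minimal sketch. The `shard_for` function and the shard count of 8 are hypothetical. It uses plain modulo hashing, which is the simplest variant; changing the number of shards remaps most keys, which is why consistent hashing is often preferred when shards are added or removed.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key (user_id, post_id, target_user_id...) to a shard index."""
    # A stable hash is required: Python's built-in hash() is randomized per
    # process for strings, so it would route the same key differently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 8  # hypothetical value for illustration
user_shard = shard_for("user-12345", NUM_SHARDS)      # User DB: by user_id
follower_shard = shard_for("user-12345", NUM_SHARDS)  # Follower DB: by target_user_id
```

Sharding the Follower DB on target_user_id keeps all followers of one user on a single shard, so the Timeline service's "get followers" call touches only one shard.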



2. Availability:

By running multiple instances of our services behind a load balancer, we have almost achieved the availability target. We can add replicas of our DBs to meet the availability requirements. Additionally, we can do a multi-region deployment and have a global load balancer in case one region goes down. This will also improve performance.



3. Performance:

We have two performance benchmarks:

  • 500 ms for every page load: To achieve this, we can use CDNs to serve HTML pages, CSS and post images.
  • 1 second for timeline page load: We have already decided to use the CQRS pattern and created the Timeline service to serve the timeline page. This takes care of the read performance, but it creates a different problem. There are a few thousand users, whom we call celebrities / influencers, with millions of followers each. When a celebrity makes a post, millions of entries in the Timeline service DB have to be updated, which might slow down the DB and hence hurt performance. To tackle this, we can take the following steps:
    1. Define which users are celebrities; say, a celebrity is a user with at least 1 M followers.
    2. Add a field in the User DB, IsCelebrity, which tells whether the user is a celebrity or not.
    3. When a celebrity posts an image and the Timeline service calls the User service to get the followers list, the User service returns IsCelebrity = true instead of the followers list.
    4. Add another key-value store, say Celebrity Post DB, to the Timeline service DB, where the key is celebrity_user_id and the value is the sorted list of that celebrity's post records.
    5. When the Timeline service receives IsCelebrity = true, it just adds the post to the Celebrity Post DB.
    6. While loading the timeline, the Timeline service gets the list of celebrities the user is following from the User service, using a new REST endpoint on the User service, say GET /users/{user-id}/celebrity.
    7. The Timeline service then merges the sorted results it gets from the celebrity key-value store and the user key-value store and returns the result to the user.
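
The merge in the last step can be sketched with a k-way merge over lists that are already sorted newest first. The `load_timeline` function and its parameters are hypothetical names; plain lists and a dict stand in for the two key-value stores.

```python
import heapq
import itertools
from typing import Dict, List, NamedTuple, Sequence


class PostRecord(NamedTuple):
    post_url: str
    user_id: str
    timestamp: int


def load_timeline(
    user_timeline: Sequence[PostRecord],
    celebrity_posts: Dict[str, Sequence[PostRecord]],
    followed_celebrities: Sequence[str],
    limit: int = 20,
) -> List[PostRecord]:
    """Merge the precomputed timeline with followed celebrities' post lists.

    Every input list must already be sorted by timestamp, newest first.
    """
    sources = [user_timeline]
    sources += [celebrity_posts.get(c, []) for c in followed_celebrities]
    # heapq.merge is lazy, so only about `limit` records are actually pulled.
    merged = heapq.merge(*sources, key=lambda p: p.timestamp, reverse=True)
    return list(itertools.islice(merged, limit))
```

This trades a little extra work at read time for avoiding millions of writes each time a celebrity posts.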
And that's all about performance. Now that we have addressed every requirement, functional and non-functional, here is our final architecture diagram:



Have fun!

