Monday, October 28, 2024

Design YouTube

Problem: Design a highly scalable video on demand (VOD) streaming platform.

Requirements:

Before we jump into requirements, we should realize we have two different types of users:

  • Content creators
  • Viewers

If we observe how each group uses the platform, we can see that their requirements differ, both the functional and the nonfunctional ones.

1. Functional requirements:

a. Content creators:

  1. Upload any format/codec video.
    • Once the video is uploaded, it can be deleted but not modified.
  2. Each video should have metadata:
    • Mandatory: title, author, description.
    • Optional: List of categories / tags.
    • Metadata can be updated anytime.
  3. Get an email notification when the video is available publicly.
  4. Live streaming is not supported.

b. Viewers:

  1. Only registered users can view the content.
  2. Can search using free text across all video metadata.
  3. Can watch videos on any kind of device (desktop / phone / TV) and under any network conditions.

2. Nonfunctional requirements:

a. Content creators:

  1. Scalability:
    • Thousands of content creators.
    • ~1 video upload/week per creator.
    • Average video size ~50 GB (~50 TB/week in aggregate).
  2. Consistency:
    • We prefer consistency over availability here.
  3. Availability:
    • 99.9% availability.
  4. Performance:
    • Page load response time < 500 ms at the 99th percentile.
    • An uploaded video becomes available to view within hours.

b. Viewers:

  1. Scalability:
    • 100K-200K daily active users
  2. Availability:
    • 99.99% availability.
  3. Performance:
    • Search results and page loads < 500 ms at the 99th percentile.
    • Zero buffer time for video playback.


Design:

Step 1: API design:

As usual, for the API design let's first translate our functional requirements into a sequence diagram, which looks like the following:




Now we can easily identify the entities and URIs using the above sequence diagram:

  • Users
    • /users
  • Videos:
    • /videos
    • /videos/{video_id}
  • Search:
    • /search

Now let's assign HTTP methods:

  • Users:
    • POST /users: Create / register user
    • POST /users/login: Login a user
  • Videos:
    • POST /videos: Start a video upload
    • PUT /videos/{video_id}/metadata: Create/update metadata of video
    • GET /videos/{video_id}: Get the video url
    • POST /videos/{video_id}/play: Stream the video content to the client
    • DELETE /videos/{video_id}: Delete a video
  • Search:
    • GET /search
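To make this concrete, here is a minimal sketch of these routes, assuming FastAPI; the handler bodies, metadata fields, and example URL are illustrative placeholders rather than a real implementation:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class VideoMetadata(BaseModel):
        title: str            # mandatory
        author: str           # mandatory
        description: str      # mandatory
        tags: list[str] = []  # optional list of categories / tags

    @app.post("/users")
    def register_user():
        ...  # create / register a user

    @app.post("/users/login")
    def login_user():
        ...  # authenticate the user and return a session token

    @app.post("/videos", status_code=202)
    def start_upload():
        # Start an upload; returns a video id the creator uses afterwards.
        return {"video_id": "hypothetical-id"}

    @app.put("/videos/{video_id}/metadata")
    def upsert_metadata(video_id: str, metadata: VideoMetadata):
        # Create or update the metadata document for this video.
        return {"video_id": video_id}

    @app.get("/videos/{video_id}")
    def get_video(video_id: str):
        # Return the video_url, i.e. where the player can fetch the manifest.
        return {"video_id": video_id, "video_url": "https://example.com/manifest.mpd"}

    @app.delete("/videos/{video_id}", status_code=204)
    def delete_video(video_id: str):
        ...  # remove the video and its metadata

    @app.get("/search")
    def search(q: str, page: int = 0, size: int = 20):
        # Free-text search over video metadata, paginated.
        return {"query": q, "page": page, "results": []}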


Step 2: Mapping functional requirements to architectural diagram:

a.1 Content creator can upload any format/codec video:

b.3 Viewers can watch video on any device and at different network conditions:

I am taking both of these requirements together as they are interrelated: we need to ingest any video in such a manner that it can be played on any device and adapted to different network conditions.

Let's first understand what a video file is.

The video file is a container which contains:

  • Video stream
  • Audio stream
  • Subtitles
  • Metadata like codec, bitrate, resolution, frame rate

The binary representation of these containers (MPG/AVI/MP4, etc.) differs based on how the streams are encoded, and the algorithms used to encode and decode these streams are called codecs, e.g., H.264, VP9, or AV1.

Video captured by a camera is encoded with a lossless compression algorithm, which makes it suitable for professional editing, but the files are far too big, so this kind of codec is not suitable for streaming and storage at scale. The first step, therefore, is to apply a lossy compression algorithm. Converting one encoded stream into a differently encoded stream is called transcoding.

Size of a video (bits) = bit rate (bits/second) × video length (seconds)

So it is obvious that we need to reduce the bit rate in order to reduce the size of the video, but given that we must support different devices and different network bandwidths, we can't depend on a single bit rate. We need to produce multiple output files at different bit rates, which roughly scale with resolution. The standard resolutions are 360p, 480p, 720p, 1080p, and 4K.
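To get a feel for the numbers, here is a quick back-of-the-envelope sketch of this formula; the per-resolution bit rates below are typical illustrative values, not standards:

    # Size of a video (bits) = bit rate (bits/second) * video length (seconds)
    TYPICAL_BITRATES_BPS = {   # illustrative ladder
        "360p": 1_000_000,     # ~1 Mbps
        "480p": 2_500_000,     # ~2.5 Mbps
        "720p": 5_000_000,     # ~5 Mbps
        "1080p": 8_000_000,    # ~8 Mbps
        "4K": 25_000_000,      # ~25 Mbps
    }

    def video_size_gb(bitrate_bps: int, length_seconds: int) -> float:
        return bitrate_bps * length_seconds / 8 / 1e9  # bits -> bytes -> GB

    for res, bps in TYPICAL_BITRATES_BPS.items():
        print(f"{res}: {video_size_gb(bps, 10 * 60):.2f} GB for a 10-minute video")
    # e.g. 1080p: 8 Mbps * 600 s = 4.8 Gbit ~= 0.6 GB, versus ~50 GB for the raw upload.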

This will partially take care of supporting multiple devices, but it won't handle varying network conditions, like a home network being shared by multiple users or a user who is travelling. To support this requirement, we will use a technique called adaptive bit rate (ABR) streaming, or simply adaptive streaming.

In adaptive streaming, we break each rendition into small chunks of, say, 5 or 10 seconds, and put references to all of these chunked streams in a text file called a manifest (e.g., an MPD for MPEG-DASH). When the player starts a video, it first downloads this manifest, picks a default resolution (say 720p), and plays the first few chunks while measuring how fast they download. If the network can keep up, it switches to a higher resolution such as 1080p or even 4K; if there are download delays, it drops to lower-resolution chunks. The player keeps analysing chunk download speed to decide which resolution to fetch next.
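Here is a minimal sketch of that player-side loop, assuming a hypothetical fetch_chunk() helper that downloads one chunk of a given rendition from the manifest's URLs; real players (dash.js, hls.js) use more sophisticated heuristics such as buffer occupancy and smoothed throughput:

    import time

    # Rendition ladder, low to high; bit rates are illustrative.
    LADDER = [("360p", 1.0e6), ("480p", 2.5e6), ("720p", 5.0e6), ("1080p", 8.0e6)]

    def fetch_chunk(resolution: str, index: int) -> bytes:
        raise NotImplementedError  # hypothetical: HTTP GET of one chunk

    def play(num_chunks: int) -> None:
        level = 2  # start at a default resolution, say 720p
        for i in range(num_chunks):
            resolution, bitrate = LADDER[level]
            start = time.monotonic()
            data = fetch_chunk(resolution, i)
            elapsed = time.monotonic() - start
            throughput = len(data) * 8 / elapsed  # measured bits/second
            # Switch up if the network comfortably exceeds the next rendition's
            # bit rate; switch down if we cannot keep up with the current one.
            if level + 1 < len(LADDER) and throughput > 1.5 * LADDER[level + 1][1]:
                level += 1
            elif throughput < bitrate and level > 0:
                level -= 1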

The next step is to fully support every kind of device. For this we need to package our video content for different streaming protocols, since different OSes and browsers support different protocols. At this stage we can also apply DRM (Digital Rights Management) to protect our video, which supports FR b.1 (only registered users can view the content). Using DRM we can also support subscriptions if we want to introduce them later.

As you can see, there is a sequence of steps we need to perform to ingest a video and make it available across different devices, locations, and network bandwidths. We will use the pipes-and-filters pattern here to support it.

We will have a Video service as the public interface for this whole activity. This service will have its own DB, a NoSQL DB, to support a fluid schema. Here is the flow of a video upload:

  1. The content creator calls the Video service to upload a video.
  2. The Video service starts the upload to the object store asynchronously, saves the metadata in its DB, and returns a confirmation to the user with a video id.
  3. Once the upload to the object store completes, a message is queued to a new service, the Transcoding service.
  4. The Transcoding service first splits the video into 5-10 second chunks, transcodes each chunk into multiple resolutions, and uploads them to its object store. It also generates the manifest file for adaptive streaming (see the sketch after this list).
  5. The Transcoding service then queues a message to another new service, say a Packaging service.
  6. The Packaging service packages these streams according to the streaming protocols and saves them into its own object store.
  7. The Packaging service queues the video_id and video_url (which ultimately points to the manifest) back to the Video service, and the Video service updates its DB using the video_id.
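Here is a minimal sketch of step 4's chunking and transcoding, assuming the ffmpeg CLI is installed; the rendition ladder, chunk length, and paths are illustrative. A real worker would pull a message off the queue, run something like this, upload the output to the object store, and queue a message for the Packaging service:

    import subprocess
    from pathlib import Path

    # Illustrative rendition ladder: (name, height, video bit rate).
    RENDITIONS = [("360p", 360, "1M"), ("480p", 480, "2.5M"),
                  ("720p", 720, "5M"), ("1080p", 1080, "8M")]

    def transcode_to_hls(source: Path, out_dir: Path) -> None:
        """Re-encode `source` into ~5-second chunks per rendition, one playlist each."""
        out_dir.mkdir(parents=True, exist_ok=True)
        for name, height, bitrate in RENDITIONS:
            subprocess.run([
                "ffmpeg", "-i", str(source),
                "-vf", f"scale=-2:{height}",         # resize, keep aspect ratio
                "-c:v", "libx264", "-b:v", bitrate,  # lossy re-encode at target bit rate
                "-c:a", "aac",
                "-f", "hls", "-hls_time", "5",       # ~5-second chunks
                "-hls_playlist_type", "vod",
                str(out_dir / f"{name}.m3u8"),
            ], check=True)
        # A master manifest referencing all renditions would be generated here too.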

We could debate one point about the Transcoding service: we could introduce a separate service to split the uncompressed video into chunks and queue those chunks, so that the Transcoding service only transcodes them in parallel. But we can achieve the same parallelism with multithreading inside the Transcoding service itself. That way it is much easier to debug issues and to support restartability, since all the chunking and transcoding happens in a single service.

We can support FR a.3 (notify the creator when the video is publicly available) right here by adding a new service, say a Notification service. The Video service queues the video details to this service, and the Notification service sends the notification (email) to the content creator, as sketched below.
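The consumer side of that Notification service can be as simple as the following sketch, using Python's standard smtplib; the SMTP host, sender address, and message wording are placeholders:

    import smtplib
    from email.message import EmailMessage

    def notify_creator(creator_email: str, title: str, video_url: str) -> None:
        # Build the "your video is live" email.
        msg = EmailMessage()
        msg["From"] = "noreply@example.com"
        msg["To"] = creator_email
        msg["Subject"] = f'Your video "{title}" is now publicly available'
        msg.set_content(f"Watch it here: {video_url}")
        with smtplib.SMTP("smtp.example.com") as smtp:
            smtp.send_message(msg)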

With this knowledge, let's see what our architectural diagram looks like.



a.2 Update the video metadata:

a.1 Delete the video:

The user can simply call the Video service to perform these operations, so here is how the diagram looks now:



b.2 Viewers can search for videos against the video metadata:

To support search we need a separate service, say a Search service, whose DB is optimized for search, such as Elasticsearch. The Video service can queue metadata to this service, so you can assume this also becomes part of the video upload / metadata update / metadata deletion flows.

We also need to use pagination here.
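Here is a minimal sketch of such a paginated free-text query, assuming the Elasticsearch 8.x Python client; the index name and field list are illustrative:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

    def search_videos(query: str, page: int = 0, size: int = 20) -> list[dict]:
        # Free-text match across all metadata fields, with offset pagination.
        resp = es.search(
            index="videos",
            query={"multi_match": {
                "query": query,
                "fields": ["title", "author", "description", "tags"],
            }},
            from_=page * size,
            size=size,
        )
        return [hit["_source"] for hit in resp["hits"]["hits"]]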



b.3. Watch video on any device:

We have already done most of the work for this requirement. The client first calls the Video service to get the video_url, then downloads the manifest file, and then the client/player streams the chunks of video directly from the object store using adaptive streaming, as explained earlier.


So now we are done with every functional requirement.


Step 3: Mapping nonfunctional requirements to architectural diagram:

a.1 Content creator scalability:

There is not much to do for the first scalability requirement, as the upload frequency is low (~1 video/week per creator), which means on the order of thousands of video uploads per week. However, at any particular time we may still receive thousands of concurrent upload requests. To support those we can run multiple instances of the Video service and the Web app service.

To tackle the video size, we have already created a pipeline that compresses the video, but there is still one problem:

If we route the whole video through the Video service first and only then upload it to the object store, the process will consume a lot of resources, and since uploading an uncompressed video can take hours, we might end up scaling the Video service far more than necessary.

To resolve this, we can use presigned URLs from the object store. Presigned URLs grant limited permissions for a limited time. With presigned URLs, the video upload flow looks like this:

  1. The client sends a video upload request to the Video service.
  2. The Video service requests a presigned URL from the object store using its own permissions.
  3. The Video service returns the presigned URL in the response to the upload API call.
  4. The client uploads the video directly to the object store using the presigned URL (see the sketch after this list).
  5. The rest of the pipeline remains the same.
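Step 2 is a one-liner with most object stores; here is a sketch assuming S3 and boto3, with an illustrative bucket name and key layout:

    import boto3

    s3 = boto3.client("s3")

    def create_upload_url(video_id: str, expires_seconds: int = 3600) -> str:
        # Ask S3 for a URL that permits exactly one operation (a PUT to this
        # key) for a limited time, signed with the service's own credentials.
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": "raw-video-uploads", "Key": f"uploads/{video_id}.mp4"},
            ExpiresIn=expires_seconds,
        )

The client can then upload directly, e.g. curl -X PUT --upload-file movie.mp4 "<presigned url>".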

With this flow, the Video service no longer has to scale with upload traffic and we save a lot of resources. With these changes, here is how the architecture diagram looks:




a.2 Content creator availability:

We have already taken care of availability by running multiple instances of the Web app and Video services. We can use cloud-native services for video transcoding and packaging, such as AWS Elemental MediaConvert and AWS Elemental MediaPackage, to further support availability.

We can also replicate the Video service DB to support availability.



a.3 Content Creator Performance:

We have already taken care of performance for the most part. The only remaining problem is when many content creators try to upload videos at the same time; in such a scenario, we might not be able to complete the video processing pipeline within hours.

We need to parallelize this process, which means running multiple instances of the Transcoding service and the Packaging service.




a.4 Content Creator CP over AP:

To achieve this we just need to choose a Video service DB, or a DB configuration, that favors consistency over availability. That's all!
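For example, assuming the Video service DB were MongoDB (just one illustrative choice), majority write and read concerns push the metadata path toward CP:

    from pymongo import MongoClient
    from pymongo.read_concern import ReadConcern
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")  # placeholder URI
    videos = client["videodb"].get_collection(
        "videos",
        write_concern=WriteConcern(w="majority"),  # ack only after a majority of replicas
        read_concern=ReadConcern("majority"),      # read only majority-committed data
    )
    # During a network partition, a minority side cannot serve these reads or
    # writes, which is exactly the consistency-over-availability trade-off.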


b.1 Viewers Scalability:

To support 100K-200K daily active users, we have already scaled our Web app and Video services, but to scale the search functionality we also need multiple instances of the Search service. This takes care of the scalability of the services.

Since self-managed Elasticsearch is not cloud native, we can use AWS OpenSearch or Elastic Cloud on AWS, which autoscale themselves.




b.2 Viewers Availability: 

We have already done most of the work for availability, but since the availability requirement is higher here, we can also use a multi-region deployment and a global load balancer. This will help with performance too.


b.3. Viewers Performance:

We are using adaptive bitrate streaming to achieve zero buffer time, but as you can see we still need to download the initial chunks from the object store, which can be slow and expensive, so we can use a CDN. We can keep the initial chunks of each video in the CDN to speed up the initial download, and then the client can use adaptive streaming to fetch the right chunks.

Note that we could put the whole video on the CDN too, which would definitely improve performance and perceived quality, but it can be very expensive, so I am opting for caching only the initial chunks of videos.





 

b.4. Viewer AP over CP:

We are already using a lot of async operations and message brokers, which favor availability with eventual consistency. For search, we already receive metadata updates over a queue, and Elasticsearch itself favors AP over CP, so this requirement is already satisfied.


With this we are done with every functional and nonfunctional requirement, and here is how our final architectural diagram looks:





That's all!

* Note that we should also have a User service to support user login and registration, but that is straightforward, so I have not discussed it here. Given that the user volume is in the hundreds of thousands, it doesn't require anything special.
