Problem: Design a highly scalable rideshare service.
Requirements:
1. Functional requirements:
- Only registered users can request a cab.
- Driver registration is manual and is out of scope here.
- Driver and user needs to login to request and receive the ride.
- Driver needs to make himself available to receive the rides.
- Each driver can be matched to multiple riders depends on users and the occupancy of the car.
- Price of the ride depends on base flat fee, distance and duration.
- Exact formula is not given and is subject to change
- We need to minimize the:
- Wait time of users
- Distance / Time for drivers to pickup the user.
- Collect payment information from users.
- Collect bank information of drivers to pay them.
- Driver should start/end the trip.
- For actual payment process, we can use third party services.
2. Nonfunctional requirements:
- Scalability:
- 100 M to 200 M daily users.
- 100 K to 200 K drivers.
- Peak hours support where demand is 10 times of average.
- Collect driver's location every ~5-10 seconds =~ 3B messages / day
- Availability:
- 99.99% uptime or higher
- Performance:
- Match users with drivers 10 seconds at 99 percentile
- AP over CP:
- Higher availability is required here over consistency.
Design:
Step 1: API Design:
This looks like a very complex system which makes it even more difficult to come up with the all the APIs which are needed for the system to work.
However we will use sequence diagram to make it simple which we are doing in our previous design problems:
Just one thing to notice, given the sequence diagram, we can see the need of having a bidirection connection between server & drivers and also server & riders.
From the sequence diagram we can now have the clear APIs:
Drivers:
- POST /drivers/login
- POST /drivers/{driver_id}/join
- POST /drivers/{driver_id}/location
- POST /trips/{trip_id}/start
- POST /trips/{trip_id}/end
Riders:
- POST /riders/ - Register a rider
- POST /riders/login
- POST /rides
Step 2: Mapping functional requirements to architectural diagram:
1. Only registered users can request a cab.
2. Driver and user needs to login to request and receive the ride.
7. Collect payment information from users.
8. Collect Bank information of drivers to pay them.
To serve these requirements 1, 2, we can have a single User service. User service will have a SQL DB which contains two tables:
Rider:
- rider_id
- username
- password
- age
- image_url
Driver:
- driver_id
- username
- password
- license
- vehicle
- image_url
For the profile image we will store the image in object store and put the image url in DB.
To support requirement 7 and 8, we will have a separate service Payment Service which will have it's own SQL DB. This DB will contain the RiderPaymentInfo table and DriverPaymentInfo table.
RiderPaymentInfo:
- user_id
- credit_card
- billing address
DriverPaymentInfo:
- driver_id
- bank_name
- account_number
This payment service will connect with credit card companies / banks for the payment processing.
Here is how our architectural diagram looks like:
3. Driver needs to make himself available to receive the rides:
For this Driver needs to call join API and keep sending it's location. We need to have a new service say Driver Service which will create a bidirection connection using web sockets when Driver call the join API. This will have following benefits:
- Driver needs to send location very frequently so it will remove the overhead of TCP connection every time.
- Server can also push message to driver such as pick_up rider etc.
To record these location we will use a new service say Location Service. Driver service will continue to queue these locations to Location service. We will use a write performant LSM tree based NoSQL DB to store these locations updates. This will contain driver_id and location like lat and long and we continue to update this record until the driver is booked for a trip.
6. Rider can request the ride and the driver must be matched with minimum wait time:4. Each driver can be matched to multiple riders depends on users and the occupancy of the car:
To request a ride, we will have a new service called Rider Service. Upon requesting the ride, a bidirectional websocket connection has been stablished between rider and this service as we need to send the updates to the riders.
To match a rider to a driver we will have a new service Matching Service. Let's double click into this matching service to see how to match a driver.
First approach which directly comes into our mind is simple:
- Take the lat, long of rider.
- Draw a circle centered at rider's location of radius of 2-3 KM and try to find drives within this circle
- Given the location of drivers is already present in Location Service DB, we can use this DB to get the list of drivers.
- Take the driver with the least distance.
However the direct distance of two points is not good measure because of road structures, traffic etc.
Given that the ETA calculation is complex and is not our functional requirement, we can use a third party ETA calculation service. With this third party service here is how our flow changes:
- Matching service sends the rider location to Location service.
- Location service finds all the closed drivers as per our previous approach of 2-3 KM radius circle.
- Location service now sends the list of pairs {user location, driver location} to third party service.
- Third party service sends the ETA of every pair of locations to Location service.
- Location service send these pairs {driver_id, ETA} back to Matching service to make the final decision.
There are few considerations for matching the driver based on that we can match the driver:
- Match driver with the lowest ETA
- Math the driver who is about to finish the trip nearby rider.
- Allow multiple riders to share a ride
- Prioritize drivers whose rating is high for high rated rider.
- Permium service
- Prioritize drivers who have earned very less.
Once the Matching service matches the driver, it needs to store this info somewhere. For this we will have new service say Trip service with it's own SQL DB which will contain source and destination, user_id, driver_id, fare, start_time, end_time, distance etc. Here are the next set of steps:
- Create a new Trip using the Trip service. Get the trip_id in response.
- Send the signal to Location service to tell it that Driver is booked. It will also send the trip_info. The trip_info will contain trip_id and rider_id for sure.
- Location service will add two new columns: Driver state and Trip Info.
- Matching service now send the trip info and driver details including location to Rider service which will push this info to Rider.
- Similarly Matching service send the trip info and rider details including location to Driver service which will push this info to driver
- Now when driver send the location to Driver service, Location service append this record to it's DB and also send this location update to Rider service which will update the rider.
You can see how much this web socket connection is useful in these scenarios.
9. Driver should start/end the trip:
Upon reaching the source/destination or where the user wants a drop, driver will start/end the trip by calling the APIs. This will be redirected to trip service. Now Trip service will notify Location Service to change the status of driver as In_Trip / Free.
Trip service can get the details of locations from Location Service to calculate the distance and everything else for the payments etc.
5. Price of the ride depends on base flat fee, distance and duration:
10. For actual payment process, we can use third party services:
Let's come to the Post trip part where we need to collect the payment from rider and send driver fee to driver account. We also need to send this info as an Email to rider and driver.
Here is the flow:
- Trip servivce will calculate the ride cost and driver fee.
- Trip service will queue the trip_info including user_id, driver_id and payment_info to Payment Service.
- Payment service will interact with third party service to mak the payment.
- Once the payment is successful, It will queue two messages; one for user and one for driver including trip info and their corresponding payment_info to new service Notification Service.
- Notification service will send email to driver and user.
That's all about the functional requirements
Step 3: Mapping non functional requirements to architectural diagram:
1. Scalability:
Given the traffic we need to run multiple instances of every service. Even the notification service is not public facing service but still at the time of peak, we might need to scale up this service too.
Now the problem is about the Driver service and Rider service as these are having web socket persistent connections. How the location service will come to know which instance to connect to send driver location to rider or how Matching service will know which instance to connect to send trip info to driver.
To solve this issue, we will have a new service called Connection Manager which will have its very efficient key value pair NoSql DB like Redis to maintain this mapping. Now Location Service or Matching Service can query Connection Manager Service to get the right instance.
For this we just need to add replica of our DBs in order to achieve availability.
3. Performance:
We need to match Rider to Driver with 10 seconds. Let's see what are major steps we have for match:
- Get the closest drivers list to the rider.
- Get the ETA of these drivers from third party service.
- Take the Driver with the lowest ETA.
If you see we can't do anything about step 2 and step 3 is fairly simple. Let's see how we can get the list of closest drivers.
We need to calculate the distance of every free/in_trip driver with the user's location which is fairly expensive. Say somehow we are able to apply certain index on Location DB over Lat and Long (I am not sure how we will use it), it still be very expensive.
Also these floating point calculations are little time consuming for computers. It looks like we won't able to achieve our ETA.
We will use Geohash here to divide the earch into geo cells. Now these geo cell ids will be stored in the Location DB as per the driver locations. That means we are going to add a new column geo_cell_id in the Location DB.
Now the circle which we are drawing centered at User's location can have around 3 or 4 geo cells. Here is how the flow looks like for the step 1 of getting the closes drivers:
- Get the geo cell id as per the user's location.
- Get all the geo cell ids which is covered by 2 KM radius circle
- Now the query to DB will be something like SELECT * from Location where geo_cell_ids in [list of geo cells]
To make the above query we can shard the Location DB based on geo_cell_id or create a hash index over it.
That's all about the performance. Now that we have covered every functional and non function requirements, here is our final architectural diagram:
I know the diagram became a bit messy so I'll improve it later!