Dropbox
the educative course seems to have been copied from here: https://chainertech.com/post/system-design-let-s-make-a-cloud-storage
1. Why (Brief)
- Availability: your data, anywhere, anytime
- Reliability and Durability: very high durability (it's much harder for replicated cloud storage to lose data than your phone/laptop HDD)
- cloud services always have backups of servers in different locations
- Scalability: never run out of storage space (if you pay)
2. requirements (given + clarified)
- users should be able to upload and download their files/photos from any device
- users should be able to share files or folders with other users
- our service should support automatic syncing between devices (updates on one device should be reflected on all other devices) - broadcasting
- the system should support storing large files up to 1 GB.
- ACID is required. Atomicity, Consistency, Isolation and Durability of all file operations should be guaranteed.
- our system should support offline editing. users should be able to add/delete/modify files while offline, and as soon as they come online, all their changes should be synced.
extended requirements
- the system should support snapshotting of their data, so users can go back to any version of the files (like git)
some design considerations (initial thoughts on the system)
- expect huge read and write volumes
- read/write ratio is expected to be roughly 1:1
- internally, files can be stored in chunks (of about 4MB). this can provide a lot of benefits:
- failed operations need only be retried for the affected chunks, not the whole file
- if an upload fails, we can resume from the failed chunk
- we can reduce the amount of data exchanged by only transferring updated chunks
- we can deduplicate chunks (to save storage and bandwidth), since chunks with identical content hash to the same value
- keeping a local copy of the metadata (file name, size etc) with the client can save us round trips to the server: the client can list files and detect local changes without asking the server first
- for small changes, the client can intelligently upload the diffs instead of a whole chunk. (again, like git)
the atomic unit for this system is a chunk.
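the chunking ideas above can be sketched roughly like this — the 4MB size, SHA-256 hashing, and function names are assumptions for illustration, not Dropbox's actual scheme:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # ~4MB, as suggested above

def chunk_file(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into fixed-size chunks keyed by content hash.

    Identical chunks collapse to one dict entry, which is the basis
    for deduplication; the file itself is just an ordered hash list.
    """
    chunks = {}
    order = []
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunks[digest] = chunk
        order.append(digest)
    return order, chunks

def changed_chunks(old_order, new_order):
    """Only chunks whose hash the server hasn't seen need to be uploaded."""
    seen = set(old_order)
    return [h for h in new_order if h not in seen]
```

a failed or interrupted upload can then resume by re-sending only the hashes in `changed_chunks` that the server hasn't acknowledged yet.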
high level design (how it will work)
- user specifies folder as a cloud workspace on their device.
- any file/photo/folder placed in this folder will be uploaded to the cloud storage
- any modifications / deletions will be reflected on the cloud storage
- user can specify similar workspaces on all their devices
- any changes on one device will be synced to all other devices
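a minimal sketch of how a client could detect changes in the workspace between sync runs, by hashing file contents — real clients use filesystem events and cached metadata, so this is just an illustration of the idea:

```python
import hashlib
import os

def snapshot(folder):
    """Map each file path (relative to the workspace) to a content hash."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                state[os.path.relpath(path, folder)] = hashlib.sha256(f.read()).hexdigest()
    return state

def diff(old, new):
    """Compare two snapshots: what was added, deleted, or modified."""
    added    = [p for p in new if p not in old]
    deleted  = [p for p in old if p not in new]
    modified = [p for p in new if p in old and new[p] != old[p]]
    return added, deleted, modified
```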
what we need to store
- files
- metadata info about the files
- file name
- file size
- directory
- etc.
- who the file is shared with
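the metadata above could be modeled as a record like this — field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class FileMetadata:
    """One metadata record per file; the file's bytes live separately as chunks."""
    file_id: str
    name: str
    size_bytes: int
    directory: str
    chunk_hashes: list                            # ordered chunk hashes making up the file
    version: int = 1                              # bumped on every change, useful for sync/snapshots
    shared_with: set = field(default_factory=set) # user ids the file is shared with
```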
what services we need to manage them
- servers to help upload/download files to cloud storage
- servers to help update metadata about files and users
- some mechanism to notify all clients whenever an update happens so they can synchronize their files (publish-subscribe messaging pattern, like firebase realtime database)
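the publish-subscribe mechanism can be sketched as a toy in-memory hub — each of a user's devices subscribes to the user's channel, and any metadata update is published so the other devices know to pull the change (class and method names are assumptions):

```python
from collections import defaultdict

class Notifier:
    """Toy pub/sub hub: user_id -> list of subscriber callbacks (one per device)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, user_id, callback):
        """Register a device's callback on the user's channel."""
        self.subscribers[user_id].append(callback)

    def publish(self, user_id, event):
        """Fan an update event out to every device subscribed for this user."""
        for cb in self.subscribers[user_id]:
            cb(event)
```

in a real system this would be long-polling, websockets, or a message queue rather than in-process callbacks.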
an example of a high-level diagram for dropbox:

in this instance, a 'block server' is a server that is even more bare-metal than an object/file server: it stores data as raw blocks. a file system / operating system is needed on top of it to manage the data in the block server. think of it as an unformatted HDD (it is formatted for neither Windows nor macOS; it's just a raw disk.)
when compared to file storage or object storage, block storage offers the highest performance and flexibility, at the cost of added complexity and a higher price
3. capacity estimation and constraints
assumptions
- 500M total users, 100M daily active users (DAU)
- on avg each user connects to 3 different devices
- on avg each user has 200 files/photos
- on avg each file size is 100KB <span class="text-highlight">this does not sound plausible</span>
file size references
ref 1
ref 2
- from the above reference, it seems an average user's file is more likely to be around the 400KB - 3MB range. let's be conservative and say 2MB
ballpark numbers
- from the above, we will have 100 billion total files
- for an avg file size of 2mb, we will need:
- 2MB * 100 billion = 200 petabytes
- in 2020, there were about 2000 exabytes of data across all cloud services (source). that's 10,000x more data than our system
- we will also assume 100 million active connections per minute
| metric | value |
|---|---|
| total users | 500 million |
| files per user | 200 |
| avg file size | 2MB |
| total files | 100 billion |
| total storage | 200 petabytes |
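the ballpark arithmetic above, written out:

```python
# back-of-envelope capacity estimate from the assumptions above
total_users    = 500_000_000
files_per_user = 200
avg_file_bytes = 2 * 10**6        # 2MB

total_files   = total_users * files_per_user      # 100 billion files
total_storage = total_files * avg_file_bytes      # 2e17 bytes = 200 petabytes
```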
4. System APIs
5. Database Design
6. Basic System Design and Algorithm
7. Data Partitioning and Replication
8. Caching
9. Load Balancing
10. Telemetry (analytics)
11. Security and Permissions