On facebook, you may have noticed that you can set different privacy levels for the posts you published to be only to a specific group of users, like public, friends, friends of friends, etc
Key Features
1. Enable a user to specify the different levels of privacy for a post so that it is only visible to a particular set of users on Facebook
2. For simplicity, implement two levels of privacy , public or Friends, more complexity levels: friends of friends, and custom groups.
Design Goals
- Minimum latency and high consistency
- Due to CAP, lower the availability in the interest of high consistency
- Read-heavy service
Scale Estimation
- One billion daily active users(DAUs)
- Privacy level will be set by users for about two billion posts daily.
- Check the privacy level of about 100 billion posts daily so that they can be made visible to the correct users.
API Design
1. /setPrivacyLevel: api to set the privacy level for a post,
Params: user_id, post_id, privacy_level_enum, and timestamp
2. /canView: api to check if a user can view a particular post or not.
Params: viewer_id, post_id, and timestamp.
High Level Design
To define the different privacy levels, we will user the Enum datatype: 1. public, 2. friends, 3. friends of friends, 4. customized group. This enum will be stored a long each post in the database.
- To check if a post is visible to a specific user or not, review the privacy level enum:
– If it is set to Public, then it will be visible to everyone
– If it is set to Friends, we will check if the current user is a friend of the post’s owner, if yes, then the post will be displayed on the UI. - Store the friends data of the users in a key-value store.
– Key will be the user_id, and the value will be a set of all the friends that user has.
– Since this data will be massive, it will be shared across multiple servers using the user_id as the key
– Also need to discuss what will happen if a shared dies and create a fault-tolerant design. - Friends of Friends
– Given the scale of facebook, we can not enumerate through all the friends of friends as it will be extensive list and increase the system’s latency.
– One possible way to solve this challenging problem is to perform an interaction of the post’s owner and the viewers friends lists.
– if the intersection results in a non-empty set, then the post can be displayed.
(from: https://www.youtube.com/watch?v=-LjbhtswNwE&t=335s)
Scale the K-V Database
1. Sharding: Distributing data across multiple machines is crucial for scalability. Sharding can be based on user ID or other unique identifiers, ensuring an even distribution of data and workload.
2.Cluster master-replica model: To remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, let KV Cluster uses a master-replica model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional replica nodes). (source: https://redis.io/docs/management/scaling/)
Redis Cluster automatically manages sharding and replication. It distributes data across multiple nodes, and each shard’s data is replicated to one or more other nodes.
In the event of a node failure, Redis Cluster can continue to operate as long as at least one replica of the data is available.
Redis Cluster also provides some degree of automated failover. If a primary node fails, one of its replicas is automatically promoted to be the new primary.
+-----------------+ +-----------------+ +-----------------+
| Shard 1 | | Shard 2 | | Shard 3 |
| Primary Node | | Primary Node | | Primary Node |
+-----------------+ +-----------------+ +-----------------+
/|\ /|\ /|\
| | |
| Replication | Replication | Replication
| | |
\|/ \|/ \|/
+-----------------+ +-----------------+ +-----------------+
| Replica Node 1 | | Replica Node 1 | | Replica Node 1 |
+-----------------+ +-----------------+ +-----------------+
+-----------------+ +-----------------+ +-----------------+
| Replica Node 2 | | Replica Node 2 | | Replica Node 2 |
+-----------------+ +-----------------+ +-----------------+