Designing a system like Instagram involves several key components and considerations. Here’s a high-level overview of the elements and architecture you would need to consider:
Requirements:
- Functional:
  - Users should be able to upload an image/video
  - Users should be able to view images/videos from the users they follow (via their feed)
  - Users should be able to follow other users
  - Users should be able to search for images/videos by tag or title
  - Users should receive notifications (e.g., new followers, likes, comments)
- Non-functional:
  - Scalability, high availability, and fault tolerance
  - Good user experience (low latency)
Capacity Estimation
Assume 100 million daily active users, and that 20% of them (about 20 million) upload 2 images per day. That is about 40 million images per day, or 40,000,000 / 86,400 ≈ 463 images per second on average.
Assume each image takes about 200 KB; that is about 200 KB * 40,000,000 = 8 TB of new storage per day.
This is a read-heavy system: each uploaded image is viewed many times in followers' feeds, so reads far outnumber writes.
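The arithmetic above can be scripted so the assumptions are easy to tweak. This is a minimal sketch; the function and parameter names are illustrative, and it uses decimal units (1 TB = 10^9 KB), which is what makes 200 KB * 40 million come out to 8 TB.

```python
SECONDS_PER_DAY = 86_400

def estimate(daily_active_users: int, uploader_percent: int,
             images_per_uploader: int, image_size_kb: int) -> dict:
    """Back-of-the-envelope upload rate and storage growth."""
    # Integer percent avoids floating-point surprises in the user count.
    uploaders = daily_active_users * uploader_percent // 100
    images_per_day = uploaders * images_per_uploader
    # Average rate only; peak traffic would be some multiple of this.
    images_per_second = round(images_per_day / SECONDS_PER_DAY)
    # Decimal units: 1 TB = 10**9 KB.
    storage_tb_per_day = images_per_day * image_size_kb / 10**9
    return {
        "uploaders": uploaders,
        "images_per_day": images_per_day,
        "images_per_second": images_per_second,
        "storage_tb_per_day": storage_tb_per_day,
    }

if __name__ == "__main__":
    print(estimate(daily_active_users=100_000_000, uploader_percent=20,
                   images_per_uploader=2, image_size_kb=200))
```

With the inputs above it reproduces the ~463 images/s and 8 TB/day figures.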
API Design
Upload an image
Upload an image to the user's timeline: POST /api/images/upload with the request body:
{
"user_id": "12345",
"image": "<binary image data or image URL>",
"caption": "Optional image caption",
"tags": ["beach", "sunset"]
}
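A server-side handler would validate this body before storing anything. The sketch below is one possible approach, not a prescribed implementation; the `UploadRequest` type and `parse_upload_request` function are hypothetical, and normalizing tags (trim, lowercase, de-duplicate) is an assumption that makes tag search simpler later. Malformed JSON also surfaces as `ValueError`, because `json.JSONDecodeError` subclasses it.

```python
import json
from dataclasses import dataclass, field

@dataclass
class UploadRequest:
    user_id: str
    image: str            # binary data (e.g., base64-encoded) or an image URL
    caption: str = ""
    tags: list[str] = field(default_factory=list)

def parse_upload_request(raw_body: str) -> UploadRequest:
    """Validate the JSON body of POST /api/images/upload."""
    body = json.loads(raw_body)
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    # user_id and image are required non-empty strings.
    for name in ("user_id", "image"):
        value = body.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or invalid field: {name}")
    caption = body.get("caption", "")
    if not isinstance(caption, str):
        raise ValueError("caption must be a string")
    tags = body.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise ValueError("tags must be a list of strings")
    # Assumption: tags are trimmed, lowercased and de-duplicated before indexing.
    normalized = sorted({t.strip().lower() for t in tags if t.strip()})
    return UploadRequest(body["user_id"], body["image"], caption, normalized)
```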
Fetch User Feed
- Endpoint: GET /api/users/{user_id}/feed
- Description: Fetches the feed for the specified user.
- URL Parameters:
  - user_id: Unique identifier of the user.
- Query Parameters (optional):
  - limit: Number of posts to return.
  - offset: Pagination offset.
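The optional `limit` and `offset` parameters need defaults and bounds. Here is a minimal sketch that assumes the feed is already available as a newest-first list of post ids; the default of 20, the cap of 100, and the `next_offset` response field are hypothetical choices, not part of the API above.

```python
from typing import Sequence

DEFAULT_LIMIT = 20   # hypothetical default page size
MAX_LIMIT = 100      # hypothetical upper bound to protect the backend

def parse_paging(params: dict[str, str]) -> tuple[int, int]:
    """Read the optional limit/offset query parameters with defaults and bounds."""
    limit = int(params.get("limit", DEFAULT_LIMIT))
    offset = int(params.get("offset", 0))
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset >= 0")
    return min(limit, MAX_LIMIT), offset

def page(feed: Sequence[str], params: dict[str, str]) -> dict:
    """Return one page of a newest-first feed plus the offset of the next page."""
    limit, offset = parse_paging(params)
    items = list(feed[offset:offset + limit])
    end = offset + len(items)
    return {"posts": items, "next_offset": end if end < len(feed) else None}
```

One design caveat: with offset pagination, posts inserted at the top of the feed between requests shift everything down, so a client can see duplicates. A cursor (for example, the timestamp or id of the last post seen) avoids that.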
Follow a User
- Endpoint: POST /api/users/{user_id}/follow
- Description: Follows another user.
- URL Parameters:
  - user_id: Unique identifier of the user to follow.
- Request Body: { "follower_id": "12345" }
Unfollow a User
- Endpoint: POST /api/users/{user_id}/unfollow
- Description: Unfollows a user.
- URL Parameters:
  - user_id: Unique identifier of the user to unfollow.
- Request Body: { "follower_id": "12345" }
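Behind the follow and unfollow endpoints sits a follow graph. This in-memory `FollowGraph` class is a hypothetical stand-in for a real follows table or graph store. It keeps both directions because feed reads need "whom do I follow" while fan-out and notifications need "who follows me", and it makes both operations idempotent so client retries are harmless.

```python
from collections import defaultdict

class FollowGraph:
    """In-memory follow graph (illustrative stand-in for a persistent store)."""

    def __init__(self) -> None:
        self.following: dict[str, set[str]] = defaultdict(set)  # follower -> followees
        self.followers: dict[str, set[str]] = defaultdict(set)  # followee -> followers

    def follow(self, follower_id: str, user_id: str) -> bool:
        """POST /api/users/{user_id}/follow; returns False if already following."""
        if follower_id == user_id:
            raise ValueError("users cannot follow themselves")
        if user_id in self.following[follower_id]:
            return False  # idempotent: a repeated follow is a no-op
        self.following[follower_id].add(user_id)
        self.followers[user_id].add(follower_id)
        return True

    def unfollow(self, follower_id: str, user_id: str) -> bool:
        """POST /api/users/{user_id}/unfollow; returns False if not following."""
        if user_id not in self.following[follower_id]:
            return False
        self.following[follower_id].discard(user_id)
        self.followers[user_id].discard(follower_id)
        return True
```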
Search Images
- Method: GET
- URL: /api/images/search
- Description: This endpoint allows users to search for images based on various criteria.
- Request Parameters:
  - tag: Search by tags associated with the images.
  - uploader: Search by the user who uploaded the image.
  - date_from and date_to: Search by the date range of when images were uploaded.
  - keyword: Search by keywords in the image description or title.
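The search parameters combine naturally as filters that must all match. The sketch below spells out those semantics over an in-memory list; the `Image` type is hypothetical, and treating the date range as inclusive, the keyword as a case-insensitive substring, and stored tags as lowercase are all assumptions. A production Search Service would answer these queries from an index rather than a linear scan.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Image:
    image_id: str
    uploader: str
    upload_date: date
    description: str
    tags: set[str] = field(default_factory=set)  # assumed stored lowercase

def search(images: list[Image], tag: Optional[str] = None,
           uploader: Optional[str] = None, date_from: Optional[date] = None,
           date_to: Optional[date] = None, keyword: Optional[str] = None) -> list[Image]:
    """Return images matching ALL supplied filters; omitted filters are ignored."""
    results = []
    for img in images:
        if tag is not None and tag.lower() not in img.tags:
            continue
        if uploader is not None and img.uploader != uploader:
            continue
        if date_from is not None and img.upload_date < date_from:
            continue
        if date_to is not None and img.upload_date > date_to:  # inclusive range
            continue
        if keyword is not None and keyword.lower() not in img.description.lower():
            continue
        results.append(img)
    return results
```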
Data Schema Design
Designing a data schema for images uploaded on Instagram involves defining how image data and its associated metadata are structured and stored.
Table: images
This table stores the core data related to each image.
Field Name | Data Type | Description |
---|---|---|
image_id | VARCHAR or UUID | Unique identifier for the image. |
user_id | VARCHAR or UUID | Identifier for the user who uploaded it. |
upload_time | TIMESTAMP | Time when the image was uploaded. |
image_url | VARCHAR | URL where the image is stored. |
thumbnail_url | VARCHAR | URL of the image thumbnail. |
description | TEXT | Description provided by the user. |
location | VARCHAR | Location tagged in the image (optional). |
privacy_setting | ENUM('public', 'private', 'friends') | Privacy setting of the image. |
Table: image_tags
This table manages the tags associated with each image.
Field Name | Data Type | Description |
---|---|---|
tag_id | VARCHAR or UUID | Unique identifier for the tag. |
image_id | VARCHAR or UUID | Identifier for the associated image. |
tag | VARCHAR | The tag text. |
Table: image_likes
This table records the likes each image receives.
Field Name | Data Type | Description |
---|---|---|
like_id | VARCHAR or UUID | Unique identifier for the like. |
image_id | VARCHAR or UUID | Identifier for the associated image. |
user_id | VARCHAR or UUID | Identifier for the user who liked the image. |
like_time | TIMESTAMP | Time when the image was liked. |
Table: image_comments
This table stores comments made on images.
Field Name | Data Type | Description |
---|---|---|
comment_id | VARCHAR or UUID | Unique identifier for the comment. |
image_id | VARCHAR or UUID | Identifier for the associated image. |
user_id | VARCHAR or UUID | Identifier for the user who commented. |
comment | TEXT | The comment text. |
comment_time | TIMESTAMP | Time when the comment was made. |
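To make the tables above concrete, here is a sketch of them as DDL, run against SQLite only because it ships with Python. SQLite has no ENUM type, so `privacy_setting` uses a CHECK constraint instead. The UNIQUE constraint on likes (one like per user per image) and the two indexes are assumptions about the access patterns, not part of the schema above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE images (
    image_id        TEXT PRIMARY KEY,
    user_id         TEXT NOT NULL,
    upload_time     TIMESTAMP NOT NULL,
    image_url       TEXT NOT NULL,
    thumbnail_url   TEXT,
    description     TEXT,
    location        TEXT,
    privacy_setting TEXT NOT NULL DEFAULT 'public'
        CHECK (privacy_setting IN ('public', 'private', 'friends'))
);
CREATE TABLE image_tags (
    tag_id   TEXT PRIMARY KEY,
    image_id TEXT NOT NULL REFERENCES images(image_id),
    tag      TEXT NOT NULL
);
CREATE TABLE image_likes (
    like_id   TEXT PRIMARY KEY,
    image_id  TEXT NOT NULL REFERENCES images(image_id),
    user_id   TEXT NOT NULL,
    like_time TIMESTAMP NOT NULL,
    UNIQUE (image_id, user_id)  -- assumption: one like per user per image
);
CREATE TABLE image_comments (
    comment_id   TEXT PRIMARY KEY,
    image_id     TEXT NOT NULL REFERENCES images(image_id),
    user_id      TEXT NOT NULL,
    comment      TEXT NOT NULL,
    comment_time TIMESTAMP NOT NULL
);
-- Assumed hot paths: a user's images by time, and images by tag.
CREATE INDEX idx_images_user_time ON images(user_id, upload_time);
CREATE INDEX idx_image_tags_tag ON image_tags(tag);
"""

def create_schema(conn: sqlite3.Connection) -> None:
    """Create all four tables; SQLite enforces foreign keys only when enabled."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
```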
Main Components
Load Balancer
Distributes incoming traffic across multiple API servers so that no single server is overwhelmed, improving responsiveness and availability under high user traffic.
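Round robin is one simple distribution policy a load balancer can use; real load balancers also offer others (e.g., least connections). The sketch below assumes health checks happen elsewhere and simply call `mark_down`/`mark_up`; the class and server names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hands out API servers in rotation, skipping ones marked unhealthy."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = list(servers)
        self.healthy = set(servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server: str) -> None:
        self.healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self) -> str:
        # Check each server at most once per call so we never spin forever.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```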
API Gateway
A single entry point for all client requests. It provides an abstraction layer over internal services, handles authentication, and applies rate limiting.
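Rate limiting at the gateway is commonly done with a token bucket; that choice, and the limits used here (bursts of 10, refilling at 5 requests/s per client), are assumptions for illustration. The clock is injectable so the behaviour can be tested deterministically.

```python
import time
from typing import Callable

class TokenBucket:
    """Allows bursts up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def gateway_allows(client_id: str) -> bool:
    """One bucket per client (e.g., per API key or user id)."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity=10, rate=5)  # hypothetical limits
    return buckets[client_id].allow()
```

A request that gets `False` would typically be rejected with HTTP 429 (Too Many Requests).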
API Servers
- Write Path: Manages uploading of images, creating new posts, updating user profiles, etc., ensuring data consistency and integrity.
- Read Path: Retrieves data like user feeds, image details, comments, etc., optimized for high-speed access.
Message Queue
Decouples services (e.g., post-upload processing, notifications) so that this work runs asynchronously, off the request path, which improves scalability and reliability.
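The producer/consumer pattern behind the queue can be shown in-process with Python's `queue.Queue`, standing in for a real broker. The `process_upload` function is a hypothetical placeholder for post-upload work, and the `None` sentinel is just a way to stop the demo worker.

```python
import queue
import threading
from typing import Optional

def process_upload(event: dict, done: list) -> None:
    """Placeholder for post-upload work (thumbnail generation, notifying followers, ...)."""
    done.append(event["image_id"])

def worker(tasks: "queue.Queue[Optional[dict]]", done: list) -> None:
    while True:
        event = tasks.get()
        try:
            if event is None:  # sentinel value: stop the worker
                return
            process_upload(event, done)
        finally:
            tasks.task_done()

def run_demo(image_ids: list[str]) -> list[str]:
    tasks: "queue.Queue[Optional[dict]]" = queue.Queue()
    done: list[str] = []
    consumer = threading.Thread(target=worker, args=(tasks, done), daemon=True)
    consumer.start()
    # Producer side (the API server): enqueue the event and respond to the client at once.
    for image_id in image_ids:
        tasks.put({"image_id": image_id, "user_id": "12345"})
    tasks.put(None)
    tasks.join()  # demo only: wait until the consumer has drained the queue
    return done
```

The upload request returns as soon as the event is enqueued; if the consumer is slow or temporarily down, events wait in the queue instead of failing the upload.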
Search Service
Handles search queries and returns relevant results. It powers search for users, hashtags, locations, etc., typically backed by a search index such as an inverted index.
- Search Aggregation service: Aggregates data from various sources to provide comprehensive search results.
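An inverted index maps each term to the set of items containing it, so a tag lookup becomes a dictionary read and a multi-tag query becomes a set intersection. This toy `HashtagIndex` is hypothetical; it indexes explicit tags plus `#hashtags` found in the caption, lowercased. A real deployment would typically use a dedicated search engine.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")

class HashtagIndex:
    """Inverted index: hashtag -> set of image_ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, image_id: str, caption: str, tags: list[str]) -> None:
        # Index explicit tags and any #hashtags in the caption, case-insensitively.
        terms = {t.lower() for t in tags} | {m.lower() for m in HASHTAG.findall(caption)}
        for term in terms:
            self.postings[term].add(image_id)

    def search(self, *terms: str) -> set[str]:
        """Images carrying ALL the given tags (intersection of posting sets)."""
        if not terms:
            return set()
        sets = [self.postings.get(t.lower().lstrip("#"), set()) for t in terms]
        return set.intersection(*sets)
```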
Object Storage
Stores unstructured data (like images and videos) at scale. Used for storing all user-uploaded media content, ensuring durability and high availability.
Metadata Store (& Cache)
Stores metadata associated with objects (like images) and user data, such as image descriptions, tags, and user profiles, with a cache in front of it for frequently accessed data.
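A common way to put a cache in front of the metadata store is cache-aside: read from the cache, fall back to the store on a miss, and invalidate after writes. The sketch below uses an in-process LRU built on `OrderedDict`; the `load` callback stands in for a metadata-store read and is an assumption, as is the capacity.

```python
from collections import OrderedDict
from typing import Callable, Optional

class CacheAside:
    """LRU cache in front of a metadata store (cache-aside / lazy loading)."""

    def __init__(self, load: Callable[[str], Optional[dict]], capacity: int = 1000) -> None:
        self.load = load                  # reads from the metadata store on a miss
        self.capacity = capacity
        self.entries: "OrderedDict[str, dict]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.load(key)
        if value is not None:
            self.entries[key] = value
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read reloads fresh metadata."""
        self.entries.pop(key, None)
```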
Feed Generation Service
Generates each user's personalized feed from the images posted by the users they follow.
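One way to generate feeds is fan-out on write: when a user posts, the post id is pushed into each follower's precomputed feed, so reading a feed is a cheap lookup, which suits a read-heavy system. The sketch below is a hypothetical in-memory version; the `FEED_SIZE` cap is an assumption, and in practice `on_new_post` would run in a worker consuming upload events from the message queue.

```python
from collections import defaultdict, deque
from itertools import islice

FEED_SIZE = 500  # hypothetical cap on how many post ids each precomputed feed keeps

class FanoutOnWriteFeeds:
    def __init__(self, followers: dict[str, set[str]]) -> None:
        self.followers = followers  # user_id -> ids of the users who follow them
        # deque(maxlen=...) silently drops the oldest ids once a feed is full.
        self.feeds: dict[str, deque] = defaultdict(lambda: deque(maxlen=FEED_SIZE))

    def on_new_post(self, author_id: str, post_id: str) -> None:
        """Push the new post to the front of every follower's feed."""
        for follower_id in self.followers.get(author_id, set()):
            self.feeds[follower_id].appendleft(post_id)

    def read_feed(self, user_id: str, limit: int = 20) -> list[str]:
        """Newest-first slice of the precomputed feed."""
        return list(islice(self.feeds[user_id], limit))
```

The trade-off is write cost: a post by an account with millions of followers triggers millions of feed writes, so such accounts are often handled by merging their posts in at read time instead (fan-out on read).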
Content Delivery Network (CDN)
Caches static media (images, videos, thumbnails) on edge servers located close to users, reducing latency and offloading traffic from object storage.
Notification Service
Sends notifications to users in real time about new followers, likes, comments, and other relevant activity.