🔥 Rule of Thumb:
  1. There is no single right answer to a system design question.
  2. The answer depends largely on the assumptions you make about the system.
 

Nomenclatures

Abstraction: hides the details inside each component, so that when building large and complex systems you can work with modularized components without worrying about their internals.
RPC: lets you treat a call to a remote machine over the network as if it were a local function call, so you don't need to worry about the details of that process when designing or implementing large systems.
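As a minimal sketch of the idea, Python's standard-library xmlrpc module exposes a local function to remote callers; the add function, host, and port here are hypothetical:

```python
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def add(a: int, b: int) -> int:
    return a + b

# Server side: expose a local function to remote callers.
server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
server.register_function(add)
# server.serve_forever()  # blocks; run this in a separate process

# Client side: the remote call reads exactly like a local one.
proxy = ServerProxy("http://localhost:8000")
# print(proxy.add(2, 3))  # executed on the server, returns 5
```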
Five types of failures in a network:
  1. fail-stop
  2. crash
  3. omission failure
  4. temporal failure
  5. byzantine failure
 

Key concepts

SLA vs Latency

Service Level Agreement (SLA): Let's take Amazon Web Services (AWS) as an example. AWS provides a number of SLAs for its various services. For example, for its S3 storage service, AWS offers an SLA that guarantees a Monthly Uptime Percentage of at least 99.9%. If AWS falls short of this commitment in a given month, customers are eligible for a service credit. The SLA outlines specific calculations for determining actual uptime, as well as the process for customers to submit claims for service credits.
Latency: Let's consider a video streaming service like Netflix. When you select a movie to watch, the movie data needs to be sent from Netflix's servers to your device. The latency is the time it takes from when you press 'Play' to when the movie actually starts playing. If the latency is high, you may experience a noticeable delay or buffering.
 

Availability vs Reliability

Reliability is the probability that a system runs failure-free over a specified period. In other words, it's the system's ability to perform its required functions under stated conditions for that period. A more reliable system has fewer failures.
For example, a reliable car starts every morning without any issues. If it occasionally fails to start, it's considered less reliable.
Availability, on the other hand, is the probability that a system is operating satisfactorily at any point in time. It's usually expressed as the percentage of time a system is expected to be available for use. This includes time to recover from failures, which is why it differs from reliability. Even a system that is highly reliable (doesn't fail often) can have low availability if it takes a long time to recover when it does fail.
For example, a website might be designed to be available 99.999% of the time, meaning it's only down for about 5 minutes per year.
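That "five nines" figure is simple arithmetic over the minutes in a year; a quick sketch:

```python
def downtime_minutes_per_year(availability_pct: float) -> float:
    minutes_per_year = 365 * 24 * 60  # 525,600
    return minutes_per_year * (1 - availability_pct / 100)

for nines in (99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_minutes_per_year(nines):.1f} min/year of downtime")
# 99.9% -> 525.6 min (~8.8 h); 99.99% -> 52.6 min; 99.999% -> 5.3 min, the figure above
```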
Now, let's look at the combinations:
  1. Low A, Low R: A system is often unavailable and fails frequently. An example could be an old, poorly maintained server that often crashes and takes a long time to reboot.
  2. Low A, High R: A system doesn't fail often but takes a long time to recover when it does. For example, a system may be very stable and rarely fail, but when it does, it requires significant downtime for maintenance.
  3. High A, Low R: A system frequently fails but recovers very quickly each time. This could be a website with a microservices architecture that frequently experiences failures in individual services, but those services are quickly restarted or rerouted, causing minimal impact on overall availability.
  4. High A, High R (Desirable): A system rarely fails, and when it does, it recovers quickly. For example, a well-designed and maintained cloud-based service with robust error handling and quick failover capabilities would fall into this category.
In a nutshell, while reliability and availability are related, they focus on different aspects of a system's performance. Therefore, it's crucial to consider both when designing, implementing, and maintaining systems.
 

Web Server vs Application Server

Web servers and application servers are both critical components of modern web applications, but they have distinct roles and functionalities. Keeping them separate also lets each tier scale independently.
Web Server: A web server's primary role is to serve static content to the client in response to HTTP requests. When a user's browser requests a file (like an HTML page, an image file, a CSS file, or a JavaScript file), the web server locates that file and sends it back to the client. Web servers are specifically designed to serve web pages to clients and they do this efficiently.
Some common examples of web servers include Apache HTTP Server, Microsoft's Internet Information Services (IIS), and NGINX.
Application Server: An application server's primary role is to provide the business logic for an application program. It hosts and exposes business logic and processes to client applications through various protocols, possibly including HTTP. The application server works in conjunction with the web server, which delegates application-level requests for dynamic content to the application server.
In many cases, an application server also provides additional services such as transaction management, messaging, and security. It's important to note that an application server can also serve static content to clients, but it's generally not as efficient at this as a dedicated web server.
Some common examples of application servers include Apache Tomcat, IBM WebSphere, and WildFly (previously known as JBoss).
In summary, while both web servers and application servers can process HTTP requests and responses, web servers are typically used for serving static content and directing requests for dynamic content to application servers, while application servers handle the business logic that generates that dynamic content.
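To make the division of labor concrete, here's a toy handler that plays both roles in one process (paths and logic are hypothetical, and path sanitization is omitted); in practice the static branch would be a dedicated web server such as NGINX, which would proxy the dynamic branch to the application server:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

STATIC_DIR = Path("./static")  # hypothetical directory of HTML/CSS/JS files

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/static/"):
            # Web-server role: locate the file and send its bytes unchanged.
            file = STATIC_DIR / self.path.removeprefix("/static/")
            self.send_response(200)
            self.end_headers()
            self.wfile.write(file.read_bytes())
        else:
            # Application-server role: run business logic to build the response.
            body = f"items in cart: {len(['apple', 'milk'])}".encode()
            self.send_response(200)
            self.end_headers()
            self.wfile.write(body)

# HTTPServer(("localhost", 8080), Handler).serve_forever()
```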

Different layers of the internet protocol stack

  1. Network access
    1. Physical level: fiber, cable, wireless
    2. Link level: MAC addresses
  2. Internet
    1. IPv4
    2. IPv6
  3. Transport
    1. TCP
    2. UDP
  4. Application
    1. HTTP, HTTPS, POP, SMTP, DHCP, FTP, SSH, DNS

General Questions

To maintain high availability, should the TTL value be large or small?
To maintain high availability, the TTL value should be small. If a server or cluster fails, the organization can update its resource records right away, and users experience unavailability only until the cached record's TTL expires. With a large TTL, the organization can still update its records, but users will keep hitting the outdated address of a server that crashed long ago. Companies that aim for high availability keep TTL values as low as 120 seconds, so even in case of a failure, the maximum downtime is a few minutes.
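For instance, you can observe a record's TTL directly; this sketch assumes the third-party dnspython package is installed:

```python
import dns.resolver  # third-party "dnspython" package

answers = dns.resolver.resolve("example.com", "A")
print(answers.rrset.ttl)    # TTL in seconds; a value like 120 bounds failover delay
for record in answers:
    print(record.address)   # IP the name currently resolves to
```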
 
How does TLS work in general?
In a Transport Layer Security (TLS) connection, decoding the encrypted message happens in several steps, making use of both symmetric and asymmetric cryptography. Here's a simplified overview of the process:
1. Establishing a Secure Connection (TLS Handshake):
Before any data is exchanged, the client and server perform what is known as a "TLS Handshake" to agree upon the encryption standards to be used.
  • The client sends a "ClientHello" message to the server, indicating the TLS versions and cipher suites it supports.
  • The server responds with a "ServerHello" message, specifying the chosen protocol and cipher suite from the client's list. The server also sends its public key (embedded in its digital certificate), which the client can use to encrypt data that only the server can decrypt with its private key.
  • The client verifies the server's certificate against a list of trusted Certificate Authorities (CAs). If the certificate is valid, the client generates a secret key known as the "pre-master secret" and encrypts it with the server's public key. This encrypted pre-master secret is then sent to the server.
  • The server uses its private key to decrypt the pre-master secret. Both the client and the server then use this pre-master secret to generate the same session key (or "master secret"), which is used for symmetric encryption during the data transfer stage.
2. Data Transfer:
  • The sender (either client or server) encrypts the payload data using the agreed-upon symmetric encryption algorithm and the session key.
  • The recipient receives the encrypted data and decrypts it using the same symmetric encryption algorithm and session key.
3. Closing the Connection:
When the data transfer is complete, both the client and server exchange encrypted "Finished" messages, after which the session keys can be discarded.
This process ensures that the session keys used for symmetric encryption (which is faster for large amounts of data) are exchanged securely using asymmetric encryption (which is more secure but slower). It also verifies the identities of the client and server (but usually just the server in typical web browsing) through the use of digital certificates.
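You can watch the result of this handshake from Python's standard library; a minimal sketch (example.com is a placeholder host):

```python
import socket
import ssl

ctx = ssl.create_default_context()  # verifies the certificate against trusted CAs
with socket.create_connection(("example.com", 443)) as raw:
    with ctx.wrap_socket(raw, server_hostname="example.com") as tls:  # handshake happens here
        print(tls.version())   # negotiated protocol, e.g. 'TLSv1.3'
        print(tls.cipher())    # negotiated cipher suite
        # From here on, sendall()/recv() are transparently encrypted with the session keys.
```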
 
What query patterns do document databases support that key-value databases don't?
Key-value databases are highly efficient for retrieving values when you know the key, but they're not designed for complex querying. You can think of key-value stores as a sort of dictionary, where you look up a word (the key) to find its definition (the value). However, if you want to perform a more complicated operation—like finding all words that contain a certain letter or have a certain length—you'd need a more sophisticated tool.
On the other hand, document databases like MongoDB allow for much more complex querying. For example, with a document database, you could perform operations like:
  • Field querying: Find all documents where a certain field matches a certain value. This is similar to the WHERE clause in SQL.
  • Range queries: Find all documents where a certain field falls within a certain range.
  • Sub-document queries: If your documents contain nested fields (like a JSON object within a JSON object), you can query on the inner fields.
  • Aggregation operations: These allow you to run complex analytics and statistical analysis, like summing up values, counting distinct values, grouping by fields, etc.
  • Full-text search: Some document databases support full-text search, which makes it easy to build search functionality into your application.
These types of operations would be challenging or impossible with a key-value store, because key-value stores are not designed to understand the structure of their values. They see each value as an opaque blob, whereas document databases understand the structure of their documents and can index and query on any part of a document.
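As a sketch of these patterns in MongoDB's query language via the pymongo driver (the shop database and its fields are hypothetical):

```python
from pymongo import MongoClient  # third-party "pymongo" driver

users = MongoClient()["shop"]["users"]          # hypothetical database and collection

users.find({"city": "Berlin"})                  # field query, like SQL's WHERE
users.find({"age": {"$gte": 18, "$lt": 65}})    # range query on a field
users.find({"address.zip": "10115"})            # sub-document query on a nested field
users.aggregate([                               # aggregation: count users per city
    {"$group": {"_id": "$city", "count": {"$sum": 1}}}
])
```

A key-value store could only fetch a value by its exact key; every query above would require scanning and parsing each opaque value.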
 
How is UTF-8 used on websites?
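In short: UTF-8 is the dominant character encoding on the web. The server declares it in the Content-Type header (e.g., Content-Type: text/html; charset=utf-8) or via a <meta charset="utf-8"> tag in the HTML; the browser uses that declaration to decode the page's bytes into characters, and form or API payloads are encoded back into UTF-8 bytes on the way to the server. A small sketch of what happens at that boundary:

```python
text = "résumé 🎉"                    # a str is a sequence of Unicode code points
data = text.encode("utf-8")          # the bytes actually sent over HTTP
print(len(text), len(data))          # 8 code points become 13 bytes (é: 2 bytes, 🎉: 4)
print(data.decode("utf-8") == text)  # True: the receiver decodes with the declared charset
```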
 
 
Why need a shared secret key instead of using the pre-master key during TLS handshake?
The use of a shared secret key, also known as the session key, in the TLS handshake is a result of the need for efficiency and performance in secure communication.
Here is a brief overview of the TLS handshake and the creation of the session key:
  1. The client sends a ClientHello message to the server, indicating that it wants to start a secure session. The message includes the version of TLS the client supports, the cipher suites it supports, and a randomly generated ClientHello.random value.
  2. The server responds with a ServerHello message, selecting the version of TLS and cipher suite to use from the client's list. It also generates a ServerHello.random value.
  3. The server sends its certificate to the client and may request the client's certificate for mutual authentication.
  4. The client verifies the server's certificate and, if the server requested it, sends its own certificate.
  5. The client generates a pre-master secret, encrypts it with the server's public key, and sends the encrypted pre-master secret to the server.
  6. The server uses its private key to decrypt the pre-master secret.
  7. Both client and server use the pre-master secret and the random values sent in the Hello messages to generate the shared secret, known as the master secret. This is done via a pseudorandom function (PRF).
  8. The master secret is then used to generate session keys for symmetric encryption and MAC (Message Authentication Code) keys.
The reason for using a session key for data encryption, rather than the pre-master key or the master secret, is primarily performance. Public key encryption methods (like those used to encrypt and exchange the pre-master secret) are very secure, but they are also computationally intensive and slow. Symmetric key encryption (like that used with the session key) is much faster and therefore more suitable for encrypting large amounts of data.
Additionally, generating a session key for each session provides what's called "forward secrecy". If a session key is compromised, only the data from that specific session is at risk. Past sessions that used different keys remain secure. This is why a new session key is generated for each session, rather than using the same key (like the pre-master secret) for multiple sessions.
So, the pre-master key is used as part of the process to establish a shared secret (the master secret), which is then used to generate session keys for efficient and secure communication.
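As a rough sketch of the key derivation in steps 7 and 8, here is a toy expansion function modeled on the TLS 1.2 PRF (P_SHA256); real TLS pins down exact labels and lengths, and the secrets below are dummies:

```python
import hashlib
import hmac

def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """Toy key expansion modeled on TLS 1.2's P_SHA256."""
    out, a = b"", label + seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) chain
        out += hmac.new(secret, a + label + seed, hashlib.sha256).digest()
    return out[:length]

pre_master = b"x" * 48                               # dummy pre-master secret
client_random, server_random = b"c" * 32, b"s" * 32  # from the Hello messages
master = prf(pre_master, b"master secret", client_random + server_random, 48)
keys = prf(master, b"key expansion", server_random + client_random, 104)
# `keys` is sliced into MAC keys, encryption keys, and IVs for both directions.
```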
 
Synchronous and Asynchronous replication of datastore
In synchronous replication, the primary node waits for acknowledgments from the secondary nodes that they have applied the update; only after receiving acknowledgment from all secondary nodes does it report success to the client. In asynchronous replication, by contrast, the primary node doesn't wait for acknowledgments from the secondary nodes and reports success to the client as soon as it has updated itself.
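A sketch of the two write paths (the Secondary class and its apply call are hypothetical stand-ins for a network round trip):

```python
from concurrent.futures import ThreadPoolExecutor

class Secondary:
    def apply(self, write: dict) -> None:
        ...  # hypothetical: ship the write over the network, wait for the node to apply it

def replicate_sync(write: dict, log: list, secondaries: list[Secondary]) -> str:
    log.append(write)                   # primary applies the write locally
    for node in secondaries:
        node.apply(write)               # block until every secondary acknowledges
    return "success"                    # ack the client only after all replicas match

def replicate_async(write: dict, log: list, secondaries: list[Secondary],
                    pool: ThreadPoolExecutor) -> str:
    log.append(write)                   # primary applies the write locally
    for node in secondaries:
        pool.submit(node.apply, write)  # fire-and-forget; replicas catch up later
    return "success"                    # ack the client immediately
```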
 
Consistent hashing vs hash-based partitioning
In short, when a node is added, consistent hashing redistributes load only to the neighboring node, while hash-based (modulo) partitioning forces a rebalance of data across every node. Consistent hashing works on a hash ring: the same hash function is applied to both nodes and incoming requests, and each request is routed to the next node clockwise on the ring. Adding a node therefore only affects the next node on the ring; all other nodes keep their data, avoiding the wholesale reshuffling of the traditional hashing/modulo approach.
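A minimal ring sketch (MD5 is just a convenient stand-in for a uniform hash; node names and keys are hypothetical):

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes: list[str]):
        self.ring = sorted((h(n), n) for n in nodes)  # points on the circle

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self.ring, (h(key), ""))    # next node clockwise
        return self.ring[i % len(self.ring)][1]       # wrap past the top of the ring

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
# Adding "node-d" only remaps keys between node-d and its predecessor on the ring;
# with modulo partitioning (h(key) % N), changing N remaps almost every key.
```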
 
4XX and 5XX HTTP Response code
HTTP response status codes indicate whether a specific HTTP request has been successfully completed. They are grouped into five classes:
  • Informational responses (100–199)
  • Successful responses (200–299)
  • Redirection messages (300–399)
  • Client error responses (400–499)
  • Server error responses (500–599)
The 4xx class of HTTP status codes is intended for situations in which the client seems to have erred. Examples of the 4xx status codes include:
  • 400 Bad Request: The server could not understand the request due to invalid syntax.
  • 401 Unauthorized: The request lacks valid authentication credentials for the target resource.
  • 403 Forbidden: The server understood the request but refuses to authorize it.
  • 404 Not Found: The server could not find the requested URL.
  • 429 Too Many Requests: The user has sent too many requests in a given amount of time ("rate limiting").
The 5xx class of status codes is intended for situations in which the server is aware that it has encountered an error or is otherwise incapable of performing the request. Examples of 5xx errors include:
  • 500 Internal Server Error: A generic error message, given when an unexpected condition was encountered and no more specific message is suitable.
  • 501 Not Implemented: The server either does not recognize the request method, or it lacks the ability to fulfill the request.
  • 502 Bad Gateway: The server was acting as a gateway or proxy and received an invalid response from the upstream server.
  • 503 Service Unavailable: The server is currently unavailable (because it is overloaded or down for maintenance). Generally, this is a temporary state.
  • 504 Gateway Timeout: The server was acting as a gateway or proxy and did not receive a timely response from the upstream server.
The response codes are defined by sections of the HTTP standard, and they are a key part of how servers and clients communicate and understand each other during HTTP transactions.
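In client code, the class usually decides the handling: a 4xx means the request itself must be fixed, while a 5xx may be transient and worth retrying. A sketch using the third-party requests library and a hypothetical URL:

```python
import time

import requests  # third-party HTTP client

def fetch(url: str, retries: int = 3) -> requests.Response:
    for attempt in range(retries):
        resp = requests.get(url, timeout=5)
        if resp.status_code < 500:  # 2xx/3xx: done; 4xx: our bug, retrying won't help
            return resp
        time.sleep(2 ** attempt)    # back off: 5xx may be a transient server problem
    return resp

resp = fetch("https://api.example.com/items")
```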
 
Service Discoverer in the context of monitoring distributed system
A Service Discoverer isn't a database in the traditional sense, but it does maintain a registry, which you might think of as a specialized form of a database. This registry stores information about the services in a distributed system, including details like their network locations (IP addresses and ports), the APIs they expose, their status, and other metadata.
However, a service discoverer does more than just store and provide information. It's also responsible for:
  1. Health checking: It periodically checks the status of the services in its registry to ensure they are still operational. If a service fails a health check, the service discoverer updates the registry, marking that service as unavailable or removing it entirely.
  2. Load balancing: In some implementations, a service discoverer may also handle client requests by routing them to appropriate service instances based on load-balancing rules.
  3. Service coordination: It plays an integral role in allowing services within a distributed system to discover and communicate with each other, which is crucial for the system's overall functioning.
So while the service discoverer does involve a "database-like" component in the form of its registry, it also has responsibilities and functionalities that go beyond what we typically associate with a database.
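A toy registry sketch showing the "database-like" part plus heartbeat-based health checking (service names, addresses, and the 10-second TTL are all hypothetical):

```python
import time

class Registry:
    TTL = 10.0  # seconds without a heartbeat before an instance is considered dead

    def __init__(self):
        self.instances: dict[str, dict[str, float]] = {}  # service -> {address: last beat}

    def heartbeat(self, service: str, address: str) -> None:
        self.instances.setdefault(service, {})[address] = time.time()

    def healthy(self, service: str) -> list[str]:
        now = time.time()
        return [addr for addr, beat in self.instances.get(service, {}).items()
                if now - beat < self.TTL]

reg = Registry()
reg.heartbeat("payments", "10.0.0.5:8080")
print(reg.healthy("payments"))  # ['10.0.0.5:8080'] while heartbeats keep arriving
```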
 
How can a monitoring system reliably work if it uses the same infrastructure in a data center that it was supposed to monitor? Consider this given that a failure of a network in a data center can knock out the monitoring components.
The actual deployment of a monitoring system needs special care. We might use an internal, monitoring-specific network to isolate it from the common network, and separate instances of blob stores and other services.
It also helps to have components external to the monitoring system, where external might mean an independent service provider's infrastructure. However, such a design is more complex and more expensive.
 
How to monitor the client-side error accessing your service?
To ensure that the client’s requests reach the server, we’ll act as clients and perform reachability and health checks. We’ll need various vantage points across the globe. We can run a service, let’s call it prober, that periodically sends requests to the service to check availability. This way, we can monitor reachability to our service from many different places.
Alternatively, we can embed the probers into the actual client application. We'll have the following two components:
  • Agent: This is a prober embedded in the client application that sends the appropriate service reports about any failures.
  • Collector: This is a report collector independent of the primary service. It's kept independent to avoid the situation where a client agent tries to report an error to the very service that has failed. We summarize the error reports from collectors and look for spikes in the errors graph to detect client-side issues.
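A minimal prober sketch using only the standard library (URLs, interval, and report format are hypothetical):

```python
import time
import urllib.request

def probe(service_url: str, collector_url: str, interval: float = 60.0) -> None:
    """Periodically check the service; POST a failure report to the collector."""
    while True:
        try:
            with urllib.request.urlopen(service_url, timeout=5) as resp:
                ok = resp.status == 200
        except OSError:
            ok = False  # unreachable counts as a failed probe
        if not ok:
            report = f"probe-failed url={service_url} ts={int(time.time())}".encode()
            urllib.request.urlopen(collector_url, data=report, timeout=5)  # POST
        time.sleep(interval)
```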
 
🔥 It takes time to gain system design knowledge. But be patient, and then you will code it!
 

Useful Links 🔗

  1. https://www.educative.io/courses/grokking-modern-system-design-interview-for-engineers-managers. I found its introductions to the components of distributed systems quite well explained.