So you want to keep up with all the cool kids throwing around terms like “multi-cluster replication” but you don’t have time to read several textbooks. This handy quick-reference will give you the framework to easily follow (and participate in!) conversations involving distributed systems or Temporal. At the next dinner party you’ll be able to win friends and influence people with your ability to explain distributed systems succinctly in plain English… because we all know you’re not gonna do it with salad. This guide builds on itself: terms that need no additional context come first, and later terms lean on the earlier ones.
Core Distributed Systems Terms To Know
concurrency ↠
Roughly, the idea of running multiple things at once. Two people eating dinner at the same time (“in parallel”) are eating concurrently. Your operating system context switching between a web browser and an IDE is also a form of concurrency.
scalability 📈
The ability for a system, such as a website, to accommodate a growing number of requests or work. You can improve scalability by finding places where work can be executed simultaneously or removing performance bottlenecks.
reliability ✅
The likelihood that a system runs without failure for a given period of time. Systems can be made more reliable by reducing single points of failure and detecting failures quickly.
eventual consistency 🐌
Let’s say you’ve replicated a database to improve reliability (and possibly scalability). Great! Eventual consistency says a change to the data in one location will eventually propagate to every location where the database lives; however, until every location is updated, a read from one of the locations may not yet return the updated value. You, the programmer, need to bear in mind you may not always have the most up-to-date data when working under this model.
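To make the stale-read window concrete, here is a minimal sketch. The `EventuallyConsistentStore` class and its `sync` method are hypothetical names invented for this illustration (real databases replicate in the background rather than on request), but the observable behavior is the same: a read can miss a write until replication catches up.

```python
class EventuallyConsistentStore:
    """Toy store: a write lands on replica 0 and reaches the others later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        # The write is applied to one replica right away...
        self.replicas[0][key] = value
        # ...and only queued for the remaining replicas.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Stand-in for background replication finally catching up.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore(replica_count=3)
store.write("cart", ["salad"])

stale = store.read(2, "cart")  # replica 2 has not seen the write yet
store.sync()
fresh = store.read(2, "cart")  # now every replica agrees
```

Between `write` and `sync`, the answer depends on which replica you ask, which is exactly the caveat the definition warns about.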
strong consistency 💪
The guarantee that a data store will always provide the most up-to-date value.
CAP Theorem 🧢
The rule that you gotta pick two of the three: (strong) consistency, availability, partition tolerance. A distributed data store can provide at most two of these qualities, alas. Availability means every request returns a non-error response. Partition tolerance is the ability of a system to continue operating even when messages between data store nodes are delayed or dropped. See also strong consistency and eventual consistency.
ACID 🧪
Hardcore-sounding acronym borrowed from databases that stands for atomicity, consistency, isolation, durability. See strong consistency, eventual consistency, and other deets below.
atomicity ⚛️
Executing a sequence of operations all together as if they were a single unit, or not at all.
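As an illustration, here is a small sketch using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are made up for this example. The two updates inside `with conn:` either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 50), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit would make the balance negative, so the credit
        # that already ran is undone as well.
        return False


ok = transfer(conn, "alice", "bob", 80)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit to bob runs first and succeeds, but because the debit from alice violates the `CHECK` constraint, the whole unit is rolled back and bob ends up with nothing extra.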
isolation 📦
Executing a sequence of operations concurrently with another sequence has the same effect as executing the sequences one after another.
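One rough way to picture this is with threads and a lock. This is a sketch of the idea only, not how a database implements isolation, and the `account` dictionary and `deposit_many` function are invented for the example. Each read-then-write sequence runs under the lock, so two threads interleaving produce the same total as running them back to back.

```python
import threading

account = {"balance": 0}
lock = threading.Lock()


def deposit_many(times):
    for _ in range(times):
        # The read and the write form one sequence; holding the lock means
        # no other thread can slip in between them.
        with lock:
            current = account["balance"]
            account["balance"] = current + 1


threads = [threading.Thread(target=deposit_many, args=(10_000,)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

final_balance = account["balance"]
```

Without the lock, one thread could read a balance, get interrupted, and later overwrite a deposit made by the other thread in the meantime.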
durability 🗿
Think long-lasting. Standing the test of time. Data is durable once it has been persisted (i.e. written to disk, or if you were really hardcore, etched on a stone tablet), at which point it can be looked up even in the face of system failure such as a power outage or crash.
durable execution 🔜
Similar to durability: once a program has started executing, it will continue executing to completion. This works by persisting every step the program takes so that execution can be continued by another process if the current process dies.
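Here is a toy sketch of the idea, and emphatically not how Temporal implements it. The `run_durably` function, the journal file, and the step names are all hypothetical. Each finished step is written to disk, so a second run (standing in for "another process") skips what is already done and carries on from the point of failure.

```python
import json
import os
import tempfile


def run_durably(steps, journal_path):
    """Run steps in order, recording each finished step so a restart can skip it."""
    done = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            done = json.load(f)
    for name, func in steps:
        if name in done:
            continue  # finished before the crash, so do not redo it
        done[name] = func()
        with open(journal_path, "w") as f:
            json.dump(done, f)  # persist progress after every step
    return done


calls = []
crash = {"armed": True}


def reserve():
    calls.append("reserve")
    return "reserved"


def charge():
    if crash["armed"]:
        crash["armed"] = False
        raise RuntimeError("process died")  # simulate a crash mid-run
    calls.append("charge")
    return "charged"


steps = [("reserve", reserve), ("charge", charge)]
journal = os.path.join(tempfile.mkdtemp(), "journal.json")

try:
    run_durably(steps, journal)
except RuntimeError:
    pass  # stand-in for the original process dying

results = run_durably(steps, journal)  # the restart resumes from the journal
```

Note that `reserve` runs only once across both attempts; the restart picks up at `charge`.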
idempotent function 🥪
Scary-sounding word, less scary meaning: a function that has the same observed result when called with the same inputs, whether it is called one time or many times.
A function setting some field foo=3? Idempotent. The function foo += 3? Not idempotent, because the value of foo is dependent on the number of times your function is called. Naive implementations of functions that transfer money or send emails are also not idempotent by default.
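The two foo examples look like this in code. The function names and the `state` dictionary are just for illustration.

```python
def set_foo(state):
    """Idempotent: after one call or ten calls, foo is 3."""
    state["foo"] = 3


def add_to_foo(state):
    """Not idempotent: every extra call changes the result."""
    state["foo"] += 3


once = {"foo": 0}
set_foo(once)

thrice = {"foo": 0}
for _ in range(3):
    set_foo(thrice)

added_once = {"foo": 0}
add_to_foo(added_once)

added_thrice = {"foo": 0}
for _ in range(3):
    add_to_foo(added_thrice)
```

Calling `set_foo` once or three times leaves the same state behind, while the number of `add_to_foo` calls shows up directly in the value of foo.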
deterministic function 🧮
Code that always has the same effect/output when given a particular input. Code that is not deterministic depends on some external state, such as user input, a random number, or stored data. Code that reads from or writes to a variable that other code can modify at the same time is also not deterministic.
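A small sketch of the difference, with function names invented for this example. The first function depends only on its argument; the second reaches out to the system clock, which is external state.

```python
from datetime import datetime


def greeting(hour):
    """Deterministic: the output depends only on the argument."""
    return "Good morning" if hour < 12 else "Good evening"


def greeting_now():
    """Not deterministic: the result depends on the wall clock."""
    return greeting(datetime.now().hour)


morning = greeting(9)
evening = greeting(20)
```

A common trick is the one shown here: keep the core logic deterministic and pass the external value (the hour) in as an input.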
platform 💻
Windows, iOS, Docker, and VMware are all platforms. They’re execution environments that define how programs behave inside them. Temporal is also a platform: it defines that code run with Temporal is resilient to failures and timeouts. You may see the term platform-level used in relation to failures. Platform-level failures are caused by low-level issues such as network errors or process crashes.
application 〉
The code you write. You may see the term application-level used in relation to failures. Application-level failures are domain-specific failures like “insufficient inventory”, or “user canceled ride request.”
event sourcing 🎤
A design pattern that creates event objects for every state change in a system, and records this sequence of events in a log (or event history). Temporal uses event sourcing “under the hood” to ensure failure resilience.
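A minimal sketch of the pattern follows. The `Event` class, the event kinds, and the bank-balance domain are all made up for illustration and say nothing about Temporal's internal event format. State is never stored directly; it is rebuilt by replaying the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    amount: int


def apply_event(balance, event):
    """Return the new state produced by applying one event to the old state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events):
    """Rebuild the current state from scratch by replaying the whole log."""
    balance = 0
    for event in events:
        balance = apply_event(balance, event)
    return balance


log = [Event("deposited", 100), Event("withdrawn", 30), Event("deposited", 5)]

current = replay(log)      # state after every event
earlier = replay(log[:2])  # state as it was after the first two events
```

Because the log is the source of truth, a process that crashes can recover its state by replaying, and you can also reconstruct what the state looked like at any earlier point.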
Temporal-Specific Terms
Temporal ✧
A way to run your code (a service and library that work together) that ensures your code never gets stuck in failure at the application level or the platform level. While libraries like async-retry take care of retry logic for functions that fail, what happens if your code making that library call crashes? Temporal says “we gotchu.” It abstracts away complex concepts around retries, rollbacks, queues, state machines, and timers, so that no matter where the failure happens, we’ll ensure your code keeps running the way you want.
Worker 👷
The process that’s actually doing the work executing all of your Temporal code (the Workflow and Activities). Capitalized here to denote the Temporal-specific concept of a Worker, to differentiate from the generic idea of a worker process.
Workflow 📖
The high-level business logic of your program. Essentially, this is where the logic of your application begins. (Technically execution starts with the Worker, and the Worker runs the Workflow code.) All Workflow logic must be deterministic.
Activity 💾
Components of your Workflow that might fail, like network or file system calls, inventory holds, or credit card charges. The decision around how many Activities your program should have (whether you make a separate Activity for every non-deterministic call or put the entire rest of your program in an Activity, which you shouldn’t do) is generally a function of how you’d like your program to behave when retrying a failure. For example, if a downstream instruction should always grab the very freshest data when retrying, the fetch and that instruction should be grouped together in a single Activity. If you can retry with the old data, they can be in separate Activities. Since Activities can be retried, they should be idempotent.
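One common way to make a retried call like a credit card charge idempotent is an idempotency key. The sketch below uses a hypothetical in-memory `PaymentGateway` rather than any real payment API or the Temporal SDK. The caller sends the same key on every retry, and the gateway returns the original receipt instead of charging again.

```python
class PaymentGateway:
    """Hypothetical in-memory stand-in for a payment provider."""

    def __init__(self):
        self.charges = {}  # idempotency key -> receipt

    def charge(self, idempotency_key, amount_cents):
        # A repeated key means this is a retry of a charge that already
        # went through, so hand back the original receipt.
        if idempotency_key in self.charges:
            return self.charges[idempotency_key]
        receipt = {"id": len(self.charges) + 1, "amount_cents": amount_cents}
        self.charges[idempotency_key] = receipt
        return receipt


gateway = PaymentGateway()
first = gateway.charge("order-42", 1999)
retried = gateway.charge("order-42", 1999)  # e.g. the first reply was lost
```

The customer is charged once no matter how many times the call is retried, which is the property you want from anything that runs inside an Activity.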
Query 🙋
A way to inspect the state of a Workflow. The results are guaranteed to show the most recent state.
Signal 🧑‍🏫
A way to notify or send information to a Workflow. A common use case is notifying a Workflow that the user added items to their shopping cart.
retry 🔄
Generally, re-executing an Activity that has failed. Technically, Workflows can also be retried, but Workflow retries are far less common; one example is a developer attempting to update Workflow code running in production.
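For a feel of what retry logic involves, here is a generic sketch in plain Python. The `retry` helper and its parameter names are assumptions made for this example, not Temporal's retry policy API. It re-runs a failing function with a growing delay between attempts and gives up after a fixed number of tries.

```python
import time


def retry(func, max_attempts=3, initial_delay=0.01, backoff=2.0):
    """Call func until it succeeds or max_attempts is reached."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay)
            delay *= backoff  # wait longer before each new attempt


attempts = {"count": 0}


def flaky_inventory_hold():
    """Fails twice, then succeeds, like a briefly unavailable service."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("inventory service unavailable")
    return "held"


result = retry(flaky_inventory_hold)
```

A helper like this only survives as long as the process running it does, which is the gap a durable execution platform is meant to fill.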
Cluster 🏘️
The collection of services and databases that make failure and timeout resilience possible. You might sometimes see this colloquially called the Temporal Server.
History 🗃️
A log of events that happened over the course of execution. This log contains attempts to run Activities, Workflow status changes (started, failed, scheduled, etc.), timer events, and external information signaled to the system during the run.
In Closing
Knowing this core set of 25 terms should give you enough of the lay of the land to sling references like ACID and Workflow in conversations with coworkers, friends, and family with ease! Better yet, you now know enough to dive deeper into subdomains of interest. If you’d like to try out these terms in practice, check out our getting started guides, courses, and examples in Go, Java, Python, PHP, and TypeScript.