A Computer Science survival guide for C++/Linux developers
Fri Feb 14 23:04:37 UTC 2025
Let’s start with a haiku written by a computer.
Code runs swiftly
Syntax rules guide the way
C++, a language of power
This reference is a distillation of 15+ years of online logbook notes into only the essentials that remain relevant to a senior software developer today. Just for fun – and where the topic is readily available or established – I have reached out to OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevent me from focusing all my attention on the first few chapters.
If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.
A cache is a small amount of unusually fast memory.
A modern multi-core machine will have a multi-level cache hierarchy, where the faster and smaller caches belong to individual processors. When one processor modifies a value in its cache, other processors cannot use the old value any more; and that memory location will be invalidated in all of the caches. Furthermore, since caches operate on the granularity of cache lines and not individual bytes, the entire cache line will be invalidated in all caches.
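A hedged sketch of the consequence: if two counters written by different threads share a cache line, every write by one thread invalidates the line in the other core's cache (false sharing). Padding each counter to std::hardware_destructive_interference_size (C++17, in <new>; needs a reasonably recent compiler) keeps them on separate lines. The struct and loop bounds below are illustrative only.

```cpp
#include <new>
#include <thread>

// Each counter gets its own cache line, so the two threads below do not
// repeatedly invalidate each other's cached copy (false sharing).
struct Counters {
    alignas(std::hardware_destructive_interference_size) long a{};
    alignas(std::hardware_destructive_interference_size) long b{};
};

int main() {
    Counters c;
    std::thread t1{[&] { for (int i = 0; i < 1'000'000; ++i) ++c.a; }};
    std::thread t2{[&] { for (int i = 0; i < 1'000'000; ++i) ++c.b; }};
    t1.join();
    t2.join();
}
```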
A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.
OpenAI: Processor caches refer to small, high-speed memory units located on the processor chip that store frequently accessed data and instructions. These caches improve the performance of the processor by reducing the time taken to access data from the main memory.
There are typically three levels of processor caches:
Level 1 (L1) cache: This is the smallest and fastest cache, located on the processor itself. It is divided into two parts - the instruction cache, which stores frequently used instructions, and the data cache, which stores frequently used data. The size of L1 cache ranges from 8KB to 64KB.
Level 2 (L2) cache: This is a larger cache, located on the processor chip or on a separate chip next to it. It is used to store data and instructions that are not found in the L1 cache. The size of L2 cache ranges from 256KB to 4MB.
Level 3 (L3) cache: This is the largest and slowest cache, typically found on the motherboard. It is used to store data and instructions that are not found in the L1 and L2 caches. The size of L3 cache can range from 4MB to 64MB.
The purpose of these caches is to reduce the time taken to access data from the main memory, which is much slower than the processor. When the processor needs to access data, it first checks the L1 cache, if the data is not found there, it checks the L2 cache and then the L3 cache. If the data is not found in any of the caches, it is retrieved from the main memory.
The use of processor caches greatly improves the performance of the processor, as it reduces the number of times the processor needs to access the main memory. This is especially beneficial for tasks that involve repetitive data access, such as gaming, video editing, and web browsing.
However, since the size of these caches is limited, they can only store a certain amount of data and instructions. This means that if the data or instructions needed by the processor are not found in any of the caches, the processor will have to access the main memory, resulting in a slower performance. Additionally, if the caches are not managed efficiently, it can lead to cache conflicts where different processes are trying to access the same cache at the same time, causing delays in data retrieval.
In summary, processor caches play a crucial role in improving the performance of the processor by reducing the time taken to access data from the main memory. They are an integral part of modern processors and their effectiveness is constantly being improved with advancements in technology.
A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.
OpenAI: Cache coherence refers to the synchronization of data between different cache memory locations and the main memory in a computer system. When multiple processors or cores share access to the same data, cache coherence ensures that all copies of the data are consistent and up-to-date.
In a multi-processor system, each processor has its own cache memory which stores frequently used data from the main memory. When one processor updates a data item in its cache, other processors that have a copy of that data need to be notified and update their copies as well. This ensures that all processors have the most recent version of the data.
Cache coherence protocols are used to manage this synchronization process. These protocols define rules and mechanisms for maintaining data consistency between caches, such as invalidating outdated copies and updating them with the latest data.
Without cache coherence, there is a risk of data inconsistency and errors in the system. For example, if one processor modifies a data item and another processor reads an outdated copy of that data, it can lead to incorrect results or system crashes.
Overall, cache coherence is essential for efficient and reliable data sharing in multi-processor systems, ensuring that all processors have access to consistent and up-to-date data.
Each cache line has state, tag, data.
See bus snooping.
Used for large CPUs. See wiki.
All Intel-compatible CPUs during the last one/two decades used a write-back strategy for caches (which presumes fetching a cache line first to allow partial writes). Of course that’s the theory, reality is slightly more complex than that.
There are three kinds of cache misses:
OpenAI: Cache misses occur when the processor needs to access data that is not currently stored in the cache memory. This means that the data needs to be retrieved from a lower level of memory, such as the main memory or the hard drive. This process is slower compared to accessing data from the cache memory, which is why cache misses can result in slower performance.
There are three types of cache misses: compulsory, capacity, and conflict.
Compulsory cache misses occur when a program is first run and data is not yet stored in the cache. This is because the cache memory is empty and needs to be filled with data as it is accessed.
Capacity cache misses occur when the cache memory is full and a new piece of data needs to be stored. In this case, the cache must first make room by evicting some existing data before it can store the new data. This process takes time and can result in slower performance.
Cache misses can also occur due to conflicts in the cache memory. This happens when multiple pieces of data are mapped to the same cache location, resulting in one piece of data overwriting another. This can be mitigated by using different cache mapping techniques, such as set-associative or fully associative mapping.
Overall, cache misses can significantly impact the performance of a system, especially in applications that require frequent access to large amounts of data. To reduce the number of cache misses, developers can optimize their code to make better use of the cache memory and ensure efficient data access.
The way virtual functions work may cause issues with caching: the call goes through an extra indirection via the vtable, and the target often cannot be inlined. But we may be able to use CRTP to avoid these costs.
The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using X itself as template argument. More generally it is known as F-bound polymorphism, and it is a form of static polymorphism.
#include <iostream>

// Base class template using CRTP
template<typename Derived>
class Base {
public:
void foo() {
static_cast<Derived*>(this)->fooImpl();
}
};
// Derived class using CRTP
class Derived : public Base<Derived> {
public:
void fooImpl() {
// Implementation of foo specific to Derived
std::cout << "Hello from Derived!" << std::endl;
}
};
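A minimal usage sketch of the classes above: the call is resolved at compile time, so there is no vtable lookup.

```cpp
int main() {
    Derived d;
    d.foo();  // statically dispatched to Derived::fooImpl(): prints "Hello from Derived!"
}
```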
Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.
Design to get the most value out of every single cacheline that is read. Split apart the functions and data.
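A hedged sketch of that idea (the types here are illustrative): keep the fields the hot loop actually touches in dense arrays, and move rarely-used fields out of the way, so every cache line fetched is full of useful data.

```cpp
#include <string>
#include <vector>

// Array-of-structs: each element drags its cold description into cache
// even though the update loop never reads it.
struct Particle {
    float x, y, z;
    std::string description;  // cold data sharing cache lines with hot data
};

// Struct-of-arrays: the hot positions are contiguous; cold data lives elsewhere.
struct Particles {
    std::vector<float> x, y, z;             // hot: touched every update
    std::vector<std::string> description;   // cold: touched rarely
};

void update(Particles& p, float dx) {
    for (auto& x : p.x)  // streams through one dense array
        x += dx;
}
```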
Cache locality refers to the principle of organizing data in a way that takes advantage of the hierarchy of computer memory systems, such as the cache, main memory, and secondary storage. This means that data that is frequently accessed or used together is stored in close proximity to each other in memory, making it faster to retrieve and process.
There are two main types of cache locality:
Temporal locality: This refers to the principle that data that has been recently accessed is likely to be accessed again in the near future. This is because programs often have loops or repetitive operations that require the same data to be accessed multiple times.
Spatial locality: This refers to the principle that data that is stored close together in memory is likely to be accessed together. This is because programs often access data in a sequential or contiguous manner.
By optimizing for cache locality, the computer can reduce the number of times it needs to retrieve data from slower memory systems, improving overall performance and efficiency. This is especially important in modern computer architectures where the gap between processor speed and memory speed continues to widen.
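A small sketch of spatial locality: iterating a 2D array in the order it is laid out in memory (row-major in C++) touches consecutive addresses and makes good use of each cache line fetched, whereas the column-major loop strides a whole row ahead on every access.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 1024;
std::array<std::array<int, N>, N> grid{};

long sum_row_major() {     // cache friendly: consecutive addresses
    long sum = 0;
    for (std::size_t row = 0; row < N; ++row)
        for (std::size_t col = 0; col < N; ++col)
            sum += grid[row][col];
    return sum;
}

long sum_column_major() {  // same result, but jumps N ints between accesses
    long sum = 0;
    for (std::size_t col = 0; col < N; ++col)
        for (std::size_t row = 0; row < N; ++row)
            sum += grid[row][col];
    return sum;
}
```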
The Translation Lookaside Buffer (TLB) is a hardware cache that is used in computer systems to improve the efficiency of virtual memory address translation. It is a small, high-speed cache that stores recently used virtual-to-physical address translations.
When a program references a memory location, the virtual memory address needs to be translated into a physical memory address by the operating system. This translation process involves looking up a page table or a similar data structure. Since this translation operation is performed frequently, it can introduce overhead and impact system performance.
The TLB acts as a cache for these translations, reducing the need for repeated lookups in the page table. It stores a subset of the most recently used translations, making it faster to access and retrieve the corresponding physical memory address.
TLBs can be shared between multiple cores or processors, or each core may have its own dedicated TLB.
OpenAI:
Translation lookaside buffer (TLB) is a hardware cache that stores the most recently accessed virtual-to-physical address translations. It is used in virtual memory systems to speed up the process of translating virtual addresses to physical addresses. The TLB holds a limited number of translations, typically a few hundred, and is used to reduce the time it takes to access memory. When a virtual address is accessed, the TLB is checked first to see if the translation is already present. If it is, the physical address is retrieved from the TLB and the translation process is expedited. This helps to improve the overall performance and efficiency of the system. However, if the translation is not present in the TLB, the system will have to access the page table, which can be a slower process. The TLB is constantly updated and managed by the operating system to ensure that the most frequently used translations are always available.
A memory fence (or barrier) is a programming language construct used to enforce an ordering constraint on memory operations issued by a thread of execution on a CPU. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after the fence.
const int num_mailboxes = 32;
std::atomic<int> mailbox_receiver[num_mailboxes];
std::string mailbox_data[num_mailboxes];
// The writer threads update non-atomic shared data
// and then update mailbox_receiver[i] as follows:
mailbox_data[i] = ...;
std::atomic_store_explicit(&mailbox_receiver[i], receiver_id, std::memory_order_release);
// Reader thread needs to check all mailbox[i], but only needs to sync with one.
for (int i = 0; i < num_mailboxes; ++i)
if (std::atomic_load_explicit(&mailbox_receiver[i],
std::memory_order_relaxed) == my_id)
{
// synchronize with just one writer
std::atomic_thread_fence(std::memory_order_acquire);
// guaranteed to observe everything done in the writer thread
// before the atomic_store_explicit()
do_work(mailbox_data[i]);
}
See wiki and cppreference.
There are several factors that can push something out of the cache:
For simple readonly types, pass by const value.
void func(const int p) {
}
For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider whether the type is trivially copyable (a shallow copy suffices) or requires a deep copy.
void func(const big_type& p) {
}
If the first thing you do is make a copy, pass by non-const value.
void func(int p) {
}
If you want to modify the caller’s copy, pass by non-const reference.
void func(int& p) {
}
The only difference between a class and a struct is the default access level: public for a struct, private for a class.
But typically a struct is used for “plain old data” or memory-mapped data where you want the members to be laid out in a predictable way (virtuals/inheritance add the complication of a vtable pointer.)
Either way, you can test your assertions at compile time with static_assert and the type traits header.
struct A {
// public: <-- default for structs
int x_;
};
If the data are constrained by an invariant – i.e., not all values are valid – use a class.
class B {
private: // <-- default for classes
int x_; // must be between 0-1000
};
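A sketch of enforcing that invariant (extending the class above; the 0 to 1000 range is just the example given): the only way to modify x_ is through a member function that validates the value.

```cpp
#include <stdexcept>

class B {
public:
    explicit B(int x) { set(x); }

    void set(int x) {
        if (x < 0 || x > 1000)
            throw std::out_of_range{"x must be between 0 and 1000"};
        x_ = x;
    }

    int get() const { return x_; }

private:
    int x_;  // invariant: 0 <= x_ <= 1000, guaranteed by set()
};
```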
Internal linkage in C++ refers to the visibility and accessibility of a variable or function within a single translation unit. A translation unit is a single source file along with all of its included header files. Variables and functions with internal linkage can only be accessed within the translation unit in which they are declared. This is achieved by using the static keyword in front of the variable or function declaration.
External linkage, on the other hand, refers to the visibility and accessibility of a variable or function across multiple translation units. Variables and functions with external linkage can be accessed from any translation unit as long as they are declared in one and defined in another. This is the default linkage for global variables and functions.
Variables and functions with external linkage can also be declared as extern in a translation unit in which they are used but not defined. This indicates to the compiler that the variable or function is defined in another translation unit and should be linked to during the compilation process. This allows for sharing of variables and functions between different parts of a program.
In summary, internal linkage restricts the visibility of a variable or function to a single translation unit, while external linkage allows for sharing of variables and functions between different translation units.
Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.
If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.
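A minimal sketch of the difference across two translation units (the file names and symbols are illustrative):

```cpp
// a.cpp
static int counter = 0;  // internal linkage: visible only within this translation unit
int total = 0;           // external linkage: visible to the linker everywhere

// b.cpp
extern int total;        // declared here, defined in a.cpp: links fine
// extern int counter;   // would fail to link: counter has internal linkage in a.cpp

void bump() { ++total; }
```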
std::call_once vs double checked locking
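A hedged sketch of what that comparison is about: std::call_once gives the once-only initialisation that hand-rolled double-checked locking tries to achieve, without the risk of getting the memory ordering wrong. The Config type is illustrative.

```cpp
#include <memory>
#include <mutex>

struct Config { /* expensive to construct */ };

std::once_flag config_flag;
std::unique_ptr<Config> config;

const Config& get_config() {
    // Exactly one thread runs the lambda; others block until it completes,
    // after which all reads of config are safe.
    std::call_once(config_flag, [] { config = std::make_unique<Config>(); });
    return *config;
}
```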
Used to declare many things with internal linkage.
namespace {
int a = 0;
int b = 0;
int c = 0;
}
Construction of a building is often used to describe the software design process. But the analogy breaks down: the building designer only has one attempt to compile the building. You can’t swap out the foundation halfway through.
Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.
Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.
Verification is objective (quality), validation is subjective (customer needs).
Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.
See difference between verification and validation.
Stop doing the things that nobody wants
What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.
Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.
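For example, a test whose name is simply the behaviour sentence (a plain assert is enough for this sketch; the function under test is hypothetical):

```cpp
#include <cassert>

int add(int a, int b) { return a + b; }

// "adding two positive numbers returns their sum"
void adding_two_positive_numbers_returns_their_sum() {
    assert(add(2, 3) == 5);
}

int main() { adding_two_positive_numbers_returns_their_sum(); }
```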
As a [X] I want [Y] in order to [Z]
The Jonathan Boccara CppCon talk 105 STL Algorithms in Less Than an Hour is well worth a watch for a quick overview.
Quicksort is the poster boy of sorting algorithms.
OpenAI: Quicksort is O(n log n) in the average and best case and O(n^2) in the worst case. This is because in the average and best case, the algorithm divides the input into two sublists of roughly equal size, with each iteration reducing the size of the sublists by half. This results in a logarithmic time complexity. However, in the worst case, the pivot chosen is consistently the smallest or largest element, resulting in unbalanced partitions and a time complexity of O(n^2) as each iteration only reduces the size of the sublist by 1.
Below is a Python implementation just for fun. But how is it implemented in C++?
def partition(arr, low, high):
    i = low - 1
    pivot = arr[high]

    for j in range(low, high):
        if arr[j] < pivot:
            i = i + 1
            arr[i], arr[j] = arr[j], arr[i]

    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def quickSort(arr, low, high):
    if low < high:
        pi = partition(arr, low, high)
        quickSort(arr, low, pi - 1)
        quickSort(arr, pi + 1, high)


arr = [10, 7, 8, 9, 1, 5]
n = len(arr)
quickSort(arr, 0, n - 1)
print("Sorted array is:")
for i in range(n):
    print("%d" % arr[i])
Introsort is a sorting algorithm that combines the best features of quicksort, heapsort, and insertion sort. It is a hybrid algorithm that uses quicksort to partition the data and then switches to heapsort when the recursion depth exceeds a certain level. If the data set is small enough, it switches to insertion sort for faster sorting. This algorithm is designed to handle the worst-case scenario of quicksort, which is when the data is already sorted or nearly sorted.
The algorithm starts by choosing a pivot element, typically the first or last element in the list. Then, it partitions the data into two sublists, one with elements less than the pivot and one with elements greater than the pivot. This process is repeated recursively on each sublist until the sublists are small enough to be sorted using insertion sort.
However, to prevent the worst-case scenario of quicksort, the algorithm also checks the recursion depth and switches to heapsort when it exceeds a certain level. Heapsort guarantees a worst-case time complexity of O(n log n), which is better than quicksort’s worst-case time complexity of O(n^2).
Introsort has a time complexity of O(n log n) in the average and best cases, and O(n log n) in the worst case. It also has a space complexity of O(log n). This makes it a very efficient sorting algorithm for large data sets, especially when the data is mostly sorted or nearly sorted.
In summary, introsort is a hybrid sorting algorithm that combines the strengths of quicksort, heapsort, and insertion sort to efficiently sort large data sets with a guarantee of O(n log n) time complexity in all cases.
Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.
If additional memory is available, stable_sort remains O(n log n). However, if it fails to allocate, it will degrade to an O(n log² n) algorithm.
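A small sketch of the difference: sorting people by age with std::stable_sort keeps equal-aged people in their original order, whereas std::sort makes no such promise.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

int main() {
    std::vector<Person> people{{"Alice", 30}, {"Bob", 25}, {"Carol", 30}, {"Dave", 25}};

    // Stable: Alice stays before Carol and Bob before Dave, because they compare equal.
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& a, const Person& b) { return a.age < b.age; });

    for (const auto& p : people)
        std::cout << p.name << ' ' << p.age << '\n';
}
```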
https://leanpub.com/cpp-algorithms-guide
The threshold for switching to insertion sort varies for each compiler.
std::list
std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.
Considerations for choosing an algorithm: the size of the input, and the type of input, e.g., partially sorted or random.
Sorting algorithms are used to organize a list of elements in a specific order. The performance of a sorting algorithm is usually measured by its time complexity and space complexity.
Bubble sort is a simple and intuitive sorting algorithm that works by repeatedly swapping adjacent elements if they are in the wrong order. It has a time complexity of O(n^2), which means that it takes quadratic time to sort a list of n elements. This means that as the size of the input increases, the time taken to sort the list increases significantly. However, it has a space complexity of O(1), which means that it does not use any additional memory other than the given list.
Selection sort works by repeatedly finding the minimum element from the unsorted part of the list and placing it at the beginning. It has a time complexity of O(n^2), which is the same as bubble sort. However, it has a slightly better space complexity of O(1) as it does not require any additional memory other than the given list.
Insertion sort is similar to selection sort, but instead of finding the minimum element and placing it at the beginning, it finds the correct position for each element in the sorted part of the list and inserts it there. It also has a time complexity of O(n^2) and a space complexity of O(1).
Merge sort is a divide and conquer algorithm that splits the list into smaller sublists, sorts them, and then merges them back together. It has a time complexity of O(nlogn), which is significantly better than the previous sorting algorithms. However, it has a space complexity of O(n), as it requires additional memory to store the sublists during the sorting process.
Quick sort is also a divide and conquer algorithm that works by selecting a pivot element and partitioning the list into two sublists, one with elements smaller than the pivot and one with elements larger than the pivot. It then recursively sorts the sublists. It has a time complexity of O(nlogn), which is the same as merge sort, but it has a space complexity of O(logn) as it uses a stack to keep track of the recursive calls.
Heap sort uses a heap data structure to sort the list. It first builds a heap from the list and then repeatedly removes the root element and places it at the end of the sorted part of the list. It has a time complexity of O(nlogn), which is the same as merge and quick sort, but it has a space complexity of O(1) as it sorts the list in place without requiring any additional memory.
In summary, the time complexity of sorting algorithms can range from O(n) to O(n^2), with merge and quick sort having the best time complexity of O(nlogn). However, this comes at the cost of a higher space complexity. In terms of space complexity, bubble, selection, insertion, and heap sort have the best complexity of O(1), while merge sort has a space complexity of O(n) and quick sort has a space complexity of O(logn).
All of these are Θ(n log n) in all cases, except that Timsort has an Ω(n) best case and Quicksort has a terrible O(n^2) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case, but it does have that O(n^2) worst case. It is said that this can be avoided by picking the pivot carefully, but an example could be constructed where the chosen pivot is always the worst case.
Less common, but another set of principles to consider: the GRASP principles (General Responsibility Assignment Software Patterns).
Information Expert: Assign responsibility to the class or object with the most information needed to fulfill it.
Creator: Assign the responsibility to create objects to the class that has the necessary information about the object’s creation.
High Cohesion: Assign responsibility to a class or object that has a tightly focused and well-defined purpose.
Low Coupling: Reduce inter-dependencies between classes by assigning responsibilities to classes with minimal connections to other classes.
Controller: Assign the responsibility of handling input and coordinating actions to a separate controller class.
Polymorphism: Use inheritance and interface implementation to enable polymorphic behavior, allowing different objects to respond differently to the same message.
Indirection: Assign responsibility for handling a request to an intermediate object that can delegate the request to another object.
Pure Fabrication: Create a class solely for the purpose of supporting other classes and objects.
Protected Variations: Encapsulate the parts of the system that are likely to change, in order to protect other parts from being affected.
Creator-Information Expert: Assign the responsibility of creating objects to the class that has the most information about the object being created.
Low Coupling-Controller: Assign the responsibility of coordinating actions to a separate controller class with minimal dependencies.
Polymorphism-Controller: Use polymorphism to enable different controllers to handle the same input in different ways.
Pure Fabrication-Controller: Create a controller class solely for the purpose of handling input and coordinating actions.
Controller-Protected Variations: Use a controller class to encapsulate variations in handling input and coordinating actions.
Polymorphism-Pure Fabrication: Use polymorphism to enable pure fabrication classes to support different types of objects.
The big things for me are ranges and views, which let you effortlessly express and manipulate ranges of data with zero iterators; they read naturally and are also lazy-evaluated (and gcc 14 brings ranges::to<> to the party). Concepts I’ve used a little, but mostly to narrow the allowed types of an auto parameter: e.g., std::floating_point auto.
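A small sketch of that style (C++20/23; ranges::to needs gcc 14): a lazily evaluated pipeline with no handwritten iterators, plus a concept-constrained auto parameter.

```cpp
#include <concepts>
#include <ranges>
#include <vector>

// Concept-constrained auto parameter: only floating-point types accepted.
auto halve(std::floating_point auto x) { return x / 2; }

int main() {
    const std::vector<int> v{1, 2, 3, 4, 5, 6};

    // Lazy pipeline: nothing is evaluated until the range is consumed.
    auto evens_squared = v
        | std::views::filter([](int i) { return i % 2 == 0; })
        | std::views::transform([](int i) { return i * i; });

    // gcc 14: materialise the view into a container.
    const auto result = std::ranges::to<std::vector>(evens_squared);  // {4, 16, 36}

    return result.size() == 3 && halve(3.0) == 1.5 ? 0 : 1;
}
```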
I’ve attempted to deep dive into templates a few times and it’s really quite an unpleasant language! However, whilst not a recent addition, constexpr is becoming increasingly supported; it gives you the readability of “normal” code with the potential performance benefits of compile-time. And it also finds UB! However, you quickly hit the issue where something isn’t constexpr so you have to either implement it yourself or not bother.
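A minimal sketch of the appeal: the same readable function is usable at compile time and at run time, and constant evaluation rejects UB (such as signed overflow) outright.

```cpp
#include <cstdint>

constexpr std::uint64_t factorial(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i)
        result *= i;
    return result;
}

static_assert(factorial(10) == 3'628'800);  // evaluated entirely at compile time

int main() {
    return factorial(5) == 120 ? 0 : 1;  // the same function at run time
}
```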
Coroutines are quite high-profile but I haven’t got into them yet. Finally, the multidimensional array operator seemed like a game changer but I’ve only used it once.
You’ll have to reach into gcc 14 but std::print is very welcome; as are modules (which might actually be available in 13.2 with the -fmodules-ts flag) but poorly supported and not quite what I’d like yet.
Notable mentions from C++17 are structured bindings and execution policy.
constexpr cmath
See the presentation by Marc Gregoire (CppCon 2022).
You’ve gotta track gcc head to avoid disappointment.
data[x, y, z] and std::mdspan
uz literals
.contains for strings and containers
constexpr for std::optional and std::variant
[[assume(true)]] – introduce UB into your code with assumptions
std::print and std::println – gcc 14
std::flat_map and std::flat_set (contiguous binary search trees) – not in gcc 14
import <iostream>; – modules, gcc 13 with flag -fmodules-ts
consteval – immediate functions: only execute at compile time
<generator>
<stacktrace>
std::expected
std::byteswap
A lot of effort has gone into ranges and views in C++23.
starts_with
shift_left
ranges::to<> – gcc 14, convert a range to a vector (for instance)
find_if
contains
contains_subrange
std::ranges::fold_left – gcc 13.1
zip
adjacent
pairwise
chunk
slide
chunk_by
stride – take every nth element
join/join_with – I’ve had no luck with these outside of a trivial example
repeat
iota – infinite views may be more performant as no boundary check
slide – love these
enumerate – gcc 13.2, easily enumerate your range-based for loops, you get a pair of index and the element
.contains for maps
starts_with for strings
std::jthread – thread you don’t have to explicitly join (it’s the same as std::thread but joins in the destructor), also has a built-in stop token
std::barrier for thread synchronisation (like std::latch but reusable)
std::filesystem – from Boost
std::string_view
std::clamp
[[maybe_unused]]
<string_view>
std::byte
Also see cppreference.
C++14 is an extension and improvement of C++11.
0b1111000
auto return type
auto in lambda parameters
template <class T> constexpr T bins = T{24'576};
decltype(auto)
constexpr
A test is not a unit test if it:
See the complete article.
See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.
Section | Description |
---|---|
stack \/ | Local variables |
heap /\ | Dynamic memory |
.rodata | Read-only data |
.bss | Uninitialised data |
.data | Initialised data |
.text | Code |
Header: This segment contains information about the file, such as the file size, entry point, and architecture.
Text/code: This segment contains the actual machine code that is executed by the processor. It includes instructions, function definitions, and data literals.
Data: This segment contains initialized data such as global and static variables.
BSS (Block Started by Symbol): This segment contains uninitialized data, usually declared as global or static variables.
Stack: This segment contains the program’s execution stack, which is used for local variables, function parameters, and return addresses.
Heap: This segment contains dynamically allocated memory, typically managed by the program’s memory management functions.
Dynamic Linking: This segment contains information required for the program to link and load dynamic libraries at runtime.
Relocation: This segment contains information about the addresses of symbols and how to relocate them when the program is loaded into memory.
Debug: This segment contains debugging information, such as line numbers, symbols, and source code information, used for debugging and error handling.
Resources: This segment contains additional resources such as icons, images, and strings used by the program.
Export: This segment contains information about symbols that can be accessed by other programs.
Import: This segment contains information about symbols that are required by the program and must be resolved at runtime by the operating system.
In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.
1 000 kilo | milli .001
1 000 000 mega | micro .000 001
1 000 000 000 giga | nano .000 000 001
1 000 000 000 000 tera | pico .000 000 000 001
1 000 000 000 000 000 peta | femto .000 000 000 000 001
1 000 000 000 000 000 000 exa | atto .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001
See list of SI prefixes.
You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc. – but most people won’t know what you’re talking about. Also, manufacturers tend to use powers of 1000 rather than powers of two (e.g., 1000³ bytes per gigabyte rather than 2³⁰) because it makes their numbers look better.
See why is everyone in such a rush?
1/1000000000 second == 1 nanosecond
Approximate durations of typical operations (rounded to help remember.)
Action | Duration (nanoseconds) |
---|---|
L1 cache read, variable increment | <1 |
Branch misprediction (how do you measure this?) | 5 |
L2 cache read, std::atomic | 10 |
std::mutex/scoped_lock | 20 |
Fetch from main memory | 100 |
Semaphore acquire | 200 |
Send 2KiB over 1Gbps network | 20,000 |
Create a std::thread | 20,000 |
Send packet from US to Europe and back | 200,000,000 |
A hash function is a mathematical algorithm that takes in data of any size and produces a fixed-size output called a hash value or hash code. This output is typically a unique representation of the input data and is used to identify and compare data quickly. Hash functions are commonly used in cryptography, data storage, and data retrieval systems.
Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.
Why use multithreading? For performance and/or separation of concerns.
A routine can be described as reentrant if it can be interrupted and not leave the operation partially complete: the invariants remain true.
A good way to reason about your code is to consider the invariants that must be true at any given point in time. For a linked list, node A’s next pointer must point to node B and node B’s previous pointer must point to node A. However, if you are deleting an element from a linked list there will be a period where this invariant does not hold, and you must acquire mutual access to both nodes whilst the list is being updated; otherwise another thread may see the list in an invalid state.
OpenAI: Invariants refer to specific conditions or properties that remain constant or unchanged throughout a particular process, system, or situation. They are important concepts in mathematics, computer science, and engineering, as well as in other fields such as physics and economics.
Invariants are often used to describe the behavior of a system or process, and they can help identify and understand patterns and relationships between different variables. They can also be used to prove the correctness of algorithms and programs.
There are different types of invariants, including:
Loop invariants: These are conditions that remain true before, during, and after each iteration of a loop in a computer program.
Data invariants: These are properties or constraints that remain true for a particular data structure or object throughout its lifetime.
Physical invariants: These are properties or quantities that remain constant in a physical system, such as energy or momentum.
Logical invariants: These are constraints or relationships that remain unchanged in a logical system, such as a set of axioms or rules of inference.
Invariants are important because they can help identify errors or bugs in a system. If an invariant is violated, it means that something in the system is not working as expected and needs to be fixed. They also provide a way to verify the correctness of a system, as any changes that violate the invariants can be identified and corrected.
Overall, invariants play a crucial role in understanding and analyzing systems and processes, and they are essential tools for ensuring their correctness and stability.
A function is idempotent if it can be applied multiple times without changing the outcome. E.g., repeatedly pressing the “on” switch of an “on/off” pair: it’s still on. See Wikipedia.
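A trivial sketch: applying the function twice gives the same result as applying it once, i.e., f(f(x)) == f(x).

```cpp
#include <algorithm>
#include <cassert>

int clamp_percentage(int x) { return std::clamp(x, 0, 100); }  // idempotent

int main() {
    assert(clamp_percentage(clamp_percentage(150)) == clamp_percentage(150));  // both 100
}
```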
tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”
OpenAI: A race condition in computer science refers to a situation where the outcome or behavior of a system depends on the timing or sequence of events, and different outcomes or behaviors can occur depending on which event occurs first. This can lead to unexpected or incorrect behavior of the system.
One common example of a race condition is when multiple processes or threads access and modify a shared resource without proper synchronization. If the timing of these accesses is not controlled, it can lead to conflicts and inconsistencies in the shared resource, resulting in incorrect behavior of the system.
Another example is in concurrent programming, where multiple processes or threads are running simultaneously and accessing shared data. If these processes are not properly synchronized, they can interfere with each other’s operations and cause unexpected results.
Race conditions can also occur in distributed systems, where multiple nodes are communicating and sharing data. If the communication is not synchronized, it can lead to conflicts and inconsistencies in the shared data, resulting in incorrect behavior of the system.
In general, race conditions can occur in any situation where there is shared access to a resource or when multiple processes are interacting and their behavior is dependent on the timing of events. They can be difficult to identify and debug, making them a common source of errors and bugs in computer systems. Proper synchronization and careful design can help prevent race conditions and ensure the correct behavior of a system.
See the Dining philosophers problem.
A real world example: Ken and Barbie walk over to the barbecue. Ken grabs the last remaining sausage and attempts to acquire the hot sauce; meanwhile, Barbie has already secured the hot sauce and is waiting for the sausage to become available. A sausage cannot be enjoyed without hot sauce, so neither can proceed and they remain stuck in a deadlock indefinitely.
A livelock occurs when measures are taken to avoid a deadlock, but the measures themselves do not resolve the conflict. For example, Ken and Barbie both agree to put down their respective items and try again; however, they both pick up the same item again and the deadlock continues.
OpenAI: Both deadlocks and livelocks are types of concurrency issues that can occur in multi-threaded programs. They both involve situations where threads are unable to make progress, but there are some key differences between them.
Deadlocks occur when two or more threads are blocked and unable to continue because they are waiting for each other to release a resource that they both need. This creates a circular dependency and the threads will remain blocked indefinitely unless the issue is resolved externally. In other words, the threads are "deadlocked" with each other.
On the other hand, livelocks occur when two or more threads are constantly changing their states in response to each other, but are unable to make any real progress. This can happen when the threads are trying to be polite and avoid conflicts, but end up getting stuck in a loop of constantly trying to give way to each other. In a livelock, the threads are not blocked, but they are not making any progress either.
In summary, the key differences between deadlocks and livelocks are:
Cause: Deadlocks are caused by threads waiting for each other’s resources, while livelocks are caused by threads trying to be polite and avoid conflicts.
State: In a deadlock, threads are blocked and unable to make progress, while in a livelock, threads are constantly changing their states but not making any progress.
Resolution: Deadlocks can only be resolved externally by breaking the circular dependency, while livelocks can be resolved by implementing a different strategy for thread interaction.
Run your code on different numbers of CPUs; it’s interesting how the bottlenecks move around depending on the available resources.
OpenAI: 1. Use a lock ordering hierarchy: To avoid deadlocks, establish a lock ordering hierarchy where locks are acquired in a specific order. This will prevent two threads from trying to acquire the same set of locks in a different order and potentially causing a deadlock.
Limit lock acquisition time: Set a timeout for how long a thread can hold a lock. If the thread is unable to complete its task within the allotted time, it will release the lock and allow other threads to acquire it.
Use thread-safe data structures: Thread-safe data structures have built-in mechanisms to prevent multiple threads from accessing them at the same time. This can help avoid deadlocks caused by multiple threads trying to access the same data.
Avoid nested locks: Nested locks occur when a thread tries to acquire a lock while holding another lock. This can lead to deadlocks if the second lock is already held by another thread. To avoid this, try to limit the use of nested locks or use a lock-free algorithm.
Use a deadlock detection algorithm: Some programming languages and operating systems have built-in deadlock detection algorithms that can identify and resolve deadlocks. Make use of these tools to help prevent and detect deadlocks in your code.
Avoid long-running transactions: Long-running transactions that hold locks for extended periods of time can increase the chances of deadlocks occurring. Consider breaking up long transactions into smaller, more manageable chunks to reduce the likelihood of deadlocks.
Test and debug your code: It’s important to thoroughly test and debug your code to identify potential deadlocks. Use debugging tools and techniques to identify and resolve any potential issues before they become a problem in a production environment.
std::scoped_lock
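A sketch of why std::scoped_lock is the easy option here: it locks multiple mutexes with a deadlock-avoidance algorithm, so two transfers running in opposite directions cannot each grab one lock and then wait forever for the other. The Account type is illustrative.

```cpp
#include <mutex>

struct Account {
    std::mutex m;
    int balance{100};
};

void transfer(Account& from, Account& to, int amount) {
    // Locks both mutexes deadlock-free and releases them on scope exit.
    std::scoped_lock lock{from.m, to.m};
    from.balance -= amount;
    to.balance += amount;
}

int main() {
    Account a, b;
    transfer(a, b, 10);  // in real code these could run on different threads
    transfer(b, a, 5);
}
```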
OpenAI: Mutex and semaphore are both synchronization mechanisms used in multi-threaded programming to control access to shared resources. They both ensure that only one thread can access a shared resource at a time, preventing race conditions and data corruption.
Semaphore is used to control the number of threads that can access a shared resource at a given time. It maintains a counter and allows a fixed number of threads to access the resource simultaneously. Once the maximum limit is reached, other threads are put on hold until a thread releases the semaphore.
Semaphore uses a counter to keep track of the number of threads accessing the resource. It uses wait and signal operations to acquire and release the semaphore. The wait operation decreases the counter and, if it reaches 0, the thread is put on hold until the semaphore is signalled and the counter is increased.
Semaphore is used in scenarios where a limited number of threads should have access to a resource, such as a pool of resources or a producer-consumer problem.
Semaphore does not have the concept of ownership and any thread can release the semaphore, making it more prone to deadlocks.
Semaphore has a slightly higher overhead due to its counter and wait/signal operations. However, it is more suitable for scenarios where a fixed number of threads need to access a resource simultaneously.
In conclusion, mutex and semaphore serve different purposes and have different implementations. They are both important synchronization mechanisms that help ensure thread safety and prevent data corruption in multi-threaded environments.
See Mutex vs Semaphore.
This is getting into the weeds, but mutexes and the data they protect can happily be declared consecutively in the code; so consider what this might mean when both mutex and data are in the same cache line.
tl;dr A thread is a branch of execution. A process can consist of multiple threads.
OpenAI: Threads and processes are both mechanisms used in computer operating systems to execute multiple tasks concurrently. They are both important for multitasking and improving the efficiency of a system, but they differ in their implementation and purpose.
Definition: Threads are lightweight units of execution within a process. They share the same memory space and resources as their parent process, but each thread has its own stack. Processes, on the other hand, are independent instances of a program that are managed by the operating system. They have their own memory space and resources, and are isolated from other processes.
Creation: Threads are typically created within a process by the operating system or by the parent process itself. This allows for multiple threads to share the same resources and communicate with each other easily. Processes, on the other hand, are created by the operating system when a program is launched. Each process has its own resources and is completely independent from other processes.
Resource sharing: Threads can share resources such as memory, files, and I/O devices with other threads within the same process. This makes them more efficient and faster than processes, as there is no need for inter-process communication. Processes, however, do not share resources and must use inter-process communication mechanisms such as pipes or sockets to communicate with each other.
Context switching: Context switching is the process of switching from one thread or process to another. In threads, context switching is faster and less expensive as it involves switching between different execution contexts within the same process. In processes, context switching is more expensive as it involves switching between different processes, which have their own memory space and resources.
Scheduling: Threads are scheduled by the operating system scheduler, which allocates CPU time to each thread within a process. This allows for efficient use of CPU resources and better performance. Processes are scheduled by the operating system as well, but each process is treated as a separate entity and its scheduling is not affected by other processes.
Communication: Threads within the same process can easily communicate with each other through shared memory and variables. They can also communicate by using synchronization mechanisms such as locks or semaphores. Processes, on the other hand, must use inter-process communication mechanisms to communicate with each other.
Parallelism: Threads within a process can run in parallel on multiple CPU cores, making them efficient for parallel processing. Processes can also run in parallel on multiple cores, but they are isolated from each other and cannot share resources directly.
In conclusion, threads and processes are both important for multitasking and improving the efficiency of a system, but they differ in their implementation and purpose. Threads are used for lightweight and efficient multitasking within a process, while processes are used for isolation and communication between different programs.
This is all a lot easier since std::thread was introduced in C++11. Now we have a platform-independent interface. See the C++ concurrency support library examples.
However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick then could it actually take longer to parallelise it? You need to profile: how long does it take to just create a thread?
std::async
Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. It’s quite a nice way to reference the result of a calculation executed in another thread. Also, it manages the issue of creating too many threads: they will just be executed sequentially. Additionally, exceptions thrown in the asynchronous routine are stored in the future and rethrown by the calling get() routine. You ought to pass it a strategy: std::launch::async or std::launch::deferred as a first parameter.
std::thread
You need to call join() for every thread created with std::thread. This can be done by storing your threads as a vector and joining each of them in a loop.
std::jthread
Like a regular thread but you don’t have to join it in the caller: it actually joins for you in the destructor. Quite nice to not have to remember to join, but joining all your threads can be a convenient synchronisation point.
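A sketch of both conveniences (C++20): the worker polls its stop token, and the destructor requests stop and joins automatically when the jthread goes out of scope.

```cpp
#include <chrono>
#include <iostream>
#include <stop_token>
#include <thread>

int main() {
    using namespace std::chrono_literals;

    std::jthread worker{[](std::stop_token st) {
        while (!st.stop_requested()) {
            std::cout << "working...\n";
            std::this_thread::sleep_for(100ms);
        }
    }};

    std::this_thread::sleep_for(500ms);
    // No explicit join: ~jthread() requests stop and then joins for us.
}
```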
Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.
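A minimal sketch: the only change from a serial sort is the policy argument (with gcc you may also need to link against TBB).

```cpp
#include <algorithm>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<double> v(1'000'000);
    std::mt19937 gen{42};
    std::uniform_real_distribution<> dist{0.0, 1.0};
    for (auto& x : v)
        x = dist(gen);

    // Same algorithm as the serial version, but the library may use multiple threads.
    std::sort(std::execution::par, v.begin(), v.end());
}
```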
std::execution::seq: execution may not be parallelized
std::execution::par: execution may be parallelized
std::execution::par_unseq: execution may be parallelized, vectorized, or migrated across threads (such as by a parent-stealing scheduler)
std::execution::unseq: execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items
Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if.
std::sort
std::copy
std::transform
std::accumulate
std::for_each
std::reduce
std::inclusive_scan
std::exclusive_scan
std::transform_reduce
std::remove
std::count
std::max_element
std::min_element
std::find
std::generate
std::mutex
A standard way to protect access to something, but there are multiple ways to unlock it.
Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not becoming stuck, see the deadlock chapter for more. You can use std::scoped_lock with multiple resources or a single one, but also I think it expresses intention better, by virtue of having “scope” in the name.
std::mutex
std::atomic - an operation that is guaranteed to operate as a single transaction
std::scoped_lock
std::lock_guard
std::thread and join()
std::jthread (auto join)
See the Standard Library concurrency support library.
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

using namespace std;

queue<int> buffer;            // shared buffer
mutex mtx;                    // mutex for locking buffer
condition_variable cv;        // condition variable for synchronisation

const int BUFFER_SIZE = 10;   // buffer size
const int MAX_VALUE = 100;    // maximum value for producer to produce
const int NUM_PRODUCERS = 3;  // number of producers
const int NUM_CONSUMERS = 2;  // number of consumers

// function for producer to produce values and add to buffer
void producer(int id) {
    while (true) {
        // generate random value
        int value = rand() % MAX_VALUE + 1;

        // lock buffer
        unique_lock<mutex> lck(mtx);

        // wait if buffer is full
        cv.wait(lck, []() { return buffer.size() < BUFFER_SIZE; });

        // add value to buffer
        buffer.push(value);

        // print message
        cout << "Producer " << id << " produced value " << value << endl;

        // notify a waiting consumer
        cv.notify_one();

        // unlock buffer
        lck.unlock();

        // sleep for random time
        this_thread::sleep_for(chrono::milliseconds(rand() % 1000 + 1000));
    }
}

// function for consumer to consume values from buffer
void consumer(int id) {
    while (true) {
        // lock buffer
        unique_lock<mutex> lck(mtx);

        // wait if buffer is empty
        cv.wait(lck, []() { return buffer.size() > 0; });

        // get value from buffer
        int value = buffer.front();

        // remove value from buffer
        buffer.pop();

        // print message
        cout << "Consumer " << id << " consumed value " << value << endl;

        // notify a waiting producer
        cv.notify_one();

        // unlock buffer
        lck.unlock();

        // sleep for random time
        this_thread::sleep_for(chrono::milliseconds(rand() % 1000 + 1000));
    }
}

int main() {
    // create producer threads
    thread producers[NUM_PRODUCERS];
    for (int i = 0; i < NUM_PRODUCERS; i++) {
        producers[i] = thread(producer, i);
    }

    // create consumer threads
    thread consumers[NUM_CONSUMERS];
    for (int i = 0; i < NUM_CONSUMERS; i++) {
        consumers[i] = thread(consumer, i);
    }

    // join producer threads (they loop forever, so this blocks indefinitely)
    for (int i = 0; i < NUM_PRODUCERS; i++) {
        producers[i].join();
    }

    // join consumer threads
    for (int i = 0; i < NUM_CONSUMERS; i++) {
        consumers[i].join();
    }

    return 0;
}
Branchless programming is an important approach in low-latency code. It may offer performance improvements but potentially at the cost of maintainability.
#include <iostream>
int main() {
int x = 10;
int y = 20;
// Using a branch statement
int max_val_branch = (x > y) ? x : y;
std::cout << "Max value using branch statement: " << max_val_branch << std::endl;
// Using branchless programming
int max_val_branchless = y ^ ((x ^ y) & -(x > y));
std::cout << "Max value using branchless programming: " << max_val_branchless << std::endl;
return 0;
}
In summary, branch prediction focuses on predicting the outcome of a specific branch instruction to enable speculative execution, while branch target prediction aims to predict the target address of a branch to reduce the delay in fetching and decoding subsequent instructions. Both techniques are utilized in modern processors to optimize instruction execution and improve overall performance.
OpenAI: Branch prediction is a technique used in computer processors to improve the performance of branch instructions. Branch instructions are instructions that cause the processor to change the sequence of instructions being executed, such as conditional statements or loops.
Branch prediction works by attempting to predict the outcome of a branch instruction before it is actually executed. This prediction is based on past patterns and behavior of the program’s code. If the prediction is correct, the processor can continue executing instructions without having to wait for the branch instruction to be completed. However, if the prediction is incorrect, the processor must undo any changes made and execute the correct instructions.
There are several types of branch prediction techniques, including static and dynamic prediction. Static prediction uses simple rules or heuristics to make predictions, while dynamic prediction uses more advanced algorithms and techniques to make predictions based on past behavior. Dynamic prediction can also adapt to changes in the program’s behavior, making it more accurate than static prediction.
One of the main benefits of branch prediction is that it can significantly reduce the number of pipeline stalls in a processor. Pipeline stalls occur when the processor has to wait for a branch instruction to be completed before it can continue executing other instructions. By predicting the outcome of branch instructions, branch prediction can reduce the number of stalls and improve the overall performance of the processor.
However, branch prediction can also have drawbacks. If the prediction is incorrect, the processor must undo any changes made and execute the correct instructions, which can result in a performance decrease. Additionally, predicting the outcome of some branch instructions can be difficult, especially in complex programs with unpredictable behavior.
Overall, branch prediction is an important technique used in modern processors to improve performance by reducing pipeline stalls. It is constantly being researched and improved upon to make predictions more accurate and efficient.
When the [[likely]] attribute is applied, it suggests to the compiler that the branch following the attribute is frequently executed. This allows the compiler to prioritize the optimization of code paths that are more likely to be taken, potentially resulting in better performance.
However, it’s important to note that the [[likely]] attribute is just a hint, and it does not guarantee any specific behavior or optimization. The behavior of [[likely]] can vary depending on the compiler and its version. Some compilers may ignore the attribute altogether, while others may consider it as a hint for optimization.
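A minimal sketch of the syntax (C++20): the attribute annotates the branch you expect to be taken most of the time; the function itself is illustrative.

```cpp
int process(int value) {
    if (value >= 0) [[likely]] {
        return value * 2;  // hot path: hint that this branch is the common one
    } else [[unlikely]] {
        return -value;     // cold path, e.g., error handling
    }
}
```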
Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.
See wikipedia.
time/program = instructions/program * clockCycles/instruction * time/clockCycles
Amdahl’s law is a principle in computer architecture and performance engineering that states that the overall performance improvement of a system is limited by the speed of its non-parallelizable components. Developed by computer scientist Gene Amdahl in the 1960s, it is used to analyze the potential speedup of a system or program when adding additional resources, such as processors or cores.
The law states that the theoretical maximum speedup of a system is limited by the portion of the program that cannot be parallelized. This is also known as the "serial fraction" or "sequential fraction" of the program. In other words, if a program cannot be broken down into smaller, independent tasks that can be executed simultaneously, then the overall speed of the program will not be significantly improved by adding more processors or cores.
Mathematically, Amdahl’s law can be expressed as:
Speedup = 1 / [(1 - P) + (P / N)]
Where P is the proportion of the program that can be parallelized and N is the number of processors or cores.
For example, if 80% of a program can be parallelized and 4 processors are added, the theoretical maximum speedup would be:
Speedup = 1 / [(1 - 0.8) + (0.8 / 4)] = 1 / 0.4 = 2.5x
This means that even with 4 processors, the program can only be sped up by a maximum of 2.5 times. The remaining 20% of the program that cannot be parallelized will continue to limit the overall performance.
Amdahl’s law highlights the importance of efficient parallelization in computer systems. It also emphasizes the diminishing returns of adding more processors or cores as the non-parallelizable portion of a program becomes a larger factor. This law has been used to guide the design of multi-core processors and parallel computing systems to ensure that the benefits of parallelization are maximized.
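The formula is trivial to evaluate yourself; a quick sketch reusing the 80% figure from the example above.

#include <iostream>

// Amdahl's law: speedup = 1 / ((1 - p) + p / n)
constexpr double amdahl(const double p, const double n) {
  return 1.0 / ((1.0 - p) + p / n);
}

int main() {
  // 80% of the work parallelises: watch the diminishing returns
  for (const double cores : {1.0, 2.0, 4.0, 8.0, 16.0, 1e9})
    std::cout << cores << " cores: " << amdahl(0.8, cores) << "x\n";
}

Pushing the core count towards infinity shows the hard ceiling of 5x for an 80% parallel workload.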
It’s very tempting to optimise the parts you are familiar with, but it’s not obvious how much your efforts will benefit the overall process. Spend your time on the bottlenecks.
Efficient and Robust Implementation: The C++ Standard Library containers are implemented by experienced developers and have gone through extensive testing and optimization. This ensures a robust and efficient implementation of data structures, making them reliable and performant.
Portability: The C++ Standard Library is a cross-platform library, meaning that containers and algorithms written using it will work on any platform that supports C++. This makes them highly portable and useful for developing cross-platform applications.
Wide Range of Containers: The C++ Standard Library provides a wide range of containers, including vector, list, map, set, deque, etc. These containers are suitable for various data storage and retrieval needs, allowing developers to choose the most efficient one for their use case.
Uniform Interface: All containers in the C++ Standard Library have a uniform interface, making it easy to learn and use them. This uniformity also allows for easier interchangeability of containers in different parts of a program.
Memory Management: The C++ Standard Library containers provide memory management capabilities, such as automatic memory allocation and deallocation, making it easier for developers to manage memory and avoid memory leaks.
Standardized Algorithms: The C++ Standard Library also provides a set of algorithms that can be used with its containers, such as sorting, searching, and manipulating data. These algorithms are highly optimized and can be used with any container, making them highly versatile.
High-level Abstractions: The C++ Standard Library containers provide high-level abstractions, making it easier for developers to focus on the logic of their program rather than the low-level details of data structures.
Community Support: The C++ Standard Library is widely used and has a large community of developers who contribute to its development and provide support to other developers. This makes it easier for developers to find resources and solutions to their problems when using the C++ Standard Library containers.
vector
list
forward_list
deque
array
map
multimap
set
multiset
unordered_map
unordered_multimap
unordered_set
unordered_multiset
stack
queue
priority_queue
bitset
valarray
string
wstring
string_view
span
flat_map
flat_multimap
flat_set
flat_multiset
Sequence containers in C++ are data structures that store elements in a linear sequence or ordered manner. Examples of sequence containers include arrays, vectors, lists, and deques. These containers allow for efficient insertion and deletion of elements at the beginning, end, or middle of the sequence. The elements in a sequence container are accessed using an index or iterator, and the order of elements is maintained.
On the other hand, associative containers in C++ are data structures that store elements in a specific key-value pair. Examples of associative containers include maps, sets, and multi-maps. These containers allow for efficient retrieval of elements based on their associated keys. The elements in an associative container are sorted based on their key values, and the order of insertion is not preserved. These containers use a hashing or tree-based data structure to organize and access elements based on their keys.
In summary, the main difference between sequence and associative containers is the way they store and access elements. Sequence containers maintain the order of elements and allow for efficient insertion and deletion, while associative containers prioritize efficient retrieval based on keys but do not guarantee the order of elements.
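A small sketch contrasting the two kinds of container (the values are arbitrary).

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
  // Sequence container: preserves insertion order, accessed by position
  std::vector<int> seq{3, 1, 2};
  std::cout << seq[0] << '\n';  // 3 -- the first element inserted

  // Associative container: ordered by key, accessed by key
  std::map<std::string, int> ages{{"bob", 42}, {"alice", 30}};
  std::cout << ages["alice"] << '\n';  // 30 -- found via the key, not a position
}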
Object-oriented programming (OOP) is a programming paradigm that focuses on the use of objects to represent data and functionality. It is based on the concept of objects, which have properties (attributes or data) and methods (procedures or functions) that can interact with each other.
In OOP, objects are created from classes, which act as blueprints for objects. Classes define the properties and methods that all objects of that type will have. This allows for code reusability and modularity, as objects can inherit properties and methods from their parent classes and can also be used as building blocks for more complex programs.
One of the key principles of OOP is encapsulation, which means that the properties and methods of an object are contained within the object and can only be accessed and modified through designated interfaces. This helps to keep the code organized and secure, as it prevents external code from directly accessing or changing the internal workings of an object.
Another important concept in OOP is inheritance, which allows for the creation of subclasses that inherit properties and methods from their parent classes. This allows for more specialized and specific objects to be created, while still maintaining the code reusability and structure of the parent class.
Polymorphism is also a key aspect of OOP, which refers to the ability of objects to take on different forms and behaviors depending on their context. This allows for flexibility and extensibility in code, as objects can be used in different ways without needing to be rewritten or modified.
Overall, OOP provides a powerful and efficient way to organize and manage code by breaking it down into smaller, more manageable objects with their own properties and behaviors. It is widely used in software development and is the foundation of many popular programming languages such as Java, C++, and Python.
In C++, the only difference between a class and a struct is the default access level. In a class, members are private by default, while in a struct, members are public by default. Also, the default inheritance access level is different: class uses private inheritance by default, while struct uses public inheritance by default.
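A tiny example of the default access difference.

struct s { int x; };   // public by default
class c { int x; };    // private by default

int main() {
  s a;
  a.x = 1;     // OK: x is public
  c b;
  // b.x = 1;  // error: x is a private member of c
}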
Ad-hoc Polymorphism: This type of polymorphism allows a function or operator to behave differently based on the type of arguments it receives. It can be achieved through overloading or overriding, and it is often used in object-oriented programming.
Parametric Polymorphism: Also known as generic polymorphism, this type allows a function or data type to be written in a way that can handle multiple data types without specifying them explicitly. This is commonly used in functional programming languages.
Subtype Polymorphism: This type of polymorphism is based on inheritance, where a subclass can be used in place of its superclass. This allows objects of different types to be treated uniformly, as long as they share a common superclass.
Coercion Polymorphism: This type of polymorphism involves automatically converting one type of data into another type when needed. This is commonly used in dynamically typed languages.
Overriding Polymorphism: This occurs when a subclass overrides a method or property of its superclass, allowing it to modify or extend the behavior of the inherited method.
Compile-time Polymorphism: Also known as static polymorphism, this type is resolved at compile time and is achieved through function overloading or operator overloading.
Run-time Polymorphism: Also known as dynamic polymorphism, this type is resolved at run-time and is achieved through virtual functions and late binding. This allows objects of different types to respond differently to the same method call based on their actual type at runtime.
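A minimal sketch of two of the categories above: subtype (run-time) polymorphism via a virtual call, and parametric polymorphism via a template.

#include <iostream>

// Subtype / run-time polymorphism
struct animal {
  virtual ~animal() = default;
  virtual void speak() const { std::cout << "...\n"; }
};

struct dog : animal {
  void speak() const override { std::cout << "woof\n"; }
};

// Parametric / compile-time polymorphism
template <typename T>
T twice(T x) { return x + x; }

int main() {
  const dog d;
  const animal& a = d;
  a.speak();                                            // dispatched at run time: "woof"
  std::cout << twice(21) << ' ' << twice(1.5) << '\n';  // instantiated per type
}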
A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.
Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.
The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).
In derived classes, you should mark overridden methods as override to express intention and also protect from the base class being changed unwittingly.
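A minimal sketch pulling the last few points together, including the compile-time check mentioned above.

#include <memory>
#include <type_traits>

struct base {
  virtual ~base() = default;   // virtual: deleting via base* is safe
  virtual void run() {}
};

struct derived : base {
  void run() override {}       // override: fails to compile if the base signature changes
};

// Test at compile time with a type trait
static_assert(std::has_virtual_destructor_v<base>);

int main() {
  std::unique_ptr<base> p = std::make_unique<derived>();
  // ~derived() then ~base() are both called when p goes out of scope
}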
A class without virtual functions has a size of 1 because in C++ classes can’t have zero size (objects declared consecutively can’t have the same address). A class with virtual functions has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static dispatch tables, created once for each polymorphic class.
RAII (Resource Acquisition Is Initialization) is a programming pattern used in C++ that ensures proper handling of resources by tying their acquisition to the initialization of an object. It is a form of automatic resource management where the lifetime of a resource is bound to the lifetime of an object.
In RAII, a resource is acquired and initialized in the constructor of an object, and automatically released and deinitialized in the destructor. This guarantees that the resource will be properly managed and released, even in the case of exceptions or unexpected program flow.
This pattern is commonly used for managing dynamically allocated memory, file handles, database connections, and other resources that require explicit acquisition and release. It simplifies resource management and helps avoid common bugs such as memory leaks or resource leaks.
RAII is a key part of C++ programming and is often used in conjunction with smart pointers and the concept of ownership, where the object responsible for acquiring and releasing the resource is the owner of that resource. This allows for a clear and intuitive ownership model, making it easier to reason about resource management in a program.
Overall, RAII promotes good programming practices by automating resource management and reducing the likelihood of errors and bugs related to resource handling.
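A minimal sketch of the idea wrapping a C-style file handle (the path is arbitrary).

#include <cstdio>
#include <stdexcept>

class file {
  std::FILE* handle_;

public:
  explicit file(const char* path) : handle_{std::fopen(path, "r")} {
    if (!handle_) throw std::runtime_error("cannot open file");  // acquire in the constructor
  }
  ~file() { std::fclose(handle_); }  // released even if an exception unwinds the stack
  file(const file&) = delete;        // exactly one owner
  file& operator=(const file&) = delete;
  std::FILE* get() const { return handle_; }
};

int main() {
  try {
    const file f{"/etc/hostname"};
    // ... use f.get() ...
  } catch (const std::exception&) {
  }  // by here the destructor has already run
}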
Overload resolution is the process of determining which version of a function or operator should be called when there are multiple versions with the same name but different parameters. There are several strategies that can be used for overload resolution, including:
Exact Match: This is the simplest strategy, where the compiler looks for an exact match between the parameters in the function call and the parameters in the function definition. If there is an exact match, that function is called. If there is no exact match, an error is thrown.
Implicit Conversion: In this strategy, the compiler checks if the parameters in the function call can be implicitly converted to match the parameters in the function definition. This can include converting between different data types or using default values for optional parameters.
Best Fit: This strategy considers all possible conversions and chooses the version of the function that requires the least amount of conversions. This can help prevent unexpected behavior when there are multiple versions of a function that could potentially be called.
Named Arguments: Instead of relying on the order of arguments, this strategy allows for arguments to be specified by name, making it easier to call a specific version of a function that may have multiple optional parameters.
Variadic Functions: This strategy allows for a variable number of arguments to be passed to a function, making it more flexible and able to handle different input scenarios.
Default Arguments: This strategy allows for default values to be specified for one or more parameters in a function. If no value is provided for those parameters in the function call, the default values will be used.
Access Modifiers: Overload resolution can also be affected by the access modifiers of a function. For example, if there are two versions of a function with the same name and parameters, but one is declared as private and the other as public, the private version will not be accessible outside of the class.
Overall, the choice of overload resolution strategy depends on the language and its specific rules and conventions. Some languages may use a combination of these strategies to determine the best overload for a given function call.
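A small C++-specific sketch of exact match versus promotion and conversion (not all of the strategies above apply to C++).

#include <iostream>

void f(int)    { std::cout << "int\n"; }
void f(double) { std::cout << "double\n"; }

int main() {
  f(1);     // exact match: f(int)
  f(1.0);   // exact match: f(double)
  f(1.0f);  // float promotes to double, so f(double)
  f('a');   // char promotes to int, so f(int)
}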
You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.
Complexity | Description | Example |
---|---|---|
O(1) | Constant time complexity | Accessing an element in an array |
O(log n) | Logarithmic time complexity | Binary search |
O(n) | Linear time complexity | Finding the maximum element in an unsorted array |
O(n log n) | Linearithmic time complexity | Sorting algorithms like Merge Sort and Quick Sort |
O(n^2) | Quadratic time complexity | Nested loops |
O(n^3) | Cubic time complexity | Triple nested loops |
O(2^n) | Exponential time complexity | Naive recursive Fibonacci, enumerating all subsets |
O(n!) | Factorial time complexity | Permutation algorithms |
The average complexity of linked lists and arrays can vary depending on the specific operation being performed. However, in general, linked lists tend to have a slightly higher average complexity compared to arrays.
For an array, accessing an element by index is O(1), and inserting or deleting an element at the end is O(1) (amortised, for a dynamic array). Searching an unsorted array, however, is O(n).
For a linked list, accessing an element by index and searching are both O(n), because they require traversing the list from the head. Insertion or deletion is O(1) once you already hold a reference to the position; reaching that position is what costs O(n).
However, the average complexity of a linked list can be better than an array for certain operations. For example, inserting or deleting an element at the beginning of a linked list has an average complexity of O(1), while the same operation in an array has an average complexity of O(n) because all the elements after the insertion or deletion point need to be shifted.
In summary, the average complexity of linked lists and arrays can vary depending on the specific operation, but in general, arrays have a lower average complexity for basic operations, while linked lists can have a better average complexity for certain operations.
See OSI on Wikipedia.
Layer | Name | Function | Protocols | Examples |
---|---|---|---|---|
7 | Application | Provides network services to applications | HTTP, SMTP, FTP | Web browsers, email clients, FTP clients |
6 | Presentation | Formats and translates data for the application layer | SSL, JPEG, MIDI | Encryption, compression, file format conversion |
5 | Session | Establishes, manages, and terminates connections between applications | NetBIOS, PPTP, NFS | Remote desktop, online gaming, file sharing |
4 | Transport | Provides reliable data transfer between hosts | TCP, UDP | File transfer, web browsing, email |
3 | Network | Routes data between networks using logical addresses | IP, ICMP | Internet communication, routing |
2 | Data Link | Transmits data between adjacent network nodes | Ethernet, Wi-Fi | Local area network communication |
1 | Physical | Transmits raw data between devices | Ethernet cables, fiber optics | Electrical signals, light waves |
The TCP three-way handshake is the process of establishing a connection between two devices over a TCP/IP network. It is used to ensure reliable and error-free communication between the two devices.
The three steps of the TCP handshake are:
Step 1: SYN (Synchronize) The client sends a SYN packet to the server, indicating that it wants to establish a connection. This packet contains the client’s randomly chosen initial sequence number.
Step 2: SYN-ACK (Synchronize-Acknowledge) Upon receiving the SYN packet, the server responds with a SYN-ACK packet. This packet contains the server’s own randomly chosen initial sequence number, plus an acknowledgement number equal to the client’s sequence number plus one.
Step 3: ACK (Acknowledgement) Finally, the client sends an ACK packet to the server, which completes the three-way handshake. Its acknowledgement number is the server’s sequence number plus one, confirming that the SYN-ACK was received.
After the three-way handshake is completed, the TCP connection is established, and both devices can begin sending data packets to each other. This process is repeated every time a new TCP connection is established between two devices. It also allows for the synchronization of sequence numbers between the two devices, ensuring that data is transmitted in the correct order and without errors.
SYN stands for “synchronise”.
=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= FIN-ACK
=> ACK
It’s important to know the common data structures and their characteristics.
std::vector
std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.
Estimate how many times the fax destructor is called below.
#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
  struct fax {
    // Constructor and destructor
    fax() { std::cout << x << " ctor\n"; }
    ~fax() { std::cout << "\t" << x << " dtor\n"; }

    // Copy constructors
    fax(const fax&) { std::cout << x << " copy ctor\n"; };
    fax(fax&&) { std::cout << x << " move ctor\n"; };

    // Assignment constructors
    fax& operator=(const fax&) {
      std::cout << x << " copy assignment ctor\n";
      return *this;
    }

    fax& operator=(fax&&) {
      std::cout << x << " move assignment ctor\n";
      return *this;
    };

    const size_t id = x++;
  };

  std::vector<fax> y;

  // Reduce copies by allocating up front
  // y.reserve(3);

  for (size_t i = 0; i < 3; ++i) {
    y.push_back(fax{});
    std::cout << "-- capacity " << y.capacity() << " --\n";
  }

  // fax f1 = fax{};
  // fax f2 = std::move(fax{});
}
See the program output (below), note how the capacity is growing exponentially (doubling each time).
1 ctor
2 move ctor
2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
5 dtor
5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
9 dtor
9 dtor
9 dtor
-- capacity 4 --
9 dtor
9 dtor
9 dtor
For each push we call the default constructor to create a temporary object and then call the move constructor to populate the new element with its data… so far so good. But crucially, when the vector is resized we must also copy all of the existing elements into the new allocation (copy rather than move here, because the move constructor isn’t declared noexcept). Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead; also, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.
Binary search trees have an average complexity of O(log n) unless the tree is unbalanced. Worst case, if the tree is unbalanced, then it is O(n). std::set and std::map are implemented using a red-black tree, a type of balanced binary search tree. C++23 introduces std::flat_map, which is implemented using contiguous storage to make it more cache-friendly.
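In practice you rarely write the tree yourself; a small sketch using the standard ordered containers.

#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
  // Red-black tree underneath: insert/find are O(log n), iteration is in sorted order
  std::set<int> s{5, 1, 3, 2, 4};
  for (int i : s) std::cout << i << ' ';  // 1 2 3 4 5
  std::cout << '\n' << std::boolalpha << s.contains(3) << '\n';  // true (contains is C++20)

  std::map<std::string, int> m;
  m["linux"] = 1991;
  m["cpp"] = 1985;
  for (const auto& [k, v] : m) std::cout << k << ' ' << v << '\n';  // sorted by key
}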
A binary search tree is a data structure that is used to store and organize data in a hierarchical manner. It consists of nodes, each of which has a value and two child nodes - a left child node and a right child node. The root node is the topmost node in the tree.
The binary search tree follows a specific rule for storing the data - all values in the left subtree of a node must be less than the value of the node, and all values in the right subtree must be greater than the value of the node. This allows for efficient searching and sorting of data in the tree.
Binary search trees have several important properties: every node has at most two children, all keys in the left subtree are smaller than the node’s key and all keys in the right subtree are larger, and an in-order traversal visits the keys in sorted order.
Common operations on binary search trees include insertion, search, deletion, finding the minimum or maximum key, and in-order traversal.
Binary search trees have an average time complexity of O(log n) for search, insert, and delete operations, making them efficient for storing and retrieving data. However, if the tree is not balanced (i.e. there is a significant difference in the number of nodes on the left and right subtrees), the time complexity can increase to O(n), making operations less efficient.
A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the left most node is the smallest, and the right most node is the largest.
Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.
e|       /e
d|     /d
c|       \c
b|   /b
a|       /a
9|     \9
8|       \8
7| 7
6|       /6
5|     /5
4|       \4
3|   \3
2|       /2
1|     \1
0|       \0
Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of slots – known as “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.
std::unordered_set and std::unordered_map are implemented using hash tables.
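A quick sketch of the load factor and bucket machinery mentioned above.

#include <iostream>
#include <unordered_map>

int main() {
  std::unordered_map<int, int> m;
  std::cout << "max load factor " << m.max_load_factor() << '\n';  // typically 1.0

  for (int i = 0; i < 100; ++i) {
    m[i] = i;
    // bucket_count() jumps each time the load factor threshold is crossed
  }

  std::cout << "size " << m.size()
            << " buckets " << m.bucket_count()
            << " load factor " << m.load_factor() << '\n';
}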
A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.
Support of random access iterators is required to keep a heap structure internally at all times. A sorted list is also a valid min heap (although a heap is not, in general, sorted).
123456789abcdef
1
23
4567
89abcdef
Projected down, the top of the tree is the smallest element.
1
2 3
4 5 6 7
8 9 a b c d e f
              1
          /       \
      2               3
    /   \           /   \
  4       5       6       7
 / \     / \     / \     / \
8   9   a   b   c   d   e   f
A heap is a data structure that is based on a complete binary tree, where the parent node is either larger or smaller than its child nodes. It is commonly used to implement priority queues, where the highest priority element is always at the root. A heap does not necessarily have a specific ordering of its elements, as long as the parent-child relationship is maintained.
On the other hand, a binary search tree (BST) is a binary tree where the left child of a node is always smaller and the right child is always larger than the parent node. This ordering property makes it efficient for searching, inserting, and deleting elements. BSTs are commonly used to implement binary search algorithms.
In summary, the main differences between a heap and a binary search tree are:
Ordering: A heap does not have a specific ordering of its elements, while a binary search tree follows a specific ordering.
Structure: A heap is a complete binary tree, while a binary search tree can have different structures depending on the order of insertion.
Usage: A heap is commonly used to implement priority queues, while a binary search tree is commonly used for searching, inserting, and deleting elements.
Performance: A heap has faster insertion and deletion times compared to a binary search tree, but a binary search tree has faster search times.
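In C++ you would typically reach for std::priority_queue (a container adaptor) or the heap algorithms rather than build a heap by hand; a small sketch.

#include <algorithm>
#include <iostream>
#include <queue>
#include <vector>

int main() {
  // Max heap by default: the largest element is always on top
  std::priority_queue<int> pq;
  for (int i : {3, 1, 4, 1, 5}) pq.push(i);
  while (!pq.empty()) {
    std::cout << pq.top() << ' ';  // 5 4 3 1 1
    pq.pop();
  }
  std::cout << '\n';

  // Or manage the heap yourself over a random-access container
  std::vector<int> v{3, 1, 4, 1, 5};
  std::make_heap(v.begin(), v.end());
  std::pop_heap(v.begin(), v.end());  // moves the largest element to the back
  std::cout << v.back() << '\n';      // 5
}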
Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list, therefore O(n). Searching is also O(n).
std::forward_list is a singly linked list.
Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).
std::list is a doubly linked list.
Container | Average Case Time Complexity | Worst Case Time Complexity | Space Complexity |
---|---|---|---|
Vector | O(1) | O(n) | O(n) |
Deque | O(1) | O(n) | O(n) |
List | O(1) | O(n) | O(n) |
Forward List | O(1) | O(n) | O(n) |
Array | O(1) | O(1) | O(n) |
Set | O(log n) | O(n) | O(n) |
Unordered Set | O(1) | O(n) | O(n) |
Map | O(log n) | O(n) | O(n) |
Unordered Map | O(1) | O(n) | O(n) |
Stack | O(1) | O(n) | O(n) |
Queue | O(1) | O(n) | O(n) |
Priority Queue | O(log n) | O(n) | O(n) |
"finish_reason": "stop"
}
], “usage”: { “prompt_tokens”: 12, “completion_tokens”: 275, “total_tokens”: 287, “prompt_tokens_details”: { “cached_tokens”: 0, “audio_tokens”: 0 }, “completion_tokens_details”: { “reasoning_tokens”: 0, “audio_tokens”: 0, “accepted_prediction_tokens”: 0, “rejected_prediction_tokens”: 0 } } }
However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything else.
OpenAI: 1. Increased Memory Capacity: Virtual memory allows computers to use more memory than physically available. This is because it uses a portion of the hard drive as virtual memory, which acts as an extension of the physical memory. This allows computers to run larger programs and processes that would otherwise be limited by the amount of physical memory available.
Improved Performance: Virtual memory can improve overall system performance by allowing multiple programs to run at the same time without causing the system to crash or slow down. This is because virtual memory allows the computer to allocate memory resources efficiently and prioritize which programs get access to physical memory.
Cost Savings: Virtual memory reduces the need for expensive physical memory upgrades. Instead of purchasing additional physical memory, virtual memory uses the existing hard drive space to act as virtual memory. This can save money for both consumers and businesses.
Efficient Memory Management: Virtual memory allows the computer to manage memory more efficiently by using a technique called paging. This involves transferring data between physical memory and the hard drive, allowing the computer to free up physical memory for other processes when needed.
Flexibility: Virtual memory allows for more flexibility in managing memory resources. It can allocate memory to different programs as needed, and adjust the amount of memory assigned to each program based on its usage. This ensures that the system runs smoothly and efficiently without any program causing it to crash due to lack of memory.
Multi-Tasking: Virtual memory enables a computer to run multiple programs at the same time, even if each program requires more memory than is physically available. This allows for efficient multi-tasking, where users can switch between different programs seamlessly without any noticeable slowdowns.
Fault Tolerance: Virtual memory can also improve system reliability and stability. If a program crashes or encounters an error, virtual memory can prevent the entire system from crashing by isolating the faulty program and allowing other programs to continue running.
Compatibility: Virtual memory is compatible with most operating systems and hardware configurations. This makes it a versatile solution for managing memory resources on various types of computers, from personal laptops to large servers.
Better User Experience: With virtual memory, users can run more complex and resource-intensive programs without experiencing slowdowns or crashes. This improves the overall user experience and allows for a smoother and more efficient workflow.
Facilitates Advanced Features: Virtual memory is a prerequisite for many advanced features such as memory-mapped files, which allow data to be shared between processes, and hibernation, which saves the system state to the hard drive for fast startup. These features would not be possible without virtual memory.
Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.
Notice that the calls to cbegin and cend below don’t specify a namespace: see ADL.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
  const std::vector v{1, 2, 3, 4, 5};

  // cbegin/cend are found by argument-dependent lookup in namespace std
  std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "\n"));
}
This a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.
std::move doesn’t move anything: it is just a cast to an rvalue reference.
The && modifier indicates a parameter is an object that we intend to move from instead of copying.
Declare move constructors and move assignment operators noexcept (they are anyway), otherwise containers like std::vector will copy rather than move their elements when they reallocate.
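A minimal sketch of a move in action (the exact state of the moved-from string is unspecified, but it is typically left empty).

#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
  std::string s(1'000'000, 'x');
  std::vector<std::string> v;

  v.push_back(s);             // copy: s still holds its million characters
  v.push_back(std::move(s));  // move: the buffer is stolen, s is left valid but unspecified

  std::cout << v.size() << ' ' << s.size() << '\n';  // 2 0 (typically)
}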
As a Linux developer you need a strong command line game.
bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.
git is awesome on the command line and pretty much essential. Use it to manage your chores. See how to undo almost anything with git.
(echo hello; sleep 1) | telnet 127.0.0.1 80
echo hello > /dev/tcp/127.0.0.1/80
echo hello | nc localhost 80
# server
nc -knvlp 3389
# client
bash -i >& /dev/tcp/server_ip/3389 0>&1
git add !(unit.md)
shuf -n 1 readme.txt
From bash 5.
echo $EPOCHREALTIME
1614870873.574544
echo $EPOCHSECONDS
1614870876
The three load average values are 1, 5 and 15 minutes.
uptime
15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51
Stress your system in different ways.
stress --cpu 8
echo $(nproc)
localhost
127.0.0.1
127.0.0.2
127.0.0.3
127.1
0.0.0.0
0 # wut
mv {,_}.bash_history
watch -d ip a
pushd equivalent
I use this all the time. Rather than pushd and popd to navigate around the file system:
# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post
# Return whence you came
cd -
“Generic programming is why we can’t have nice things” – Andrei Alexandrescu
constexpr – find undefined behaviour at compile time
Template metaprogramming is a programming technique that uses templates in a programming language to perform computations at compile time. This technique leverages the power of templates to generate code based on types and values known at compile time.
In template metaprogramming, templates are used to define generic algorithms or data structures that can work with different data types. These templates are instantiated at compile time with specific data types, and the compiler generates specialized code for each instantiation. This allows for efficient code generation and optimization, as well as increased flexibility and reusability.
One of the key features of template metaprogramming is the ability to perform computations and make decisions at compile time. This is achieved through the use of template parameters, which can be types, constants, or other templates. These parameters can be used in conditional statements, loops, and other control structures, enabling the creation of complex compile-time computations.
Template metaprogramming is often used in C++ and other languages that support templates, such as D and Rust. It is commonly used for tasks such as type introspection, static polymorphism, and code generation. It is also used extensively in libraries and frameworks, as it allows for the creation of highly customizable and efficient code.
However, template metaprogramming can be complex and difficult to understand, and it requires a solid understanding of templates and the underlying language. It also has limitations, as not all computations can be performed at compile time, and the resulting code can be difficult to debug. Despite these challenges, template metaprogramming remains an important and powerful technique in modern programming.
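A small sketch of compile-time computation: first the classic recursive template, then the constexpr function most modern code would prefer.

// Classic template metaprogramming: computed during template instantiation
template <unsigned N>
struct factorial {
  static constexpr unsigned long long value = N * factorial<N - 1>::value;
};

template <>
struct factorial<0> {
  static constexpr unsigned long long value = 1;
};

// The modern equivalent: an ordinary function evaluated at compile time
constexpr unsigned long long factorial_fn(unsigned n) {
  return n == 0 ? 1 : n * factorial_fn(n - 1);
}

static_assert(factorial<10>::value == 3'628'800);
static_assert(factorial_fn(10) == 3'628'800);

int main() {}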
Templates in the C++ Standard Library allow for generic programming, where code can be written once and used with different data types without having to rewrite the code for each type. They are used extensively in the C++ Standard Library to provide a wide range of data structures and algorithms that can be used with various types of data.
Some common examples of templates in the C++ Standard Library include:
Containers: The standard library provides a variety of container classes such as vectors, lists, and maps, which can hold different types of data. These containers are implemented using templates, allowing them to store any type of data.
Algorithms: The standard library also includes a large number of algorithms such as sorting, searching, and manipulating data. These algorithms are also implemented using templates, allowing them to work with different data types.
Iterators: Iterators are used to traverse data structures such as containers and provide a consistent way to access elements. The standard library provides a variety of iterator types, such as input, output, and bidirectional iterators, which are all implemented using templates.
Function objects: Function objects are objects that behave like functions and are commonly used in algorithms to perform operations on elements. The standard library provides a variety of function objects, such as predicates and comparators, which are implemented using templates.
Smart pointers: Smart pointers are used to manage memory and provide a safer alternative to raw pointers. The standard library provides smart pointers, such as unique_ptr and shared_ptr, which are implemented using templates to work with different types of data.
Templates allow for code reuse and make the standard library more flexible and efficient. They also make it easier for developers to create their own custom data structures and algorithms that can work with different data types. Overall, templates are an essential component of the C++ Standard Library and are used extensively to provide a powerful and versatile set of tools for developers.
SFINAE, or "Substitution Failure Is Not An Error," is a feature of C++ template metaprogramming that allows the compiler to discard template instantiations that would otherwise result in an error. This allows for more flexible and robust code, as well as better error messages.
SFINAE works by using the compiler’s ability to deduce template arguments and call functions with different argument types. When the compiler encounters a template with a particular set of arguments, it attempts to substitute those arguments into the template and compile it. If the substitution fails due to a compiler error, SFINAE kicks in and causes the compiler to discard that template and continue searching for another one that can be instantiated successfully.
The key to SFINAE is that the compiler treats failure of template substitution as a non-fatal error, rather than halting compilation. This allows for templates to be used in a more flexible manner, as the compiler can discard templates that would result in errors and continue searching for a valid one.
SFINAE is particularly useful in situations where different code paths need to be taken based on the properties of types used as template arguments. For example, it can be used to enable different functions for different types, or to specialize templates for certain types.
Overall, SFINAE is an important feature of C++ template metaprogramming that allows for more powerful and flexible code, while also improving error handling and debugging.
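A minimal sketch using std::enable_if, where the overload that fails substitution is silently discarded rather than causing an error.

#include <iostream>
#include <type_traits>

// Chosen only when T is an integral type
template <typename T>
std::enable_if_t<std::is_integral_v<T>, void> describe(T) {
  std::cout << "integral\n";
}

// Chosen only when T is a floating-point type
template <typename T>
std::enable_if_t<std::is_floating_point_v<T>, void> describe(T) {
  std::cout << "floating point\n";
}

int main() {
  describe(42);    // integral
  describe(3.14);  // floating point
}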
The objective of design patterns is to provide reusable and proven solutions to common software design problems. They help developers to create flexible, maintainable, and scalable software by providing a set of standard, well-documented and tested solutions. These patterns encapsulate the knowledge and experience of expert developers, making it easier for others to understand and apply them in their own projects. By using design patterns, developers can avoid reinventing the wheel and save time and effort in developing high-quality software. They also promote consistency and standardization in software development, making it easier for teams to collaborate and communicate effectively. Ultimately, the goal of design patterns is to improve the overall quality and efficiency of software development.
Singleton Pattern: Ensures that only one instance of a class exists and provides a global point of access to it.
Factory Pattern: Provides a way to create objects without specifying the exact class of object that will be created.
Observer Pattern: Defines a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.
Decorator Pattern: Allows behavior to be added to an individual object, dynamically, without affecting the behavior of other objects from the same class.
Adapter Pattern: Allows the interface of an existing class to be used as another interface. It is often used to make existing classes work with others without modifying their source code.
Strategy Pattern: Defines a family of algorithms, encapsulates each one, and makes them interchangeable. It allows the algorithm to be selected at runtime.
Iterator Pattern: Provides a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
Template Method Pattern: Defines the skeleton of an algorithm in a method, deferring some steps to subclasses. It allows subclasses to redefine certain steps of an algorithm without changing the algorithm’s structure.
State Pattern: Allows an object to change its behavior when its internal state changes. It encapsulates the behavior associated with a particular state in an object.
Prototype Pattern: Creates new objects by cloning existing ones, instead of creating new objects from scratch. It is useful when creating objects is time-consuming or expensive.
Composite Pattern: Allows objects to be composed into tree structures to represent part-whole hierarchies. It allows clients to treat individual objects and compositions of objects uniformly.
Bridge Pattern: Decouples an abstraction from its implementation so that the two can vary independently. It is useful when there are multiple implementations of an abstraction.
Facade Pattern: Provides a unified interface to a set of interfaces in a subsystem. It simplifies and decouples the client from the complex subsystem.
Command Pattern: Encapsulates a request as an object, allowing it to be treated as a first-class object. It allows requests to be queued, logged, and undone.
Proxy Pattern: Provides a surrogate or placeholder for another object to control access to it. It is useful when creating expensive objects or when accessing remote objects.
The factory design pattern is a design pattern that falls under the creational category of design patterns. It is used to create objects without specifying the exact class of object that will be created. This pattern provides a way to create objects based on a common interface, allowing for flexibility and scalability in code.
The main idea behind the factory design pattern is to define an interface for creating objects, but let subclasses decide which class to instantiate. This allows for the creation of objects without having to specify the exact class, making the code more adaptable to change.
The basic structure of the factory design pattern consists of a factory class, which is responsible for creating objects, and a set of concrete classes that implement the common interface. The client code interacts with the factory class to create the desired objects, rather than directly creating them.
There are different types of factory design patterns, including simple factory, factory method, and abstract factory. The simple factory pattern uses a single factory class to create all types of objects, while the factory method pattern uses a separate factory method for each type of object. The abstract factory pattern uses a set of related factory classes to create families of objects.
One of the main advantages of using the factory design pattern is that it promotes loose coupling between objects, as the client code is not directly dependent on the concrete classes. This makes it easier to add new types of objects without having to modify the client code.
Another advantage is that it centralizes object creation, making it easier to manage and maintain. This can be especially useful in large codebases where there may be multiple instances of object creation.
In summary, the factory design pattern is a useful tool for creating objects in a flexible and scalable manner. It promotes loose coupling and centralizes object creation, making it a valuable pattern in software design.
Store a system configuration in an external text file and construct at runtime using the factory pattern.
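A minimal sketch along those lines (the widget names and the idea of reading them from a config file are invented for illustration).

#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct widget {
  virtual ~widget() = default;
  virtual void draw() const = 0;
};

struct button : widget { void draw() const override { std::cout << "button\n"; } };
struct slider : widget { void draw() const override { std::cout << "slider\n"; } };

// The factory maps a runtime string (e.g. read from a config file) to a constructor
std::unique_ptr<widget> make_widget(const std::string& name) {
  static const std::map<std::string, std::function<std::unique_ptr<widget>()>> registry{
      {"button", [] { return std::make_unique<button>(); }},
      {"slider", [] { return std::make_unique<slider>(); }},
  };
  return registry.at(name)();  // throws if the config names an unknown type
}

int main() {
  for (const char* line : {"button", "slider"})  // pretend these came from a file
    make_widget(line)->draw();
}

The calling code never names the concrete classes, so adding a new widget type only touches the registry.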
The visitor design pattern is a behavioral design pattern that allows for the separation of an algorithm from an object structure on which it operates. It allows for the addition of new operations to an object structure without modifying the objects themselves.
In this pattern, there are three main components: the visitor interface, the concrete visitor, and the element interface. The visitor interface defines the visit methods for each element in the object structure. The concrete visitor implements these visit methods and performs the desired operations on the elements. The element interface defines an accept method that allows the visitor to visit the element and perform its operations.
The visitor design pattern works by having the object structure accept a visitor, which then traverses through the structure and performs the specific operations on each element. This allows for the addition of new operations by simply creating a new concrete visitor and adding it to the object structure.
One of the key benefits of the visitor design pattern is that it allows for the addition of new operations without modifying the existing code. It also helps to keep the code clean and organized by separating the algorithm from the elements.
However, one potential downside of this pattern is that it can make the code more complex and difficult to understand, especially for beginners. It also requires a good understanding of the object structure and the operations that need to be performed on it.
Overall, the visitor design pattern is useful in situations where there are multiple operations that need to be performed on a complex object structure, and when the structure needs to be easily extensible without modifying existing code.
The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”
Double dispatch is a design pattern that allows for subtype polymorphism in object-oriented programming. It allows a method to be dynamically dispatched based on the runtime type of two objects, rather than just one. This means that the same method can behave differently depending on the types of both the caller and the callee objects.
In traditional single dispatch, the method is dispatched based on the runtime type of the caller object only. This means that the same method will always behave in the same way, regardless of the type of the callee object. However, in double dispatch, the method is dispatched based on both the runtime types of the caller and callee objects, allowing for more specific and dynamic behavior.
This is achieved through the use of virtual methods, which are methods that can be overridden by subclasses. When a virtual method is called on an object, the runtime type of that object is used to determine which implementation of the method to use. With double dispatch, this determination is extended to also consider the runtime type of the method’s parameters.
For example, let’s say we have a Shape interface and two subclasses, Circle and Rectangle. Both subclasses have a draw() method that takes in a Color object as a parameter. In single dispatch, the draw() method of the Circle class would always be called when draw() is invoked on a Circle object. However, with double dispatch, the draw() method of the Circle class will be called if the callee object is a Circle and the parameter is a Color, but the draw() method of the Rectangle class will be called if the callee object is a Rectangle and the parameter is a Color.
Overall, double dispatch allows for more flexibility and dynamic behavior in polymorphism by considering the types of both the caller and callee objects.
Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.
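A minimal sketch of that exercise: the virtual accept() resolves the shape’s dynamic type (first dispatch) and the visit() overload resolves the operation (second dispatch).

#include <iostream>

struct circle;
struct rectangle;

struct visitor {
  virtual ~visitor() = default;
  virtual void visit(const circle&) = 0;
  virtual void visit(const rectangle&) = 0;
};

struct shape {
  virtual ~shape() = default;
  virtual void accept(visitor& v) const = 0;
};

struct circle : shape {
  void accept(visitor& v) const override { v.visit(*this); }  // first dispatch: the shape type
};

struct rectangle : shape {
  void accept(visitor& v) const override { v.visit(*this); }
};

struct draw : visitor {
  void visit(const circle&) override { std::cout << "draw circle\n"; }        // second dispatch: the operation
  void visit(const rectangle&) override { std::cout << "draw rectangle\n"; }
};

int main() {
  const circle c;
  const rectangle r;
  const shape& s1 = c;
  const shape& s2 = r;

  draw d;
  s1.accept(d);  // draw circle
  s2.accept(d);  // draw rectangle
}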
Hardware interface classes will typically be a const register map, which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally, if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.
“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”
See Wikipedia.
Dependency inversion is a principle in software design that states that high-level modules should not depend on low-level modules, but both should depend on abstractions. This means that the dependencies between components should be inverted, with the higher-level components depending on interfaces or abstract classes that define behavior, rather than directly depending on lower-level components that implement that behavior. This allows for greater flexibility, as components can be easily substituted or changed without affecting the overall structure or functionality of the system. It also promotes modular and decoupled code, making it easier to maintain, test, and extend. Dependency inversion is a key principle in the SOLID design principles for object-oriented programming.
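A minimal sketch (the logger example is invented): the high-level component depends only on the abstraction, and the concrete implementation is injected from outside.

#include <iostream>
#include <memory>
#include <string>

// The abstraction both sides depend on
struct logger {
  virtual ~logger() = default;
  virtual void log(const std::string& msg) = 0;
};

// Low-level detail: one possible implementation
struct console_logger : logger {
  void log(const std::string& msg) override { std::cout << msg << '\n'; }
};

// High-level module: depends on the interface, not on console_logger
class order_service {
  std::shared_ptr<logger> log_;

public:
  explicit order_service(std::shared_ptr<logger> log) : log_{std::move(log)} {}
  void place_order() { log_->log("order placed"); }
};

int main() {
  order_service svc{std::make_shared<console_logger>()};  // the dependency is injected
  svc.place_order();  // swapping in a different logger needs no change here
}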
What are the different types of data structures and when would you use each one?
Explain the difference between a stack and a queue.
How do you ensure that your code is efficient and scalable?
Can you describe the process of object-oriented programming and provide an example?
How do you approach debugging and troubleshooting when encountering errors in your code?
What is the difference between a pointer and a reference in C++?
What is the difference between pass by value and pass by reference?
What is the difference between a class and an object in C++?
How does dynamic memory allocation work in C++?
What is the difference between virtual and pure virtual functions in C++?