A Computer Science survival guide for C++/Linux developers
Fri Mar 28 23:04:55 UTC 2025
Let’s start with a haiku written by a computer.
Code runs swiftly,
Syntax rules must be followed,
C++ is complex.
This reference is a distillation of 15+ years of online logbook notes into only the essentials that have continued to remain relevant as a senior software developer today. Just for fun – and where the topic is readily available or established – I have reached out to OpenAI to provide a paragraph or two. Consequently, the exact content and the chapter order will vary each night. Hopefully this will keep repeat visits interesting and also prevents me focusing all my attention on the first few chapters.
If time is tight, try the random daily chapter. And you can also raise a ticket on this repo.
Construction of a building is often used to describe the software design process. But the analogy breaks down: the building designer only has one attempt to compile the building. You can’t swap out the foundation halfway through.
Actually, in Royce’s later revisions of the model, he stated that the whole process should be done “twice if possible”.
Validation is concerned with checking that the system will meet the customer’s actual needs. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.
Verification is objective (quality), validation is subjective (customer needs).
Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.
See difference between verification and validation.
Stop doing the things that nobody wants
What to call your test is easy – it’s a sentence describing the next behaviour in which you are interested. How much to test becomes moot – you can only describe so much behaviour in a single sentence. When a test fails, simply work through the process described above – either you introduced a bug, the behaviour moved, or the test is no longer relevant.
Test-driven development (TDD) is a software development process that relies on the repetition of a short development cycle: requirements are turned into specific test cases, then the software is improved to pass the new tests, only. This is opposed to software development that allows software to be added that is not proven to meet requirements.
As a [X] I want [Y] in order to [Z]
For simple readonly types, pass by const value.
void func(const int p) {
}
For large readonly types – i.e., larger than a pointer – pass by const reference. You should also consider whether the type is trivially copyable: does it need only a shallow copy, or a deep copy?
void func(const big_type& p) {
}
If the first thing you do is make a copy, pass by non-const value.
void func(int p) {
}
If you want to modify the caller’s copy, pass by non-const reference.
void func(int& p) {
}
The only difference between a class and a struct is the default access level: public for a struct, private for a class.
But typically a struct is used for “plain old data” or memory-mapped data where you want the members laid out in a predictable way (virtuals/inheritance add the complication of a vtable pointer.)
Either way, you can test your assertions at compile time with static_assert and the type traits header.
struct A {
// public: <-- default for structs
int x_;
};
If the data are constrained by an invariant – i.e., not all values are valid – use a class.
class B {
private: // <-- default for classes
int x_; // must be between 0-1000
};
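For example, a minimal sketch (assuming C++17) of the compile-time checks mentioned above, using static_assert and the type traits header; the struct is illustrative:

#include <type_traits>

struct pod {
  int x_;
};

// Compile-time checks: they cost nothing at runtime
static_assert(std::is_trivially_copyable_v<pod>);
static_assert(std::is_standard_layout_v<pod>);
static_assert(sizeof(pod) == sizeof(int));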
“Generic programming is why we can’t have nice things” – Andrei Alexandrescu
constexpr – find undefined behaviour at compile time.
Template metaprogramming is a technique used in C++ to perform compile-time computation and code generation. It involves creating generic templates that can be instantiated with different types and values to produce code specific to those types and values. This allows for the creation of highly optimized and flexible code, as well as the ability to perform complex computations at compile time.
The process of template metaprogramming involves using templates to perform calculations and generate code at compile time, rather than at runtime. This is achieved through the use of template parameters, which can be types or values. These parameters are used to define the behavior and structure of the template, allowing for different behavior and code to be generated based on the values passed in.
One of the key features of template metaprogramming is the ability to perform compile-time computation. This means that the code generated by the templates is evaluated and executed by the compiler, rather than at runtime by the program. This can result in significant performance improvements, as the code is already generated and optimized before the program is even run.
Template metaprogramming is often used to implement data structures and algorithms, as well as to enable advanced features such as type traits, which allow for the inspection and manipulation of types at compile time. It also enables the creation of generic code that can be used with different types, reducing the need for code duplication.
However, template metaprogramming can be complex and difficult to understand, as it involves using C++ template syntax in non-traditional ways. It also relies heavily on the compiler’s ability to optimize the generated code, which can vary between different compilers.
Despite these challenges, template metaprogramming is a powerful tool that allows for the creation of highly efficient and flexible code. It is widely used in libraries and frameworks, and its popularity continues to grow as developers seek to improve the performance and capabilities of their code.
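As an illustrative sketch of compile-time computation, here is a factorial implemented first as a class template and then as a constexpr function:

#include <cstdint>

// Classic template metaprogramming: recursion on a non-type template parameter
template <std::uint64_t N>
struct factorial {
  static constexpr std::uint64_t value = N * factorial<N - 1>::value;
};

// Base case terminates the recursion
template <>
struct factorial<0> {
  static constexpr std::uint64_t value = 1;
};

// The modern equivalent: an ordinary function evaluated at compile time
constexpr std::uint64_t factorial_fn(const std::uint64_t n) {
  return n == 0 ? 1 : n * factorial_fn(n - 1);
}

static_assert(factorial<5>::value == 120);
static_assert(factorial_fn(5) == 120);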
Templates in the C++ Standard Library are used to create generic data types and functions that can work with multiple data types. This allows for code reuse and efficient algorithm design.
One of the main uses of templates in the C++ Standard Library is to create container classes, such as vectors, lists, and maps. These containers can hold objects of any data type, making them flexible and versatile. The implementation of these containers is based on templates, which allows them to work with any data type without the need for separate implementations for each data type.
Templates are also used to create algorithms, such as sorting, searching, and manipulating data in containers. These algorithms are designed to work with any data type, again increasing code reuse and flexibility.
The C++ Standard Library also includes function templates, which allow for the creation of generic functions that can work with different data types. For example, the std::find function in the algorithms library is a single template that works with the iterators of any container.
Templates are also used in the creation of smart pointers, such as unique_ptr and shared_ptr, which provide memory management for dynamically allocated objects. These smart pointers use templates to work with different data types, making them versatile and efficient.
In summary, templates in the C++ Standard Library allow for the creation of generic data types and functions that can work with any data type, promoting code reuse and efficient algorithm design.
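A short sketch of a Standard Library function template in use – std::find is instantiated separately for each iterator type (assuming C++17 for class template argument deduction):

#include <algorithm>
#include <list>
#include <vector>

int main() {
  const std::vector v{3, 1, 4, 1, 5};
  const std::list l{2, 7, 1, 8};

  // One function template, two instantiations
  const bool in_vector = std::find(v.cbegin(), v.cend(), 4) != v.cend();
  const bool in_list = std::find(l.cbegin(), l.cend(), 9) != l.cend();

  return in_vector && !in_list ? 0 : 1;
}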
SFINAE (Substitution Failure Is Not An Error) is a principle in C++ that allows a template to be instantiated with a certain type only if the operations and expressions used in the template are valid for that type. If the substitution of a type fails due to an invalid operation or expression, instead of throwing an error, the compiler simply removes that template from the overload resolution process and continues with the next one. This allows for more flexibility in template programming and allows the compiler to choose the most appropriate template for a given type. SFINAE is often used in conjunction with template metaprogramming to create more powerful and flexible templates.
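A minimal SFINAE sketch using std::enable_if_t; the function names are illustrative:

#include <iostream>
#include <type_traits>

// Participates in overload resolution only when T is an integral type
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
void describe(T) {
  std::cout << "integral\n";
}

// Participates only when T is a floating-point type
template <typename T, std::enable_if_t<std::is_floating_point_v<T>, int> = 0>
void describe(T) {
  std::cout << "floating point\n";
}

int main() {
  describe(42);   // integral
  describe(3.14); // floating point
}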
See Linux Debuginfo Formats - DWARF, ELF, dwo, dwp - What are They All? - Greg Law - CppCon 2022.
Section | Description |
---|---|
stack \/ | Local variables |
heap /\ | Dynamic memory |
.rodata | Read-only data |
.bss | Uninitialised data |
.data | Initialised data |
.text | Code |
Header: This segment contains information about the executable, such as its entry point, architecture, and section layout.
Text or Code: This segment contains the actual executable code, which is typically machine code instructions that will be executed by the processor.
Data: This segment contains initialized global and static variables used by the program. This can include both data and read-only data.
BSS (Block Started by Symbol): This segment contains uninitialized global and static variables. Unlike the Data segment, this segment does not take up space in the executable file but is allocated when the program is loaded into memory.
Stack: This segment is used for local variables, function parameters, and return addresses. It grows and shrinks dynamically as the program executes.
Heap: This segment is used for dynamically allocated memory, such as memory obtained via malloc() or the new operator. It also grows and shrinks as needed during program execution.
Import Table: This segment contains references to external functions and libraries that the program needs to execute.
Export Table: This segment contains information about functions and symbols that can be accessed by other programs.
Resources: This segment contains data such as images, icons, and other resources that are used by the program.
Relocation Table: This segment contains information about memory addresses that need to be modified if the program is loaded into a different memory location than the one it was compiled for.
Debug Information: This segment contains information used for debugging the program, such as symbols and line numbers.
Exception Handling: This segment contains information and code for handling exceptions that may occur during program execution.
Dynamic Linking: This segment contains information and code for dynamically linking to libraries and other external resources during program execution.
Efficient storage and retrieval: The C++ Standard Library containers are designed to be highly efficient in terms of storage and retrieval of data. They use various data structures such as arrays, linked lists, and trees to store data in a way that allows for fast retrieval.
Easy to use: The containers are easy to use and come with a well-defined interface. This makes it easier for developers to write code that is easy to read and maintain.
Portability: The containers are part of the C++ Standard Library, which means they are supported by all major C++ compilers. This makes them highly portable, and code written using these containers can be easily compiled and run on any platform.
Dynamic resizing: Many of the containers, such as vector and deque, have the ability to dynamically resize themselves as more elements are added. This makes it easy to add and remove elements without worrying about managing memory allocation.
Type safety: The containers are type-safe, which means they ensure that only the correct data types are stored in them. This helps prevent errors and bugs in the code.
Built-in algorithms: The containers come with a set of built-in algorithms for performing common operations such as sorting, searching, and merging. This reduces the need for developers to write their own algorithms, saving time and effort.
Versatility: The C++ Standard Library containers offer a wide range of data structures that can be used for various purposes. This includes arrays, lists, maps, sets, and more, giving developers the flexibility to choose the most appropriate container for their specific needs.
High performance: The containers are highly optimized for performance, making them suitable for use in performance-critical applications. They are designed to minimize memory usage and provide fast access and manipulation of data.
Interoperability: The containers can be easily used with other C++ Standard Library components, such as iterators and algorithms, making it easy to integrate them into existing code.
Reliable and well-tested: The containers are part of the C++ Standard Library, which is a well-tested and reliable library that has been used for many years. This ensures that the containers are stable and free from bugs, making them a trustworthy choice for data storage and manipulation.
Vector
List
Deque
Array
Forward List
Set
Multiset
Map
Multimap
Unordered Set
Unordered Multiset
Unordered Map
Unordered Multimap
Stack
Queue
Priority Queue
Bitset
Sequence containers store elements in a sequential manner and allow access to elements by their position in the container. This means that the elements are stored in a specific order and can be accessed by using an index or iterator. Examples of sequence containers include arrays, vectors, lists, and deques.
On the other hand, associative containers store elements in a non-sequential manner and allow access to elements by using a key rather than an index. This means that the elements are not stored in a specific order but are instead organized based on their key value. Examples of associative containers include sets, maps, and hash tables.
In summary, sequence containers use a sequential approach for storing and accessing elements, while associative containers use a non-sequential approach based on keys. The choice between these two types of containers depends on the specific needs and requirements of the program.
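A brief sketch contrasting positional access with key-based access:

#include <map>
#include <string>
#include <vector>

int main() {
  // Sequence container: elements addressed by position
  const std::vector<std::string> names{"alice", "bob", "carol"};
  const auto& second = names[1]; // "bob"

  // Associative container: elements addressed by key
  const std::map<std::string, int> ages{{"alice", 30}, {"bob", 25}};
  const auto it = ages.find("bob"); // O(log n) lookup by key

  return second == it->first ? 0 : 1;
}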
Typically parallelism makes you think “threads within a process” but it’s worth considering different points in the processing that could be executed/calculated in parallel.
See wikipedia.
time/program = instructions/program * clock cycles/instruction * time/clock cycle
Amdahl’s law is a mathematical formula that describes the theoretical maximum speedup that can be achieved in a system with multiple processors. It was proposed by computer architect Gene Amdahl in 1967 and is widely used to analyze the performance of parallel computing systems.
The law states that the speedup of a system is limited by the portion of serial code that cannot be parallelized. In other words, even if multiple processors are added to a system, there will always be a limit to how much the overall performance can be improved. This is because there will always be some tasks that must be performed sequentially and cannot be split among multiple processors.
The formula for Amdahl’s law is as follows:
Speedup = 1/((1-P) + (P/N))
Where: - P is the proportion of the program that can be parallelized - N is the number of processors
This formula shows that as the number of processors (N) increases, the speedup will approach the theoretical maximum of 1/(1-P). However, this maximum speedup is never achieved in practice because there will always be some serial code that cannot be parallelized.
For example, if 80% of a program can be parallelized, the theoretical maximum speedup (with unlimited processors) is 1/(1-0.8) = 5. With 10 processors the formula gives 1/((1-0.8) + 0.8/10) ≈ 3.6, and the speedup achieved in practice will be lower still, because of the 20% of the code that cannot be parallelized and the overhead of coordination.
Amdahl’s law highlights the importance of identifying and optimizing the serial portions of a program to achieve the best possible performance in parallel computing systems.
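The formula is trivial to sketch as a constexpr function, which also makes the 10-processor example above concrete:

#include <iostream>

// Speedup = 1 / ((1 - P) + P / N)
constexpr double amdahl(const double p, const double n) {
  return 1.0 / ((1.0 - p) + p / n);
}

int main() {
  std::cout << amdahl(0.8, 10) << '\n';  // ~3.6 with 10 processors
  std::cout << amdahl(0.8, 1e9) << '\n'; // approaches the ceiling of 1/(1-0.8) = 5
}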
It’s very tempting to optimise the parts you’re familiar with, but it’s not obvious how much your efforts will benefit the overall process. Spend your time on the bottlenecks.
Less common but another set of principles to consider.
See OSI on Wikipedia.
Layer | Name | Function |
---|---|---|
7 | Application | Provides a user interface and access to network services |
6 | Presentation | Handles data formatting and encryption for secure communication |
5 | Session | Establishes, maintains, and terminates connections between applications |
4 | Transport | Manages end-to-end communication and ensures data delivery |
3 | Network | Routes data between nodes and handles logical addressing |
2 | Data Link | Manages data transfer between adjacent nodes on a network |
1 | Physical | Handles the physical connection between devices and transmits raw data bits |
The TCP three way handshake is a process used to establish a reliable connection between two devices in a TCP/IP network. It is a three-step process that allows the two devices (usually a client and a server) to synchronize and establish a reliable connection before starting to exchange data.
Step 1: SYN (Synchronize)
The client initiates the connection by sending a SYN (synchronize) packet to the server. This packet contains a random sequence number that is used to identify the connection. This sequence number is important for establishing a reliable connection and ensuring that packets are received in the correct order.
Step 2: SYN-ACK (Synchronize-Acknowledgment)
Upon receiving the SYN packet, the server responds with a SYN-ACK (synchronize-acknowledgment) packet. This packet contains an acknowledgment number that is one greater than the sequence number received in the SYN packet. The server also generates its own random sequence number that will be used for the connection. At this point, the server has received the client’s request and is ready to establish a connection.
Step 3: ACK (Acknowledgment)
Finally, the client responds with an ACK (acknowledgment) packet. This packet contains an acknowledgment number that is one greater than the sequence number received in the SYN-ACK packet. This acknowledges the server’s response and completes the three-way handshake. The client and server can now exchange data over the established connection.
Once the three-way handshake is complete, the connection is established and the devices can start sending and receiving data. If any of the three steps fail, the connection will not be established and the devices will try the handshake again until a successful connection is made.
SYN stands for “synchronise”.
=> SYN
<= SYN-ACK
=> ACK
=> HTTP (request)
<= ACK
<= HTTP (response)
=> ACK
=> FIN
<= FIN-ACK
=> ACK
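A minimal POSIX sketch of the client side: the kernel performs the SYN/SYN-ACK/ACK exchange shown above inside connect(). The address and port are placeholders and error handling is trimmed.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
  // Create a TCP socket
  const int fd = socket(AF_INET, SOCK_STREAM, 0);

  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(80);                       // placeholder port
  inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr); // placeholder address

  // The three-way handshake happens here, inside the kernel
  const int result = connect(fd, reinterpret_cast<const sockaddr*>(&addr), sizeof(addr));

  close(fd);
  return result == 0 ? 0 : 1;
}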
As a Linux developer you need a strong command line game. bash is ubiquitous, and a powerful language in itself, without resorting to something like Python.

git is awesome on the command line and pretty much essential. Use it to manage your chores. See how to undo almost anything with git.
(echo hello; sleep 1) | telnet 127.0.0.1 80
echo hello > /dev/tcp/127.0.0.1/80
echo hello | nc localhost 80
# server
nc -knvlp 3389
# client
bash -i >& /dev/tcp/server_ip/3389 0>&1
git add !(unit.md)
shuf -n 1 readme.txt
From bash 5.
echo $EPOCHREALTIME
1614870873.574544
echo $EPOCHSECONDS
1614870876
The three load average values are 1, 5 and 15 minutes.
uptime
15:29:28 up 20:23, 0 users, load average: 5.08, 1.49, 0.51
Stress your system in different ways.
stress --cpu 8
echo $(nproc)
localhost
127.0.0.1
127.0.0.2
127.0.0.3
127.1
0.0.0.0
0 # wut
mv {,_}.bash_history
watch -d ip a
pushd equivalent: I use this all the time. Rather than pushd and popd to navigate around the file system:
# Navigate to new dir
cd ~/deanturpin.gitlab.io/content/post
# Return whence you came
cd -
Multithreaded concepts are important: e.g., atomics, locks, issues with different designs, how to make things thread safe.
Why use multithreading? For performance and/or separation of concerns.
A routine can be described as reentrant if it can be interrupted and safely called again (“re-entered”) before the interrupted invocation completes, without leaving the operation partially complete: the invariants remain true.
A good way to reason about your code is to consider the invariants that must be true at any given point in time. For a linked list, node A’s next pointer must point to node B and node B’s previous pointer must point to node A. However, if you are deleting an element from a linked list there will be a period where this invariant does not hold, and you must acquire mutual access to both nodes whilst the list is being updated; otherwise another thread may see the list in an invalid state.
OpenAI: Invariants refer to logical or mathematical statements that remain true throughout the execution of a program or a system. They are conditions or properties that are always expected to hold true, regardless of the state or input of the system. Invariants are essential for ensuring the correctness and stability of a program.
There are three main types of invariants: preconditions, postconditions, and loop invariants.
Preconditions: These are conditions that must be true before a certain operation or function can be executed. They define the initial state of the system and specify what inputs are valid for the operation to be performed.
Postconditions: These are conditions that must be true after an operation or function has been executed. They define the expected state of the system after the operation is completed.
Loop invariants: These are conditions that must be true before and after each iteration of a loop. They ensure that the loop will terminate and that the expected result will be achieved.
Invariants are important for program correctness because they help identify and prevent errors and bugs in the code. They serve as checks and balances to ensure that the program behaves as expected and that the data remains consistent. Invariants also help in debugging and troubleshooting by providing a reference point for identifying where and why a program may have gone wrong.
In summary, invariants are essential for maintaining the integrity and reliability of a program by ensuring that certain conditions remain true throughout its execution. They are an important concept in software engineering and are used to improve the quality and robustness of programs.
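A small sketch of all three expressed as assertions; the division example is illustrative:

#include <cassert>
#include <cstddef>
#include <vector>

int divide(const int dividend, const int divisor) {
  assert(divisor != 0); // precondition

  const int quotient = dividend / divisor;

  // postcondition: the result reconstructs the dividend
  assert(quotient * divisor + dividend % divisor == dividend);
  return quotient;
}

int main() {
  const std::vector values{1, 2, 3, 4};
  int sum = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    // loop invariant: sum holds the total of the first i elements
    sum += values[i];
  }
  assert(sum == 10);
  return divide(sum, 2) == 5 ? 0 : 1;
}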
A function is idempotent if it can be applied multiple times without changing the outcome. E.g., repeatedly pressing the “on” switch of an “on/off” pair: it’s still on. See Wikipedia.
tl;dr “Race condition: an unfortunate order of execution causes undesirable behaviour.”
OpenAI: Race conditions in computer science refer to a situation where two or more processes or threads access a shared resource or data at the same time, leading to unexpected and potentially incorrect results. This can occur when the outcome of the execution depends on the order in which the processes or threads access the shared resource.
One example of a race condition is when two processes try to update the same data in a database simultaneously. If the first process reads the data, performs a calculation, and then updates the data, but the second process also reads the data before the first process finishes its update, the second process will use the old data and the results will be incorrect.
Another example is in multi-threaded programming, where multiple threads are running concurrently. If one thread modifies a variable while another thread is accessing or using that variable, the results can be unpredictable and may lead to errors or crashes.
Race conditions can also occur in distributed systems, where multiple nodes are accessing and updating the same data. If proper synchronization mechanisms are not in place, conflicting updates can occur and lead to incorrect data.
The occurrence of race conditions is often unpredictable and difficult to detect, making them a common source of bugs and errors in computer programs. They can be particularly problematic in real-time or safety-critical systems, where incorrect results can have serious consequences. Proper synchronization and handling of shared resources is crucial in preventing race conditions in computer science.
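A classic data race as a sketch: the plain int loses increments, the std::atomic does not.

#include <atomic>
#include <iostream>
#include <thread>

int main() {
  std::atomic<int> safe_counter{0};
  int unsafe_counter = 0;

  const auto work = [&] {
    for (int i = 0; i < 100'000; ++i) {
      ++safe_counter;   // atomic read-modify-write
      ++unsafe_counter; // data race: undefined behaviour
    }
  };

  std::thread t1{work};
  std::thread t2{work};
  t1.join();
  t2.join();

  // safe_counter is always 200000; unsafe_counter probably is not
  std::cout << safe_counter << ' ' << unsafe_counter << '\n';
}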
See the Dining philosophers problem.
A real world example: Ken and Barbie walk over to the barbecue. Ken grabs the last remaining sausage and attempts to acquire the hot sauce; meanwhile, Barbie has already secured the hot sauce and is waiting for the sausage to become available. A sausage cannot be enjoyed without hot sauce, so neither can proceed, and they remain stuck in a deadlock indefinitely.
A livelock occurs when measures are taken to avoid a deadlock, but the measures themselves do not resolve the conflict. For example, Ken and Barbie both agree to put down their respective items and try again; however, they both pick up the same item again and the stalemate continues.
OpenAI: Both deadlocks and livelocks are situations that occur in multi-threaded programs where threads are unable to continue executing properly. These situations can significantly impact the performance and functionality of the program.
Deadlock: - A deadlock is a situation where two or more threads are blocked and unable to proceed because they are waiting for each other to release resources. - In other words, each thread is holding a resource that the other thread needs, and neither thread can continue until the other releases the resource. - This creates a circular dependency and the threads are stuck in a state of waiting indefinitely. - In a deadlock, the program may appear to be frozen or unresponsive, as none of the threads can make any progress. - Deadlocks can occur in various scenarios, such as when multiple threads are trying to access shared resources or when a thread is waiting for a lock that is already held by another thread.
Livelock: - A livelock is a situation where two or more threads are stuck in a loop, constantly trying to acquire a resource that is currently unavailable. - Unlike a deadlock, in a livelock, the threads are not blocked, but they are unable to make any progress. - In a livelock, the threads may appear to be active and running, but they are not able to complete their tasks. - Livelocks can occur when two threads are trying to be polite and give way to each other, but end up stuck in a loop instead. - Livelocks can also occur in distributed systems when multiple nodes are constantly trying to communicate with each other but are unable to establish a connection.
In summary, both deadlocks and livelocks are situations where threads are unable to continue executing, but they differ in the way they are caused and how they manifest. Deadlocks occur due to resource dependencies, while livelocks occur due to conflicting actions by threads.
Run your code on different numbers of CPUs; it’s interesting how the bottlenecks move around depending on the available resources.
OpenAI (on avoiding deadlocks in MySQL): Use proper indexing: Make sure that all the tables involved in your queries have proper indexes on the columns used in joins, WHERE, ORDER BY, and GROUP BY clauses. This will help the database engine to execute the queries efficiently and avoid unnecessary locking.
Keep transactions short: Long-running transactions increase the chances of deadlocks. Try to keep transactions as short as possible and commit them as soon as they are no longer needed.
Use consistent transaction ordering: In a multi-user environment, it is important to ensure that transactions are executed in a consistent order. This will prevent conflicting transactions from acquiring locks on the same resources at the same time.
Use row-level locking: MySQL supports multiple types of locking, including table-level and row-level locking. Table-level locking can cause deadlocks if multiple transactions try to access different rows in the same table. Row-level locking allows transactions to lock only the rows they need, reducing the chances of deadlocks.
Use the correct isolation level: MySQL offers different isolation levels that control how transactions interact with each other. The most restrictive isolation level, SERIALIZABLE, ensures that transactions are executed in a serial order, preventing deadlocks. However, it can also impact performance. Choose the appropriate isolation level based on your application’s needs.
Avoid long-running queries: Long-running queries can hold locks on resources for an extended period, increasing the chances of deadlocks. Optimize your queries to run efficiently and avoid unnecessary locking.
Use a deadlock detection tool: MySQL has a built-in tool called SHOW ENGINE INNODB STATUS that can help detect deadlocks. This tool provides information about the current transactions and their locking status, helping you identify and resolve deadlocks.
Use retry logic: In case a deadlock does occur, you can use retry logic in your application to try the transaction again after a short delay. This can help resolve the deadlock and continue the execution of the transaction.
Monitor and tune your database: Regularly monitoring and tuning your database can help identify and resolve any performance issues that may lead to deadlocks. This includes monitoring for long-running queries, optimizing indexes, and adjusting the isolation level if necessary.
std::scoped_lock
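std::scoped_lock acquires several mutexes with a deadlock-avoidance algorithm, so Ken and Barbie can grab the sausage and the hot sauce in opposite orders without deadlocking. A minimal sketch:

#include <mutex>
#include <thread>

std::mutex sausage;
std::mutex hot_sauce;

void ken() {
  // Locks both mutexes together, with deadlock avoidance
  const std::scoped_lock lock{sausage, hot_sauce};
  // ... enjoy ...
}

void barbie() {
  // Opposite order, but still safe
  const std::scoped_lock lock{hot_sauce, sausage};
  // ... enjoy ...
}

int main() {
  std::thread t1{ken};
  std::thread t2{barbie};
  t1.join();
  t2.join();
}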
OpenAI: Mutex and semaphore are both synchronization mechanisms used in multi-threaded or multi-process environments to control access to shared resources. They both help to prevent race conditions, which can occur when multiple threads or processes attempt to access and modify the same resource at the same time.
In summary, mutex and semaphore are both synchronization mechanisms that serve different purposes. Mutex is used for mutual exclusion and is usually more efficient, while semaphore is used for resource allocation and can allow multiple threads to access a resource. It is important to understand the differences between the two in order to choose the appropriate mechanism for a given situation.
See Mutex vs Semaphore.
This is getting into the weeds, but mutexes and the data they protect can happily be declared consecutively in the code; so consider what this might mean when both mutex and data end up in the same cache line.
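A sketch of the layout concern – 64 bytes is an assumed cache-line size (std::hardware_destructive_interference_size is the portable constant where implemented):

#include <mutex>
#include <thread>

struct guarded {
  std::mutex mutex; // the mutex and the int it protects will likely share a cache line
  int value{};
};

// Unrelated, frequently written data pushed onto its own cache line
// to avoid false sharing with the lock above
struct alignas(64) counter {
  long hits{};
};

guarded g;
counter c;

int main() {
  std::thread t{[] { ++c.hits; }};
  {
    const std::scoped_lock lock{g.mutex};
    ++g.value;
  }
  t.join();
}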
tl;dr A thread is a branch of execution. A process can consist of multiple threads.
OpenAI: Threads and processes are two concepts in computer science that are used to manage the execution of programs. Both threads and processes are used to achieve multitasking, which allows multiple tasks to be executed simultaneously on a computer system.
Threads:
Definition: Threads are the smallest unit of execution within a process. They are also known as lightweight processes, as they share the same memory space and resources as the parent process.
Creation: Threads are created by the operating system within a process. Each process can have multiple threads, and these threads can be created and destroyed dynamically during the execution of the process.
Communication: Threads within the same process can communicate with each other through shared memory. This allows for efficient and fast communication as no data needs to be transferred between different processes.
Synchronization: Threads can synchronize with each other through various mechanisms, such as locks, semaphores, and barriers. This allows threads to coordinate their actions and avoid conflicts while accessing shared resources.
Resource sharing: Threads within the same process share the same resources, such as memory, files, and devices. This means that multiple threads can access the same resource simultaneously, which can improve performance.
Processes:
Definition: Processes are instances of programs that are being executed by the operating system. Each process has its own memory space and resources, and they are isolated from other processes.
Creation: Processes are created by the operating system when a program is launched, or by an existing process (for example, via fork/exec on Linux). Creating and destroying a process is considerably more expensive than creating and destroying a thread.
Communication: Processes can communicate with each other through inter-process communication mechanisms, such as pipes, sockets, and shared memory. This involves transferring data between different processes and can be slower compared to thread communication.
Synchronization: Processes do not share memory space and resources, so they cannot synchronize with each other directly. Inter-process communication mechanisms can be used to synchronize processes, but this can be slower compared to thread synchronization.
Resource sharing: Processes do not share resources with each other, so they must use inter-process communication mechanisms to share data and coordinate their actions. This can result in slower performance compared to threads.
In summary, threads and processes are both used for multitasking, but they differ in their creation, communication, synchronization, and resource sharing mechanisms. Threads are lighter and more efficient compared to processes, but processes provide more isolation and security. Both threads and processes have their own advantages and are used in different scenarios depending on the requirements of the program.
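A minimal sketch: threads share their parent process’s address space, so they can see the same variable directly, whereas a separate process would need some form of IPC.

#include <iostream>
#include <thread>

int main() {
  int shared = 0;

  // The lambda captures by reference: both threads of this
  // process operate on the same memory
  std::thread worker{[&shared] { shared = 42; }};
  worker.join();

  std::cout << shared << '\n'; // 42, no IPC required
}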
It’s important to know the common data structures and their characteristics.
std::vector

std::vector is the go-to container, so let’s give it some special attention. It has contiguous storage – therefore cache friendly – and uses the RAII paradigm: the data are created on the heap and the allocation and deallocation (new/delete) are handled for you. Interestingly, std::string exhibits many of the same characteristics, but it’s not quite containery enough to qualify.

Estimate how many times the fax destructor is called below.
#include <iostream>
#include <vector>

auto x = 0uz;

int main() {
  struct fax {
    // Constructor and destructor
    fax() { std::cout << x << " ctor\n"; }
    ~fax() { std::cout << "\t" << x << " dtor\n"; }

    // Copy constructors
    fax(const fax&) { std::cout << x << " copy ctor\n"; };
    fax(fax&&) { std::cout << x << " move ctor\n"; };

    // Assignment constructors
    fax& operator=(const fax&) {
      std::cout << x << " copy assignment ctor\n";
      return *this;
    }
    fax& operator=(fax&&) {
      std::cout << x << " move assignment ctor\n";
      return *this;
    };

    const size_t id = x++;
  };

  std::vector<fax> y;

  // Reduce copies by allocating up front
  // y.reserve(3);

  for (size_t i = 0; i < 3; ++i) {
    y.push_back(fax{});
    std::cout << "-- capacity " << y.capacity() << " --\n";
  }

  // fax f1 = fax{};
  // fax f2 = std::move(fax{});
}
See the program output (below), note how the capacity is growing exponentially (doubling each time).
1 ctor
2 move ctor
2 dtor
-- capacity 1 --
3 ctor
4 move ctor
5 copy ctor
5 dtor
5 dtor
-- capacity 2 --
6 ctor
7 move ctor
8 copy ctor
9 copy ctor
9 dtor
9 dtor
9 dtor
-- capacity 4 --
9 dtor
9 dtor
9 dtor
For each push we call the default constructor to create a temporary object and then call the copy constructor to populate the new element with its data… so far so good. But crucially, when the vector is resized we must also copy all of the existing elements into the new container. Not an issue for this simple case, but if the class is expensive to copy there could be trouble ahead; also, if there’s a bug in the copy constructor, we may be corrupting existing valid data simply by copying it around.
Binary search trees have an amortized complexity of O(log n) unless the tree is unbalanced. Worst case, if the tree is unbalanced, then it is O(n). std::set and std::map are implemented using a red-black tree, a type of balanced binary search tree. C++23 introduces std::flat_map which is implemented using contiguous storage to make it more cache-friendly.
A binary search tree is a type of binary tree data structure that is particularly useful for efficient searching of elements. Each node in a binary search tree contains a key value, and the left and right child nodes are also binary search trees, with all the keys in the left subtree being smaller than the key of the current node, and all the keys in the right subtree being larger. This property allows for efficient searching, as the search can be narrowed down to a smaller subset of elements at each step.
In addition to efficient searching, binary search trees also allow for efficient insertion and deletion of elements, as the tree structure can be easily rebalanced to maintain the search property. However, if the tree becomes unbalanced, the search efficiency can be greatly reduced.
Binary search trees are commonly used in computer science for implementing other data structures such as maps, sets, and priority queues. They are also used in many algorithms, such as sorting and searching algorithms, due to their efficient search capabilities.
A balanced tree is one of height O(log n), where n is the number of nodes in the tree. It is a sorted hierarchy of data where the left most node is the smallest, and the right most node is the largest.
Below each node of a binary search tree is a mini tree. The top of the tree is the middle element.
(ASCII diagram: a balanced binary search tree drawn sideways against the sorted values 0–e.)
Hash tables have an amortized complexity of O(1) unless there are collisions. Worst case, if everything is in the same bin, then it is O(n). Additionally, if the proportion of slots – known as “load factor” or “fill ratio” – exceeds a threshold, then the hash table must be resized/rebuilt.
std::unordered_set and std::unordered_map are implemented using hash tables.
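A short sketch of inspecting the load factor and pre-sizing the table to avoid rehashes (the numbers are illustrative):

#include <iostream>
#include <unordered_map>

int main() {
  std::unordered_map<int, int> m;
  m.reserve(1000); // pre-allocate buckets so insertion does not trigger a rehash

  for (int i = 0; i < 1000; ++i)
    m[i] = i * i;

  std::cout << "load factor " << m.load_factor()
            << ", buckets " << m.bucket_count() << '\n';
}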
A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.
Support of random access iterators is required to keep a heap structure internally at all times. A sorted list is also a valid min heap (every parent is no greater than its children), but a heap is not necessarily sorted.
123456789abcdef
1
23
4567
89abcdef
Projecting down, the top of the tree is the smallest element.
1
2 3
4 5 6 7
8 9 a b c d e f
              1
          /       \
      2               3
    /   \           /   \
  4       5       6       7
 / \     / \     / \     / \
8   9   a   b   c   d   e   f
A heap and a binary search tree (BST) are both data structures used to store and organize data, but they have some key differences that set them apart.
In summary, the main differences between a heap and a BST are their structure, ordering, search and insertion operations, balance, usage, and implementation. Both data structures have their own advantages and are suitable for different use cases.
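A sketch using std::priority_queue, which is a max heap by default, layered over a std::vector:

#include <initializer_list>
#include <iostream>
#include <queue>

int main() {
  // Max heap: the largest element is always on top
  std::priority_queue<int> pq;
  for (const int i : {3, 1, 4, 1, 5, 9, 2, 6})
    pq.push(i);

  // Repeatedly remove the highest-priority element
  while (!pq.empty()) {
    std::cout << pq.top() << ' ';
    pq.pop();
  }
  std::cout << '\n'; // 9 6 5 4 3 2 1 1
}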
Adding/removing at the beginning is O(1), but adding at the end means traversing the whole list, therefore O(n). Searching is also O(n).
std::forward_list is a singly linked list.
Like a singly linked list but we can iterate both ways. It stores a pointer to both the beginning and end, therefore operations on the end are also O(1).
std::list is a doubly linked list.
Container | Average Case Time Complexity | Worst Case Time Complexity | Space Complexity |
---|---|---|---|
std::array | O(1) | O(1) | O(N) |
std::vector | O(1) | O(N) | O(N) |
std::deque | O(1) | O(N) | O(N) |
std::list | O(1) | O(N) | O(N) |
std::forward_list | O(1) | O(N) | O(N) |
std::set | O(log N) | O(log N) | O(N) |
std::multiset | O(log N) | O(log N) | O(N) |
std::unordered_set | O(1) | O(N) | O(N) |
std::unordered_multiset | O(1) | O(N) | O(N) |
std::map | O(log N) | O(log N) | O(N) |
std::multimap | O(log N) | O(log N) | O(N) |
std::unordered_map | O(1) | O(N) | O(N) |
std::unordered_multimap | O(1) | O(N) | O(N) |
std::stack | O(1) | O(N) | O(N) |
std::queue | O(1) | O(N) | O(N) |
std::priority_queue | O(log N) | O(log N) | O(N) |
However, the conventional CS wisdom for when to use a linked list over contiguous storage hasn’t applied for years: you have to measure. E.g., if a container fits in the cache, a vector might outperform everything else.
What is the difference between a compiler and an interpreter?
Can you explain the concept of data structures and give an example of one?
How does object-oriented programming differ from procedural programming?
Can you explain the concept of recursion and give an example of when it would be useful?
Can you walk me through the steps of the software development life cycle and why each step is necessary?
What is the difference between a pointer and a reference in C++?
How does virtual inheritance differ from multiple inheritance?
Can you explain the concept of memory management in C++?
How do you handle exceptions in C++?
What are the benefits and drawbacks of using templates in C++?
Branchless programming is an important approach in low-latency code. It may offer performance improvements but potentially at the cost of maintainability.
#include <iostream>
int main() {
int x = 10;
int y = 20;
// Using a branch statement
int max_val_branch = (x > y) ? x : y;
std::cout << "Max value using branch statement: " << max_val_branch << std::endl;
// Using branchless programming
int max_val_branchless = y ^ ((x ^ y) & -(x > y));
std::cout << "Max value using branchless programming: " << max_val_branchless << std::endl;
return 0;
}
In summary, branch prediction focuses on predicting the outcome of a specific branch instruction to enable speculative execution, while branch target prediction aims to predict the target address of a branch to reduce the delay in fetching and decoding subsequent instructions. Both techniques are utilized in modern processors to optimize instruction execution and improve overall performance.
OpenAI: Branch prediction is a technique used in computer processors to predict which instruction will be executed next in order to improve the overall performance of the processor. This is necessary because modern processors can execute instructions out of order, meaning that they do not necessarily follow the sequential order of instructions as they appear in the program code.
Branch prediction involves analyzing the program code to identify branches, which are points in the code where the processor has to make a decision and choose between two or more different paths of execution. These branches can occur due to conditional statements, loops, or function calls.
The processor uses historical information and statistical analysis to predict which branch will be taken in the future. This information is typically gathered from previous executions of the same code or from information provided by the programmer. Based on this prediction, the processor will start pre-executing the instructions from the predicted branch before it is actually needed, in the hope that the prediction is correct and this will save time and improve performance.
If the prediction is correct, the processor continues executing the pre-executed instructions and the program runs faster. However, if the prediction turns out to be incorrect, the processor has to discard the pre-executed instructions and start executing from the correct branch, resulting in a performance penalty.
There are various types of branch prediction techniques, including static and dynamic methods. Static branch prediction uses pre-defined rules to predict the branch and is typically used for simple branches. Dynamic branch prediction, on the other hand, uses more complex algorithms and constantly updates its predictions based on the program’s behavior. This is useful for more complex branches that are difficult to predict using static methods.
Overall, branch prediction is an important feature of modern processors that helps improve performance by reducing the amount of time wasted on incorrect branch predictions.
When the [[likely]] attribute is applied, it suggests to the compiler that the branch following the attribute is frequently executed. This allows the compiler to prioritize the optimization of code paths that are more likely to be taken, potentially resulting in better performance.

However, it’s important to note that the [[likely]] attribute is just a hint, and it does not guarantee any specific behavior or optimization. The behavior of [[likely]] can vary depending on the compiler and its version. Some compilers may ignore the attribute altogether, while others may consider it as a hint for optimization.
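A small sketch (C++20) hinting that the error branch is rarely taken:

#include <cstdio>

int parse(const int value) {
  if (value < 0) [[unlikely]] {
    std::puts("error");
    return -1;
  }
  return value * 2;
}

int main() {
  return parse(21) == 42 ? 0 : 1;
}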
Object-oriented programming (OOP) is a programming paradigm that is based on the concept of objects. It is a way of structuring and organizing code in a logical and modular manner, which allows for better code reusability, maintainability, and scalability.
In OOP, objects are created from classes, which act as blueprints or templates for the objects. These objects have properties (attributes) and behaviors (methods) that define their characteristics and actions.
The four main principles of OOP are encapsulation, abstraction, inheritance, and polymorphism. Encapsulation refers to the concept of bundling data and methods within an object, ensuring that only the necessary information is accessible from the outside. Abstraction involves hiding the implementation details of an object and providing a simplified interface for interacting with it.
Inheritance allows for the creation of new classes from existing ones, inheriting their properties and behaviors. This promotes code reuse and helps in creating hierarchies of related objects. Polymorphism is the ability for objects of different classes to respond to the same method in different ways, based on their specific implementations.
OOP also emphasizes the use of objects and their interactions, rather than just procedures and functions, to solve problems. This approach makes it easier to model real-world entities and systems in the code, leading to more efficient and maintainable solutions.
Some popular programming languages that support OOP include Java, C++, Python, and Ruby.
In C++, the only difference between a class and a struct is the default access level. In a class, members are private by default, while in a struct, members are public by default. Also, the default inheritance access level is different: class uses private inheritance by default, while struct uses public inheritance by default.
Compile-time Polymorphism: This type of polymorphism is achieved during the compilation phase of a program. It is also known as early binding or static binding. It involves the use of function overloading and operator overloading to perform different operations based on the type and number of arguments passed to a function or operator.
Runtime Polymorphism: Also known as late binding or dynamic binding, this type of polymorphism occurs during the execution of a program. It involves the use of virtual functions and function overriding to perform different operations based on the actual object type at runtime. This allows for flexibility and extensibility in code, as different derived classes can have their own implementation of the same function.
Ad-hoc Polymorphism: Also known as overloading polymorphism, this type of polymorphism allows a function or operator to behave differently based on the type of arguments passed to it. It involves the use of function overloading and operator overloading to perform different operations based on the type and number of arguments passed.
Parametric Polymorphism: Also known as generic polymorphism, this type of polymorphism allows a function or data type to handle different types of data without the need for explicitly specifying the type. It allows for code reusability and flexibility by allowing the same function or data type to work with different data types.
Inclusion Polymorphism: Also known as subtype polymorphism, this type of polymorphism allows a derived class to inherit and override the behavior of its base class. It allows for code reuse and extensibility by allowing objects of different classes to be treated as objects of the same base class, thereby providing a common interface for different types of objects. This is often achieved through inheritance and virtual functions.
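A sketch of ad-hoc and parametric polymorphism side by side, both resolved at compile time:

#include <iostream>
#include <string>

// Ad-hoc polymorphism: separate overloads chosen at compile time
std::string describe(const int) { return "int"; }
std::string describe(const double) { return "double"; }

// Parametric polymorphism: one template works for any streamable type
template <typename T>
void print(const T& value) {
  std::cout << value << '\n';
}

int main() {
  print(describe(1));   // int
  print(describe(2.0)); // double
  print("hello");       // const char*
}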
A standard question at interviews! A very nuanced feature of OOP in C++ but you can now test this at compile time with type traits.
Destructors must be declared virtual when the class contains virtual functions or is part of a polymorphic hierarchy. This ensures that the destructor of a derived class is called if it is deleted via a pointer to the polymorphic base class.
The virtual specifier specifies that a non-static member function is virtual and supports dynamic dispatch. It may only appear in the decl-specifier-seq of the initial declaration of a non-static member function (i.e., when it is declared in the class definition).
In derived classes, you should mark overridden methods as override to express intention and also protect against the base class being changed unwittingly.
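A sketch pulling these together – a virtual destructor so deletion via a base pointer is safe, and override on the derived methods:

#include <iostream>
#include <memory>

struct base {
  virtual ~base() { std::cout << "~base\n"; } // virtual: safe to delete via base*
  virtual void speak() const { std::cout << "base\n"; }
};

struct derived : base {
  ~derived() override { std::cout << "~derived\n"; }
  void speak() const override { std::cout << "derived\n"; } // override catches signature drift
};

int main() {
  const std::unique_ptr<base> p = std::make_unique<derived>();
  p->speak(); // derived
} // ~derived then ~base run, because ~base is virtual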
A non-virtual class has a size of 1 because in C++ classes can’t have zero size (objects declared consecutively can’t have the same address). A virtual class has a size of 8 on a 64-bit machine because there’s a hidden pointer inside it pointing to a vtable. vtables are static translation tables, created for each virtual-class.
RAII (Resource Acquisition Is Initialization) is a programming technique used in languages with deterministic object destruction – most notably C++ – to manage the acquisition and release of resources in a safe and efficient manner. It involves wrapping resource acquisition code in a class and using the object’s constructor and destructor to automatically acquire and release the resource.
The key idea behind RAII is that resource acquisition and initialization should happen together, and resource release should happen automatically when the object goes out of scope. This ensures that resources are properly managed and prevents memory leaks or other resource-related errors.
RAII is commonly used for managing dynamic memory allocation, such as with the C++ "new" and "delete" operators, as well as other resources like file handles, database connections, and network sockets. By encapsulating the resource management within a class, RAII allows for a more declarative and intuitive way of handling resources, reducing the likelihood of errors and simplifying the code.
In addition to automatic resource management, RAII also provides exception safety. If an exception is thrown during the object’s construction or initialization, the destructor will be called, ensuring that any acquired resources are properly released.
Overall, RAII is a powerful and widely used technique for managing resources in a safe and efficient manner, making it an important concept for developers to understand and utilize in their code.
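A minimal RAII sketch wrapping a C file handle; the path is just an example:

#include <cstdio>

class file {
  std::FILE* handle_;

public:
  explicit file(const char* path) : handle_{std::fopen(path, "r")} {}
  ~file() {
    if (handle_)
      std::fclose(handle_); // released on every exit path
  }

  // Non-copyable: exactly one owner per handle
  file(const file&) = delete;
  file& operator=(const file&) = delete;

  explicit operator bool() const { return handle_ != nullptr; }
};

int main() {
  const file f{"/etc/hostname"};
  return f ? 0 : 1;
} // fclose happens here automatically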
Overload resolution is the process of selecting the most appropriate function or method among a set of overloaded functions or methods with the same name but different parameters. This is a common feature in object-oriented programming languages like C++ and Java, where multiple functions or methods can have the same name but different parameters.
There are several strategies used in overload resolution, and the most common ones are:
Number of parameters: This strategy prioritizes functions or methods with the exact number of parameters as specified in the function call. If there are multiple functions with the same number of parameters, the next strategy is used.
Type of parameters: This strategy prioritizes functions or methods that have the exact type of parameters as specified in the function call. If there are still multiple functions with the same type of parameters, the next strategy is used.
Parameter order: This strategy prioritizes functions or methods that have the exact order of parameters as specified in the function call. For example, if a function has parameters (int, char) and another function has parameters (char, int), and the function call has arguments (5, ‘a’), the first function will be chosen.
Default arguments: If a function or method has default arguments, it will be prioritized over functions or methods without default arguments.
Type conversion: If none of the above strategies can determine the most appropriate function or method, the compiler will try to convert the arguments to match the parameters of a function or method. This strategy is known as type conversion or implicit conversion.
Ambiguity resolution: In some cases, there may be multiple functions or methods that can be called with the given arguments, and none of the above strategies can determine the most appropriate one. In such cases, the compiler will throw an error and ask for the function or method to be explicitly specified.
Overall, the overload resolution strategies prioritize functions or methods that have an exact match with the given arguments, followed by those with default arguments or implicit conversions, and finally, an explicit specification is required if there is still ambiguity.
Argument-dependent lookup, also known as ADL, or Koenig lookup, is the set of rules for looking up the unqualified function names in function-call expressions, including implicit function calls to overloaded operators. These function names are looked up in the namespaces of their arguments in addition to the scopes and namespaces considered by the usual unqualified name lookup.
Notice begin and end parameters don’t specify a namespace, see ADL.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
int main() {
const std::vector v{1, 2, 3, 4, 5};
std::copy(cbegin(v), cend(v), std::ostream_iterator<int>(std::cout, "\n"));
}
A cache is a small amount of unusually fast memory.
A modern multi-core machine will have a multi-level cache hierarchy, where the faster and smaller caches belong to individual processors. When one processor modifies a value in its cache, other processors cannot use the old value any more; and that memory location will be invalidated in all of the caches. Furthermore, since caches operate on the granularity of cache lines and not individual bytes, the entire cache line will be invalidated in all caches.
“A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order”.
OpenAI: Processor caches are small memory stores located on the CPU that are used to temporarily hold data and instructions that are frequently accessed by the processor. They are designed to speed up the performance of the CPU by reducing the time it takes to retrieve data from the main memory.
There are three types of processor caches: L1, L2, and L3.
L1 cache: This is the smallest and fastest cache, located directly on the CPU. It is divided into two parts – the instruction cache and the data cache. The instruction cache holds instructions that are frequently used by the CPU, while the data cache holds data that is frequently accessed by the CPU. The L1 cache has the fastest access time, but it is also the smallest, typically ranging from 8KB to 64KB.
L2 cache: This is a larger cache that is located on the CPU or on a separate chip. It is slower than the L1 cache but larger in size, typically ranging from 256KB to 8MB. The L2 cache can hold more data and instructions than the L1 cache, making it more effective in improving CPU performance.
L3 cache: This is the largest cache and is typically located on a separate chip from the CPU. It acts as a buffer between the main memory and the L1 and L2 caches. It is slower than the L1 and L2 caches but still faster than the main memory. The size of the L3 cache can range from 4MB to 16MB, depending on the processor.
The purpose of processor caches is to reduce the number of times the CPU has to access the main memory, which is a much slower process. By storing frequently used data and instructions in the cache, the CPU can retrieve them quickly, thereby improving overall system performance.
Caches also use a technique called caching, where the most recently used data and instructions are stored in the cache, while the least recently used data and instructions are removed. This helps to ensure that the most frequently used data and instructions are always available in the cache, further improving performance.
In summary, processor caches are essential for improving the speed and efficiency of a CPU by providing a temporary storage space for frequently used data and instructions. They play a crucial role in modern computer architecture and are continuously evolving to keep up with the increasing demands of computing.
A multiprocessor is cache consistent if all writes to the same memory location are performed in some sequential order.
OpenAI: Cache coherence refers to the process of ensuring that multiple processors or devices in a computer system have consistent and up-to-date data in their respective caches. In a multi-processor system, each processor has its own cache memory, which is a small but fast memory used to store frequently accessed data. However, when multiple processors are accessing the same data, there is a risk of data inconsistency if one processor updates the data while the other processor still has an outdated copy in its cache.
To maintain cache coherence, a protocol is used to ensure that each processor’s cache has the most recent version of the data. This is typically achieved through a combination of hardware and software mechanisms.
The most common approach is the use of a bus-based snooping protocol, where each processor monitors the bus for any data transactions and checks if the data being transferred matches the data in its cache. If there is a match, the processor will update its cache accordingly. If there is a mismatch, the processor will request the updated data from the other processor or from the main memory.
Another approach is the use of a directory-based protocol, where a centralized directory keeps track of which processors have copies of a particular data block. When one processor wants to access the data, it must first check with the directory to see if it has the most recent version. If not, the directory will coordinate the transfer of the updated data to the requesting processor.
Cache coherence is important for maintaining the integrity and consistency of data in a multi-processor system. It allows for efficient sharing of data between processors and prevents data inconsistencies that can lead to errors and system crashes.
Each cache line has state, tag, data.
See bus snooping.
Directory-based coherence is used for large CPUs. See wiki.
All Intel-compatible CPUs during the last one/two decades used a write-back strategy for caches (which presumes fetching a cache line first to allow partial writes). Of course that’s the theory, reality is slightly more complex than that.
There are three kinds of cache misses:
OpenAI: Cache misses occur when the CPU requests data from a memory location that is not currently stored in the cache. This means that the requested data must be retrieved from a slower, larger memory, such as the main memory or the hard drive. Cache misses can significantly slow down the performance of a computer, as retrieving data from a slower memory takes more time than retrieving it from the cache. The three classic kinds of cache misses are:
Compulsory Cache Miss: This occurs when data is first requested and is not yet present in the cache. Since the cache is initially empty, all initial requests for data will result in a compulsory cache miss.
Capacity Cache Miss: This occurs when the cache is not large enough to hold all the data that is being accessed. This can happen when there is a high demand for data that exceeds the capacity of the cache, causing some data to be evicted and resulting in a cache miss.
Conflict Cache Miss: This occurs when multiple memory locations map to the same cache set and evict each other, even though the cache as a whole may not be full. Cache misses can also occur when the data has been modified in the main memory but the cache still holds the old version of the data. Cache misses can impact the performance of a computer and reduce the efficiency of the cache, as the CPU has to wait for the requested data to be retrieved from a slower memory location. To minimize the number of cache misses, computer architectures use different techniques such as increasing the size of the cache, optimizing the cache replacement policy, and using smarter data placement strategies.
The way virtual functions work may cause issues with caching: the vtable lookup is an extra indirection (and a potential data-cache miss), and the resulting indirect call is harder to predict and to inline. But we may be able to use CRTP to avoid some of these costs.
The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using X itself as template argument. More generally it is known as F-bound polymorphism, and it is a form of static polymorphism.
#include <iostream>

// Base class template using CRTP
template<typename Derived>
class Base {
public:
    void foo() {
        static_cast<Derived*>(this)->fooImpl();
    }
};

// Derived class using CRTP
class Derived : public Base<Derived> {
public:
    void fooImpl() {
        // Implementation of foo specific to Derived
        std::cout << "Hello from Derived!" << std::endl;
    }
};
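A minimal usage sketch, relying on the Base and Derived definitions above: the call is resolved at compile time, so there is no vtable and the compiler is free to inline fooImpl().
int main() {
    Derived d;
    d.foo(); // statically dispatched to Derived::fooImpl(); no virtual call
    return 0;
}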
Typically a cache-oblivious algorithm works by a recursive divide and conquer algorithm, where the problem is divided into smaller and smaller sub-problems. Eventually, one reaches a sub-problem size that fits into cache, regardless of the cache size.
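As an illustration, here is a sketch (not a tuned implementation, and the 16-element base case is an arbitrary assumption) of a cache-oblivious out-of-place matrix transpose: it recursively halves the larger dimension until the sub-block is small enough to fit in whatever cache is present.
#include <cstddef>
#include <vector>

// Transpose rows [r0, r1) and columns [c0, c1) of src (rows x cols, row-major)
// into dst (cols x rows, row-major) by divide and conquer.
void transpose(const std::vector<double>& src, std::vector<double>& dst,
               std::size_t rows, std::size_t cols,
               std::size_t r0, std::size_t r1, std::size_t c0, std::size_t c1) {
    if (r1 - r0 <= 16 && c1 - c0 <= 16) {
        // Base case: the block is small enough to be cache resident.
        for (std::size_t r = r0; r < r1; ++r)
            for (std::size_t c = c0; c < c1; ++c)
                dst[c * rows + r] = src[r * cols + c];
    } else if (r1 - r0 >= c1 - c0) {
        // Split the longer dimension in half and recurse.
        const std::size_t rm = r0 + (r1 - r0) / 2;
        transpose(src, dst, rows, cols, r0, rm, c0, c1);
        transpose(src, dst, rows, cols, rm, r1, c0, c1);
    } else {
        const std::size_t cm = c0 + (c1 - c0) / 2;
        transpose(src, dst, rows, cols, r0, r1, c0, cm);
        transpose(src, dst, rows, cols, r0, r1, cm, c1);
    }
}

int main() {
    const std::size_t rows = 1000, cols = 800;
    std::vector<double> a(rows * cols, 1.0), b(cols * rows);
    transpose(a, b, rows, cols, 0, rows, 0, cols);
}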
Design to get the most value out of every single cacheline that is read. Split apart the functions and data.
Cache locality refers to the tendency of a computer’s CPU to access data and instructions that are stored close to each other in memory at the same time, rather than accessing data and instructions that are stored far apart.
There are two types of cache locality: temporal locality and spatial locality.
Temporal locality refers to the idea that if a CPU accesses a particular piece of data or instruction, it is likely to access it again in the near future. This is because programs tend to access the same data and instructions repeatedly.
Spatial locality, on the other hand, refers to the idea that if a CPU accesses a particular piece of data or instruction, it is likely to access data or instructions that are nearby in memory as well. This is because programs tend to access data and instructions that are stored close to each other in memory, rather than scattered throughout.
Having good cache locality can greatly improve a computer’s performance, as the CPU can quickly access data and instructions that are already stored in the cache, rather than having to retrieve them from slower main memory. This is why cache memory is designed to store recently accessed data and instructions, as well as data and instructions that are physically close to them in memory.
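A minimal sketch of spatial locality in practice, assuming a row-major 2D grid: the first pair of loops walks memory in layout order and uses every byte of each cache line fetched, while the second strides across rows and touches a new cache line on almost every access, despite doing exactly the same work.
#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 1024;
    std::vector<int> grid(n * n, 1); // row-major: element (r, c) is grid[r * n + c]
    long sum = 0;

    // Good spatial locality: contiguous accesses, each cache line fully used.
    for (std::size_t r = 0; r < n; ++r)
        for (std::size_t c = 0; c < n; ++c)
            sum += grid[r * n + c];

    // Poor spatial locality: a stride of n ints between consecutive accesses.
    for (std::size_t c = 0; c < n; ++c)
        for (std::size_t r = 0; r < n; ++r)
            sum += grid[r * n + c];

    return sum != 0 ? 0 : 1; // use the result so the loops aren't optimised away
}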
The Translation Lookaside Buffer (TLB) is a hardware cache that is used in computer systems to improve the efficiency of virtual memory address translation. It is a small, high-speed cache that stores recently used virtual-to-physical address translations.
When a program references a memory location, the virtual memory address needs to be translated into a physical memory address by the operating system. This translation process involves looking up a page table or a similar data structure. Since this translation operation is performed frequently, it can introduce overhead and impact system performance.
The TLB acts as a cache for these translations, reducing the need for repeated lookups in the page table. It stores a subset of the most recently used translations, making it faster to access and retrieve the corresponding physical memory address.
TLBs can be shared between multiple cores or processors, or each core may have its own dedicated TLB.
OpenAI:
Translation lookaside buffer (TLB) is a hardware component in a computer’s central processing unit (CPU) that stores recently used virtual memory addresses and their corresponding physical memory addresses. It acts as a cache for the memory management unit (MMU) and helps to speed up the process of virtual memory translation.
When a program needs to access data from memory, the CPU first checks the TLB to see if the virtual address is already mapped to a physical address. If the mapping is found in the TLB, the physical address is used and the translation process is skipped, saving time and improving performance. However, if the mapping is not found in the TLB, the MMU must perform a full translation of the virtual address to a physical address, which can be a slower process.
The TLB is organized as a small, fast lookup table with a limited number of entries, typically ranging from a few hundred to a few thousand. When the TLB becomes full, the oldest entries are replaced with new ones, using a technique called "least recently used" (LRU) replacement. This means that the TLB will always contain the most frequently used translations, maximizing its effectiveness in speeding up memory access.
Overall, the TLB helps to improve the efficiency of virtual memory management by reducing the number of memory access requests that require a full translation. It is an important component in modern CPUs and plays a crucial role in optimizing memory access and overall system performance.
A memory fence (or barrier) is a programming language construct used to enforce an ordering constraint on memory operations issued by a thread of execution on a CPU. This typically means that operations issued prior to the fence are guaranteed to be performed before operations issued after the fence.
const int num_mailboxes = 32;
std::atomic<int> mailbox_receiver[num_mailboxes];
std::string mailbox_data[num_mailboxes];
// The writer threads update non-atomic shared data
// and then update mailbox_receiver[i] as follows:
mailbox_data[i] = ...;
std::atomic_store_explicit(&mailbox_receiver[i], receiver_id, std::memory_order_release);
// Reader thread needs to check all mailbox[i], but only needs to sync with one.
for (int i = 0; i < num_mailboxes; ++i)
if (std::atomic_load_explicit(&mailbox_receiver[i],
std::memory_order_relaxed) == my_id)
{
// synchronize with just one writer
std::atomic_thread_fence(std::memory_order_acquire);
// guaranteed to observe everything done in the writer thread
// before the atomic_store_explicit()
do_work(mailbox_data[i]);
}
See wiki and cppreference.
There are several factors that can push something out of the cache: a working set larger than the cache (capacity), other data mapping to the same cache set (conflict), another core writing to the same cache line (invalidation), and context switches or interrupts that let other code pollute the cache.
You should be comfortable explaining the complexity of your code. See the Big O Cheatsheet.
Complexity | Description | Example |
---|---|---|
O(1) | Constant time | Accessing a specific element in an array |
O(log n) | Logarithmic time | Binary search |
O(n) | Linear time | Finding the maximum value in an unsorted array |
O(n log n) | Linearithmic time | Sorting algorithms such as Merge Sort |
O(n^2) | Quadratic time | Nested loops |
O(n^3) | Cubic time | Three nested loops |
O(k^n) | Exponential time | Finding all subsets of a set |
O(n!) | Factorial time | Finding all possible permutations of a set |
Linked lists and arrays are two commonly used data structures in computer science. Both have their own advantages and disadvantages and are used in different situations depending on the specific needs of the program. The average complexity of a data structure refers to the average time or space required to perform operations such as insertion, deletion, and search. In this comparison, we will look at the average complexity of linked lists and arrays in terms of these operations.
On average, insertion and deletion in both a linked list and an array are O(n): in a linked list the splice itself is O(1) but reaching the position is O(n), while in an array the position is reached in O(1) but the subsequent elements must be shifted, which is O(n). Searching an unsorted linked list or array is O(n); the difference is that an array additionally offers O(1) random access by index, which a linked list does not.
In terms of space complexity, both are O(n), but a linked list stores one or two pointers per node, whereas an array stores the data in contiguous memory, which also makes far better use of the cache.
In conclusion, the asymptotic complexity of the common operations is similar, but the constant factors are usually much better for arrays because of cache locality. Linked lists win when you need cheap insertion or deletion at a position you already hold an iterator to, or stable element addresses; arrays (or std::vector) are the better default otherwise.
A test is not a unit test if it talks to the database, communicates across the network, touches the file system, can’t run at the same time as any of your other unit tests, or requires special things to be done to your environment (such as editing config files) to run it.
See the complete article.
This is all a lot easier since std::thread was introduced in C++11: now we have a platform-independent interface. See the C++ concurrency support library examples.
However, remember there is an overhead in creating a thread: if the operation you’re repeating is quick then it could actually take longer to parallelise it. You need to profile: how long does it take just to create a thread?
std::async
Think of it like pushing a calculation into the background. std::async executes a function asynchronously and returns a std::future that will eventually hold the result of that function call. It’s quite a nice way to reference the result of a calculation executed in another thread. Also, it manages the issue of creating too many threads: they will just be executed sequentially. Additionally, exceptions thrown in the asynchronous routine destroy the future and the exception is rethrown by the calling get() routine. You ought to pass it a strategy: std::launch::async or std::launch::deferred as the first parameter.
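A minimal sketch, assuming a vector to sum in the background:
#include <future>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    const std::vector<int> v(1'000'000, 1);

    // Push the summation into the background; std::launch::async requests a real thread.
    std::future<long> total = std::async(std::launch::async, [&v] {
        return std::accumulate(v.cbegin(), v.cend(), 0L);
    });

    // Do other work here, then block on the result; get() rethrows any exception
    // thrown inside the asynchronous routine.
    std::cout << total.get() << '\n';
}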
std::thread
You need to call join() for every thread created with std::thread. This can be done by storing your threads as a vector and joining each of them in a loop.
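A minimal sketch of the vector-of-threads approach (the output may interleave, as nothing synchronises std::cout):
#include <iostream>
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([i] { std::cout << "thread " << i << '\n'; });

    // Every std::thread must be joined (or detached) before it is destroyed.
    for (auto& t : threads)
        t.join();
}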
std::jthread
Like a regular thread but you don’t have to join it in the caller: it actually joins for you in the destructor. Quite nice to not have to remember to join, but joining all your threads can be a convenient synchronisation point.
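A minimal sketch: no explicit join, and the built-in stop token gives the worker a polite way to be asked to finish.
#include <chrono>
#include <iostream>
#include <stop_token>
#include <thread>

int main() {
    using namespace std::chrono_literals;

    std::jthread worker{[](std::stop_token st) {
        while (!st.stop_requested())
            std::this_thread::sleep_for(10ms);
        std::cout << "worker stopping\n";
    }};

    std::this_thread::sleep_for(100ms);
} // ~jthread() requests stop and then joins for us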
Many of the Standard Library algorithms can take an execution policy, which is quite an exciting way to parallelise existing code. But remember it offers no protection of shared data: it’s still just a bunch of threads.
std::execution::seq: execution may not be parallelized
std::execution::par: execution may be parallelized
std::execution::par_unseq: execution may be parallelized, vectorized, or migrated across threads (such as by a parent-stealing scheduler)
std::execution::unseq: execution may be vectorized, e.g., executed on a single thread using instructions that operate on multiple data items
Some of these have an _if version that takes an additional predicate: e.g., std::replace and std::replace_if.
std::sort
std::copy
std::transform
std::accumulate
std::for_each
std::reduce
std::inclusive_scan
std::exclusive_scan
std::transform_reduce
std::remove
std::count
std::max_element
std::min_element
std::find
std::generate
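A minimal sketch using one of the algorithms above with a parallel policy. With gcc the parallel policies are typically backed by Intel TBB, so you may also need to link with -ltbb.
#include <algorithm>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<double> v(1'000'000);
    std::mt19937 gen{42};
    std::uniform_real_distribution<double> dist{0.0, 1.0};
    std::generate(v.begin(), v.end(), [&] { return dist(gen); });

    // Same algorithm, potentially many threads; note it offers no protection
    // of shared data, that remains your problem.
    std::sort(std::execution::par, v.begin(), v.end());
}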
std::mutex
A standard way to protect access to something, but there are multiple ways to unlock it.
Here be dragons! In the shape of deadlocks. There are several strategies to improve your chances of not becoming stuck; see the deadlock chapter for more. You can use std::scoped_lock with multiple resources or a single one, but also I think it expresses intention better, by virtue of having “scope” in the name.
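A minimal sketch, assuming two accounts guarded by two mutexes: std::scoped_lock acquires both with a deadlock-avoidance algorithm and releases them at the end of the scope.
#include <mutex>

std::mutex m1, m2;
int balance1 = 100, balance2 = 0;

void transfer(int amount) {
    // Locks both mutexes using a deadlock-avoidance algorithm.
    std::scoped_lock lock{m1, m2};
    balance1 -= amount;
    balance2 += amount;
} // both mutexes released here

int main() {
    transfer(42);
}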
std::mutex
std::atomic – an operation that is guaranteed to operate as a single transaction
std::scoped_lock
std::lock_guard
std::thread and join()
std::jthread (auto join)
See the Standard Library concurrency support library.
The Producer-Consumer solution is a classic synchronization problem where a producer produces data items and adds them to a shared buffer, while a consumer consumes these items from the buffer. In this solution, we will use the concept of multithreading to implement a producer-consumer problem in C++.
First, we define a shared buffer of a fixed size and initialize it with a mutex and two condition variables for synchronization. The mutex ensures mutual exclusion between threads, and the condition variables are used for signaling and waiting.
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <thread>

#define BUFFER_SIZE 10 // size of the shared buffer
int buffer[BUFFER_SIZE]; // shared buffer
int in = 0; // index for producer to add item
int out = 0; // index for consumer to remove item
std::mutex mt; // mutex for mutual exclusion
std::condition_variable not_full, not_empty; // condition variables for signaling and waiting
Next, we define two functions, the producer function and the consumer function, which will be executed by separate threads.
// producer function
void producer() {
    int item;
    while (true) {
        // generate a random item
        item = rand() % 100;

        // acquire the mutex lock
        std::unique_lock<std::mutex> lock(mt);

        // wait if the buffer is full
        while ((in + 1) % BUFFER_SIZE == out) {
            not_full.wait(lock);
        }

        // add the item to the buffer
        buffer[in] = item;
        in = (in + 1) % BUFFER_SIZE;

        // signal the consumer that the buffer is not empty
        not_empty.notify_one();

        // release the mutex lock
        lock.unlock();

        // sleep for a random time
        std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 1000));
    }
}
// consumer function
void consumer() {
    int item;
    while (true) {
        // acquire the mutex lock
        std::unique_lock<std::mutex> lock(mt);

        // wait if the buffer is empty
        while (in == out) {
            not_empty.wait(lock);
        }

        // remove the item from the buffer
        item = buffer[out];
        out = (out + 1) % BUFFER_SIZE;

        // signal the producer that the buffer is not full
        not_full.notify_one();

        // release the mutex lock
        lock.unlock();

        // consume the item
        std::cout << "Consumed item: " << item << std::endl;

        // sleep for a random time
        std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 1000));
    }
}
In the main function, we create two threads, one for the producer function and one for the consumer function, and then join them with the main thread.
int main() {
    // create threads
    std::thread producer_thread(producer);
    std::thread consumer_thread(consumer);

    // join threads with the main thread
    producer_thread.join();
    consumer_thread.join();

    return 0;
}
When the program is executed, the producer thread will continuously generate random items and add them to the buffer, while the consumer thread will remove these items from the buffer and consume them. The mutex and conditional variables ensure that the producer and consumer do not access the buffer at the same time, and the buffer remains within its size limit.
This producer-consumer solution ensures synchronization and avoids race conditions between the producer and consumer threads.
In 2014 Randall Munroe estimated that Google stores 10 exabytes of data across all of its operations. However, as a C++ developer, you will only come across at most petabytes of storage; and if CPUs are topping out at gigahertz, then you won’t see anything much faster than a nanosecond.
1 000 kilo | milli .001
1 000 000 mega | micro .000 001
1 000 000 000 giga | nano .000 000 001
1 000 000 000 000 tera | pico .000 000 000 001
1 000 000 000 000 000 peta | femto .000 000 000 000 001
1 000 000 000 000 000 000 exa | atto .000 000 000 000 000 001
1 000 000 000 000 000 000 000 zetta | zepto .000 000 000 000 000 000 001
See list of SI prefixes.
You may be tempted to use binary prefixes to be more precise – kibi, mebi, gibi, etc. – but most people won’t know what you’re talking about. Also, manufacturers tend to use powers of 1000 rather than powers of 1024 because it makes their capacity figures look bigger.
See why is everyone in such a rush?
1/1000000000 second == 1 nanosecond
Approximate durations of typical operations (rounded to help remember.)
Action | Duration (nanoseconds) |
---|---|
L1 cache read, variable increment | <1 |
Branch misprediction (how do you measure this?) | 5 |
L2 cache read, std::atomic | 10 |
std::mutex/scoped_lock | 20 |
Fetch from main memory | 100 |
Semaphore acquire | 200 |
Send 2KiB over 1Gbps network | 20,000 |
Create a std::thread | 20,000 |
Send packet from US to Europe and back | 200,000,000 |
The objective of design patterns is to provide a common and well-established solution to recurring design problems in software development. They serve as a set of proven, reusable solutions that can be applied to various design problems to improve the quality, efficiency, and maintainability of software. Design patterns encapsulate expert knowledge and best practices, making it easier for developers to communicate and understand each other’s code. They also promote standardization and consistency in software design, allowing for easier maintenance and future modifications. By following design patterns, developers can save time and effort in the design and development process, resulting in more robust and maintainable software.
The factory design pattern is a creational design pattern that allows for the creation of objects without specifying the exact class of object that will be created. This pattern is useful when there are multiple types of objects that need to be created, but the specific type is not known until runtime.
The main components of the factory design pattern are the factory class, product interface, and concrete product classes. The factory class is responsible for creating the objects and is typically a static method that returns instances of the product interface. The product interface defines the common methods and properties that all concrete product classes must implement. The concrete product classes are the specific types of objects that can be created by the factory.
When implementing the factory design pattern, the client code does not need to know the specific type of object that is being created. Instead, it calls the factory method and receives an instance of the product interface. This allows for a more flexible and decoupled code structure, as the client does not need to be updated if new concrete product classes are added.
The factory design pattern is commonly used in situations where there is a need for object creation based on certain conditions or configurations. It also promotes the principle of "coding to interfaces, not implementations," which can make the code more maintainable and extendable.
One potential drawback of the factory design pattern is that it can become complex and difficult to manage if there are a large number of concrete product classes. In this case, a variation of the pattern called the Abstract Factory pattern may be more suitable.
Store a system configuration in an external text file and construct at runtime using the factory pattern.
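A minimal sketch of that idea, with illustrative names (Logger, make_logger): the string selecting the concrete type could just as easily have been read from a configuration file at runtime.
#include <iostream>
#include <memory>
#include <stdexcept>
#include <string>

// Product interface.
struct Logger {
    virtual ~Logger() = default;
    virtual void log(const std::string& message) = 0;
};

// Concrete products.
struct ConsoleLogger : Logger {
    void log(const std::string& message) override { std::cout << message << '\n'; }
};

struct NullLogger : Logger {
    void log(const std::string&) override {}
};

// Factory method: the caller only ever sees the Logger interface.
std::unique_ptr<Logger> make_logger(const std::string& kind) {
    if (kind == "console") return std::make_unique<ConsoleLogger>();
    if (kind == "null") return std::make_unique<NullLogger>();
    throw std::invalid_argument{"unknown logger: " + kind};
}

int main() {
    const auto logger = make_logger("console"); // imagine this string came from a config file
    logger->log("hello");
}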
The visitor design pattern is a behavioral design pattern that allows for the separation of algorithms or operations from the objects on which they operate. This pattern enables the addition of new operations to existing object structures without modifying the objects themselves.
In this pattern, a visitor is an object that encapsulates the behavior to be performed on the elements of an object structure. It implements different methods to perform different operations on the elements. The object structure, on the other hand, is a collection of objects that need to be visited by the visitor.
The main idea behind this pattern is that the visitor object is responsible for traversing the object structure and performing the desired operation on each element. This eliminates the need to modify the object structure itself when new operations need to be added.
The visitor pattern consists of the following components:
Visitor: This is an interface or an abstract class that declares the methods for visitation. Each method corresponds to an operation that can be performed on the elements of the object structure.
ConcreteVisitor: This class implements the Visitor interface and defines the actual operations to be performed on the elements of the object structure.
Element: This is an interface or an abstract class that defines the accept() method, which accepts a visitor object as a parameter. This method is used to call the appropriate visitor method for the element.
ConcreteElement: This class implements the Element interface and defines the accept() method. It also provides a method to get the data from the element.
ObjectStructure: This class contains a collection of elements to be visited by the visitor.
The steps involved in implementing the visitor pattern are as follows:
Create the Visitor interface or an abstract class with methods for visitation.
Create ConcreteVisitor classes that implement the Visitor interface and define the operations to be performed on the elements.
Create the Element interface or an abstract class with the accept() method.
Create ConcreteElement classes that implement the Element interface and define the accept() method. These classes should also provide a method to get the data from the element.
Create the ObjectStructure class that contains a collection of elements to be visited by the visitor.
In the accept() method of the ConcreteElement classes, call the appropriate visitor method and pass the current element as a parameter.
In the main program, create an instance of the ObjectStructure class and add elements to it.
Create an instance of the ConcreteVisitor class and call the visit() method on the ObjectStructure object, passing the visitor object as a parameter.
The visitor will then traverse the object structure and perform the desired operation on each element.
The visitor design pattern is useful when there are a large number of operations to be performed on a set of objects, and adding new operations will not affect the objects themselves. It also helps in keeping the code clean and maintainable by separating the operations from the objects.
The visitor pattern exploits a feature of subtype polymorphism named “double dispatch.”
Double dispatch in subtype polymorphism refers to a situation where a method is selected based on the dynamic types of two objects instead of just one. This allows for more specific and accurate method calls based on the specific types of both objects involved.
In subtype polymorphism, objects of a subclass can be treated as objects of their superclass. This means that an object of a specific type can be passed as an argument to a method that expects an object of its superclass. However, when this happens, the method that will be called is determined by the type of the object at runtime, not the type declared in the method’s signature.
This is where double dispatch comes into play. When a method is called on an object, the method is selected based on the type of that object. However, in the case of double dispatch, the method is selected based on the types of both the object the method is being called on and the object that is passed as an argument. This allows for more specific and precise method calls, as the method will be selected based on the actual types of both objects involved.
Double dispatch is a powerful tool in subtype polymorphism as it allows for more specific and accurate method calls, leading to more concise and maintainable code. It is commonly used in situations where there are multiple levels of inheritance and the types of objects involved may vary.
Create a class hierarchy of shapes and define methods that operate on those shapes in a visitor class.
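A minimal sketch of that exercise (the class names are illustrative): accept() dispatches on the shape, visit() dispatches on the visitor, which together form the double dispatch described below.
#include <iostream>

struct Circle;
struct Square;

// Visitor interface: one visit() overload per concrete element.
struct ShapeVisitor {
    virtual ~ShapeVisitor() = default;
    virtual void visit(const Circle&) = 0;
    virtual void visit(const Square&) = 0;
};

// Element interface: accept() performs the first dispatch (on the shape).
struct Shape {
    virtual ~Shape() = default;
    virtual void accept(ShapeVisitor& v) const = 0;
};

struct Circle : Shape {
    double radius = 1.0;
    void accept(ShapeVisitor& v) const override { v.visit(*this); } // second dispatch (on the visitor)
};

struct Square : Shape {
    double side = 2.0;
    void accept(ShapeVisitor& v) const override { v.visit(*this); }
};

// An operation added without modifying any of the shape classes.
struct AreaVisitor : ShapeVisitor {
    void visit(const Circle& c) override { std::cout << "circle area " << 3.14159 * c.radius * c.radius << '\n'; }
    void visit(const Square& s) override { std::cout << "square area " << s.side * s.side << '\n'; }
};

int main() {
    const Circle c;
    const Square s;
    const Shape* shapes[] = {&c, &s};
    AreaVisitor area;
    for (const Shape* shape : shapes)
        shape->accept(area);
}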
Hardware interface classes will typically be a const register map, which is a flyweight pattern. This is also an example of a singleton pattern as it’s only initialised once regardless of how many instances are created. Additionally, if registers are actually wired to discrete lines they might be read-only, so the memento pattern is used to store the internal state.
“In computer science, reflection is the ability of a process to examine, introspect, and modify its own structure and behavior.”
See Wikipedia.
Dependency inversion is a software design principle that states that high-level modules should not depend on low-level modules. Instead, both should depend on abstractions. This allows for flexibility and ease of maintenance in the code. By using dependency inversion, changes made to low-level modules will not affect high-level modules, as long as the interface between them remains the same. This promotes loosely coupled code and allows for easier testing and scalability.
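A minimal sketch with illustrative names: the high-level AlarmService depends only on the MessageSink abstraction, so the concrete sink can be swapped (for a file, a socket, or a mock in a unit test) without touching the service.
#include <iostream>
#include <memory>
#include <string>
#include <utility>

// The abstraction both sides depend on.
struct MessageSink {
    virtual ~MessageSink() = default;
    virtual void send(const std::string& message) = 0;
};

// Low-level detail.
struct ConsoleSink : MessageSink {
    void send(const std::string& message) override { std::cout << message << '\n'; }
};

// High-level policy: no dependency on any concrete sink.
class AlarmService {
public:
    explicit AlarmService(std::shared_ptr<MessageSink> sink) : sink_{std::move(sink)} {}
    void raise(const std::string& what) { sink_->send("ALARM: " + what); }
private:
    std::shared_ptr<MessageSink> sink_;
};

int main() {
    AlarmService alarms{std::make_shared<ConsoleSink>()};
    alarms.raise("over temperature");
}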
A hash function is a mathematical algorithm that takes in an input (such as a string or file) and produces a fixed-size output (known as a hash or digest) that represents the original input in a unique and reproducible way. This output is typically a sequence of characters and numbers that is much shorter than the input. Hash functions are used in cryptography to ensure the integrity and authenticity of data, as well as in data structures such as hash tables for efficient storage and retrieval of data. They are designed to be one-way functions, meaning that it is easy to compute the hash of an input, but difficult to reverse the process and determine the original input.
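For example, std::hash (the non-cryptographic hash used by the unordered containers) shows the fixed-size, reproducible output: a tiny change to the input gives a completely different value.
#include <functional>
#include <iostream>
#include <string>

int main() {
    const std::hash<std::string> hasher;
    std::cout << hasher("hello world") << '\n';
    std::cout << hasher("hello world!") << '\n'; // small input change, very different hash
}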
The Jonathan Boccara CppCon 2018 talk “105 STL Algorithms in Less Than an Hour” is well worth a watch for a quick overview.
Quicksort is the poster boy of sorting algorithms.
The average case time complexity of quicksort is O(n log n), where n is the number of elements in the array. This means that as the size of the array increases, the time taken to sort it will increase at a rate of n log n. In the best case scenario, where the chosen pivot is always the median value, the time complexity is also O(n log n).
However, in the worst case scenario, where the pivot element chosen is always the smallest or largest element, the time complexity becomes O(n^2). This can happen if the array is already sorted in either ascending or descending order and the first element is always chosen as the pivot.
Therefore, the time complexity of quicksort can range from O(n log n) to O(n^2), depending on the input data. In general, quicksort tends to perform well on average, making it a popular choice of sorting algorithm.
Below is a Python implementation just for fun. But how is it implemented in C++?
def quicksort(arr):
    # Base case
    if len(arr) <= 1:
        return arr
    else:
        # Choose pivot element
        pivot = arr[0]
        # Partition the array into two subarrays
        left = [x for x in arr[1:] if x < pivot]
        right = [x for x in arr[1:] if x >= pivot]
        # Recursive calls on subarrays
        left_sorted = quicksort(left)
        right_sorted = quicksort(right)
        # Combine the sorted subarrays with the pivot element
        return left_sorted + [pivot] + right_sorted

# Example
arr = [5,2,7,1,9,3]
print(quicksort(arr))
# Output: [1,2,3,5,7,9]
Introsort is a sorting algorithm that combines the efficiency of quicksort with the guaranteed worst-case performance of heapsort. It uses a combination of quicksort, heapsort, and insertion sort to achieve the best of both worlds.
The algorithm starts by using quicksort to partition the input array into two smaller subarrays. However, unlike traditional quicksort, introsort sets a limit on how deep the recursion can go in order to prevent worst-case scenarios where the input is already sorted or almost sorted.
Once the depth limit is reached, the algorithm switches to heapsort. Heapsort is a comparison-based sorting algorithm that uses a binary heap data structure to sort elements. It has a worst-case time complexity of O(nlogn), making it a good choice for handling worst-case scenarios.
However, if the input size is small enough, introsort switches to insertion sort. Insertion sort is a simple sorting algorithm that is efficient for small input sizes. It has a best-case time complexity of O(n), making it a good choice for small subarrays.
The combination of these three sorting algorithms allows introsort to be efficient in most cases while still guaranteeing a worst-case time complexity of O(nlogn). This makes it a popular choice for sorting large datasets, especially in languages like C++ where it is the default sorting algorithm in the standard library.
Introsort is in place but not stable: i.e., equivalent elements are not guaranteed to remain in the same order. However, the Standard Library offers stable_ versions of various sorting algorithms.
If additional memory is available, stable_sort remains O(n ∗ log n). However, if it fails to allocate, it will degrade to an O(n ∗ log n ∗ log n) algorithm.
https://leanpub.com/cpp-algorithms-guide
The threshold for switching to insertion sort varies for each compiler.
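A minimal sketch of the stability difference, with an illustrative Employee type: std::sort may reorder equivalent elements, std::stable_sort will not.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Employee {
    std::string name;
    int department;
};

int main() {
    std::vector<Employee> staff{{"Ada", 2}, {"Bob", 1}, {"Cid", 2}, {"Dee", 1}};

    // Sort by department only; stable_sort preserves the original relative
    // order of employees within the same department.
    std::stable_sort(staff.begin(), staff.end(),
                     [](const Employee& a, const Employee& b) { return a.department < b.department; });

    for (const auto& e : staff)
        std::cout << e.department << ' ' << e.name << '\n'; // Bob, Dee, Ada, Cid
}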
std::list
std::sort requires random access to the elements, so std::list has its own sort method, but it still (approximately) conforms to O(n log n). It can be implemented with merge sort as moving elements is cheap with a linked list.
Considerations for choosing an algorithm: the size of the input, and the type of input (e.g., partially sorted, random).
The complexity of a sorting algorithm is a measure of how its runtime or memory usage increases as the size of the input increases. This is typically measured in terms of the "big O" notation, which gives an upper bound on the worst-case time or space complexity of an algorithm. In general, the lower the complexity of a sorting algorithm, the more efficient it is.
Bubble Sort Bubble sort is a simple sorting algorithm that repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order. This process is repeated until the list is sorted. The worst-case time complexity of bubble sort is O(n^2), meaning that as the size of the input increases, the runtime of the algorithm will increase quadratically. This makes it an inefficient algorithm for large inputs.
Selection Sort Selection sort works by repeatedly finding the minimum element in the unsorted portion of the list and placing it at the beginning. This process is repeated until the list is sorted. The worst-case time complexity of selection sort is also O(n^2), making it equally as inefficient as bubble sort.
Insertion Sort Insertion sort works by building a sorted list one element at a time, by comparing each new element with the already sorted elements and inserting it into the correct position. The worst-case time complexity of insertion sort is O(n^2), but it has a better performance than bubble sort and selection sort for small inputs.
Merge Sort Merge sort is a divide and conquer algorithm that splits the list into smaller sublists, sorts them, and then merges them back together in the correct order. The worst-case time complexity of merge sort is O(n log n), making it much more efficient than the previous algorithms for large inputs. However, it also requires additional space for the merging process, making its space complexity O(n).
Quick Sort Quick sort is also a divide and conquer algorithm that works by selecting a pivot element and partitioning the list into two sublists – one with elements smaller than the pivot and one with elements larger than the pivot. This process is then repeated on the sublists until the entire list is sorted. The worst-case time complexity of quick sort is O(n^2), but it has an average case time complexity of O(n log n), making it one of the most efficient sorting algorithms.
Heap Sort Heap sort uses a data structure called a heap to sort the elements. It works by first building a heap from the given list, and then repeatedly removing the root (largest) element and placing it at the end of the list. The worst-case time complexity of heap sort is O(n log n), and its space complexity is O(1), making it an efficient sorting algorithm.
In summary, the complexities of these sorting algorithms can be ranked from least efficient to most efficient as follows: Bubble Sort and Selection Sort (O(n^2)) < Insertion Sort (O(n^2)) < Merge Sort (O(n log n)) < Quick Sort (O(n log n) average) < Heap Sort (O(n log n) with O(1) space complexity). However, it is important to note that the actual performance of these algorithms may vary depending on the input data and implementation.
All of these are Θ(n log n) in all cases, apart from Timsort, which has a best case of Ω(n), and quicksort, which has a terrible O(n^2) worst case (if we happen to always pick the worst pivot). Quicksort is a good all-rounder with an O(n log n) average case, but it does have that O(n^2) worst case. It is said that this can be avoided by picking the pivot carefully, but an example can be constructed where the chosen pivot is always the worst case.
OpenAI: Increased Memory Capacity: Virtual memory allows a computer to use its secondary storage, such as a hard drive, as additional memory. This can increase the overall capacity of the computer’s memory, allowing it to handle larger and more complex tasks.
Cost Savings: Virtual memory allows for cost savings because it eliminates the need for expensive physical memory upgrades. Instead of purchasing more physical RAM, the computer can use virtual memory to expand its memory capacity.
Multitasking: With virtual memory, a computer can run multiple programs simultaneously without needing to have enough physical memory to handle all of them at once. The computer can store parts of programs that are not currently in use in virtual memory, freeing up physical memory for other tasks.
Improved Performance: Virtual memory can improve performance by allowing the computer to access data that is stored in virtual memory quicker than it could if the data was only stored on secondary storage. This is because virtual memory is managed by the computer’s operating system and is prioritized for quick access.
Better User Experience: Virtual memory can improve the overall user experience by allowing the computer to run more smoothly and efficiently. This means that programs will load faster, and the computer will be less likely to freeze or crash due to a lack of physical memory.
Flexibility: Virtual memory is a flexible solution that can be adjusted to meet the specific needs of a computer. The amount of virtual memory can be increased or decreased as needed, providing a customizable solution for managing a computer’s memory.
Protection Against Program Crashes: In a system with virtual memory, if a program crashes, it will only affect that specific program and not the entire system. This is because virtual memory keeps programs separated and protected from each other, reducing the risk of a system-wide crash.
Memory Management: Virtual memory allows for efficient memory management, as the operating system can allocate and distribute memory resources as needed. This ensures that memory is not wasted and that programs have access to the memory they require for optimal performance.
Compatibility: Virtual memory is compatible with various operating systems and hardware configurations, making it a widely used and accessible solution for managing memory.
Future Proofing: As programs and applications become more complex and require more memory, virtual memory provides a scalable solution for handling these advancements without the need for constant hardware upgrades. This makes virtual memory a beneficial long-term solution for managing memory.
Internal linkage refers to the scope of a variable or function within a single source file in a C++ program. This means that the variable or function can only be accessed or used within that specific source file and cannot be accessed by other source files in the program. Internal linkage is achieved by using the "static" keyword before the variable or function declaration.
External linkage, on the other hand, refers to the scope of a variable or function that can be accessed by multiple source files in a C++ program. This means that the variable or function can be declared in one source file and used in another source file. External linkage is achieved by using the "extern" keyword before the variable or function declaration.
Variables and functions with external linkage are stored in the global symbol table, which can be accessed by all source files in the program. This allows for sharing of variables and functions between different parts of the program, making it more efficient and organized.
In summary, internal linkage restricts the scope of a variable or function to a single source file, while external linkage allows for sharing of variables and functions between multiple source files in a program.
Dependencies on static variables in different translation units are, in general, a code smell and should be a reason for refactoring.
If an object or function inside such a translation unit has internal linkage, then that specific symbol is only visible to the linker within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. The static keyword, when used in the global namespace, forces a symbol to have internal linkage. The extern keyword results in a symbol having external linkage.
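A minimal two translation unit sketch (the file names are illustrative; compile with g++ a.cpp b.cpp):
// a.cpp
static int internal_counter = 0; // internal linkage: only this translation unit can see it
int shared_counter = 0;          // external linkage: visible to the linker from other translation units

// b.cpp
extern int shared_counter;       // declaration referring to the definition in a.cpp
// extern int internal_counter; // would compile here but fail to link: the symbol is local to a.cpp

int main() {
    return shared_counter;
}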
std::call_once vs double checked locking
std::call_once guarantees that a callable is run exactly once, even when invoked concurrently from several threads; prefer it to hand-rolled double-checked locking for lazy initialisation.
An anonymous namespace is used to declare many things with internal linkage.
namespace {
int a = 0;
int b = 0;
int c = 0;
}
This is a large, complicated topic. See C++ Move Semantics - The Complete Guide by Nicolai M. Josuttis.
std::move
The && modifier indicates a parameter is an object that we intend to move from instead of copying.
Mark your move constructors and move assignment operators noexcept (the defaulted ones are anyway).
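A minimal sketch pulling those points together (Buffer is an illustrative type): the move operations are marked noexcept so that containers such as std::vector will move rather than copy when they reallocate.
#include <cstddef>
#include <utility>
#include <vector>

class Buffer {
public:
    explicit Buffer(std::size_t n) : data_(n) {}

    // Move operations: cheap pointer steals, marked noexcept.
    Buffer(Buffer&& other) noexcept : data_(std::move(other.data_)) {}
    Buffer& operator=(Buffer&& other) noexcept {
        data_ = std::move(other.data_);
        return *this;
    }

    // Keep the (potentially expensive) copies available too.
    Buffer(const Buffer&) = default;
    Buffer& operator=(const Buffer&) = default;

private:
    std::vector<char> data_;
};

int main() {
    Buffer a{1024};
    Buffer b{std::move(a)}; // a is left in a valid but unspecified state
    std::vector<Buffer> buffers;
    buffers.push_back(std::move(b)); // moved, not copied
}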
The big things for me are ranges and views, so effortlessly express and manipulate ranges of data with zero iterators; they read naturally and are also lazy-evaluated (and gcc 14 brings ranges::to<> to the party). Concepts I’ve used a little, but mostly to narrow the allowed types of an auto parameter: e.g., std::floating_point auto.
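A minimal sketch of a lazy view pipeline (the ranges::to<> line is commented out as it needs gcc 14 / C++23):
#include <iostream>
#include <ranges>
#include <vector>

int main() {
    const std::vector<int> v{1, 2, 3, 4, 5, 6, 7, 8};

    // Nothing is computed until the range is iterated.
    auto evens_squared = v
        | std::views::filter([](int x) { return x % 2 == 0; })
        | std::views::transform([](int x) { return x * x; });

    for (int x : evens_squared)
        std::cout << x << ' '; // 4 16 36 64
    std::cout << '\n';

    // auto squares = evens_squared | std::ranges::to<std::vector>(); // gcc 14
}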
I’ve attempted to deep dive into templates a few times and it’s really quite an unpleasant language! However, whilst not a recent addition, constexpr is becoming increasingly supported; it gives you the readability of “normal” code with the potential performance benefits of compile-time. And it also finds UB! However, you quickly hit the issue where something isn’t constexpr so you have to either implement it yourself or not bother.
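A minimal constexpr sketch: the factorial is computed at compile time, and undefined behaviour encountered during constant evaluation (signed overflow, out-of-bounds access, and so on) is rejected by the compiler rather than silently miscompiled.
#include <cstdint>

constexpr std::uint64_t factorial(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i)
        result *= i;
    return result;
}

static_assert(factorial(10) == 3'628'800); // evaluated entirely at compile time

int main() {
    constexpr auto f = factorial(12); // forced to compile time by constexpr
    return f > 0 ? 0 : 1;
}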
Coroutines are quite high-profile but I haven’t got into them yet. Finally, the multidimensional array operator seemed like a game changer but I’ve only used it once.
You’ll have to reach into gcc 14 but std::print is very welcome; as are modules (which might actually be available in 13.2 with the -fmodules-ts flag) but poorly supported and not quite what I’d like yet.
Notable mentions from C++17 are structured bindings and execution policy.
constexpr cmath
See the presentation by Marc Gregoire (CppCon 2022).
You’ve gotta track gcc head to avoid disappointment.
data[x, y, z] and std::mdspan
uz literals
.contains for strings and containers
constexpr for std::optional and std::variant
[[assume(true)]] – introduce UB into your code with assumptions
std::print and std::println – gcc 14
std::flat_map and std::flat_set (contiguous binary search trees) – not in gcc 14
import <iostream>; – modules, gcc 13 with flag -fmodules-ts
consteval – immediate functions: only execute at compile time
<generator>
<stack_trace>
std::expected
std::byteswap
A lot of effort has gone into ranges and views in C++23.
starts_with
shift_left
ranges::to<> – gcc14, convert a range to a vector (for instance)
find_if
contains
contains_subrange
std::ranges::fold_left – gcc 13.1
zip
adjacent
pairwise
chunk
slide
chunk_by
stride – take every nth element
join/join_with – I’ve had no luck with these outside of a trivial example
repeat
iota – infinite views may be more performant as no boundary check
slide – love these
enumerate – gcc 13.2, easily enumerate your range-based for loops, you get a pair of index and the element
contains for maps
starts_with for strings
std::jthread – thread you don’t have to explicitly join (it’s the same as std::thread but joins in the destructor), also has a built-in stop token
std::barrier for thread synchronisation (like std::latch but reusable)
std::filesystem – from Boost
std::string_view
std::clamp
[[maybe_unused]]
<string_view>
std::byte
Also see cppreference.
C++14 is an extension and improvement of C++11.
0b1111000 – binary literals
auto return type
auto in lambda parameters
template <class T> constexpr T bins = T{24'576}; – variable templates (and the digit separator)
decltype(auto)
Relaxed constexpr (e.g., loops are now allowed in constexpr functions)