A debugger is a tool that allows us to stop the execution of a
program and inspect its state at a certain point. We will consider the
builtin python debugger in the following, but the principle applies to
other tools like gdb.
The basic way to use a debugger is to spawn an interactive debug
shell and set a breakpoint (with the syntax
b <location>, where <location>
might be a line number, a function name, etc.). The program will execute
normally until it reaches that breakpoint, where it will pause. At that
point, we can use commands like p <variable> to print
the value of a variable at that point, c to continue the
execution until the next breakpoint, n to move to the next
step in the execution and s to step into a
function call.
The following program is a simple parser and evaluator for arithmetic expressions.
import re
def tokenize(expr):
"""Split an expression into numbers and operators."""
return re.findall(r'\d+|[+\-*()]', expr)
def parse_factor(tokens, pos):
"""Parse a number or parenthesized expression."""
if tokens[pos] == '(':
val, pos = parse_expr(tokens, pos + 1)
return val, pos + 1 # skip ')'
return int(tokens[pos]), pos + 1
def parse_term(tokens, pos):
"""Parse multiplication."""
left, pos = parse_factor(tokens, pos)
if pos < len(tokens) and tokens[pos] == '*':
right, pos = parse_term(tokens, pos + 1)
return left * right, pos
return left, pos
def parse_expr(tokens, pos):
"""Parse addition and subtraction."""
left, pos = parse_term(tokens, pos)
if pos < len(tokens) and tokens[pos] in ('+', '-'):
op = tokens[pos]
right, pos = parse_expr(tokens, pos + 1)
if op == '+':
return left + right, pos
else:
return left - right, pos
return left, pos
def evaluate(expr):
tokens = tokenize(expr)
result, _ = parse_expr(tokens, 0)
return result
tests = [
("2 + 3", 5),
("2 * 3 + 1", 7),
("10 - 3 - 2", 5),
("(10 - 3) * 2", 14),
]
for expr, expected in tests:
result = evaluate(expr)
status = "OK" if result == expected else "ERROR"
print(f"{expr:>16} = {result:<4} expected {expected:<4} [{status}]")
As you can see, the third test fails. Run the program in debug mode
with python -m pdb ex1.py and try to identify where the bug
is coming from.
Below is a barebones C implementation of dynamic vectors.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
int* data;
int size;
int capacity;
} Vec;
Vec* vec_new(int capacity) {
Vec* v = malloc(sizeof(Vec));
v->data = malloc(capacity * sizeof(int));
v->size = 0;
v->capacity = capacity;
return v;
}
void vec_push(Vec* v, int value) {
if (v->size == v->capacity) {
v->capacity *= 2;
int* new_data = malloc(v->capacity * sizeof(int));
memcpy(new_data, v->data, v->size * sizeof(int));
free(v->data);
v->data = new_data;
}
v->data[v->size++] = value;
}
void vec_free(Vec* v) {
free(v->data);
free(v);
}
int main() {
Vec* scores = vec_new(2);
vec_push(scores, 85);
vec_push(scores, 92);
int* top_score = &scores->data[1];
vec_push(scores, 78);
vec_push(scores, 95);
vec_push(scores, 61);
printf("Top score was: %d\n", *top_score);
vec_free(scores);
return 0;
}
If we compile and run it normally:
gcc -o ex2 ex2.c && ./ex2 we see that the output is
wrong. Try to identify the problem and then compile it with the address
sanitization option: gcc -fsanitize=address -g -o ex2 ex2.c
to see if it can automatically detect it.
The following C++ program, which “mock installs” packages together
with their dependencies. It first builds a list of dependencies to
install, and the installs them (simulated here by just printing
“installing …”). A simple compile and run
g++ -std=c++17 -o ex2b ex2b.cpp shows again that some
packages are missed. Same as before, try to discover what is happen, and
then see if
g++ -std=c++17 -fsanitize=address -o ex2b ex2b.cpp && ./ex2b
catches an error with the program.
#include <iostream>
#include <vector>
#include <set>
#include <map>
#include <string>
int main() {
std::map<std::string, std::vector<std::string>> deps = {
{"curl", {"openssl", "nghttp2"}},
{"openssl", {"zlib", "libcrypt"}},
{"nghttp2", {"zlib"}},
{"python", {"zlib", "libffi", "readline"}},
};
std::cout << "Resolving dependencies... \n";
std::vector<std::string> to_install = {"curl", "python", "vim"};
std::set<std::string> seen(to_install.begin(), to_install.end());
for (const auto& pkg : to_install) {
if (deps.count(pkg)) {
for (const auto& dep : deps[pkg]) {
if (seen.insert(dep).second) {
to_install.push_back(dep);
}
}
}
}
std::cout << "Packages to install: " << to_install.size() << " packages" << std::endl;
for (const auto& pkg : to_install) {
std::cout << " installing " << pkg << std::endl;
}
std::cout << "done." << std::endl;
return 0;
}
Valgrind is a tool that runs programs in a managed execution environment where it can detect, among other things, memory errors like leaks or use-after-free’s. The following program contains 3 bugs related to memory-management.
// Compile: gcc -g -o ex3 ex3.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
char* name;
int* grades;
int count;
} Student;
Student* new_student(const char* name, int grades[], int n) {
Student* s = malloc(sizeof(Student));
s->name = malloc(strlen(name));
strcpy(s->name, name);
s->grades = malloc(n * sizeof(int));
memcpy(s->grades, grades, n * sizeof(int));
s->count = n;
return s;
}
float average(Student* s) {
int sum = 0;
for (int i = 0; i <= s->count; i++) {
sum += s->grades[i];
}
return (float)sum / s->count;
}
void free_student(Student* s) {
free(s->name);
free(s);
}
int main() {
int grades[] = {85, 90, 78, 92};
Student* alice = new_student("Alice", grades, 4);
printf("%s: avg = %.1f\n", alice->name, average(alice));
free_student(alice);
return 0;
}
Try first to identify them yourselves. Then, run the program in
valgrind: valgrind --leak-check=full ./ex3 and
try to understand the output and to relate it to what actually happens
in the code.
A profiler allows us to get statistics regarding the resources used
by the program, commonly CPU usage, or memory. An example is the
perf program on Linux (search for installation instructions
specific to your system).
The following is a rather inefficient word frequency counter.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
std::vector<std::string> tokenize(const std::string& text) {
std::vector<std::string> words;
std::istringstream stream(text);
std::string word;
while (stream >> word) {
for (auto& c : word) c = std::tolower(c);
words.push_back(word);
}
return words;
}
std::vector<std::string> unique_words(const std::vector<std::string>& words) {
std::vector<std::string> seen;
for (const auto& w : words) {
bool found = false;
for (const auto& s : seen) {
if (s == w) { found = true; break; }
}
if (!found) seen.push_back(w);
}
return seen;
}
int count_word(const std::vector<std::string>& words, const std::string& target) {
int n = 0;
for (const auto& w : words) {
if (w == target) n++;
}
return n;
}
int main(int argc, char* argv[]) {
const char* filename = argc > 1 ? argv[1] : "big.txt";
std::ifstream file(filename);
if (!file) { std::cerr << "cannot open " << filename << std::endl; return 1; }
std::string text((std::istreambuf_iterator<char>(file)),
std::istreambuf_iterator<char>());
auto words = tokenize(text);
auto vocab = unique_words(words);
std::vector<std::pair<std::string, int>> freqs;
for (const auto& w : vocab) {
freqs.push_back({w, count_word(words, w)});
}
std::partial_sort(freqs.begin(),
freqs.begin() + std::min<size_t>(10, freqs.size()),
freqs.end(),
[](const auto& a, const auto& b) { return a.second > b.second; });
for (int i = 0; i < 10 && i < (int)freqs.size(); i++) {
std::cout << freqs[i].first << " " << freqs[i].second << std::endl;
}
return 0;
}
Compile it and run the executable through perf to
generate a so-called “flamegraph”:
perf record -g ./ex4 && perf script report flamegraph && firefox flamegraph.html
(change the browser accordingly if needed).
You can also try to run profile a real program on your machine, for
instance
perf record -g openssl speed -seconds 3 sha256 && perf script report flamegraph && firefox flamegraph.html
C++ compiler errors related to templates are notoriously inscrutable. Below is a small example.
/*
* Compile: g++ -std=c++17 -o ex5 ex5.cpp
*/
#include <iostream>
#include <map>
#include <vector>
#include <string>
#include <algorithm>
int main() {
std::map<std::string, std::vector<int>> gradebook = {
{"alice", {90, 85, 92}},
{"bob", {78, 88}},
{"charlie", {95, 70, 80}},
};
std::map<std::string, int> averages;
for (auto& [name, scores] : gradebook) {
int sum = 0;
for (int s : scores) sum += s;
averages[name] = sum / scores.size();
}
std::sort(averages.begin(), averages.end());
std::cout << "Rankings:" << std::endl;
for (auto& [name, avg] : averages) {
std::cout << " " << name << ": " << avg << std::endl;
}
return 0;
}
Try to compile the program and observe the crazy error message. Use an LLM to try to get a human-readable version of the message and how it relates to the program.