This repository documents a small experiment to answer a common question:
Is hand-written Assembly really faster than C/C++?
The goal is not to make a universal claim, but to perform a fair, reproducible comparison on the same machine, using the same algorithm, compiled down to native machine code.
We implement the same array-sum algorithm in two ways:
- C++ (sum.cpp)
- Hand-written x86-64 Assembly (sum_manual.s)
Both implementations:
- Run on the same AMD x86-64 system
- Are linked into the same benchmarking harness
- Operate on the same data
- Produce the same result
- Are benchmarked repeatedly to reduce noise
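For orientation, here is a minimal sketch of what the C++ side of such an array sum could look like. The function name `sum_array`, the element type, and the use of `extern "C"` are assumptions made for illustration; the actual sum.cpp may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical signature, not taken from the repository.
// extern "C" turns off C++ name mangling, so an assembly file can define
// the same routine under the plain symbol name `sum_array`.
extern "C" std::int64_t sum_array(const std::int32_t* data, std::size_t n) {
    std::int64_t total = 0;  // 64-bit accumulator so large inputs do not overflow
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Under the System V AMD64 ABI, a hand-written version with this signature would receive `data` in `rdi` and `n` in `rsi`, and return the total in `rax`.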
- Cpp/sum.cpp: High-level C++ implementation of array summation.
- Assembly/sum_manual.s: Hand-written x86-64 assembly implementation of the same algorithm, following the System V AMD64 ABI.
- sum_gcc.s: Compiler-generated assembly output from sum.cpp (produced using -S).
- sum_gcc.o: Object file (machine code) for the compiler-generated implementation, built from sum.cpp using -c.
- sum_manual.o: Object file (machine code) generated from the hand-written assembly.
- bench_gcc: Benchmark executable linked against the C++ implementation.
- bench_asm: Benchmark executable linked against the hand-written assembly implementation.
- benchmark.cpp: Common benchmarking harness used by both implementations. It measures execution time of the sum routine and prints:
  - Result
  - Time taken (in microseconds)
- run_benchmarking.sh: Script that runs each executable multiple times and reports:
  - Average time
  - Minimum time
  - Maximum time
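The harness described above can be sketched roughly as follows. This is not the repository's benchmark.cpp: the names `SumFn`, `Measurement`, `time_once`, `print_measurement`, and `sum_stub` are hypothetical, and the output format is an assumption. The stub stands in for the routine that would normally come from sum_gcc.o or sum_manual.o at link time.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical function-pointer type for the routine under test.
using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Stand-in implementation so the sketch compiles on its own.
std::int64_t sum_stub(const std::int32_t* data, std::size_t n) {
    return std::accumulate(data, data + n, std::int64_t{0});
}

struct Measurement {
    std::int64_t result;     // value returned by the sum routine
    long long microseconds;  // elapsed wall-clock time for one call
};

// Times a single call using a monotonic clock.
Measurement time_once(SumFn fn, const std::vector<std::int32_t>& data) {
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t result = fn(data.data(), data.size());
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return Measurement{result, static_cast<long long>(elapsed.count())};
}

// Prints the two values the harness reports (format is an assumption).
void print_measurement(const Measurement& m) {
    std::cout << "Result: " << m.result << "\n"
              << "Time taken: " << m.microseconds << " us\n";
}
```

Printing the result also keeps the compiler from discarding the call as dead code, which matters when the harness itself is built with -O3.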
g++ -O3 -march=native -S Cpp/sum.cpp -o Cpp/sum_gcc.s
This step shows the assembly the compiler generates for this machine at -O3.
g++ -O3 -march=native -c Cpp/sum.cpp -o Cpp/sum_gcc.o
The .o file contains machine code, not text.
as Assembly/sum_manual.s -o Assembly/sum_manual.o
This converts the hand-written assembly into machine code.
g++ -O3 -march=native benchmark.cpp Cpp/sum_gcc.o -o bench_gcc
g++ -O3 -march=native benchmark.cpp Assembly/sum_manual.o -o bench_asm
At this point:
- Both executables are complete
- Both contain native machine code
- Both use the same benchmark logic
To inspect the actual machine instructions used:
objdump -d Assembly/sum_manual.o > Assembly/asm.txt
objdump -d Cpp/sum_gcc.o > Cpp/gcc.txt
These .txt files are disassembly dumps used for inspection and screenshots.
They are not used during execution.
The benchmark script runs each executable multiple times:
./run_benchmarking.sh
For each implementation it reports:
- Number of runs
- Average execution time
- Minimum execution time
- Maximum execution time
This reduces the impact of:
- OS scheduling noise
- Cache effects
- One-off anomalies
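The repository computes these statistics in run_benchmarking.sh. The same arithmetic is shown here in C++ as a small sketch; `Summary` and `summarize` are hypothetical names, and the input is assumed to be a list of per-run times in microseconds.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Summary {
    std::size_t runs;
    double average_us;
    long long min_us;
    long long max_us;
};

// Reduces a list of per-run timings to run count, average, minimum, and maximum.
// An empty input yields an all-zero summary.
Summary summarize(const std::vector<long long>& times_us) {
    if (times_us.empty()) {
        return Summary{0, 0.0, 0, 0};
    }
    const auto extremes = std::minmax_element(times_us.begin(), times_us.end());
    const long long total =
        std::accumulate(times_us.begin(), times_us.end(), 0LL);
    const double average =
        static_cast<double>(total) / static_cast<double>(times_us.size());
    return Summary{times_us.size(), average, *extremes.first, *extremes.second};
}
```

When comparing two implementations, the minimum is often the most stable of the three figures, since scheduling noise can only add time to a run.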
- Both C++ and Assembly are ultimately translated into machine code
- The CPU executes instructions, not source languages
- Hand-written assembly is not automatically faster
- Modern compilers often generate more optimized code by:
  - Reordering instructions
  - Unrolling loops
  - Exploiting instruction-level parallelism
  - Targeting specific microarchitectures
Performance is less about how close a language is to the metal and more about how well intent maps to modern hardware.
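To make loop unrolling and instruction-level parallelism concrete, here is a hand-unrolled sum with four independent accumulators. It is an illustration only, not the compiler's actual output; `sum_unrolled4` is a hypothetical name, and at -O3 -march=native a compiler will typically go further and use SIMD instructions.

```cpp
#include <cstddef>
#include <cstdint>

// Sums the array using four independent accumulators. Because a0..a3 do not
// depend on each other, the CPU can overlap their additions instead of
// waiting on one long dependency chain.
std::int64_t sum_unrolled4(const std::int32_t* data, std::size_t n) {
    std::int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    std::int64_t total = a0 + a1 + a2 + a3;
    // Handle the 0 to 3 elements left over when n is not a multiple of 4.
    for (; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

A straightforward hand-written assembly loop with a single accumulator register misses this overlap, which is one way a compiler-generated version can end up faster.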
- If any source changes, the corresponding executable must be rebuilt
- Benchmark results are only valid if binaries match their source
This experiment demonstrates behavior on one system, for one workload. It does not claim that assembly is always slower, only that performance depends on optimization quality, not language choice alone.