A simple benchmarking suite to bust the myth about assembly being a faster language. Check out the LinkedIn Post

rugbedbugg/Runtime-Benchmarking

C++ vs Hand-Written Assembly: A Fair Performance Comparison

This repository documents a small experiment to answer a common question:

Is hand-written Assembly really faster than C/C++?

The goal is not to make a universal claim, but to perform a fair, reproducible comparison on the same machine, using the same algorithm, compiled down to native machine code.


Overview of the Experiment

We implement the same array-sum algorithm in two ways:

  1. C++ (sum.cpp)
  2. Hand-written x86-64 Assembly (sum_manual.s)

Both implementations:

  • Run on the same AMD x86-64 system
  • Are linked into the same benchmarking harness
  • Operate on the same data
  • Produce the same result
  • Are benchmarked repeatedly to reduce noise
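
For orientation, the C++ side of such a comparison can be as small as the sketch below. The repository's sum.cpp is not reproduced here, so the function name `sum_array`, the 32-bit element type, and the 64-bit accumulator are assumptions made for illustration only. Declaring the routine `extern "C"` keeps its symbol name unmangled, which is one way a hand-written assembly file could export the same symbol and be linked into the shared harness in its place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical signature; the real sum.cpp may use a different name or types.
// extern "C" disables C++ name mangling so that an assembly file can define
// the same symbol and be linked in place of this function.
extern "C" std::int64_t sum_array(const std::int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);  // an empty range may pass a null pointer
    std::int64_t total = 0;             // wider accumulator than the element type
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Under the System V AMD64 ABI, a function with this signature receives `data` in `rdi` and `n` in `rsi` and returns its result in `rax`, so an assembly version would read and write those same registers.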

Files and Their Roles

Source Files

  • Cpp/sum.cpp
    High-level C++ implementation of array summation.

  • Assembly/sum_manual.s
    Hand-written x86-64 assembly implementation of the same algorithm, following the System V AMD64 ABI.


Intermediate Build Artifacts

  • sum_gcc.s
    Compiler-generated assembly output from sum.cpp (produced using -S).

  • sum_gcc.o
    Object file (machine code) compiled from sum.cpp with the same optimization flags (produced using -c).

  • sum_manual.o
    Object file (machine code) generated from the hand-written assembly.


Executables

  • bench_gcc
    Benchmark executable linked against the C++ implementation.

  • bench_asm
    Benchmark executable linked against the hand-written assembly implementation.


Benchmarking

  • benchmark.cpp
    Common benchmarking harness used by both implementations.
    It measures execution time of the sum routine and prints:

    • Result
    • Time taken (in microseconds)

  • run_benchmarking.sh
    Script that runs each executable multiple times and reports:

    • Average time
    • Minimum time
    • Maximum time
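
As a rough illustration of what the harness described above does, the sketch below times a single call with `std::chrono::steady_clock` and prints the result and the elapsed microseconds. It is not the repository's benchmark.cpp: the names `SumFn`, `Measurement`, `measure`, and `report` are hypothetical, and the routine's signature is assumed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>

// Assumed signature of the routine under test (hypothetical).
using SumFn = std::int64_t (*)(const std::int32_t*, std::size_t);

struct Measurement {
    std::int64_t result;        // value returned by the routine
    std::int64_t microseconds;  // wall-clock time of one call
};

// Times one call of `routine` over `data` using a monotonic clock.
Measurement measure(SumFn routine, const std::vector<std::int32_t>& data) {
    assert(routine != nullptr);
    const auto start = std::chrono::steady_clock::now();
    const std::int64_t result = routine(data.data(), data.size());
    const auto stop = std::chrono::steady_clock::now();
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    return {result, static_cast<std::int64_t>(elapsed.count())};
}

// Prints the two values the harness is described as reporting.
void report(const Measurement& m) {
    std::cout << "Result: " << m.result << '\n'
              << "Time taken: " << m.microseconds << " us\n";
}
```

`steady_clock` is used here because it is monotonic, so adjustments to the system clock during a run cannot produce negative or inflated intervals.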

Build and Execution Pipeline

1. C++ Compilation Path

Generate assembly from C++

g++ -O3 -march=native -S Cpp/sum.cpp -o Cpp/sum_gcc.s

This step shows the assembly the compiler generates for this code at -O3, tuned for the host CPU.

Compile to object file

g++ -O3 -march=native -c Cpp/sum.cpp -o Cpp/sum_gcc.o

The .o file contains machine code, not text.


2. Hand-Written Assembly Path

Assemble the assembly source

as Assembly/sum_manual.s -o Assembly/sum_manual.o

This converts the hand-written assembly into machine code.


3. Linking with the Benchmark Harness

C++ version

g++ -O3 -march=native benchmark.cpp Cpp/sum_gcc.o -o bench_gcc

Assembly version

g++ -O3 -march=native benchmark.cpp Assembly/sum_manual.o -o bench_asm

At this point:

  • Both executables are complete
  • Both contain native machine code
  • Both use the same benchmark logic

Object Code Inspection (Optional)

To inspect the actual machine instructions used:

objdump -d Assembly/sum_manual.o > Assembly/asm.txt
objdump -d Cpp/sum_gcc.o        > Cpp/gcc.txt

These .txt files are disassembly dumps used for inspection and screenshots. They are not used during execution.


Benchmarking Methodology

The benchmark script runs each executable multiple times:

./run_benchmarking.sh

For each implementation it reports:

  • Number of runs
  • Average execution time
  • Minimum execution time
  • Maximum execution time

This reduces the impact of:

  • OS scheduling noise
  • Cache effects
  • One-off anomalies
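
The aggregation itself is simple. run_benchmarking.sh is a shell script and its contents are not shown here, so the following is only a C++ sketch of the same arithmetic, with the hypothetical names `RunStats` and `summarize`, applied to a list of per-run timings in microseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct RunStats {
    std::size_t runs;    // number of samples
    double average_us;   // arithmetic mean
    long long min_us;    // fastest run
    long long max_us;    // slowest run
};

// Summarizes per-run timings (in microseconds) into count, mean, min and max.
RunStats summarize(const std::vector<long long>& samples_us) {
    assert(!samples_us.empty());  // statistics are undefined for zero runs
    const auto extremes =
        std::minmax_element(samples_us.begin(), samples_us.end());
    const long long total =
        std::accumulate(samples_us.begin(), samples_us.end(), 0LL);
    return {samples_us.size(),
            static_cast<double>(total) / static_cast<double>(samples_us.size()),
            *extremes.first,
            *extremes.second};
}
```

Reporting the minimum alongside the average is useful because the minimum is the sample least disturbed by scheduling noise, while a large gap between minimum and maximum signals an unstable measurement.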

Key Takeaways

  • Both C++ and Assembly are ultimately translated into machine code

  • The CPU executes instructions, not source languages

  • Hand-written assembly is not automatically faster

  • Modern compilers often generate more optimized code by:

    • Reordering instructions
    • Unrolling loops
    • Exploiting instruction-level parallelism
    • Targeting specific microarchitectures

Performance is less about how close a language is to the metal and more about how well intent maps to modern hardware.
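
To make the loop unrolling and instruction-level parallelism in the list above concrete, here is a hand-unrolled sum with four independent accumulators. It is an illustration of the kind of transformation an optimizing compiler may apply on its own, not code taken from the repository or from GCC's output; the name `sum_unrolled` and the unroll factor of four are arbitrary choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums `n` elements using four independent accumulators. Because the four
// additions in each iteration do not depend on one another, an out-of-order
// CPU can execute them in parallel instead of waiting on a single chain.
std::int64_t sum_unrolled(const std::int32_t* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::int64_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {  // main loop: four elements per iteration
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    std::int64_t total = (a0 + a1) + (a2 + a3);
    for (; i < n; ++i) {          // tail loop: up to three leftover elements
        total += data[i];
    }
    return total;
}
```

A plain one-accumulator loop compiled with -O3 may already be unrolled and vectorized by the compiler, which is why writing this by hand, in C++ or in assembly, does not guarantee a speedup.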


Notes

  • If any source changes, the corresponding executable must be rebuilt
  • Benchmark results are only valid if binaries match their source

Disclaimer

This experiment demonstrates behavior on one system, for one workload. It does not claim that assembly is always slower, only that performance depends on optimization quality, not language choice alone.
