In this post I’ll try to clarify some of the misconceptions about C++ which I
often find in various code bases.
People have their habits
C++ is a complex language. Part of this complexity stems from legacy, and with
legacy comes source code which is often bad: code that is difficult to
maintain and that relies on false assumptions. These assumptions were
reinforced in programmers’ minds early on, when the language was at its peak
popularity (probably some 20 years ago), or (even worse) were adopted from the
C world by people who still think that C++ is just C with classes.
The language has evolved and it’s time to clarify some of these (at least for
my own sake).
First misconception: static local variables as an optimisation attempt
I find this one quite often. The premise here is that since static local
variables are initialised only once, time is saved when re-entering the
function subsequently. Here’s an example:
int foo_with_static_int() {
    // here I save time because `i` will be initialised only once
    static const int i = 123;
    return i;
}
Of course, people often defend this approach when the local variable requires
more complex initialisation, e.g.:
int vector_with_static() {
    static const std::vector<int> v = {
        1,2,3,4,5,6,7,8,9,10
    };
    // completely arbitrary index
    return v[4];
}
To prove that this assumption is false, let’s use
google-benchmark. Here’s my test code
(which can be found here as well):
#include <benchmark/benchmark.h>
#include <vector>

// Functions under test: each pair differs only in the `static` keyword.
int foo_with_static_int() {
    static const int i = 123;
    return i;
}

int foo_without_static_int() {
    const int i = 123;
    return i;
}

int vector_with_static() {
    static const std::vector<int> v = {
        1,2,3,4,5,6,7,8,9,10
    };
    return v[4];
}

int vector_without_static() {
    const std::vector<int> v{
        1,2,3,4,5,6,7,8,9,10
    };
    return v[4];
}

// Benchmark wrappers: call the function once per iteration.
static void BM_foo_with_static_int(benchmark::State& state) {
    for (auto _ : state) {
        foo_with_static_int();
    }
}

static void BM_foo_without_static_int(benchmark::State& state) {
    for (auto _ : state) {
        foo_without_static_int();
    }
}

static void BM_vector_with_static(benchmark::State& state) {
    for (auto _ : state) {
        vector_with_static();
    }
}

static void BM_vector_without_static(benchmark::State& state) {
    for (auto _ : state) {
        vector_without_static();
    }
}

BENCHMARK(BM_foo_with_static_int);
BENCHMARK(BM_foo_without_static_int);
BENCHMARK(BM_vector_without_static);
BENCHMARK(BM_vector_with_static);
BENCHMARK_MAIN();
And here are the results:
Run on (8 X 2500 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 6144 KiB (x1)
Load Average: 2.58, 2.00, 1.78
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_foo_with_static_int 4.81 ns 4.81 ns 145309010
BM_foo_without_static_int 4.81 ns 4.81 ns 145371174
BM_vector_without_static 372 ns 372 ns 1885227
BM_vector_with_static 7.38 ns 7.38 ns 93426760
So… I guess I was wrong? static indeed helps? No! This is a debug build. As
soon as you enable optimisations:
meson configure --buildtype release bld
the gains from this premature optimisation attempt disappear:
Run on (8 X 2500 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 6144 KiB (x1)
Load Average: 2.90, 2.02, 1.81
--------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------
BM_foo_with_static_int 0.000 ns 0.000 ns 1000000000
BM_foo_without_static_int 0.000 ns 0.000 ns 1000000000
BM_vector_without_static 0.000 ns 0.000 ns 1000000000
BM_vector_with_static 0.291 ns 0.291 ns 1000000000
The code with static is in fact slightly worse: the compiler cannot simply
optimise the whole lookup away. It has to guarantee the lifetime of the
object, so the check whether the variable has already been initialised (which
static introduces for a dynamically initialised object such as this vector)
will always be there.
It’s very clear what happens once you disassemble the code:
$ objdump --demangle --disassemble-functions="vector_without_static()" ./bld/static/static
./bld/static/static: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
00000001000046e0 vector_without_static():
1000046e0: 55 pushq %rbp
1000046e1: 48 89 e5 movq %rsp, %rbp
1000046e4: b8 05 00 00 00 movl $5, %eax
1000046e9: 5d popq %rbp
1000046ea: c3 retq
1000046eb: 0f 1f 44 00 00 nopl (%rax,%rax)
… and the version with static:
$ objdump --demangle --disassemble-functions="vector_with_static()" ./bld/static/static
./bld/static/static: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
00000001000045c0 vector_with_static():
1000045c0: 55 pushq %rbp
1000045c1: 48 89 e5 movq %rsp, %rbp
1000045c4: 53 pushq %rbx
1000045c5: 50 pushq %rax
1000045c6: 8a 05 a4 be 02 00 movb 179876(%rip), %al
1000045cc: 84 c0 testb %al, %al
1000045ce: 74 11 je 17 <__Z18vector_with_staticv+0x21>
1000045d0: 48 8b 05 81 be 02 00 movq 179841(%rip), %rax
1000045d7: 8b 40 10 movl 16(%rax), %eax
1000045da: 48 83 c4 08 addq $8, %rsp
1000045de: 5b popq %rbx
1000045df: 5d popq %rbp
1000045e0: c3 retq
1000045e1: 48 8d 3d 88 be 02 00 leaq 179848(%rip), %rdi
1000045e8: e8 fd 40 02 00 callq 147709 <dyld_stub_binder+0x1000286ea>
1000045ed: 85 c0 testl %eax, %eax
1000045ef: 74 df je -33 <__Z18vector_with_staticv+0x10>
1000045f1: 48 c7 05 6c be 02 00 00 00 00 00 movq $0, 179820(%rip)
1000045fc: 48 c7 05 59 be 02 00 00 00 00 00 movq $0, 179801(%rip)
100004607: 48 c7 05 46 be 02 00 00 00 00 00 movq $0, 179782(%rip)
100004612: bf 28 00 00 00 movl $40, %edi
100004617: e8 9e 40 02 00 callq 147614 <dyld_stub_binder+0x1000286ba>
10000461c: 48 89 05 35 be 02 00 movq %rax, 179765(%rip)
100004623: 48 8d 35 2e be 02 00 leaq 179758(%rip), %rsi
10000462a: 48 89 c1 movq %rax, %rcx
10000462d: 48 83 c1 28 addq $40, %rcx
100004631: 48 89 0d 30 be 02 00 movq %rcx, 179760(%rip)
100004638: 48 8b 15 51 5c 02 00 movq 154705(%rip), %rdx
10000463f: 48 89 50 20 movq %rdx, 32(%rax)
100004643: 48 8b 15 3e 5c 02 00 movq 154686(%rip), %rdx
10000464a: 48 89 50 18 movq %rdx, 24(%rax)
10000464e: 48 8b 15 2b 5c 02 00 movq 154667(%rip), %rdx
100004655: 48 89 50 10 movq %rdx, 16(%rax)
100004659: 48 8b 15 18 5c 02 00 movq 154648(%rip), %rdx
100004660: 48 89 50 08 movq %rdx, 8(%rax)
100004664: 48 8b 15 05 5c 02 00 movq 154629(%rip), %rdx
10000466b: 48 89 10 movq %rdx, (%rax)
10000466e: 48 89 0d eb bd 02 00 movq %rcx, 179691(%rip)
100004675: 48 8d 3d 44 00 00 00 leaq 68(%rip), %rdi
10000467c: 48 8d 15 7d b9 ff ff leaq -18051(%rip), %rdx
100004683: e8 44 40 02 00 callq 147524 <dyld_stub_binder+0x1000286cc>
100004688: 48 8d 3d e1 bd 02 00 leaq 179681(%rip), %rdi
10000468f: e8 5c 40 02 00 callq 147548 <dyld_stub_binder+0x1000286f0>
100004694: e9 37 ff ff ff jmp -201 <__Z18vector_with_staticv+0x10>
100004699: 48 89 c3 movq %rax, %rbx
10000469c: 48 8d 3d cd bd 02 00 leaq 179661(%rip), %rdi
1000046a3: e8 3c 40 02 00 callq 147516 <dyld_stub_binder+0x1000286e4>
1000046a8: 48 89 df movq %rbx, %rdi
1000046ab: e8 78 3e 02 00 callq 147064 <dyld_stub_binder+0x100028528>
1000046b0: 0f 0b ud2
1000046b2: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
1000046bc: 0f 1f 40 00 nopl (%rax)
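The shape of that listing (load a guard byte, test it, fall into a slow path that constructs the vector) can be pictured by writing the guard by hand. The following is only a rough sketch of the idea, not what the compiler literally emits: the names table_ready, table_mutex, table and vector_with_manual_guard are made up for illustration. On platforms following the Itanium C++ ABI the slow path typically goes through __cxa_guard_acquire and __cxa_guard_release, but the listing above only shows stub addresses, so that mapping is an assumption here.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

namespace {
// Illustrative names only; a real compiler uses a hidden guard variable.
std::atomic<bool> table_ready{false};     // plays the role of the guard byte
std::mutex table_mutex;                   // stands in for the runtime's guard lock
const std::vector<int>* table = nullptr;  // storage for the "static" object
}

int vector_with_manual_guard() {
    // Fast path: this check runs on every single call.
    if (!table_ready.load(std::memory_order_acquire)) {
        // Slow path: taken only until the first initialisation completes.
        std::lock_guard<std::mutex> lock(table_mutex);
        if (!table_ready.load(std::memory_order_relaxed)) {
            // Intentionally never freed in this sketch; a real static
            // local would also get its destructor registered for exit.
            table = new std::vector<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
            table_ready.store(true, std::memory_order_release);
        }
    }
    return (*table)[4];
}
```

The point is the first if: it cannot be removed, because the function has no way of knowing at compile time whether it is the first caller.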
So, please, don’t use static on function-local variables as a premature
optimisation attempt! It won’t buy you any faster code - it’s just wrong.
If you’ve got a local variable that really requires complex initialisation,
think about your design: maybe your function should become a class of its own?
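As a minimal sketch of that last suggestion, the lookup table could live in an object whose constructor does the expensive initialisation exactly once, at a point the caller controls. The class name Lookup and its member names are hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical replacement for a function with a static local table.
class Lookup {
public:
    // The table is built once, when the object is constructed.
    Lookup() : table_{1, 2, 3, 4, 5, 6, 7, 8, 9, 10} {}

    // Bounds-checked access; throws std::out_of_range on a bad index.
    int at(std::size_t index) const { return table_.at(index); }

private:
    std::vector<int> table_;
};
```

The owner of a Lookup instance decides when the initialisation cost is paid and when the memory is released, and every call to at() is a plain member access with no hidden guard check.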
Second misconception: iostreams are bad
There seems to be a strong preference to stick with the classical printf/FILE*
APIs rather than rely on iostreams. This is difficult to explain from an
objective standpoint. The classical C IO APIs are non-portable across
platforms with different type widths, making work with different data types a
nightmare. E.g.:
printf("size: %u", list.size());
What if size() returns a 64-bit type? Sure, you can use platform-dependent
formatting macros like:
printf("size: %" PRIu64, list.size());
but this looks dodgy and is error prone.
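For comparison, here is a small sketch of the stream-based equivalent. The helper describe_size is a made-up name for illustration; it writes to a std::ostringstream so that the result is easy to inspect, but the same expression works with std::cout.

```cpp
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper: formats the size of any container exposing size().
// operator<< is selected from the static type of size(), so no format
// specifier has to be kept in sync with the type's width.
template <typename Container>
std::string describe_size(const Container& c) {
    std::ostringstream out;
    out << "size: " << c.size();
    return out.str();
}
```

With printf the matching specifier for std::size_t would be %zu, but the caller has to know that size() returns std::size_t and has to remember to update the format string whenever the type changes.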
Another argument against streams I hear quite often is that with streams the
formatting directives “stick”. This is partially true, but it can easily be
remedied, if required, with a function specifically designed to deal with
this, flags:
const auto origFlags = std::cout.flags();
std::cout << std::hex << "0x" << 123 << std::endl;
std::cout.flags(origFlags);
You can even create a nice RAII wrapper if needed. iostreams provide an API
that is coherent with the rest of the STL. In large, complex software
projects, consistency and design matter the most.
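Such a RAII wrapper could look roughly like the sketch below. The names FlagsGuard and print_hex are invented for this example; only flags() is restored, so precision, width and fill would need the same treatment if they are changed too.

```cpp
#include <ios>
#include <ostream>
#include <sstream>

// Hypothetical RAII helper: remembers the stream's format flags on
// construction and restores them when the guard goes out of scope.
class FlagsGuard {
public:
    explicit FlagsGuard(std::ios_base& stream)
        : stream_(stream), flags_(stream.flags()) {}
    ~FlagsGuard() { stream_.flags(flags_); }

    FlagsGuard(const FlagsGuard&) = delete;
    FlagsGuard& operator=(const FlagsGuard&) = delete;

private:
    std::ios_base& stream_;
    std::ios_base::fmtflags flags_;
};

// Example use: std::hex no longer "sticks" after the function returns.
void print_hex(std::ostream& out, int value) {
    FlagsGuard guard(out);
    out << std::hex << "0x" << value << '\n';
}
```

Because the restore happens in the destructor, the flags are also put back when the function exits early or an exception propagates.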
Third misconception: Avoiding exceptions
This, again, is often discussed in the context of performance:
Exceptions are slow, therefore we should avoid them.
No! The repercussion of this approach is that error handling within the
system is more or less unspecified. Some parts of the code use error codes,
some return errors via an argument. Nothing is consistent, and that is a
source of bugs. Before any optimisation attempt is made, one should first
evaluate whether the problem exists at all. Sure, I agree, when C++ was in its
infancy, it may have been the case that exceptions were unacceptably costly.
This could’ve been additionally reinforced by the fact that machines were a lot
slower as well. We’ve made a lot of progress since then though!
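To make the inconsistency concrete, here is a small sketch with two made-up functions that do the same job. parse_port reports failures with exceptions; parse_port_legacy is the "bool plus out-parameter" style often found next to it in older code. Both names and the port range check are assumptions chosen purely for illustration.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Exception style: the return value is always a valid port.
int parse_port(const std::string& text) {
    std::size_t consumed = 0;
    // std::stoi throws std::invalid_argument or std::out_of_range itself.
    const int value = std::stoi(text, &consumed);
    if (consumed != text.size()) {
        throw std::invalid_argument("trailing characters in port: " + text);
    }
    if (value < 1 || value > 65535) {
        throw std::out_of_range("port out of range: " + text);
    }
    return value;
}

// Error-code style: the caller must remember to check the result, and
// the reason for the failure is lost.
bool parse_port_legacy(const std::string& text, int& port) noexcept {
    try {
        port = parse_port(text);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

A code base that mixes both styles forces every caller to know which convention each function follows, which is exactly the kind of inconsistency described above.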
The discussion on this one is quite controversial, hence I’m going to support
myself with the Core Guidelines again, which I think go into the details in
this regard.