I wrote a little program that allocates blocks of random sizes from 0 to 100 MB and compared the results for the default allocator and Hoard. Hoard does better in both speed and overhead, but not by much. I will publish the results as soon as I have time to put them in a decent format.
I wrote a little single-threaded benchmark to compare the memory overhead of different allocators for small blocks. The candidates are: Intel's Threading Building Blocks (TBB) scalable allocator, Hoard, Google's tcmalloc and the standard malloc. The benchmark allocates a total of around 300 MB of useful memory in blocks of random size. The upper limit of the block size is given as a parameter to the program. I ran a batch for each allocator with max sizes of 8, 16, 32, 64, etc. Below are the results:
./standard max_size: 8, time: 17810000 ticks, util size: 273471 kB, virt mem: 1250072 kB, overhead 357%
./tbb max_size: 8, time: 15190000 ticks, util size: 273471 kB, virt mem: 640204 kB, overhead 134%
./google max_size: 8, time: 18760000 ticks, util size: 273471 kB, virt mem: 642636 kB, overhead 134%
./hoard max_size: 8, time: 16430000 ticks, util size: 273471 kB, virt mem: 626300 kB, overhead 129%
./standard max_size: 16, time: 9680000 ticks, util size: 293005 kB, virt mem: 683792 kB, overhead 133%
./tbb max_size: 16, time: 8870000 ticks, util size: 293005 kB, virt mem: 461004 kB, overhead 57%
./google max_size: 16, time: 9270000 ticks, util size: 293005 kB, virt mem: 466380 kB, overhead 59%
./hoard max_size: 16, time: 7830000 ticks, util size: 293005 kB, virt mem: 450428 kB, overhead 53%
./standard max_size: 32, time: 5220000 ticks, util size: 302209 kB, virt mem: 472988 kB, overhead 56%
./tbb max_size: 32, time: 3520000 ticks, util size: 302209 kB, virt mem: 385228 kB, overhead 27%
./google max_size: 32, time: 4650000 ticks, util size: 302209 kB, virt mem: 393292 kB, overhead 30%
./hoard max_size: 32, time: 4320000 ticks, util size: 302209 kB, virt mem: 376508 kB, overhead 24%
./standard max_size: 64, time: 2590000 ticks, util size: 307102 kB, virt mem: 386396 kB, overhead 25%
./tbb max_size: 64, time: 1620000 ticks, util size: 307102 kB, virt mem: 351436 kB, overhead 14%
./google max_size: 64, time: 2530000 ticks, util size: 307102 kB, virt mem: 360396 kB, overhead 17%
./hoard max_size: 64, time: 2230000 ticks, util size: 307102 kB, virt mem: 343676 kB, overhead 11%
./standard max_size: 128, time: 1140000 ticks, util size: 309567 kB, virt mem: 347720 kB, overhead 12%
./tbb max_size: 128, time: 1090000 ticks, util size: 309567 kB, virt mem: 340172 kB, overhead 9%
./google max_size: 128, time: 1580000 ticks, util size: 309567 kB, virt mem: 345932 kB, overhead 11%
./hoard max_size: 128, time: 1160000 ticks, util size: 309567 kB, virt mem: 335740 kB, overhead 8%
./standard max_size: 256, time: 830000 ticks, util size: 310797 kB, virt mem: 329504 kB, overhead 6%
./tbb max_size: 256, time: 620000 ticks, util size: 310797 kB, virt mem: 347340 kB, overhead 11%
./google max_size: 256, time: 900000 ticks, util size: 310797 kB, virt mem: 344908 kB, overhead 10%
./hoard max_size: 256, time: 860000 ticks, util size: 310797 kB, virt mem: 339004 kB, overhead 9%
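For readers who want to reproduce something similar, here is a minimal sketch of how such a benchmark could look. It is not the original program: the names `overhead_percent`, `BenchResult` and `run_benchmark`, the fixed seed and the use of `std::clock` are all assumptions. The overhead formula is inferred from the numbers above, which are consistent with (virt mem - util size) / util size, truncated to whole percent. Reading the virtual memory size of the process is platform-specific and is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <ctime>
#include <random>
#include <vector>

// Extra virtual memory relative to useful memory, in whole percent
// (truncated). Inferred from the published numbers, not from the
// original source code.
long overhead_percent(long util_kb, long virt_kb) {
    assert(util_kb > 0);
    return (virt_kb - util_kb) * 100 / util_kb;
}

struct BenchResult {
    std::clock_t ticks;        // CPU clock ticks spent in the allocation loop
    std::size_t useful_bytes;  // sum of all requested sizes
    std::size_t blocks;        // number of blocks allocated
};

// Allocates blocks of random size in [1, max_size] until target_bytes of
// useful memory has been requested, then frees everything.
// Note: the vector of pointers uses memory too, which a real measurement
// would have to account for.
BenchResult run_benchmark(std::size_t max_size, std::size_t target_bytes,
                          unsigned seed = 42) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> dist(1, max_size);
    std::vector<void*> blocks;
    std::size_t useful = 0;

    std::clock_t start = std::clock();
    while (useful < target_bytes) {
        std::size_t n = dist(rng);
        void* p = std::malloc(n);
        if (p == nullptr) break;  // out of memory: stop early
        blocks.push_back(p);
        useful += n;
    }
    std::clock_t ticks = std::clock() - start;

    // A real benchmark would read the process's virtual memory size here,
    // before freeing (platform-specific, omitted in this sketch).
    BenchResult result{ticks, useful, blocks.size()};
    for (void* p : blocks) std::free(p);
    return result;
}
```

Calling `run_benchmark(8, 300u * 1024 * 1024)` would correspond roughly to the `max_size: 8` runs, and `overhead_percent(273471, 1250072)` reproduces the 357% reported for the standard allocator. To compare allocators, the same binary would be linked against (or preloaded with) each allocator library.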
I was reminded of Intel's Threading Building Blocks (TBB). The default allocator on HP-UX (PA-RISC) allocates 16 bytes of memory even for blocks smaller than 8 bytes. If your system allocates hundreds of megabytes of useful memory in blocks smaller than 8 bytes, this means a serious overhead! I tried Hoard (version 2.1.2a), the only one that compiles on HP-UX PA-RISC, but even Hoard allocates 16 bytes.
TBB seems to allocate 8 bytes, but there is a small problem that has to be solved first. During initialisation of a multithreaded application, the first library to be initialised is libpthread, probably because of some global variable. This requires memory, and if you redirect malloc to the TBB allocator, it ends in a deadlock, as the following callstack shows:
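One rough way to see how much memory an allocator really spends per small block is to allocate many blocks of the same size and look at the smallest distance between two neighbouring addresses. The sketch below does that; the function name `min_block_stride` and the default count are my own choices, and the result is only an estimate, since an allocator is free to place blocks wherever it likes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Estimates the per-block footprint for a given request size: allocates
// `count` blocks, sorts their addresses and returns the smallest gap
// between two neighbours. Returns 0 if fewer than two blocks were obtained.
std::size_t min_block_stride(std::size_t request, std::size_t count = 10000) {
    assert(request > 0);  // malloc(0) behaviour is implementation-defined
    std::vector<void*> blocks;
    std::vector<std::uintptr_t> addrs;
    for (std::size_t i = 0; i < count; ++i) {
        void* p = std::malloc(request);
        if (p == nullptr) break;
        blocks.push_back(p);
        addrs.push_back(reinterpret_cast<std::uintptr_t>(p));
    }

    std::sort(addrs.begin(), addrs.end());
    std::size_t best = 0;
    for (std::size_t i = 1; i < addrs.size(); ++i) {
        std::size_t gap = static_cast<std::size_t>(addrs[i] - addrs[i - 1]);
        if (best == 0 || gap < best) best = gap;
    }

    for (void* p : blocks) std::free(p);
    return best;
}
```

If `min_block_stride(8)` comes back as 16, every 8-byte block costs at least 16 bytes, which is at least 100% overhead before any other bookkeeping is counted.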
#0 0xc020ba90 in __ksleep+0x10 () from /lib/libc.2
#1 0xc006023c in __spin_lock+0x10c () from /lib/libpthread.1
#2 0xc0057dc0 in pthread_key_create+0x34 () from /lib/libpthread.1
#3 0xc504cc40 in _Z17initMemoryManagerv () at ../../src/tbbmalloc/MemoryAllocator.cpp:1217
#4 0xc504cda8 in _Z19checkInitializationv () at ../../src/tbbmalloc/MemoryAllocator.cpp:1238
#5 0xc504d25c in scalable_malloc () at ../../src/tbbmalloc/MemoryAllocator.cpp:1343
#6 0xc07cf3b8 in malloc+0x58 () from custom_malloc.cpp
#7 0xc005845c in __pthread_specific_startup+0x20 () from /lib/libpthread.1
#8 0xc00597ec in __pthread_startup+0x1a8 () from /lib/libpthread.1
#9 0xc0027e58 in finish_dld_main+0x1520 () from /usr/lib/dld.sl
#10 0xc00154b8 in _dld_main+0x1c8 () from /usr/lib/dld.sl
#11 0x23b8e08 in __map_dld+0x650 ()
#12 0x23b82dc in $START$+0xd4 ()
#13 0xc006023c in __spin_lock+0x10c () from /lib/libpthread.1
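The callstack appears to show libpthread's startup code calling malloc, and the TBB initialisation then calling back into libpthread (`pthread_key_create`), where it waits on a spin lock. One possible way around this kind of problem is to serve the very early allocations from a small static buffer and switch to the real allocator only once startup is finished. The sketch below illustrates the idea; I have not verified it on HP-UX. All names (`wrapped_malloc`, `wrapped_free`, `enable_real_allocator`, `from_bootstrap_arena`) are hypothetical, `std::malloc` stands in for `scalable_malloc`, and the early phase is assumed to be single-threaded.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace {
// Static bootstrap arena used before the real allocator may be touched.
// The 64 kB size is an arbitrary assumption.
alignas(16) unsigned char g_arena[64 * 1024];
std::size_t g_arena_used = 0;
std::atomic<bool> g_ready{false};
}  // namespace

// To be called once startup is over, e.g. at the beginning of main().
void enable_real_allocator() {
    g_ready.store(true, std::memory_order_release);
}

// True if p points into the bootstrap arena.
bool from_bootstrap_arena(const void* p) {
    auto addr = reinterpret_cast<std::uintptr_t>(p);
    auto lo = reinterpret_cast<std::uintptr_t>(g_arena);
    return addr >= lo && addr < lo + sizeof(g_arena);
}

void* wrapped_malloc(std::size_t size) {
    if (!g_ready.load(std::memory_order_acquire)) {
        // Early phase: simple bump allocation, assumed single-threaded.
        if (size == 0) size = 1;
        std::size_t rounded = (size + 15) & ~static_cast<std::size_t>(15);
        if (rounded < size || rounded > sizeof(g_arena) - g_arena_used) {
            return nullptr;  // overflow or arena exhausted
        }
        void* p = g_arena + g_arena_used;
        g_arena_used += rounded;
        assert(g_arena_used <= sizeof(g_arena));
        return p;
    }
    return std::malloc(size);  // stand-in for scalable_malloc
}

void wrapped_free(void* p) {
    // Bootstrap blocks are never returned; they live for the whole process.
    if (p == nullptr || from_bootstrap_arena(p)) return;
    std::free(p);  // stand-in for scalable_free
}
```

The price is that memory handed out during startup is never reclaimed, which should be acceptable if only a handful of small blocks are requested that early.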
Not officially announced yet, but there are rumours about the new device… It will also feature dual SIM, which is very practical: buy a SIM from at least two providers in your country and configure your calls to go through the right provider.