Threading Building Blocks on HP-UX, continued

Scenario:

a legacy server application that uses 1 GB of memory.

Using the TBB allocator:

memory usage dropped to 1/3, that is, about 300 MB!

This is not true: the measurement was wrong. However, there is still a gain of about 17% in memory usage.

But:

it is impossible to replace malloc completely, because TBB itself calls malloc to allocate the large blocks from which it carves out the small blocks.

(Partial) solution:

override the C++ operator new and redirect it to the TBB allocator.
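A minimal sketch of this partial solution, assuming TBB's C entry points scalable_malloc and scalable_free from tbb/scalable_allocator.h (library tbbmalloc). USE_TBB_MALLOC is a hypothetical build switch; without it the sketch falls back to std::malloc so it builds anywhere. Only allocations made through new and delete are redirected: C code, libpthread and TBB's own large-block requests keep calling the system malloc, which presumably also keeps libpthread's start-up allocations away from TBB.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// USE_TBB_MALLOC is a hypothetical build switch: define it and link with
// tbbmalloc to route C++ allocations to TBB's scalable allocator.
// Without it, the sketch falls back to std::malloc so it builds anywhere.
#ifdef USE_TBB_MALLOC
#include <tbb/scalable_allocator.h>
static void* raw_alloc(std::size_t n) { return scalable_malloc(n); }
static void raw_free(void* p) { scalable_free(p); }
#else
static void* raw_alloc(std::size_t n) { return std::malloc(n); }
static void raw_free(void* p) { std::free(p); }
#endif

// Replaceable global operator new: retry through the new_handler as the
// standard requires, and throw std::bad_alloc when none is installed.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // must still return a unique non-null pointer
    for (;;) {
        if (void* p = raw_alloc(size)) return p;
        std::new_handler handler = std::get_new_handler();
        if (handler == nullptr) throw std::bad_alloc();
        handler();
    }
}

void* operator new[](std::size_t size) { return ::operator new(size); }

// Replace the nothrow forms too, so that all forms of new and delete
// agree on the underlying allocator.
void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    try { return ::operator new(size); } catch (const std::bad_alloc&) { return nullptr; }
}
void* operator new[](std::size_t size, const std::nothrow_t&) noexcept {
    try { return ::operator new[](size); } catch (const std::bad_alloc&) { return nullptr; }
}

void operator delete(void* p) noexcept { if (p) raw_free(p); }
void operator delete[](void* p) noexcept { ::operator delete(p); }
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }
void operator delete[](void* p, std::size_t) noexcept { ::operator delete(p); }
void operator delete(void* p, const std::nothrow_t&) noexcept { ::operator delete(p); }
void operator delete[](void* p, const std::nothrow_t&) noexcept { ::operator delete(p); }
```

With TBB available, one would build it, for example, as `g++ -DUSE_TBB_MALLOC app.cpp -ltbbmalloc`.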

Recommendation for TBB:

extend the allocator function with an additional parameter: a pointer to the default malloc function, which TBB would then use to allocate its large blocks.
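The recommendation can be illustrated with a toy allocator. This is a hypothetical sketch, not TBB's API: RawAllocFn and SmallBlockPool are invented names. The pool carves fixed-size small blocks out of large chunks and obtains those chunks through a caller-supplied function pointer instead of calling malloc itself, so a program that redirects malloc to such an allocator could hand it the original system malloc for the backing memory.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical: the "pointer to the default malloc function" passed in.
using RawAllocFn = void* (*)(std::size_t);

// Toy fixed-size small-block allocator. Small blocks come from large
// chunks, and the chunks come from raw_alloc rather than a hard-wired malloc.
class SmallBlockPool {
public:
    SmallBlockPool(std::size_t block_size, RawAllocFn raw_alloc)
        : block_(round_up(block_size)), raw_alloc_(raw_alloc) {}

    std::size_t block_size() const { return block_; }

    void* allocate() {
        if (free_list_ != nullptr) {  // reuse a freed block first
            Node* n = free_list_;
            free_list_ = n->next;
            return n;
        }
        if (static_cast<std::size_t>(end_ - cur_) < block_) {
            if (block_ > kChunkSize || !refill()) return nullptr;
        }
        void* p = cur_;
        cur_ += block_;
        return p;
    }

    void deallocate(void* p) {  // push the block onto the free list
        if (p == nullptr) return;
        free_list_ = ::new (p) Node{free_list_};
    }

private:
    struct Node { Node* next; };
    static constexpr std::size_t kChunkSize = 64 * 1024;

    // Each block must hold a free-list link and stay aligned for it.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(Node);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    bool refill() {
        // Simplification: chunks are never returned to the backing allocator.
        char* chunk = static_cast<char*>(raw_alloc_(kChunkSize));
        if (chunk == nullptr) return false;
        cur_ = chunk;
        end_ = chunk + kChunkSize;
        return true;
    }

    std::size_t block_;
    RawAllocFn raw_alloc_;
    Node* free_list_ = nullptr;
    char* cur_ = nullptr;
    char* end_ = nullptr;
};
```

The only change to the allocator's interface is the extra function-pointer argument; everything else stays internal, which is why the recommendation is a small extension rather than a redesign.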

Threading Building Blocks on HP-UX, the chicken and the egg

I was reminded again of Intel’s Threading Building Blocks (TBB). The default allocator on HP-UX (PA-RISC) allocates 16 bytes of memory even for blocks smaller than 8 bytes. If your system allocates hundreds of megabytes of useful memory in small blocks of less than 8 bytes, this means a serious overhead! I tried Hoard (version 2.1.2a), the only one that compiles on HP-UX PA-RISC, but even Hoard allocates 16 bytes.
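A rough worked example of that overhead, assuming N blocks of useful size s with 1 ≤ s ≤ 8 bytes, an allocator granularity of g bytes, and no other per-block bookkeeping:

$$
M(g) = N \cdot g \left\lceil \frac{s}{g} \right\rceil, \qquad 1 \le s \le 8: \quad M(16) = 16N = 2\,M(8)
$$

So data that fits in 300 MB of 8-byte slots would need 600 MB of 16-byte slots.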

TBB seems to allocate 8 bytes, but there is a small obstacle to overcome. During initialisation of a multithreaded application, the first library to be initialised is libpthread, probably because of some global variable. This initialisation needs memory, and if you redirect malloc to the TBB allocator, the process deadlocks: libpthread calls malloc, the TBB allocator initialises itself and calls pthread_key_create, which then blocks in __spin_lock, as the following call stack shows:

#0 0xc020ba90 in __ksleep+0x10 () from /lib/libc.2
#1 0xc006023c in __spin_lock+0x10c () from /lib/libpthread.1
#2 0xc0057dc0 in pthread_key_create+0x34 () from /lib/libpthread.1
#3 0xc504cc40 in _Z17initMemoryManagerv () at ../../src/tbbmalloc/MemoryAllocator.cpp:1217
#4 0xc504cda8 in _Z19checkInitializationv () at ../../src/tbbmalloc/MemoryAllocator.cpp:1238
#5 0xc504d25c in scalable_malloc () at ../../src/tbbmalloc/MemoryAllocator.cpp:1343
#6 0xc07cf3b8 in malloc+0x58 () from custom_malloc.cpp
#7 0xc005845c in __pthread_specific_startup+0x20 () from /lib/libpthread.1
#8 0xc00597ec in __pthread_startup+0x1a8 () from /lib/libpthread.1
#9 0xc0027e58 in finish_dld_main+0x1520 () from /usr/lib/dld.sl
#10 0xc00154b8 in _dld_main+0x1c8 () from /usr/lib/dld.sl
#11 0x23b8e08 in __map_dld+0x650 ()
#12 0x23b82dc in $START$+0xd4 ()
#13 0xc006023c in __spin_lock+0x10c () from /lib/libpthread.1