If you have dozens or hundreds of threads doing work, they are probably producing some sort of output. I'm not referring to logger output, but to permanent storage, perhaps in a database. If you cannot store the data as quickly as you produce it, you will eventually run into problems. If you are just barely able to keep up with data storage, then your scalability will be limited.
While debugging my massively multithreaded C++ application, I would notice times when the application seemed to pause for a few moments. During one of these pauses I halted the application and attached to it with the debugger (GDB). From within GDB I listed the threads (info threads), switched to each one (thread <num>) and looked at its stack (bt).
I saw something surprising and very telling. Nearly every single thread that was supposed to be performing work was actually blocked on a mutex inside of either
If your application does not scale as your threads increase, you should check the code to make sure there are no hidden mutexes limiting your concurrency.
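To make the effect concrete, here is a minimal sketch of how a single shared mutex can serialize threads that otherwise have nothing to do with each other. It is not taken from the application described above; the function names count_with_shared_lock and count_with_local_totals are hypothetical and exist only for illustration. Both functions return the same total, but the first takes the lock on every iteration while the second takes it once per thread.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: every increment takes the same mutex, so the
// worker threads spend most of their time waiting on each other.
std::uint64_t count_with_shared_lock(unsigned num_threads, std::uint64_t per_thread) {
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_threads; ++i) {
        workers.emplace_back([&] {
            for (std::uint64_t n = 0; n < per_thread; ++n) {
                std::lock_guard<std::mutex> lock(m);  // contended on every iteration
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Same result, but each thread accumulates privately and takes the
// lock exactly once, when it publishes its local total.
std::uint64_t count_with_local_totals(unsigned num_threads, std::uint64_t per_thread) {
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_threads; ++i) {
        workers.emplace_back([&] {
            std::uint64_t local = 0;
            for (std::uint64_t n = 0; n < per_thread; ++n) ++local;
            std::lock_guard<std::mutex> lock(m);  // one acquisition per thread
            total += local;
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

In a real application the shared lock is rarely this obvious. It may live inside a library call that every thread happens to make, which is why looking at the stack of each blocked thread is so useful.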
Let's not overcomplicate things here. If you have 160 threads all trying to run concurrently, even if they are doing little to no work, they are all still doing some work. There's no reason to make the threads work harder than they need to.
This is the beginning of a short series of articles related to optimizing massively multithreaded C++ applications. I'm not entirely sure what the exact definition of "massively multithreaded" is, but for our purposes, let's assume at least twice as many threads as CPU cores.
Having so many OS-level threads may not be the most efficient way of handling concurrency, but it is legitimate if most of your threads spend most of their time waiting. They may be waiting on a timeout, as in a timer thread that fires at regular intervals, or a message processing thread that is waiting on IO.
In the "C++" example we showed an object that automatically managed a thread's lifetime and used the thread to calculate the Fibonacci sequence in the background. Using templates we should be able to make a generic version of the Fibonacci calculator thread.
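One possible shape for such a generic version is sketched below. This is only a guess at the idea and not the class from the earlier example: the names BackgroundCalculation and fibonacci are hypothetical. The template parameter is the result type, the constructor accepts any callable that returns it, and the destructor joins the thread so the object still manages the thread's lifetime.

```cpp
#include <cstdint>
#include <thread>
#include <utility>

// Hypothetical generic wrapper: runs any callable returning Result on its
// own thread and joins that thread automatically on destruction.
template <typename Result>
class BackgroundCalculation {
public:
    template <typename Fn>
    explicit BackgroundCalculation(Fn fn)
        : worker_([this, fn = std::move(fn)]() mutable { result_ = fn(); }) {}

    ~BackgroundCalculation() {
        if (worker_.joinable()) worker_.join();
    }

    BackgroundCalculation(const BackgroundCalculation&) = delete;
    BackgroundCalculation& operator=(const BackgroundCalculation&) = delete;

    // Blocks until the calculation has finished, then returns the result.
    // Intended to be called from the owning thread only.
    const Result& get() {
        if (worker_.joinable()) worker_.join();
        return result_;
    }

private:
    Result result_{};     // declared before worker_ so it exists when the thread starts
    std::thread worker_;
};

// Iterative Fibonacci, used here as the sample workload.
std::uint64_t fibonacci(unsigned n) {
    std::uint64_t a = 0, b = 1;
    for (unsigned i = 0; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}
```

With this sketch, BackgroundCalculation<std::uint64_t> fib([] { return fibonacci(10); }); starts the work immediately, and fib.get() waits for the thread and yields 55. The Fibonacci calculator becomes just one instantiation, and any other callable with a default-constructible, assignable result type would work the same way.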