A good exercise for learning concurrent programming in any language would be to work on a thread pool implementation. You can use a thread pool as a general-purpose design pattern for concurrency. In this pattern you create some threads in advance; those threads are treated as a resource. A thread pool object/structure is used to assign user-defined tasks to those threads for execution. When a task is finished you can collect its results.

You should have a look at OpenMP for this. The C/C++ example on this page is similar to your code:

#pragma omp parallel shared(a,b,c,d) private(i)

If you prefer not to use OpenMP you could use either pthreads or clone/wait directly. No matter which route you choose, you are just dividing your arrays into chunks which each thread will process. If all of your processing is purely computational (as suggested by your example function), then you should do well to have only as many threads as you have logical processors. There is some overhead in adding threads for parallel processing, so make sure you give each thread enough work to make up for it. Usually you will, but if each thread ends up with only one computation to do, and the computations aren't that difficult, you may actually slow things down. You can always use fewer threads than you have processors if that is the case. If you do have some I/O going on in your work, you may find that having more threads than processors is a win, because while one thread is blocked waiting for I/O to complete, another thread can be doing its computations. You have to be careful doing I/O to the same file from multiple threads, though.