Parallel computations (multicore SGD)
An attempt at parallelizing SGD across multiple cores failed, because the
Gccjit backend computations are bottlenecked by memory accesses.
Further work in this direction would need to, e.g., copy the relevant sub-tensors for each of the parallel tasks, so that each task works on its own data.
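
The idea of giving each parallel task its own copy of the data can be sketched as below. This is a minimal illustration in plain Python, not the project's code and not the Gccjit backend: the model (a single scalar weight fitting y = w * x), the function names `shard_gradient` and `parallel_sgd_step`, and the use of a thread pool are all assumptions made for the example. Python threads will not show a real multicore speedup here; the point is only the data layout, where each task receives a private slice and only one scalar per task is combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, xs, ys):
    """Gradient of the mean squared error of y = w * x over one shard."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def parallel_sgd_step(w, xs, ys, num_tasks, lr):
    """One SGD step where every task works on its own copy of a data shard.

    Hypothetical helper: assumes xs and ys are non-empty lists of equal length.
    """
    # Ceiling division so that at most num_tasks shards are produced.
    size = (len(xs) + num_tasks - 1) // num_tasks
    # List slicing creates new lists, so each task owns a private copy
    # of its sub-range instead of reading from the shared inputs.
    shards = [(xs[i:i + size], ys[i:i + size]) for i in range(0, len(xs), size)]

    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        futures = [pool.submit(shard_gradient, w, sx, sy) for sx, sy in shards]
        grads = [f.result() for f in futures]

    # Combine per-shard gradients, weighting each by its shard size,
    # which reproduces the gradient over the full data set.
    grad = sum(g * len(sx) for g, (sx, _) in zip(grads, shards)) / len(xs)
    return w - lr * grad


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2
    w = 0.0
    for _ in range(50):
        w = parallel_sgd_step(w, xs, ys, num_tasks=2, lr=0.05)
    print(w)
```

Whether per-task copies pay off depends on the copy cost relative to the compute per task, which this sketch does not measure.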