This muddled the code too much and only gave a marginal performance improvement in the grand scheme of things. Other ways to parallelize MCTS should be nicer to implement and could yield better results.
I'm stupid and forgot that it divided the number of iterations by 8. No wonder it was so fast lmao
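
A rough sketch of how to keep this kind of timing comparison honest, assuming the parallel version splits one iteration budget across workers. The helper names (`per_worker_budget`, `iterations_per_second`) and all the numbers are made up for illustration and are not from the actual code. The idea is to compare throughput (iterations actually run per second) rather than raw wall-clock time, so a run that quietly did 1/8 of the work doesn't look like a speedup.

```python
def per_worker_budget(total_iterations: int, workers: int) -> list[int]:
    """Split an iteration budget across workers so the counts sum to the total."""
    base, extra = divmod(total_iterations, workers)
    # The first `extra` workers each take one leftover iteration.
    return [base + 1 if i < extra else base for i in range(workers)]


def iterations_per_second(iterations_run: int, seconds: float) -> float:
    """Throughput, which stays comparable across runs with different budgets."""
    return iterations_run / seconds


# Hypothetical timings, for illustration only (not measured results).
full_budget = iterations_per_second(8000, 4.0)        # ran all 8000 iterations
divided_budget = iterations_per_second(8000 // 8, 0.6)  # only ran 1000 iterations

# The second run finished sooner, but did less work per second.
print(full_budget > divided_budget)
```

With the budget split explicitly like this, the per-worker counts always add back up to the total, so it is easy to assert that both runs did the same amount of search before comparing times.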