I don't mean this in a "clever" sort of way, but what difference does it make? Yes, I suppose 150ms is a long time in computer terms, and 300ms certainly is, but if it is turning on a motor that takes seconds to get up to full speed, then the delay disappears, particularly when there will be other electronics in the line, e.g. a digispeed and inverter.
In answer to your question, and this might be the answer:
M3, M4 and M5 commands are, by their nature, not time-dependent in the same way that, say, pulses on an axis are, and therefore might have to "step aside" in the scheme of things so that more time-critical things get in at the right moment.
Have you tried to see what delays are present on the other outputs, say the coolant? Is the delay a product of the fact that, although they share one socket, the top four output lines are actually at a different address to the bottom eight?
If you are using a serial output, such as a Smooth Stepper, then again there will be a priority order that each signal must follow.
Yesterday I was writing a macro to bore out some gears. I used a 'code"M3"' call, but to all intents and purposes the macro just ignored it, because I had a "wait till key pressed" instruction before it. It seems that the code is written in any case, and when the key is pressed it just shoots through the instructions.

I ended up writing a plain old G-code routine instead.