Yes, they are quick! The issue is that Mach4 doesn't know exactly when the motion controller has finished the last movement. Mach4 holds up the M code until the last waypoint is sent to the motion controller, but there is going to be some latency between the time the data is sent and the time the movement is actually done. So the physical motion always lags the position in the G code file. This latency is PC/motion controller specific and will be different on just about every system.
The good news is that you can slow down the M codes with a wx.wxMilliSleep(ms) call. If you are consistently 10ms fast, put wx.wxMilliSleep(10) on the first line of the M code.
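Here is a rough sketch of what that could look like in an M code script. The macro number m110, the 10ms figure and the use of output 1 are only placeholders for illustration, and I am assuming the usual Mach4 Lua calls (mc.mcGetInstance, mc.mcSignalGetHandle, mc.mcSignalSetState), so check them against your build and measure the lag on your own machine.

```lua
-- m110.mcs (hypothetical macro number; use any unused M code)
-- Delays the output change so it lines up better with the physical motion.

local LAG_MS = 10 -- assumed latency in milliseconds; measure it on your system

function m110()
    local inst = mc.mcGetInstance()

    -- First line of real work: wait for the motion to catch up.
    wx.wxMilliSleep(LAG_MS)

    -- Turn on output 1 (placeholder; substitute the signal you actually use).
    local hsig = mc.mcSignalGetHandle(inst, mc.OSIG_OUTPUT1)
    mc.mcSignalSetState(hsig, 1)
end

-- Lets the script be run directly from the Mach4 script editor for testing.
if (mc.mcInEditor() == 1) then
    m110()
end
```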
That is something to try, but it is not the best solution. Eventually, we will have a facility to turn on output in coordination with motion. We may use the LinuxCNC M62 and M63 type of thing. It seems that there is no standard at all to follow. Maybe if we do it like LinuxCNC does, it will "create" a standard. However, LinuxCNC's current form doesn't go far enough, IMHO. It only provides for turning output on or off at the beginning of a move. It would be nice to be able to do the same at the end of a move. Then there is CV (constant velocity) to consider! It all gets a bit complicated.
http://linuxcnc.org/docs/html/gcode/m-code.html#sec:M62-M65
M62 P1 (Turn output 1 on at the beginning of next move)
M63 P2 (Turn output 2 off at the beginning of next move)
G01 X1 Y1 (The movement. Outputs 1 and 2 will be coordinated with the start of this motion.)
Steve