Try replacing doSpinCW() with activateSignal(OUTPUT1) in M3, and doSpinStop() with deactivateSignal(OUTPUT1) in M5. That got both delays down to around 0.1 secs for me.

Then, if you like, make M1000.m1s and M1001.m1s (or whatever free numbers you prefer) containing activateSignal(OUTPUT3) and deactivateSignal(OUTPUT3) respectively. Obviously you'll need to change your pin to Output3 in Ports and Pins, and call M1000 and M1001 instead of M3 and M5. That got the delays down to around 0.05 secs on my system. Though why different M numbers and Output 3 should be faster than M3/M5 and Output 1 (especially on the same pin!) I don't know.
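For reference, here is a rough sketch of what the two custom macro files described above might contain. It assumes Mach3-style .m1s macros where a leading apostrophe marks a comment. The numbers M1000/M1001 are just the examples from this post, and the pin must already be mapped to Output3 in Ports and Pins.

```
' --- M1000.m1s : switch the output on (use in place of M3) ---
' Drives the output signal directly rather than going through doSpinCW().
activateSignal(OUTPUT3)

' --- M1001.m1s : switch the output off (use in place of M5) ---
' Save this line in its own file; the two macros are shown together only for brevity.
deactivateSignal(OUTPUT3)
```

In the G-code you would then call M1000 where you previously had M3, and M1001 where you had M5.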
Hope this helps.
Ian