I LOVE Retroshields!
Not because I made them, but because they make debugging HW/SW so easy and very educational.
Recently, I was bringing up the Retroshield 6800 prototype. I quickly got MIKBUG and SWTBUG running. The next step was to run FLEX 2.0 using the DSK images on the Teensy SD card. That’s where my life felt like it was going down the drain: I spent 2 weeks with no progress. Until today. In hindsight, I literally wasted 2 weeks. At least I learned a great deal in the process.

What did I learn?
- The FLEX boot process is freaking SMART.
- Do not overwrite the ROM boot loader that loads the bootloader from disk.
Simple? Yeah, right.
Special Thanks!
I want to give a shout-out to Joseph H. Allen for his EXORsim - Motorola EXORciser Simulator. When I couldn’t make progress with my 1797 emulation, I brought in his 1771 floppy emulation and got the same behavior. This meant the bug was somewhere else, and that helped me figure it out. Thank you, Joseph.
Root-cause
When SWTBUG runs the disk boot routine, it issues a multi-sector read starting at Track 0/Sector 0 and copies the data into RAM starting at $A100. A multi-sector read means reading the rest of the sectors on that track: 72 sectors * 256 bytes/sector = 18,432 bytes, or $4800 in hex. When the floppy disk controller reports no more sectors, the disk boot routine jumps to $A100, which is the beginning of sector 0.
* MINIFLOPPY DISK BOOT
E28F 7F 8014  DISK   CLR   $8014    ; Select Disk 0
E292 8D 2E           BSR   DELAY
E294 C6 0B           LDA B #$0B     ; Issue RESTORE command (Track0)
E296 8D 25           BSR   RETT2
E298 E6 04    LOOP1  LDA B 4,X      ; Wait till not BUSY
E29A C5 01           BIT B #1
E29C 26 FA           BNE   LOOP1
E29E 6F 06           CLR   6,X      ; Choose Sector 0
E2A0 8D 1D           BSR   RETURN
E2A2 C6 9C           LDA B #$9C     ; Read-Multi command
E2A4 8D 17           BSR   RETT2
E2A6 CE A100         LDX   #$A100   ; Load Address - A100
E2A9 C5 02    LOOP2  BIT B #2
E2AB 27 06           BEQ   LOOP3    ; Wait for DRQ bit
E2AD B6 801B         LDA A $801B    ; DRQ=1: We can read next byte
E2B0 A7 00           STA A 0,X      ; and save to RAM
E2B2 08              INX
E2B3 F6 8018  LOOP3  LDA B $8018    ; Check BUSY bit
E2B6 C5 01           BIT B #1
E2B8 26 EF           BNE   LOOP2    ; BUSY=1, we are still reading
E2BA 7E A100         JMP   $A100    ; BUSY=0, multi-sector read complete. Execute.
E2BD E7 04    RETT2  STA B 4,X
E2BF 8D 00    RETURN BSR   RETT1    ; simple delays
E2C1 39       RETT1  RTS
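For readers who don't speak 6800 assembly, here is a rough C model of the LOOP2/LOOP3 copy loop. It is only a sketch: `fake_fdc`, `fdc_status`, `fdc_read_data` and `boot_copy` are hypothetical names standing in for the real controller registers at $8018/$801B, and the only thing taken from the listing is the meaning of the status bits (bit 0 = BUSY, bit 1 = DRQ).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDC_BUSY 0x01  /* status bit 0: command still in progress */
#define FDC_DRQ  0x02  /* status bit 1: a data byte is ready */

/* Hypothetical stand-in for the floppy controller: hands out `len` bytes. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t pos;
} fake_fdc;

/* Status register model: BUSY and DRQ stay set until all bytes are read. */
static uint8_t fdc_status(const fake_fdc *f) {
    return (f->pos < f->len) ? (uint8_t)(FDC_BUSY | FDC_DRQ) : 0;
}

/* Data register model: returns the next byte of the track. */
static uint8_t fdc_read_data(fake_fdc *f) {
    return f->data[f->pos++];
}

/* Mirrors LOOP2/LOOP3: copy bytes while BUSY, then return the jump target.
   `mem` must point at a full 64 KB image of the address space. */
static uint16_t boot_copy(fake_fdc *f, uint8_t *mem, uint16_t load_addr) {
    assert(f != NULL && mem != NULL);
    uint16_t x = load_addr;                  /* LDX #$A100 */
    for (;;) {
        uint8_t status = fdc_status(f);      /* LOOP3: LDA B $8018 */
        if (!(status & FDC_BUSY)) {          /* BUSY=0: read complete */
            break;
        }
        if (status & FDC_DRQ) {              /* LOOP2: BIT B #2 */
            mem[x++] = fdc_read_data(f);     /* LDA A $801B / STA A 0,X / INX */
        }
    }
    return load_addr;                        /* JMP $A100 */
}
```

Note that nothing in this loop checks where `x` is pointing. It keeps storing for as long as the controller keeps delivering bytes.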
First, why I think the FLEX boot process is smart: the directory structure is stored on Track 0, after the bootloader. A multi-sector read by SWTBUG means the directory structure also ends up in RAM. And that’s what I think the bootloader does: it reads the first file in the directory structure into memory and runs it. This happens to be FLEX2.SYS.
On to bullet item 2. Here was the problem:
A100 - E900: Bring all sectors from Track 0
E200 - E3FF: SWTBUG monitor code
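The two ranges above can be checked with a few lines of arithmetic. This is a minimal sketch using the numbers from this post (72 sectors of 256 bytes loaded at $A100, monitor code at $E200-$E3FF); `addr_range`, `load_range` and `ranges_overlap` are names I made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). */
typedef struct {
    uint32_t start;
    uint32_t end;
} addr_range;

/* Range filled by copying `sectors` sectors of `sector_size` bytes to `load_addr`. */
static addr_range load_range(uint32_t load_addr, uint32_t sectors, uint32_t sector_size) {
    assert(sector_size > 0);
    addr_range r = { load_addr, load_addr + sectors * sector_size };
    return r;
}

/* Two half-open ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(addr_range a, addr_range b) {
    return a.start < b.end && b.start < a.end;
}
```

With `load_range(0xA100, 72, 256)` the end address comes out as $E900, and comparing that against a monitor range of `{0xE200, 0xE400}` reports an overlap.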
There is an overlap! In real systems, SWTBUG is in EPROM. In the Retroshield software, it is possible to put the ROM into Teensy RAM (shadow RAM) so I can override it, for example to change the boot vector, place my custom I/O drivers, or disable unnecessary delays. As I usually do, I copied code from another shield and forgot that the ROM was copied to RAM. As a result, the track 0 multi-sector read was overwriting the SWTBUG bootloader routine. And I scratched my head for a week trying to figure out why the multi-sector read stopped in the middle of sector 66, magically every time, as seen in the log that follows the configuration code below:
////////////////////////////////////////////////////////////////////
// Monitor Code
////////////////////////////////////////////////////////////////////
#define STORE_ROM_IN_FLASH 0 // 1: Read-only ROM. 0: Shadow-RAM
#ifdef ROM1_START
#if (STORE_ROM_IN_FLASH)
const unsigned char EPROM1[] PROGMEM = {
#else
unsigned char EPROM1[] = {
#endif
#include "eeprom1.h"
};
#endif
FD1771 restore!
Set sector = 0
FD1771 read multiple
Read sector drive=0, track=0, sector=1
Read state 2: 00: 8e
Read state 2: 01: a0
Read state 2: 02: 7f
Read state 2: 03: 20
...
Read state 2: fe: 00
Read state 2: ff: 00
Sector 1 done
Read sector drive=0, track=0, sector=2
Read state 2: 00: 00
Read state 2: 01: 03
Read state 2: 02: 00
Read state 2: 03: 00
...
Sector 65 done
Read sector drive=0, track=0, sector=66
Read state 2: 00: 00
Read state 2: 01: 43
Read state 2: 02: 00
Read state 2: 03: 00
...
Read state 2: ab: 00
Read state 2: ac: 00
Read state 2: ad: 00
FD1771: FIXME: Missing Read Port5, 0000
FD1771: FIXME: Missing Read Port5, 0001
FD1771: FIXME: Missing Read Port5, 0002
FD1771: FIXME: Missing Read Port5, 0003
Read state 2: ae: 00
In hindsight, once the track 0 data starts overwriting $E2A9, we are executing random code: $A100 + (66-1) sectors * 256 bytes + $AD = $E2AD (sector numbers start at 1, not 0).
E2A9 C5 02    LOOP2  BIT B #2
E2AB 27 06           BEQ   LOOP3    ; Wait for DRQ bit
E2AD B6 801B         LDA A $801B    ; DRQ=1: We can read next byte
E2B0 A7 00           STA A 0,X
E2B2 08              INX
E2B3 F6 8018  LOOP3  LDA B $8018    ; Check BUSY bit
E2B6 C5 01           BIT B #1       ; If BUSY=1, we are still reading
E2B8 26 EF           BNE   LOOP2    ; If BUSY=0, multi-sector read complete.
When the STA A 0,X at $E2B0 stores disk data over the LOOP2 BIT B #2 instruction at $E2A9, the BNE LOOP2 at $E2B8 jumps to whatever the disk had for us to execute.
Once I set that #define to keep the SWTBUG image as ROM, it worked right away! You can see the Retroshield code complaining that the CPU is trying to write to the ROM section.
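The address arithmetic can also be run in the other direction: given a RAM address, which sector and byte offset of track 0 lands there? A small sketch, assuming the load address and sector size from this post; `sector_pos` and `locate` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Position of a byte within the track: 1-based sector, 0-based offset. */
typedef struct {
    unsigned sector;
    unsigned offset;
} sector_pos;

/* Map a RAM address back to the sector/offset that was copied there. */
static sector_pos locate(uint16_t addr, uint16_t load_addr, unsigned sector_size) {
    assert(addr >= load_addr && sector_size > 0);
    unsigned delta = (unsigned)addr - (unsigned)load_addr;
    sector_pos p = { delta / sector_size + 1,   /* sectors are numbered from 1 */
                     delta % sector_size };
    return p;
}
```

Calling `locate(0xE2AD, 0xA100, 256)` gives sector 66, offset $AD, which is where the log starts reporting missing reads.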
#define STORE_ROM_IN_FLASH 1 // 1: Read-only ROM. 0: Shadow-RAM
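To show what that switch changes from the CPU's point of view, here is a simplified sketch of a bus write handler. This is not the actual Retroshield code: `bus_write`, `ram64k`, `rejected_writes`, `MON_START` and `MON_END` are hypothetical, and the monitor window is just the range quoted earlier in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Monitor window as listed in this post; the macro names are hypothetical. */
#define MON_START 0xE200u
#define MON_END   0xE3FFu
static_assert(MON_START <= MON_END, "monitor window must not be empty");

static uint8_t ram64k[0x10000];     /* flat 64 KB address space */
static unsigned rejected_writes;    /* how many ROM writes were blocked */

/* Handle a CPU write cycle. Returns false when the write was rejected. */
static bool bus_write(uint16_t addr, uint8_t data, bool rom_read_only) {
    if (rom_read_only && addr >= MON_START && addr <= MON_END) {
        rejected_writes++;          /* real code could log a complaint here */
        return false;               /* ROM contents stay intact */
    }
    ram64k[addr] = data;            /* shadow-RAM mode: monitor is overwritable */
    return true;
}
```

With `rom_read_only` set, the boot loop can store track 0 data all the way up to $E900 and the bytes aimed at the monitor are simply dropped, so the loop at $E2A9 survives.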

And BASIC:

Closing
My engineering career taught me two things:
- If engineers in a room can’t agree on the same thing, they must have different assumptions that you need to check or resolve.
- If an engineering problem takes more than 1-2 days to solve, then it will be a hairy but interesting problem (the more smart people in the room, the more once-in-a-lifetime the bug).
This bootloader overwriting the boot loader reminded me of the NAND bug we fixed on iPod nanos.
The QA team reported that one iPod out of, say, 100 would get stuck while playing audio. This would happen about once a day. The SW team got involved, but it was very difficult to replicate the issue. We asked the factory to have all the operators take an iPod and try to replicate the issue :) Engineering in Cupertino was also working 24 hours a day trying to replicate it. Debugging went on for about a week. We learned a few things but found no real root cause. One day, HW and SW folks were in the room debugging an iPod. Luckily I was there too. When we saw what we saw on the oscilloscope, we let out the loudest BOOOYYAAAAHHH I ever heard :) I think we all cried. It turns out the issue was the NAND power supply timing. Whenever NAND was not needed, we turned off NAND power. When the regulator turns off, you need to wait for the voltage to go down to 0V (or below critical thresholds, such as the NAND Power-On-Reset level (hint, hint)). If you turn the power back on too quickly, the rail only goes down to 1.5V and then back to 3.3V. In other words, your 3.3V NAND is briefly running at an out-of-spec 1.5V, and if you don’t reset it, of course it will get confused and not respond to incoming NAND commands. Ahh, good old days of sweat, blood and fun…