Beamhacking: Syncing

As discussed previously, our first step towards higher resolution graphics on the TRS-80 Model 3 is to get in step. We can't very well change graphics mid-beam if we don't know where it is. As luck would have it the so-called clock interrupt that goes off every 1/30th of a second is triggered by the video hardware. Once that interrupt comes along we know the beam is just past the visible portion of the screen. Or to get a little more precise, the interrupt is triggered at the start of line 192.

Our first concern is that our code doesn't get to run the instant the interrupt happens. The Z-80 has to save the program counter, jump to the interrupt handler which does a few things and then calls out to a vector where we can take over. But lots of cycles (or T-states as the Z-80 likes to call them) have gone under the bridge by that point. However, that's not really a problem. Those steps are all fixed — we can either count cycles on the code path once to figure out how much time passes or, more sensibly, give it a try and notice the relative offset as shown by some test program. Either way we're bound to notice the real problem. We'll get slight but significant differences in the Z-80's synchronization with the beam. Sometimes as much as 5 or 6 characters.

That's not a big deal if we're only changing one character per line. But the more certain we are of the position the more time we'll have to change graphics. That's pretty important if we wish to do impressive graphics. Think of the raster beam as a train and you're in a car (the Z-80) at a level crossing. You're fast, but the train is faster. In the mysterious ways of the human animal the quick way to social standing is crossing the tracks as many times as possible without the train killing you. The better you know the train's position the more crossings you can make. The more accurately we synchronize with the beam the better graphics we'll get. No sense in getting anything less than the best result possible.

Well, then, why does the synchronization vary so? Simple. We forgot the very first thing the Z-80 does when it responds to an interrupt. Before it even pushes the program counter it has to finish whatever instruction it was doing. If that happened to be a 16 T-state instruction then it could delay the interrupt as much as 16 T-states (or maybe only 15 but let's not get too picky yet). Keep in mind the train (er, raster) is fast. Every T-state is 5 dots and every character is 8 dots wide. 16 T-states is 80 dots which is 10 characters. And a 16 T-state instruction isn't even the slowest one around -- the drift could be worse.
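
For a sense of the spread, here are a few ordinary Z-80 instructions with their T-state counts, written in the same <n> comment style as the listing further down. They are picked purely as illustrations (the 4013h operand is just a placeholder address); the interrupt can be held off by most of the length of whichever one happens to be executing:

        nop                     ;<4>
        ld      a,(hl)          ;<7>
        push    hl              ;<11>
        ld      hl,(4013h)      ;<16>
        ex      (sp),hl         ;<19>
        ld      ix,(4013h)      ;<20>
        ex      (sp),ix         ;<23>

At 5 dots per T-state and 8 dots per character, the 23 T-state case works out to 115 dots, or a bit over 14 characters.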

We're in a bit of a conundrum here. The whole point of an interrupt is to come when you're not expecting it. How can we possibly get the interrupt to happen when we're running a fast instruction? There are a few ways to do it, but why pass up the chance to use an instruction you otherwise never would: halt. While halted the Z-80 just marks time 4 T-states at a go until an interrupt arrives. We need to be a little careful to avoid bad luck, but this will do the trick:
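
A minimal sketch of that idea, pieced together from the full listing further down. It assumes the clock interrupt is already enabled and hooked to our own handler, which is not shown here:

        di                      ; keep interrupts out while we set up
        in      a,(0ech)        ; clear the clock latch so a stale interrupt can't fire
        ei                      ; takes effect only after the next instruction...
        halt                    ; ...so nothing sneaks in before we are halted

Since ei holds interrupts off until the instruction after it has executed, the Z-80 is guaranteed to be sitting in the halt when the interrupt lands.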

Let's not get too excited. 4 T-states is still 2.5 characters — much better than before but can we do better? Improving what happens while waiting for an interrupt isn't possible. There are no instructions faster than 4 T-states. I did find a complicated way to get down to a single T-state. Run a tight loop while waiting for the interrupt. When the interrupt happens, find out where you were in the loop when it went off. Now adjust the phase of the loop (i.e., at what T-state it starts) until you see the interrupt happen at a different instruction. Apply a bit of thinking, modulo arithmetic and you'll see how the phase of the loop can match the phase of the interrupt. But forget about that, there's a much, much easier way.

One of the many ways the Model 3 differs from the Model 1 is the addition of circuitry to eliminate (mostly) memory conflicts between the Z-80 and the video display. On the Model 1, if the Z-80 and video display both needed memory at the same time, the Z-80 got to do its thing and the video display was handed a zero for that character slice. In other words, any time the Z-80 read or wrote video memory there was a reasonable chance it would result in a glitch on the screen. You can see that effect in this video of the classic termites program (but for goodness sake don't watch the whole thing).

That kind of nonsense doesn't happen on the Model 3. The video is given priority and the Z-80 is told to wait. A pretty good arrangement, that. Keeps things tidy. Most importantly it's just the ticket to get into sync with the beam. Just access the screen when the beam is drawing and we'll be suspended until the end of the line. We don't have to be too accurate about the whole thing as long as we know we'll be somewhere within the line. Our new, improved technique goes like this:
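
In outline, and only as a sketch: the wait label is hypothetical, the interrupt hooking is left out, and it assumes a handler that simply returns, so the loop count would have to be tuned to whatever overhead your own handler adds. The full listing further down is what I actually run.

        ei
        halt                    ; step 1: wake within 4 T-states of the interrupt

        ld      bc,377          ; step 2: idle until the beam is on a visible line
wait:
        dec     bc              ;<6>   (377 is the count used in the full listing)
        ld      a,b             ;<4>
        or      c               ;<4>
        jp      nz,wait         ;<10>

        ld      a,(3c00h)       ; step 3: read video memory (3c00h = 15360); with video
                                ; waits on, the Z-80 is held until the end of the line

The delay only has to land somewhere inside a displayed line. The video wait soaks up whatever error is left.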

But can we do better? I don't believe so, though to be sure this synchronization method is not perfect. Experiments make it clear that the relative phase of the dot clock and the Z-80 clock is not guaranteed. Yes, one T-state starts for every 5 dots, but if you count the dots modulo 5 (0, 1, 2, 3, 4, 0, 1, 2, ...) maybe every T-state starts at dot 1, or perhaps they all start at dot 3. If there is a way to change that phase I haven't found it. We can be confident that any such technique would involve the Z-80 poking at hardware control registers, since the Z-80 by itself can only step time forward one T-state at a time. But at least we got down to 1 T-state, and that appears to amount to a slop factor of one character. As long as your video timing leaves a character between the Z-80 operation and the display of the data you'll not have any problems.

Here's the code I'm currently using to get in sync with the beam. It's a little different than described but gets the job done.

; Synchronize to within 1 T-State of the beam.  Experiments indicate that
; the dot clock is not locked to the processor clock so you can count on
; some variance in blanking.  But we always seem to hit a fixed starting
; character.  Allowing +/- 1 character on your blanking should be sufficient.

sync1   macro   beamtop

; Ensure video waits are on.
        vwon

        di

; Enable clock interrupts

        ld      a,(4213h)
        or      a,4
        ld      (4213h),a
        out     (0e0h),a

; Hook interrupts

        ld      hl,(4013h)
        ld      (sync1chain+1),hl
        ld      hl,sync1irq
        ld      (4013h),hl

; Clear clock latch, just to be sure

        in      a,(0ech)

        ei
        halt

sync1irq:
        push    af
        in      a,(0e0h)
        and     4
        jr      z,sync1clock
        pop     af
sync1chain:
        jp      0

sync1clock:

; Clear clock latch

        in      a,(0ech)

; Restore old interrupt vector

        ld      hl,(sync1chain+1)
        ld      (4013h),hl

; Disable clock interrupts

        ld      a,(4213h)
        and     a,0fbh
        ld      (4213h),a
        out     (0e0h),a

; Wait until top of video; then use video wait to sync to end of line.
; That's 73 lines plus the IRQ overhead.
; Each pass of the loop below is 6+4+4+10 = 24 T-states, so 377 passes
; burn 9048 T-states.

        ld      bc,377
sync1wait:
        dec     bc              ;<6>
        ld      a,b             ;<4>
        or      c               ;<4>
        jp      nz,sync1wait    ;<10>

        ei
        ld      a,(15360)
        jp      beamtop

        endm

; Handy macro for turning off video waits: clear bit 5 (20h) of the value
; read from port 255 (0ffh) and write it to port 236 (0ech).

vwoff   macro

        in      a,(255)
        and     a,0dfh
        out     (236),a

        endm

; Handy macro for turning on video waits: set bit 5 (20h) of the value
; read from port 255 (0ffh) and write it to port 236 (0ech).

vwon    macro

        in      a,(255)
        or      a,20h
        out     (236),a

        endm

Oh, right, you may have noticed something I forgot to mention. Turns out the Model 3 lets you decide whether or not you want video wait states. The code above makes sure video waits are on (vwon) before we intentionally trigger the wait. The vwoff macro is for turning them off again once we've achieved sync, because they're a right pain if you're trying to keep track of cycles and stay in sync. And that's a topic we'll start to address next time.


George Phillips, July 22, 2009, george -at- 48k.ca