From: Christopher Phillips (shrydar_at_jaruth.com)
Date: 2004-11-03 10:57:55
I'm typing this response while away from my somewhat intermittent web connection (still waiting for my new house to be connected), so I may be a few messages out of date, but fwiw..

On 1 Nov 2004, at 04:46, Hatch wrote:

> I'm thinking that EOR filling will be done by the cct, as the cct displays
> the image it EOR fills (takes up no CPU cycles, this was the idea of someone
> on the CSDb forum), there will be a control bit that turns EOR fill on or off
> and 40 addressable bytes for setting the initial byte to EOR with at the
> start of each column and then stores the result ready for the next row.
> With this in mind would horizontal formatting be faster for the 3D
> calculations?

It really depends on whether you use option V or option C, and if option V how the memory is accessed.

I EOR-fill horizontally, not vertically (this way I can pattern fill fairly trivially). But doing the eor-fill in hardware is a nice idea, as is the option of clearing the screen. Clearing isn't necessary for the screen itself when doing eor-fill in software, but the eor-buffer still needs clearing, and it can be faster to zero-fill the whole line than to undo the writes done by the h-events. (In Effluvium I keep the eor-buffer in zero page, but that's still three cycles per byte...)

Be aware that Evans and Sutherland have a patent on clearing the screen as it is displayed - not that they would care. (bloody patents on the blindingly obvious....)

thinking out loud here:

This is how my fill loop would look if I only had access to an incrementing vram byte:

    lda eorbuf
    sta IO        ; register on video card that writes to an autoincremented address in vram
    stx eorbuf    ; clear eor buffer for next row of plotting (X holds zero)
    eor eorbuf+1
    sta IO
    stx eorbuf+1
    ...

(this whole routine is called once per pixel row, in between updating the horizontal intercepts of the currently active edges and plotting them into the eorbuf)

or 10 cycles per byte, 80,000 for a full screen.
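To make the data flow of that loop concrete, here is a minimal Python sketch of a byte-wise horizontal EOR fill for one pixel row. The message doesn't say how the h-events are encoded in the eor-buffer, so `plot_edge` and its two-byte encoding are assumptions made purely for illustration. `fill_row` mirrors the lda/eor, sta IO, stx sequence, and the cycle count uses the standard 6502 timings (3 cycles for a zero-page lda, eor or stx, 4 for an absolute sta).

```python
ROW_BYTES = 40          # 320 pixels / 8 bits per byte
SCREEN_BYTES = 8000     # 40 bytes * 200 rows

# Per byte: lda/eor zp (3) + sta abs (4) + stx zp (3)
CYCLES_PER_BYTE = 3 + 4 + 3


def plot_edge(eorbuf, x):
    """Hypothetical h-event: toggle the fill state from pixel x rightwards."""
    byte, bit = divmod(x, 8)
    mask = 0xFF >> bit                    # pixels x .. end of this byte
    eorbuf[byte] ^= mask
    if byte + 1 < len(eorbuf):
        # Complement of the mask, so the running EOR becomes $FF
        # from the next byte onwards.
        eorbuf[byte + 1] ^= mask ^ 0xFF


def fill_row(eorbuf):
    """Running EOR across the row; clears the buffer as it goes."""
    acc = 0
    row = []
    for i in range(len(eorbuf)):
        acc ^= eorbuf[i]    # lda eorbuf (first byte) / eor eorbuf+i
        row.append(acc)     # sta IO (autoincrementing vram write)
        eorbuf[i] = 0       # stx eorbuf+i, with X = 0
    return row


eorbuf = [0] * ROW_BYTES
plot_edge(eorbuf, 10)       # left edge of a span
plot_edge(eorbuf, 21)       # right edge (exclusive)
row = fill_row(eorbuf)

# Expand the row back into pixel positions, leftmost pixel = bit 7.
filled = [x for x in range(ROW_BYTES * 8) if row[x // 8] & (0x80 >> (x % 8))]
full_screen_cycles = CYCLES_PER_BYTE * SCREEN_BYTES
```

With edges at x=10 and x=21 this fills pixels 10 to 20 and leaves the buffer zeroed for the next row; `full_screen_cycles` comes out at the 80,000 quoted above.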
Ten cycles per byte is only one cycle per byte faster than no hardware assist at all. Ideally you want to avoid this overhead altogether.

How about this idea: memory map an eor-buffer into the IO space, so the line plotting can go directly into the video card. Then you could write to a control register to say 'fill your current pixel row from the eor-buffer, then clear the eor-buffer and increment the pixel row pointer'. This should only take one video-card cycle per byte, or around 1000 c64 cycles for the entire screen - a saving of over four frames!

It would be nice to have the option of eor-filling either horizontally or vertically, as coders will be arguing about which is better until the end of time :)

> This would be ideal (cct doing some of the 3d work itself), if I use C64
> memory I would like to at least use the idle video accesses (When the
> raster is outside of the display window) for clearing or byte filling
> memory. Although this isn't 3D work it would clear or fill a portion of
> memory without using CPU cycles which would have to speed things up.

*nod*

> Are you thinking that the cct could actually do some calculations and
> return values to the coder?

I was more thinking of assisting with the filling, but certainly a circuit that does a 3d rotation and perspective divide would be very useful. The PlayStation would probably be a useful model for this - you can set up a 3x3 fixed point rotation matrix with an integer translation and screen offset, then feed it x,y,z coordinates in model space and around 20 cycles later you can read back coordinates in screen space.
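As a rough illustration of what such a transform unit would compute, here is a minimal Python sketch of a fixed-point rotate, translate and perspective-divide step. It is not the PlayStation's actual register interface: the 12-bit fraction (`FRAC_BITS`), the projection distance `focal`, and the function names are all assumptions chosen for the example.

```python
FRAC_BITS = 12  # assumed fixed-point format: 12 fractional bits


def to_fixed(value):
    """Convert a float matrix entry to fixed point."""
    return int(round(value * (1 << FRAC_BITS)))


def transform(matrix, translation, offset, focal, point):
    """Model space -> screen space.

    matrix:      3x3 rotation, fixed-point entries
    translation: integer (tx, ty, tz) added after rotation
    offset:      integer screen offset (ox, oy)
    focal:       hypothetical projection distance (not from the message)
    point:       integer (x, y, z) in model space
    """
    cam = []
    for row, t in zip(matrix, translation):
        acc = sum(m * p for m, p in zip(row, point))
        cam.append((acc >> FRAC_BITS) + t)   # drop the fraction, then translate
    cx, cy, cz = cam
    if cz <= 0:
        return None                          # behind the eye: needs clipping first
    # Perspective divide; // rounds towards minus infinity for negative values.
    return (offset[0] + cx * focal // cz, offset[1] + cy * focal // cz)


identity = [[to_fixed(1.0 if r == c else 0.0) for c in range(3)]
            for r in range(3)]
screen = transform(identity, (0, 0, 100), (160, 100), 100, (50, -20, 100))
behind = transform(identity, (0, 0, 0), (160, 100), 100, (1, 1, -5))
```

With an identity rotation the point (50, -20, 100) ends up at z=200 in camera space and lands at screen position (185, 90); a point behind the eye returns `None`, which is where edge clipping would have to take over.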
At the moment, my own 3d renderer spends more time clipping edges to the view frustum than it does doing rotations and perspective divisions - each edge that crosses a clip plane spends over 1000 cycles computing

    xc = xb + (xa-xb)*(0-zb)/(za-zb)
    yc = yb + (ya-yb)*(0-zb)/(za-zb)

where (xa,ya,za) and (xb,yb,zb) are the 16-bit endpoints of the edge in camera-space after skewing the clip-plane onto z=0 (although I do it with a binary search for the point where the edge crosses z=0 rather than doing the multiplications and divisions explicitly).

So again, something else that would be nice to have in hardware.

Christopher Jam/shrydar

Message was sent through the cbm-hackers mailing list