ruud.baltissen_at_abp.nl
Date: 2002-04-17 13:37:09
Hello all,

I am simply forwarding some other emails regarding Gideon's FPGA implementation of the 6510. At the end you will find a short history and some more ins and outs.

==============================================================================

|How about the illegal instructions? As I recall previous postings at least
|that time there were no plans to implement them.

In this version, almost all 6502 illegal opcodes have been implemented, too. Most of them, such as LAX and SAX, result from 'poorly' decoding the instruction. The opcodes ending with $3, $7 and $F (and some ending with $B) also act the same as on the 6502, because they just imply a 'wrong' order of the internal states to be taken.

Some other "illegal" opcodes, like the places where STX $nnnn,Y and STY $nnnn,X should have been, were called illegal because they didn't work in the original 6502. In this FPGA implementation they do work, so on those opcode places you'll find STX $nnnn,Y and STY $nnnn,X, and of course the load variants as well. IMHO it doesn't matter that these opcodes act a bit differently from the original 6502, since these were not stable.

Anyone who is interested in testing all opcodes: you're welcome. I just don't know how to get an FPGA board to you to test them. Maybe I will have a few of them made.

|Does this implementation also enable weird stuff like putting bytes on the
|bus at certain times to write to areas normally inaccessible? I mean writes
|to RAM $00/$01 which is more stable on the later C64s.

I don't have any demos, so if you'd like to have the result of some tests, then please send me the 5.25" disk with the demo :)

As far as locations $0 and $1 are concerned: in my implementation the reads are always from the local PIO registers, and the writes go to the PIO registers, but also to the bus, so the rest of the system *does* write those bytes into RAM. I am not sure if this is the case with the original 6510.
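
The $0/$1 behaviour described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not Gideon's actual design: the class name `Port6510Model` and its methods are hypothetical, the two PIO registers are modelled as plain bytes (direction bits and pin levels are ignored), and `vic_read` stands in for any bus master other than the CPU.

```python
class Port6510Model:
    """Toy model of the $00/$01 handling described in the mail (hypothetical names)."""

    PORT_ADDRS = (0x0000, 0x0001)

    def __init__(self):
        self.ram = bytearray(65536)  # system RAM as seen on the external bus
        self.pio = [0x00, 0x00]      # local PIO registers inside the CPU

    def cpu_read(self, addr):
        # Reads of $0/$1 always come from the local PIO registers.
        if addr in self.PORT_ADDRS:
            return self.pio[addr]
        return self.ram[addr]

    def cpu_write(self, addr, value):
        value &= 0xFF
        # Writes to $0/$1 update the PIO registers ...
        if addr in self.PORT_ADDRS:
            self.pio[addr] = value
        # ... and every write also goes out on the bus, so RAM is updated too.
        self.ram[addr] = value

    def vic_read(self, addr):
        # Another bus master (for example the VIC) sees RAM, never the PIO registers.
        return self.ram[addr]


model = Port6510Model()
model.cpu_write(0x0001, 0x37)
print(hex(model.cpu_read(0x0001)), hex(model.vic_read(0x0001)))
```

In this model a CPU write to $01 is visible both through the PIO register and in RAM, while a value that sits only in RAM at $00 or $01 can be seen through `vic_read` but never through `cpu_read`.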
Anyway, reading the RAM locations $0 and $1 by using sprite collisions etc. doesn't have anything to do with the CPU, since you are reading them through the VIC, so that should work.

As far as some illegal opcodes are concerned: in my last post I wrote that the 'unstable' opcodes of the original 6502 do not work the same on my 6510 implementation. This is true, but since Nathan pointed out that there were only 2 that were not stable, I have to broaden this definition a bit. In this 6510, the ones that had a very unusual and hard-to-comprehend meaning (like the high address byte + 1 ANDed with some other value, blah blah), *those* will all work differently. Opcodes $x3, $x7, $xF will do the same as on the original chip; guaranteed! So will the opcodes that select A and X together: LAX and SAX.

Some other opcodes that did nothing but a "read from the bus" in the original 6502 now do something. Examples:

5C: JMP $nnnn,X
34: BIT $nn,X
3C: BIT $nnnn,X
04, 14, 0C, 1C: Similar to BIT, but with OR instead of AND

These came for free by 'loosening' the decoding a little.

As far as the timing is concerned, there are some differences. Off the top of my head:

* Branches take 2 cycles untaken, 4 taken, no matter if the page boundary is crossed or not.
* Implied instructions always take 1 cycle instead of 2 (TAX, CLI, etc.).
* RTS and RTI take one cycle more.
* Additions/subtractions in decimal mode are less buggy and take one clock cycle more.
* In read/modify/write instructions, the wrong value is not written first, as was the case on the 6502.

I hope that this gives some more clarity about what the implementation looks like.

==============================================================================

History:

Gideon contacted me in private because, being a C64 fan and working with FPGAs, he had the idea of building a C64 in an FPGA. He searched the net and hit my site so often :) that, being Dutch as well, he decided to contact me.
I told him that Jeri was working on the C=1, so in fact he would be reinventing the wheel. On the other hand, Jeri had to use the 65816 as there was no free (good) core for the 6502 and 65816. So Gideon decided to shift his attention to the processor, producing a better CPU than the 65816: one that can actually replace the original 65816 on the C=1, but also the 6510 or 6502 on other C= computers (just a matter of another interface). We are aiming at a 32-bit CPU with some extras, running at 32 MHz. Maybe one that has a 65816 mode. But one that can run 6502 code at any time !!!

About the illegal opcodes:
Gideon and I are still discussing what to do with them. Let's have a look at the 65816. It has only ONE opcode (WDM / $42) left that can be used for extending the instruction set. This would mean we would end up with 3-byte instructions. Using the illegal opcodes means we would still have some two-byte instructions besides the 3-byte ones. Then the facts:
1) Who is using illegal opcodes? AFAIK mostly demos, with no other reason than to gain some extra microseconds.
2) Users having an SCPU cannot play these demos anyway, as the 65816 won't recognise the instructions as meant (and therefore can crash).

About the differences in timing:
Gideon could change the design so the 65GS10 would work exactly like the original 6510. Adding an extra cycle is no problem, but removing the extra ones is. All operations inside the FPGA are done at the rising edge of the clock. Doing some operations at the falling edge as well would do the trick. But then you end up with operations that sometimes have to be activated on the rising edge and at other times on the falling edge. And the combination is the problem, as (for the moment) the solution costs too many gates compared to the gain.

What programs really depend on these timings? Mostly demos and games. As I said before, we are aiming at a 32 MHz CPU. Running the CPU at any speed other than the original frequency would screw up such a game/demo anyway.
IMHO those few clock cycles won't make the difference then anyway.

What about the extra speed for games? SCPU users have run into this problem already, I think. (I don't have one, so I cannot tell.) In fact I think we will run into the same problem with a lot of games as we had with PCs at the end of the 80s: many games only ran fine on PCs equipped with an 8088 running at 4.77 MHz. (This is IMHO the only reason why PCs were equipped with a "Turbo button".) I don't see any reason why the 65GS32 could not run at 1 MHz. I wonder what game would drop dead on the fact that some instructions aren't timing exact. (Hmmm, a "single stepper" inside a monitor could.)

About the extras:

- The 65GSxx is capable of addressing SDRAMs directly. This feature is needed so the 65GSxx can run at those high speeds. OK, this isn't a feature you would expect of a CPU, but it is built inside the same FPGA and therefore considered part of the CPU. The same comment applies to the other extras.

- A Memory Management Unit. Those people familiar with an SCPU immediately know why we need this device. The VIC cannot "see" the SDRAM in any way, so the 65GSxx MUST write video data to the original RAM of the C64. The MMU enables us to tell the CPU whether to use the original RAM or the SDRAM. A special instruction will replace the zero page with a set of registers. A simple loop like:

        ldx #0          ; X = 0, wraps to $FF after the first DEX
    L1  lda ROM,X       ; fetch a byte from the source
        sta $00,X       ; store it in the zero page register
        dex
        bne L1          ; loop until X is 0 again (256 bytes)

could fill these registers from ROM or whatever other source.

- The CPU is going to be equipped with 32 (?) 32-bit general purpose registers. This means we could perform instructions like "LOAD R1, ($12),R3" but also "LOAD R3, ($12),R1". The idea is to dedicate (part of) the registers to the well-known standard registers of the 6502. So "LDA ($12),Y" will in fact do the 8-bit version of an instruction like "LOAD R1, ($12),R2". We also need more instructions. LDA (or LDAB) loads a byte. LDAW will load a word, LDAD a double-word.
LDAx (or LDAx16) uses a 16-bit address, LDAx24 a 24-bit address, LDAx32 all 32 bits. In this way it is easy to extend the existing instructions. A problem will be the sheer mass of possible combinations. What about all possibilities with the instruction "LDA ($xx),Y"? This command alone has 36 possible combinations !!!

The 16- and 24-bit address instructions are another problem: what about the unused higher address bits? One idea is to make them zero. Another idea is to dedicate a register to these instructions to fill in the remaining bits. In this way we can run several virtual 6502 processes parallel to each other.

- New instructions:
This is a matter of gains and costs. Using the ADC instruction the first time means we need to reset (clear) the Carry flag first. The 80x86 has the ADD instruction that does the addition disregarding the Carry. In our opinion we can do without this instruction as the gain is marginal.

The 6502 has no block instruction. The 65816 has: MVN and MVP. With X varying from 8 to 32 bits, this loop:

        ldx VALUE       ; number of bytes to copy
    L1  lda HERE,X      ; read from the source block
        sta THERE,X     ; write to the destination block
        dex
        bne L1          ; loop until X reaches 0

can replace such a block function. But I figured out that a block function could move a double-word every 2 cycles against 6 for the above loop. This is a gain of 200%, but then: what is the overall gain? I could be wrong, but a compiler does not benefit from this gain; a text editor could.

I can hear some of you think: 6 cycles ???? Yep :)

- Cache with onboard alignment and pipelining:
"LDA ..." and "STA ..." are 6 bytes each: 2-byte instruction, 4 address bytes. "DEX" one byte, "BNE L1" two bytes. Total: 15 bytes = 4 cycles. Add two cycles for the actual read and write and you have 6.

Future:
Gideon's idea is to start with the SDRAM interface and MMU first. Without them there is no good way of testing any 32-bit extensions.

    ___
   / __|__
  / /  |_/     Greetings, Ruud
  \ \__|_\
   \___|       http://Ruud.C64.org

Message was sent through the cbm-hackers mailing list