The World of Emulation: Understanding the Challenges and Complexity of Emulating Hardware
Creating an emulator that can run a word processor or BASIC on an old BBC Micro without worrying about timing is relatively straightforward. The key is to provide data when the program asks for it, such as which key has been pressed, and to put its output on the screen, rather than reproducing the hardware's exact timing. That is enough for simple applications like word processing, where the software is not tightly coupled to the timing of the hardware. However, once we move to games, or to software that talks directly to other hardware components, accurate timing becomes crucial.
The 6502 CPU is a good example of a relatively straightforward target for emulation. It has only a handful of 8-bit registers, a 16-bit program counter, and one-byte opcodes whose effects are well documented. In contrast, the 68000, used in the Amiga, the Atari ST, and the Sega Mega Drive (Genesis), requires more careful consideration: the time an instruction takes can depend on how the surrounding hardware is built and on the instructions next to it. On the Amiga, for example, the CPU could be stalled while the graphics hardware fetched values.
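To give a feel for how little state is involved, here is a minimal Python sketch of the 6502 registers, the reset behavior, and a single instruction (LDA immediate, opcode 0xA9). The names `Cpu6502`, `reset`, `step`, and `stack_address` are made up for illustration, memory is assumed to be a flat 64 KiB `bytearray`, and every other opcode is deliberately left unimplemented.

```python
from dataclasses import dataclass

# Bit masks within the 8-bit status register P (6502 layout: N V - B D I Z C).
FLAG_N = 0x80  # set when a result is negative (bit 7 set)
FLAG_Z = 0x02  # set when a result is zero


@dataclass
class Cpu6502:
    a: int = 0   # accumulator, 8 bits
    x: int = 0   # index register X, 8 bits
    y: int = 0   # index register Y, 8 bits
    s: int = 0   # stack pointer, 8 bits
    pc: int = 0  # program counter, 16 bits
    p: int = 0   # processor flags, 8 bits


def stack_address(cpu):
    # The 6502 always places the stack in page one: 0x0100 to 0x01FF.
    return 0x0100 | cpu.s


def reset(cpu, mem):
    # The reset vector is stored little-endian at 0xFFFC (low) and 0xFFFD (high).
    # This sketch only loads PC; the other registers are left as they are.
    cpu.pc = mem[0xFFFC] | (mem[0xFFFD] << 8)


def set_nz(cpu, value):
    # Clear N and Z, then set them from the 8-bit value.
    cpu.p &= ~(FLAG_N | FLAG_Z) & 0xFF
    if value == 0:
        cpu.p |= FLAG_Z
    if value & 0x80:
        cpu.p |= FLAG_N


def step(cpu, mem):
    # Fetch the opcode, move PC past it, then emulate the instruction.
    opcode = mem[cpu.pc]
    cpu.pc = (cpu.pc + 1) & 0xFFFF
    if opcode == 0xA9:  # LDA #imm: the operand is the byte after the opcode
        cpu.a = mem[cpu.pc]
        cpu.pc = (cpu.pc + 1) & 0xFFFF
        set_nz(cpu, cpu.a)  # LDA changes only N and Z
    else:
        raise NotImplementedError(f"opcode {opcode:#04x} is not in this sketch")
```

A fuller emulator would replace the `if` chain with a dispatch table or switch over all opcodes, but the shape stays the same: fetch, advance the program counter, update the state exactly as the hardware would.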
One of the key challenges in emulation is ensuring that all components of the system interact correctly with each other. The timing of instructions, graphics rendering, and memory access must be carefully synchronized to produce an accurate and seamless experience. This requires a deep understanding of how the target hardware works and can lead to significant complexity. Even simple emulators can become complicated when attempting to accurately replicate more complex systems.
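One common way to keep components in step is to count the cycles each instruction takes and advance the other chips by the same amount. The sketch below assumes Atari 2600 (NTSC) figures: three video color clocks per CPU cycle and 228 color clocks per scanline. The `Beam` class and `run` function are hypothetical names, and the CPU is reduced to a table of cycle counts for a few opcodes so that only the bookkeeping is shown.

```python
# Cycle counts for a few 6502 opcodes (illustrative subset only).
CYCLES = {
    0xA9: 2,  # LDA #imm
    0x85: 3,  # STA zero page
    0xEA: 2,  # NOP
}

COLOR_CLOCKS_PER_CPU_CYCLE = 3   # assumed 2600 ratio of video clock to CPU clock
COLOR_CLOCKS_PER_SCANLINE = 228  # assumed NTSC 2600 scanline length


class Beam:
    """Tracks where the video chip is drawing, in color clocks."""

    def __init__(self):
        self.x = 0
        self.scanline = 0

    def advance(self, color_clocks):
        total = self.x + color_clocks
        self.scanline += total // COLOR_CLOCKS_PER_SCANLINE
        self.x = total % COLOR_CLOCKS_PER_SCANLINE


def run(opcodes, beam):
    """Account for each opcode's cycles and move the beam by the same time."""
    cpu_cycles = 0
    for opcode in opcodes:
        cycles = CYCLES[opcode]
        cpu_cycles += cycles
        beam.advance(cycles * COLOR_CLOCKS_PER_CPU_CYCLE)
    return cpu_cycles
```

Note that nothing here depends on wall-clock time. The two parts only have to stay correct relative to each other, so a register write lands at the same point on the scanline as it would on real hardware.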
To develop a working emulator, one must create a system that can simulate the behavior of the target hardware. This involves writing code that accurately replicates the CPU's instructions, as well as handling memory access, graphics rendering, and other system interactions. The documentation for these systems may be incomplete or outdated, making it necessary to write tests on real hardware to determine how things actually work in practice.
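For the memory side, a sketch of an address decoder in Python may help. It assumes a simplified Atari 2600 layout: a 13-bit address bus, a 4 KiB cartridge ROM in the upper half, 128 bytes of RAM that also appear in page one for the stack, and two chips (TIA and RIOT I/O) reached through memory addresses. `Bus2600` and `StubDevice` are hypothetical names, and the exact bit tests are an assumption about the decoding, not taken from the text above.

```python
class StubDevice:
    """Hypothetical stand-in for a chip such as the TIA; it only logs writes."""

    def __init__(self):
        self.writes = []

    def read(self, reg):
        return 0

    def write(self, reg, value):
        self.writes.append((reg, value))


class Bus2600:
    def __init__(self, rom, tia, riot_io):
        if len(rom) != 4096:
            raise ValueError("this sketch expects a 4 KiB cartridge image")
        self.rom = bytes(rom)
        self.ram = bytearray(128)
        self.tia = tia
        self.riot_io = riot_io

    def read(self, addr):
        addr &= 0x1FFF                   # only 13 address lines are wired up
        if addr & 0x1000:                # upper half: cartridge ROM
            return self.rom[addr & 0x0FFF]
        if not addr & 0x80:              # low 128 bytes of a page: TIA
            return self.tia.read(addr & 0x7F)
        if addr & 0x200:                 # around 0x280: RIOT I/O and timer
            return self.riot_io.read(addr & 0x1F)
        return self.ram[addr & 0x7F]     # RAM, also visible at 0x180 to 0x1FF

    def write(self, addr, value):
        addr &= 0x1FFF
        value &= 0xFF
        if addr & 0x1000:
            return                       # writes to ROM are ignored here
        if not addr & 0x80:
            self.tia.write(addr & 0x7F, value)
        elif addr & 0x200:
            self.riot_io.write(addr & 0x1F, value)
        else:
            self.ram[addr & 0x7F] = value
```

Because the high address bits are masked off, a read of 0xFFFC lands in the cartridge ROM, and a stack access at 0x01FF reaches the same byte as 0x00FF. Reads and writes to the chips are routed to routines instead of arrays, which is where the timing-sensitive behavior would eventually live.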
This process can be time-consuming and often requires a significant amount of trial and error. The goal is to create an emulator that produces the same results as the original hardware, but this is difficult because some behavior is an undocumented by-product of how the hardware was built, and because software often pushed the hardware further than its designers expected. Even with careful planning and testing, there may be instances where the emulator fails to accurately replicate the behavior of the target hardware.
Accuracy is not only a matter of timing. macOS running on the Apple M1 Max, for example, can emulate an x86-64 CPU very effectively, but even there care was needed because the memory model of an ARM CPU differs from that of an x86 CPU. x86 guarantees a strict ordering of when memory writes become visible, which matters particularly when multiple threads run alongside each other, and ARM does not implement that ordering. An emulator that runs x86 code on an ARM CPU therefore has to emulate that memory ordering as well.
In conclusion, creating a working emulator requires a deep understanding of the target hardware and a willingness to delve into complex systems. While simple emulators may be achievable with relative ease, more complex systems demand significant expertise and time. The process involves careful planning, testing, and debugging to ensure that the emulator accurately replicates the behavior of the original hardware.
Over the Christmas period I picked up this game for my PlayStation, Atari 50, which is a sort of interactive history of the Atari games consoles (misses out the ST, but the Atari games consoles and arcade games), and one of the things it does is it emulates the games so you can actually play them. There's been a new Jaguar emulator written by Richard Whitehouse, which has then been released for the PC as well. And so I thought an interesting thing to do would be to actually look at how we would write an emulator for a game system, or a computer, or anything really.
The first thing to start with is that we need to know what's in the system that we want to emulate. We're not going to build a complete emulator, because we don't do tutorials on Computerphile, but we'll look at things. So let's start with a system, and let's keep the Atari theme, so let's grab a 2600.
You start off thinking, "Oh, it would be really hard to write an emulator." Then when you actually start looking at it you think, "Actually, this is really easy." And then when you try and actually make it work for every single bit of software that's ever been written for it, you realize it was hard in the first place. So you can get something going that will run simple software quite easily, but to get the most out of things like the 2600, or an Amiga, or an Atari ST, or whatever it is, even a modern PC, lots of software tricks are often used, and getting them to emulate accurately can be really, really hard. And that's where things get interesting.
But let's break this open, literally, and see where we get started. There we go. And then we've got the circuit board, and most of this is just stuff to make it appear on the TV screen. So we've got the TV modulator over here, a few buttons for input and output, sockets for the joysticks on the back, a few switches. That's the power input: quite useful, doesn't work without it. But the main things that we're interested in, if we're trying to build an emulator, are these three chips here. The one in the middle, this one, is the CPU. It's a variant of the 6502 CPU, so we'll need to emulate that, and by emulate I mean we'll need to write software which does exactly the same thing the CPU does when given the same code. This one over here is the wonderfully named RIOT chip, that's short for RAM, Input/Output and Timers, and we'll need to emulate that as well. And we've got down here the Television Interface Adapter, the TIA chip, which is the one that produces the graphics and so on that actually get displayed on screen.
Of course, the other thing which the Atari is well known for is that you have the cartridges which you slot in here, and actually all the cartridge was, was a ROM chip which contained the program. So your actual program code was in the cartridge, so that was there, and then the CPU was here, and the RAM was in the chip here, the RIOT, actually, and you had the wonderfully high amount of 128 bytes of memory.
We're not going to go into the details of how the 2600 works, that's another video, and we've probably already covered part of it, but we need to know where those things were, so that when we emulate the CPU and we write to a particular memory location or read from a particular memory location, we know whether we're getting data from the cartridge, getting data from the RAM in the RIOT chip, or whether we're accessing some other hardware that's attached to it. So the first thing that we need to know when building an emulator is: what's the memory like? This is pretty much exactly the same thing you need to know when designing a computer or designing a games console in the first place. You need to know where things are going to be so you can build the hardware to decode it. We're not building hardware, we're writing software, but we need to do the same sort of thing. That's exactly what we're going to do: we're writing software that pretends it's hardware.
So on the 2600 we have address zero down at the bottom, and we have FFFF up here. Yes, I know the 6507 CPU variant in the 2600 only has a limited number of address pins, so it only goes up to 1FFF, but in terms of the 6502 it'll still try and access the high addresses to fetch things like the reset vector. We'll come back to that. So at the top we would have the ROM, and that would be aligned so the top of it is at FFFF, so that the reset vector and things are in the right place. The low 128 bytes are for the TIA chip, and you access that with those addresses. The next 128 bytes are your RAM, 128 bytes of it. And then at address 200 or so in hex you have the rest of the RIOT chip that you can access there.
So we now know where things are in memory, and we can relatively easily duplicate that in a program. The ROM cartridge: if we've got the code for it, we can load that into an array and we can access the bytes from that relatively easily. Again, the RAM: we can put that into an array, access the bytes, write bytes into that when we access memory there. Easy to do. As we said, when you start off it looks complex, then you start to break it down, and so I thought, "Oh, this looks relatively easy: one array, two arrays, easy to implement." Then it starts to get harder again for things like the input/output devices: the TIA, the RIOT, your keyboard, your mouse if we were doing a computer or something. They aren't necessarily memory locations, but when the CPU reads from there you can write software that basically says, "If it is this location, call this routine to process the value that's being written, or call this routine to produce the value that's being read from that location." So the memory side of things starts off looking straightforward.
The other thing we then need to implement is of course our CPU, and that, normally, in hardware, would talk to the memory via a data bus and an address bus. So the address bus will contain the address of what it wanted to access, and the data will be funneled over the data bus. As we said, we can emulate that: our emulation of the CPU can produce an address, and then we can fetch it or store it based on the data that we want to access.
So the real question is: how do we emulate the CPU? Well, to do that we need to understand how the CPU works. You need to understand what registers it has, what internal values it has where we can store things. We need to know what it does when it starts up. We need to know what the instructions are and the effect that they have, and then we can write a program that does exactly the same thing.
So let's have a think about the 6502 CPU, and the only reason we're using that is because it's dead simple. That's what's inside the 2600. It's also inside the NES, the Nintendo Entertainment System; the SNES used a later variation of it, the 65C816 I think it was, and so on. And the 6502 has been around for donkey's years: the PET, the Commodore 64, the BBC Micro are 6502-based if I remember correctly, or variants of it, should we say.
So the 6502-based CPUs are dead simple. We've got three registers, A, X and Y, and they all store eight bits. So we've got eight bits for that, eight bits for that, and eight bits for that. We also have a stack pointer, which we will call S, which is also eight bits long, but the 6502 does something slightly bizarre in that it always prepends a one to it, so your addresses all fall between 100 and 1FF in hexadecimal in memory. Which maps quite nicely to our display here, because our RAM starts from FF down to 7F, but the way it's implemented is it also appears at 100 to 1FF. So you can use the RAM both for just general-purpose things and also for the stack, which you need for the CPU to work properly. Things like calling a subroutine will only work if the CPU's got a stack to store the return address in. It's the way the CPU works: when you call a procedure, effectively, it needs to store where to go back to, and it does that by putting it on the stack so that you can then pull it off when it's finished.
Again, it's starting to get slightly complicated, because we thought these things only appeared here, but actually they now appear here as well, so there's two addresses that can access the same thing. So we need to write our software that emulates the memory to cope with both sets of addresses and get the same thing, because if you access them via either address you get the same result. It starts to get slightly more complicated. Still not that complicated yet; we'll come to that.
So we've got our A, X, Y. We've got our stack pointer, which is a one and then the value in there. We have a program counter, and this is 16 bits long, and that stores the address of the next instruction we're going to execute. The CPU works by fetching an instruction from memory, executing it, then fetching the next one, and it knows where to get it from based on the program counter here. So that's a 16-bit value.
And then finally we have a set of processor flags, which are generally referred to as P, and which are eight bits long, and depending on the instructions they get set with specific values. So we can break these down. For example, we have a flag N which is set if the instruction produces a negative result. So if you, say, subtract one number from another and it's negative, that bit in the processor flags will get set to say that it's produced a negative result. There's another one which is set if it produces a zero result. There's one that happens if there's an overflow, a carry. I'm not writing these in the right order. One tells it if there's interrupts enabled, there's another one that tells it if it's in decimal mode, which is a pain in the neck to implement, and then there's one that tells it if a break has happened. Of course, you need to make sure, if you're implementing it, that they are in the right order.
So that's basically the internal state of the 6502 CPU. It is relatively straightforward, and we could easily write a program to store this state. We could have a variable called a, which stores the value in the A register. We could have a variable called... what do you think, Sean? "Well, I would have called it B, but they called it... Y? Is it Y?" No, it's X. So yeah, we'll have a variable called x to store the X register, a variable called y to store the Y register, a variable called s which stores the stack pointer, a variable called pc, which is 16 bits long, which stores the program counter, and a variable called p which can store the processor flags. So again, relatively straightforward: we can create these, just use normal variables to implement them.
Then we need to think about what the CPU does when it starts up. With the 6502, what it says happens is: no idea what's going to be in the registers, except it will go and fetch into the program counter from memory locations FFFC and FFFD. So it'll fetch those two bytes and put them in the program counter as the first address it's going to access, which is why our ROM cartridges are mapped in at the very top of memory, because then those addresses are in the ROM, and the ROM can specify where it wants to start its program in the games console.
And so when we write our program, the first thing that we'd want to do is to set the program counter to equal the location in memory at address 0xFFC (oops, I missed an F in there), and then we just need to do some bit shifting to get the location at 0xFFFD. It's a little-endian CPU. Hate little-endian CPUs, big-endian is the way to go, I'm a Motorola fan. And we'll shift it left eight bits to get it into the right place. So we've fetched the address of the first instruction. We are emulating what the CPU does, the hardware that implements it here, when it comes out of reset. When it starts, it'll take the address that's in memory, which will be in the ROM cartridge on this system (on a different computer system it'll be in a different place), gets the address there, puts it into the program counter, and then goes and fetches the first instruction.
So what our program would need to do, and we'll write this in pseudocode, is fetch the instruction at PC. So we fetch the byte that's there. On the 6502 the instructions are a byte long. On different CPUs they're different widths. If you're trying to emulate the x86 CPU, good luck to you, they're an absolute nightmare: they can be anything from 1 to 15 bytes long. Pain in the neck to emulate. So we can fetch the instruction there, and that gives us what's called the opcode. We then increment the program counter so it's pointing at the next instruction. We then need to emulate that particular instruction, and the way we can do that is look at the opcode, and then if the opcode has value zero we'll do one instruction, if it has value one we'd do a different instruction, if it has value two we'll do another instruction. So we could use something like a switch statement and just say, "switch: zero, do this; one, do that; two, do this", to have the same effect as the CPU would have when that instruction is executed.
So let's have a look at a very simple instruction. This one has the value A9, which, if you were to look it up in the instruction set, is "load the A register with an immediate value", which just means it follows it in memory. So if we saw the value A9, we would then write some code that would read the next value from memory, which we've already incremented the program counter to point to. So we'd read that, and then we would take it and write it into our variable a. One other thing we need to do is we have these flags that we need to set based on the value of the variable a. So for example, if this is 65, it's not negative, so we'd set that to be false. It's not zero, so we'd set that to be false. And because of the way that LDA is specified in the documentation for the 6502, we don't change any of the other flags in the flags register, the status register.
So what we'd have to do for every instruction is write a piece of code that would do the same thing for A, X, Y, S, PC and P that the CPU would do, and as long as it has the same effect, when we then execute the next instruction, because the state's been updated to have the same thing that the hardware would have, that instruction will have the same effect.
Now you might think, "There's 256 values in a byte, that means I'm going to implement 256 routines." Actually it's not quite as bad as that, because a lot of the instructions are similar. So for example the one we've got here, the A9 routine, we can actually break down to a set of bits: the end, one zero, and then you have three bits, we'll call them b here, which specify how the addressing mode works, and we'll have three bits, a, here that specify what the instruction is. So these three bits will tell you how to fetch the value from memory: is it an immediate value, do you read it from another memory location? And these will tell you what you do with that value. So it might be, in this case, LDA, which means we load the accumulator, the A register, with it. It might be that we add the value on to the A register, or we subtract the value, and so on. So actually you can split this down and say, "Well, I need to write eight different routines to do the actual operation and eight routines that fetch the value from memory," and then there's only 16 routines you have to write, not what looks like it might be 64.
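The bit-field idea can be sketched in a few lines of Python. This assumes the usual way of reading a 6502 opcode as `aaabbbcc`, and covers only the group whose low two bits are `01`, which contains LDA, ADC, SBC and friends. The tables and the `decode` function are illustrative names, not part of any real emulator.

```python
# Operations selected by the top three bits (aaa) when the low two bits are 01.
OPERATIONS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]

# Addressing modes selected by the middle three bits (bbb) for the same group.
ADDRESSING = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]


def decode(opcode):
    """Split an opcode of the form aaabbb01 into (operation, addressing mode)."""
    aaa = (opcode >> 5) & 0b111
    bbb = (opcode >> 2) & 0b111
    cc = opcode & 0b11
    if cc != 0b01:
        raise ValueError("this sketch only handles opcodes ending in binary 01")
    # Not every combination is a real instruction (there is no STA #imm),
    # so a full decoder would also need a list of the exceptions.
    return OPERATIONS[aaa], ADDRESSING[bbb]
```

For example, 0xA9 is `101 010 01` in binary, so `decode(0xA9)` gives `("LDA", "#imm")`, and 0x69 (`011 010 01`) gives `("ADC", "#imm")`. One routine per operation plus one per addressing mode then covers the whole group.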
so it's not quite as bad as it seems and some of the other ones will have similar effects as you do that so it starts to look like writing it wouldn't be too hard but we've only considered the CPU and what x it's executing remember we said that the CPU is actually talking to the other chips on the system so it's talking to the right chip and it's talking to the television interface adapter to cause things to appear on the screen to access the joysticks and so on to know what you're doing uh as you're playing the game and this is where things become more complicated because each of the instructions takes a certain amount of time to execute and the hardware is doing things it takes a certain amount of time to execute if you go back and watch the video where I looked at how you wrote software for 2600 one of the things you had to do was change the registers in the television interface adapter at exactly the right Point as it was traveling across the television screen to change color to set things up to appear on the right point in the right line if you mean right it'll be at a different point to where the television interface adapter is and so it'll have a different effect of what appears on screen it might execute the right code but what appears on screen wouldn't be what you'd expect it to be so you actually have to write the code here and the code that emulates the television interface adapter or any other Hardware in your system so that you get them working at the same speed that they're in lockstep if you want to get exactly the same effect now for something like a games console that's really important so that the games work fine if you're just wanting to run a program um say something like a word processor it's probably less important you could probably write an emulator for say a BBC micro that could run a word processor run basic without worrying about that you could provide the data when you got to it which key's been pressed for example you could output the 
data onto the screen and display it and it would work fine for something like that because it's not so tightly coupled between the timing of things but when you you go to things like gaming and sort of you talking to other bits of hardware and so on then you really have to make sure that the timing is absolutely right and that's where things get interesting because something like the 6502 is relatively straightforward in terms of targeting if you look at something like the 68000 which was used in the Amiga or the Atari St or the Sega Game savior Mega Drive saber Genesis and things you find that the length of time and instruction takes Can Depend depending on the way the hardware is built on the instructions next to it the way the hardware is implemented means that some instructions may take longer at certain times than others on the Amiga for example if the CPU could be stalled while the graphics Hardware fetch values in certain cases and so on and so you had to really make sure that everything's balanced what this means is even though you're emulating a CPU which is about one megahertz or something like this eight megahertz on an Atari St or Amiga say is relatively low spec the actual CPU power you need in the computer that's running the emulator can be considerably higher because you have to make sure each of these things are happening in the right amount of time so they appear on screen at the right amount of time and everything's happening and it can get a lot more complicated very quickly as we said you can create a very simple emulator it's a relatively straightforwardly all you have to do is make sure the right values end up in the registers and in memory at the right time but to actually emulate a more complicated system it gets quite complicated quite quickly and often these things aren't documented so often you will have to sort of write tests that run on the real Hardware assume you can get access to it that test and find out how things actually work out 
in practice because sometimes the documentation is wrong sometimes the designers of games and things have push this so far that they're beyond what the actual Hardware designers expected things to do so some of the things you end up trying to emulate are specified the CPU and things but actually the way that interacts with the other bits of Hardware some of it's documented and some of it is just sort of a byproduct of the way things have been built and to make sure the software has exactly the same effect in the emulator as it would on the real Hardware you need to make sure that that's those same side effects happen often at the same point in time and that's how writing an emulator becomes more and more complicated you have to do tests to find out how the hardware actually works in practice rather than just the way it looks like it should work and you need to keep all the different parts of the hardware synchronized not necessarily in terms of real time but certainly in terms of in relation to each other so if the real Hardware takes this long to display a line of graphics on screen and the CPU executes that many instructions in that same amount of time then your emulator needs to emulate that number of instructions in the same amount of time it takes the graphics emulator emulation to display those things and so you need to keep track of how many cycles it's often referred to each instruction takes place with and then you can advance the graphics by the same amount and so on as you're going through even if you're writing a simple emulator though there are certain things you need to keep track of for example on the Apple M1 Max the Macos running on there can emulate an x86 64 CPU very effectively and but even there they've had to be careful because the memory model of an armed CPU is different from an x86 CPU what I mean by that is x86 has certain design decisions made which means that in this case this will happen particularly if you have multiple threads running 
alongside each other there's a strict ordering of when memory will get written back which the arm CPU doesn't Implement so if you're emulating an x86 CPU on an armed CPU you have to make sure that you also emulate that memory ordering and things in there so you can get something going relatively quickly that emulates the CPU but to get it totally accurate you often have to really delve deeply into how the hardware actually works and then Implement that in your software maybe it's going to work for 90 of the internet or 80 for the internet but it's going to die for the other 10 or 20 this is the ossification chance of getting the train within three cycles and then if you go any longer then it's going to take you longer you know yourover the Christmas period I picked up this game for my PlayStation Atari 50 which is a sort of interactive history of the Atari games consoles misses out the stuck but the Atari games consoles and arcade games and one of the things it does is it emulates the game so you can actually play them there's been a new Jaguar emulator written by Richard Whitehouse which has then been released for the PC as well and so I thought what would be an interesting thing to do would be to actually look at how we would write an emulator for a game system or a computer or anything really first thing to start is actually we need to know what's in the system that we want to emulate we're not going to build a complete emulator because we don't do tutorials on computer file but we'll look at things so let's start with a system and let's keep the Atari theme so let's grab a 2600 start off thinking oh it would be really hard to write an emulator then when you actually start looking at it you think actually this is really easy and then when you try and actually make it work for every single bit of software that's ever been written for it you realize it was hard in the first place so you can get something going that will run simple software quite easily but to get 
the most out of things like the 2600 or an Amiga or an Atari St or whatever it is even a modern PC lots of software tricks are often used and getting them to emulate accurately can be really really hard and that's where things get interesting but let's break this open literally and see where we get started there we go so it's not pushing it peace and then we've got the circuit board and most of this is just stuff to make it appear on the TV screen so we've got the TV modulator over here a few buttons for input and output sockets for the joysticks on the back a few switches that's power input quite useful doesn't work without it but the main things that we're interested in if we're trying to build an emulator are these three chips here the one in the middle this one is the CPU it's a variant of the 6502 CPU so we'll need to emulate that and by emulate I mean we'll need to write software which does exactly the same thing for the CPU drills won't give them the same code um this one over here is the wonderfully named Riot chip that's short for Ram input output and timers uh chip here and we'll need to emulate that as well and we've got down here the television interface adapter the Tia chip which is one that produces the graphics and so on that actually gets displayed on screen of course the other thing which is the game the cop the Atari is well known for is that you have the cartridges which you slot into here and actually all the cartridge was was a ROM chip which contained the program so your actual program code was in the cartridge so that was there and then the CPU was here and the ram was in the chip here to go actually and you had the wonderfully high amount of 128 bytes of memory we're not going to go into the details of how the 2600 works that's another video and we should probably already be covered part of it but we need to know where those things were so that when we emulate the CPU and we write a particular memory location or read from a particular memory 
location we know whether we're getting data from the cartridge getting data from the RAM and the riot chip or whether accessing some other Hardware that's attached to it so the first thing that we need to know when building an emulator is what's the memory like this is pretty much exactly the same things you need to know when designing a computer or designing a games column in the first place you need to know where things are going to be so you can build the hardware to decode it we're not building Hardware we're writing software but we need to do the same sort of thing in there exactly that's exactly what we're going to do we're writing software that pretends it's Hardware so on the 2600 we have address zero down at the bottom we have FF FF up here yes I know the 6507 CPU variant in the 2600 only has a limited number of address pins so only goes up to one FFF but in terms of the 6502 it's still try and access the high addresses to fetch things like the reset Vector we'll come back to that so at the top we would have the ROM and that would be a line to the top of it is a ffff f um so that the reset vector and things are in the right place so the low 128 bytes of that are for the Tia chip and you access that with those addresses the next 128 bytes are your RAM 128 bytes of it and then it addressed two zero zero zero or so in HEX you have the rest of the riot chip that you can access there so we now know where things are in memory and we can relatively easily duplicate that in a program the ROM cartridge if we've got the code for it we can load that into an array and we can access the bytes from that relatively easily again the ram we can load that into an array access the bytes right bikes into that when we Access Memory there easy to do as we said when you start off it looks complex as you start to break it down and so I thought oh this looks relatively easily one array two arrays easy to implement then it starts gets harder again for things like the input output 
device is the Tia the riot your keyboard your mouse will be doing a computer or something they don't necessarily memory locations but when the CPU reads from there you can write software that basically says if it is this location call this routine to process the value that's being written or call this routine to produce the value that's being read from that location so the memory side of things starts off looking straight forward the other thing we then need to implement is of course our CPU and then that's normally in Hardware would talk to the memory via a databus and an address bus so the address bus will contain the address of what it wanted to access and the data will be funneled over the data bus as we said we can emulate that our emulation the CPU can produce an address and then we can then fetch it or store it based on the data that we want to access so the question is the real question is how do we emulate the CPU well to do that we need to understand how the CPU works you need to understand what registers it has what internal values they have for we can store things we need to know what it does when it starts up we need to know what the instructions are and the effect that they have and then we can write a program that does exactly the same thing so let's have a think about the 6502 CPU and the only reason we're using that is because it's dead simple that's based inside the 2600 or something it's also inside the NES the Nintendo Entertainment System the snares use the later variation of it the 658c16 I think it was um and so on and the 6502 has been around for donkeys years pet the Commodore 64 the BBC micro or 6502 based if I remember correctly or variance of it should we say and things so the 6502 based CPUs are dead simple we've got three registers a X and Y and they all store eight bits so we've got an 8-bit for that I've got eight bits for that and we've got eight bits for that we also have a stack pointer which we will call S which is also eight 
bits long, but the 6502 does something slightly bizarre in that it always prepends a one to it, so your stack addresses always fall between 100 and 1FF in hexadecimal in memory. That maps quite nicely to our display here, because our RAM runs from FF down to 80, but the way it's implemented it also appears at 180 to 1FF, so you can use the RAM both for general-purpose things and also for the stack, which you need for the CPU to work properly. Things like calling a subroutine will only work if the CPU's got a stack to store the return address in. It's the way the CPU works: when you call a procedure it needs to store where to go back to, and it does that by putting it on the stack so that it can pull it off again when it's finished.

Again, it's starting to get slightly complicated, because we thought these things only appeared here, but actually they appear in two places: there are two sets of addresses that access the same thing. So we need to write our software that emulates the memory to cope with both sets of addresses, because if you access either address you get the same result. It starts to get slightly more complicated. Still not that complicated yet; we'll come to that.

So we've got our A, X and Y. We've got our stack pointer, which is a one and then the value in there. We have a program counter, and this is 16 bits long, and that stores the address of the next instruction we're going to execute. The CPU works by fetching an instruction from memory, executing it, then fetching the next one, and it knows where to get it from based on the program counter. So that's a 16-bit value.

And then finally we have a set of processor flags, which are generally referred to as P and which are eight bits long, and depending on the instructions they get set with specific values. We can break these down. For example, we have a flag N, which is set if the instruction produces a negative result. So if you, say, subtract one number from
another and it's negative, that bit in the processor flags will get set to say that it's produced a negative result. There's another one which is set if it produces a zero result. There's one for an overflow and one for a carry. I'm not writing these in the right order. One tells it whether interrupts are enabled, there's another one that tells it if it's in decimal mode, which is a pain in the neck to implement, and then there's one that tells it if a break has happened. Of course, if you're implementing it you need to make sure that they're in the right order.

So that's basically the internal state of the 6502 CPU. It is relatively straightforward, and we could easily write a program to store this state. We could have a variable called a, which stores the value in the A register. We could have a variable called... what do you think, Sean? "Well, I would have called it B, but they called it Y. So is it Y?" No, it's X. So yeah, we'll have a variable called x to store the X register, a variable called y to store the Y register, a variable called s which stores the stack pointer, a variable called pc, which is 16 bits long, which stores the program counter, and a variable called p which stores the processor flags. So again, relatively straightforward: we can just use normal variables to implement them.

Then we need to think about what the CPU does when it starts up. With a 6502, what the documentation says happens is: no idea what's going to be in the registers, except that it will go and load the program counter from memory locations FFFC and FFFD. So it'll fetch those two bytes and put them in the program counter as the first address it's going to access, which is why our ROM cartridges are mapped in at the very top of memory: those addresses are then in the ROM, and the ROM can specify where it wants to start its program in the games console. And so when we write our program, the first thing that we'd want to do is to set the program counter to equal the location in
memory at address 0xFFFC (oops, I missed an F in there), and then we just need to do some bit shifting to bring in the byte at location 0xFFFD. It's a little-endian CPU. I hate little-endian CPUs; big-endian is the way to go, I'm a Motorola fan. So we'll shift that second byte left eight bits to get it into the right place. So we've fetched the address of the first instruction. We are emulating what the hardware that implements the CPU does when it comes out of reset: when it starts, it takes the address that's in memory, which will be in the ROM cartridge on this system (on a different computer system it will be in a different place), puts it into the program counter, and then goes and fetches the first instruction.

So what our program would need to do, and we'll write this in pseudocode, is fetch the instruction at PC. So we fetch the byte that's there. On the 6502 the opcodes are a byte long; on different CPUs there are different widths. If you're trying to emulate an x86 CPU, good luck to you, they're an absolute nightmare: instructions can be anything from 1 to 15 bytes long, a pain in the neck to emulate. So we fetch the instruction there, and that gives us what's called the opcode. We then increment the program counter so it's pointing at the next byte.

We then need to emulate that particular instruction, and the way we can do that is to look at the opcode: if the opcode has value 0 we'll do one instruction, if it has value 1 we'll do a different instruction, if it has value 2 we'll do another instruction. So we could use something like a switch statement and just say: case 0 do this, case 1 do that, case 2 do this, to have the same effect as the CPU would have when that instruction is executed.

So let's have a look at a very simple instruction. This one has the value A9, which, if you were to look it up in the instruction set, is "load the A register with an immediate value", which just means the value follows it in memory. So if we saw the value A9, we would then write some code that
would read the next value from memory, which we've already incremented the program counter to point to. So we'd read that, and then we would take it and write it into our variable a. One other thing we need to do: we have these flags that we need to set based on the value of the variable a. So for example, if this is 65, it's not negative, so we'd set that flag to be false; it's not zero, so we'd set that to be false. And because of the way that LDA is specified in the documentation for the 6502, we don't change any of the other flags in the status register.

So what we'd have to do for every instruction is write a piece of code that does the same thing to A, X, Y, S, PC and P that the CPU would do. As long as it has the same effect, then when we execute the next instruction, because the state's been updated to match what the hardware would have, that instruction will have the same effect too.

Now you might think: there are 256 values in a byte, so that means I'm going to have to implement 256 routines. Actually it's not quite as bad as that, because a lot of the instructions are similar. So for example the one we've got here, A9, we can actually break down into a set of bits. At the end you have the two bits 01, then you have three bits, we'll call them b here, which specify how the addressing mode works, and three bits, a here, that specify what the instruction is. So the b bits will tell you how to fetch the value from memory: is it an immediate value, do you read it from another memory location? And the a bits will tell you what you do with that value. So it might be, in this case, LDA, which means we load the accumulator, the A register, with it; it might be that we add the value on to the A register, or we subtract the value, and so on. So actually you can split this down and say: well, I need to write eight different routines to do the actual operation and eight routines that fetch the value from memory, and then there's only 16 routines you have to write, not what
looks like it might be 64. So it's not quite as bad as it seems, and some of the other ones will have similar effects as you do that. So it starts to look like writing it wouldn't be too hard.

But we've only considered the CPU and what it's executing. Remember, we said that the CPU is actually talking to the other chips in the system. It's talking to the RIOT chip, and it's talking to the television interface adapter to cause things to appear on the screen, to access the joysticks and so on, to know what you're doing as you're playing the game. And this is where things become more complicated, because each of the instructions takes a certain amount of time to execute, and the hardware is doing things that take a certain amount of time as well.

If you go back and watch the video where I looked at how you wrote software for the 2600, one of the things you had to do was change the registers in the television interface adapter at exactly the right point as the beam was traveling across the television screen, to change color, to set things up to appear at the right point on the right line. If the timing in your emulator isn't right, the write will land at a different point from where the television interface adapter is, and so it'll have a different effect on what appears on screen. It might execute the right code, but what appears on screen wouldn't be what you'd expect it to be. So you actually have to write the code here, and the code that emulates the television interface adapter or any other hardware in your system, so that they work at the same speed relative to each other, so that they're in lockstep, if you want to get exactly the same effect.

Now for something like a games console that's really important, so that the games work. If you just want to run a program, say something like a word processor, it's probably less important. You could probably write an emulator for, say, a BBC Micro that could run a word processor or run BASIC without worrying about that. You could provide the data when you got to it, which key's been pressed for
example, you could output the data onto the screen and display it, and it would work fine for something like that, because the timing of things isn't so tightly coupled. But when you go to things like gaming, and talking to other bits of hardware and so on, then you really have to make sure that the timing is absolutely right.

And that's where things get interesting, because something like the 6502 is relatively straightforward as a target. If you look at something like the 68000, which was used in the Amiga, the Atari ST, or the Sega Mega Drive (the Sega Genesis), you find that the length of time an instruction takes can depend on the way the hardware is built and on the instructions next to it. The way the hardware is implemented means that some instructions may take longer at certain times than others. On the Amiga, for example, the CPU could be stalled while the graphics hardware fetched values in certain cases, and so on, and so you have to really make sure that everything's balanced.

What this means is, even though you're emulating a CPU which runs at about one megahertz or something like this, or eight megahertz on an Atari ST or Amiga, say, which is relatively low spec, the actual CPU power you need in the computer that's running the emulator can be considerably higher, because you have to make sure each of these things happens in the right amount of time, so they appear on screen at the right time. And it can get a lot more complicated very quickly.

As we said, creating a very simple emulator is relatively straightforward: all you have to do is make sure the right values end up in the registers and in memory at the right time. But actually emulating a more complicated system gets quite complicated quite quickly, and often these things aren't documented. So often you will have to write tests that run on the real hardware, assuming you can get access to it, that test and find out
how things actually work out in practice, because sometimes the documentation is wrong, and sometimes the designers of games and things have pushed this so far that they're beyond what the actual hardware designers expected. So some of the things you end up trying to emulate are specified, the CPU and things, but the way that interacts with the other bits of hardware, some of it's documented and some of it is just a byproduct of the way things have been built. And to make sure the software has exactly the same effect in the emulator as it would on the real hardware, you need to make sure that those same side effects happen, often at the same point in time.

And that's how writing an emulator becomes more and more complicated. You have to do tests to find out how the hardware actually works in practice, rather than just the way it looks like it should work, and you need to keep all the different parts of the hardware synchronized, not necessarily in terms of real time but certainly in relation to each other. So if the real hardware takes this long to display a line of graphics on screen, and the CPU executes that many instructions in that same amount of time, then your emulator needs to emulate that number of instructions in the same amount of time it takes the graphics emulation to display those things. And so you need to keep track of how many cycles, as it's often referred to, each instruction takes, and then you can advance the graphics by the same amount, and so on, as you're going through.

Even if you're writing a simple emulator, though, there are certain things you need to keep track of. For example, on the Apple M1 Max, the macOS running on there can emulate an x86-64 CPU very effectively, but even there they've had to be careful, because the memory model of an ARM CPU is different from that of an x86 CPU. What I mean by that is x86 has certain design decisions made which mean that in this case, this will happen. Particularly if you
have multiple threads running alongside each other, there's a strict ordering of when memory will get written back, which the ARM CPU doesn't implement. So if you're emulating an x86 CPU on an ARM CPU, you have to make sure that you also emulate that memory ordering in there. So you can get something going relatively quickly that emulates the CPU, but to get it totally accurate you often have to really delve deeply into how the hardware actually works and then implement that in your software.
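
As a rough sketch of the pieces walked through above (decoding an address into ROM, RAM or I/O, loading the program counter from the reset vector, and a fetch/decode step for the A9 opcode with a running cycle count), something like the following Python could work. Everything named here is made up for illustration: the `Machine` class, `read_io`, `write_io` and `video_clocks` are hypothetical, the I/O handlers are empty stubs, the cartridge is assumed to be a plain 4 KB ROM, and the figure of three video clocks per CPU cycle is an assumption about the 2600 rather than something stated in the talk.

```python
"""Toy 2600-style core: memory decode, reset, and one instruction (LDA #imm)."""

FLAG_N = 0x80  # negative flag bit in P
FLAG_Z = 0x02  # zero flag bit in P


class Machine:
    def __init__(self, rom: bytes):
        if len(rom) != 4096:
            raise ValueError("this sketch assumes a plain 4 KB cartridge")
        self.rom = rom
        self.ram = bytearray(128)
        self.a = self.x = self.y = 0   # 8-bit registers
        self.s = 0xFF                  # stack pointer (the leading 1 is implied)
        self.p = 0                     # processor flags
        self.pc = 0                    # 16-bit program counter
        self.cycles = 0                # CPU cycles executed so far
        self.video_clocks = 0          # how far the video side has been advanced

    def read(self, addr: int) -> int:
        addr &= 0x1FFF                     # only 13 address lines reach the bus
        if addr & 0x1000:                  # upper 4 KB: cartridge ROM
            return self.rom[addr & 0x0FFF]
        if (addr & 0x0280) == 0x0080:      # RAM at 0x80-0xFF, also seen at 0x180-0x1FF
            return self.ram[addr & 0x7F]
        return self.read_io(addr)

    def write(self, addr: int, value: int) -> None:
        addr &= 0x1FFF
        if addr & 0x1000:
            return                         # writes to ROM are ignored in this sketch
        if (addr & 0x0280) == 0x0080:
            self.ram[addr & 0x7F] = value & 0xFF
        else:
            self.write_io(addr, value)

    def read_io(self, addr: int) -> int:
        return 0                           # stub: a TIA/RIOT model would go here

    def write_io(self, addr: int, value: int) -> None:
        pass                               # stub: a TIA/RIOT model would go here

    def reset(self) -> None:
        # Little-endian reset vector: low byte at 0xFFFC, high byte at 0xFFFD.
        self.pc = self.read(0xFFFC) | (self.read(0xFFFD) << 8)

    def fetch(self) -> int:
        value = self.read(self.pc)
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def step(self) -> None:
        opcode = self.fetch()
        if opcode == 0xA9:                 # LDA #immediate
            self.a = self.fetch()
            self.p &= ~(FLAG_N | FLAG_Z) & 0xFF   # only N and Z are affected
            if self.a == 0:
                self.p |= FLAG_Z
            if self.a & 0x80:
                self.p |= FLAG_N
            elapsed = 2                    # LDA immediate takes 2 cycles
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")
        self.cycles += elapsed
        self.video_clocks += 3 * elapsed   # assumed ratio, see the note above
```

To try it, you could build a 4096-byte ROM whose first two bytes are A9 and 65, put 00 and F0 in its last-but-three and last-but-two bytes (offsets 0xFFC and 0xFFD) as the reset vector, then call `reset()` followed by `step()`. A fuller emulator would replace the single `if` on the opcode with a table or a switch-style dispatch, and would advance a real video model instead of a counter.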