'Accidental' CrossCompiler - Computerphile

**Title: The Evolution of Cross-Compilers and Intermediate Codes: A Journey Through Hardware Challenges**

---

### Introduction to Cross-Compilation and Hardware Bridging

This article retells a story about cross-compilation and about bridging hardware at a time when devices rarely spoke to one another without custom work. It follows the narrator's first cross-compiler, which he wrote almost without realizing it. A cross-compiler runs on one machine (here a PDP-11) but generates code for a different one (here a Z80). The work grew out of a project to connect a host computer to a Linotron 202 typesetter, a device from the generation just before laser printers that accepted input only through its own non-standard parallel port.

The obstacle was the 202's proprietary parallel interface, which the host could not drive directly. The fix was to commission a single-board Z80 computer to sit in between: characters arrived on the board over a serial line, because that was simpler, and left it one at a time in parallel form to match the typesetter's input port.
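The serial-in, parallel-out idea can be sketched in a few lines. This is only an illustration in Python, not the board's firmware, which was built from C and Z80 assembler. The 8N1 framing (one start bit, eight data bits sent least significant bit first, one stop bit) is an assumption, since the talk does not say how the serial line was configured, and `frame_byte` and `bridge` are hypothetical names.

```python
def frame_byte(value):
    """Encode one byte as an assumed 8N1 frame: start bit, 8 data bits (LSB first), stop bit."""
    data_bits = [(value >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def bridge(serial_bits):
    """Reassemble bytes from a serial bit stream; each byte stands for one parallel write."""
    parallel_out = []
    i = 0
    while i + 10 <= len(serial_bits):
        frame = serial_bits[i:i + 10]
        if frame[0] != 0 or frame[9] != 1:
            # Not aligned on a valid frame (for example an idle line): slip one bit and retry
            i += 1
            continue
        # Rebuild the byte from the eight data bits, least significant bit first
        byte = sum(bit << n for n, bit in enumerate(frame[1:9]))
        parallel_out.append(byte)
        i += 10
    return parallel_out


# Two characters sent down the serial line, one bit at a time
line = [bit for ch in b"Hi" for bit in frame_byte(ch)]
print(bridge(line))  # [72, 105]
```

Each value in the returned list corresponds to one character presented on all eight data lines of the parallel port at once.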

### The Role of Steve Marchant and His Bridging Board

Steve Marchant, an electronics engineer, built the bridging board. By the narrator's own account it was "hardly high tech," but it was the only link between the host and the Linotron 202. Steve held that nobody should be "messing about with C" at this level and that everything should be written in assembler. He allowed 2K of ROM for the program text; if C-generated code could fit in that without overflowing, fine.

Steve also provided a monitor port for a dumb terminal, called a VDU (visual display unit) in the UK and a VDT (video display terminal) elsewhere. It showed the flow of characters into and out of the board and was a lifeline during debugging. Error messages sent to the terminal had to be stored somewhere, and for that the board carried 2K of RAM, which seemed generous at the time.

### Memory Constraints and Their Implications

With only 2K of RAM, space ran out quickly. Julian, who later took over the project, reported that the error messages were what pushed the program over the limit, and that it was only about 40 bytes short of fitting. Steve, unsympathetic, blamed the problem on their being computer scientists rather than electronics engineers, but he did add another 2K of RAM. Because the original 2K of RAM sat at the bottom of the address space with the 2K of ROM directly above it, the new RAM had to go above the ROM, which left the RAM in two separate blocks. Julian's answer was to keep the lower block for character buffering and put all the error messages in the upper block.
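The resulting layout can be written down as a small memory map. The ordering (RAM at the bottom, ROM above it, the extra RAM on top) comes from the talk; the exact base addresses below are assumptions chosen so that the three 2K blocks are contiguous, and the region names are hypothetical.

```python
from typing import NamedTuple

K = 1024


class Region(NamedTuple):
    name: str
    start: int
    size: int
    kind: str

    @property
    def end(self):
        """Last address that belongs to the region."""
        return self.start + self.size - 1


# Assumed base addresses: only the ordering of the blocks is taken from the talk
MEMORY_MAP = [
    Region("char_buffer", 0x0000, 2 * K, "RAM"),
    Region("program", 0x0800, 2 * K, "ROM"),
    Region("error_messages", 0x1000, 2 * K, "RAM"),
]


def region_for(address):
    """Return the region containing the address, or None if nothing is mapped there."""
    for region in MEMORY_MAP:
        if region.start <= address <= region.end:
            return region
    return None


total_ram = sum(r.size for r in MEMORY_MAP if r.kind == "RAM")
largest_ram_block = max(r.size for r in MEMORY_MAP if r.kind == "RAM")

for r in MEMORY_MAP:
    print(f"{r.start:#06x}-{r.end:#06x}  {r.kind}  {r.name}")
print(total_ram, largest_ram_block)  # 4096 2048
```

The last line shows the practical cost of the split: the board has 4K of RAM in total, but no single data structure can be larger than 2K, which is why the buffer and the error messages were given one block each.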

### Programming Challenges and Solutions

The interfacing program was written in C and built on the PDP-11 with the Whitesmiths system, which compiled C down to assembler and came with its own Z80 assembler. That made it possible to write assembler inserts and link them with the C-generated code. Everything ended up as one large assembler-level program, which was then assembled. The whole pipeline was wrapped in a shell script, so the details were easy to ignore.

### Physical Transportation of Code

Getting the compiled code onto the target board was a physical step. The program was burned into an EEPROM (electrically erasable programmable read-only memory) with a PROM burner, carried over to the board, and plugged in. If it did not work, the chip could be erased and reused; the narrator does not recall ever needing more than one. Every test cycle therefore meant walking over to the machine and hoping that this time the code was right.

### Philosophical Reflections on Cross-Compilation

Looking back at this setup, the narrator notes that the code really was cross-compiled: the binary for the Z80 differs from the binary for the PDP-11 on which it was produced. He also found 4K of memory, for program and data combined, far too little, even if it was enough for an electronics engineer. That raised a further question: with more memory, could the compiler itself be hosted on the Z80? It would probably be hopelessly slow, but in theory it could be done.

### Ken Thompson's Approach: Invading Enemy Territory

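Ken Thompson at Bell Labs faced the same typesetter and took a bolder route. The 202 was front-ended by a Naked Mini, a 16-bit minicomputer made by Computer Automation. Rather than feed the typesetter characters at arm's length through a bridging board, Thompson "invaded enemy territory" and took over the Naked Mini. He studied its assembler and ported his B interpreter to it, and within a few weeks had everything working through his cross-compiling tools. With that knowledge he could choose between porting the whole software system onto the Naked Mini and driving the 202 directly from the PDP-11/44, largely bypassing the Naked Mini's own software.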

### Steve Bourne's Contribution: Algol 68C and Z Code

Steve Bourne, best known for the Bourne shell in Unix, was a PhD student and researcher at Cambridge before joining Bell Labs, and wrote Algol 68C, a dialect of Algol 68. At Cambridge there was interest in generating code for Z80 machines, with an IBM 360 mainframe as the host. As the narrator recalls it, producing a stripped-down compiler small enough for the Z80 required the compiler itself to grow to around 100K for a while, and the Z80 architecture can address only 64K. The compiler therefore could not be bootstrapped on the Z80 itself; it had to be cross-compiled.
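The size argument is simple arithmetic. The Z80 has a 16-bit address bus, so it can address 2^16 bytes. The sketch below assumes that the "100K" quoted in the talk means roughly 100 KiB; that figure is a recollection, not a measurement, and `fits_in_address_space` is a hypothetical helper.

```python
Z80_ADDRESS_BITS = 16  # width of the Z80 address bus
KIB = 1024


def fits_in_address_space(program_bytes, address_bits=Z80_ADDRESS_BITS):
    """True if a program of this size fits in the directly addressable memory."""
    return program_bytes <= 2 ** address_bits


address_space = 2 ** Z80_ADDRESS_BITS
compiler_peak = 100 * KIB  # rough figure from the talk, assumed to mean 100 KiB

print(address_space // KIB)                  # 64
print(fits_in_address_space(compiler_peak))  # False
print(compiler_peak - address_space)         # 36864 bytes over the limit
```

A compiler that needs about 100K while building its smaller self cannot run on a machine limited to 64K, so the build has to happen on a larger host.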

Bourne pointed out that the hardest part of writing a compiler is not the syntax analysis but the code generator, which has to target an environment that can seem hostile, and that porting to each new machine risks starting that work over. His answer was an assembler-level intermediate code that he called "zed code." His compilers emitted Z code instead of going to a machine-specific binary in one step. A new architecture then needed only a Z code interpreter (or a translator from Z code to native code), not a port of the whole compiler.
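The split between a front end that emits intermediate code and a small interpreter on the target can be shown with a toy example. This is not Bourne's Z code, whose instruction set the talk does not describe. The three-operation stack code below is invented for illustration, and Python's standard `ast` module stands in for a real parser.

```python
import ast

# Hypothetical intermediate code: a list of (operation, argument) pairs for a stack machine
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}


def compile_expr(source):
    """Front end: parse an integer arithmetic expression and emit stack code."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)   # operands first ...
            emit(node.right)
            code.append((BINARY_OPS[type(node.op)], None))  # ... then the operator
        else:
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    emit(ast.parse(source, mode="eval").body)
    return code


def interpret(code):
    """Back end: the only part that would need rewriting for a new machine."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if op == "ADD":
            stack.append(left + right)
        elif op == "SUB":
            stack.append(left - right)
        elif op == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown operation: {op}")
    return stack.pop()


program = compile_expr("2 + 3 * (10 - 4)")
print(program)
print(interpret(program))  # 20
```

The front end knows nothing about the machine that will run the code. Moving to a new machine means rewriting only `interpret`, which is a few dozen lines, while `compile_expr` stays as it is.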

### The Quest for a Universal Intermediate Language

The idea of one intermediate language for every compiler was widely discussed at the time and had its own name, the UNCOL problem (Universal Computer Oriented Language). The hope was that a single, capable intermediate language would emerge that every compiling system could adopt. It did not: architectures differ enough, arguably even today, that no one-size-fits-all solution has taken hold. As for the name of Bourne's own code, his new colleagues at Bell Labs wanted to call it "zee code," but he insisted that, as he was from the UK and it was his invention, it was "zed code."

### Conclusion: The Role of Intermediate Codes

Intermediate codes make cross-compilation and porting easier. If the front end (syntax analysis and related work) emits an intermediate representation such as Z code, it only has to be written once. Getting a foothold on a new architecture then requires only an intermediate-code interpreter for that machine. How intermediate codes work in detail is left for a later episode.
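The usual way to quantify this saving is a counting argument. It is not spelled out in the talk, so treat it as background rather than as the speaker's claim: without a shared intermediate code every language needs a complete compiler for every machine, while with one each language needs a single front end and each machine a single back end or interpreter. The function name is hypothetical.

```python
def translators_needed(languages, machines, shared_intermediate_code):
    """Count the translators to write so that every language runs on every machine."""
    if shared_intermediate_code:
        # One front end per language plus one back end (or interpreter) per machine
        return languages + machines
    # One complete compiler per (language, machine) pair
    return languages * machines


for langs, machs in [(1, 2), (3, 4), (5, 10)]:
    direct = translators_needed(langs, machs, False)
    shared = translators_needed(langs, machs, True)
    print(f"{langs} languages x {machs} machines: {direct} direct, {shared} via intermediate code")
```

For one language and two machines the raw count is no better (2 against 3). The benefit in that case is that an interpreter for a simple intermediate code is far less work than a full code generator, and the count itself only starts to favor the intermediate code as the numbers grow.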

### Final Thoughts

Universal solutions have stayed out of reach, but the approaches taken by Steve Marchant, Ken Thompson, and Steve Bourne still illustrate the main options: bridge the hardware at arm's length, take over the target machine, or place an intermediate code between the compiler and the machine.

---

**Note:** This article is a transcription and expansion of the provided video content, maintaining the original narrative while organizing it into a coherent structure for readability.

### Transcript

What we're going to look at today is to pick up the story that we've been involved in for quite a while now, of accidentally discovering that one's written a cross-compiler. The first ever one I wrote, and I almost didn't realize I was getting into it. For those of you who've been following the story so far, I've been trying to motivate T-diagrams as a way of understanding compilers in general, and to illustrate the idea of moving compilers around or coping with hardware that won't talk to one another.

We did this before Arduino and Raspberry Pi. We say everybody speaks very quickly to each other nowadays. Was it like that then? Not a chance. There wasn't a universal component you could put in the middle that would bridge A to B in a very powerful way. For the infamous Linotron 202 jailbreak problem that we've covered a lot in videos, this was the one step before laser printers came on the scene. The trouble was that the 202 typesetter wanted everything down its own non-standard parallel port. What I had to do was to commission a single-board Z80, so fairly low-level, character-at-a-time stuff was coming down here. It was being sucked into the Z80 board as serial, because that was simpler, but output as parallel, to be compatible with the parallel input port on the Linotron 202 typesetter.

I would point out that our bridging board here was hardly high tech. Steve Marchant, who built it for me, was a typical electronics engineer: "You shouldn't be messing about with C. Everything should be done in assembler. I'm being very generous with you: I'm giving you a 2K ROM to put your program text in, and if you can generate that out of C and not overflow, fine." But also, bless him, he did give us an absolute lifeline, which comes off the top of this diagram. He did provide a monitor port for a VDU, if you're in the UK, or a VDT if you're elsewhere in the world: video display terminal, dumb terminal, just showing you the flow of characters into and out from the board. And that saved our lives many a time in debugging what was going wrong.

The problem is that if you're going to put error messages out on the VDT, they have to be stored somewhere. He provided us, very generously, with 2K of RAM, which is beyond the dreams. So yes, the 2K RAM got very full, because I remember Julian, who took this project over from me, coming and saying, "We would get on very well if only we could do without the error messages in the monitor, but we can't." So we phoned up Steve, who was not very sympathetic and said it was all due to our being computer scientists and not electronics engineers. He had given us a thing, for very good reasons, where at the low end of memory was our 2K of RAM. Sitting on top of that was the 2K of ROM in the memory space, which was actually reading the characters, preparing them for parallel, listening for responses, all this kind of stuff. So when Julian and I said, "Can you give us more RAM?", you know what he did? He gave us 2K more RAM sitting on top of that lot, so our RAM holding was split into two. "Never mind," said Julian, quite rightly. "We'll keep the lower level of RAM for the actual buffering of the characters, and we'll keep all the error messages in the upper bit." He said, "You know, frankly, we're only about 40 bytes short of what we need, but we have to have more."

Anyway, we did our plodding interfacing program. We wrote it in C, but the Whitesmiths system reduced everything to assembler. They had their own Z80 assembler, so you could do assembler inserts, and these could be linked in to your C-generated low-level code, all in one big happy family. In the end it got united into a single big assembler-level program and run through the assembler. You were scarcely aware of doing it, because by this time it was all packaged up into a shell script, and you just sort of ignored the detail and were happy for it to happen.

Physically, what happened was you prepared your program and you turned on the PROM burner. It's an EEPROM; an EEPROM is a programmable read-only memory, electrically driven. You made a one-off effort and you burnt your program in there, you took it off to your target board, you plugged it in, and you hoped it worked. If it didn't work, then there was an electrically erasable option to it and you could reuse it. Eventually, I think, it would have got tired, but I don't think we ever had to use more than one. Anyway, that was the way it worked. It was a physical transportation mechanism: you walked over, you put it into the driver board, and you thought, "This time I've got it right. It's going to work."

Looking at diagrams like this, even at the time, I began to get quite philosophical and think, well, there are so many questions you could ask here. Here am I, sitting here producing my cross-compiled Z80 code, and it is cross-compiled, because it's a different binary for the Z80 than it is on the PDP-11 where it's actually been produced. Wouldn't it be wonderful if only I could have far more memory for data and for program on that board? I mean, 4K may be enough for an electronics engineer, but really, let's get serious, it's pathetic.

I also thought: we choose to come in and interface at the one-character-down-a-parallel-port level. Now, we knew that the Linotron 202 typesetter was actually front-ended not by a PDP-11 but by another 16-bit minicomputer. It was called the Naked Mini, and it was made by a firm called Computer Automation. Generally speaking it was slower than the PDP-11 would have been, but it was adequate for the clunky electromechanical backing and forwarding of bromide that the 202 had to indulge in. Linotype had bought in an off-the-peg sort of solution. I think the reason why it was called the Naked Mini was that you could get a version with really no frills and furbelows, no case. It was a component you could build into a bigger system, and everything you added on was an extra, probably including the keyboard, knowing that lot.

But you see, the more courageous attitude, which Bell Labs took (but they have, how shall we say, more experience, better skills and better lawyers than we had), is to say, "Blow this, that's not really a very good place to intervene. Why don't we mount an invasion, invade the Naked Mini and take it over completely?" And that's what Ken Thompson did, of course, in the great jailbreak. He got himself on there. His favorite implementation language was the B language (remember BCPL): B came first, an interpreted language, still typeless. C came along later, developed mainly by Dennis but consulting Ken, to be the system implementation language. So Ken's first reaction on getting a strange machine is, "Right, I'll look up how the assembler works, look at how the code is right down below, and overnight I will port my B interpreter and get it working on here." And of course within a few weeks he'd got absolutely everything in there working off his cross-compiler, cross-interpreter and everything. In many ways that was the way to do it, because he could then decide: do I port the whole software mechanism onto that machine, or would I be better to say I know enough now to drive it directly from the PDP-11/44 and largely bypass that? So a lot of time was spent with his software decoding what the heck the 202 was up to in terms of rendering fonts, and it took a long time to figure out what they were doing with the characters and so on.

So it still prompts the issue: do you invade the enemy territory, or do you stand off in the background and, at arm's length, using tongs, just gently feed it the odd serial-line character? We just had to take that latter view. But with my thoughts about "wouldn't life be easier if only I had a bigger Z80 memory": could I actually host the compiler on that? Would it be hopelessly slow? Yes, it probably would, but in theory could I do it, although it's an 8-bit micro?

The Z80 appeared, coincidentally or not, in the Sinclair ZX80 and ZX81 around about that time, and there were a lot of people asking, "Could we cross-compile stuff for the ZX80 and run it?" At Cambridge at the time there was a gentleman who will be known to many of you, because he became famous in the Unix world: Steve Bourne. Steve Bourne is perhaps best known for writing the canonical Bourne shell that comes as part of Unix, but before he took the job at Bell Labs he was a PhD student and a researcher at Cambridge, and he wrote a system called Algol 68C, which is a dialect of the Algol 68 language. His big aim, since Algol 68 was his baby, was (I can only imagine, because there are reports of these experiments going on): can I generate code on here for the Z80, and what is the best way for me to do this? All these Z80 boards we're working with, with Clive Sinclair and his boys, it would be lovely to be able to prepare lots of code for them. What's the best way to do this?

Well, they had a great big IBM 360 machine, and they had their precious Algol 68 compiler, and they went through all the tricks we've covered in previous episodes of getting it to compile itself, or getting it to compile a stripped-down version of itself. What they found was that in the process of preparing that stripped-down version, if I've got this right, the actual compiler itself had to get very big for a while in order that it could generate a smaller version that would fit on the Z80. And when you say, "Well, what's very big?": 100K, and you're limited by the architecture on a Z80 to 64K. No solution: you had to cross-compile, and they did.

But what is perhaps even more interesting: I went to see Steve at the time, and I don't know that I've actually seen him since. This really was in the very early 80s. I remember it well; I think he was back on vacation from Bell Labs. He said, "You know, the more you look at this business of transporting, or porting as it's now called, a compiler to a new environment, the more you realize what a mess we get ourselves into. We've got this language which gets compiled down close to binary on this machine. Then we want to do it on another machine, like the Spectrum, and if we're not careful we start all over again: we must do the whole chain, get everything ready and all that." But he said, "The hardest bit in writing a compiler is not all the syntax analysis and deciding what to do. It's actually doing it in the code generator, against an environment that can often seem very, very hostile indeed. Wouldn't it be great if one could have a sort of assembler-level, vaguely intermediate code that everybody could use?"

Now, he was one of the early people to realize this, and he said, "I've invented this thing called zed code. So my compilers now don't go to a specific binary in one step; they emit Z code. Then for the new machine you don't have the whole big compiler to transport. The Z code is the back end, the syntax analysis and so on is the front end, and all you need to do for a new architecture is write a Z code compiler or interpreter." I said, "Zed code?" He said, "Yes. My new colleagues at Bell Labs want to call it zee code, and I keep telling them: I'm from the UK, it's my baby, it's zed code."

This approach became so well known and so discussed at the time that it even had its own title: the UNCOL problem. UNCOL, I think, stood for universal computer language. The forlorn hope was that it might emerge, from all those doing compilers, that there would be one superbly capable intermediate language which every single compiling system in the world could subscribe to and be happy with its facilities. So, the search for the UNCOL solution: predictably, it doesn't work like that. There's always enough difference in architectures, maybe even today, that it is not easy to have a one-size-fits-all.

So I think this is a good place to draw a line for the moment. We've got to investigate how intermediate codes work and how they help you port a compiler from machine A to machine B. We can vaguely see that it can obviously help, because all you have to write to get a foothold is an intermediate-code interpreter for the new back end, and that gets you started.