Hello World (Assemblers, Considered Harmful!) - Computerphile

The Development of Assembly Language and its Role in Computer Science

As I recall, working with the early computer systems was quite an experience. It's amazing how far we've come since then. The development of assembly language is a fascinating story that highlights the ingenuity and innovation of pioneers in the field.

Working through Hello World on an EDSAC simulator, I found it quite labor-intensive. Each character to be printed needs its own output order, and I had to calculate every address offset by hand. The need for an index register was recognized very quickly. According to Martin Campbell-Kelly, author of the EDSAC simulator, his simulator models the machine only from early 1949 to late 1950. That is because in late 1950 David Wheeler and his colleagues decided they needed an index register and built one in.
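The benefit of an index register is easiest to see in the ARM Hello World loop described in the transcript, where register r1 walks along the string. Below is a minimal Python sketch that emulates that loop; it is not real ARM code. The instruction sequence in the comments is reconstructed from the transcript's description, not copied from the actual listing. The load address 18, the function name `run_hello`, and the way the condition flag is modeled are illustrative assumptions.

```python
# Emulates, in Python, the loop described in the transcript (reconstructed):
#   loop: LDRB  r0, [r1], #1   ; load the byte r1 points at, then r1 += 1
#         CMP   r0, #0         ; have we hit the null character?
#         SWINE 0              ; print the character only if "not equal"
#         BNE   loop           ; go round again unless it was the null


def run_hello(memory: bytes, start: int) -> str:
    """Return what SWI 0 would print while walking a null-terminated string."""
    printed = []
    r1 = start                    # index register: byte address of 'H'
    while True:
        r0 = memory[r1]           # LDRB r0, [r1]: indirection through r1
        r1 += 1                   # ..., #1: post-increment for next time round
        ne = r0 != 0              # CMP r0, #0 leaves a "not equal" condition
        if ne:                    # SWINE 0: conditionally executed, no branch
            printed.append(chr(r0))
        if not ne:                # BNE loop is not taken at the null byte
            break
    return "".join(printed)


# As with DEFB, the terminating null must be written into the data explicitly.
message = b"Hello World\n\x00"
memory = bytes(18) + message      # hypothetical: the string starts at byte 18
print(run_hello(memory, 18), end="")   # prints: Hello World
```

Without the explicit null byte, the loop would run off the end of `memory` (in Python, an `IndexError`). This is the stop-indicator point the speaker makes about the ARM assembler not adding one for you.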

David Wheeler's Initial Orders 2 was not a full-fledged assembler. It did provide some basic features, such as binary translation and relocation, and it did them in a single pass. A proper assembler has to make two passes because of labels. A backward jump to a label it has already seen is easy. For a forward jump, however, it cannot know where the label will land until it has counted its way through the whole program. Without labels, I had to work out every offset by hand, which was time-consuming and frustrating, especially when dealing with complex programs.
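The following toy example shows why labels force a second pass. Pass 1 only counts words and records where each label lands. Pass 2 fills in every label reference, forward or backward. The tuple-based program format, the mnemonics such as `DATA` and `PRINT`, and the one-word-per-line rule are all invented for this sketch; they are not EDSAC or ARM encodings.

```python
from typing import Optional

# One source line: (label or None, mnemonic, label operand or None).
Line = tuple[Optional[str], str, Optional[str]]


def assemble(program: list[Line], origin: int = 0) -> list[tuple[str, Optional[int]]]:
    """Resolve label operands to absolute addresses in two passes."""
    # Pass 1: count our way through the program, noting where labels land.
    symbols: dict[str, int] = {}
    for offset, (label, _mnemonic, _operand) in enumerate(program):
        if label is not None:
            if label in symbols:
                raise ValueError(f"duplicate label {label!r}")
            symbols[label] = origin + offset

    # Pass 2: every label, forward or backward, now has a known address.
    output: list[tuple[str, Optional[int]]] = []
    for _label, mnemonic, operand in program:
        if operand is None:
            output.append((mnemonic, None))
        elif operand in symbols:
            output.append((mnemonic, symbols[operand]))
        else:
            raise ValueError(f"undefined label {operand!r}")
    return output


program: list[Line] = [
    (None, "B", "main"),        # forward reference: main not seen yet
    ("text", "DATA", None),
    ("main", "LOAD", "text"),   # backward reference
    ("loop", "PRINT", None),
    (None, "BNE", "loop"),      # backward reference
    (None, "STOP", None),
]
print(assemble(program, origin=64))
# prints: [('B', 66), ('DATA', None), ('LOAD', 65), ('PRINT', None), ('BNE', 67), ('STOP', None)]
```

A single-pass tool would reach `B main` before it had seen `main`. Pass 1 exists only to settle forward references like that one.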

Another friend of mine, David Hartley, later wrote what I believe was a macro assembler for EDSAC 2. As far as I know, the original EDSAC never got an assembler. Views on assemblers were very mixed. There is a story about John von Neumann berating a grad student who had written an assembler. Von Neumann called assemblers "for the weak-brained" who cannot work out their own addresses and implied that real programmers should be able to program directly in absolute binary.

The role of assembly language in computer science is multifaceted. On one hand, it lets programmers write mnemonics and symbolic labels instead of absolute binary addresses, leaving the assembler to work out where everything goes. This is essential for low-level programming. On the other hand, some early pioneers saw it as a waste of scarce machine time: assembling a program meant an extra run and an extra paper tape before the real program could even be loaded.

John von Neumann, by contrast, was no advocate of assemblers. As the story goes, he regarded them as a waste of machine time. He believed that real programmers should work in absolute binary, without an assembler or any other intermediate layer, and he could comfortably keep the details of the binary in his head. The transcript's closing remark about an "ultimate laptop", a plasma at an extremely high temperature, concerns the pure physical limits of computation as opposed to engineering limits.

David Wheeler's approach was more pragmatic. He too might have said that real programmers use absolute binary, though not as extremely as von Neumann. Yet his Initial Orders gave programmers single-pass help with relocation and binary translation. The index register that he and his colleagues added to EDSAC also became standard kit on later computers.

The story of assembly language is a testament to the dynamic nature of computer science, where innovation and experimentation are constantly driving progress. From the early pioneers who developed the first assemblers to the present day, the role of assembly language has evolved significantly, but its fundamental importance remains unchanged. As we continue to push the boundaries of what is possible with computers, it's essential to understand the history and development of this crucial component of our programming toolkit.

I think, as far as I know, it was Brian Kernighan and Dennis Ritchie who first introduced it to me. I don't know if it goes back earlier than that, but certainly in the C book, there it is: printf("hello, world\n"), with the backslash n to denote a new line at the end of it and all that. It's now really become part of computer science legend. The first thing you do to show you've mastered a new language, be it Python or whatever, is to say "oh yes, here's how to do Hello World".

Of course, Hello World is a character-based challenge. From what we now know about characters in modern computers being stored in addressable bytes, does it sort of follow that Hello World would be somewhat easier on a byte-based machine?

Oh yes, it will be a lot easier on a byte-based machine, but there are other things as well. So, as an illustration of just how horrible it could be, and given that we have done some stuff on EDSAC already, let's go and do that. If you haven't seen the other EDSAC material, I think you'll be able to follow what I'm doing anyway, and you can always go back later and pick up more background. When we were on this EDSAC simulator last time, we actually ran the program that Martin Campbell-Kelly supplies with it. He got fed up of doing Hello World and said he'd just do a brief version, so it says HI. Now, thanks to a combined programming effort by those in this room, I have here the new version, hello world srdfb.txt, and there it is. It's quite a lot longer, of course, than the previous one.

So is each of those lines using a word, then?

Yes. EDSAC was designed around the most minimalist set of things. Basically, the story was: if it's possible to do it with what we've got already, don't start inventing new flavors of instructions. So all you've got here, of course, is the stuff for setting up where the load point is and what the offsets of these addresses are
relative to 64. The @ symbol at the end signals to David Wheeler's Initial Orders that what comes here is a relative address. So what it's saying is: O (the letter O, not a zero), output the character which you will find in the memory location 16 further on than 64. All these offsets (16, 17, 18, 19, 20) are relative to 64, so in actual fact address 80 holds the very first thing you want to output. 16 on from 64 is where the actual data starts.

The ZF and things like that correspond to what are nowadays called assembler directives. It's not always the case that these things go one-for-one into occupying a word; some of them are messages to the assembler. All this stuff up here is basically saying "I want you to remember 64 and start locating everything relative to that."

Because if we looked at the line numbers on the left there, that wouldn't be the place you're trying to get to?

Right, no. This stuff up here is what would probably be done in modern assemblers by saying something like ORG = 64. In other words, it isn't a program instruction; it's telling the assembler "please start me at 64."
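Here is a simplified Python sketch of the relative addressing described above. An order written as `O16@` names an address 16 words past the load point. With a base of 64 it refers to address 80, and moving the program to 256 (T256K instead of T64K) shifts it to 272 without editing the order itself. The order grammar below (a function letter or teleprinter symbol, an optional decimal address, then a terminator `F`, `D` or `@`) and the function name `relocate` are assumptions for illustration. The real Initial Orders 2 mechanism is not modeled.

```python
import re

# Assumed order grammar: a function letter (or teleprinter symbol), an
# optional decimal address, then a terminator: F (short word),
# D (long word) or @ (address is relative to the load point).
ORDER = re.compile(r"([A-Z*#!@&.])(\d*)([FD@])")


def relocate(order: str, base: int) -> tuple[str, int]:
    """Return (function letter, absolute address) for one written order."""
    match = ORDER.fullmatch(order)
    if match is None:
        raise ValueError(f"cannot parse order {order!r}")
    letter, digits, terminator = match.groups()
    address = int(digits) if digits else 0
    if terminator == "@":
        address += base           # relative: count on from the load point
    return letter, address


for base in (64, 256):            # as if loaded with T64K, then T256K
    print(base, relocate("O16@", base), relocate("HF", base))
# prints:
# 64 ('O', 80) ('H', 0)
# 256 ('O', 272) ('H', 0)
```

Data words such as `HF` carry no relative address, so relocation leaves them alone. Only the offsets marked with `@` move with the base, and the programmer still has to get those offsets right.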
It's for your own internal knowledge; it's not to be translated into a program instruction. The ZF says stop, stop execution. In the meantime, what we're expecting is that the thing that is 16 on from 64 will actually get us to here, to *F.

What does *F do?

* is shorthand for saying "put yourself in letter shift". Veterans of five-hole paper tape will know that you've got to make sure you're in letter shift to print meaningful messages. The other possible shift is figure shift, and all hell breaks loose if you forget to shift. It's just like the shift key on a typewriter; that's where it comes from historically.

Could you use that as a very, very simplistic code?

Yes, possibly. Anyway, we turn into letter shift, and look, this makes sense now: can you see HF? That's one single-length word; F means this is a single-length word, 18 bits actually. For those who've got the EDSAC tutorial, the opcode field is occupied by an H. The O order will output these as characters, and to be treated as characters they've got to be in the opcode field. The O order says "look in the opcode field and regard it as a character". It's not quite Baudot, because Maurice Wilkes had invented a subtly different code, but never mind. So you end up saying "oh, it's the letter H I am to output", with this O order carrying a relative address offset, and you go all the way along, look: H, E, L, L, O.

What's the exclamation mark?

Look it up in the EDSAC tutorial, as I had to do. That's the marker you put in if you want to force an explicit space between HELLO and WORLD, which we did.

And finally, what about @F and &F after the D of HELLO WORLD?

Well, take a guess: we're trying to be neat and tidy and make it look good. That's the code for "give me a carriage return, give me a line feed". Then we say end of the whole thing, end execution. This is also a marker to Initial Orders: "you can stop relocating this program for me, I'm done." Okay.
That's it, on top now. Fingers crossed. What do we do? We press Start, don't we? We noticed that way back up at the top we put in a stop, just to make sure. With our incredible knowledge of EDSAC binary, Sean and I can see straight away that that, of course, is Hello World, isn't it? We're not kidding: David Wheeler would know that it said Hello World. I'll tell you something else, Sean: after only half a day's familiarity with this, John von Neumann would know that it was Hello World. He found it so comfortable to remember the details of the binary. I'm sure he would; I really do.

So here we go, then. Let's do a single E.P., a single instruction, or "single shot" as it's sometimes called nowadays. Right, there we are; it's still blinking. We turned into letter shift with that click. Next click: H. Oh, isn't this wonderful, aren't we demon programmers? E, L, L, O, space, yes, W, O, R, L, D, carriage return, line feed.

So that was pretty painful. The T64K gives you relocatability: you could change it to T256K if you wanted to shove the whole thing up memory, and maybe turn it into a subroutine you want to push somewhere else in memory. The bulk relocation against the base address is taken care of by Initial Orders, but you've still got to get the offsets right, and it's painful. It's utterly, utterly painful.

We're now going to jump forward into safe, byte-addressed territory for handling characters, with the 32-bit ARM chip, which we use for teaching assembler programming here to our first years. It has a 32-bit word broken up into four 8-bit bytes, which of course use ASCII, not IBM's EBCDIC.

So down at the assembler level, what does byte addressability give us on the ARM? And what other things have happened between the EDSAC era and this era, the late '80s and '90s, to make this so much more compact, so much easier to understand, and so much more flexible?

Well, let's go through
it step by step. Comments: anything after a semicolon is a comment. I put a comment up at the top saying that, to put out Hello World, we've used the so-called software interrupts. These are the system calls provided by the University of Manchester's Komodo ARM development environment, which is what we use. So when we get to actually printing the character out, don't get worried by SWI. It means software interrupt, a way to ask the operating system to print something for me.

So let's start up here. Programs on the ARM will cheerfully expect, if you don't tell them otherwise, to start executing at line one of your program and go madly on. I put the data for Hello World up at the top of the listing, not at the bottom as I could have done. I declare Hello World here as a piece of text: this DEFB means "define a bunch of bytes", and you put them in quotes as you would in C. Taking over some of its story from C, it even allows you to ask for a new line to be put in there with backslash n. The only difference is that C implicitly terminates its strings with a null character, whereas the ARM assembler doesn't do that for you. You must explicitly put a null character at the end of your string if that is your stop indicator.

You don't want the ARM chip executing Hello World as if it were bit patterns for instructions, so you have to jump past it. That's why I put in here, look, an unconditional branch: B main, branch to main. Now this is wonderful. You don't have to branch to an absolute address and be like David Wheeler and John von Neumann, keeping them all in your head. You just label it main, and this thing called an assembler will work out what main means in terms of the address you want to jump to. Isn't that wonderful? Von Neumann stares at you and says that's for the weak-brained who can't keep track of their addresses.

Anyway, we branch to main, and the first thing it says, very self-evidently really, is: get me
the start address of the text string and put it into register 1. Next thing we notice: as long promised, modern CPUs have 15 or 16 general-purpose registers to make life bearable. EDSAC only had the accumulator, and if you wanted other storage places you had to start parking things in memory in all sorts of horrible ways. So that helps us straight away: r1 is going to be our so-called index register. It starts off pointing at the address of H. I don't know what the byte address of H is. It might even be relative zero, since it's the first thing in this program, but whatever it is, the actual byte address of H is now in register 1.

Here is the crux of the whole thing: LDRB, load into a register the byte specified as follows. Here I say r0: that's the register I want to load it into. But where does it come from? In square brackets, r1. That says: look in r1 and you will find the address of the start of that string. I don't want you to load the address into r0; I want you to load the character at that address into r0. It's indirection, and that is indicated by the square brackets. Rather than putting the address that's in r1 into r0, you follow the pointer from r1 and say "oh, that's the letter H at the moment", and that's what goes into r0. And here's the other cute thing at the end, which those pioneers would have given the world for. It says: when you've done that, please increment that r1 address by one for next time around the loop. So if it was pointing at 18, shall we say, to start with, it's 19 now for next time around the loop.

So you keep on going around that loop, and here's where you check whether you've hit the null character. You compare the contents of register 0, which will be a character, against literally zero, which is what the null character is. Is it equal or not equal to zero? And here's another lovely thing about the ARM chip, which Steve and I love dearly. This is the 32-bit
ARM chip; in the 64-bit one I think it's not so important nowadays. The 32-bit one has a thing called conditional execution, which can often save you from using a branch instruction, and branches are relatively expensive in pipeline terms. So here we've got SWINE, which is wonderful. SWI 0, software interrupt zero, says "punch out this character for me on the screen", but NE says "do that only if the last thing you did didn't yield equal". We're checking for the null character. As long as it wasn't the null character, the comparison says not equal, you print it out, and out it comes, character by character. After that, of course, you loop back to go around and print another character, remembering that the #1 has incremented your address pointer along that string. You don't have to remember what address loop is. You don't know it, but the assembler knows and fixes it up for you. And right at the very end, the way to say "stop execution, I've done it" is SWI flavor 2, which on this emulated environment stops it completely.

Looking at the development of that from EDSAC, you think "oh my god, I am so pleased I've got that". I emailed Martin, who wrote the EDSAC simulator we're using here, the other day, and he came back to me and said yes. The need for an index register was realized so quickly that his simulator only covers early 1949 to late 1950. In late 1950, David Wheeler and everybody said "my golly, we need an index register", and they built one in. So in a way, the pioneers were using their early machinery to lead the way and to ask what extra facilities they needed to make life tolerable. There is the hardware facility of index registers, which just became standard kit after this; every computer from that time on had index registers.

But what also interests me is the role of a proper assembler. Initial Orders 2 is not a full-blown assembler; it helps you a little
bit by turning decimal addresses into binary. But you have to remember that the letter A you put in the leading five bits could be the character A, whereas if you're regarding this as an instruction, it's an add instruction. Initial Orders 2 is relocating and doing a bit of binary translation, and it's a single-pass process, which is wonderful. The problem with an assembler is that it has to be a two-pass process, and the trouble is always labels. If you jump back to labels you've already seen, you already know what address that will be. But when you jump forward, how do I know where the heck that label down there is going to be? I don't even want to calculate it. I want the assembler to say "oh, I'm on location 186 now, how handy". But it can't fix up the addresses until it has counted its way through the program. Then it says "right, I will now output you a definitive thing that you can put in through David Wheeler's Initial Orders 2. I've made it so much easier for you by allowing labels." One doesn't think of labels as being a structuring convention, and yet at this low level they are, in a way. This one says "loop starts here", another label says "oh, it ends here", and the message is: please calculate the addresses of what's happening there and fix it up for me.

So you might say, well, all right, didn't everybody say we must have assemblers, it's the modern way to do things? There were very mixed views about this. I don't think EDSAC got an assembler until EDSAC 2. Another friend of mine, David Hartley, did, I think, a macro assembler for EDSAC 2, not this one. There's a story here related to von Neumann as well. I don't know whether it was his EDVAC or the version of EDVAC he had in his basement called JOHNNIAC, but apparently he really berated a grad student who wrote an assembler: assemblers are for the weak-brained who cannot work out their own addresses; you do realize that in running this assembler of yours, punching out a paper tape, I'm behind
you in the queue i don't get my turn next you come to me and say ah but this is ready to load now in the second phase binary you're wasting time if you're so weak brained you can't program me in absolute i'm putting words in his mouth but this was essentially it he no doubt had dreams in absolute binary there was no problem with john von neumann about coping as close to binary as possible he could keep it all in his head and he would i think have found initial orders on edsat about yes nice and helpful single pass not slowing down things enough but an assembler you're wasting time on this machine by doing assemblers i mean it's really really brings it home to those of us who always joked about you know real programmers use assembler the answer from certainly from john von neumann possibly from david willer but he wouldn't have been as extreme as that is real programmers use absolute binary as he talks about having the ultimate laptop and in fact when he means the ultimate laptop he's not talking about the limit this is the important thing there's the engineering limits and then there's the pure physical limits his ultimate laptop is a plasma at um just a at a stupidly high temperaturei think as far as i know it was brian kernington and dennis ritchie who first introduced it to me i don't know if he goes back earlier than that but certainly in in the c book there it is printf hello world you know and the use of backslash n to denote a new line at the end of it and all that it's now really become a part of comps our legend the first thing you do when you show that you've mastered a new language be it python or whatever you know yes oh yes here's how to do hello world of course hello world is a characters-based challenge and from what we now know about characters in modern computers at least been stored in addressable bites does it sort of follow then that hello world would be somewhat easier on a bike-based machine oh yes it will be a lot easier on a bike-based 
machine but there's other things as well so as perhaps an illustration of just how horrible it could be and given that we have done some stuff on edsack already let's go and do that if you haven't seen the other headset stuff i think you'll be able to follow what i'm doing anyway and you could always go back later and pick up some more background about headsack but when we were on this headsack simulator the last time we actually did run the program that martin campbell kelly supplies with it and he got fed up of doing hello world he said i'll just do a brief version it says hi we did that thanks to a combined programming effort now by those in this room i have here the new version hello world srdfb.txt and there it is it's quite a lot longer of course than the previous one was so is each of those lines using a word then yes edsack was designed around the most minimalist set of things it was basically the story was if it's possible to do with what we've got already don't start inventing new flavors of instructions so all you've got here is this is the stuff of course for setting up where the load point is and where the relative offsets of these addresses is relative to 64. the at symbol at the end signals to david wheeler's initial orders that what comes here is a relative address so what it's saying is letter o not a zero output the character which you will find in the memory location 16 further on than 64 is so all these offsets 16 17 18 19 20 are all relative to 64. 
so in actual fact then it turns out that address 80 holds the very first thing you want to output and of course 16 on from 64 well if 64 is here this is where the actual data starts the zf and the things like that correspond to what are nowadays called assembler directives it's not always the case that these things go one for one into occupying a word some of them are messages to the assembler all this stuff up here is basically saying i want you to remember 64 and start locating everything relative to that because if we looked specifically at the line numbers on the left there that wouldn't be the place you're trying to get to right no this stuff up here is what would probably be done in modern assemblers by saying something like org equals 64. in other words that isn't a program instruction it's telling you the assembler please start me at 64. and it's for your own internal knowledge it's not to be translated into a program instruction so the zf says stop stop execution but in the meantime what we're expecting is that the thing that is 16 on from 64 will actually get us to here for star ref what does star f do star is a short code for saying put yourself in letter shift veterans of five-hole paper tape well no you've got to make sure that you're in letter shift to print meaningful messages the other possible shift is figure shift and all hell breaks loose if you start forgetting to shift out it's just like the shift key on a typewriter that's where it comes from historically can you use that as a very very simplistic code yeah yes possibly um anyway so turn into letter shift and look this makes sense now can you see h f in one word of single length word f means this is a single length word yeah 18 bits actually the op code field for those who've got the headset tutorial the obco op code field is occupied by an h but the o command will output these as if they were characters and meant to be characters they've got to be in the opcode field but the o command says 
look in the opcode field regard it as not a bordeaux character remember maurice wilkes had invented subtly different but never mind and it's so you end up coming to say oh it's the letter h i am to output with this o instruction with a relative address offset on it and you go all the way look here h-e-l-l-o what's exclamation mark look it up in the edsack tutorial as i had to do that's the marker you put in if you want to force an explicit space between hello and world which we did and we finally what about at f and ampersand f after the d of hello world well let's take a guess we're trying to be neat and tidy make it look good that's the code for give me a carriage return give me a line feed and then we say end of the whole thing end execution and this is a marker also to initial orders you can stop relocating this program for me i'm done okay so that since it's on top now oh fingers crossed on what do we do we do start don't we we noticed that way back up at the top we put in a stop just to make sure because with our incredible knowledge of headstock binary sean and i can see straight away that that of course is hello world isn't it i mean i we're not kidding david wheeler would know that i said hello world i'll tell you something else sean after only half a day's familiarity with this john von neumann would know that that was hello world he found it so comfortable to remember the details of the binary you know i'm sure we would i really do so here we go then let's do a single ep single instruction uh single shot it's sometimes called nowadays right there we are it's still blinking we turned into letter shift with that click next click h oh isn't this wonderful aren't we demon programmers e l l o space yes w o r l d carriage return line feed so that was pretty painful although the t64k gives you relocatability you could change that to be t256k so if you wanted to shove the whole thing up memory and then maybe turn it into a subroutine you want to push it 
somewhere else in memory so the bulk relocation against the base address is taken care of by digital orders but you've still got to get the offsets right and it's painful it's utterly utterly painful we're now going to jump forward into safe bite addressed territory for handling characters and the arm 32-bit arm chip which we use for teaching assembler programming here to our first years um yeah it is a 32-bit word broken up into four bytes eight bit bytes which of course use ascii not ibm stick fine so down at the assembler level though for the arm then what does the bite addressability give us and what other things have happened between the edsak era and this era where we're talking late 80s 90s this sort of thing what else has happened to make this thing so much more compact so much easier to understand and so much more flexible well let's go in through step by step comments anything after a semiconductor comment i put a comment up at the top saying to put out the hello world we've used the so-called software interrupts the system calls as provided by the university of manchester's komodo arm development environment which is what we use so when we get to actually printing the character out don't get worried by swi i mean software interrupt to ask the operating system to print something for me or something like that so let's start up here programs on the arm will cheerfully expect if you don't tell them otherwise that they will start executing at line one of your program and go madly on i put this data for hello world up at the top of the listing not at the bottom as i could have done but the rule then is if i declare hello world here as being a piece of text and this def b here means just define a bunch of bytes and you put them in quotes like you would in c and even taking over some of its story from c it even allows you to ask for a new line to be put in there with massage n and the only difference is whereas c implicitly plugs its strings with a null 
character at the end arm doesn't do that for you you must explicitly put in a null character at the end of your string if that is your stop indicator but in order to stop the arm chip executing hello world as if it was bit pants for instructions which you don't want you want to jump past it i'll put in here look an unconditional branch domain branch domain oh now this is wonderful you don't have to say branch to an absolute address and be like david wheeler and john neumann and have them all in your head you just say let's label it maine and this thing called an assembler will work out what main means in terms of the address you want to jump to isn't that wonderful vol neumann stares at you and says that's for the weak brain who can't keep track of their addresses you know anyway so we branch to maine and the first thing it says very self-evidently really is get me the start address of the text string and put that start address into register one next thing we notice as long promised modern cpus have 15 or 16 special purpose registers to make life bearable exact didn't it only had the accumulator and if you wanted other storage places you had to start parking it in memory in all sorts of horrible ways so that helps us straight away r1 is going to be our so-called index register it's going to start off by pointing at the address of h now i don't know what the byte address of h is it might even be relatively zero here is the first thing that happens in this program but whatever it is the actual byte address of h is now in register one here is the crux of the whole thing ldr byte load into a register the byte specified as follows here i say r0 that's the register i want to load it into but where does it come from in square brackets r1 that says look in r1 and you will find an address of the start of that string i don't want you to load the address into r0 i want you to load the character that is at that address into r0 it's in direction and that is indicated by that 
square bracket not putting the address that's in r1 into r0 and following the pointer from r1 and say oh that's the letter h at the moment and that's what i put into our zero and here's the other cute thing at the end wouldn't those pioneers have given the world for this is to say and when you've done that please for next time around the loop increment that r1 address by one so if it was pointing at 18 shall we say to start with it's 19 now for next time around the loop so you keep on going around that loop and here's the thing where you check whether you've hit the null character compare the contents of register 0 which will be a character contents against literally zero which is what the null character is now is the answer yes or no is it equal or not equal to zero and here's another lovely thing about the uh arm chip that steve and i love dearly this is the 32-bit arm chip i think in the 64-bit one they've it's not so important to do it nowadays they have a thing in the 32-bit one called conditional execution which can save you often using a branch instruction which are relatively expensive in pipeline terms so here we've got swine which is wonderful software interrupt zero says punch out this character for me on the display on the screen but ne says but do that only if the last thing you did didn't yield equal not equal well we're checking for the null character so as long as it wasn't the null character it'll say no i'm not equal to the null character and you print it out and out it comes character by character after that of course you loop back to go around and print another character remembering that the hash one has incremented your address pointer along that string so you keep on going around here you don't have to remember what address loop is you don't know the assembler knows it fixes it up for you and then right at the very end the way to say stop execution i've done it swiping flavor 2 on this emulated environment says stop it completely the 
Looking at how this developed from EDSAC, you think, "Oh my god, I am so pleased I've got that." Martin, the inventor of the EDSAC emulator here: I emailed him the other day and he came back to me and said, "Yes, the need for an index register was realized so quickly. That's why my emulator is early '49 to late 1950, because in late 1950 David Wheeler and everybody said, 'My golly, we need an index register,' and they built one in." So in a way, this is what is happening: the pioneers were using their early machinery to lead the way and to ask, what extra facilities do we need to make life tolerable for us? There is the hardware facility of index registers, and they just became standard kit after this; everybody had index registers. But what also interests me is the role of a proper assembler. Initial Orders 2 is not a full-blown assembler. It helps you a little bit by turning decimal addresses into binary, but you have to remember that the letter A you put in the leading five bits could be the character A, yet if you're regarding this as an instruction, that's an add instruction. Initial Orders 2 is relocating, doing a bit of binary translation, and it's a single-pass process. It's wonderful. The problem with an assembler is that it has to be a two-pass process. The trouble is always labels. If you jump back to labels you've already seen, you already know what address that will be. But when you jump forward, how do I know where the heck that label down there is going to be? I don't even want to calculate it. I want the assembler to say, "Oh, I'm on location 186 now, how handy." But it can't fix up the addresses until it knows, and it has counted its way through the program. So then it says, "Right, I will now output you a definitive thing that you can put in through David Wheeler's Initial Orders 2, because I've made it so much easier for you: I've allowed labels." One doesn't think of labels as being a structuring convention, and yet
at this low level they are, in a way, because this is saying "loop: it starts here," and another label says "oh, it ends here": please calculate the addresses of what's happening there and fix it up for me. And so you might say, "Well, all right, didn't everybody say we must have assemblers? It's the modern way to do things." There were very mixed views about this, and I don't think EDSAC got an assembler until EDSAC 2, when another friend of mine, David Hartley, did, I think, a macro assembler for EDSAC 2, not EDSAC 1. There's a story here related to von Neumann as well. I don't know whether it was on his EDVAC, or his version of EDVAC that he had in his basement, called JOHNNIAC apparently, but he really berated a grad student who wrote an assembler: "Assemblers are for the weak-brained, who cannot work out their own addresses. You do realize that in running this assembler of yours, punching out a paper tape, I'm behind you in the queue? I don't get my turn next. You come to me and say, 'Ah, but this is ready to load now, in the second phase, in binary.' You're wasting time if you're so weak-brained you can't program me in absolute." I'm putting words in his mouth, but this was essentially it. He no doubt had dreams in absolute binary. There was no problem for John von Neumann about coding as close to binary as possible; he could keep it all in his head. And he would, I think, have found Initial Orders on EDSAC just about OK: "Yes, nice and helpful, single pass, not slowing things down much. But an assembler? You're wasting time on this machine by doing assemblers." I mean, it really brings it home to those of us who always joked that real programmers use assembler. The answer, certainly from John von Neumann, and possibly from David Wheeler (though he wouldn't have been as extreme as that), is: real programmers use absolute binary.
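The two-pass scheme that von Neumann dismissed as a waste of machine time can be sketched as a toy assembler in Python: pass 1 counts its way through the program recording where each label lands, and pass 2, now that every label has an address, fixes up the operands, including forward references such as `B main`. This is an illustration under stated assumptions, not how any real ARM assembler is built: every instruction is assumed to occupy one 4-byte word, the only directive handled is a hypothetical `DCB "text", 0` form, and labels are replaced by absolute byte addresses, whereas a real ARM assembler encodes branches and `ADR` as PC-relative offsets. The program text is likewise a hypothetical reconstruction of the hello-world loop.

```python
import re

WORD = 4  # assumption: every instruction is one 4-byte word, as on 32-bit ARM

LABEL = re.compile(r"^(\w+):\s*(.*)$")              # "name: rest-of-line"
DCB = re.compile(r'^"([^"]*)"((?:\s*,\s*\d+)*)$')   # "text", 0, ...

PROGRAM = """
        B     main
text:   DCB   "Hello World", 0
main:   ADR   r1, text
loop:   LDRB  r0, [r1], #1
        CMP   r0, #0
        SWINE 0
        BNE   loop
        SWI   2
"""


def size_of(op: str, operand: str) -> int:
    """Bytes occupied by one statement, rounded up to a whole word."""
    if op == "DCB":
        m = DCB.match(operand)
        if m is None:
            raise ValueError(f"bad DCB operand: {operand!r}")
        n = len(m.group(1)) + len(re.findall(r"\d+", m.group(2)))
        return -(-n // WORD) * WORD  # round up so the next instruction stays aligned
    return WORD


def resolve(operand: str, symbols: dict[str, int]) -> str:
    """Replace any label names in an operand by their addresses."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: str(symbols.get(m.group(0), m.group(0))),
                  operand)


def assemble(source: str, origin: int = 0) -> tuple[dict[str, int], list[tuple[int, str]]]:
    # Pass 1: count our way through the program, noting where each label lands.
    symbols: dict[str, int] = {}
    statements: list[tuple[int, str, str]] = []
    location = origin
    for raw in source.splitlines():
        line = raw.strip()
        m = LABEL.match(line)
        if m:
            symbols[m.group(1)] = location
            line = m.group(2)
        if not line:
            continue
        op, _, operand = line.partition(" ")
        operand = operand.strip()
        statements.append((location, op, operand))
        location += size_of(op, operand)

    # Pass 2: every label now has an address, so forward references can be fixed up.
    listing: list[tuple[int, str]] = []
    for addr, op, operand in statements:
        if op != "DCB":  # leave string data untouched
            operand = resolve(operand, symbols)
        listing.append((addr, f"{op} {operand}".rstrip()))
    return symbols, listing


if __name__ == "__main__":
    symbols, listing = assemble(PROGRAM)
    print(symbols)  # {'text': 4, 'main': 16, 'loop': 20}
    for addr, stmt in listing:
        print(f"{addr:4d}  {stmt}")
```

A single pass could resolve `BNE loop`, because `loop` has already been seen, but not `B main` on the very first line; that forward reference is exactly why the assembler has to read the program twice.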