The Year 2000 Bug: A Lesson in Preparedness and Assumptions
The year 2000 bug, also known as Y2K or the millennium bug, was a widespread concern that led to significant efforts to find and fix affected computer systems before 1 January 2000. The problem arose from the way dates were represented: to save memory, which was scarce when much of this software was written in the 1960s through the 1990s, many programs stored only the last two digits of the year. The year 1999 was stored as 99 and 2000 as 00, so a value that had always increased suddenly became smaller, and code that assumed a leading "19" or compared years numerically could produce wrong results.
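The hard-coded century can be illustrated with the printf example from the talk. This is a minimal sketch, not code from any real system: the function names `format_buggy` and `format_fixed` are hypothetical, and it assumes the year is held as an integer count of years since 1900.

```c
#include <stdio.h>

/* Formats a date the way the flawed code would: the century is a
 * hard-coded "19" and the stored year is appended after it. */
int format_buggy(char *buf, size_t size, int day, int month, int years_since_1900) {
    return snprintf(buf, size, "%d/%d/19%d", day, month, years_since_1900);
}

/* Formats the same date by computing the full four-digit year instead. */
int format_fixed(char *buf, size_t size, int day, int month, int years_since_1900) {
    return snprintf(buf, size, "%d/%d/%d", day, month, 1900 + years_since_1900);
}
```

With a stored year of 99 both functions produce a date ending in 1999. With a stored year of 100, the first produces a date ending in 19100 while the second produces 2000.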
According to the talk transcribed below, people in computing had recognised the potential problem for about 15 years before 2000, but nobody knew in advance which software was affected or where it was running, whether it controlled critical infrastructure, calculated pensions, or ran an arcade game. One illustration given is a system that runs a test every ten days by checking whether the current date is greater than the date of the next test: if the scheduled date wraps around to a small value, the current date always appears greater, and the system tests continuously.
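The wrap-around comparison can be sketched with the numbers used in the talk (99 plus 5 giving 04). The one-byte binary-coded decimal layout and the names `to_bcd`, `from_bcd`, `bcd_year_add` and `test_is_due` are assumptions made for illustration, not a description of any particular system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packs a value 0..99 into two BCD digits, one per nibble. */
static uint8_t to_bcd(unsigned value) {
    return (uint8_t)(((value / 10) << 4) | (value % 10));
}

/* Unpacks two BCD digits back into a value 0..99. */
static unsigned from_bcd(uint8_t bcd) {
    return (unsigned)(bcd >> 4) * 10 + (bcd & 0x0F);
}

/* Adds to a two-digit BCD year. The carry out of the top digit is
 * discarded, so 99 + 5 becomes 04. */
uint8_t bcd_year_add(uint8_t year, unsigned delta) {
    return to_bcd((from_bcd(year) + delta) % 100);
}

/* Naive scheduling check: due once "now" has passed the scheduled year.
 * This is only valid while stored years keep increasing. */
bool test_is_due(uint8_t now, uint8_t scheduled) {
    return now > scheduled;
}
```

Scheduling from year 90 behaves as intended, because 95 is greater than 90. Scheduling from year 99 yields 04, so `test_is_due` reports the test as due immediately and keeps doing so.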
The talk recalls that the issue was being discussed publicly as early as 1985, and that experts told governments it needed investigating. Symptoms also appeared before 2000: in 1994, a record for someone born in 1894 could be read as belonging either to a 100-year-old or to an infant a few months old. Because the risk was identified and the affected systems were fixed in advance, only minor problems were reported on 1 January 2000, which is why many people remember Y2K as a fuss over nothing.
The problem was not limited to the year 2000. The speaker describes reintroducing the same class of bug while working on a BBC Micro emulator that counted executed CPU cycles in a signed 32-bit integer. At 2 million cycles per second the counter passes 2,147,483,647 (the largest signed 32-bit value) after about 18 minutes, and the program crashed at that point. Widening the counter to 64 bits fixed it.
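The 18-minute figure follows directly from the arithmetic. The sketch below computes how long a counter lasts at a given increment rate; the function names are hypothetical, and the 2 MHz rate is the figure quoted in the talk.

```c
#include <stdint.h>

/* Seconds until a signed 32-bit counter overflows when it is
 * incremented rate_hz times per second. */
double seconds_to_overflow_i32(double rate_hz) {
    return (double)INT32_MAX / rate_hz;
}

/* Years until a signed 64-bit counter overflows at the same rate,
 * using a 365.25-day year. */
double years_to_overflow_i64(double rate_hz) {
    const double seconds_per_year = 365.25 * 24 * 60 * 60;
    return (double)INT64_MAX / rate_hz / seconds_per_year;
}
```

At 2,000,000 increments per second the 32-bit counter lasts roughly 1,074 seconds, just under 18 minutes. The 64-bit counter lasts on the order of 146,000 years at that rate, which is far longer than any emulator session.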
Another classic example cited is early versions of Windows (the talk names Windows NT4), which would crash after 49.7 days because a 32-bit counter of milliseconds since the machine was switched on wrapped around. Neither of these cases involves calendar dates, but both show the same failure: a value that was assumed to keep growing outgrows its storage.
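The 49.7-day figure is simply 2^32 milliseconds expressed in days. The sketch below checks that, and also shows one common way to measure elapsed time across a single wrap using unsigned subtraction. That idiom is an addition here, not something described in the talk, and the function names are hypothetical.

```c
#include <stdint.h>

/* Days until a 32-bit millisecond uptime counter wraps around. */
double days_until_ms_counter_wraps(void) {
    const double ms_per_day = 24.0 * 60 * 60 * 1000; /* 86,400,000 */
    return 4294967296.0 / ms_per_day;                /* 2^32 ticks */
}

/* Elapsed milliseconds between two readings of the counter. The result
 * is taken modulo 2^32, so it stays correct across one wrap as long as
 * the true interval is shorter than 2^32 ms. */
uint32_t elapsed_ms(uint32_t start, uint32_t now) {
    return now - start;
}
```

Code that compares raw counter values (for example, checking whether `now` is greater than a stored deadline) breaks at the wrap, whereas code that compares differences such as `elapsed_ms(start, now)` against a timeout does not.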
A similar problem is still to come in Unix systems, which store time as the number of seconds since 1 January 1970. Where that count is held in a signed 32-bit value, it reaches its maximum of 2,147,483,647 on 19 January 2038 and then wraps around, so the system sees a date far in the past. Most modern versions of Unix and Linux now represent time internally with 64 bits, which pushes the limit so far into the future that it is no longer a practical concern.
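The 2038 limit can be checked by formatting the largest signed 32-bit value as a UTC timestamp. This is a minimal sketch using the standard `gmtime` and `strftime` functions; the wrapper name `format_utc` is hypothetical, and it assumes the platform's `time_t` can hold the value passed in.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Writes "YYYY-MM-DD HH:MM:SS" (UTC) for a count of seconds since
 * 1 January 1970. Returns 0 on success and -1 on failure. */
int format_utc(int64_t seconds, char *buf, size_t size) {
    time_t t = (time_t)seconds;
    struct tm *utc = gmtime(&t);
    if (utc == NULL) {
        return -1;
    }
    return strftime(buf, size, "%Y-%m-%d %H:%M:%S", utc) > 0 ? 0 : -1;
}
```

Passing `INT32_MAX` gives 2038-01-19 03:14:07. With a 64-bit `time_t` the next second is simply 03:14:08. A signed 32-bit counter instead wraps to its most negative value, which corresponds to a moment in December 1901 on systems that accept negative timestamps.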
These issues come down to three factors: how data is represented in code, the assumptions made about that data (for example, that dates only ever increase), and software running far longer than its authors expected. To mitigate the risk, use representations with enough range, code around the assumptions where possible, and otherwise document them explicitly, for example by noting that a routine only works until 2038.
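One widely used way to code around the two-digit assumption without changing stored data is "windowing", where a pivot value decides which century a two-digit year belongs to. The sketch below is illustrative only: the pivot of 70 and the names `PIVOT_YEAR` and `expand_two_digit_year` are arbitrary choices, and the comment documents the limit the approach still has.

```c
#include <assert.h>

/* Hypothetical pivot: two-digit years below it are read as 20xx,
 * the rest as 19xx. */
enum { PIVOT_YEAR = 70 };

/* Expands a two-digit year to four digits.
 * Documented assumption: every date lies in 1970..2069, so this
 * routine will need revisiting before 2070. */
int expand_two_digit_year(int yy) {
    assert(yy >= 0 && yy <= 99);
    return (yy < PIVOT_YEAR ? 2000 : 1900) + yy;
}
```

Windowing postpones the problem rather than removing it, which is why the limit is written down next to the code. Storing the full year avoids the ambiguity altogether.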
The year 2000 bug is a reminder that preparedness works: the problem was real, and it passed with few incidents because people found and fixed it in advance. When designing software, it pays to consider the limits of date and counter representations even when they seem far off, because systems often remain in service for decades.
So 20 years ago people were trying to celebrate the millennium a year early, and that's another video.

But that's an off-by-one issue, right?

Yeah, that's an off-by-one error. That was not the millennium bug. For probably about 15 years beforehand, people in the computer science trade realized there was a potential problem with the way computer software was written, and they didn't know which bits of software this could affect. They didn't know where this software was running, whether it was controlling critical infrastructure or working out someone's pension results or whatever it could be doing, or just sort of an arcade game down the street. But there was a potential that there was a bug in this code (some people might call it a feature, depending on how you look at it) that could have been catastrophic.

I remember people saying, oh, cash machines are going to spew out loads of money and planes are going to fall out of the sky, and all this sort of stuff, but it never happened. It was all a big ruse, wasn't it?

No. People saw there was a risk. They then said, well, okay, this could be a potential danger; let's identify where it's going to hit, let's fix the problems before it's a potential danger. And so actually what happened on January the 1st, 2000, is that there were a few minor issues. I think HSBC had problems with credit card transactions over that period, and there might have been some issues with some wind sensors in airports and things, but there was nothing major that happened. And so most people think it was just a sort of ruse and a fuss over nothing, because people had gone and fixed the problem.

The heart of the problem with the millennium bug comes down to, well, I think there's three things you can look at: how did people represent data, what assumptions did they make about things having represented data that way, and perhaps the other one that was surprising was the longevity of software. So I think
the classic one to think about this is if I write down a date. I'm going to use day, month, year in these examples, so sorry, this is what I'm going to use. So let's suppose we have a person, let's call him Tom, whose birthday is that. How old is he?

So, 4th of November '19. So I'm saying 1919; he's a hundred and maybe about a month old.

Okay, right. What about someone, let's call her Joan, who was born then? How old are they?

9th of the 11th, so maybe a little bit younger than him.

So this person is, let's say, around 100 years old, and this person is, let's say, about six weeks. This is the root of the problem in the millennium bug: we write down years in an abbreviated form. So rather than writing down in this case 1919 and in this case 2019, we just write down 4/11/19 as their date of birth. And as humans we can generally work these things out. We can look at the records; if we're looking at a list of people whose pensions we need to pay out, we will assume that they were born and they're over 100, and so on.

Now go back in time a bit to when people were writing computer software in the 60s, 70s, 80s and 90s. Memory at that time was at a premium. Today we've got machines with 16 gigabytes of RAM; even something like the Raspberry Pi has up to four gigabytes these days embedded on it. But you go back to the 60s, when people started to write software, and they only had a few kilobytes, so one of the things they wanted to do was try and use that memory as wisely as possible. Now, if you're writing software in 1965 (let's just pick a date when people are writing software), what people said is: okay, we don't need to waste time writing 1919, 1963, 1939, whatever; we could save memory by just storing this as 19, 63, 39 and so on. So they did. When they wrote that software, that was fine, and they didn't expect this software to still be running 40 years later, probably still running 60 years later now, and so on. So this was a fine optimization to make. What people realized
is, well, actually we're coming up to the point where we're going to go from 1999, which we'd store as 99, to 2000, which we'd store as 00, and then we'd have 2001, which we'd store as 01, and so on. Now, as we just saw, as a human we've got context about what the data is; we can make an educated guess. The interesting thing, though, is around this point here, where we go from storing things as 99 to 00. Up until this point the numbers are all increasing: 98 would have been less than 99, which should have been less than 2000. But people didn't know how the computers would react to suddenly going from a number which they assumed would be bigger when they wrote the software to one that was significantly lower.

Was this all in binary as well? Did it matter? Were these questions?

Again, this was something else you had to take into account. Software was being written on a lot of different systems. You had binary-coded decimal in use on IBM mainframes and things; you had languages like COBOL in place, which perhaps people didn't know how to program. People just didn't know at the time what the situation was until you actually go and do the research: you look at the software systems that are in critical systems and you say, well, okay, is this going to cause a problem? You need to work it out, and if it is, you go and write a fix, which is what they did, and if it isn't, then you haven't got a problem.

Suppose that we're writing software that assumes that we've got two digits that represent the year. We'd probably write code to print it out. I'm going to use C as an example: there's printf("%d/%d/19%d"), and then I print day, month, year. If that was just an integer variable, then that would print out 1999 fine, but it would print out 2000 as 19100, which you would see instantly was a mistake. But suppose I'd done the same sort of thing and I've said, okay, I'm going to store just two digits, and I'm only going to use
two BCD digits to store it (binary-coded decimal). So I've got 99, and I add five to that, and that gives me 04. I then have some software that says: is this less than this? 99 plus 5 should be greater than 99, but because we've wrapped around it's no longer true, and so you get interesting issues happening there. The problem was that people just didn't know which bits of software would be affected, or what would happen if those things wrapped around.

I mean, think about it: if you've got software controlling a nuclear reactor that is using this sort of format to store the date, and you say, I want to test things every 10 days, and you say, okay, is the current date greater than the date of the next test, and you've wrapped around, then you're going to be continuously testing something, because the date is greater than what you've currently wrapped around to, which may not be a good thing to do to a nuclear reactor. I'm not saying they had any problems with nuclear reactors, but this is the sort of thing that you needed to check.

And so people realized this was a problem. I think there was a comment on it on Usenet in 1985, and people actually said to the governments that we need to look at this; it could potentially be a big problem. Perhaps you wouldn't have had planes falling out of the sky, as the horror movies like to portray it, but it had the potential to cause significant problems. It's probably more likely you'd get problems where people's pensions were no longer being paid because it decided that they hadn't been born yet, and things like that.

We talk about it in terms of the year 2000 bug or the millennium bug, but actually you started to see these problems earlier. Someone who was born in 1894 would, in 1994, have been 100, but could also have been, as we saw there, three months old, and again you would have started to see problems showing there. So there were signs that this could be a problem, and again people started to make attempts to correct it, and those
attempts could be as simple as saying, well, actually, if it's a date after the current date, then we know... but then you get to the point where you have to put into the software the assumptions that we make as humans, and so on.

And of course it's not a problem that was just limited to the year 2000. There's lots of other instances where it has occurred, or similar things might occur in the future. I introduced one myself: I can remember texting a friend saying, "I've just reintroduced the year 2000 bug." I was working on a BBC Micro emulator and I wrote some software that counted the number of CPU cycles executed in a 32-bit integer, and I was surprised that my program kept crashing after about 18 minutes, until I worked out that if you've got 2 million cycles every second, after 18 minutes you would have incremented that counter enough times to overflow a signed 32-bit integer. So I decided the easiest way to fix this was to convert the value to a 64-bit integer, because that would allow the program to run for several million years or something.

Another classic example was early versions of Windows: Windows NT4 would crash after 49.7 days because they had a counter which counted the number of milliseconds since the machine was switched on, and it wrapped around, and again you have similar problems happening. So it's a problem that's shown up in multiple places.

One that's potentially to come is in Unix. Unix stores time as the number of seconds since the 1st of January 1970, and if, as certain Unix systems did originally, you use a 32-bit value to store that number, then in 2038 that number will wrap around back to the beginning of 1970 again, and so that could cause problems. Now, fortunately, people have seen this, and most modern versions of Unix and Linux have moved to using 64 bits to represent the time internally, so it won't be a problem again; it'll be sufficiently far in the future that I suspect we won't still be running
computer systems running Unix, and if we are, we can convert it to 128 bits and recompile the software.

So the millennium bug, or things very similar to it, are seen in lots of different places, but it really did boil down to those three things. It was how we represented data: the programs had been written to save memory by just using two digits, or to assume that the year started from the beginning of the 1900s. There was the fact that people then made assumptions about how those dates would work: that a date would always get bigger (it would be monotonically increasing, as we'd say) and wouldn't suddenly wrap around, but then it did. And then there was the fact that software often ends up running for a lot longer than we expect. So perhaps when we're writing software we ought to think about the future problems and make sure we code around those assumptions, or at least document them and say, hang on, this will work until 2038, or whenever it might be.