Legacy Code Conversion - Computerphile

**The Challenges of Legacy Code Translation**

As we continue to develop and maintain complex software systems, we often encounter legacy code that was written in outdated programming languages such as COBOL, Visual Basic, and others. Translating this code into modern languages can be a daunting task, requiring significant resources and expertise.

**Understanding Semantics is Key**

One of the biggest challenges in translating legacy code is understanding the semantics of the original language. Abstracting a COBOL ADD or MOVE statement to an assignment, for example, requires knowing every variant of that statement and what the standard says each one means. Expertise in writing the language is not strictly required, but its semantics must be understood: "At least ... we have to read the manual, we have to understand the semantics," says [name].
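
To make the point about statement variants concrete, here is a minimal Python sketch of how two forms of the COBOL ADD statement abstract to different assignments. The function name `abstract_add` and the `:=` output notation are assumptions made for illustration; this is not the rule set used in the work described, and it ignores clauses such as ROUNDED and ON SIZE ERROR as well as multiple targets.

```python
def abstract_add(statement: str) -> str:
    """Map a simple COBOL ADD statement to an assignment string.

    Hypothetical helper covering only two forms:
      ADD a ... TO b        ->  b := b + a + ...
      ADD a b ... GIVING c  ->  c := a + b + ...
    """
    tokens = statement.strip().rstrip(".").split()
    upper = [t.upper() for t in tokens]
    if not tokens or upper[0] != "ADD":
        raise ValueError("not an ADD statement")

    if "GIVING" in upper:
        g = upper.index("GIVING")
        # With GIVING, an optional TO is just a separator between operands.
        operands = [t for t, u in zip(tokens[1:g], upper[1:g]) if u != "TO"]
        targets = tokens[g + 1:]
        if not operands or len(targets) != 1:
            raise ValueError("unsupported ADD ... GIVING form")
        return f"{targets[0]} := {' + '.join(operands)}"

    if "TO" in upper:
        t = upper.index("TO")
        operands = tokens[1:t]
        targets = tokens[t + 1:]
        if not operands or len(targets) != 1:
            raise ValueError("unsupported ADD ... TO form")
        # Without GIVING, the target is also an operand of the sum.
        return f"{targets[0]} := {targets[0]} + {' + '.join(operands)}"

    raise ValueError("unsupported ADD form")


print(abstract_add("ADD A TO B."))
print(abstract_add("ADD A B GIVING C."))
```

The two calls differ only by a keyword, yet one accumulates into its target and the other overwrites it, which is the kind of detail that has to come from the language standard rather than from guesswork.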

**The Complexity of Legacy Code**

Legacy code is often complex, which makes accurate translation difficult. Many systems were originally written for older hardware, which complicates moving them to modern platforms. Asked what problem this poses, [name] describes abstracting data such as files into a general concept of text-based or binary files that any modern programming environment can use, rather than modeling an old disk system in great detail.
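
As a rough illustration of that file abstraction, the Python sketch below reads fixed-width records through a generic text stream instead of modeling any particular storage device. The record layout (a four-digit customer ID followed by a 20-character name), the `Customer` class, and `read_customers` are all hypothetical, chosen only to echo the four-digit customer ID mentioned in the transcript.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO

# Assumed layout: 4-digit customer ID, then a 20-character name field.
ID_WIDTH = 4
NAME_WIDTH = 20


@dataclass
class Customer:
    customer_id: int
    name: str


def read_customers(source: TextIO) -> Iterator[Customer]:
    """Yield customers from any text stream holding one record per line."""
    for line in source:
        record = line.rstrip("\n")
        if not record:
            continue  # skip blank lines
        customer_id = int(record[:ID_WIDTH])
        name = record[ID_WIDTH:ID_WIDTH + NAME_WIDTH].rstrip()
        yield Customer(customer_id, name)


# An in-memory stream stands in for the legacy file.
sample = io.StringIO("0001Ada Lovelace        \n0042Grace Hopper        \n")
customers = list(read_customers(sample))
print(customers)
```

Because `read_customers` only depends on the stream interface, the same code works with a file on disk, a network stream, or test data, and the field widths live in one place if the ID ever needs to grow.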

**Business Barriers to Change**

Many organizations remain hesitant to replace their legacy systems with re-engineered code. "That's probably the biggest barrier to re-engineering," says [name], noting that companies often don't trust the new code and have no assurance that it does the same thing as the old code.

**Mathematical Proof Can Provide Assurance**

For certain kinds of program, such as those doing numerical processing, mathematical proof can give some assurance that the old and new code perform the same function. The assurance holds under stated conditions, "like absence of exceptions and overflow," explains [name]; under those conditions, a computation in Visual Basic amounts to essentially the same processing in Python or Java.
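
The role of the "absence of overflow" condition can be sketched in Python. The code below models addition on a 16-bit signed integer type (such as VB6's `Integer`) that fails on overflow, and compares it with ordinary Python addition. The function names are assumptions, and the comparison is a check over sample values rather than the mathematical proof the text refers to.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def legacy_add(a: int, b: int) -> int:
    """Model of addition on a 16-bit signed type that raises on overflow."""
    result = a + b
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("result does not fit in 16 bits")
    return result


def modern_add(a: int, b: int) -> int:
    """Python integers are unbounded, so this never overflows."""
    return a + b


def agree_when_no_overflow(values) -> bool:
    """Check that both versions agree wherever the legacy one succeeds."""
    for a in values:
        for b in values:
            try:
                legacy = legacy_add(a, b)
            except OverflowError:
                continue  # outside the stated precondition
            if legacy != modern_add(a, b):
                return False
    return True


print(agree_when_no_overflow([INT16_MIN, -1, 0, 1, INT16_MAX]))
```

Inputs that overflow in the legacy model are exactly the ones where the two versions behave differently, which is why an equivalence claim has to name that condition explicitly.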

**The Power of Modern Languages**

Modern languages such as Java, C#, and Python are more expressive than legacy languages. "The new language is much more powerful, which means that the new programs tend to be shorter," notes [name], adding that they are also closer to the abstraction. The step from legacy code to the abstraction is therefore the large one; generating code from the abstraction in a modern language is comparatively small.

**Model-Driven Engineering**

To avoid the problem in the first place, the model-driven engineering community recommends maintaining specifications, for example in UML and OCL, and generating code from them whenever new code is needed. "This was a sort of philosophy of model-driven engineering," explains [name]. The approach has not taken hold for general software, but safety-critical sectors such as avionics and automotive have adopted it, and many of their systems exist in the form of models.
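
The idea of generating code from a maintained specification can be sketched in a few lines of Python. The model format below (`ClassModel`, `Attribute`) and the generator `generate_python` are hypothetical stand-ins for a real UML/OCL toolchain, and the `Bond` model is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    name: str
    type_name: str


@dataclass
class ClassModel:
    """A toy stand-in for a class in a specification model."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


def generate_python(model: ClassModel) -> str:
    """Emit Python source for a class with a matching constructor."""
    params = "".join(f", {a.name}: {a.type_name}" for a in model.attributes)
    lines = [f"class {model.name}:", f"    def __init__(self{params}):"]
    if model.attributes:
        lines += [f"        self.{a.name} = {a.name}" for a in model.attributes]
    else:
        lines.append("        pass")
    return "\n".join(lines) + "\n"


bond_model = ClassModel(
    "Bond", [Attribute("coupon", "float"), Attribute("term_years", "int")]
)
source = generate_python(bond_model)
print(source)

# Load the generated source to confirm that it is valid Python.
namespace = {}
exec(source, namespace)
bond = namespace["Bond"](coupon=0.05, term_years=10)
```

In this style the model is the artifact that gets maintained; changing an attribute means editing `bond_model` and regenerating, and a second generator could target another language from the same model.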

**Abstraction is Key**

One key strategy for working with legacy code is abstraction. Automatic abstraction can eliminate go-tos so that the functionality becomes explicit as normal procedural code, and it can turn COBOL record structures into classes. The resulting specification still carries the structure and organization of the legacy code, but it can be retained and used as the basis for further work. "You can call them go-tos if you like, you can call them jumps, you can call them branches; all essentially resolve into being the same thing," notes [name].
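
As a hand-written illustration of go-to elimination (not the output of the automated tool), the Python sketch below shows the same computation twice: once as labeled blocks with explicit jumps, emulated by a dispatch loop because Python has no go-to, and once as structured code. The function names and the summing example are assumptions.

```python
def sum_with_jumps(n: int) -> int:
    """Sum 1..n using labeled blocks and explicit jumps between them."""
    total, i = 0, 1
    label = "CHECK"
    while label != "DONE":
        if label == "CHECK":
            # IF I > N GO TO DONE, otherwise fall into BODY
            label = "BODY" if i <= n else "DONE"
        elif label == "BODY":
            total += i
            i += 1
            label = "CHECK"  # GO TO CHECK
    return total


def sum_structured(n: int) -> int:
    """The same computation with the loop made explicit."""
    total, i = 0, 1
    while i <= n:
        total += i
        i += 1
    return total


# Both versions should agree on every input tried.
print(all(sum_with_jumps(n) == sum_structured(n) for n in range(20)))
```

In the structured version the loop condition and body are visible at a glance, whereas the jump version requires tracing labels to recover the same control flow.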

**Conclusion**

Legacy code translation remains a significant challenge in modern software development, requiring careful consideration of semantics, business needs, and technical limitations. By understanding the complexities involved and adopting strategies such as model-driven engineering and abstraction, developers can create effective solutions for working with legacy code.

**Transcript**

So here on the left you've got an example of an actual COBOL program. This is the kind of structure you might see in legacy COBOL, with lots of go-tos, and it would take even a COBOL expert some time to figure out what it's doing. The automatic abstraction is this. It has the same functions as the original version, but now the functionality is made explicit, as in normal procedural code, which any modern-day programmer would understand. That in turn can be translated to Java, which also retains the same procedure names as the original code. But now this is Java, and again it's using a normal kind of program structure rather than go-tos. Go-tos have been eliminated by this transformation.

Considered harmful, that's why?

Well, yes, but they're still everywhere in COBOL and Visual Basic programs.

In the last couple of years, my work has been on the recovery of design from legacy code. Legacy code covers business applications, software that's usually quite business-critical, which exists in very old languages: legacy languages such as COBOL, which dates from the late 1950s, or Visual Basic, again from the early to mid 60s. Businesses and corporations have quite key parts of their business locked into this software, and some of it is almost unmaintainable. It's been changed many, many times over the years, so that no person really understands what it's doing or how it's doing it.

There have been major problems in recent years. For example, the COVID financial measures in the United States were delayed because they depended on some COBOL software for which there were simply no people available to do the maintenance and change the code. Banks have spent millions of dollars, tens of millions of dollars, modernizing their code or attempting to modernize it.

So the work that I'm doing attempts to facilitate the modernization and migration of legacy software. First of all we take the legacy software, which might exist in various kinds of storage and various formats, and abstract it to a specification in the international standard languages UML and OCL. You get a precise description of what the functionality is, and also a diagrammatic specification which can be used for various other purposes, such as explaining what's going on in the code. Then we produce a modernized version by using forward engineering tools to generate code in new languages, for example Java, Python, Swift, C#, and so on, so that the businesses and corporations have a better platform for their systems for the future.

The original thing about what we do is that we can provide some guarantees of functionality preservation, because there's no point translating a program from one language to another unless you can have some assurance that the functionality is the same. By using rich semantic modeling and rich semantic extraction from the source code to the abstraction, we can provide some guarantees. We've done some case studies with a finance company where we re-engineered their old VB6 code and managed to translate it to Python with no change in functionality. This was doing bond pricing and analysis; there's 2,000 lines of code.

Why would you need to do anything to them? Surely they're written, they run, and that's that, isn't it?

Well, the environment changes; things change. One problem with COBOL is that it has very rigid data formats. It doesn't have integers, strings, doubles, and the other types of modern programming languages; it has byte formats. So hard-coded in the program is, for example, that a customer ID is four digits. But as time goes on and you have more than nine thousand nine hundred and whatever customers, you're going to need to extend that customer number, and your whole program may need to change because of this very inflexible data format. That's a very minor change, but in general people want to add new functionality and adjust existing functionality. The whole environment of programs is evolving, the business environment is evolving more and more rapidly, and so inevitably there comes a point where you need to change the program.

You write the spec as to what the original program does, and you then write a program to conform to that spec, or is it automatic?

The idea is that it's automatic, because we're talking about tens of thousands of lines of code and it's too much effort for a human to do. The source code is parsed using a tool called ANTLR, which provides parsers for lots of languages, in particular for COBOL 85 and Visual Basic 6. Perhaps the unique feature of our work is a language called CSTL, which takes those parse trees and generates text. It can generate text in any language, but we generate text in UML/OCL for the abstraction. That's an automated process. What I do, as the human, is write that CSTL, so I write the translation rules, or abstraction rules, from the source parse tree into text in UML/OCL. So we take a module definition in Visual Basic and we turn that into a class.

Do you need to be able to understand the original languages?

Yes, of course, because it's meant to be a semantic representation. If we want to abstract an ADD statement or a MOVE statement in COBOL to an assignment in OCL, we need to know the great variety of ADD statements that COBOL provides and what they actually mean, or what they're supposed to mean, in the standard. So we need to understand the semantics. We don't necessarily need to be an expert in the language. I'm not an expert in writing COBOL programs; that was before my time. But at least we have to read the manual; we have to understand the semantics.

Is this something that will get to a point where all the old programs have been translated and we don't need to worry about anything else, or will we eventually be having to translate Python into something new?

Well, a lot of work now is translating Python 2 into Python 3. There was a case of a major bank in America, I think, that spent five years translating their transaction processing system from Python 2 to Python 3. And unfortunately that's not an isolated case; there are many other cases like this. So I don't see this problem going away anytime soon.

Is this something that just works, or is it ongoing research?

It's ongoing, because these languages, COBOL and Visual Basic, are quite large languages. We've only done part of these languages, perhaps the most important core parts, but there is a great deal more that could be done.

Is there a hardware issue here as well? Obviously lots of old software was originally written for old hardware. What's the problem there?

Yes, that is an issue. What I've tended to do is abstract the data, such as files, into a quite general file concept of essentially text-based files or binary files, which can then be used with any modern programming environment. That, I think, is probably the most appropriate way, rather than trying to model in great detail an old disk system or something like that.

And is there a question of businesses trusting the new stuff, or wanting to rely on the old stuff?

That's probably the biggest barrier to re-engineering. People have attempted re-engineering for decades now; the ideas started even in the 70s. But the biggest barrier is that companies don't trust the new code, and they can't have any assurance that the old code and the new code do the same thing. So we can, using mathematical proof, give some assurance, at least for certain kinds of program, for example those doing numerical processing, and under certain conditions like absence of exceptions and overflow. If you're doing that kind of computation in Visual Basic, it's going to end up as essentially the same processing in Python or Java.

The new language is much more powerful, which means that the new programs tend to be shorter, and they're closer to the abstraction. So the step from the old code to the abstraction is the big step. Once you get to the abstraction, going to a language like Java, C#, or Python is a smaller step, because UML and OCL are relatively recent things, they came about around 2000, and they're oriented towards modern object-oriented languages. So the forward engineering step is not a very difficult step, and of course it's been done by model-driven engineering tools for a long time.

How do people avoid this happening in the first place? What do you bet the farm on?

In the model-driven engineering community we say you should maintain specifications. You should write your specification, and when you need a new bit of code, you generate the code from the specification. This was a sort of philosophy of model-driven engineering. Now, that hasn't happened, at least for general software. In some sectors, like avionics and automotive, where they have to be concerned with loss of life and so on, they have adopted model-driven engineering approaches, and a lot of their systems exist in the form of models.

What this abstraction process gives you is that kind of specification. It may not be the kind of specification that you would have written from the beginning, of course, because it still has the structure and organization of the legacy code in there. But at least you can get some abstractions. You can, for example, abstract record structures as classes from your COBOL, so that if you did want to retain that specification and work from it, you could.

You can call them go-tos if you like, you can call them jumps, you can call them branches; all essentially resolve into being the same thing.