#200 50 Years of SQL _ Don Chamberlin Computer Scientist and Co-Inventor of SQL

The World of Databases: Exciting New Directions and Emerging Trends

As I reflect on my career, I think of it as a case of being in the right place at the right time. The database revolution has been unfolding rapidly over the last half century, and I feel privileged to have taken part in it. SQL did not cause this revolution; the revolution was caused by economics, as hardware became faster and cheaper at an exponential rate.

Faster and cheaper hardware made three things possible: a clean and elegant data model like Codd's relational model, a high-level non-procedural language, which turned out to be SQL, and an optimizing compiler that brought these elements together and made them commercially viable. These three items support each other, much like a three-legged stool, and have made today's database systems possible.
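The way the three legs fit together can be sketched with SQLite from Python's standard library. This is only an illustration under stated assumptions: the `employee` table, its columns, the sample rows, and the index name are all hypothetical, and SQLite stands in for any relational system. The query says what is wanted; the optimizer chooses how to find it.

```python
import sqlite3

# Hypothetical sample data; table, column, and index names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "Sales", 50000), ("Bob", "Sales", 42000), ("Cy", "R&D", 61000)],
)
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# Non-procedural query: it names no index, loop, or access path.
query = "SELECT name FROM employee WHERE dept = ? ORDER BY name"
names = [row[0] for row in conn.execute(query, ("Sales",))]
print(names)

# The optimizer decides the access path; ask SQLite to describe its choice.
# The exact wording of the plan varies between SQLite versions.
plan = list(conn.execute("EXPLAIN QUERY PLAN " + query, ("Sales",)))
for step in plan:
    print(step)
```

Dropping the index would change the reported plan but not the query text or its result, which is the separation between "what" and "how" that the relational approach relies on.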

One exciting new direction in database research is the NoSQL movement. This movement is inspired by web applications that require massively scalable databases. NoSQL systems often relax one or more of the constraints of traditional relational databases, such as rigid schemas or transactional guarantees. For example, relational databases typically have homogeneous tables with rows having the same attributes, whereas NoSQL systems may allow for partial schemas, nested tables, or document formats like XML or JSON.

NoSQL systems also offer varying levels of consistency and availability, often tolerating "eventual consistency" rather than strict ACID compliance. This allows for greater flexibility and scalability in modern web applications. The development of schema-less databases has opened up new possibilities for data modeling and storage, enabling developers to create more agile and responsive systems.
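Eventual consistency can be pictured with a toy model. The sketch below is not how any particular NoSQL system is implemented; the `Replica` and `Cluster` classes and the `cart:42` key are invented for illustration. A write is acknowledged by the primary at once, and replicas apply it only when they next sync, so a read in between can return stale data.

```python
class Replica:
    """A node that applies replicated writes only when it syncs."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes received but not yet applied

    def receive(self, key, value):
        self.pending.append((key, value))

    def sync(self):
        # Apply queued writes in arrival order, then clear the queue.
        for key, value in self.pending:
            self.data[key] = value
        self.pending.clear()

    def read(self, key):
        return self.data.get(key)


class Cluster:
    """One primary copy plus replicas that catch up later."""

    def __init__(self, n_replicas):
        self.primary = {}
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, key, value):
        # Acknowledged as soon as the primary has the value.
        self.primary[key] = value
        for replica in self.replicas:
            replica.receive(key, value)


cluster = Cluster(n_replicas=2)
cluster.write("cart:42", ["book"])

stale = cluster.replicas[0].read("cart:42")  # replica has not caught up yet
for replica in cluster.replicas:
    replica.sync()
fresh = cluster.replicas[0].read("cart:42")  # now agrees with the primary
print(stale, fresh)
```

A system with strict transactional guarantees would not let the first read observe the missing value; tolerating that window is the trade made for scalability.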

A notable example of this trend is SQL++, a backward-compatible extension of SQL that originated at UC San Diego with Professor Yannis Papakonstantinou. SQL++ is available in open-source form from the AsterixDB project at UC Irvine, led by Professor Mike Carey, and commercial versions are marketed by Couchbase and by Amazon Web Services (under the name PartiQL). The language is schema-optional and operates on collections of JSON documents, which can also be viewed as nested tables; that correspondence is what keeps it compatible with earlier versions of SQL, which operated on flat tables.
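The correspondence between JSON documents and nested tables can be sketched in plain Python. The `customers` collection and the `unnest_orders` helper are hypothetical, and this is not a SQL++ engine; it only mimics what a query roughly like `SELECT c.name, o.item, o.qty FROM customers AS c UNNEST c.orders AS o` (syntax approximate) would produce, namely one flat row per nested order.

```python
import json

# Hypothetical collection of JSON documents. The second document has no
# "orders" field at all, which a schema-optional system would permit.
customers = json.loads("""
[
  {"name": "Ann", "orders": [{"item": "lamp", "qty": 2},
                             {"item": "desk", "qty": 1}]},
  {"name": "Bob"},
  {"name": "Cy",  "orders": [{"item": "lamp", "qty": 5}]}
]
""")


def unnest_orders(docs):
    """Yield one flat (name, item, qty) row per nested order."""
    for doc in docs:
        # A missing "orders" field is treated as an empty nested table.
        for order in doc.get("orders", []):
            yield (doc["name"], order["item"], order["qty"])


rows = list(unnest_orders(customers))
for row in rows:
    print(row)
```

Each document plays the role of a row whose `orders` attribute is itself a small table. Flattening it gives ordinary rows that classic SQL could handle, which is the sense in which a language over JSON collections can stay compatible with a language over tables.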

The emergence of SQL++ is a significant development in the world of databases, showing that scalability is not necessarily incompatible with a high-level language. As the database landscape continues to evolve, it's likely that we'll see even more innovative solutions emerge, addressing the growing demands for scalability, flexibility, and performance in modern data systems.

In conclusion, my career has been marked by being in the right place at the right time, and I feel fortunate to have played a part in the database revolution. The future of databases holds much promise, with NoSQL systems, schema-optional designs, and high-level languages like SQL++ poised to drive innovation and change in the industry.

"WEBVTTKind: captionsLanguage: enthe database revolution has just been unfolding rapidly over the last half century and I was really privileged to take a part in it SQL didn't cause this revolution it was caused by economics hi Don welcome to the show hi Richie thank you pleasure to be here I'd love to start by just finding out how you first became interested in databases well I'll start from the beginning in in 1970 I was finishing my Graduate Studies at Stanford and I took my first professional job with IBM at the Watson Research Center in New Yorktown Heights New York uh I moved from California to New York in the winter which was not a move I'd recommend if you enjoy warm weather uh a few months later my friend Bray Boyce also completed his graduate work at Purdue and he joined IBM at the same location where I was well Yorktown was the uh Central research facility of IBM and the mission of IBM research is to study technologies that might influence IBM's future products and in 1970 uh there was kind of Revolution going on the cost of of computing was coming down very quickly and lots of companies were putting data online for the first time and this seemed like a business opportunity so uh the group that Ray and I were in was assigned to study the state-of-the-art in database management with an eye to influencing IBM's future products okay and what was this uh first database that you got interested in well we studied something called the dbtg report so let me tell you where that came from and why we were interested in it uh in the early 1970s the most respected person in the databased industry was a guy named Charles Bachman of General Electric known to his friends as Charlie and Charlie had actually invented the concept of a database management system he was the first one to call for a separate software layer to manage data that was shared by multiple applications and this was a pretty important concept and for inventing the database management system basically 
Charlie received the ACM Turing Award, which is the most prestigious award in computer science. Well, Charlie had actually built a database management system that was called IDS, which stood for Integrated Data Store. IDS stored information in the form of records and connections between records; you could think of them as pointers. In IDS, a program could navigate through what Charlie called data space, moving from one record to another by following these pointers to find an answer to a question. In fact, when Charlie gave his Turing Award lecture, he titled it "The Programmer as Navigator." Well, one of the most popular business programming languages at the time was COBOL, and there was a movement to add database management functions to COBOL. A committee was formed for this purpose called the Data Base Task Group, abbreviated DBTG. Charlie was a member of DBTG, and the group published a report in 1971 defining a set of commands for navigating in data space, based on Charlie's ideas. The DBTG report was pretty important at the time, and Ray and I spent some time studying it. It was wonderfully complicated. It had currency indicators and set occurrence selection rules; it had a FIND command with seven different versions. It could do a pretty good job of answering questions that were anticipated in the database design, but for unanticipated questions, sometimes you were out of luck. Ray and I wrote a review of the DBTG report and we suggested some incremental improvements. We thought if we could manage to understand something as complicated as DBTG, our careers would be off to a good start.

That's funny. I love the analogy there of working with data as being navigating. I think it's a phrase that's not often used, but you do spend so much time trying to work out how different bits of data are connected together. It sounds like this idea of data being connected is leading towards the idea of relationships between data and
relational databases, so can you talk me through how relational databases came about and how you got interested in them?

Well, sure. All this came about because of a paper written by Ted Codd. Ted was a scientist at IBM's research laboratory in San Jose, California, and in June of 1970 he published what became a very famous paper called "A Relational Model of Data for Large Shared Data Banks." The basic point of Ted's paper was that Charlie Bachman had gotten it all wrong and that navigating through data space was a bad idea. Ted thought that database queries should not look like programs that tell the computer what to do. He wanted to express queries in a high-level, non-procedural language. He liked to say, "Tell me what you want, not how to find it." Well, I read Ted's paper as part of my learning process in getting up to speed on the state of the art in databases, and on first reading I wasn't much impressed with this paper. Ted was basically a mathematician, and his paper contained a lot of mathematical jargon. It defined a relation as a subset of the Cartesian product of a set of domains, and it introduced concepts like data independence and normalization, and operators like permutation and projection and join. My impression of all this was that Codd's paper was interesting from a theoretical point of view, but I couldn't see that it was really grounded in practical engineering. But I kept on hearing more about this relational data model. I heard about a symposium that was going to be held in Miami Beach in December of 1972, and it was going to feature a tutorial on relational databases. Well, traveling to Miami in the winter had a certain appeal, so I got permission to attend. The symposium was called the COINS-72 symposium, a conference on information systems, and I actually met Ted Codd for the first time on the beach at the Fontainebleau Hotel. I attended the tutorial, which was taught by Chris Date, and I have to describe it as a conversion experience. For the first time I began to
understand the simplicity and power and elegance of Ted's relational approach. Queries that took a whole page of code in DBTG could often be expressed in a single line in a relational approach. So when I returned to New York, I wasn't interested in DBTG anymore; I was picking up a new interest in relational query languages.

That's fascinating, and I love that even though it was a hugely influential paper, your first reaction to Ted's work was like, "Oh, it's just mathematical nonsense, all theoretical, no practical applications," but once you see it in action it is actually incredibly powerful. So I love how it just translates into something more practical and more real. I think the first real implementation of this was the System R project that you worked on, is that right?

Well, there were actually several things going on more or less at the same time in different places in IBM and also at other companies. There was the Ingres project, for example, at Berkeley, but I'm going to talk mainly about System R because that was the project that I was associated with. There were a lot of people who saw the power and simplicity of Codd's approach, but the whole idea depended on a high-level query language with an optimizing compiler that could turn it into efficient code. The question was: that sounds like a good idea, but was it just science fiction, or was it really ready for prime time? In 1973 IBM decided to answer this question by building an industrial-strength relational system just to prove it could be done. This was done at IBM Research; they created a project for this purpose and called it System R. System R was located in San Jose because that's where Ted Codd was, and there were about 14 people, including Ray and myself, who were gathered from several IBM sites all over the country to come together and work on this System R project. Well, I wasn't very happy about moving to New York at that time... I'm sorry,
about moving from New York to San Jose at that time. I had just bought a house and my wife had a good job teaching high school, so I felt kind of disrupted. But I made the move and went to San Jose to join the System R project, and that turned out to be the best decision I ever made. Working on IBM's first relational database system really turned out to be the opportunity of a lifetime.

It's interesting how these cross-country moves have been problematic each time.

Yeah, IBM stands for "I've Been Moved"; that's the legend.

And then from System R, it seems like this is leading towards the really important paper that you wrote around the SEQUEL programming language. So where did that idea come from?

Well, to tell you the truth, Ray and I liked some parts of Codd's ideas better than others. We really liked this idea of a non-procedural query, with the slogan "tell me what you want, not how to find it." What we didn't like was the mathematical jargon in Ted's papers. We wanted to design a language for a new class of user; we called them casual users. We thought a casual user is a professional who needs access to data, but he doesn't want to be a computer programmer and he doesn't even want to rely on a computer programmer. He might be an urban planner or a financial analyst or an insurance company executive, and he might have questions that vary from day to day, and he might want his results pretty quickly. Well, the database systems of the 1970s just didn't meet these requirements. So to serve this casual user, Ray and I wanted to design a new language, and we set certain goals for it. Number one, we wanted to use the term "tables" instead of "relations"; everybody knows what a table is. Number two, we wanted to base the language on ordinary English words like "select." Goal number three, the language should have no special symbols and it should be easy to type on a keyboard. And goal number four, which is maybe the most challenging one: we wanted it to have something that we called
the "walk up and read" property, meaning in simple cases a user with no special training should be able to understand a query just by reading it. Well, those were the goals that we set for ourselves, and we called this new language SEQUEL, which was an acronym for Structured English Query Language.

It's amazing how the things you were worrying about back then, 50 years ago, are things that we're still worrying about now. For example, there's a big push at the moment to make data more accessible to everyone, regardless of whether you have a technical background or not, and I find it fascinating that this is something you worried about when you were first designing the SQL language: that people who didn't have this strong mathematical background could still make use of the technology. You mentioned the idea of "walk up and read," so people just walk up, look at the code, and it makes sense to them. It sounds like a difficult thing to measure, so how do you know if you've been successful at that?

That's a good question. It's a hard thing to do and it's a hard thing to measure, and I'll never know really how successful we were. But we had a psychologist on the staff named Phyllis Reisner, and Phyllis conducted an experiment at San Jose State University, teaching SEQUEL to college students who had no programming experience at all and recording their progress and the kinds of errors that they made. It turned out that these college students could become proficient in SEQUEL after a few hours of instruction. Their most common error was something funny: they would forget to put quotes around strings. So for example, if a query contained the phrase "name equals Fred," you had to put quotes around Fred to indicate that it's a constant string, somebody's name, rather than the name of a column. Well, that's an important distinction, but a lot of students never understood it and tended not to put quotes around anything.

I can confirm that that is still a problem in
every programming language 50 years later: forgetting to quote your strings and putting bits of syntax in the wrong place. So after you'd designed this language, I think it was initially used just within IBM. How did it travel outside that organization?

Well, Ray and I published the first SEQUEL paper at a conference called SIGFIDET in Ann Arbor, Michigan, in June of 1974. SIGFIDET has since changed its name to SIGMOD, the Special Interest Group on Management of Data, and it's now probably the most prestigious annual database conference. This conference in 1974 was very interesting because it featured a panel discussion between Ted Codd and Charlie Bachman. Now, this was called a panel discussion, but everybody knew it was a debate, and in my view Ted Codd was the winner of this debate. I think after this conference in 1974, Ted's relational approach was considered to be the new mainstream in database management. So that's why I consider that this year, 1974, starts the clock on what I've called 50 years of relational databases. And since the first SEQUEL paper appeared in this conference, it also kind of starts the clock on 50 years of SQL.

I have a question for you on this, because in this paper the name is spelled S-E-Q-U-E-L, and it has now been shortened. Even today there's a lot of confusion about "do I call it S-Q-L, do I call it sequel?" I'd really love to have an official answer on this. Which do you prefer?

Well, at some point after publishing our paper, we got a letter from somebody's lawyer that said we couldn't use the name SEQUEL anymore; it was somebody's registered trademark. So we had to officially shorten the name to SQL, which stood for Structured Query Language. So the official name of the language is now SQL, but "sequel" is a lot easier to say than "S-Q-L," so I usually just pronounce the name "sequel" and hope I won't get in any trouble for doing that.

All right, we have an official answer there. I like that both
ways are possible: one's better for writing and one's better for speaking. So, when I interrupted you, you were talking about the SEQUEL paper. Can you tell me what happened once the paper was published?

Actually, the next thing that happened was a tragedy. That SEQUEL paper was the last thing that Ray Boyce and I did together. Less than a month after the SIGFIDET conference, my friend Ray died suddenly and unexpectedly of a brain aneurysm.

It's a very sad event, and even 50 years later it feels like a real tragedy. I'm sure it must have been a shock to you. Could you maybe tell me a bit about what it was like working with Ray?

Yeah, Ray was my best friend. We moved from New York to California together, we lived near each other, we carpooled to work. I drove Ray to work at IBM on the day he had his aneurysm attack and he was taken away in an ambulance. Ray and I used to play something we called the query game. We were experimenting with different query language designs; we'd take turns dreaming up queries and challenging each other to express them. We explored a lot of ideas in those days, and at the end of the day we couldn't remember which one of us was responsible for any given idea. Collaborating with Ray was the best part of my job.

That's wonderful that you have fond memories of working with him, and again, I'm sorry that such a tragedy happened to your friend. I'm wondering, so after you'd had this revolutionary paper and things were starting to get popular, who were the first people that took on this idea? Who was using SQL to begin with?

Well, you have to remember that System R was a research prototype; it was not an IBM product, so you couldn't just go somewhere and buy it. As a research group, we wanted to gain some visibility inside IBM, and to do that we needed to have some users. So we distributed System R to about a dozen internal IBM sites and also, on a joint-study basis, to three frontline IBM customers. That was
Boeing and Pratt & Whitney and Upjohn. We had quarterly meetings with all of our users to learn about their experiences and respond to their suggestions. It was during this period that we had to shorten the name to SQL.

Okay, so it was a lawsuit that intervened there. Those darn lawyers. All right, so to begin with it was a research project; when was SQL first commercialized?

Well, since SQL was invented at IBM Research, you might expect that IBM would be the first to bring it to market, but that's not actually the way it turned out. Interestingly enough, IBM in those days had another database product called IMS, and they weren't in a hurry to introduce a competitor to their successful product. But they did allow the System R group to publish their results in the open technical literature. That was generous, and that's how we published the SEQUEL paper and lots of other papers about the details of the System R work. Well, there was a small startup company called Relational Software, Incorporated, abbreviated RSI, that took an interest in these papers. The founders of RSI guessed correctly that IBM would eventually release a SQL product on mainframe computers, and they saw an opportunity there. They decided to build a product that was compatible with SQL on less expensive hardware platforms and to bring it to the market quickly, and they executed this plan very successfully. In fact, in 1979 they released a SQL product called Oracle, running on a minicomputer, a PDP-11, and this product was immediately successful, so much so that RSI changed its name to the Oracle company. Oracle was actually the first commercial implementation of SQL.

That's fascinating, because I've not heard of RSI, but obviously Oracle is a huge brand name, so I hadn't realized about the name change.

IBM itself didn't release a SQL product until 1981, on some of its smaller computers; that was two years after Oracle. And their strategic mainframe
product, called DB2, came out in 1983; that was four years after Oracle. By this time, Oracle had pretty much established a commanding lead in the database market.

Okay, so it seems like the main competition then was between Oracle and IBM in the early days. Were there any other players?

Yeah, there were. I've been talking about System R, a research project to prove the concept of a commercial relational system, but there was another project very much like that also going on at the same time at UC Berkeley. Their project was called Ingres, and it was led by two professors, Mike Stonebraker and Gene Wong. Ingres had its own high-level query language called QUEL, and much like System R, Ingres was distributed for free to experimental users, which were mainly universities, and it became widely used as a teaching tool at universities. Ingres spun off a commercial company, also called Ingres, in 1980, and in the early 1980s Ingres and Oracle were the market leaders in relational databases. They kind of ran neck and neck; they both ran on DEC VAX computers. Ingres implemented the QUEL language and Oracle implemented SQL. The QUEL language was well liked by its users, but I think I'd give the edge to the Oracle marketing division: they marketed their SQL product very aggressively, and in 1984 Ingres decided that they had to begin supporting SQL in order to compete with Oracle.

Okay, so I didn't realize that there were all these alternate languages for accessing relational databases, but it seems within a few years things had become standardized, because you had the advent of a standard for the SQL language. Can you tell me how that came about and what your involvement was?

Yeah, I think that's an interesting story. The American National Standards Institute, ANSI, created a committee in the late 1970s to define a standard database language. They kept changing the name of this
committee, but it usually had H2 in its name somewhere, so I'm going to call it the H2 committee. Well, at first this standard was supposed to be based on DBTG, but in 1982 they decided to extend the mission to define a relational standard also. They wound up with two different standards, one based on DBTG and also a relational one. When they got into the relational business, the two companies that were in the marketplace for relational systems were Oracle and Ingres, and they were both marketing SQL, so the H2 committee decided that they would base their relational standard on some version of SQL. They went ahead and created a standard, which became an ANSI standard and also an international standard with ISO. These were named "Database Language SQL," and they were released in 1986. So that was going on in ANSI and ISO, which were voluntary associations of commercial entities, companies. But actually the standards work that had the most impact, in my opinion, was something that was going on somewhere else. It was at the National Institute of Standards and Technology, N-I-S-T, sometimes pronounced "nist." Unlike ANSI, NIST is actually a branch of the federal government, and in 1992 NIST created something called a Federal Information Processing Standard. This one was called FIPS 127, and it happened to be identical to the ANSI SQL standard. Even more important, they provided a test suite and a validation service for conformance to this standard, and companies whose database product passed the validation test received a license to sell their products to the federal government. Several companies did this, and this gave a big boost to the commercial presence of the SQL language, because you could sell it to the government. Well, the SQL standard has evolved a lot over the last 50 years. It started out pretty simple and it just keeps growing. A lot of new features have been added: date and time data types, outer joins, recursive queries; the list goes on and on. A new
revision has come out about every five years; the latest one came out in 2023. I think the standardization process has had several good effects on the industry. Number one, it gave customers confidence that they had multiple sources where they could buy their database software. Number two, it gave vendors a way to evolve their products while maintaining compatibility with each other. And number three, it brought some really smart people together to evaluate requirements and make proposals. This H2 SQL standards committee has been meeting on a regular basis for a long time now, many years.

I love that the fact that the language became standardized helped increase adoption, because it gives people trust that this is an official thing and that you know what you're getting. So that seems like a pretty important milestone. I'm wondering, are there any other milestones in the early history of SQL that you think are important?

Sure. Before we leave the standards subject, I want to give a disclaimer here. During the decade of the '80s, when a lot of this standards work was being done, I actually took a leave from the database world and got involved in desktop publishing. That seemed to me to be the exciting thing that was happening in the 1980s. But IBM finally decided not to go into that business, so I returned to the database world around 1990. By that time a lot of the standards work had already been done, so credit for that goes to other people. Well, during the 1980s the revolution in data management really hit full stride. The cost of computing and storage kept on coming down, and the volume of data generated by businesses just expanded enormously. Almost every business needed to acquire a system to manage their data. Oracle of course continued to prosper, but lots of other new relational products entered the market. There was DB2 and Informix and Sybase and Tandem and Microsoft
SQL Server; they all offered implementations of the SQL language. There seemed to be room in the market for everybody. In fact, so many products were claiming to be relational that in 1985 Ted Codd published a series of 12 rules that define an authentic relational database. You can find these rules in Wikipedia; just search for "Codd's 12 rules." But starting in the 1990s there were some truly game-changing developments. Three very high-quality open-source SQL implementations became available; their names were MySQL and PostgreSQL and SQLite. All three of these were fully featured, reliable, high-performance systems with large user communities. They all had free versions, and also they had additional services that you could buy for a fee. Well, web-based applications were proliferating in the 1990s, that was the dot-com days, and many of these apps used one of these open-source systems for data management. SQLite in particular is interesting because it's embedded invisibly all over the place: it's in most smartphones and browsers and many popular applications. So these three open-source SQL systems are now among the most widely used database systems in the world.

Absolutely, they are very popular, all three of them, and these are things that we still teach now on DataCamp: if you want to learn to use databases, you use PostgreSQL or one of these other ones. So yeah, it's had a huge impact, and actually even at 50 years old SQL is still one of the most popular programming languages. On things like the TIOBE index or the IEEE Spectrum index of most popular programming languages, SQL is always in the top 10. So I'm wondering, how do you account for its longevity?

I can think of several reasons for that. The first and most important reason is Ted Codd got it right: the relational model is simple and powerful and flexible and elegant, and really that made everything else possible. But second, I think it helped a lot that the early research by both
System R and the Ingres project at Cal were published openly, so there were basically no impediments to commercialization of this technology; that research was given away for free. Third, I think the ANSI standard provided a well-defined language specification and a way for the language to evolve to meet new requirements, and that kept it alive and well as new requirements came along over a period of decades. And fourth, and this is really very important, are those high-quality open-source SQL implementations available for free. Well, what's not to like about that?

Free stuff is always good, for sure. Excellent. So looking back on this, do you think that SQL has lived up to what you and Ray envisaged for it back in 1974?

Well, that's a good question. In some ways SQL has been more successful than we ever dreamed it could be, but I'd have to say that the language met the goals that we had defined for ourselves only in part. Remember, Ray and I thought that SQL would be used by what we called casual users, who were not computer programmers. Well, it turned out we were wrong about that. The actual users of SQL turned out to be mostly programmers building database applications. I think SQL has made the work of these programmers easier and more productive, so that's a good thing. The casual users, I think, are still out there, but they're not using SQL. They're using Google, and increasingly, I think, they're starting to use AI systems like ChatGPT.

Absolutely. That's been a huge change in the last year: people can have their SQL code written for them quite easily, and so it's made it even more accessible. So yeah, maybe that'll bring SQL to even more people. All right, is there anything in the world of databases that you are currently excited about?

I'm retired now, but I keep hearing the term NoSQL a lot. The NoSQL movement, I think, is inspired by web applications that need massively scalable databases, and
that's an important requirement. To get this scalability, these systems usually relax one or more of the constraints of traditional relational databases. So here are some examples of that. Number one, relational databases usually have rigid schemas that say exactly what the tables look like that are in that system. NoSQL systems sometimes relax this requirement; they might have a partial schema or maybe no schema at all, so they're more flexible. Number two, relational databases are limited to the relational data model: they're made out of flat, homogeneous tables, and in each table all the rows look the same. NoSQL systems sometimes relax that requirement and have a different data model. Some of them are really simple, like key-value stores; others might allow tables to be nested, or they might be based on some document format like XML or JSON. They're kind of all over the place. Third, relational systems usually offer some transactional guarantees, like the well-known ACID properties that keep data in a consistent state. NoSQL systems sometimes relax these guarantees a little bit. They'll often replicate data across many nodes, and they might tolerate what they call eventual consistency, meaning, well, we'll be patient; we'll allow the nodes a little while to catch up. So NoSQL is an exciting new direction. I think it's a broad name for several promising new directions in database research, and that's a good thing. But sometimes I think scalability is what we want, and scalability isn't necessarily incompatible with a high-level language. So I've been hearing about a new language development called SQL++; that's a clever name, I think. SQL++ is a backward-compatible extension of SQL that originated at San Diego, by a professor named Yannis Papakonstantinou, and SQL++ has been implemented; it's available in open-source form from the AsterixDB project at UC Irvine, led by
Professor Mike KY uh so you can get it in GitHub um and also there are some commercial versions of SQL Plus+ coming out they're uh they're being marketed by couchbase and by Amazon web services uh the Amazon version goes by a different name particle but it's basically SQL Plus+ well SQL Plus+ uh it's one of those schema optional languages and for data model it operates on J collections of Json documents which you can also view as nested tables uh the correspondence between a Json document and nest of tables is the thing that makes SQL ++ compatible with earlier versions of SQL that operated on tables so uh if you're interested in this you can get more information by just Googling SQL Plus+ okay that's interesting because the SQL language has been I mean there has been some have been some updates but not that many updates to it and so having a new language that's sort of similar to SQL and backwards compatible just seem uh like a a pretty good Innovation um all right so uh just to wrap up do you have any final advice for fans of SQL well I look back over my own career uh I think of it as a case of being in the right place at the right time uh the database revolution has uh just been unfolding rapidly over the last half century and uh I was really privileged to take a part in it uh SQL didn't cause this revolution it was caused by economics uh Hardware was getting faster and cheaper at a at a exponential rate and these advances in Hardware made three things possible uh the first was a clean elegant data model like cod's relational model and the second thing was a highlevel non-procedural language which turned out to be SQL and the third thing is the optimizing compiler that brought these things together and made them commercially viable well these three items the data model the career language and the optimizing compiler uh all support each other sort of like a three-legged stool and that's what's made uh today's database systems possible I think so to wrap up uh in 
my career I've had some lucky breaks I've been privileged to work with some brilliant people Ted COD ray Boyce Jim Gray fat solinger the whole system our team ma Stone breaker uh uh I'm U I'm in debt to all of these people it's it's been a wild ride and I'm uh very grateful for the opportunities that have come my way over the years wonderful I mean it's such a fascinating story and your achievement to just um been used by so many millions of people so it's a very impressive stuff um all right uh thank you for joining me on the show Don oh thank you Richie it's been a pleasure talking to you ohthe database revolution has just been unfolding rapidly over the last half century and I was really privileged to take a part in it SQL didn't cause this revolution it was caused by economics hi Don welcome to the show hi Richie thank you pleasure to be here I'd love to start by just finding out how you first became interested in databases well I'll start from the beginning in in 1970 I was finishing my Graduate Studies at Stanford and I took my first professional job with IBM at the Watson Research Center in New Yorktown Heights New York uh I moved from California to New York in the winter which was not a move I'd recommend if you enjoy warm weather uh a few months later my friend Bray Boyce also completed his graduate work at Purdue and he joined IBM at the same location where I was well Yorktown was the uh Central research facility of IBM and the mission of IBM research is to study technologies that might influence IBM's future products and in 1970 uh there was kind of Revolution going on the cost of of computing was coming down very quickly and lots of companies were putting data online for the first time and this seemed like a business opportunity so uh the group that Ray and I were in was assigned to study the state-of-the-art in database management with an eye to influencing IBM's future products okay and what was this uh first database that you got interested in well 
we studied something called the DBTG report, so let me tell you where that came from and why we were interested in it. In the early 1970s the most respected person in the database industry was a guy named Charles Bachman of General Electric, known to his friends as Charlie. Charlie had actually invented the concept of a database management system: he was the first one to call for a separate software layer to manage data that was shared by multiple applications. This was a pretty important concept, and for basically inventing the database management system Charlie received the ACM Turing Award, which is the most prestigious award in computer science.

Charlie had actually built a database management system called IDS, which stood for Integrated Data Store. IDS stored information in the form of records and connections between records; you could think of them as pointers. In IDS a program could navigate through what Charlie called data space, moving from one record to another by following these pointers to find an answer to a question. In fact, when Charlie gave his Turing Award lecture he titled it "The Programmer as Navigator."

One of the most popular business programming languages at the time was COBOL, and there was a movement to add database management functions to COBOL. A committee was formed for this purpose called the Data Base Task Group, abbreviated DBTG. Charlie was a member of DBTG, and the group published a report in 1971 defining a set of commands for navigating in data space, based on Charlie's ideas.

The DBTG report was pretty important at the time, and Ray and I spent some time studying it. It was wonderfully complicated: it had currency indicators and set occurrence selection rules, and it had a FIND command with seven different versions. It could do a pretty good job of answering questions that were anticipated in the database design, but for unanticipated questions sometimes you were out of luck. Ray
and I wrote a review of the DBTG report and suggested some incremental improvements. We thought that if we could manage to understand something as complicated as DBTG, our careers would be off to a good start.

Richie: That's funny. I love the analogy there of working with data as navigating. I think it's a phrase that's not often used, but you do spend so much time trying to work out how different bits of data are connected together. It sounds like this idea of data being connected is leading towards the idea of relationships between data, and relational databases. So can you talk me through how relational databases came about and how you got interested in them?

Don: Well, sure. All this came about because of a paper written by Ted Codd. Ted was a scientist at IBM's research laboratory in San Jose, California, and in June of 1970 he published what became a very famous paper called "A Relational Model of Data for Large Shared Data Banks." The basic point of Ted's paper was that Charlie Bachman had gotten it all wrong and that navigating through data space was a bad idea. Ted thought that database queries should not look like programs that tell the computer what to do. He wanted to express queries in a high-level, nonprocedural language. He liked to say, "Tell me what you want, not how to find it."

I read Ted's paper as part of my learning process in getting up to speed on the state of the art in databases, and on first reading I wasn't much impressed with it. Ted was basically a mathematician, and his paper contained a lot of mathematical jargon. It defined a relation as a subset of the Cartesian product of a set of domains, and it introduced concepts like data independence and normalization, and operators like permutation and projection and join. My impression of all this was that Codd's paper was interesting from a theoretical point of view, but I couldn't see that it was really grounded in practical engineering. But I kept on hearing more about this relational
data model. I heard about a symposium that was going to be held in Miami Beach in December of 1972, and it was going to feature a tutorial on relational databases. Well, traveling to Miami in the winter had a certain appeal, so I got permission to attend. The symposium was called the COINS-72 symposium, a conference on information systems, and I actually met Ted Codd for the first time on the beach at the Fontainebleau Hotel. I attended the tutorial, which was taught by Chris Date, and I have to describe it as a conversion experience. For the first time I began to understand the simplicity and power and elegance of Ted's relational approach. Queries that took a whole page of code in DBTG could often be expressed in a single line in a relational approach. So when I returned to New York I wasn't interested in DBTG anymore; I was picking up a new interest in relational query languages.

Richie: That's fascinating. I love that even though it was a hugely influential paper, your first reaction to Ted's work was, "Oh, it's just mathematical nonsense, all theoretical, no practical applications," but once you see it in action it is actually incredibly powerful. So I love how it translates into something more practical and more real. I think the first real implementation of this was the System R project that you worked on. Is that right?

Don: Well, there were actually several things going on more or less at the same time, in different places in IBM and also at other companies. There was the Ingres project at Berkeley, for example. But I'm going to talk mainly about System R, because that was the project that I was associated with. There were a lot of people who saw the power and simplicity of Codd's approach, but the whole idea depended on a high-level query language with an optimizing compiler that could turn it into efficient code. The question was: that sounds like a good idea, but was it just science fiction, or was it
really ready for prime time? In 1973 IBM decided to answer this question by building an industrial-strength relational system, just to prove it could be done. This was done at IBM Research; they created a project for this purpose and called it System R. System R was located in San Jose because that's where Ted Codd was, and about 14 people, including Ray and myself, were gathered from several IBM sites all over the country to come together and work on it.

I wasn't very happy about moving from New York to San Jose at that time. I had just bought a house and my wife had a good job teaching high school, so I felt kind of disrupted. But I made the move and went to San Jose to join the System R project, and that turned out to be the best decision I ever made. Working on IBM's first relational database system really turned out to be the opportunity of a lifetime.

Richie: It's interesting how these cross-country moves have been problematic each time. But yeah, IBM, "I've Been Moved," that's the legend. And then from System R, it seems like this is leading towards the really important paper that you wrote about the SEQUEL programming language. So where did that idea come from?

Don: Well, to tell you the truth, Ray and I liked some parts of Codd's ideas better than others. We really liked this idea of a nonprocedural query, with the slogan "tell me what you want, not how to find it." What we didn't like was the mathematical jargon in Ted's papers. We wanted to design a language for a new class of user that we called casual users. We thought of a casual user as a professional who needs access to data but doesn't want to be a computer programmer, and doesn't even want to rely on a computer programmer. He might be an urban planner or a financial analyst or an insurance company executive. He might have questions that vary from day to day, and he might want his results pretty quickly. Well, the
database systems of the 1970s just didn't meet these requirements. So to serve this casual user, Ray and I wanted to design a new language, and we set certain goals for it. Number one, we wanted to use the term "tables" instead of "relations"; everybody knows what a table is. Number two, we wanted to base the language on ordinary English words like SELECT. Goal number three, the language should have no special symbols, and it should be easy to type on a keyboard. And goal number four, which is maybe the most challenging one: we wanted it to have something that we called the walk-up-and-read property, meaning that in simple cases a user with no special training should be able to understand a query just by reading it. Those were the goals that we set for ourselves, and we called this new language SEQUEL, which was an acronym for Structured English Query Language.

Richie: It's amazing how the things you were worrying about back then, 50 years ago, are things that we're still worrying about now. For example, there's a big push at the moment to make data more accessible to everyone, regardless of whether you have a technical background or not. I find it fascinating that this is something you worried about when you were first designing the language: that people who didn't have a strong mathematical background could still make use of the technology. You mentioned the idea of walk-up-and-read, where people just walk up, look at the code, and it makes sense to them. It sounds like a difficult thing to measure, so how do you know if you've been successful at that?

Don: That's a good question. It's a hard thing to do and it's a hard thing to measure, and I'll never really know how successful we were. But we had a psychologist on the staff named Phyllis Reisner, and Phyllis conducted an experiment at San Jose State University, teaching SEQUEL to college students who'd had no programming experience at all and recording their progress and the kinds of errors that they made.
It turned out that these college students could become proficient in SEQUEL after a few hours of instruction. Their most common error was something funny: they would forget to put quotes around strings. For example, if a query contained the phrase "name equals Fred," you had to put quotes around Fred to indicate that it's a constant string, somebody's name, rather than the name of a column. That's an important distinction, but a lot of students never understood it and tended not to put quotes around anything.

Richie: I can confirm that that is still a problem in every programming language 50 years later: forgetting to quote your strings and putting bits of syntax in the wrong place. So after you'd designed this language, I think it was initially used just within IBM. How did it travel outside that organization?

Don: Well, Ray and I published the first SEQUEL paper at a conference called SIGFIDET in Ann Arbor, Michigan, in June of 1974. SIGFIDET has since changed its name to SIGMOD, the Special Interest Group on Management of Data, and it's now probably the most prestigious annual database conference. This conference in 1974 was very interesting because it featured a panel discussion between Ted Codd and Charlie Bachman. Now, this was called a panel discussion, but everybody knew it was a debate, and in my view Ted Codd was the winner of this debate. I think after this conference in 1974, Ted's relational approach was considered to be the new mainstream in database management. So that's why I consider that this year, 1974, starts the clock on what I've called 50 years of relational databases. And since the first SEQUEL paper appeared at this conference, it also kind of starts the clock on 50 years of SQL.

Richie: I have a question for you on this, because in this paper the name is spelled S-E-Q-U-E-L, and it's now been shortened. Even today there's a lot of confusion: do I call it "S-Q-L," or do I call it "sequel"? I'd really love to have an official answer on this,
which do you prefer?

Don: Well, at some point after publishing our paper we got a letter from somebody's lawyer that said we couldn't use the name SEQUEL anymore; it was somebody's registered trademark. So we had to officially shorten the name to SQL, which stood for Structured Query Language. So the official name of the language is now SQL, but "sequel" is a lot easier to say than "S-Q-L," so I usually just pronounce the name "sequel" and hope I won't get in any trouble for doing that.

Richie: All right, we have an official answer there. I like that both ways are possible: one's better for writing and one's better for speaking. So when I interrupted you, you were talking about the SEQUEL paper. Can you tell me what happened once the paper was published?

Don: Actually, the next thing that happened was a tragedy. That SEQUEL paper was the last thing that Ray Boyce and I did together. Less than a month after the SIGFIDET conference, my friend Ray died suddenly and unexpectedly of a brain aneurysm.

Richie: It's a very sad event, and even 50 years later it feels like a real tragedy. I'm sure it must have been a shock to you. Could you maybe tell me a bit about what it was like working with Ray?

Don: Yeah, Ray was my best friend. We moved from New York to California together, we lived near each other, and we carpooled to work. I drove Ray to work at IBM on the day he had his aneurysm attack and he was taken away in an ambulance. Ray and I used to play something we called the query game. We were experimenting with different query language designs, and we'd take turns dreaming up queries and challenging each other to express them. We explored a lot of ideas in those days, and at the end of the day we couldn't remember which one of us was responsible for any given idea. Collaborating with Ray was the best part of my job.

Richie: That's wonderful that you have fond memories of working with him. And again, I'm sorry that such a tragedy happened to you and to your friend. I'm wondering, so
after you'd had this sort of revolutionary paper and things were starting to get popular, who were the first people that took on this idea? Who was using SQL to begin with?

Don: Well, you have to remember that System R was a research prototype. It was not an IBM product, so you couldn't just go somewhere and buy it. As a research group we wanted to gain some visibility inside IBM, and to do that we needed to have some users. So we distributed System R to about a dozen internal IBM sites and also, on a joint-study basis, to three frontline IBM customers: Boeing, Pratt & Whitney, and Upjohn. We had quarterly meetings with all of our users to learn about their experiences and respond to their suggestions. It was during this period that we had to shorten the name to SQL.

Richie: Okay, so it was a lawsuit that intervened there. Those darn lawyers. All right, so to begin with it was a research project. When was SQL first commercialized?

Don: Well, since SQL was invented at IBM Research, you might expect that IBM would be the first to bring it to market, but that's not actually the way it turned out. Interestingly enough, IBM in those days had another database product called IMS, and they weren't in a hurry to introduce a competitor to their successful product. But they did allow the System R group to publish their results in the open technical literature. That was generous, and that's how we published the SEQUEL paper and lots of other papers about the details of the System R work.

There was a small startup company called Relational Software, Incorporated, abbreviated RSI, that took an interest in these papers. The founders of RSI guessed correctly that IBM would eventually release a SQL product on mainframe computers, and they saw an opportunity there. They decided to build a product that was compatible with SQL on less expensive hardware platforms and to bring it to market quickly, and they executed this plan very
successfully. In fact, in 1979 they released a SQL product called Oracle, running on a minicomputer, a PDP-11. This product was immediately successful, so much so that RSI changed its name to the Oracle company, and Oracle was actually the first commercial implementation of SQL.

Richie: That's fascinating, because I've not heard of RSI, but obviously Oracle is a huge brand name, so I hadn't realized about the name change.

Don: IBM itself didn't release a SQL product until 1981, on some of its smaller computers. That was two years after Oracle. Their strategic mainframe product, called DB2, came out in 1983. That was four years after Oracle, and by this time Oracle had pretty much established a commanding lead in the database market.

Richie: Okay, so it seems like the main competition in the early days was between Oracle and IBM. Were there any other players?

Don: Yeah, there were. I've been talking about System R, a research project to prove the concept of a commercial relational system, but there was another project very much like that going on at the same time at UC Berkeley. Their project was called Ingres, and it was led by two professors, Mike Stonebraker and Gene Wong. Ingres had its own high-level query language called QUEL, and much like System R, Ingres was distributed for free to experimental users, which were mainly universities, and it became widely used as a teaching tool. Ingres spun off a commercial company, also called Ingres, in 1980, and in the early 1980s Ingres and Oracle were the market leaders in relational databases. They kind of ran neck and neck. They both ran on DEC VAX computers; Ingres implemented the QUEL language and Oracle implemented SQL. The QUEL language was well liked by its users, but I think I'd give the edge to the Oracle marketing division. They marketed their SQL product very aggressively, and in 1984 Ingres decided that they had to begin supporting SQL in
order to compete with Oracle.

Richie: Okay, I didn't realize that there were all these alternative languages for accessing relational databases, but it seems that within a few years things became standardized, with the advent of a standard for the SQL language. Can you tell me how that came about and what your involvement was?

Don: Yeah, I think that's an interesting story. The American National Standards Institute, ANSI, created a committee in the late 1970s to define a standard database language. They kept changing the name of this committee, but it usually had H2 in its name somewhere, so I'm going to call it the H2 committee. At first this standard was supposed to be based on DBTG, but in 1982 they decided to extend the mission to define a relational standard also. They wound up with two different standards, one based on DBTG and a relational one. When they got into the relational business, the two companies in the marketplace for relational systems were Oracle and Ingres, and they were both marketing SQL, so the H2 committee decided to base their relational standard on some version of SQL. They went ahead and created a standard, which became an ANSI standard and also an international standard with ISO. These were named Database Language SQL, and they were released in 1986.

So that was going on in ANSI and ISO, which were voluntary associations of commercial entities, companies. But the standards work that had the most impact, in my opinion, was going on somewhere else: at the National Institute of Standards and Technology, NIST, sometimes pronounced "nist." Unlike ANSI, NIST is actually a branch of the federal government, and in 1992 NIST created something called a Federal Information Processing Standard. This one was called FIPS 127, and it happened to be identical to the ANSI SQL standard. Even more important, they provided a test suite and a validation service for
conformance to this standard, and companies whose database products passed the validation tests received a license to sell their products to the federal government. Several companies did this, and it gave a big boost to the commercial presence of the SQL language, because you could sell it to the government.

The SQL standard has evolved a lot over the last 50 years. It started out pretty simple and it just keeps growing. A lot of new features have been added: date and time data types, outer joins, recursive queries; the list goes on and on. A new revision has come out about every five years, and the latest one came out in 2023. I think the standardization process has had several good effects on the industry. Number one, it gave customers confidence that they had multiple sources where they could buy their database software. Number two, it gave vendors a way to evolve their products while maintaining compatibility with each other. And number three, it brought some really smart people together to evaluate requirements and make proposals. This H2 SQL standards committee has been meeting on a regular basis for many years now.

Richie: I love the fact that the language becoming standardized helped increase adoption, because it gives people trust that this is an official thing and that you know what you're getting. So that seems like a pretty important milestone. Are there any other milestones in the early history of SQL that you think are important?

Don: Sure. Before we leave the subject of standards, I want to give a disclaimer here. During the decade of the 80s, when a lot of this standards work was being done, I actually took a leave from the database world and got involved in desktop publishing. That seemed to me to be the exciting thing that was happening in the 1980s. But IBM finally decided not to go into that business, so I returned to the database world around 1990. By that time
a lot of the standards work had already been done, so credit for that goes to other people.

During the 1980s the revolution in data management really hit full stride. The cost of computing and storage kept on coming down, and the volume of data generated by businesses just expanded enormously. Almost every business needed to acquire a system to manage its data. Oracle of course continued to prosper, but lots of other new relational products entered the market: there was DB2 and Informix and Sybase and Tandem and Microsoft SQL Server. They all offered implementations of the SQL language, and there seemed to be room in the market for everybody. In fact, so many products were claiming to be relational that in 1985 Ted Codd published a series of 12 rules that define an authentic relational database. You can find these rules in Wikipedia; just search for Codd's 12 rules.

But starting in the 1990s there were some truly game-changing developments. Three very high-quality open-source SQL implementations became available: MySQL, PostgreSQL, and SQLite. All three of these were fully featured, reliable, high-performance systems with large user communities. They all had free versions, and they also had additional services that you could buy for a fee. Web-based applications were proliferating in the 1990s, in the dot-com days, and many of these apps used one of these open-source systems for data management. SQLite in particular is interesting because it's embedded invisibly all over the place. It's in most smartphones and browsers and many popular applications. So these three open-source SQL systems are now among the most widely used database systems in the world.

Richie: Absolutely, all three of them are very popular, and these are things that we still teach now on DataCamp. If you want to learn to use databases, you use PostgreSQL or one of these other ones. So yeah, it's had a huge impact, and
actually, even at 50 years old, SQL is still one of the most popular programming languages. On things like the TIOBE index or the IEEE Spectrum index of the most popular programming languages, SQL is always in the top 10. So how do you account for its longevity?

Don: I can think of several reasons for that. The first and most important reason is that Ted Codd got it right. The relational model is simple and powerful and flexible and elegant, and really that made everything else possible. Second, I think it helped a lot that the early research by both System R and the Ingres project at Cal was published openly, so there were basically no impediments to commercialization of this technology. That research was given away for free. Third, I think the ANSI standard provided a well-defined language specification and a way for the language to evolve, and that kept it alive and well as new requirements came along over a period of decades. And fourth, and this is really very important, are those high-quality open-source SQL implementations available for free. What's not to like about that?

Richie: Free stuff is always good, for sure. Excellent. So looking back on this, do you think that SQL has lived up to what you and Ray envisaged for it back in 1974?

Don: Well, that's a good question. In some ways SQL has been more successful than we ever dreamed it could be, but I'd have to say that the language met the goals that we had defined for ourselves only in part. Remember, Ray and I thought that SQL would be used by what we called casual users, who were not computer programmers. Well, it turned out we were wrong about that. The actual users of SQL turned out to be mostly programmers building database applications. I think SQL has made the work of these programmers easier and more productive, so that's a good thing. The casual users, I think, are still out there, but they're not using SQL. They're using Google,
and increasingly, I think, they're starting to use AI systems like ChatGPT.

Richie: Absolutely. That's been a huge change in the last year: people can have their SQL code written for them quite easily, and so it's made it even more accessible. So maybe that'll bring SQL to even more people. All right, is there anything in the world of databases that you are currently excited about?

Don: I'm retired now, but I keep hearing the term NoSQL a lot. The NoSQL movement, I think, is inspired by web applications that need massively scalable databases, and that's an important requirement. To get this scalability, these systems usually relax one or more of the constraints of traditional relational databases. So here are some examples of that.

Number one, relational databases usually have rigid schemas that say exactly what the tables look like in that system. NoSQL systems sometimes relax this requirement; they might have a partial schema or maybe no schema at all, so they're more flexible.

Number two, relational databases are limited to the relational data model. They're made out of flat, homogeneous tables; in each table all the rows look the same. NoSQL systems sometimes relax that requirement and have a different data model. Some of them are really simple, like key-value stores. Others might allow tables to be nested, or they might be based on some document format like XML or JSON. They're kind of all over the place.

Third, relational systems usually offer some transactional guarantees, like the well-known ACID properties, that keep data in a consistent state. NoSQL systems sometimes relax these guarantees a little bit. They'll often replicate data across many nodes, and they might tolerate what they call eventual consistency, meaning, well, we'll be patient; we'll allow the nodes a little while to catch up.

So NoSQL is an exciting new direction. I think it's a broad name for several promising new
directions in database research, and that's a good thing. Sometimes scalability is what we want, but scalability isn't necessarily incompatible with a high-level language. So I've been hearing about a new language development called SQL++. That's a clever name, I think. SQL++ is a backward-compatible extension of SQL that originated at UC San Diego with a professor named Yannis Papakonstantinou. SQL++ has been implemented, and it's available in open-source form from the AsterixDB project at UC Irvine, led by Professor Mike Carey, so you can get it on GitHub. There are also some commercial versions of SQL++ coming out; they're being marketed by Couchbase and by Amazon Web Services. The Amazon version goes by a different name, PartiQL, but it's basically SQL++.

SQL++ is one of those schema-optional languages, and for its data model it operates on collections of JSON documents, which you can also view as nested tables. The correspondence between JSON documents and nested tables is the thing that makes SQL++ compatible with earlier versions of SQL that operated on tables. If you're interested in this, you can get more information by just Googling SQL++.

Richie: Okay, that's interesting, because there have been some updates to the SQL language, but not that many. So having a new language that's similar to SQL and backward compatible seems like a pretty good innovation. All right, just to wrap up, do you have any final advice for fans of SQL?

Don: Well, when I look back over my own career, I think of it as a case of being in the right place at the right time. The database revolution has just been unfolding rapidly over the last half century, and I was really privileged to take a part in it. SQL didn't cause this revolution; it was caused by economics. Hardware was getting faster and cheaper at an exponential rate,
and these advances in hardware made three things possible. The first was a clean, elegant data model like Codd's relational model. The second was a high-level, nonprocedural language, which turned out to be SQL. And the third was the optimizing compiler that brought these things together and made them commercially viable. These three items, the data model, the query language, and the optimizing compiler, all support each other, sort of like a three-legged stool, and that's what's made today's database systems possible, I think.

So to wrap up: in my career I've had some lucky breaks. I've been privileged to work with some brilliant people: Ted Codd, Ray Boyce, Jim Gray, Pat Selinger, the whole System R team, Mike Stonebraker. I'm in debt to all of these people. It's been a wild ride, and I'm very grateful for the opportunities that have come my way over the years.

Richie: Wonderful. It's such a fascinating story, and your achievement has been used by so many millions of people, so it's very impressive stuff. All right, thank you for joining me on the show, Don.

Don: Oh, thank you, Richie. It's been a pleasure talking to you.
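
The quoting mistake Don describes from the San Jose State experiment is easy to reproduce today. What follows is only a minimal sketch using Python's built-in sqlite3 module, not anything from the original study: the `people` table, its columns, and the sample rows are hypothetical. It shows a quoted 'Fred' being read as a string constant, while a bare Fred is read as a column name that the table does not have.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Fred", "San Jose"), ("Ray", "Yorktown")],
)

# Quoted: 'Fred' is a string constant compared against the name column.
quoted = conn.execute(
    "SELECT city FROM people WHERE name = 'Fred'"
).fetchall()

# Unquoted: Fred is parsed as a column name. This table has no such
# column, so SQLite rejects the query instead of returning rows.
try:
    conn.execute("SELECT city FROM people WHERE name = Fred")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

print(quoted)
print(unquoted_error)
```

The query itself also reflects the "tell me what you want, not how to find it" idea: it names the rows wanted and leaves the access path to the database engine.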

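Don's remark that a collection of JSON documents can also be viewed as nested tables can be illustrated in plain Python. This sketch is not SQL++ and does not use any SQL++ implementation; the documents, the field names, and the `unnest` helper are all hypothetical. It only shows how a nested array inside each document flattens into ordinary rows, and how a document that lacks a field is still accepted under a schema-optional model.

```python
import json

# Hypothetical collection of JSON documents. The second document has no
# "orders" field, which a schema-optional data model allows.
docs = json.loads("""
[
  {"name": "Fred",
   "orders": [{"item": "disk", "qty": 2}, {"item": "tape", "qty": 1}]},
  {"name": "Ray"}
]
""")


def unnest(collection, field):
    """Yield one flat row per element of the nested array `field`.

    Documents without the field contribute no rows, similar to an
    inner join between a parent table and its nested child table.
    """
    for doc in collection:
        parent = {k: v for k, v in doc.items() if k != field}
        for child in doc.get(field, []):
            yield {**parent, **child}


rows = list(unnest(docs, "orders"))
for row in rows:
    print(row)
```

Each resulting row has the same flat set of columns (name, item, qty), which is the shape a traditional SQL query over tables expects.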