EXTRA BITS - SGML HTML XML - Computerphile

SGML and XML: A Subset Relationship

This really means that a diagram we had in the previous video can now be extended. You'll recall that I described SGML as being the enabling technology, the foundations of the building in a way, and we built these applications on top of it. I did a memo, I referred to the DoD's CALS, I referred to the Text Encoding Initiative, and I referred to HTML and said, you know, sad it had the ML at the end; it caused so much misunderstanding. But anyway, let's extend this a bit more now, over to the right, and say that what the XML committee did was to enable, as it were, a proper subset inside this SGML universe, like that. And that one is labeled XML.

I've tried to draw it wholly within the SGML boundary, just to emphasize that yes, it really is a genuine subset. The theologians would say, "Ah yes, but it's a subset of the extended SGML that included several options that Charles [Goldfarb] specified way back, and you had to have those in order for it to work." But nevertheless, yes, it is in spirit a proper subset. And that meant that you could generate, on top of that, new tag sets or cleaned-up versions of existing ones.

One of the first things they did, of course, was to clean up HTML and its spec into XML format. Remember, XML is a subset of SGML, because they're the same kind of thing: they're meta-syntaxes. Built now on top of XML notation is XHTML, which is cleaned-up HTML: all tags matching, defined wonderfully. And of course tens of thousands of other people have now been able to build their own tag sets on top of XML technology.

Just to take one that's very familiar to me and to some of my academic colleagues, for mathematics markup: again, one wishes that they hadn't put the ML on the end, but they did. MathML. And here's a good one. Adobe know what they're doing: you don't put ML on the end of things unless it's a meta-language, perhaps. They took their PDF technology and made it so that it could be expressed in XML tag-set form, and that of course is called Scalable Vector Graphics, SVG.
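To make the idea of tag sets built on top of XML concrete, here is a minimal sketch in Python. It assumes nothing beyond the standard library's generic XML parser, and it shows that one parser can read XHTML, MathML, SVG and a memo document alike, because all of them use the same XML notation and differ only in their tag sets. The sample documents and the helper name element_names are made up for illustration, and the memo element names (to, body) are hypothetical, not taken from the video.

```python
import xml.etree.ElementTree as ET

# Tiny illustrative documents, one per tag set. The memo element names
# are hypothetical; the other three use each tag set's real namespace.
DOCUMENTS = {
    "SVG": '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>',
    "MathML": '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>',
    "XHTML": '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>',
    "memo": "<memo><to>Sean</to><body>Hello</body></memo>",
}


def element_names(xml_text):
    """Parse with a generic XML parser and list element names in document order."""
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced names as "{uri}local"; keep only the local part.
    return [el.tag.rsplit("}", 1)[-1] for el in root.iter()]


for label, text in DOCUMENTS.items():
    print(label, element_names(text))
```

The parser has no idea what a circle or an mi element means. It only knows the XML punctuation rules, which is why each of these is an application of XML rather than a subset of it.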

And I'll put the dots here to show that there are now literally, I would guess, thousands if not tens of thousands. Remember, for all of these, the way you refer to them is to say: SVG is an application of XML.

Can we just be absolutely clear, then, to throw this in one more time: what was the big problem with putting ML on the end of these?

Because unfortunately the two enabling technologies end with ML, and the in-crowd sort of know that although it says "markup language", it's actually a markup meta-language. But of course if you start using it for something that is a fixed tag set, not a tag-set technology (punctuation rules, so to speak), then people start saying things like, "Oh, you know, HTML is a subset of that." No, it's not a subset of SGML, it's an application, you say. And people start to make those mistakes between application and subset.

That mistake is motivated by the fact that they all end in ML, so people assume they must all be the same sort of thing, and they aren't.

Peace broke out, I think we should say, alongside all of this in the mid-1990s. And equally important: it wasn't just getting XML sorted out and therefore being able to do HTML as XHTML properly. It was also the fact that, I think in that era, Tim Berners-Lee moved from CERN to MIT. All credit to him, exactly the right thing, I think: MIT made him an offer to host basically a sort of academic-like institution concerned entirely with setting standards for the web and being vendor-neutral in everything it did.

As I say, all credit to him. He could have made millions out of the Web, there's no question, but he opted to say: no, I would sooner make a smaller amount of money, be respected, and have mine be the standard way of doing things. So in a way the vendors can meet on vendor-neutral territory, still have shouting matches and still have bust-ups, but it's all under the aegis of the W3C, the World Wide Web Consortium, which is in charge of all this.

I've never actually been to one of their meetings. I've talked to one or two people who've been there, and I said, "How do you cope with the fact that people still get subsets and applications confused? You know: is XHTML a cut-down version of XML? No, no, no, no." And one said, "I'm weary, I'm going bald, I've got a grey beard. All we try to ensure is that everything we say on our website is absolutely accurate and correct. But to try and get everybody in the whole world to understand and get it right? Life's too short." XML is a meta-notation; memo is a specific tag set. So you have to have this set of intermediate concepts.

The XML Draft: Mandatory End Tags and Case Sensitivity

We've been wandering down the byroads of HTML, XML, SGML, all sorts of MLs and various non-MLs. Last time we spoke, I think, some people had wandered off to the pub and sorted everything out over a pint. So what happened next?

Well, after about 18 months the draft XML specification was published, and everybody agreed: great job. Didn't we learn a lot from the decisions in HTML that caused so much difficulty? The major thing, not by any means the only one, was that you must not omit your end tags in XML. They will be there. That being said, a lot of browsers still have to be tolerant about it. Dreamweaver or something might produce a very good XML-compliant web page, but a lot of people still hand-code, and a lot of people, particularly with paragraphs, go straight p, p, p and don't see the need to close off the paragraph. So you've got to be a bit tolerant, but nevertheless the idea was that you do not omit your end tags.

There were about seven or eight other, rather more technical, things that had to be cleaned up from HTML, and it made a lot of difference to the people implementing it that they were cleaned up. One of the first ones I'm aware of, because I tend to forget it, is that in SGML, because it was the days of punch cards, uppercase and lowercase were treated as the same. If you called it PARA in capitals, it was the same as para in lowercase; you could even have capital P, little a, capital R, little a, and that was the same as para. That had to be cleaned up.

What this made possible for the browser implementers, and they loved it, was that if all the end tags were there you could do a brackets-matching operation, a bit like in a compiler where you match an open curly brace with a close curly brace. You could tick everything off and say: that is a tree. I may not understand what all the tags mean, but boy, they match. That's very useful to know. It also meant that there was a clean target for front-end software to aim at. Your Dreamweavers or whatever probably still aren't totally compliant, I don't know, but they could say: we will try to produce really good XML-notated web page stuff.
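The brackets-matching operation described in the transcript can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any real browser or XML parser is implemented: it ignores text, comments and attribute values that contain angle brackets, and the names build_tree and fold_case are invented for the example. The fold_case option mimics the case-insensitive name matching that the speaker describes for SGML, while the default is strict matching in the XML style.

```python
import re

# Matches a start tag, an end tag, or an empty-element tag such as <br/>.
# Simplification: attribute values must not contain '<' or '>'.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def build_tree(markup, fold_case=False):
    """Match start and end tags with a stack and return nested (name, children) tuples.

    Raises ValueError when an end tag does not match the innermost open
    element, or when an element is never closed.
    """
    root = ("#document", [])
    stack = [root]
    for closing, name, empty in TAG.findall(markup):
        if fold_case:
            name = name.lower()  # treat PARA, Para and para as one name
        if closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            if not empty:  # <br/> opens and closes in one go
                stack.append(node)
    if len(stack) > 1:
        raise ValueError(f"missing end tag for <{stack[-1][0]}>")
    return root[1]


# Every end tag present: the tags tick off and we get a tree.
print(build_tree("<memo><to>Sean</to><body><p>Hi</p><p>Bye</p></body></memo>"))

# Hand-coded "p p p" style with omitted end tags fails the strict check.
try:
    build_tree("<body><p>one<p>two</body>")
except ValueError as err:
    print("not well-formed:", err)
```

The matcher never needs to know what memo or p means; it only checks that the tags nest, which is the property the speaker says implementers valued.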