**Introduction to Chemometrics and Cheminformatics**
Chemometrics and cheminformatics are two closely related fields for analyzing and interpreting chemical data. The field of chemometrics dates back to the 1960s; the term itself was coined by the Swedish researcher Svante Wold in a research grant application and was first used in 1972. Chemometrics applies statistical and mathematical methods to chemical data, historically measurements from industrial chemistry such as infrared spectroscopy and mass spectrometry, where many variables are recorded for each sample.
Cheminformatics is concerned with the collection, organization, and analysis of large amounts of chemical data. Beyond the arrival of computers, it grew out of advances in databases and data storage, which made it possible to hold large chemical libraries. The two fields overlap, but they differ in the shape of the data: chemometric datasets are typically wide (many descriptors, a handful to a moderate number of compounds), whereas cheminformatics datasets are both wide and long (many descriptors and many compounds).
**The Role of Computers in Chemometrics**
Chemometrics developed at a time when computers were just being introduced, and they were instrumental in making chemometric calculations practical. Spectroscopic data are usually high-dimensional, with descriptors numbering in the thousands to tens of thousands, so methods such as principal component analysis (PCA), which projects high-dimensional data onto a lower-dimensional form, became central to the field. In industrial chemistry the emphasis was on optimizing processes, which also encouraged computer-assisted design of experiments (DOE) and the optimization of chemical reactions.
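The kind of dimensionality reduction used on wide chemometric data can be illustrated with a small sketch of principal component analysis. This is a minimal illustration under stated assumptions, not a production implementation: the "spectral" table is made up, only the first principal component is computed (by power iteration), and the function names `center` and `first_principal_component` are hypothetical. Real analyses would normally rely on a tested numerical library.

```python
import math

def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in rows]

def first_principal_component(rows, iterations=200):
    """Return (loading vector, sample scores) for the first principal component.

    Uses power iteration on X^T X without forming that matrix explicitly.
    """
    X = center(rows)
    n, p = len(X), len(X[0])
    v = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting direction
    for _ in range(iterations):
        t = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # t = X v
        w = [sum(X[i][j] * t[i] for i in range(n)) for j in range(p)]  # w = X^T t
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break  # no variance along v (e.g. all rows identical)
        v = [c / norm for c in w]
    scores = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores

# Hypothetical "wide" table: 3 samples (rows) x 6 spectral variables (columns).
spectra = [
    [0.10, 0.40, 0.90, 0.50, 0.20, 0.10],
    [0.20, 0.70, 1.60, 0.90, 0.40, 0.20],
    [0.10, 0.50, 1.20, 0.70, 0.30, 0.10],
]
loadings, scores = first_principal_component(spectra)
print("scores on PC1:", [round(s, 3) for s in scores])
```

Each sample is reduced from six variables to a single score, which is the sense in which high-dimensional data is "scaled down" to a lower-dimensional form.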
**The Advancement of Cheminformatics**
Cheminformatics, by comparison, was driven by the need to store and analyze chemical data at scale. Advances in databases and data storage made it possible to build large chemical libraries. Quantitative structure-activity relationship (QSAR) modeling, which began in the early 1960s with the work of Corwin Hansch and his colleague Toshio Fujita, was a major driver, because it depends on large amounts of chemical data that need somewhere to be stored.
QSAR predicts a property of a molecule, such as its biological activity or solubility, from descriptors of its chemical structure. Precursor work dates back to 1863, when Cros observed a relationship between the toxicity of primary aliphatic alcohols and their water solubility; later studies eventually led to the formal development of QSAR in the 1960s. In modern practice, QSAR models are built with machine learning.
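The core idea of QSAR, relating a structural descriptor to an activity, can be sketched with the simplest possible model: a straight line fitted by ordinary least squares. Everything specific here is an assumption made for illustration: the descriptor and activity values are invented, a single descriptor is used, and `fit_line` and `predict` are hypothetical helper names. Practical QSAR models use many descriptors and more capable learning algorithms.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new descriptor value."""
    slope, intercept = model
    return slope * x + intercept

# Made-up training data: one descriptor value and one activity per compound.
descriptor = [0.5, 1.0, 1.5, 2.0, 2.5]
activity = [1.1, 1.9, 3.2, 3.9, 5.1]
model = fit_line(descriptor, activity)
print("predicted activity at descriptor 3.0:", round(predict(model, 3.0), 2))
```

The fitted model is then applied to a compound that was not in the training data, which is the step that makes QSAR useful for prioritizing molecules before they are measured.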
**Computational Chemistry**
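Computational chemistry has its origins in quantum mechanics and is sometimes called quantum chemistry. Its aim is to compute the properties of molecules from their electronic structure. Atoms differ in how electron-rich or electron-poor they are, and the way atoms are connected (through single, double, or triple bonds) gives each molecule its characteristic properties, such as reactivity. Quantum mechanical calculations yield properties such as the total energy, the distribution of charge, and the energies of the highest occupied and lowest unoccupied molecular orbitals (HOMO and LUMO).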
These computed properties can serve as molecular descriptors for machine learning. They feed into cheminformatics, where they are stored in large databases and used to build predictive models and to design new molecules with specific properties.
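As a small illustration of how quantum chemical output becomes descriptors, the sketch below picks the HOMO and LUMO energies out of a list of orbital energies and derives the gap between them. It assumes a closed-shell molecule (two electrons per occupied orbital), the orbital energies are invented numbers in arbitrary units, and `frontier_orbitals` is a hypothetical helper, not part of any quantum chemistry package.

```python
def frontier_orbitals(orbital_energies, n_electrons):
    """Return (HOMO energy, LUMO energy, gap) for a closed-shell molecule.

    Assumes every occupied orbital holds two electrons, so n_electrons must be even.
    """
    if n_electrons <= 0 or n_electrons % 2 != 0:
        raise ValueError("closed-shell sketch needs a positive, even electron count")
    energies = sorted(orbital_energies)
    n_occupied = n_electrons // 2
    if n_occupied >= len(energies):
        raise ValueError("no unoccupied orbital available in the list")
    homo = energies[n_occupied - 1]  # highest occupied molecular orbital
    lumo = energies[n_occupied]      # lowest unoccupied molecular orbital
    return homo, lumo, lumo - homo

# Hypothetical orbital energies (arbitrary units) for a 4-electron system.
homo, lumo, gap = frontier_orbitals([-1.20, -0.45, 0.10, 0.60], n_electrons=4)
descriptors = {"homo": homo, "lumo": lumo, "gap": gap}
print(descriptors)
```

A dictionary like `descriptors` is one row's worth of features; collecting such rows for many compounds gives the table a machine learning model would be trained on.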
**Machine Learning in Cheminformatics**
Machine learning is a technique that enables computers to learn from data and make predictions or classifications without being explicitly programmed. In cheminformatics, machine learning algorithms are used to analyze large datasets and identify patterns and relationships between molecular properties and biological activity.
Applied to chemical data, machine learning has helped chemists identify new leads for drug discovery, predict properties such as bioactivity and solubility, and surface trends that are not apparent through traditional analysis.
**Recap**
In summary, chemometrics and cheminformatics are closely related. Chemometrics applies statistical methods to wide datasets (many descriptors, few to moderately many compounds), typically from spectroscopy or other experimental measurements, while cheminformatics deals with datasets that are both wide and long, stored in large chemical databases.
Computational chemistry, which originates from quantum mechanics, computes molecular properties from electronic structure. These properties feed into cheminformatics databases, and machine learning then uses that data to relate molecular properties to biological activity.
**Infographics**
Please see the infographics drawn today for a visual representation of these concepts.
"WEBVTTKind: captionsLanguage: enokay so we're back with another episode of the ask me anything and today i'm going to be covering also in the domain of bioinformatics and so let's have a look at the question that i will be answering today so the question is from username and so the question is what is the real difference between computational biology bioinformatics computational chemistry chemometrics or do they apply the similar statistical concept but called somehow different things okay so here's my attempt to answer this and in doing so i'm going to use my tablet in order to make some infographic drawing or just maybe some doodling all right so let's do this okay so the first area is computational biology so let me draw that computational and then the second one is bioinformatics okay so computational biology and bioinformatics are quite similar so i'm drawing it side by side computational chemistry let's do that computational chemistry and then we have chemometrics and i'm going to add one more thing to this and that is chem informatics cam informatics all right so the first two field that i will be comparing is computational biology and bioinformatics so actually i have a previous video part one and part two where i've shown how computational approaches could be used to make sense of biological data and so that was actually like a bioinformatics 101 and i'll provide you the link in the description of this video and so briefly in that video i recall that i was mentioning the differences between computational biology and bioinformatics and so both terms are quite similar and some might be using it interchangeably so based on my own understanding computational biology is where you apply computational approaches in order to solve biological problems and for bioinformatics it is more into the computational aspect it is essentially development of novel algorithms it is essentially focusing more on the computational aspect and so for someone coming from a technical 
field like computer science or software engineering i would consider these scientists to be falling into the area of bioinformatics on the other hand if someone coming from like biology and they just need some quick tool to perform analysis they might be using web servers or gui softwares then i would say that these scientists belong to computational biology and i'm sure there are several people who might be using this interchangeably to refer to the same thing and so in a nutshell that's essentially how i believe it is so computational biology is more into the utilization of computational approaches to analyze the biological data and also to make sense of that while in bioinformatics the goal is to apply more sophisticated or advanced concepts in computer science in order to develop new tools in bioinformatics in order to help wet lab or bench scientists make sense of the data so essentially you could think of a software engineer or a computer scientist who are developing solutions and so these are referred to as bioinformatics tools for which they have already developed some form of programs and the scientists who will be using that they might be also writing some simple scripts and so they will be referred to as computational biologists okay so for computational biology i'll say apply existing computational tools to solve biological data problems but for bioinformatics develop novel computational approaches or tools to tackle biological data problems so you see here that the common theme is biological data problems so the computational biology aspect is to apply an existing approach computational approach to make sense of the data so the emphasis there is to obtain biological insights whereas for in bioinformatics the goal is to develop new tools or new algorithms and so the ultimate goal is to provide new and state-of-the-art bioinformatics tools okay so this is the answer for the first part let me move the computational chemistry chem informatics to the other 
paper to page two okay so these three are quite similar let me first start with the chemometrics so chemometric was a field where computers was recently being introduced and the thing is there's a lot of data in industrial chemical and so there are data such as like spectroscopy data like coming from infrared spectroscopy or mass spec or meaning mass spectrometry so this area has a lot of variables that are computed for the chemicals that are being investigated and so in the industrial chemistry domain the emphasis is on optimizing the the industrial processing so actually i've created or written a book chapter about this let me find this one moment okay and so in the book chapter that i've written and i think it's going to be released sometime in the year 2021 and so i'm going to read it for you from the introduction part so i've written a section about brief history of chemometric and comparing that with chem informatics so here i've written that the historical developments of the field of chemometric dated back to the 1960s and the term chemometric was coined by svante wald who is a swedish researcher and the term was coined in one of his grant or research grant application and it was used for the first time in 1972 and in parallel actually the field of qsar or quantitative structure activity relationship was also started in the early 1960s as well where cohen hans and his colleague fujita developed this approach called quantitative structure activity relationship so that is essentially how in modern times we're using machine learning in order to make predictions of the solubility of molecules or to predict the bioactivity of molecule so this area is called qsar or q-star and it is the area that i'm predominantly doing in my full-time job at the university where i do research into building q-star models and if you come to think of it the field of qsar the precursor work to the development of the qsar area was actually can be dated back to 1863 where a researcher 
by the name of cross he observed a relationship between the toxicity of primary aliphatic alcohol to the water solubility and so back at the time such relationship was observed that there's a relationship between the toxicity of the alcohol with their solubility the water solubility and there are several studies that have been released afterward that eventually led to the development of qsar in 1960s and the field of chemometric is predominantly being used in order to study chemical data at the industrial level and mostly are involving the use of spectroscopy data because spectroscopy data is usually of high dimension where there are countless number of descriptors in the thousand to ten thousands scale and so that actually led to the development of the principal component analysis where high dimensional data will be scaled down to a lower dimensional form and so as you can see here chemometrics and chem informatic is quite similar but chem informatic is more into big data so chemometric was developed in a time when there was just the introduction of computers and so computer played an instrumental role in allowing chemometric calculations as for chem informatics aside from the introduction of computers the advancement in database or data storage led to the development of the field called chem informatics and at the same time the area of qsar was a major player for chem informatics because in chem informatics there's a lot of chemical data that we need to have a place for it to be stored data storage so here you can see that in chemometrics the data is quite wide meaning that there will be a handful of compounds whereas in chem informatics the data is both wide and long meaning that there will be large chemical library okay so let's continue with computational chemistry right so computational chemistry is quite similar to chem informatics but then the origin of the field of computational chemistry had its origin from quantum mechanics and so i'll say origins from 
quantum mechanics and sometimes it's called quantum chemistry aka quantum chemistry and so back at the time in quantum mechanics the goal was to study the properties of molecule and the properties of molecule pertaining to the electrons and so in chemistry we know that each atom are made up of electrons and protons as well and also neutrons and so the collection of atoms will be connected in different arrangements in order to give rise to molecules and so atoms will be connected through single bonds double bonds triple bonds and the combination of different type of atoms right it could be carbon nitrogen oxygen so the different combination and the different type of connectivity will give rise to different molecule and each atom will be different in the terms of the electrons some are electron rich some are electron poor and so such disparity will give rise to the unique properties of molecule and that is why some drugs are quite reactive some drugs are more inert and they are less reactive and so we can say that the properties of drug molecule or compounds are governed by the atom and the atoms are governed by the electrons okay and so in quantum mechanics or computational chemistry the aim is to compute the molecular property of compounds and so if we're thinking in terms of machine learning computational chemistry will allow us to compute the molecular properties in terms of the electronic configuration of the molecule and such properties such as the total energy the distribution of charge in the molecule and also the the orbital energy of the highest occupied molecular orbital and also the energy level at the lowest unoccupied molecular orbital which is having the acronym of homo and lumo and so both of these properties and other like charges or energy they are able to be used as molecular descriptors okay and so so essentially we can calculate the molecular properties so we can calculate the properties and they are able to be used for machine learning purposes 
okay so actually this is a very high level view of the three fields that we see here but for the utilization in the field of machine learning so the utilization of chemical data in the context of machine learning i think this is quite enough for us to have an understanding at the high level so we're going to be seeing here that let me do a recap so in chemometric the dimension the high dimension or the number of descriptor will be quite large so if we think of it in terms of the data frame it will have many columns or wide and the number of rows will be quite moderate to few numbers so therefore the number of compound will be medium-sized whereas in the field of chemical matter the number of descriptor will be large and the number of rows or the number of compounds will also be large as well because of the high number of compounds but then for the chemometric we're focusing more on the properties right so therefore the properties will have several columns whereby it's coming from spectroscopy or experimental measurement and so we could see here that chemometric and chemical they're quite related and if we're taking a closer look into the molecular properties then it is the field of computational chemistry which comes from the field of quantum mechanics which is essentially aka quantum chemistry and so this will allow us to compute the molecular features or molecular property in terms of the electronic configuration of the molecule and so we can see here that these properties will feed into the chemical area because it will be stored in big databases and so such big chemical data will then be used by machine learning okay and so i'll be sharing you the infographics drawn today and if you're finding value in this video please give it a thumbs up subscribe if you haven't yet done so hit on the notification bell in order to be notified of the next video and as always the best way to learn data science is to do data science and please enjoy the journey thank you for 
watching please like subscribe and share and i'll see you in the next one but in the meantime please check out these videosokay so we're back with another episode of the ask me anything and today i'm going to be covering also in the domain of bioinformatics and so let's have a look at the question that i will be answering today so the question is from username and so the question is what is the real difference between computational biology bioinformatics computational chemistry chemometrics or do they apply the similar statistical concept but called somehow different things okay so here's my attempt to answer this and in doing so i'm going to use my tablet in order to make some infographic drawing or just maybe some doodling all right so let's do this okay so the first area is computational biology so let me draw that computational and then the second one is bioinformatics okay so computational biology and bioinformatics are quite similar so i'm drawing it side by side computational chemistry let's do that computational chemistry and then we have chemometrics and i'm going to add one more thing to this and that is chem informatics cam informatics all right so the first two field that i will be comparing is computational biology and bioinformatics so actually i have a previous video part one and part two where i've shown how computational approaches could be used to make sense of biological data and so that was actually like a bioinformatics 101 and i'll provide you the link in the description of this video and so briefly in that video i recall that i was mentioning the differences between computational biology and bioinformatics and so both terms are quite similar and some might be using it interchangeably so based on my own understanding computational biology is where you apply computational approaches in order to solve biological problems and for bioinformatics it is more into the computational aspect it is essentially development of novel algorithms it is 
essentially focusing more on the computational aspect and so for someone coming from a technical field like computer science or software engineering i would consider these scientists to be falling into the area of bioinformatics on the other hand if someone coming from like biology and they just need some quick tool to perform analysis they might be using web servers or gui softwares then i would say that these scientists belong to computational biology and i'm sure there are several people who might be using this interchangeably to refer to the same thing and so in a nutshell that's essentially how i believe it is so computational biology is more into the utilization of computational approaches to analyze the biological data and also to make sense of that while in bioinformatics the goal is to apply more sophisticated or advanced concepts in computer science in order to develop new tools in bioinformatics in order to help wet lab or bench scientists make sense of the data so essentially you could think of a software engineer or a computer scientist who are developing solutions and so these are referred to as bioinformatics tools for which they have already developed some form of programs and the scientists who will be using that they might be also writing some simple scripts and so they will be referred to as computational biologists okay so for computational biology i'll say apply existing computational tools to solve biological data problems but for bioinformatics develop novel computational approaches or tools to tackle biological data problems so you see here that the common theme is biological data problems so the computational biology aspect is to apply an existing approach computational approach to make sense of the data so the emphasis there is to obtain biological insights whereas for in bioinformatics the goal is to develop new tools or new algorithms and so the ultimate goal is to provide new and state-of-the-art bioinformatics tools okay so this is the 
answer for the first part let me move the computational chemistry chem informatics to the other paper to page two okay so these three are quite similar let me first start with the chemometrics so chemometric was a field where computers was recently being introduced and the thing is there's a lot of data in industrial chemical and so there are data such as like spectroscopy data like coming from infrared spectroscopy or mass spec or meaning mass spectrometry so this area has a lot of variables that are computed for the chemicals that are being investigated and so in the industrial chemistry domain the emphasis is on optimizing the the industrial processing so actually i've created or written a book chapter about this let me find this one moment okay and so in the book chapter that i've written and i think it's going to be released sometime in the year 2021 and so i'm going to read it for you from the introduction part so i've written a section about brief history of chemometric and comparing that with chem informatics so here i've written that the historical developments of the field of chemometric dated back to the 1960s and the term chemometric was coined by svante wald who is a swedish researcher and the term was coined in one of his grant or research grant application and it was used for the first time in 1972 and in parallel actually the field of qsar or quantitative structure activity relationship was also started in the early 1960s as well where cohen hans and his colleague fujita developed this approach called quantitative structure activity relationship so that is essentially how in modern times we're using machine learning in order to make predictions of the solubility of molecules or to predict the bioactivity of molecule so this area is called qsar or q-star and it is the area that i'm predominantly doing in my full-time job at the university where i do research into building q-star models and if you come to think of it the field of qsar the precursor 
work to the development of the qsar area was actually can be dated back to 1863 where a researcher by the name of cross he observed a relationship between the toxicity of primary aliphatic alcohol to the water solubility and so back at the time such relationship was observed that there's a relationship between the toxicity of the alcohol with their solubility the water solubility and there are several studies that have been released afterward that eventually led to the development of qsar in 1960s and the field of chemometric is predominantly being used in order to study chemical data at the industrial level and mostly are involving the use of spectroscopy data because spectroscopy data is usually of high dimension where there are countless number of descriptors in the thousand to ten thousands scale and so that actually led to the development of the principal component analysis where high dimensional data will be scaled down to a lower dimensional form and so as you can see here chemometrics and chem informatic is quite similar but chem informatic is more into big data so chemometric was developed in a time when there was just the introduction of computers and so computer played an instrumental role in allowing chemometric calculations as for chem informatics aside from the introduction of computers the advancement in database or data storage led to the development of the field called chem informatics and at the same time the area of qsar was a major player for chem informatics because in chem informatics there's a lot of chemical data that we need to have a place for it to be stored data storage so here you can see that in chemometrics the data is quite wide meaning that there will be a handful of compounds whereas in chem informatics the data is both wide and long meaning that there will be large chemical library okay so let's continue with computational chemistry right so computational chemistry is quite similar to chem informatics but then the origin of the 
field of computational chemistry had its origin from quantum mechanics and so i'll say origins from quantum mechanics and sometimes it's called quantum chemistry aka quantum chemistry and so back at the time in quantum mechanics the goal was to study the properties of molecule and the properties of molecule pertaining to the electrons and so in chemistry we know that each atom are made up of electrons and protons as well and also neutrons and so the collection of atoms will be connected in different arrangements in order to give rise to molecules and so atoms will be connected through single bonds double bonds triple bonds and the combination of different type of atoms right it could be carbon nitrogen oxygen so the different combination and the different type of connectivity will give rise to different molecule and each atom will be different in the terms of the electrons some are electron rich some are electron poor and so such disparity will give rise to the unique properties of molecule and that is why some drugs are quite reactive some drugs are more inert and they are less reactive and so we can say that the properties of drug molecule or compounds are governed by the atom and the atoms are governed by the electrons okay and so in quantum mechanics or computational chemistry the aim is to compute the molecular property of compounds and so if we're thinking in terms of machine learning computational chemistry will allow us to compute the molecular properties in terms of the electronic configuration of the molecule and such properties such as the total energy the distribution of charge in the molecule and also the the orbital energy of the highest occupied molecular orbital and also the energy level at the lowest unoccupied molecular orbital which is having the acronym of homo and lumo and so both of these properties and other like charges or energy they are able to be used as molecular descriptors okay and so so essentially we can calculate the molecular 
properties so we can calculate the properties and they are able to be used for machine learning purposes okay so actually this is a very high level view of the three fields that we see here but for the utilization in the field of machine learning so the utilization of chemical data in the context of machine learning i think this is quite enough for us to have an understanding at the high level so we're going to be seeing here that let me do a recap so in chemometric the dimension the high dimension or the number of descriptor will be quite large so if we think of it in terms of the data frame it will have many columns or wide and the number of rows will be quite moderate to few numbers so therefore the number of compound will be medium-sized whereas in the field of chemical matter the number of descriptor will be large and the number of rows or the number of compounds will also be large as well because of the high number of compounds but then for the chemometric we're focusing more on the properties right so therefore the properties will have several columns whereby it's coming from spectroscopy or experimental measurement and so we could see here that chemometric and chemical they're quite related and if we're taking a closer look into the molecular properties then it is the field of computational chemistry which comes from the field of quantum mechanics which is essentially aka quantum chemistry and so this will allow us to compute the molecular features or molecular property in terms of the electronic configuration of the molecule and so we can see here that these properties will feed into the chemical area because it will be stored in big databases and so such big chemical data will then be used by machine learning okay and so i'll be sharing you the infographics drawn today and if you're finding value in this video please give it a thumbs up subscribe if you haven't yet done so hit on the notification bell in order to be notified of the next video and as 
always the best way to learn data science is to do data science and please enjoy the journey thank you for watching please like subscribe and share and i'll see you in the next one but in the meantime please check out these videos\n"