WEKA Tutorial #1.1 - How to Build a Data Mining Model from Scratch

Welcome to the Data Professor: An Introduction to Data Science and Building Your First Prediction Model

Hello and welcome back to my channel, the Data Professor! I'm Chanin Nantasenamat, and today we're going to explore one of the most exciting fields out there - data science. As we all know, data is ubiquitous in our daily lives, and with the ever-increasing amount of big data available, it's essential to learn how to analyze, gain insights from, and make informed decisions using this information.

So, what exactly is data? Data pertains to information about entities of interest. Examples include the health parameters of a human being (such as red and white blood cell counts and the lipid profile), the characteristics of cars (such as top speed and fuel consumption rate), and the properties of drugs (such as molecular size and solubility). Data science, in turn, is a broad field that encompasses several smaller disciplines: statistics, mathematics, data visualization, programming, data mining, and machine learning.

Data mining is a subset of data science that refers to the specific process of making use of data to build prediction models and extract knowledge from the data. On the other hand, machine learning refers to the learning algorithms used to create these prediction models within the data mining process. So, as you can see, data science is a multidisciplinary field that offers a wealth of opportunities for individuals looking to explore and apply their analytical skills.
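To make the distinction concrete, here is a minimal sketch in plain Python (not WEKA). It assumes a toy dataset of cars described by top speed and fuel consumption; the numbers, the labels, and the function name `learn_nearest_neighbour` are all made up for illustration. The learning algorithm is the function that takes labelled data, and the prediction model is what it returns.

```python
from math import dist

# Toy, made-up data: each car is (top speed in km/h, fuel use in L/100 km).
# The values and labels are hypothetical and only illustrate the idea.
training_data = [
    ((250.0, 12.0), "sports"),
    ((240.0, 11.0), "sports"),
    ((160.0, 5.0), "economy"),
    ((150.0, 4.5), "economy"),
]


def learn_nearest_neighbour(examples):
    """Learning algorithm: takes labelled examples, returns a prediction model."""

    def model(features):
        # Predict the label of the closest training example (Euclidean distance).
        _, label = min(examples, key=lambda example: dist(example[0], features))
        return label

    return model


# Data mining step: use the data to build the prediction model.
model = learn_nearest_neighbour(training_data)

# Use the model on cars it has not seen before.
print(model((245.0, 11.5)))
print(model((155.0, 5.2)))
```

Nearest neighbour is used here only because it fits in a few lines. WEKA offers many other learning algorithms through its graphical interface, so no code is needed to follow the rest of this tutorial.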

Now that we've had a brief introduction to data science, let's get started with building our very first prediction model! To achieve this, we'll be using the WEKA program, which is an excellent tool for performing data mining. WEKA has an intuitive graphical user interface that allows us to pre-process, transform the data, and construct the prediction model using various machine learning algorithms.

WEKA was created by two developers, Ian Witten and Eibe Frank, from the University of Waikato. It runs on Windows, Mac, and Linux. To get it, search for WEKA on Google and click the first link, checking that the URL comes from the University of Waikato, then click the download button and scroll down. The download page offers several versions: a snapshot, which is like a beta and not yet stable; the stable version; and a developer version, which includes new features that are not yet stable.

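For Windows, the stable version (3.8.3 in this video) comes as four files. The first is the 64-bit WEKA program bundled with Java, which you can tell from the JRE in the file name. The second is the 64-bit WEKA program alone (x64 in the file name, but no JRE). The third is the 32-bit program bundled with Java, and the fourth is the 32-bit program alone. JRE stands for Java Runtime Environment, which includes the Java Virtual Machine that WEKA needs in order to run. If you're starting out with data science, I recommend the stable version of WEKA.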

To pick the right file, we first check whether our computer runs a 64-bit or 32-bit version of Windows; the Properties window shows this. Next, we check whether Java is already installed: click the search icon, type CMD, open the Command Prompt, and type java. If the response says that java is not recognized, then Java is not installed and we should pick a file that comes with Java prepackaged. In the video, the computer is 64-bit and has no Java, so we go with the first file.
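If you have Python on your machine, the same two checks can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `is_64_bit` and `choose_installer` are made up, the set of 64-bit architecture strings is a best guess rather than a complete list, and looking for `java` on the PATH mirrors typing java in the Command Prompt.

```python
import platform
import shutil


def is_64_bit(machine: str) -> bool:
    """Guess whether an architecture string (e.g. from platform.machine()) is 64-bit."""
    # Assumption: these are the common 64-bit names; the list is not exhaustive.
    return machine.lower() in {"amd64", "x86_64", "arm64", "aarch64"}


def choose_installer(os_is_64_bit: bool, java_installed: bool) -> str:
    """Describe which of the four Windows downloads matches this machine."""
    bits = "64-bit" if os_is_64_bit else "32-bit"
    java = "without bundled Java" if java_installed else "with bundled Java"
    return f"{bits} installer {java}"


if __name__ == "__main__":
    os_is_64_bit = is_64_bit(platform.machine())
    # shutil.which returns None when no `java` executable is found on the PATH.
    java_installed = shutil.which("java") is not None
    print(choose_installer(os_is_64_bit, java_installed))
```

On a 64-bit computer without Java, like the one in the video, this suggests the 64-bit installer with bundled Java, which corresponds to the first of the four files.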

After selecting our file, we click its link to start the download. The file is about 115 megabytes, so this may take a little while, depending on your internet speed. Once the download is complete, we click on the installation file. Windows will ask whether we want to allow the program to make changes to our device; click Yes.

To continue with the installation, we simply follow the prompts and click the Next button. The installer then installs the bundled Java: click Install, then OK, and click Close once Java has been installed successfully. When WEKA reports that its installation is complete, click Next, leave the box to start WEKA ticked, and click Finish.

And that's it for today's tutorial! I hope you enjoyed this introduction to data science and to installing WEKA, the program we'll use to build your very first prediction model. If you haven't subscribed to my channel yet, please consider doing so, as well as clicking on the notification bell to stay informed about upcoming videos. Until next time, I'll see you in the next video!

Welcome back to the Data Professor, I'm Chanin Nantasenamat and in this episode I'm going to give you a quick introduction about what is data science and how you can go about building your very first prediction model, so without further ado, let's get started!

Data is ubiquitous, and in this day and age, we have an ever-increasing amount of data, infamously known as big data, which we can use to analyze, to gain insights and to drive the decision-making process. So, what exactly is data? Data pertains to information about entities of interest. For example, (1) health parameters of a human being such as the red and white blood cell count, the blood profile, lipid profile and other parameters that describes the health status of an individual, (2) characteristics of cars such as the top speed that it can go and fuel consumption rates, (3) properties of drugs such as the molecular size, solubility, electronic and hydrophobic properties of the drug.

Simply put, data science is a very big field that encompasses several smaller disciplines such as statistics, mathematics, data visualization, programming, data mining and machine learning. So as you can see data mining is a subset of data science and it refers to the specific process of making use of the data in order to build a prediction model and extracting knowledge from the data, while machine learning refers to the learning algorithms that are used to create the prediction models inside the data mining process. So there you have it, a very brief introduction to data science.

Now comes the fun part, let's get started in building our very first prediction model! WEKA is a program for performing data mining. It has an intuitive graphical user interface that allows you to pre-process, transform the data as well as construct the prediction model using a variety of machine learning algorithms and it was created by two developers Ian Witten and Eibe Frank from the University of Waikato. So let's begin by first installing WEKA onto your computer.

So what you need to do is go to Google and then search for WEKA and then click on the first link. So notice that the URL is coming from the University of Waikato. So click on the link. So it's the page that was open a couple of seconds ago. So let's get started by downloading the program. So click on the download button and then scroll down, you'll notice that they're going to have several versions here. Snapshot is when they have a, like a beta version, which is not stable yet, but what you want is the stable version right here or they also have the developer version where they also provide new features, which are not yet stable but are included for your usage here. If you're into the latest feature you might want to try this one. But if you're starting out, I would recommend using the stable version. So it has (support) for many platform: on the Windows platform, for the Mac platform and also for Linux platform as well.

So before you begin you will have to select one of the four links right here. So what are they? Well, the first link is the WEKA program, right here, version 3.8.3 and it also comes with a Java Virtual Environment as you can see from the final name, for the 64-bit version. However, the second file is the WEKA program alone as you can see here by the name of WEKA and the version number 3.8.3 and then x64 would mean it is built for the 64-bit version of Windows but it does not come with the Java Virtual Machine so therefore you don't see the JRE in the file name. And the third file is similar to the first file in which it has the WEKA program along with the Java Virtual Machine but it is built for the 32-bit version of your Windows. And the fourth file is the WEKA program built for the 32-bit version.

So if you are wondering which version should you go with? Well, let's check out what is the version of your computer's (Java)? whether it is 64 or 32 bit. Oh, it's right here, Properties and then notice the 64-bit version right here. So this computer has 64-bit, so I'm going to go for the 64-bit version, however I will have to identify whether I want to have Java or without Java. So in order to do that let's check whether my computer has Java or not and you can do the same by going to the search icon, type in CMD and click on the command prompt and then you will see this command prompt window coming up, type in Java and if it says that Java is not recognized, then it means that your computer does not have Java installed. So let's go with the first file which has Java prepackaged along with the WEKA software.

So let's click on here and that will take you to the download link. Wait a bit, okay and then your download have started so it's a 115 megabytes so that should take you a little while, okay so the internet speed is going up and we are a couple of seconds away from downloading the program. Okay so it's finished and let's install. So click on the installation file and it will ask whether you want to allow this program to make changes to your device. So I'll click on Yes and then the next step is pretty easy and straightforward. So click on the Next button and we are close to completion and now it's going to install the Java Virtual Machine. So click on the install button, click on okay, wait some more. Okay so we're almost there, okay so Java has successfully been installed and I will click on the Close button and then WEKA say it is completed, so once it's completed, we'll click on the Next button and then it has to tick for us to start WEKA and click on Finished.

So, until next time, I'm Chanin Nantasenamat on the Data Professor channel and if you haven't subscribed yet, please consider subscribing and clicking on the notification bell so that you will be notified on the next video. So, I'll see you in the next one!