**Welcome to Sirajology: Training a Neural Network to Compose Music**
In this episode, we're going to train a neural network to compose music all by itself. Machine-generated music! The technical term for this is 'music language modeling', and it has a long history of research behind it; Markov models and Restricted Boltzmann Machines are some of the techniques used in this field. As Siraj said, "Music is how we communicate our emotions and passions, and it's completely based on mathematical relationships."
At the lowest level, music is a series of sound waves that create pockets of air pressure, and the pitch we hear depends on the frequency of changes in this air pressure. We've created notation to help us map these sounds into an instruction set. So, if machine learning is all about feeding data into models to find patterns and make predictions, could we use it to generate music all by itself? Absolutely!
We're going to build an app that learns how to compose British folk music by training on a dataset of British folk music. We'll be using TensorFlow, the sickest machine learning library ever, to do this in just 10 lines of Python, following the tried-and-true 4-step machine learning methodology: collect a dataset, build the model, train the model, and test the model.
**Collecting Our Dataset**
To start off, we want to collect our dataset. So let's import the urllib module, which will let us download a file from the web. Once we import it, we can call the urlretrieve method to do just that, setting the parameters to the link to the dataset and the name we'll give the downloaded file. We're using the Nottingham dataset for this demo, which is a collection of 1000 British folk songs in MIDI format. MIDI is perfect for us, since it encodes all the note and time information exactly how it would be written in music notation.
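In Python 3 that looks roughly like this (the video's code was written for Python 2, where the function lives directly on urllib; the URL below is a placeholder, not the actual dataset link):

```python
import urllib.request

# Placeholder URL; the actual dataset link isn't reproduced here.
DATASET_URL = "http://example.com/nottingham_dataset.zip"

# Download the archive and save it locally as "dataset.zip".
urllib.request.urlretrieve(DATASET_URL, "dataset.zip")
```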
It comes in a zip file, so we'll want to unzip it as well. We can do this programmatically using the zipfile module: we'll extract the data from the zip and place it in the data directory.
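Something like this, assuming the file names from the download step:

```python
import zipfile

# Extract the downloaded archive into the data directory.
with zipfile.ZipFile("dataset.zip") as archive:
    archive.extractall("data")
```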
We've got our data; now it's time to create the model. But before we do that, we need to think about how we want to represent our input data.
**Representing Our Input Data**
There are 88 possible pitches in a MIDI file, so we could do one vector representation per note. But let's be more specific. At each time step in the music, there are two things happening: there's the main tune, or melody, and then there are the supporting notes, or harmony. Let's represent each as a vector. And to make things easier, we'll make two assumptions. The first is that the melody is monophonic, meaning only one note is played at each time step. The second is that the harmony at each time step can be classified into a chord class. So that's two different vectors, one for melody and one for harmony, and we'll combine them into a single vector for each time step.
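Here's a minimal sketch of that encoding in Python, assuming a fixed number of chord classes (the real count would come from the chord mapping built from the dataset):

```python
import numpy as np

NUM_PITCHES = 88        # possible pitches in a MIDI file
NUM_CHORD_CLASSES = 24  # assumed count; the real number comes from the
                        # chord mapping built from the dataset

def encode_time_step(melody_pitch, chord_class):
    """One-hot encode the melody note and the harmony chord class,
    then concatenate them into a single vector for this time step."""
    melody = np.zeros(NUM_PITCHES)
    melody[melody_pitch] = 1.0        # monophonic: exactly one note on
    harmony = np.zeros(NUM_CHORD_CLASSES)
    harmony[chord_class] = 1.0        # one chord class per time step
    return np.concatenate([melody, harmony])

# e.g. middle C (key index 39 of the 88 piano keys) over chord class 5
x = encode_time_step(39, 5)  # x has shape (88 + 24,) = (112,)
```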
Music plays out over time; it's a sequence of notes. So we need a sequence learning model: one that accepts a sequence of notes as input and outputs a new sequence of notes. Plain old neural nets can't do this; they accept fixed-size inputs, like an image or a number. We'll need a special kind of neural network: a recurrent neural network.
**Recurrent Neural Networks**
Yeah! Those are the ones we're going to use. A recurrent neural network is designed to handle sequential data and keep track of context over time: data doesn't just flow one way through it, it loops, and that feedback loop gives the network a kind of short-term memory. But wait. We want our network to remember not just the most recent music it's heard, but all the music it's heard. A piece of music can have multiple themes in different parts (hopeful, melancholic, angry), and if the network only remembers the most recent part, which was cheery, it's just going to compose cheery stuff. We need a special type of recurrent neural network called a Long Short-Term Memory network. Super specific, I know. This type of network has a short-term memory that is LONG: it can remember things from way back in the sequence of data, and it uses everything it remembers to generate new sequences. We can add this model to our code with just one line using our helper class. It'll generate the note sequences and chord mapping and write them to a file in the data directory, a serialized byte-stream representation of our music that we'll train our model with.
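The video's model comes from its helper class, built on an older TensorFlow 1.x RNN API, so this is only a rough sketch of the same idea in modern tf.keras; the layer size and the chord-class count are assumptions:

```python
import tensorflow as tf

NUM_PITCHES, NUM_CHORD_CLASSES = 88, 24   # chord-class count is an assumption
INPUT_DIM = NUM_PITCHES + NUM_CHORD_CLASSES

# Variable-length sequences of combined melody+harmony vectors go in...
inputs = tf.keras.Input(shape=(None, INPUT_DIM))
# ...through an LSTM layer whose loop carries state from step to step...
hidden = tf.keras.layers.LSTM(128, return_sequences=True)(inputs)
# ...and out come predicted distributions over the next melody note and chord.
melody_out = tf.keras.layers.Dense(NUM_PITCHES, activation="softmax",
                                   name="melody")(hidden)
harmony_out = tf.keras.layers.Dense(NUM_CHORD_CLASSES, activation="softmax",
                                    name="harmony")(hidden)
model = tf.keras.Model(inputs, [melody_out, harmony_out])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```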
We'll also need to set some hyperparameters to train our model.
**Hyperparameters**
These are the parameters that we humans set for how our model operates, like knobs on a control panel. How many layers do we want? How many iterations for training? How many neurons? You could play around with these, turning all the knobs in different ways to perfect your end result, but chances are someone somewhere has already solved the problem you're working on, and you can just use an existing model with pre-tuned hyperparameters to build something awesome.
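For illustration, a hyperparameter set for a model like ours might look like the following; these particular values are assumptions, not the pre-tuned ones:

```python
# Illustrative values only; these particular numbers are assumptions,
# not the pre-tuned hyperparameters that ship with the model's config file.
hyperparameters = {
    "num_layers": 2,         # how many stacked LSTM layers
    "hidden_size": 128,      # how many neurons per layer
    "num_epochs": 50,        # how many iterations over the training data
    "batch_size": 64,        # how many sequences per training step
    "learning_rate": 0.001,  # how big a step the optimizer takes
}
```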
**Training Our Model**
Now that we have our model, we can go ahead and train it. We can just call the train_model method of our recurrent neural net class to do this. This'll get the network to start working through the input data piece by piece. It took me about 2 hours to train it on my 2013 MacBook Pro. But you don't have to wait until it's completely done training to test it out; just wait until you see the "Best loss so far encountered, saving model." message.
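The train_model method belongs to the video's helper class, so as a stand-in, here's how that "save on best loss" behaviour might look with the tf.keras sketch from earlier (X, y_melody, and y_harmony are assumed to be the encoded training sequences, with targets shifted one time step ahead):

```python
# Save a checkpoint only when the training loss improves, mirroring the
# "Best loss so far encountered, saving model." behaviour.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "models/best_model.keras",
    monitor="loss",
    save_best_only=True,
    verbose=1,
)
model.fit(X, [y_melody, y_harmony], epochs=50, callbacks=[checkpoint])
```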
"WEBVTTKind: captionsLanguage: enI actually didn’t play anything. You justheard AI generated music.Hello World, welcome to Sirajology! In thisepisode, we’re going to train a neural networkto compose music all by itself. Machine generatedmusic! the technical term for this is ‘musiclanguage modeling’ and it has a long historyof research behind it. Markov Models and restrictedbolztman machines. Which kind of sounds likesomething out of half life or bioshock.Hold on babe. I’ve got to go save the worldusing my restricted boltzman machine.Music is how we communicate our emotions andpassions and its completely based on mathematicalrelationships. Octaves, Chords, Scales, keys,all of it is math. At the lowest level, musicis a series of sound waves that create pocketsof air pressure, and the pitch we hear dependson frequency of changes in this air pressure.We’vecreated annotation to help us map these soundsinto an instruction set. So if machine learningis all about feeding data into models to findpatterns and make predictions, could we useit to generate music all by itself? absofruitly.We’re going to be build an app that learnshow to compose british folk music by trainingon a dataset of british folk music. We’llbe using Tensorflow, the sickest machine learninglibrary ever, to do this in just 10 linesof Python. We’ll be following the triedand true 4 step machine learning methodologyto do this. Collect a dataset, build the model,train the model, and test the model. To startoff, we’ll want to collect our dataset.So let’s import the urllib module, whichwill let us download a file from the web.Once we import it we can call the URLretrievemethod to do just that. We’ll set the parametersto the link to the dataset and the name we’llcall the downloaded file. We’re using thenottingham dataset for this demo, which isa collection of 1000 british folk songs inMIDI format. MIDI format is perfect for ussince it encodes all the note and time informationexactly how it would be written in music annotation.It comes in a zip file, so we’ll want tounzip it as well. We can do this programmaticallyusing the zipfile module. We’ll extractthe data from the zip and place it in thedata directory.We’ve got our data, it’s time to createthe model. But before we do that, we needto think about how we want to represent ourinput data.There are 88 possible pitches ina MIDI file so we could do one vector representationper note. But lets be more specific. At eachtime step in the music, there are two thingshapppening. Theres the main tune or melodyand then there are the supporting notes orharmony. Let’s represent each as a vector.And to make things easier we’ll make twoassumptions. The first is that the melodyis monophonic. That means only one note isplayed at each time stamp. The second is thatthe harmony at each stamp can be classifiedinto a chord class. So thats two differentvectors one for melody and one for harmony.We’ll then combine them into one vectorfor each stamp.We can just import our ML helper class andthen call the create model method to do this.Music plays out over a period of time, itsa sequence of notes. So we need to use a sequencelearning model - it has to accept a sequenceof notes as an input and output a new sequenceof notes. Plain old neural nets can’t dothis. They accept fixed sized inputs likean image or a number. We’ll need a specialkind of neural network, a recurrent neuralnetwork. Yeah! Those can deal with sequencessince data doesn’t just flow one way, itloops. 
**Results**
Let's listen to what I've generated. It sounds nice. It could be better, but it gives off that British folk vibe. There are definitely some improvements that could be made: the time signature is kind of sporadic, and in terms of long-term structure there seems to be a lack of repeated themes and phrases. The solution may well be more data and more computing power; it usually is when it comes to machine learning with deep neural nets. Machine learning can help us learn the fundamental nature of how music works in ways that we haven't even thought about. I've got links below, check 'em out. And I've gotta go fix a runtime error, so thanks for watching!