Natural Language Processing with TensorFlow 2 - Beginner's Course

The Prediction Problem and Generating Text: A Deep Dive into AI Model Development

To tackle the prediction problem and generate text, we define a text-generation routine that begins with a starting string. It takes the trained model we want to use for generation along with a prompt for the AI, commonly referred to as the input. The number of characters to generate is controlled by a `num_generations` parameter, set to 1000 in this case. We also convert the starting string into its integer representation and expand it along the batch dimension.
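As a minimal sketch of that setup step, assuming `char2idx` is the character-to-integer lookup table built during preprocessing (it is not defined in this article):

```python
import tensorflow as tf

input_string = "ROMEO: "

# Map each character of the prompt to its integer ID.
input_eval = [char2idx[ch] for ch in input_string]

# Add a batch dimension: shape (prompt_length,) -> (1, prompt_length).
input_eval = tf.expand_dims(input_eval, 0)
```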

We also define an empty list to collect the generated text and a temperature value. The temperature controls how surprising the output is: lower values produce more predictable, conservative text, while higher values yield more unpredictable and occasionally "crazy" results. We then reset the model's state and loop with `for i in range(num_generations)`. Resetting the state matters because it keeps predictions for the new prompt from being influenced by whatever the model processed last.
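To make the effect of temperature concrete, here is a small, self-contained illustration with made-up logit values: the logits are divided by the temperature before the softmax, so low temperatures concentrate probability on the most likely character while high temperatures flatten the distribution.

```python
import tensorflow as tf

# Hypothetical logits for a three-character vocabulary.
logits = tf.constant([[2.0, 1.0, 0.1]])

for temperature in (0.5, 1.0, 2.0):
    probs = tf.nn.softmax(logits / temperature)
    print(temperature, probs.numpy().round(3))
# Low temperature  -> probability mass piles onto the most likely character.
# High temperature -> the distribution flattens, so rarer characters appear more often.
```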

Inside the loop, the evaluation step is simply `predictions = model(input_eval)`. We then remove the batch dimension with `tf.squeeze`, divide by the temperature, and draw the predicted character ID with `tf.random.categorical(predictions, num_samples=1)`, keeping the sample for the last timestep. Sampling rather than taking the single most likely character keeps the generated text distributed according to the model's predictions.
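A design note on that sampling step: taking `tf.argmax` of the final timestep would always pick the single most likely character, which tends to trap the model in repetitive loops. The sketch below uses stand-in prediction values to contrast greedy decoding with categorical sampling; the tensor shapes match the character-level setup described above.

```python
import tensorflow as tf

# Stand-in logits after tf.squeeze has removed the batch dimension:
# shape (seq_len, vocab_size).
predictions = tf.random.normal((100, 65))

# Greedy decoding: always take the most likely next character.
greedy_id = int(tf.argmax(predictions[-1]))

# Categorical sampling: draw from the predicted distribution, so repeated runs
# differ and the generated text stays varied.
sampled_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
```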

The other key step is feeding the predicted ID back in as the next input and appending the corresponding character to our list of generated text. This repeats until we have produced the desired number of characters, at which point we return the starting string joined with the generated text.

To demonstrate the model, let's seed it with the prompt "ROMEO: ", as though cueing a line from Romeo and Juliet.

The Moment of Truth: Putting the Model to the Test

We load the trained model into our Python environment, set `num_generations` to 1000, and define an input string that serves as the prompt for the model.

```python
import tensorflow as tf
from tensorflow import keras

model = keras.models.load_model('path_to_your_model')
```

Next, we set the number of characters to generate via `num_generations` and define the input string. We also set up an empty list to store our generated text and a temperature value.

```python
num_generations = 1000
input_string = "ROMEO: "
temperature = 1.0
generated_text = []
```

We then convert the prompt to integers, reset the model's state, and enter the loop, repeatedly feeding the model's prediction back in as the next input while appending the corresponding character to the generated text. This continues until we have generated the desired number of characters.

```python
# char2idx / idx2char are the character <-> integer lookup tables built during
# preprocessing (names follow the TensorFlow text-generation tutorial this is based on).
input_eval = tf.expand_dims([char2idx[ch] for ch in input_string], 0)

model.reset_states()

for i in range(num_generations):
    predictions = model(input_eval)
    # Drop the batch dimension: (1, seq_len, vocab_size) -> (seq_len, vocab_size).
    predictions = tf.squeeze(predictions, 0)
    # Scale by the temperature and sample the next character ID.
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
    # Feed the prediction back in as the next input and record the character.
    input_eval = tf.expand_dims([predicted_id], 0)
    generated_text.append(idx2char[predicted_id])
```

After the loop completes, we join the starting string with the generated characters and print the result.

```python
print("Generated text:")
print(input_string + ''.join(generated_text))
```

The output of this code is surprising. It generates a coherent piece of Shakespearean-style text that resembles a real quote from Romeo and Juliet. This demonstrates the power of deep learning models in generating human-like text.

Conclusion

In conclusion, our experiment shows that it's possible to develop an AI model that can generate coherent and even impressive pieces of text within a short period of time. While this doesn't surpass what humans can do, it does demonstrate the capabilities of machine learning algorithms in language generation.

Furthermore, we've seen how our model starts with no information about the English language whatsoever and gradually generates something resembling real text within minutes. This highlights the potential applications of these models in various fields such as writing assistance, translation, or even content creation.

As for future directions, exploring more advanced architectures like transformer networks could lead to even more sophisticated results. Moreover, training these models on larger datasets could result in significant improvements over current performance.

In summary, our experiment showcases the capabilities of AI models in generating human-like text and highlights the importance of ongoing research into deep learning algorithms for natural language processing applications.

"WEBVTTKind: captionsLanguage: enwelcome free code campers to a practical introduction to natural language processing with tensorflow 2. i am your host dr phil tabor in 2012 i got my phd in experimental condensed matter physics and went to work for intel corporation as a back-end dry edge process engineer i left there in 2015 to pursue my own interests and have been studying artificial intelligence and deep learning ever since if you're unfamiliar with natural language processing it is the application of deep neural networks to text processing it allows us to do things such as text generation you may have heard the hubbub in recent months over the open ai gpt 2 algorithm that allowed them to produce fake news it also allows us to do things like sentiment classification as well as something more mathematical which is representing strings of characters words as mathematical constructs that allow us to determine relationships between those words but more on that in the videos it would be most helpful if you have some background in deep learning if you know something about deep neural networks but it's not really required we're going to walk through everything in the tutorial so you'll be able to go from start to finish without any prior knowledge although of course it would be helpful if you'd like to see more deep learning reinforcement learning and natural language processing content check me out here on youtube at machine learning with phil i hope to see you there and i really hope you enjoy the video let's get to it in this tutorial you are gonna learn how to do word embeddings with tensorflow 2.0 if you don't know what that means don't worry i'm gonna explain what it is and why it's important as we go along let's get started before we begin with our imports a couple of housekeeping items first of all i am basically working through the tensorflow tutorial from their website so i'm going to link that in the description so i'm not claiming this code is my own although i do some cleaning up at the end to kind of make it my own but in general it's not really my code so we start with our imports as usual we need i o to handle dumping the word embeddings to a file so that we can visualize later we'll need matplotlibe to handle plotting we will need tensorflow as tf and just a word so this is tensorflow 2.1.0 rsc1 release candidate 1. 
so this is as far as i'm aware of the latest build so tensorflow 2.0 throws some really weird warnings and 2.1 seems to deal with that so i've upgraded so if you're running tensorflow 2.0 and you get funny errors uh sorry funny warnings but you still get functional code and learning that is why you want to update to the newest version of tensorflow of course we need kiros to handle pretty much everything we also need the layers for our embedding and dense layers and we're also going to use the tensorflow data sets so i'm not going to have you download your own data set we're going to use the imdb movie data set for this particular tutorial so of course that is an additional dependency for this tutorial so now that we've handled our imports let's talk a little bit about what word embeddings are so how can you represent a word for a machine and more importantly instead of a string of characters how can you represent a collection of words a bag of words if you will so you have a number of options one way is to take the entire set of all the words that you have in your say movie reviews you know you just take all the words and find all the unique words and that becomes your dictionary and you can represent that as a one-hot encoding so if you have let's say ten thousand words then you would have a vector for each word with ten thousand elements which are predominantly zeros except for the one corresponding to whichever word it is the problem with this encoding is that while it does work it is incredibly inefficient and it because it is sparse you know the majority of the data is zero and the only one important bit in the whole thing so not very efficient and another option is to do integer encoding so you can just rank order the numbers uh sorry the words you could do it in alphabetical order the order doesn't really matter you can just assign a number to each unique word and then every time that word appears in a review you would have that integer in an array so you end up with a set of variable length arrays where the length of the array corresponds to the number of words in the review and the members of the array correspond to the words that appear within that review now this works this is far more efficient but it's still not quite ideal right so it doesn't tell you anything about the relationships between the words so if you think of the word let's say king it has a number of connotations right a king is a man for one so there is some relationship between a king and a man a king has power right he has control over a domain a kingdom so there is also the connotation of owning land and having control over that land uh king may also have a queen so it has some sort of relationship to a queen as well i may have a prince a princess you know all these kinds of different relationships between words that are not incorporated into the uh integer encoding of our dictionary the reason is that the integer encoding of our dictionary forms a basis in some higher dimensional space but all those vectors are orthogonal so if you take their dot product they are essentially at right angles to each other in a hybrid dimensional space and so their dot product is zero so there's no projection of one vector one word onto another there's no overlap in the meaning between the words at least in this higher dimensional space now word embeddings fix this problem by keeping the integer encoding but then doing a transformation to a totally different space so we introduce a new space of a vector of some arbitrary 
length it's a hyper parameter of your model much like the number of neurons in a dense layer as a hyper parameter of your model the length of the embedding layer is a hyperparameter and we'll just say it's eight so the word king then has eight floating point elements that describe its relationship to all the other vectors in that space and so what that allows you to do is to take dot products between two arbitrary words in your dictionary and you get non-zero components and so that what that means in practical terms is that you get a sort of semantic relationship between words that emerges as a consequence of training your model so the way it works in practice is we're going to have a whole bunch of reviews from the imdb data set and they will have some classification as a good or bad review so for instance you know uh for the star wars last jedi movie i don't think it's in the in there but you know my review would be that it was terrible awful no good totally ruined luke luke's character and so you would see and i'm not alone in that so if you uh did a huge number of reviews for the last jedi you would see a strong correlation of words such as horrible bad wooden characters mary sue things like that and so the model would then take those words run them through the embedding layer and try to come up with a prediction for whether or not that is a good or bad review and match it up to the training label and then do back propagation to vary those weights in that embedding layer so let's say eight elements and by training over the data set multiple times you can refine these weights such that you are able to predict whether or not a review is positive or negative about a particular movie but also it shows you the relationship between the words because the model learns the correlations between words within reviews that give it either a positive or negative context so that is word embeddings in a nutshell and we're going to go ahead and get started coding that so the first thing we're going to have is a an embedding layer and this is just going to be for illustration purposes and that'll be layers dot embedding and let's say there's a thousand and five elements so we'll say result equals embedding layer tf constant one two three so then let's print the result uh dot numpy okay so let's head to the terminal and execute this and see precisely what we get actually let's do this to print result dot numpy.shape i think that should work let's see what we get in the terminal and let's head to the terminal now all right let's give it a try okay so what's important here is you see that you get an array of three elements right because we did the tf constant of one two and three and you see we have five elements because we have broken the integers into some components in that five element space okay so and it has shape three by five which you would expect because you're passing on three elements and each of these three elements these three integers correspond to a word of an embedding layer of five elements okay that's relatively clear let's go back to the code editor and see what else we can build with this okay so let's go ahead and just kind of comment out all this stuff because we don't need it anymore so now let's get to the business of actually loading our data set and doing interesting things with it so we want to use the data set load function so we'll say train data test data and some info tfts.load imdb reviews express subwords 8 okay and then we will define a split and that is tfds.split.train 
tfts.split.test and we will have a couple other parameters with info equals true that incorporates information about the um about the data sets and as supervised equals true so as supervised tells the data set loader that we want to get back information in the form of data and label as a tuple so we have the labels for training of our data so now we're going to need an encoder so we'll say info.features text dot encoder and so let's just um find out what words we have in our dictionary from this we'll say print encoder sub words first 20 elements save that and head back to the terminal and print it out and see what we can see so let's run that again and you it's hard to see let me move my face over for a moment and you can see that we get a list of words the underscore so the underscore corresponds to space you get commas periods a underscore and underscore of so you have a whole bunch of words with underscores that indicate that they are spaces okay so this is kind of the makings of a dictionary so let's head back to the code editor and continue building on this so we no longer need that print statement now the next problem we have to deal with is the fact that these reviews are all different lengths right so we don't have an identical length for each of the reviews and so when we load up elements into a matrix let's say they're going to have different lengths and that is kind of problematic so the way we deal with that is by adding padding so we find the length of the longest review and then for every review that is short in that we append a bunch of zeros to the end uh in our bag of words so a list of words you know the list of integers we will append a bunch of zeros at the end so zero isn't a word uh it doesn't correspond to anything and the word start with one the rank ordinal numbers start with one and so we insert a zero because it doesn't correspond to anything it won't hurt the training of our model so we need something called padded shapes and that has this shape so batch size and an empty list an empty tuple there so now that we have our padded shapes we're ready to go ahead and get our training and test batches so let's do that and since we're a good data scientist we want to do a shuffle we're going to use a batch size of 10 and a padded shapes specified by what we just defined let's clean that up and let's copy because the train the test batches are pretty much identical except it's testdata.shuffle and it's the same size so we don't have to do any changes there scroll down so you can see okay so that gives us our data so what we need next after the data is an actual model so let's go ahead and define a model so in as is typical for keras it is a sequential model and that takes a list of layers so the first layer is an embedding layer and that takes encoder.vocab size now this is you know given to us up here by the encoder object that's given by the information from our data set and we have some vocabulary size so there's ten thousand words vocabulary size is vocab size it's just the size of our dictionary and we want to define an embedding dim so that's the number of dimensions for our embedding layer so we'll call it something like 16 to start so let's add another layer global gobble global average pooling 1d and then we'll need a finally a dense layer one output activation equals sigmoid so if this seems mysterious what this is is the probability that a mapping of sorry this layer is the probability that the review is positive so it's a sigmoid go ahead and get rid of 
that and now we want to compile our model with the atom optimizer a binary cross entropy loss with accuracy metrics not meterics metrics equals accuracy okay that's our model and that is all we need for that so now we are ready to think about training it so let's go ahead and do that next so what we want to do is train and dump the history of our training in an object called that we're going to call history model.fit we're going to pass train batches 10 epochs and we're going to need validation data and that'll be test batches and we'll use something like 20 validation steps okay so let's scroll down a little bit so you can see it first of all and then we're going to think about once it's done let's go ahead and plot it so let's may as well do that now so let's handle that so we want to convert our history to a dictionary and that's history.history and we want to get the accuracy by taking the accuracy key and we want the validation accuracy uh using correct syntax of course val accuracy for validation accuracy and the number of epochs is just range one two line of accuracy plus one so then we want to do a plot big size nice and large twelve by nine uh we want to plot the epochs versus the accuracy b0 label equals training accuracy we want to plot the validation accuracy using just a blue line not blue o's or dots blue dot sorry and label equals validation accuracy uh plot.x label epochs plot dot y label accuracy and let's go ahead and add a title while we're at it trading and validation accuracy scroll down a little bit we will include a legend having an extraordinarily difficult time typing tonight location equals lower right and a y limit of zero point five and one that should be a tuple excuse me and plot dot show all right so let's go ahead and head to the terminal and run this and see what the plot looks like and we are back let me move my ugly mug over so we can see a little bit more and let us run the software and see what we get okay so it has started training and it takes around 10 to 11 seconds per epoch so i'm just going to sit here and twiddle my thumbs for a minute and fast forward the video while we wait so of course once it finished running i realize i have a typo and that is typical so in line 46 it is p it is i spelled out plot instead of plt but that's all right let's take a look at the data we get in the terminal anyway so you can see that the validation accuracy is around 92.5 pretty good and the training accuracy is around 93.82 so a little bit of overtraining and i've run this a bunch of times and you tend to get a little bit more over training i'm kind of surprised that this final now that i'm running over youtube it is actually a little bit less overtraining uh but either way there are some evidence over training but a 90 accuracy for such a simple model isn't entirely hateful so i'm going to go ahead and head back and correct that typo and then run it again and then show you the plot so it is here in line 46 right there and just make sure that nothing else looks wonky and i believe it is all good there looking at my cheat sheet uh everything looks fine okay let's go back to the terminal and try it again all right once more all right so it has finished and you can see that this time the validation accuracy was around 89.5 percent whereas the training accuracy was 93.85 so it is a little bit over trainee in this particular run and there is significant run to run variation as you might expect so let's take a look at the plot all right so i've stuck my ugly mug right 
here in the middle so you can see that the training accuracy goes up over time as we would expect and the validation accuracy generally does that but kind of tops out about halfway through the number of epochs so this is clearly working and this is actually pretty cool with such a simple model we can get some decent uh review or sentiment as it were classification but we can do one more neat thing and that is to actually visualize the relationships between the words that are embedding learns so let's head back to the code editor and then let's write some code to tackle that task okay so before we do that you know i want to clean up the code first let's go ahead and do that so i will leave in all that commented stuff but let's define a few functions we'll need a function to get our data we'll need a function to get our model and we'll need a function to plot data and we'll need a function to get our embeddings we'll say retrieve embeddings and i'll fill in the parameters for those as we go along so let's take this stuff from our get our data cut that paste it and of course use proper indentation because python is a little bit particular about that okay make sure everything lines up nicely and then of course we have to return the stuff that we are interested in so we want to return train data test data and in fact that's not actually what we want to do i take it back let's come down here and uh we want our uh sorry we don't actually want to return our data we want to turn our batches so return train batches test batches and we'll also need our encoder for the visualizing the relationship relationships between words so let's return that now okay now uh let's handle the function for the get model next so let's come down here and grab this actually let's yeah grab all of it and come here and do that and let's make embedding dim a parameter of our model and you notice in our model we need the encoder so we also have to pass in the encoder as well as an embedding dim and then at the bottom of the function we want to return that model pretty straightforward so then let's handle the plot data next so we have all of this grab that and indent here so we're going to need a history and uh that looks like all we need because we define epochs accuracy and validation accuracy okay so it looks like all we need in the plot data function so then we have to write our retrieve embeddings function but first let's handle all the other stuff we'll say train batches test batches and encoder equals get data in fact let's rename that to get batch data to be more specific this is kind of being pedantic but you always want to be as descriptive as possible with your naming conventions so that way people can read the code and know precisely what it does without having to you know make any guesses so if i just say get data it isn't necessarily clear that i'm getting batches out of that data you know i could just be getting single instances it could return a generator it is a little bit ambiguous so changing the function name to get batch data is the appropriate thing to do so then we'll say model equals get model and we pass it the encoder and then the history will work as intended and then we call our function to plot the history and that should work as intended as well and now we are ready to tackle the retrieve embeddings function so that is relatively straightforward so what we want to do is we want to pass in the model and the encoder and we don't want to pass what we want to do is we want to the purpose of this 
function is to take our encoder and dump it to a tsv file that we can load into a visualizer in the browser to visualize the principle component analysis of our word encodings so we need files to write to and we need to enumerate over the sub words in our encoder and write the metadata as well as the vectors for our encodings so outvectors io.open vex dot tsv and in write mode and encoding of utf-8 we need out metadata and that's similar meta.tsv write encoding equals utf-8 very similar now we need to iterate over our encoder sub words and get the vectors out of that to dump to our vector file as well as the metadata weight sub num plus one and so we have the plus one here because remember that uh we start from one because zero is for our uh padding right zero doesn't correspond to a word so the words start from one and go on so we want to write the word plus a new line and for the vectors i'm going to write a tab delimited string x in vector and plus a new line character at the end and then we want to close our files okay so then we just scroll down and call our function retrieve embeddings model and encoder okay so assuming i haven't made any typos this should actually work so i'm going to go ahead and head back to the terminal and try it again all right moment of truth so it is training so i didn't make any mistakes up until that point uh one second we'll see if it actually makes it through the plot oh but really quick so if you run this with tensorflow 2 let me move my face out of the way if you run this with tensorflow 2 you will get this out of range end of sequence error and if you do google if you do a google search for that you will see a thread about it in the github and basically someone says that it is fixed in 2.1.0.rc1 the version of tensorflow which i am running however i still get the warning on the first run in version 2.0.0 i get the warning on every epoch so it kind of clutters up the terminal output but it still runs nonetheless and gets comparable accuracy so it doesn't seem to affect the model performance but it you know makes for an ugly youtube video and gives me an easy feeling so i went ahead and updated to the latest release candidate 2.1.0 and you can see that it works relatively well so one second and we'll see the plot again and of course i made a mistake again it's plot history not uh it's plot data not plot history let's fix that all right uh plot let's change this to plot history because that is more precise and we will try it again let's do it all right so it has finished and you can see that the story is much the same a little bit of overtraining on the training data let's take a look at the plot and the plot is totally consistent with what we got the last time you know an increasing training accuracy and a leveling off of validation accuracy so let's go ahead and check out how these word embeddings look in the browser but first of course i made a mistake so weights are not defined and that is because i didn't define them so let's go back to the code editor and do that all right so what we want to do is this weights equal model dot layers subzero dot get weights so this will give us the actual weights from our model which is the uh the zeroth layer is the embedding layer and we want to get the weights and the zeroth element of that so i'm going to go ahead and head back the terminal and i'm going to actually get rid of the plot here because we know that works and i'm sick of seeing it so we will just do the model fitting and retrieve the embedding so 
let's do that now it's one of the downsides of doing code live is i make all kinds of silly mistakes while talking and typing but that's life see in a minute all right so that finished running let's head to the browser and take a look at what it looks like okay so can i zoom in i can a little bit so let's take a look at this so to get this you go to load over here on the left side you can't really see my cursor but you go to load on the left side load your vector and metadata files and then you want to click on this 3d labels mode here and let's take a look at this so you see right here on the left side annexed seated and ottoman so these would make sense to be you know pretty close together because they you know kind of would you would expect those three words to be together right annexed and seated if you annex something someone else has to seed it it makes sense let's kind of move around a little bit see what else we can find okay so this looks like a good one we see waterways navigable humid rainfall petroleum earthquake so you can see there are some pretty good relationships here between the words that all makes sense uh if you scroll over here what's interesting is you see estonia herzegovina slovakia sorry for mispronouncing that cyprus you see a bunch of country names so it seems to learn the names and it seems to learn that there are relationships between different geographic regions in this case countries there we see seated and annexed on ottoman again and you can even see concord in here next to annexed and seated deposed arc bishop bishop assassinated oh you can't see that let me move my face there just moved me over so now you can see surrendered conquered spain right spain was conquered for a time by the moors archbishop deposed surrendered assassinated invaded you can see all kinds of cool stuff here so this is what it looks like i've seen other words like beautiful wonderful together other stuff so if you play around with this you'll see all sorts of uh interesting relationships between words and this is just the visual representation of what the word embeddings look like in a reduced dimensional representation of its higher dimensional space so i hope that has been helpful i thought this was a really cool project just a few dozen lines of code and you get uh to something that is actually a really neat uh kind of a neat result where you have um a higher dimensional space that gives you mathematic relationships between words and it does a pretty good job of learning the relationships between those words now what's interesting is i wonder how well this could be generalized to other stuff so if we feed it you know say twitter twitter tweets could we get the sentiment out of that i'm not entirely sure that's something we would have to play around with uh it seems like he would be able to so long as there is significant overlap in the dictionaries between the words that we have for the imdb reviews and the dictionary of words from the twitter feeds that we scrape but that would be an interesting application of this to kind of find toxic twitter comments uh and the like but i hope this was helpful just a reminder my new course is on sale for 9.99 for the next five days there will be one more sale last several days of the year but there will be a gap several days in between this channel totally supported by ad revenue as well as my course sales so if you want to support the cause go ahead and click the link in the pinned comment slash description and if not hey go ahead and share 
this because that is totally free and i like that just as well leave a comment down below hit the subscribe button if you haven't already hit the bell icon to get notified when i release new content and i will see you in the next video in this tutorial you are going to learn how to do sentiment classification with tensorflow 2.0 let's get started before we begin a couple of notes first of all it would be very helpful if you have already seen my previous video on doing word embeddings in tensorflow 2.0 because we're going to be borrowing heavily from the concepts i presented in that video if not it's not a huge deal i'll show you everything we need to do as we go along it's just it'll make more sense with that sort of background second point is that i am working through the official tensorflow tutorials this isn't my code i did have to fix a couple of bugs in the code so i guess that makes it mine to some extent but unless i did not write this so i'm just presenting it for your consumption in video format all that said let's go ahead and get to coding our sentiment analysis software so as usual we begin with our imports we will need the tensorflow datasets to handle the data from the imdb library of course you need tensorflow to handle tensorflow type operations so the first thing we want to do is to load our data set and get our training and testing data from that as well as our encoder which i explained in the previous video so let's start there data set and info is load of the imdb reviews uh help if i spelled it correctly subwords 8k now just a word these are the reviews uh a bunch of reviews from the imdb data set so you have a review with an associated classification of either positive or negative with info equals true as supervised equals true let's tab that over next we will need our training and testing data sets set equals data set subtrain and data set sub test and finally we need our encoder dot encoder good grief i can type cannot type tonight at all so if you don't know what an encoder is the basic idea is that it is a sort of reduced dimensional representation of a set of words so you take a word and it associates that with an n-dimensional vector that has components that will be non-perpendicular to other words in your dictionary so what that means is that you can express words in terms of each other whereas if you set each word in your dictionary to be a basis vector they're orthogonal and so there's no relationship between something like king and queen for instance whereas with the auto encoder representation uh whereas with the sorry the word embedding representation it is the it has a non-zero component of one vector along another so you have some relationship between words that allows you to parse meaning of your string of text and i give a better explanation in my previous video so check that out for your own education so we're gonna need a couple of global variables above our size 10 000 a batch size for training and some padded shapes and this is for padding so when you have a string of words the string of words uh could be different lengths so you have to pad to the length of the longest review basically and that is batch size by empty so the next thing we'll need is our actual data set we're going to shuffle it because we're a good data scientist and we're going to want to get a padded batch from that in the shape defined with the variable above and the test data set is very similar good grief so i i'm using vim for my new text editor part of my new year's 
resolution and um let's yank that and it is a little bit tricky if you've never used it before i'm still getting used to it there we go as you can see then we have to go back into insert mode test data set test data set dot padded batch and padded shapes all right that is good uh next thing we need is our model so the model is going to be a sequential keras model with a bi-directional layer as well as a couple of dense layers we're using a binary cross entropy loss with an atom optimizer learning rate of 10 by 1 by 10 to the minus 4. and then we will say tf keras dot layers embedding encoder.vocab size 64. tf keras layers bi-directional tf keras.layers.l 64. two parentheses dense and that is 64. with a rally value activation if i could ever learn to type properly that would be very helpful another dense layer with an output and this output is going to get a sigmoid activation and what this represents is the probability of the review being either positive or negative so the final output of the model is going to be a floating point number between zero and one and it will be the probability of it being a positive review and we're going to pass in a couple of dummy uh reviews uh just some kind of softball kind of stuff to see how well it does but before that we have to compile our model and with a binary cross entropy loss optimizer equals tf keras optimizers atom the learning rate 1 by 10 to the minus 4 and we want metrics accuracy and then we want the uh history which is just the model fit and this is really for uh plotting purposes but i'm not gonna do any plotting you get the idea that the you know the accuracy goes up over the time and and the uh loss goes down over time so no real need to plot that train data set we're just gonna do three epochs you can do more but for the purpose of the video i'm just gonna do three actually let's do five because i'll do five for the next model we're going to do validation data equals test data set and validation steps 30. 
so next we need to consider a couple of functions so one of them is to pad the uh the vectors that we pass in to whatever size and the second is to actually generate a prediction so let's define those functions and just to be clear this is for the sample text we're going to pass in because remember the reviews all are all of varying lengths and so we have to uh for purposes of the i guess you can say continuity of inputs to your model and not a really technical phrase but so that way you pass in the same length of vector to you know your model for the training we have to deal with the problem of the same problem with the sample text that we're going to pass in because we don't have an automated tensorflow function to handle it for us and we're going to pad it with zeros because those don't have any meaning in our dictionary and we want to return the vector after extending it so if you're not familiar with this idiom in python uh you can multiply a quantity like say a string by a number to basically multiply that string so if you had the letter a multiplied by 10 it would give you 10 a's and you can do that with you know list elements as well pretty cool stuff a neat little feature of python a little known i think but that's what we're doing here so we're going to uh going to pad the zeros to the size of whatever whatever size we want minus whatever the length of our vector is and extend that vector with those zeros next we need a sample predict function and the reason we can't just do model.predict is because we have the the issue of dealing with the padding text equals encoder.encode and remember the encoder is what goes from the uh string representation to the higher dimensional representation that allows you to make correlations between words so if you want to pad it then pad to size encoded sample thread text 64. 
that's our batch size or our max length sorry and then encoded sample thread text is tf cast flip 32 and predictions model dot predict if that expand dimensions encoded sample thread text zero batch dimension return predictions all right so now we have a model that we have trained once you run the code of course uh now let's come up with a couple of dummy simple very basic uh reviews to see how it scores them so we'll say sample text equals uh this movie was awesome the acting was incredible uh highly recommend then we're going to spell sample text correctly of course and then we're going to come up with our predictions equal sample predict sample text pad equals true and we're going to multiply that by 100 so we get it as a percentage and can i i can't quite scroll down that is a feature not a bug i am sure uh you can write in whatever positive review you want so then we'll say print uh probability this is a positive review predictions and i haven't done this before so when i coded this up the first time i have it executing twice once with pad equals false once with pad equals true to see the delta in the predictions and surprise surprise is more accurate when you give it a padded review but in this case i'm going to change it up on the fly and do a different set of sample text and give it a negative review and see how it does this movie was so so i don't know what this is going to do that's kind of a you know vernacular i don't know if that was in the database so we'll see the acting was mediocre kind of recommend and predictions sample predict sample text pad equals true times 100 and we can um yank the line and paste it all right okay so we're going to go ahead and save this and go back to the terminal and execute it and see how it does and then we're going to come back and write a slightly more complicated model to see how well that does to see if you know adding complexity to the model improves the accuracy of our predictions so let us write quit and if you've never used vim uh you have to press colon wq sorry when you're not in insert mode uh right quit to get out and then we're gonna go to the terminal and see how well it does all right so here we are in the terminal let's give it a shot and see how many typos i made ooh interesting so it says check that the data set name is spelled correctly that probably means i misspelled the name of the data set all right let me scroll up a little bit uh it's imdb reviews okay i am oh right there data set yeah you can't yeah right there okay so i misspelled the name of the data set not a problem vimtf sentiment let us go up to here i am db right quit and give it another shot i misspelled dense okay can you see that no not quite uh it says here let me move myself over has no attribute dense so let's fix that that's in line 24 line 24 insert an s quit and try again there now it is training for five epochs i am going to let this ride and show you the results when it is done really quick you can see that it gives this funny error let me go ahead and move my face out of the way now this i keep seeing in the tensorflow 2 stuff so uh as far as i can tell this is related to the version of tensorflow this isn't something i'm doing or you're doing there is an open issue on github and previously it would run that error every time i trained with every epoch however after updating do i think tensorflow 2.1 it only does it after the first one so i guess you gain a little bit there uh but it is definitely but it's definitely an issue with tensorflow so i'm 
not too worried about that so let's go ahead on this train all right so it has finished running and i have teleported to the top right so you can see the accuracy and you can see accuracy starts out low and ends up around 93.9 not too shabby for just five epochs on a very simple model likewise the loss starts relatively high and goes relatively low what's most interesting is that we do get a 79.8 percent probability that our first review was positive which it is so an 80 probability of it being correct is pretty good and then an only 41.93 percent probability the second being positive now this was a bit of a lukewarm review i said it was so so so a 40 probability of it being positive is pretty reasonable in my estimation so now let's see if we can make a more complex model and get better results so let's go back to the code and type that up so here we are let's scroll down and say let's make our new model so model you have to make sure you're in insert mode of course model equals tf keras sequential tf keras layers of course you need an embedding layer to start encoder.vocab size 64. let's move my mug like so and add our next layer which is keras layers bi-directional lstm 64 return true and i am way too far over 88 that is still well we're just going to have to live with it it's just going to be bad code not up to the pep 8 standards but whatever sumi bi-directional lstm 32 keras layers dot dense and 64 with a volume activation and to prevent overfitting we are going to add in a little bit of drop out just 0.5 so 50 percent and add our final classification layer with a sigmoid activation model do i have let me double check here looks like i forgot a parenthesis there we go good grief delete that line and make our new model model lock compile loss equals binary cross entropy optimizer equals atom same learning rate we don't want to change too many things at once that wouldn't be scientific accuracy history equals model.fit train data set data set not cert epochs equal 5 validation data set equals test data set 30 validation steps and we're just going to scroll up here and uh copy whoop copy all of this visual yank and come down and paste all right so ah what's so i'm detecting a problem here so i need to modify my sample predict problem uh my sample predict so let's go ahead and pass in a model uh call it model underscore just to be safe because i'm declaring one model and then another i want to make sure these scoping issues are not going to bite me in the rear end i need model equals model and let's do likewise here model eagles model and we'll come up here and modify it here as well just to be pedantic and i'm very tired so this is probably unnecessary but we want to make sure we aren't getting any funny scoping issues so that the model is doing precisely what we would expect so let's go ahead and write quit and try running it oh actually i take it back i want to go ahead and get rid of the fitting for this because we've already run it we can leave it actually you know what now that i'm thinking about it let's just do this and then we will comment this out all right and then we don't even need the the model equals model there but i'm going to leave it all right let's try it again let's see what we get so remember we had a uh 80 and 41 or 42 probability of it being positive so let's see what we get with the new model validation data set so i must have mistyped something so let's take a look here right there because it is validation data not validation data set all right try it again all 
right it is training i will let this run and show you the results when it finishes so of course after running it i realized i made a mistake in the uh and the declaration of the sample predict function typical typical unexpected keyword argument so let's come here and you know let's just get rid of it oh because it's model underscore um yeah let's get rid of it because we no longer need it and get rid of this typical typical all right this is one of the situations in which a jupiter notebook would be helpful but whatever i will stick to them and the terminal and pi files because i'm old all right let's try this again and i'll just go ahead and edit all this out and we will uh meet up when it finishes i've done it again oh it's not my day folks not my day and let us find that there delete once again all right so i finally fixed all the errors it is done training and we have our results so probability this is a positive review 86 percent a pretty good improvement over 80 what's even better is that the uh probability of the second review which was lukewarm so so being positive has fallen from 41 or 42 down to 20 22 almost cut in half so pretty good improvement with a they you know somewhat more complicated model and at the expense of slightly longer training so you know 87 seconds as opposed to 47 seconds so i know sometimes six minutes as opposed to three not too bad so anyway so what we've done here is loaded a series of imdb reviews used it to train a model to do sentiment prediction by looking at correlations between the words and the labels for either positive or negative sentiment and then asking the model to predict what the sentiment of a obviously positive and somewhat lukewarm review was and we get pretty good results in a very short amount of time that is the power of tensorflow 2.0 so i thank you for watching any questions comments leave them down below i try to answer all of them less so now that i have more subscribers more views it gets a little bit more overwhelming but i will do my best speaking of which hit the subscribe button hit the notification bell because i know only 14 of you are getting my notifications and look forward to seeing you in the next video where he sees your head my lovely we sleep her with my hate or for me think that we give his cruel he cries said your honors ear i shall gromas no i haven't just had a stroke don't call 9-1-1 i've just written a basic artificial intelligence to generate shakespearean text now we get to finally address the question which is better writing shakespearean sonnets a billion monkey hours or a poorly trained ai let's get started all right first before we begin with our imports a couple of administrative notes the first of which is that this is an official tensorflow tutorial i have not written this code myself and in fact it is quite well written as it is the first tutorial i haven't had to make any corrections or adjustments to so i will leave a link in the description for those that want to go into this in more detail on their own time so feel free to check that out when you have a moment available let's get started with our imports the first thing you want to import is os that will handle some operation os level type stuff we want tensorflow as tf of course and we want numpy as np now notably we are not importing the tensorflow data set imports because this is not using an official tensorflow data set rather it is using the data due to i believe andre carpathi gets a credit for this but it is basically a text representation 
of a shakespearean sonnet which one i don't know doesn't state in the tutorial and i am not well read enough to be able to identify it based on the first several characters i suppose if i printed out enough of the terminal i could figure it out based on who's in it but i don't know and it's not really all that important but what is important is that we have to download it using the built-in tensorflow keras utils and of course they have their own function to get a file and it's just a simple text file called shakespeare.txt and that lives at https storage googleapis.com shakespeare.txt okay and so let's get an idea for what we're working with here so let's open it up in uh read binary mode with an encoding of utf-8 and let's go ahead and print out the length of the text so we'll say length of text blank characters dot format when text and let's go ahead and print the first 250 characters to get an idea of what we are working with all right let's head to the terminal and test this out say python tf text gen dot pi object has no attribute decode so i have messed something up most likely a parenthesis somewhere text equals open path to file.read that's right i forgot the read method insert read dot d code there we go let's try that perfect so now we see that we do indeed have some text and it has uh one million one hundred fifteen thousand three hundred ninety four characters so a fairly lengthy work uh you know several hundred thousand words at least and you see it begins with first is it uh first citizen this is important because we're gonna refer back to this text a few different times in the tutorial so just keep in mind that the first word is first very very simple and hey if you know what uh play or sonnet this is leave a comment down below because you're you know more well-cultured more well-read than i am i would be interested to know but let's proceed with the tutorial let's head back to our file and the first thing we want to do is comment these out because we don't want to print that to the terminal every single time we run the code but the first thing we have to handle is vectorizing our text now if you have seen my other two tutorials on natural language processing and tensorflow you know that we have to go from a text based representation to an integer representation or in some cases yeah totally energy representation not floating point uh in order to pass this data into the deep neural network so let's go ahead and start with that so we say our vocabulary is going to be sorted a set of the text so we're just going to sort it and make a set of unique stuff so we'll say print or unique words rather blank unique characters format len of vocab so we now important thing to keep in mind is that we are starting with merely characters we are not starting with any conception of a word so the model is going to go from knowing nothing about language at all to understanding the concept of words as well as line breaks and a little bit about grammar you kind of saw from the introduction that it's not so great probably better than the monkeys typing away but it is you know starting from complete scratch into something that kind of approximates language processing so we have sorted our vocabulary now we have to go from the character space to the integer representation so we'll say care to idx where care is just you know character that's going to be dictionary of unique characters and their integer idx their integer encoding for idx unique and enumerate vocab closing bracket and we need the idx 
2 care which is the inverse operation numpy array of vocab uh then we have something called text as int and that's a numpy array of a list comprehension of care to idx of care for care in text so we're just going to take all the characters in the text look up their idx representation and stick it into a vector numpy array in this case so now let's go ahead and print this stuff out to see what we're dealing with to see what our vocabulary looks like and we'll make something pretty looking we'll say 4 care blank and zip care to idx range 20. we're only going to look at the first 20 elements we don't need to print out the whole dictionary print 4s colon 3d dot format representation of the character here to idx care and then at the end we'll print a new line um actually let's do this too so we'll say print blank characters map to int you know how many characters will be mapped to int format representation of text just the first 13 uh text as int 13. tab that over and write this and run it so unexpected end of file while parsing okay so what that means is i have forgotten a parenthesis which is here perfect now we can write quit now let's give it a shot okay so you can see we have 65 unique characters so we have a dictionary of 65 characters and new line maps to zero space maps to one so basically it's the sort has placed all of the characters the non non-alphanumeric characters at the beginning and we even have some numbers in there uh curiously the number three maps to nine but whatever and then you see we have the capital letters and the lowercase letters will follow later and so our first sentence is first citizen first 13 characters rather and that maps to this following vector here so we have gone from this string to this vector representation so that is all well and good but that is just the first step in the process so the next step is handling what we call the prediction problem so the real goal here is to feed the model some string of text and then it outputs the most likely characters it thinks will follow based on what it reads in the shakespearean work and so we want to chunk up our data into sequences of length 100 and then go ahead and use that to create a data set and then from there we can create batches of data in other words chunks of sentences or chunks of whatever sequence length characters we want let's go ahead and go back to our vim editor and start there and so the first thing is we want to go ahead and comment all this out because we don't want to print everything every single time and then handle the problem of the sequence length so we'll say sequence length equals 100 characters something manageable you want something too small something too large so number of examples uh per epoch equals line of text divided by sequence a length plus one where does a plus one come from it comes from the fact that we're going to be feeding it a character and trying to predict the rest of the characters in the sequence so you have the plus one there next we have a care data set tf data data set of course it's tensorflow it has to deal with its own data sets it doesn't handle text files too well so we're going to go data set from tensor slices text as int let's go ahead and print out uh what we have here for i in care dataset dot take the first five elements and so this is just a sequence of individual characters so we should get the first five characters out print uh idx two care i dot numpy let's go ahead and go to the terminal and run this write quit and run it once more and you 
You see we get the word "First", which is what you'd expect: if we scroll up, those are the first five characters, "First", with "Citizen" following. So that works. Now let's handle batching the data. Back in the editor, get rid of that print statement by commenting it out and deal with batching: sequences = char_dataset.batch(seq_length + 1, drop_remainder=True), so we just drop the leftover characters at the end. Then for item in sequences.take(5) — the first five sequences of 100-odd characters — print(repr(''.join(idx2char[item.numpy()]))), with a whole bunch of closing parentheses (and no, vim won't scroll the way I'd like; one of the downsides of vim as an editor). Back in the terminal, run it, and — I really should put a newline at the beginning — you can see "First Citizen: Before we proceed any further, hear me speak", and so on: a bunch of character sequences, including the newline characters, which is helpful. One thing to note is that these newlines are what give the deep neural network a sense of where line breaks occur, so it learns that after some sequence of characters it should expect a line break, because that produces the kind of metered speech you find in Shakespeare. Next, let's split our data into input and target text; remember, we start with one character and predict the next set of characters. We could comment the previous print out first, but it isn't going to hurt anything, so let's leave it in. We define a function split_input_target that takes a chunk of data as input: input_text = chunk[:-1], target_text = chunk[1:], return input_text, target_text. So we get an input sequence as well as a target. We map this function onto our sequences with dataset = sequences.map(split_input_target), add a newline for clarity, and then print the first examples of the input and target values: for input_example, target_example in dataset.take(1), print('Input data: ', repr(''.join(idx2char[input_example.numpy()]))) and print('Target data:', repr(''.join(idx2char[target_example.numpy()]))). Head to the terminal and try it. Our input data is "First Citizen: Before we proceed any further..." ending with "you", and the target data starts "irst Citizen": given this input, that is the target. We have basically shifted the data one character to the right for the target with respect to the input, and that's the task — given one character, predict the next likely sequence of characters. To make that clearer, let's step through it one character at a time. We come down, get rid of those print statements, and write: for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])) — I initially forgot the zip and had to add it, along with an extra closing parenthesis.
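Here is the (input, target) pipeline just described, as a runnable sketch continuing from the earlier blocks:

```python
# Sequences of seq_length + 1 characters; each becomes an (input, target)
# pair shifted by one character.
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

for input_example, target_example in dataset.take(1):
    print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
```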
Then we add the print statements: print('Step {:4d}'.format(i)), print('  input: {} ({:s})'.format(input_idx, repr(idx2char[input_idx]))), and print('  expected output: {} ({:s})'.format(target_idx, repr(idx2char[target_idx]))). Head to the terminal and run it, and we should get something that makes sense — "name 'input_example' is not defined", because of course I deleted that loop; put it back. Now you can see the output: at step 0 the input is the integer 18, which maps to the character "F", and the expected output is "i", the next character in the sequence. Keep in mind nothing has been trained with an RNN yet; this is just stepping through the data to show that, given one character, here is what should come next. The next thing to handle is creating training batches and then building and training the model. Back in the text editor, comment all of this out and deal with the notion of a batch: a batch size of 64, and a buffer size of 10,000, which is how many elements the shuffle buffer holds. Then dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True). We also set vocab_size = len(vocab), and since we're about to build the model, embedding_dim = 256 and rnn_units = 1024. We'll use a function to build the model: def build_model(vocab_size, embedding_dim, rnn_units, batch_size), with model = tf.keras.Sequential([...]). The first layer has to be an embedding layer because, as you may recall from the first video, we have to go from the integer representation to a reduced-dimensional representation, an embedding, that lets the model find relationships between characters. In the integer basis all of these vectors are orthogonal to one another, so there's no overlap between characters, whereas in the embedding space the vectors can overlap; they are non-orthogonal, to some extent collinear. Just a bit of math-speak, but that's what is going on there. The embedding layer takes vocab_size and embedding_dim, with batch_input_shape=[batch_size, None] so it can accept sequences of arbitrary length. Then comes the recurrent layer with rnn_units and recurrent_initializer='glorot_uniform' (did I spell that right? yep). Then one more layer, tf.keras.layers.Dense, which outputs something of vocab_size, and we return the model. Now that we have a build function, the next thing is to actually build the model: model = build_model(vocab_size=len(vocab), embedding_dim=embedding_dim, rnn_units=rnn_units, batch_size=batch_size). (This is one thing I don't love about the tutorial's naming, passing embedding_dim=embedding_dim and so on, but whatever.) Let's see what kind of predictions this model outputs without any training: for input_example_batch, target_example_batch in dataset.take(1). Keep in mind this is going to be quite rough, because there has been no training yet, so the output is going to be garbage.
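A sketch of the batching and model-building step just described. The recurrent layer is a GRU (the model summary a bit further on attributes most parameters to the gated recurrent unit); the return_sequences and stateful options follow the standard character-RNN tutorial this video is based on and are assumptions on my part:

```python
BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    # Embedding -> GRU -> Dense(vocab_size), one logit per character.
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size),
    ])

model = build_model(vocab_size=len(vocab), embedding_dim=embedding_dim,
                    rnn_units=rnn_units, batch_size=BATCH_SIZE)
```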
But let's just see what we get: example_batch_predictions = model(input_example_batch), and print the shape, example_batch_predictions.shape, which should be (batch_size, sequence_length, vocab_size). While we're at it, let's print a model summary so you can see what's what. Head to the terminal and see how many typos I've made: "batch_inputs_shape" should probably be batch_input_shape on line 77, with batch_size as the first dimension — what have I done, something stupid no doubt; yes, it's right there, the extra "s". Fix it and try again. Now you can see it outputs something of shape batch size by 100 characters by vocab size, which makes sense, and here is the model: around 4 million parameters, all trainable, and the majority of them are in the gated recurrent unit. Let's go back to the text editor and start thinking about training the model. We can get rid of that print statement and the model summary, since we don't need them. The first thing we need to train the model is a loss function, so we pass in labels and logits and return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True), and since we are good Python programmers we format that a little more nicely. Then we can start training: model.compile(optimizer='adam', loss=loss). We set checkpoint_dir = './training_checkpoints' and checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}'), where epoch is a variable that gets filled in by TensorFlow (Keras, in this case), so whatever epoch we're on, it saves a checkpoint with that name. Then we define the callback: checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True). For reference, I trained for 100 epochs to generate the text you saw at the beginning of the tutorial, but it doesn't matter all that much, so we'll say 25 epochs here, since this isn't the most sophisticated model in the world. Then history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback]). Head to the terminal and run it: "expected str, bytes... not a tuple". That's in os.path.join, so I've probably made some silly mistake; checkpoint_dir is a string and 'ckpt_{epoch}' is fine, so what is that error on line 91? Oh, I understand: I have a trailing comma at the end of the line, so it's an implied tuple. Remove it and try again (still scratching my head over that one). Now it is training, so I'll let this run and come back when it's finished. It has finished training, and you can see the loss went down by a factor of roughly three, from about 2.7 all the way down to 0.77, so it did pretty well over 25 epochs, and we won't have to rerun the training because of the model checkpointing. The next and final order of business is to write the function that generates the predictive text — the output of the model — so we can get some idea of what sort of Shakespearean prose this artificial intelligence can produce.
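A sketch of the loss, checkpointing, and training step, continuing from the model built above (the prefix 'ckpt_{epoch}' is how the checkpoint name read out in the video is usually written):

```python
import os

def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)

checkpoint_dir = './training_checkpoints'
# No trailing comma here -- that implied tuple was the os.path.join error.
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix, save_weights_only=True)

EPOCHS = 25   # the intro text in the video came from roughly 100 epochs
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
```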
Let's head to our file. The first thing to think about is how to load the model, which means we no longer want the build-compile-fit block at the top; we can get rid of that, because we certainly don't want to compile or train the model again — we want to load it from a checkpoint. So we say model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1). Why batch_size=1? Because when we pass in a piece of input text we don't want a huge batch of output text back, just a single sequence. Then model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)), which scans the directory and grabs our latest checkpoint, and we build the model with tf.TensorShape([1, None]) — a batch size of one and an arbitrary number of characters — and print model.summary() so the new model shows up in the terminal. The next thing to handle is the prediction problem itself: generating text. We define generate_text(model, start_string), passing in the model we want to use for generation as well as a starting string, a prompt for the AI if you will. num_generate = 1000 is the number of characters we want to generate. input_eval = [char2idx[s] for s in start_string] converts the starting string to its integer representation, and we expand it along the batch dimension. We need an empty list to keep track of the generated text, and a temperature. The temperature handles the "surprise" factor of the text: it scales the model's output, so a temperature of 1 means just whatever the model gives you, a smaller number means more reasonable, more predictable text, and a larger number gives you crazier, wackier stuff. Then we reset the model's states and loop: for i in range(num_generate), predictions = model(input_eval); predictions = tf.squeeze(predictions, 0) to drop the batch dimension; predictions = predictions / temperature; and predicted_id — the ID of the character the model predicts — is tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy(). Then input_eval = tf.expand_dims([predicted_id], 0) and text_generated.append(idx2char[predicted_id]). If you're not familiar with it, tf.random.categorical samples from a probability distribution over a set of discrete categories (I initially forgot the [-1, 0] indexing, which would break things), so it generates predictions according to the distribution defined by this predictions tensor; that may be familiar if you've watched some of my reinforcement learning tutorials, since the actor-critic methods in particular use the categorical distribution. Finally we return start_string + ''.join(text_generated), and at the bottom we call print(generate_text(model, start_string=u"ROMEO: ")) — give it a space after the colon. All right, moment of truth: let's see how well the model does. Write-quit, go to the terminal, and try it. It loads the model and produces our text quite quickly.
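Reconstructed as a runnable sketch (this mirrors the generate_text routine described above; the default temperature of 1.0 is an assumption based on what is said about scaling):

```python
# Rebuild with batch_size=1 and restore the latest checkpoint.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()

def generate_text(model, start_string, num_generate=1000, temperature=1.0):
    # Vectorize the prompt and add a batch dimension.
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    text_generated = []
    model.reset_states()
    for _ in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0) / temperature
        # Sample the next character id from the categorical distribution
        # defined by the model's logits.
        predicted_id = tf.random.categorical(
            predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])
    return start_string + ''.join(text_generated)

print(generate_text(model, start_string=u"ROMEO: "))
```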
KING RICHARD III says "I will practice on his son, you are beheads for me, you Henry", Brutus replies "and welcome general", and there's "music the while, Tyrell". You know, now that I'm reading this and looking at all of the names, I'm wondering if these aren't the collected works of Shakespeare: Brutus and King Richard sound like they come from a couple of different plays, Caesar and whichever one King Richard appears in. I don't know — again, I'm an uncultured swine, you let me know. But what's really fascinating here is that this model started out with no information about the English language whatsoever. It knew nothing at all about English: we didn't tell it that there are words, we didn't tell it there are sentences, we didn't tell it that it should add line breaks or periods or any other punctuation. It knows nothing at all, and within, I don't know, two and a half minutes of training it becomes a model that can string together characters and words in a way that almost kind of makes sense. "Bernadine says I am a Roman and by tenot and me" is mostly gibberish, but "I am a Roman" certainly makes sense. "Warwick, I have poison that you have heard" — that is kind of something. "To add my own important process of that hung in point" is kind of silly, and "is pointing that my soul I love him well" strings words together in a way that almost works. Returning to the question of which is better, a billion monkey-hours of typing or this AI: my money is solidly on the AI. These characters aren't put together randomly, they're put together probabilistically, and they kind of sort of make sense. You can also see how more sophisticated models, like the OpenAI text generator built on transformer networks, can be better at producing text that makes even more sense — although what's interesting is that it's not a, quote-unquote, "quantum leap" (I hate that phrase) over what we've done here in just a few minutes on our own GPU in our own rooms. That is quite cool, and it never ceases to amaze me. I hope you found this tutorial enjoyable; if you have, make sure to hit the subscribe button and the bell icon, because I know only 14 of you get my notifications, and I look forward to seeing you all in the next video. Welcome, freeCodeCampers, to a practical introduction to natural language processing with TensorFlow 2.
i am your host dr phil tabor in 2012 i got my phd in experimental condensed matter physics and went to work for intel corporation as a back-end dry edge process engineer i left there in 2015 to pursue my own interests and have been studying artificial intelligence and deep learning ever since if you're unfamiliar with natural language processing it is the application of deep neural networks to text processing it allows us to do things such as text generation you may have heard the hubbub in recent months over the open ai gpt 2 algorithm that allowed them to produce fake news it also allows us to do things like sentiment classification as well as something more mathematical which is representing strings of characters words as mathematical constructs that allow us to determine relationships between those words but more on that in the videos it would be most helpful if you have some background in deep learning if you know something about deep neural networks but it's not really required we're going to walk through everything in the tutorial so you'll be able to go from start to finish without any prior knowledge although of course it would be helpful if you'd like to see more deep learning reinforcement learning and natural language processing content check me out here on youtube at machine learning with phil i hope to see you there and i really hope you enjoy the video let's get to it in this tutorial you are gonna learn how to do word embeddings with tensorflow 2.0 if you don't know what that means don't worry i'm gonna explain what it is and why it's important as we go along let's get started before we begin with our imports a couple of housekeeping items first of all i am basically working through the tensorflow tutorial from their website so i'm going to link that in the description so i'm not claiming this code is my own although i do some cleaning up at the end to kind of make it my own but in general it's not really my code so we start with our imports as usual we need i o to handle dumping the word embeddings to a file so that we can visualize later we'll need matplotlibe to handle plotting we will need tensorflow as tf and just a word so this is tensorflow 2.1.0 rsc1 release candidate 1. 
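For reference, here is the import block being described, as a minimal sketch:

```python
import io                             # for dumping the word embeddings to .tsv files
import matplotlib.pyplot as plt       # for plotting training history
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers   # Embedding and Dense layers
import tensorflow_datasets as tfds    # provides the IMDB reviews dataset

print(tf.__version__)                 # the video is running 2.1.0-rc1
```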
so this is as far as i'm aware of the latest build so tensorflow 2.0 throws some really weird warnings and 2.1 seems to deal with that so i've upgraded so if you're running tensorflow 2.0 and you get funny errors uh sorry funny warnings but you still get functional code and learning that is why you want to update to the newest version of tensorflow of course we need kiros to handle pretty much everything we also need the layers for our embedding and dense layers and we're also going to use the tensorflow data sets so i'm not going to have you download your own data set we're going to use the imdb movie data set for this particular tutorial so of course that is an additional dependency for this tutorial so now that we've handled our imports let's talk a little bit about what word embeddings are so how can you represent a word for a machine and more importantly instead of a string of characters how can you represent a collection of words a bag of words if you will so you have a number of options one way is to take the entire set of all the words that you have in your say movie reviews you know you just take all the words and find all the unique words and that becomes your dictionary and you can represent that as a one-hot encoding so if you have let's say ten thousand words then you would have a vector for each word with ten thousand elements which are predominantly zeros except for the one corresponding to whichever word it is the problem with this encoding is that while it does work it is incredibly inefficient and it because it is sparse you know the majority of the data is zero and the only one important bit in the whole thing so not very efficient and another option is to do integer encoding so you can just rank order the numbers uh sorry the words you could do it in alphabetical order the order doesn't really matter you can just assign a number to each unique word and then every time that word appears in a review you would have that integer in an array so you end up with a set of variable length arrays where the length of the array corresponds to the number of words in the review and the members of the array correspond to the words that appear within that review now this works this is far more efficient but it's still not quite ideal right so it doesn't tell you anything about the relationships between the words so if you think of the word let's say king it has a number of connotations right a king is a man for one so there is some relationship between a king and a man a king has power right he has control over a domain a kingdom so there is also the connotation of owning land and having control over that land uh king may also have a queen so it has some sort of relationship to a queen as well i may have a prince a princess you know all these kinds of different relationships between words that are not incorporated into the uh integer encoding of our dictionary the reason is that the integer encoding of our dictionary forms a basis in some higher dimensional space but all those vectors are orthogonal so if you take their dot product they are essentially at right angles to each other in a hybrid dimensional space and so their dot product is zero so there's no projection of one vector one word onto another there's no overlap in the meaning between the words at least in this higher dimensional space now word embeddings fix this problem by keeping the integer encoding but then doing a transformation to a totally different space so we introduce a new space of a vector of some arbitrary 
length it's a hyper parameter of your model much like the number of neurons in a dense layer as a hyper parameter of your model the length of the embedding layer is a hyperparameter and we'll just say it's eight so the word king then has eight floating point elements that describe its relationship to all the other vectors in that space and so what that allows you to do is to take dot products between two arbitrary words in your dictionary and you get non-zero components and so that what that means in practical terms is that you get a sort of semantic relationship between words that emerges as a consequence of training your model so the way it works in practice is we're going to have a whole bunch of reviews from the imdb data set and they will have some classification as a good or bad review so for instance you know uh for the star wars last jedi movie i don't think it's in the in there but you know my review would be that it was terrible awful no good totally ruined luke luke's character and so you would see and i'm not alone in that so if you uh did a huge number of reviews for the last jedi you would see a strong correlation of words such as horrible bad wooden characters mary sue things like that and so the model would then take those words run them through the embedding layer and try to come up with a prediction for whether or not that is a good or bad review and match it up to the training label and then do back propagation to vary those weights in that embedding layer so let's say eight elements and by training over the data set multiple times you can refine these weights such that you are able to predict whether or not a review is positive or negative about a particular movie but also it shows you the relationship between the words because the model learns the correlations between words within reviews that give it either a positive or negative context so that is word embeddings in a nutshell and we're going to go ahead and get started coding that so the first thing we're going to have is a an embedding layer and this is just going to be for illustration purposes and that'll be layers dot embedding and let's say there's a thousand and five elements so we'll say result equals embedding layer tf constant one two three so then let's print the result uh dot numpy okay so let's head to the terminal and execute this and see precisely what we get actually let's do this to print result dot numpy.shape i think that should work let's see what we get in the terminal and let's head to the terminal now all right let's give it a try okay so what's important here is you see that you get an array of three elements right because we did the tf constant of one two and three and you see we have five elements because we have broken the integers into some components in that five element space okay so and it has shape three by five which you would expect because you're passing on three elements and each of these three elements these three integers correspond to a word of an embedding layer of five elements okay that's relatively clear let's go back to the code editor and see what else we can build with this okay so let's go ahead and just kind of comment out all this stuff because we don't need it anymore so now let's get to the business of actually loading our data set and doing interesting things with it so we want to use the data set load function so we'll say train data test data and some info tfts.load imdb reviews express subwords 8 okay and then we will define a split and that is tfds.split.train 
tfts.split.test and we will have a couple other parameters with info equals true that incorporates information about the um about the data sets and as supervised equals true so as supervised tells the data set loader that we want to get back information in the form of data and label as a tuple so we have the labels for training of our data so now we're going to need an encoder so we'll say info.features text dot encoder and so let's just um find out what words we have in our dictionary from this we'll say print encoder sub words first 20 elements save that and head back to the terminal and print it out and see what we can see so let's run that again and you it's hard to see let me move my face over for a moment and you can see that we get a list of words the underscore so the underscore corresponds to space you get commas periods a underscore and underscore of so you have a whole bunch of words with underscores that indicate that they are spaces okay so this is kind of the makings of a dictionary so let's head back to the code editor and continue building on this so we no longer need that print statement now the next problem we have to deal with is the fact that these reviews are all different lengths right so we don't have an identical length for each of the reviews and so when we load up elements into a matrix let's say they're going to have different lengths and that is kind of problematic so the way we deal with that is by adding padding so we find the length of the longest review and then for every review that is short in that we append a bunch of zeros to the end uh in our bag of words so a list of words you know the list of integers we will append a bunch of zeros at the end so zero isn't a word uh it doesn't correspond to anything and the word start with one the rank ordinal numbers start with one and so we insert a zero because it doesn't correspond to anything it won't hurt the training of our model so we need something called padded shapes and that has this shape so batch size and an empty list an empty tuple there so now that we have our padded shapes we're ready to go ahead and get our training and test batches so let's do that and since we're a good data scientist we want to do a shuffle we're going to use a batch size of 10 and a padded shapes specified by what we just defined let's clean that up and let's copy because the train the test batches are pretty much identical except it's testdata.shuffle and it's the same size so we don't have to do any changes there scroll down so you can see okay so that gives us our data so what we need next after the data is an actual model so let's go ahead and define a model so in as is typical for keras it is a sequential model and that takes a list of layers so the first layer is an embedding layer and that takes encoder.vocab size now this is you know given to us up here by the encoder object that's given by the information from our data set and we have some vocabulary size so there's ten thousand words vocabulary size is vocab size it's just the size of our dictionary and we want to define an embedding dim so that's the number of dimensions for our embedding layer so we'll call it something like 16 to start so let's add another layer global gobble global average pooling 1d and then we'll need a finally a dense layer one output activation equals sigmoid so if this seems mysterious what this is is the probability that a mapping of sorry this layer is the probability that the review is positive so it's a sigmoid go ahead and get rid of 
that and now we want to compile our model with the atom optimizer a binary cross entropy loss with accuracy metrics not meterics metrics equals accuracy okay that's our model and that is all we need for that so now we are ready to think about training it so let's go ahead and do that next so what we want to do is train and dump the history of our training in an object called that we're going to call history model.fit we're going to pass train batches 10 epochs and we're going to need validation data and that'll be test batches and we'll use something like 20 validation steps okay so let's scroll down a little bit so you can see it first of all and then we're going to think about once it's done let's go ahead and plot it so let's may as well do that now so let's handle that so we want to convert our history to a dictionary and that's history.history and we want to get the accuracy by taking the accuracy key and we want the validation accuracy uh using correct syntax of course val accuracy for validation accuracy and the number of epochs is just range one two line of accuracy plus one so then we want to do a plot big size nice and large twelve by nine uh we want to plot the epochs versus the accuracy b0 label equals training accuracy we want to plot the validation accuracy using just a blue line not blue o's or dots blue dot sorry and label equals validation accuracy uh plot.x label epochs plot dot y label accuracy and let's go ahead and add a title while we're at it trading and validation accuracy scroll down a little bit we will include a legend having an extraordinarily difficult time typing tonight location equals lower right and a y limit of zero point five and one that should be a tuple excuse me and plot dot show all right so let's go ahead and head to the terminal and run this and see what the plot looks like and we are back let me move my ugly mug over so we can see a little bit more and let us run the software and see what we get okay so it has started training and it takes around 10 to 11 seconds per epoch so i'm just going to sit here and twiddle my thumbs for a minute and fast forward the video while we wait so of course once it finished running i realize i have a typo and that is typical so in line 46 it is p it is i spelled out plot instead of plt but that's all right let's take a look at the data we get in the terminal anyway so you can see that the validation accuracy is around 92.5 pretty good and the training accuracy is around 93.82 so a little bit of overtraining and i've run this a bunch of times and you tend to get a little bit more over training i'm kind of surprised that this final now that i'm running over youtube it is actually a little bit less overtraining uh but either way there are some evidence over training but a 90 accuracy for such a simple model isn't entirely hateful so i'm going to go ahead and head back and correct that typo and then run it again and then show you the plot so it is here in line 46 right there and just make sure that nothing else looks wonky and i believe it is all good there looking at my cheat sheet uh everything looks fine okay let's go back to the terminal and try it again all right once more all right so it has finished and you can see that this time the validation accuracy was around 89.5 percent whereas the training accuracy was 93.85 so it is a little bit over trainee in this particular run and there is significant run to run variation as you might expect so let's take a look at the plot all right so i've stuck my ugly mug right 
here in the middle so you can see that the training accuracy goes up over time as we would expect and the validation accuracy generally does that but kind of tops out about halfway through the number of epochs so this is clearly working and this is actually pretty cool with such a simple model we can get some decent uh review or sentiment as it were classification but we can do one more neat thing and that is to actually visualize the relationships between the words that are embedding learns so let's head back to the code editor and then let's write some code to tackle that task okay so before we do that you know i want to clean up the code first let's go ahead and do that so i will leave in all that commented stuff but let's define a few functions we'll need a function to get our data we'll need a function to get our model and we'll need a function to plot data and we'll need a function to get our embeddings we'll say retrieve embeddings and i'll fill in the parameters for those as we go along so let's take this stuff from our get our data cut that paste it and of course use proper indentation because python is a little bit particular about that okay make sure everything lines up nicely and then of course we have to return the stuff that we are interested in so we want to return train data test data and in fact that's not actually what we want to do i take it back let's come down here and uh we want our uh sorry we don't actually want to return our data we want to turn our batches so return train batches test batches and we'll also need our encoder for the visualizing the relationship relationships between words so let's return that now okay now uh let's handle the function for the get model next so let's come down here and grab this actually let's yeah grab all of it and come here and do that and let's make embedding dim a parameter of our model and you notice in our model we need the encoder so we also have to pass in the encoder as well as an embedding dim and then at the bottom of the function we want to return that model pretty straightforward so then let's handle the plot data next so we have all of this grab that and indent here so we're going to need a history and uh that looks like all we need because we define epochs accuracy and validation accuracy okay so it looks like all we need in the plot data function so then we have to write our retrieve embeddings function but first let's handle all the other stuff we'll say train batches test batches and encoder equals get data in fact let's rename that to get batch data to be more specific this is kind of being pedantic but you always want to be as descriptive as possible with your naming conventions so that way people can read the code and know precisely what it does without having to you know make any guesses so if i just say get data it isn't necessarily clear that i'm getting batches out of that data you know i could just be getting single instances it could return a generator it is a little bit ambiguous so changing the function name to get batch data is the appropriate thing to do so then we'll say model equals get model and we pass it the encoder and then the history will work as intended and then we call our function to plot the history and that should work as intended as well and now we are ready to tackle the retrieve embeddings function so that is relatively straightforward so what we want to do is we want to pass in the model and the encoder and we don't want to pass what we want to do is we want to the purpose of this 
function is to take our encoder and the learned embedding weights and dump them to TSV files that we can load into a visualizer in the browser, to look at a principal-component view of our word encodings. So we need files to write to, and we need to enumerate over the subwords in our encoder, writing out the metadata as well as the vectors for our encodings. out_vectors = io.open('vecs.tsv', 'w', encoding='utf-8'), and out_metadata is similar, io.open('meta.tsv', 'w', encoding='utf-8'). Then we iterate over encoder.subwords and pull the corresponding vectors out to dump to the vector file, along with the metadata: the vector is weights[num + 1], and we have the plus one because, remember, the words start from one — index zero is reserved for padding and doesn't correspond to a word. For the metadata we write the word plus a newline, and for the vectors we write a tab-delimited string of each x in the vector plus a newline at the end, and then we close our files. Down at the bottom we call retrieve_embeddings(model, encoder). Assuming I haven't made any typos, this should work, so let's head back to the terminal and try it. Moment of truth: it is training, so I didn't make any mistakes up to that point; we'll see if it makes it through the plot. Really quick, though: if you run this with TensorFlow 2.0 you will get this "out of range: end of sequence" error, and if you do a Google search for it you'll find a thread on GitHub where someone says it's fixed in 2.1.0rc1, the version of TensorFlow I'm running. I still get the warning on the first run; in version 2.0.0 I got the warning on every epoch, which clutters up the terminal output, but it still runs and gets comparable accuracy, so it doesn't seem to affect model performance — it just makes for an ugly YouTube video and gives me an uneasy feeling, which is why I updated to the latest release candidate, 2.1.0, and you can see it works reasonably well. And of course I made a mistake again: the function is plot_data, not plot_history — actually, let's rename it to plot_history, because that is more precise, and try again. It has finished, and the story is much the same: a little bit of overtraining on the training data, and the plot is totally consistent with what we got last time, increasing training accuracy and a leveling-off of validation accuracy. So let's check out how these word embeddings look in the browser — but first, of course, I made another mistake: "weights are not defined", because I never defined them. Back in the code editor, we want weights = model.layers[0].get_weights()[0]: the zeroth layer is the embedding layer, get_weights() gives us its actual weights, and we take the zeroth element of that. I'm going to head back to the terminal, and I'm actually going to get rid of the plot here, because we know it works and I'm sick of seeing it, so we will just do the model fitting and retrieve the embeddings.
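Put together, a sketch of the retrieve_embeddings function described above (it assumes the model and encoder returned by the get_model and get_batch_data helpers defined earlier; placing the weights lookup inside the function is my choice):

```python
def retrieve_embeddings(model, encoder):
    # The embedding layer is the model's first layer; get_weights()[0] is the
    # (vocab_size, embedding_dim) matrix of learned embeddings.
    weights = model.layers[0].get_weights()[0]
    out_vectors = io.open('vecs.tsv', 'w', encoding='utf-8')
    out_metadata = io.open('meta.tsv', 'w', encoding='utf-8')
    for num, word in enumerate(encoder.subwords):
        vec = weights[num + 1]          # index 0 is reserved for padding
        out_metadata.write(word + '\n')
        out_vectors.write('\t'.join([str(x) for x in vec]) + '\n')
    out_vectors.close()
    out_metadata.close()

retrieve_embeddings(model, encoder)
```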
let's do that now it's one of the downsides of doing code live is i make all kinds of silly mistakes while talking and typing but that's life see in a minute all right so that finished running let's head to the browser and take a look at what it looks like okay so can i zoom in i can a little bit so let's take a look at this so to get this you go to load over here on the left side you can't really see my cursor but you go to load on the left side load your vector and metadata files and then you want to click on this 3d labels mode here and let's take a look at this so you see right here on the left side annexed seated and ottoman so these would make sense to be you know pretty close together because they you know kind of would you would expect those three words to be together right annexed and seated if you annex something someone else has to seed it it makes sense let's kind of move around a little bit see what else we can find okay so this looks like a good one we see waterways navigable humid rainfall petroleum earthquake so you can see there are some pretty good relationships here between the words that all makes sense uh if you scroll over here what's interesting is you see estonia herzegovina slovakia sorry for mispronouncing that cyprus you see a bunch of country names so it seems to learn the names and it seems to learn that there are relationships between different geographic regions in this case countries there we see seated and annexed on ottoman again and you can even see concord in here next to annexed and seated deposed arc bishop bishop assassinated oh you can't see that let me move my face there just moved me over so now you can see surrendered conquered spain right spain was conquered for a time by the moors archbishop deposed surrendered assassinated invaded you can see all kinds of cool stuff here so this is what it looks like i've seen other words like beautiful wonderful together other stuff so if you play around with this you'll see all sorts of uh interesting relationships between words and this is just the visual representation of what the word embeddings look like in a reduced dimensional representation of its higher dimensional space so i hope that has been helpful i thought this was a really cool project just a few dozen lines of code and you get uh to something that is actually a really neat uh kind of a neat result where you have um a higher dimensional space that gives you mathematic relationships between words and it does a pretty good job of learning the relationships between those words now what's interesting is i wonder how well this could be generalized to other stuff so if we feed it you know say twitter twitter tweets could we get the sentiment out of that i'm not entirely sure that's something we would have to play around with uh it seems like he would be able to so long as there is significant overlap in the dictionaries between the words that we have for the imdb reviews and the dictionary of words from the twitter feeds that we scrape but that would be an interesting application of this to kind of find toxic twitter comments uh and the like but i hope this was helpful just a reminder my new course is on sale for 9.99 for the next five days there will be one more sale last several days of the year but there will be a gap several days in between this channel totally supported by ad revenue as well as my course sales so if you want to support the cause go ahead and click the link in the pinned comment slash description and if not hey go ahead and share 
this because that is totally free and i like that just as well leave a comment down below hit the subscribe button if you haven't already hit the bell icon to get notified when i release new content and i will see you in the next video in this tutorial you are going to learn how to do sentiment classification with tensorflow 2.0 let's get started before we begin a couple of notes first of all it would be very helpful if you have already seen my previous video on doing word embeddings in tensorflow 2.0 because we're going to be borrowing heavily from the concepts i presented in that video if not it's not a huge deal i'll show you everything we need to do as we go along it's just it'll make more sense with that sort of background second point is that i am working through the official tensorflow tutorials this isn't my code i did have to fix a couple of bugs in the code so i guess that makes it mine to some extent but unless i did not write this so i'm just presenting it for your consumption in video format all that said let's go ahead and get to coding our sentiment analysis software so as usual we begin with our imports we will need the tensorflow datasets to handle the data from the imdb library of course you need tensorflow to handle tensorflow type operations so the first thing we want to do is to load our data set and get our training and testing data from that as well as our encoder which i explained in the previous video so let's start there data set and info is load of the imdb reviews uh help if i spelled it correctly subwords 8k now just a word these are the reviews uh a bunch of reviews from the imdb data set so you have a review with an associated classification of either positive or negative with info equals true as supervised equals true let's tab that over next we will need our training and testing data sets set equals data set subtrain and data set sub test and finally we need our encoder dot encoder good grief i can type cannot type tonight at all so if you don't know what an encoder is the basic idea is that it is a sort of reduced dimensional representation of a set of words so you take a word and it associates that with an n-dimensional vector that has components that will be non-perpendicular to other words in your dictionary so what that means is that you can express words in terms of each other whereas if you set each word in your dictionary to be a basis vector they're orthogonal and so there's no relationship between something like king and queen for instance whereas with the auto encoder representation uh whereas with the sorry the word embedding representation it is the it has a non-zero component of one vector along another so you have some relationship between words that allows you to parse meaning of your string of text and i give a better explanation in my previous video so check that out for your own education so we're gonna need a couple of global variables above our size 10 000 a batch size for training and some padded shapes and this is for padding so when you have a string of words the string of words uh could be different lengths so you have to pad to the length of the longest review basically and that is batch size by empty so the next thing we'll need is our actual data set we're going to shuffle it because we're a good data scientist and we're going to want to get a padded batch from that in the shape defined with the variable above and the test data set is very similar good grief so i i'm using vim for my new text editor part of my new year's 
resolution and um let's yank that and it is a little bit tricky if you've never used it before i'm still getting used to it there we go as you can see then we have to go back into insert mode test data set test data set dot padded batch and padded shapes all right that is good uh next thing we need is our model so the model is going to be a sequential keras model with a bi-directional layer as well as a couple of dense layers we're using a binary cross entropy loss with an atom optimizer learning rate of 10 by 1 by 10 to the minus 4. and then we will say tf keras dot layers embedding encoder.vocab size 64. tf keras layers bi-directional tf keras.layers.l 64. two parentheses dense and that is 64. with a rally value activation if i could ever learn to type properly that would be very helpful another dense layer with an output and this output is going to get a sigmoid activation and what this represents is the probability of the review being either positive or negative so the final output of the model is going to be a floating point number between zero and one and it will be the probability of it being a positive review and we're going to pass in a couple of dummy uh reviews uh just some kind of softball kind of stuff to see how well it does but before that we have to compile our model and with a binary cross entropy loss optimizer equals tf keras optimizers atom the learning rate 1 by 10 to the minus 4 and we want metrics accuracy and then we want the uh history which is just the model fit and this is really for uh plotting purposes but i'm not gonna do any plotting you get the idea that the you know the accuracy goes up over the time and and the uh loss goes down over time so no real need to plot that train data set we're just gonna do three epochs you can do more but for the purpose of the video i'm just gonna do three actually let's do five because i'll do five for the next model we're going to do validation data equals test data set and validation steps 30. 
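Pulling the last few paragraphs together into one sketch of the sentiment model (the batch size of 64 is an assumption, since the exact value isn't read out; the layer sizes, loss, optimizer, and fit arguments follow what is described above):

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# IMDB reviews with the pre-built 8k subword encoder.
dataset, info = tfds.load('imdb_reviews/subwords8k',
                          with_info=True, as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
encoder = info.features['text'].encoder

BUFFER_SIZE = 10000
BATCH_SIZE = 64                    # assumed; not stated in the video
padded_shapes = ([None], ())       # pad each review to the longest in its batch

train_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(
    BATCH_SIZE, padded_shapes=padded_shapes)
test_dataset = test_dataset.padded_batch(BATCH_SIZE, padded_shapes=padded_shapes)

# Embedding -> bidirectional LSTM -> dense head; the sigmoid output is the
# probability that a review is positive.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])

history = model.fit(train_dataset, epochs=5,
                    validation_data=test_dataset, validation_steps=30)
```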
so next we need to consider a couple of functions so one of them is to pad the uh the vectors that we pass in to whatever size and the second is to actually generate a prediction so let's define those functions and just to be clear this is for the sample text we're going to pass in because remember the reviews all are all of varying lengths and so we have to uh for purposes of the i guess you can say continuity of inputs to your model and not a really technical phrase but so that way you pass in the same length of vector to you know your model for the training we have to deal with the problem of the same problem with the sample text that we're going to pass in because we don't have an automated tensorflow function to handle it for us and we're going to pad it with zeros because those don't have any meaning in our dictionary and we want to return the vector after extending it so if you're not familiar with this idiom in python uh you can multiply a quantity like say a string by a number to basically multiply that string so if you had the letter a multiplied by 10 it would give you 10 a's and you can do that with you know list elements as well pretty cool stuff a neat little feature of python a little known i think but that's what we're doing here so we're going to uh going to pad the zeros to the size of whatever whatever size we want minus whatever the length of our vector is and extend that vector with those zeros next we need a sample predict function and the reason we can't just do model.predict is because we have the the issue of dealing with the padding text equals encoder.encode and remember the encoder is what goes from the uh string representation to the higher dimensional representation that allows you to make correlations between words so if you want to pad it then pad to size encoded sample thread text 64. 
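Here is a sketch of the two helper functions being described here and in the lines that follow (variable names follow the underlying tutorial; it assumes the encoder and model defined above, and a max length of 64):

```python
def pad_to_size(vec, size):
    # Extend the encoded review with zeros (the padding index) up to `size`.
    zeros = [0] * (size - len(vec))
    vec.extend(zeros)
    return vec

def sample_predict(sentence, pad):
    encoded_sample_pred_text = encoder.encode(sentence)
    if pad:
        encoded_sample_pred_text = pad_to_size(encoded_sample_pred_text, 64)
    encoded_sample_pred_text = tf.cast(encoded_sample_pred_text, tf.float32)
    # Add a batch dimension before calling the model.
    return model.predict(tf.expand_dims(encoded_sample_pred_text, 0))

sample_text = ('This movie was awesome. The acting was incredible. '
               'Highly recommend.')
print('Probability this is a positive review: {:.2f}%'.format(
    100 * sample_predict(sample_text, pad=True)[0][0]))
```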
that's our batch size or our max length sorry and then encoded sample thread text is tf cast flip 32 and predictions model dot predict if that expand dimensions encoded sample thread text zero batch dimension return predictions all right so now we have a model that we have trained once you run the code of course uh now let's come up with a couple of dummy simple very basic uh reviews to see how it scores them so we'll say sample text equals uh this movie was awesome the acting was incredible uh highly recommend then we're going to spell sample text correctly of course and then we're going to come up with our predictions equal sample predict sample text pad equals true and we're going to multiply that by 100 so we get it as a percentage and can i i can't quite scroll down that is a feature not a bug i am sure uh you can write in whatever positive review you want so then we'll say print uh probability this is a positive review predictions and i haven't done this before so when i coded this up the first time i have it executing twice once with pad equals false once with pad equals true to see the delta in the predictions and surprise surprise is more accurate when you give it a padded review but in this case i'm going to change it up on the fly and do a different set of sample text and give it a negative review and see how it does this movie was so so i don't know what this is going to do that's kind of a you know vernacular i don't know if that was in the database so we'll see the acting was mediocre kind of recommend and predictions sample predict sample text pad equals true times 100 and we can um yank the line and paste it all right okay so we're going to go ahead and save this and go back to the terminal and execute it and see how it does and then we're going to come back and write a slightly more complicated model to see how well that does to see if you know adding complexity to the model improves the accuracy of our predictions so let us write quit and if you've never used vim uh you have to press colon wq sorry when you're not in insert mode uh right quit to get out and then we're gonna go to the terminal and see how well it does all right so here we are in the terminal let's give it a shot and see how many typos i made ooh interesting so it says check that the data set name is spelled correctly that probably means i misspelled the name of the data set all right let me scroll up a little bit uh it's imdb reviews okay i am oh right there data set yeah you can't yeah right there okay so i misspelled the name of the data set not a problem vimtf sentiment let us go up to here i am db right quit and give it another shot i misspelled dense okay can you see that no not quite uh it says here let me move myself over has no attribute dense so let's fix that that's in line 24 line 24 insert an s quit and try again there now it is training for five epochs i am going to let this ride and show you the results when it is done really quick you can see that it gives this funny error let me go ahead and move my face out of the way now this i keep seeing in the tensorflow 2 stuff so uh as far as i can tell this is related to the version of tensorflow this isn't something i'm doing or you're doing there is an open issue on github and previously it would run that error every time i trained with every epoch however after updating do i think tensorflow 2.1 it only does it after the first one so i guess you gain a little bit there uh but it is definitely but it's definitely an issue with tensorflow so i'm 
not too worried about that so let's go ahead on this train all right so it has finished running and i have teleported to the top right so you can see the accuracy and you can see accuracy starts out low and ends up around 93.9 not too shabby for just five epochs on a very simple model likewise the loss starts relatively high and goes relatively low what's most interesting is that we do get a 79.8 percent probability that our first review was positive which it is so an 80 probability of it being correct is pretty good and then an only 41.93 percent probability the second being positive now this was a bit of a lukewarm review i said it was so so so a 40 probability of it being positive is pretty reasonable in my estimation so now let's see if we can make a more complex model and get better results so let's go back to the code and type that up so here we are let's scroll down and say let's make our new model so model you have to make sure you're in insert mode of course model equals tf keras sequential tf keras layers of course you need an embedding layer to start encoder.vocab size 64. let's move my mug like so and add our next layer which is keras layers bi-directional lstm 64 return true and i am way too far over 88 that is still well we're just going to have to live with it it's just going to be bad code not up to the pep 8 standards but whatever sumi bi-directional lstm 32 keras layers dot dense and 64 with a volume activation and to prevent overfitting we are going to add in a little bit of drop out just 0.5 so 50 percent and add our final classification layer with a sigmoid activation model do i have let me double check here looks like i forgot a parenthesis there we go good grief delete that line and make our new model model lock compile loss equals binary cross entropy optimizer equals atom same learning rate we don't want to change too many things at once that wouldn't be scientific accuracy history equals model.fit train data set data set not cert epochs equal 5 validation data set equals test data set 30 validation steps and we're just going to scroll up here and uh copy whoop copy all of this visual yank and come down and paste all right so ah what's so i'm detecting a problem here so i need to modify my sample predict problem uh my sample predict so let's go ahead and pass in a model uh call it model underscore just to be safe because i'm declaring one model and then another i want to make sure these scoping issues are not going to bite me in the rear end i need model equals model and let's do likewise here model eagles model and we'll come up here and modify it here as well just to be pedantic and i'm very tired so this is probably unnecessary but we want to make sure we aren't getting any funny scoping issues so that the model is doing precisely what we would expect so let's go ahead and write quit and try running it oh actually i take it back i want to go ahead and get rid of the fitting for this because we've already run it we can leave it actually you know what now that i'm thinking about it let's just do this and then we will comment this out all right and then we don't even need the the model equals model there but i'm going to leave it all right let's try it again let's see what we get so remember we had a uh 80 and 41 or 42 probability of it being positive so let's see what we get with the new model validation data set so i must have mistyped something so let's take a look here right there because it is validation data not validation data set all right try it again all 
Of course, after running it I realize I made a mistake in the declaration of the sample_predict function — typical, typical: an unexpected keyword argument from the model_ change — so we simply remove that argument, since it is no longer needed. This is one of those situations where a Jupyter notebook would be helpful, but I will stick with Vim, the terminal, and .py files because I'm old. After a couple more silly mistakes (it is not my day, folks), I finally fix all the errors, the training finishes, and we have our results: the probability that the first review is positive is now 86 percent, a decent improvement over 80, and, even better, the probability that the lukewarm second review is positive has fallen from around 41 or 42 percent down to roughly 20–22 percent, almost cut in half. That is a pretty good improvement from a somewhat more complicated model, at the expense of slightly longer training: about 87 seconds per epoch instead of 47, so roughly six minutes instead of three. Not too bad.

To recap what we've done here: we loaded a set of IMDB reviews, used them to train a model to do sentiment prediction by learning correlations between the words and the positive or negative labels, and then asked the model to predict the sentiment of an obviously positive review and a somewhat lukewarm one — and we got quite good results in a very short amount of time. That is the power of TensorFlow 2.0. Thank you for watching; any questions or comments, leave them down below. I try to answer all of them, though now that I have more subscribers and views it gets a little overwhelming, but I will do my best. Speaking of which, hit the subscribe button and the notification bell, because I know only 14 of you are getting my notifications, and I look forward to seeing you in the next video.

A New Project: Generating Shakespearean Text

"Where he sees your head, my lovely, we sleep her with my hate, or for me think that we give his cruel he cries, said your honor's ear, I shall..." — no, I haven't just had a stroke, so don't call 9-1-1. I have just written a basic artificial intelligence to generate Shakespearean text, which means we finally get to address the question: which is better at writing Shakespearean sonnets, a billion monkey-hours or a poorly trained AI? Let's get started.

Before we begin with our imports, a couple of administrative notes. First, this is an official TensorFlow tutorial; I have not written this code myself, and in fact it is quite well written — the first tutorial I haven't had to make any corrections or adjustments to — so I will leave a link in the description for those who want to go through it in more detail on their own time. For imports, we want os, which will handle some operating-system-level work, tensorflow as tf of course, and numpy as np. Notably, we are not importing tensorflow_datasets this time, because this project does not use an official TensorFlow dataset.
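For reference, a minimal version of the imports just described:

```python
# Imports for the text-generation tutorial: os for filesystem paths,
# TensorFlow for the model, and NumPy for array handling.
import os

import numpy as np
import tensorflow as tf
```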
Instead, the data — the credit, I believe, goes to Andrej Karpathy — is a plain text file of a Shakespearean work. Which one, I don't know; the tutorial doesn't say, and I am not well read enough to identify it from the first several characters. If I printed enough of it to the terminal I could probably figure it out from who appears in it, but it isn't really all that important. What is important is that we download it using the built-in tf.keras.utils.get_file function: a simple text file called shakespeare.txt that lives on storage.googleapis.com. To get an idea of what we are working with, we open it in read-binary mode, decode it as UTF-8, and print the length of the text along with the first 250 characters.

Heading to the terminal to run python tf_text_gen.py, the first attempt fails with "object has no attribute 'decode'" — I simply forgot the .read() call before .decode(). With that fixed, we see that the text does indeed load and has 1,115,394 characters — a fairly lengthy work, several hundred thousand words at least — and that it begins with "First Citizen". Keep that in mind, because we will refer back to the start of this text a few times in the tutorial: the first word is "First". And hey, if you know which play or sonnet this is, leave a comment below, because you are more well cultured and well read than I am; I would be interested to know.

Back in the file, we comment out those print statements, because we don't want to dump all of that to the terminal every time we run the code, and turn to vectorizing the text. If you have seen my other two tutorials on natural language processing with TensorFlow, you know we have to go from a text-based representation to an integer representation (an integer, not floating-point, representation) in order to pass the data into a deep neural network. The vocabulary is simply sorted(set(text)) — the sorted set of unique characters — and we print how many unique characters there are. An important thing to keep in mind is that we are starting with mere characters; there is no conception of a word at all. The model will go from knowing nothing about language to picking up the concept of words, as well as line breaks and a little bit of grammar. As you saw from the introduction, the result isn't great — probably better than monkeys typing away — but it starts from complete scratch and ends up with something that only approximates language. With the vocabulary sorted, we next go from character space to integers with a char2idx dictionary of each unique character and its integer encoding.
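Putting those steps into code, roughly as the official tutorial does it — the full download URL below is the one used by that tutorial, since the transcript only names the storage.googleapis.com host and the file name:

```python
# Download the dataset and read it into a single Python string.
path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

text = open(path_to_file, 'rb').read().decode(encoding='utf-8')

print('Length of text: {} characters'.format(len(text)))
print(text[:250])

# The vocabulary is just the sorted set of unique characters in the file.
vocab = sorted(set(text))
print('{} unique characters'.format(len(vocab)))

# Map each unique character to an integer index.
char2idx = {u: i for i, u in enumerate(vocab)}
```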
We also need idx2char, the inverse operation, which is just a NumPy array of the vocabulary, and then text_as_int, a NumPy array built from a list comprehension that looks up char2idx[c] for every character c in the text — in other words, every character in the text replaced by its integer index. To see what we are dealing with, we print something pretty-looking: the first 20 entries of the char2idx dictionary (no need to dump the whole thing), followed by how the first 13 characters of the text map to integers. The first run gives "unexpected EOF while parsing", which means I forgot a closing parenthesis; with that fixed, we can see that there are 65 unique characters. The newline maps to 0 and the space maps to 1 — the sort has placed all of the non-alphanumeric characters at the beginning, with a few digits in there as well (curiously, the character '3' maps to 9, but whatever), followed by the capital letters and then the lowercase ones. The first 13 characters, "First Citizen", map to the integer vector shown in the output, so we have gone from a string to a vector representation.

That is all well and good, but it is just the first step. The next step is handling what we call the prediction problem: the real goal is to feed the model some string of text and have it output the characters it thinks are most likely to follow, based on what it has read in the Shakespearean work. To do that, we chunk the data into sequences of length 100, use those to create a dataset, and then create batches of data — chunks of sequences of whatever length we choose. Back in the editor, we comment out the previous print statements and set seq_length = 100, something manageable, neither too small nor too large, and examples_per_epoch = len(text) // (seq_length + 1). Where does the plus one come from? From the fact that each training example feeds the model one character at each position and asks it to predict the character that follows, so every chunk needs one extra character. Next we create char_dataset with tf.data.Dataset.from_tensor_slices(text_as_int) — this is TensorFlow, so it wants its own dataset objects rather than raw text files — and print the first five elements, which should be the first five individual characters, using idx2char[i.numpy()].
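In code, the lookup tables and the character-level dataset might look like the following sketch, based on the description above:

```python
# Inverse lookup table and the integer encoding of the whole text.
idx2char = np.array(vocab)                            # integer -> character
text_as_int = np.array([char2idx[c] for c in text])   # character -> integer, per position

# Peek at the first 20 mappings and at how the opening of the text is encoded.
for char, _ in zip(char2idx, range(20)):
    print('  {:4s}: {:3d},'.format(repr(char), char2idx[char]))
print('{} ---- characters mapped to int ----> {}'.format(
    repr(text[:13]), text_as_int[:13]))

# Chunk the encoded text into a tf.data pipeline of individual characters.
seq_length = 100
examples_per_epoch = len(text) // (seq_length + 1)
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

for i in char_dataset.take(5):
    print(idx2char[i.numpy()])
```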
Running this, we do indeed get the word "First", as one would expect — the first five characters of the text — followed, if we scroll up, by "Citizen". So that works, and now we can handle batching the data. Back in the editor (one of the downsides of Vim as an editor is the scrolling), we comment out that print statement and define sequences = char_dataset.batch(seq_length + 1, drop_remainder=True), dropping whatever characters are left over at the end. Then, for the first five sequences, we print repr(''.join(idx2char[item.numpy()])) — with a whole bunch of parentheses — and head back to the terminal to see how it runs. (I really should have put a newline at the beginning of that output.) We see "First Citizen: Before we proceed any further, hear me speak", and so on: full sequences of characters, including the newline characters. One thing worth noting is that those newlines are what give the deep neural network a sense of where line breaks occur — it learns that after some run of characters it should expect a line break, which is what produces the kind of metered speech you find in Shakespeare.

That is well and good, so let's handle the next problem: splitting our data into input and target text. Remember, we have to start with one character and predict the set of characters that follows. We define a function split_input_target that takes a chunk of data and returns input_text, everything up to the last character (chunk[:-1]), and target_text, everything from the second character onward (chunk[1:]). We map that function onto our sequences with dataset = sequences.map(split_input_target) and double-check it by printing the input and target of the first example. In the terminal, the input data is "First Citizen: Before we proceed any further..." and the target data begins "irst Citizen" — the same text shifted by one character with respect to the input. That is the task: given the characters seen so far, predict the next likely sequence of characters. To make that clearer, let's step through it one character at a time. We get rid of those print statements and write a loop over enumerate(zip(input_example[:5], target_example[:5])) — after I remember to actually include the zip call and balance an extra parenthesis.
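A sketch of the batching and the input/target split just described:

```python
# Batch the character stream into chunks of seq_length + 1 characters
# (100 inputs plus the character each position should predict).
sequences = char_dataset.batch(seq_length + 1, drop_remainder=True)

for item in sequences.take(5):
    print(repr(''.join(idx2char[item.numpy()])))

def split_input_target(chunk):
    """Split a (seq_length + 1)-character chunk into input and shifted target."""
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

for input_example, target_example in dataset.take(1):
    print('Input data: ', repr(''.join(idx2char[input_example.numpy()])))
    print('Target data:', repr(''.join(idx2char[target_example.numpy()])))
```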
Inside the loop we print the step number, the input (the integer index and its character via idx2char), and the expected output. Running it first gives "name 'input_example' is not defined" — of course, because I deleted the loop that defined it — and once that is restored we get output that makes perfect sense: at step 0 the input is the integer 18, which maps to the character 'F', and the expected output is 'i', the next character in the sequence. Keep in mind this hasn't been trained with an RNN yet; this is just stepping through the data to show that, given one character, here is what the model should learn to expect next.

So that is all well and good. The next things to handle are creating training batches and then building and training the model. Back in the editor we comment all of that out and handle the notion of a batch: BATCH_SIZE = 64 and BUFFER_SIZE = 10000 (how many elements to load into the shuffle buffer), then dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True). For the model we set vocab_size = len(vocab), an embedding dimension of 256, and 1024 RNN units, and define a build_model function taking vocab_size, embedding_dim, rnn_units, and batch_size that returns a tf.keras.Sequential model. It has to start with an embedding layer — as you may recall from the first video, we go from the integer representation to a reduced-dimensional embedding that lets the model find relationships between characters. In the raw integer basis all of these vectors are orthogonal to one another, with no overlap between characters, whereas in the embedding space the vectors can be non-orthogonal, to some extent collinear — a bit of math-speak, but that is what is going on. The embedding takes vocab_size and embedding_dim with batch_input_shape=[batch_size, None], so it can accept sequences of arbitrary length. Then comes the recurrent layer — a gated recurrent unit, as the model summary will confirm — with rnn_units units and a glorot_uniform recurrent initializer (did I spell that right? yes), and finally a tf.keras.layers.Dense layer that outputs something of size vocab_size. We end the Sequential there and return our model.

Now that we have a model definition, we build the model by calling build_model with vocab_size=len(vocab) — one small thing I don't love about the tutorial is that it defines vocab_size and then passes len(vocab) anyway, but whatever — along with embedding_dim, rnn_units=rnn_units, and batch_size=batch_size. Then let's see what kind of predictions the untrained model produces on a single batch from the dataset. Keep in mind this will be quite rough, because there has been no training at all yet, so it is going to be garbage.
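Here is a sketch of the step-through loop, the batching, and the model-building function described above. The return_sequences and stateful flags on the GRU follow the official tutorial this video works from; the transcript itself only calls out the glorot_uniform recurrent initializer.

```python
# Step through the first few (input, expected output) pairs one character at a time.
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print('Step {:4d}'.format(i))
    print('  input: {} ({:s})'.format(input_idx.numpy(), repr(idx2char[input_idx.numpy()])))
    print('  expected output: {} ({:s})'.format(target_idx.numpy(), repr(idx2char[target_idx.numpy()])))

# Shuffle and batch the (input, target) sequence pairs for training.
BATCH_SIZE = 64
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

vocab_size = len(vocab)
embedding_dim = 256
rnn_units = 1024

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    """Embedding -> GRU -> Dense(vocab_size), as in the official tutorial."""
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.GRU(rnn_units,
                            return_sequences=True,
                            stateful=True,
                            recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])

model = build_model(
    vocab_size=len(vocab),   # same value as vocab_size above
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
```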
Still, let's just see what we get. We compute example_batch_predictions = model(input_example_batch) and print its shape, which should be (batch_size, sequence_length, vocab_size), and while we are at it we print a model summary so you can see what is what. In the terminal, the first attempt fails because I typed batch_inputs_shape instead of batch_input_shape on the embedding layer (line 77); with that fixed, the output shape is indeed batch size by 100 characters by vocab size, which makes sense, and the summary shows roughly 4 million parameters, all trainable, the majority of them in the gated recurrent unit.

Back in the editor we get rid of that print statement and the model summary and start thinking about training the model. The first thing we need is a loss function: it takes labels and logits and returns tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True) — formatted a little more nicely, since we are good Python programmers. Then we compile the model with the Adam optimizer and that loss, and set up checkpointing: checkpoint_dir is './training_checkpoints', and the checkpoint prefix is os.path.join(checkpoint_dir, "ckpt_{epoch}"), where epoch is a variable filled in by TensorFlow — Keras, in this case — so each checkpoint is saved under the epoch it came from. The checkpoint callback is tf.keras.callbacks.ModelCheckpoint with that file path and save_weights_only=True. We will train for 25 epochs, since this isn't the most sophisticated model in the world — for reference, I trained it for 100 epochs to generate the text you saw at the beginning of the tutorial, but it doesn't really matter all that much — with history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback]).

Running it produces "expected str, bytes ..., not tuple" from os.path.join, which had me scratching my head until I spotted a trailing comma at the end of that line — an implied tuple. With that removed, it is training, so I will let it run and come back when it finishes. After training, you can see that the loss has gone down by a factor of about three, from roughly 2.7 all the way down to about 0.77, so it did pretty well over 25 epochs — and because we checkpointed the model, we never have to rerun the training.
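A sketch of the untrained-model sanity check and the training setup just described:

```python
# Check the shape of the untrained model's output on one batch:
# it should be (batch_size, sequence_length, vocab_size).
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape)

model.summary()

# Sparse categorical cross-entropy over the raw logits from the Dense layer.
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)

# Save the weights after every epoch so the trained model can be reloaded
# later without repeating the training run.
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt_{epoch}')  # note: no trailing comma!
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

EPOCHS = 25  # 100 epochs were used for the text shown at the start of the video
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])
```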
The next and final order of business is to write the function that generates the predicted text — the output of the model — so we can get some idea of what sort of Shakespearean prose this artificial intelligence can produce. Back in the file, the first thing to think about is how to load the model. That means we no longer want the earlier build_model call, and we certainly don't want to compile or train the model again; we want to load it from a checkpoint. So we call build_model again with the same vocab_size, embedding_dim, and rnn_units, but with batch_size=1 — because when we pass in a piece of input text we want a single sequence of output text back, not a huge batch of them. Then model.load_weights(tf.train.latest_checkpoint(checkpoint_dir)) scans the directory and loads our latest checkpoint, model.build(tf.TensorShape([1, None])) builds the model for a batch size of one and an arbitrary number of input characters, and model.summary() prints the new model to the terminal.

Now for generating text. We define generate_text(model, start_string): we need to pass in the model we want to use to generate the text, as well as a starting string — a prompt for the AI, if you will. num_generate = 1000 is the number of characters we want to generate. input_eval is the starting string converted to its integer representation via char2idx and expanded along the batch dimension. We also need an empty list to keep track of the generated text and a temperature, which handles the surprise factor of the output: the predictions are scaled by it, so a temperature of 1 means just whatever the model outputs, a smaller number gives more reasonable, more predictable text, and a larger number gives you crazy, wacky stuff. We reset the model's states and then loop for i in range(num_generate): compute predictions = model(input_eval), squeeze out the batch dimension with tf.squeeze(predictions, 0), divide by the temperature, and draw the predicted id — the id of the character the model thinks comes next — with tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy(). (I initially forgot the 1 in num_samples, which would break things.) If you are not familiar with it, tf.random.categorical samples from a probability distribution over a set of discrete categories — here, the distribution defined by the model's predictions — and it may be familiar if you have watched some of my reinforcement learning tutorials, where the actor-critic methods in particular use the categorical distribution. We then feed the prediction back in as the next input with input_eval = tf.expand_dims([predicted_id], 0), append idx2char[predicted_id] to the generated text, and after the loop return the start string plus the joined generated characters. Finally, we print generate_text(model, start_string="ROMEO: ") — give it a space after the colon. Now for the moment of truth: we write the file out, head to the terminal, and run it. The model loads just fine and the text comes back quite quickly.
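A sketch of the reloading and generation code as described above, following the official tutorial:

```python
# Rebuild the model with batch_size=1 for generation and load the latest checkpoint.
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))
model.build(tf.TensorShape([1, None]))
model.summary()

def generate_text(model, start_string):
    """Generate characters from the trained model, seeded with start_string."""
    num_generate = 1000   # number of characters to generate
    temperature = 1.0     # < 1.0 = more predictable text, > 1.0 = more surprising

    # Encode the prompt and add a batch dimension.
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    text_generated = []
    model.reset_states()

    for _ in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0)   # drop the batch dimension
        predictions = predictions / temperature

        # Sample the next character id from the categorical distribution
        # defined by the model's (scaled) logits.
        predicted_id = tf.random.categorical(
            predictions, num_samples=1)[-1, 0].numpy()

        # Feed the predicted character back in as the next input.
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

print(generate_text(model, start_string=u"ROMEO: "))
```

Lowering `temperature` below 1.0 should make the output more conservative, as discussed above, while raising it makes the sampling wilder.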
In the output, King Richard III says "I will practice on his son... you are beheads for me, you Henry", Brutus replies "and welcome, general, and music the while", and Tyrrell makes an appearance — which makes me wonder whether this isn't actually the collected works of Shakespeare rather than a single play, because Brutus and King Richard sound like they come from a couple of different plays (Julius Caesar and whatever King Richard appears in; again, I'm an uncultured swine, so you let me know). What is really fascinating here is that this model started out with no information about the English language whatsoever. It knew nothing at all about English: we didn't tell it that there are words, we didn't tell it there are sentences, we didn't tell it that it should add line breaks or periods or any other punctuation. And within, I don't know, two and a half minutes of training, it produces a model that can string characters and words together in a way that almost, kind of, makes sense. "Bernadine says I am a Roman and by tenot and me" is mostly gibberish, but "I am a Roman" certainly makes sense; "Warwick: I have poison that you have heard" is kind of something; "to add my own important process of that hung in point" is a bit silly; "is pointing that my soul, I love him well" again strings words together in a way that almost works.

Returning to the question of which is better — a billion monkey-hours of typing or this AI — my money is solidly on the AI. These characters aren't put together randomly; they are put together probabilistically, and they kind of, sort of make sense. You can also see how more sophisticated models, like the OpenAI text generator built on transformer networks, can be better at producing text that makes even more sense — although what's interesting is that it is not a quote-unquote quantum leap (I hate that phrase) over what we've done here in just a few minutes on our own GPUs in our own rooms. That is quite cool, and it never ceases to amaze me. I hope you found this tutorial enjoyable; if you did, make sure to hit the subscribe button and the bell icon, because I know only 14 of you get my notifications, and I look forward to seeing you all in the next video.