How to paraphrase text in Python using transformers

Using the Pegasus Model to Generate Paraphrased Text

This tutorial shows how to paraphrase text in Python with the Hugging Face transformers library and the Pegasus paraphrase model. The test sentence, taken from the description of another video, was "In this video, I will be showing you how to build a stock price web application in Python using the streamlit and yfinance library." After applying the Pegasus model with the number of return sequences set to five, we obtained five candidate paraphrases of the sentence.

One limitation of the custom function is that it takes only one sentence at a time. To work around this, we used the sentence splitter to split a three-sentence paragraph into a list of sentences. A for loop then passed each sentence to the custom function, requesting one paraphrase per sentence, and appended each result to an initially empty list called "paraphrase".
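
The split-and-loop pattern can be sketched without downloading the model. In this minimal sketch, a regular expression stands in for the sentence splitter, and `fake_get_response` is a hypothetical placeholder that mimics only the return shape of the custom function (a list of strings). Neither is the tutorial's actual code, and the sample paragraph is made up for illustration.

```python
import re


def split_sentences(paragraph):
    """Naive stand-in for a sentence splitter: break after ., ! or ? plus whitespace."""
    return re.split(r"(?<=[.!?])\s+", paragraph.strip())


def fake_get_response(sentence, num_return_sequences):
    """Hypothetical placeholder for the Pegasus call: returns a list of strings."""
    return [f"<paraphrase {i + 1} of: {sentence}>" for i in range(num_return_sequences)]


# Made-up three-sentence paragraph (not the text used in the video)
context = "First sentence here. Second sentence follows! Is this the third one?"

sentences = split_sentences(context)

paraphrase = []
for sentence in sentences:
    a = fake_get_response(sentence, 1)  # request one paraphrase per sentence
    paraphrase.append(a)                # appends a list, not a string

print(len(sentences))
print(paraphrase)
```

Because each call returns a list, `paraphrase` ends up holding one single-item list per sentence rather than plain strings.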

After the loop ran over all three sentences, the list held three elements, one per sentence. Each element was itself a one-item list, however, so the result was a list of lists. To fix this, we first removed the inner layer of brackets, then combined the three sentences and stripped the remaining outer brackets. The output was a single paragraph with the same three-sentence structure as the input text.
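
A minimal sketch of the flattening and combining steps, assuming the loop produced one single-item list per sentence. The sample strings are invented for illustration, and `flat` is a hypothetical variable name; the code shown in the video may differ.

```python
# Hypothetical output of the loop: one single-item list per sentence
paraphrase = [
    ["This is the first paraphrased sentence."],
    ["Here is the second one."],
    ["And this is the third."],
]

# Remove the inner layer of brackets by taking the only string out of each inner list
flat = [candidates[0] for candidates in paraphrase]

# Combine the sentences into a single paragraph
paraphrase_text = " ".join(flat)
print(paraphrase_text)
```

Joining with a space gives a plain string directly, so no brackets are left to strip afterwards.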

The video reads out two of the five candidates for the single-sentence example:

* "In this video, I will show you how to use the streamlit and yfinance libraries", which shortens "I will be showing you" to "I will show you"

* a variant that keeps "how to build a stock price web application" in place of "how to use the streamlit", so the wording is reshuffled rather than simply shortened

Setting the number of return sequences to one yields a single paraphrase instead of five.

Finally, we assigned the result to a variable called "paraphrase_text" and used the strip function to remove the leftover single quotation marks. The paraphrased paragraph reads quite differently from the original. The three-sentence paragraph is not quoted here, but the single-sentence example shows the kind of change to expect:

Original Sentence:

In this video, I will be showing you how to build a stock price web application in Python using the streamlit and yfinance library.

Paraphrased Version (one of five candidates):

In this video, I will show you how to use the streamlit and yfinance libraries.
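
The bracket and quotation-mark stripping can be compared with a plain join. The sketch below is a reconstruction under the assumption that the cleanup converts a one-element list to text and then strips characters from both ends; the helper names and sample sentences are hypothetical, and the video's exact code may differ.

```python
def strip_cleanup(sentences):
    """Join sentences, turn the one-element list into text, then strip brackets and single quotes."""
    joined = [" ".join(sentences)]
    return str(joined).strip("[]").strip("'")


def join_cleanup(sentences):
    """Join sentences directly, which never introduces brackets or quotes."""
    return " ".join(sentences)


plain = ["This is one sentence.", "This is another."]
tricky = ["It's one sentence.", "This is another."]  # contains an apostrophe

print(strip_cleanup(plain))   # brackets and quotes removed
print(strip_cleanup(tricky))  # double quotes remain around the text
print(join_cleanup(tricky))   # clean text
```

When a sentence contains an apostrophe, Python displays the string with double quotes, so stripping single quotes leaves them in place. Joining the strings directly avoids this edge case.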

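The steps can be put together as follows. The get_response function follows the example code on the model page; the generation settings (num_beams, max_length) are assumptions and not necessarily the values used in the video. The script needs the transformers, sentencepiece, and sentence-splitter packages, and it downloads the model on first run:

```
import torch
from sentence_splitter import SentenceSplitter
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Load the pre-trained paraphrase model and its tokenizer (GPU if available)
model_name = "tuner007/pegasus_paraphrase"
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(device)

def get_response(input_text, num_return_sequences):
    """Return a list of paraphrases for a single sentence."""
    # max_length and num_beams are assumed settings, not taken from the video
    batch = tokenizer([input_text], truncation=True, padding="longest",
                      max_length=60, return_tensors="pt").to(device)
    generated = model.generate(**batch, max_length=60, num_beams=10,
                               num_return_sequences=num_return_sequences)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# Define the original text
context = "In this video, I will be showing you how to build a stock price web application in Python using the streamlit and yfinance library."

# Split the paragraph into sentences
splitter = SentenceSplitter(language="en")
sentences = splitter.split(context)

# Paraphrase each sentence; get_response returns a list, so keep its first item
paraphrase = []
for sentence in sentences:
    response = get_response(sentence, 1)
    paraphrase.append(response[0])

# Combine the paraphrased sentences into a single paragraph
paraphrase_text = " ".join(paraphrase)
print(paraphrase_text)
```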

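This code prints the paraphrased text as a single clean paragraph. The exact wording depends on the model and the generation settings; for the sentence above, the video reports a paraphrase along these lines:

In this video, I will show you how to use the streamlit and yfinance libraries.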

The best way to learn data science is to do data science. Please enjoy the journey!

Wouldn't it be great if you could use AI to paraphrase your document text? If that sounds like fun, then this video is for you, because today I'm going to show you how you can paraphrase text using the Hugging Face transformers library in Python. Also, make sure to stay until the end of the video for more details on how you could win a copy of an ebook from Packt Publishing. And so, without further ado, we're starting right now.

Okay, so the first thing that you will want to do is to fire up Google Colab or a Jupyter notebook, and I'm going to provide you the links in the video description. As mentioned, today we're going to be paraphrasing text using the transformers library from Hugging Face in Python. This is based on the Pegasus model, whose name is an acronym for "Pre-training with Extracted Gap-sentences for Abstractive Summarization Sequence-to-sequence models". The Pegasus model is provided via the Hugging Face transformers library. This is the webpage of the Pegasus paraphrase model, and you can see here that there is some example code, which we're going to be modifying in our tutorial today. This is the GitHub of Pegasus, which is a project from Google Research, and if you're interested in the research article, it is provided here. All of the links to this, as well as the ebook that we're going to be giving away, are made possible by Packt Publishing, courtesy of Ravit Jain, who is also the host of The Ravit Show on LinkedIn and YouTube. Further details on how you could win a copy of the ebook will be provided at the end of the video, so stay tuned for that.

Okay, so let's start. Before proceeding further, you want to make sure that in Runtime you change the runtime type to GPU: from the drop-down here, make sure that GPU is selected, and then click on Save. Now let's connect. The code won't work if you don't have the GPU activated here in Colab.
And so, let's start by first installing the sentence-splitter library. This will allow us to split paragraphs into individual sentences, because the paraphraser that we'll be using today accepts individual sentences, so we'll have to split the text up into individual sentences that will serve as the input. Now we're going to install the transformers library. All right, and now it's installed.

This block of code here was taken from the Hugging Face website. As you can see here, it is from tuner007 and it's called pegasus_paraphrase, and this will be mentioned in the code here in the model name. The entirety of this will be pasted into this particular code cell. It's going to be importing PyTorch, and from the transformers library it's going to import PegasusForConditionalGeneration and also PegasusTokenizer. Here it's going to specify that it's going to be using CUDA, particularly the GPU. Then the tokenizer is defined here, and the model that we imported just a moment ago is defined here in the model variable. The custom function accepts the input text and also the number of return sequences, which means: for a given input sentence, how many different paraphrased sentences should it generate? Let's say you want to generate only one sentence, meaning that you input one sentence and you generate one sentence; then this will have a value of one. But if you want to generate more than that, you can go ahead and specify five or ten different sentences.

Okay, let's proceed and run the cell here. The sentencepiece library, let's see. Okay, so we'll have to install this; let me include it here with pip install. Okay, let's try it again. So it's currently downloading the model. The model was pre-trained, and we're going to be using this pre-trained model called Pegasus for the paraphrasing. All right, and so it's finished. Let's go ahead and pre-process the single sentence here.
The text will be taken from the description of one of the videos on this channel, and let's go ahead and start with paraphrasing five different possibilities. Okay, so it's not working. I think that the GPU was not assigned to us, so let me go ahead and click on Save and restart it again. So let's restart and run all. No, actually, I could do it step by step; just restart the runtime. With Google Colab, sometimes when you request a GPU, a GPU is not assigned to us. Normally, if a GPU is assigned to us, we should be able to see more than two models being loaded, as you can see here. When the GPU is not assigned to us, it's going to be using the CPU, but unfortunately the CPU is not working here on Colab. All right, and so let me redo it again. There you go, it's downloading more than two, so I think it should be assigned a GPU. All right, and so let's proceed further.

All right, now it works. Here you see that, based on the input text variable here, this particular sentence will be paraphrased into five possible paraphrased versions. The original sentence was "In this video, I will be showing you how to build a stock price web application in Python using the streamlit and yfinance library." The paraphrased version starts out in the same manner, "In this video," comma, and instead of saying "I will be showing you", it says "I will show you how to use the streamlit and yfinance libraries". The other possibility, instead of "to use the streamlit", said "how to build a stock price web application", so it kind of reshuffled the ordering or mention of some of the words here. So here we have five possibilities, and if we change it to one, then as expected you're going to be getting one paraphrased version.

Okay, and so let's say that I have more than one sentence. I have here the first sentence, and then here is the second sentence. Okay, I think it's too fast. The second sentence ends here, and then the
third sentence is right here. Okay, so we have three sentences, but the limitation of the custom function here is that it can take one sentence at a time, and so we're going to be using the sentence splitter here. The paragraph that you see here, assigned to the context variable, will be split up into multiple sentences, so it will essentially become a list comprising three elements, where each element is a sentence. Then we're going to be applying a for loop to this list of sentences. Let me show you: it has three members, as I've mentioned. The for loop iterates through each of the sentences, and for each iteration it runs the get_response custom function, using the iterated sentence as the input argument. We're specifying that we're going to generate one paraphrased version, and the generated paraphrased version will be assigned to the a variable. The a variable will then be appended to an empty list that we created beforehand, so with each iteration the paraphrased version is added to the paraphrase variable. Let's do that.

Let's have a look at the paraphrase variable. Here we see that it has three elements, but strangely it is essentially a list of lists. Okay, so we're going to be taking out the first layer of brackets by using this block of code here. You see that the bracket here is missing now, the bracket here is missing, and the bracket here is also missing, as shown here, but then we still have the outer bracket. Then we want to combine all of the elements here into a single paragraph, so we're going to go ahead and use this block of code here, and on the second line we're going to be stripping out the brackets. Now you have the output as a single paragraph, and it looks like the input that we have in the context variable. Okay, so it's comprised of three sentences, and now we have the
paraphrased version, so let's compare. Why don't I assign this to another variable; let's call it paraphrase4, or let's just call it paraphrase_text. Okay, so I'm using the strip function and we're taking out the extra single quotation marks, so why don't I add that to the prior variable. All right, there you go. Here you see the original paragraph and the paraphrased version. Okay, so it looks totally different now with the paraphrased version.

Okay, so the moment that you've all been waiting for: it's the giveaway of three free ebooks of the transformer book right here, Transformers for Natural Language Processing. To win a free ebook copy, let me know in the description down below how you're intending to use this particular paraphraser from the Pegasus model. I'd like to hear from you on how you intend to use this, and I'll be selecting three lucky winners to win a free ebook copy of Transformers for Natural Language Processing. Winners will be announced in the comment section of this particular video and also in the community post of this YouTube channel. If you're finding value in this video, please give it a thumbs up, subscribe if you haven't already, and make sure to hit the notification bell so that you will be notified of the next video. And as always, the best way to learn data science is to do data science, and please enjoy the journey.