The Challenges of Natural Language Understanding: A Deep Dive into Dependency Graphs and Entity Recognition
As we navigate the complex landscape of natural language understanding, it's becoming increasingly clear that traditional approaches to text analysis are no longer sufficient. The rise of dependency graphs and entity recognition has opened up new avenues for extracting meaning from human language, but it also presents a host of challenges that must be addressed.
One of the key issues is the presence of non-standard English in newspaper headlines. Unlike well-edited articles, which follow conventional grammar, headlines often deviate from standard syntax and punctuation. This means that traditional part-of-speech taggers and sentence-structuring tools may struggle to tag and parse them correctly. Furthermore, this problem is not unique to headlines; it also appears in other kinds of text, such as web articles written by novice authors.
This highlights a critical issue: natural language understanding requires the ability to extract meaning from a wide range of linguistic structures and styles. While dependency graphs have proven to be an effective tool for analyzing sentence structure and grammatical relationships, they may not always capture the nuances of human language. As we move forward in this field, it's essential that we develop tools that can handle the variability and complexity of real-world text.
A recent question posed by a listener shed light on the potential applications of dependency graphs in extracting triplets from sentences. The query concerned deriving entities such as "Sundar Pichai" and "Google" and relations like "resigned" from a given paragraph or sentence. This task goes beyond entity recognition: it involves identifying specific entities within text and connecting them through the relations that hold between them.
The listener's question sparked an interesting discussion about the role of dependency graphs in this process. It was noted that while dependency structures can provide valuable insights into grammatical relationships, they may not directly facilitate entity recognition. However, some researchers have explored the potential for using these graph-based representations to inform entity extraction tasks.
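To make the discussion concrete, here is a minimal sketch of how a dependency parse might inform triplet extraction, using spaCy. The relation list, the subject and object heuristics, and the helper name extract_triplets are illustrative assumptions rather than an established method, and the en_core_web_sm model is assumed to be installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

RELATIONS = {"resign", "join"}  # hypothetical lemmas of relations of interest

def extract_triplets(text):
    """Crude (subject, relation, object) extraction from dependency arcs."""
    triplets = []
    for token in nlp(text):
        if token.pos_ == "VERB" and token.lemma_ in RELATIONS:
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            # Proper nouns acting as direct or prepositional objects anywhere
            # under the verb, e.g. "resigned as the CEO of Google" -> Google.
            objects = [t for t in token.subtree
                       if t.dep_ in ("dobj", "pobj") and t.pos_ == "PROPN"]
            for subj in subjects:
                for obj in objects:
                    name = " ".join(w.text for w in subj.subtree)  # multi-word names
                    triplets.append((name, token.lemma_, obj.text))
    return triplets

print(extract_triplets("Sundar Pichai resigned as the CEO of Google."))
# Parse permitting: [('Sundar Pichai', 'resign', 'Google')]
```

The heuristics here are deliberately naive; handling passives, conjunctions, and multi-token entity spans would take more care, but the sketch shows how the arcs carry the subject-relation-object skeleton.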
A key takeaway from this conversation is that natural language understanding is a complex and multifaceted field that requires continued innovation and exploration. By studying the challenges of dependency graphs and entity recognition, we can develop more effective tools for extracting meaning from human language. Whether in the context of analyzing newspaper headlines or extracting triplets from sentences, our goal should always be to improve our ability to understand the nuances of natural language.
In another development, a recent blog post discussed the importance of grammatical relationships encoded in dependency structures. The author noted that these representations allow us to easily recover answers to questions like "who saw whom." This example highlights the potential for dependency graphs to facilitate entity recognition and question answering tasks.
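As a quick illustration of that claim, here is a sketch along the same lines, again assuming spaCy and its small English model, that reads "who saw whom" straight off the dependency arcs.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Alice saw Bob")

for token in doc:
    if token.lemma_ == "see":
        who = [c.text for c in token.children if c.dep_ == "nsubj"]
        whom = [c.text for c in token.children if c.dep_ == "dobj"]
        print("who:", who, "whom:", whom)  # who: ['Alice'] whom: ['Bob']
```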
As we move forward in this field, it's essential that we continue to explore new applications and techniques for natural language understanding. By doing so, we can develop more effective tools for extracting meaning from human language and improve our ability to navigate the complexities of natural language.
The root node pointing to "saw" rather than "Alice" in a particular example sparked some discussion about why the main verb is often chosen as the root node. While there isn't a single answer, it's likely that this is due to the structural properties of dependency graphs and the importance of maintaining tree structure. In these representations, the root node serves as a kind of anchor or hub, connecting various other nodes in the graph.
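The root is also easy to inspect directly; here is a one-off check, a sketch under the same spaCy model assumption as above.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Alice saw Bob")

root = [t for t in doc if t.dep_ == "ROOT"][0]
print(root.text)                        # saw -- the main verb anchors the tree
print([c.text for c in root.children])  # ['Alice', 'Bob'] hang off the root
```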
By recognizing the significance of the main verb in sentence structure, we can better understand how dependency graphs facilitate natural language understanding. This highlights the ongoing need for research into the complexities and nuances of human language, as well as the development of more sophisticated tools for analyzing and extracting meaning from text.
Finally, it's worth noting that our upcoming sessions will include a segment on recent developments in language processing and machine learning. If you have any topics or ideas related to these areas that you'd like to discuss, please feel free to let us know. We're committed to providing a platform for researchers and practitioners to share their work and explore new applications of natural language understanding.
"WEBVTTKind: captionsLanguage: enso today we will be discussing the Lex the next lecture lecture five in CS 224 so this is the title says the lecture is mainly and totally about dependency parsing and sentence structure we will see what that is so we are all that means languages or or sentence so so they are starting units so basically we are trying to express something or communicate something we start we use words and we use collection of words to to make up a sentence and express whatever we want to communicate so words are the building blocks or the starting units of language that we use for communication and then when we put few words together it forms a phrase so friend it is nothing perfect words single word and sentence and hormones can combine together to form bigger phrases so these are like building blocks off of the sentences that we use for communication so what we what we see here is some sort of parts of speech tagging for every word in the sentence and we we have to do that because understanding the structure of the sentence is very important part of natural language understanding so this call is phrase structure and as we can see here every word here is given a certain category label like we can see here the word that is a determinant if you must be familiar with the the parts of speech and these are nothing but each word being given a tag depending on what part of speech it belongs to and then that's for the word level and when it comes to phrases we have these certain categories called n P P P prepositional phrases and all these things which which is some sort of a category or representation for certain kinds of configuration like NP ants for a phrase which which is of this configuration determinant adjective noun that's what we see here the cuddly cat though is a determinant cuddly is an adjective and cat is a noun so this together forms a specific phrase category called NP and similarly we have another example here by the door which is called prepositional phrase and this again one thing that we notice here is you did we I think we lost you you next you'll see the slight snuff and the slides are there the connection is a little choppy it is it still bad or is it okey enough okay okay sure thank you so here we have different categories for phrases as well just like we have for words and what we see here on the left is the curly kak is an example for a phrase category called NP which is nothing but a configuration that that comes with this sort of utter speech see is called we have record lake at here been simple faced and the first of all in its a determinant and then we have an adjective and then we have a noun so similarly we have another phrase category called PP by the door and if we notice something here that that contains another nested phrase category NP so this is also possible where you have phrase categories in which we have nested components and that's what we see here we may see some sort of recursive you shouldn't have phrase categories the kudlick ad by the door so these are we are this is how we are using one or two phrases we are putting them together to form another sentence the cuddly cat by the door it's not complete but I think it can be something it can be small information then each of these phrases do individually so these light if you're wondering these are the same CS 224 course so the slides are in two forms one is where you don't have all these scribblings and these are the sites with with the scrip links that are shown during the lecture so 
As you see here, we have one group of words ("the", "a") and another group ("cat", "dog"). To make communication easier, every language has its own word classes and its own syntax: here we have a class of words called determiners ("the", "a"), another class called nouns, and likewise other classes. These are examples of how different words can be put together to form different configurations: we can say "the large cat in a crate", and we can say "the barking dog by the door", and depending on how we put things together we get different kinds of sentences. [Brief audio dropout.] As I was saying, these are some of the examples, so moving on.

What's happening here is that we have a sentence, each word in it gets assigned some category, and arcs convey how each word in the sentence depends on other words. Consider "Look in the large crate in the kitchen by the door", taking "look" as the root. "Look" raises a question, namely where exactly we have to look, and the next series of words conveys the answer, "in the large crate"; "in the kitchen" also depends on it. So that is one example.

Why do we need this sort of sentence structuring and these word and phrase categories? One reason is that we need to understand sentence structure for natural language understanding to be possible. We humans use language all the time to communicate with the people around us, and we intuitively parse the syntactic meaning of what is spoken; that comes naturally to us even for complex sentences. But for machines it is not easy, even for sentences that are trivial for us to understand, which is why we need sentence structuring that helps machines understand sentences the way they have to be understood. When we communicate, we express many things: it could be as simple as something you did in the morning, or as big as a complex idea you have invented, and making machines understand all these levels of language is not easy.

Here is one example: "San Jose cops kill man with knife". We were just talking about what each of these words is: this word is a noun, that word is a verb. Why do we really need that? Suppose this sentence is given to a system with no structuring or proper parser in place; it could be interpreted in many different ways.
One way to read the sentence is that there is a man who has a knife, and that person was killed by San Jose cops. Another way is that the San Jose cops killed a man, and they did it using a knife. Those are two interpretations of the same sentence, so there is ambiguity even in a sentence as simple as this one; now think about sentences that are more complex and have more words, which brings in many more possibilities. That is why we need some sort of parser in place that helps machines understand sentences correctly and get the right results.

We have another example here: "Scientists count whales from space". This again depends on how you read the sentence. We can easily tell that the intended meaning is that the number of whales was counted from space using some sort of technology; that is one way of thinking about it. But there is another reading, where "count whales from space" means the whales are present in space rather than on Earth, which paints a totally different picture. This is just another example of how ambiguity can be a problem in human language.

Here is another example: "The board approved its acquisition by Royal Trustco Ltd. of Toronto for $27 a share at its monthly meeting." This is a pretty long sentence compared with the previous ones, and it combines several phrases; there are four prepositional phrases here, "by Royal Trustco Ltd.", "of Toronto", "for $27 a share", and "at its monthly meeting", and depending on which verb or noun each one attaches to, the meaning changes. Now suppose we have a parser trying to structure this sentence: if the algorithm is not optimal, we are looking at a problem that grows exponentially, and anything exponential is not good.

Again: "Shuttle veteran and longtime NASA executive Fred Gregory appointed to board." Depending on how we read it, even for us, the meaning changes: the shuttle veteran and the longtime NASA executive can be one and the same person, Fred Gregory, who is appointed to the board, or it can be interpreted as two people, a shuttle veteran and a longtime NASA executive named Fred Gregory, who were appointed to the board. There are a few more examples like this in the slides to show how ambiguous sentences can be. One more, in case you haven't gone through the lecture or the slides: "Mutilated body washes up on Rio beach to be used for Olympics beach volleyball." The intended reading is that "to be used for Olympics beach volleyball" refers to the Rio beach, but the way one reads it, it could also convey that the body is being used for the Olympics beach volleyball, which is not the right interpretation. All of these examples show that sentence structuring and parsing are very important for natural language understanding.
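To see which attachment a trained parser actually picks for the "San Jose cops" headline, here is a small sketch, again with spaCy; the model choice is an assumption, and a different model may well attach the phrase the other way.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
doc = nlp("San Jose cops kill man with knife")

# The head of the preposition "with" reveals which reading the parser chose:
# head "kill" -> the cops used a knife; head "man" -> the man had the knife.
for token in doc:
    if token.text == "with":
        print(token.text, "attaches to", token.head.text)
```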
This is the structure of the dependencies, the semantic relations: as we can see, there is a tree structure here. The example is "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB", which has something to do with protein interactions; it is just an example of how a dependency tree is constructed. Note that what we see is just one possibility: depending on which parser you use and how that parser works internally, the same sentence might yield many other possible dependency trees. That means some selection has to happen among the different dependency trees for the same sentence, and the parser has to make sure it picks the right one, or the one with the highest score.

Here is a little history of dependency grammar: it goes a long way back, to 1962, and even to the 5th century BCE, when Pāṇini did work of this kind for Sanskrit, and there are several other milestones, such as the work of people like Chomsky. There are also different conventions for representing dependencies, for example in which direction the arrows are drawn; the slide shows one such convention.

Just as we have annotated data for images and other kinds of data, here too we need annotated data; only then can we train a model that can do dependency parsing on new data. Treebanks are an example of such annotated data. Building a treebank seems at first like a very slow process, but it has several advantages. Without one, each person might have their own way of drawing dependency trees for a language, which is not that helpful; treebanks make the work reusable. If there is already work done for a language in the form of a treebank, that previous work can be reused, and it even lets you build other tools, such as part-of-speech taggers, on top of the existing treebank. Those are some of the advantages.

A dependency tree also comes with some constraints. Here we have an example sentence, "I will give a talk tomorrow on bootstrapping", where some of the dependency arcs overlap or criss-cross; such crossing arcs are possible (the parse is then called non-projective), and while not that frequent, they are allowed. What is important is that we do not want cycles: if there is an arc from one word to another, we do not want another arc from the second word back to the first. Cycles must be avoided in these structures.
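The constraints just mentioned, a single root, one head per word, and no cycles, are easy to check mechanically. Here is a minimal sketch, assuming a parse is represented as a list of 1-based head indices with 0 standing for the artificial root.

```python
def is_valid_tree(heads):
    """Check a dependency parse given as head indices (0 = artificial root)."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False  # exactly one word may attach to the root
    for i in range(1, n + 1):
        node, steps = i, 0
        while node != 0:           # follow head pointers up to the root
            node = heads[node - 1]
            steps += 1
            if steps > n:
                return False       # walked too long: there must be a cycle
    return True

# "I ate fish": "ate" (word 2) is the root; "I" and "fish" attach to it.
print(is_valid_tree([2, 0, 2]))      # True
print(is_valid_tree([2, 3, 2, 0]))   # False: 1 -> 2 -> 3 -> 2 is a cycle
```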
There are different ways of doing dependency parsing. As we saw before, if we simply go through all possible configurations, it is going to take exponential time, which is not good. To avoid that we can use dynamic programming: you may know it from the usual algorithmic setting, where you reuse the solutions to subproblems and build on top of them, and there are graph-based algorithms as well. In the lecture these are not discussed in detail; the focus is rather on the parsers being used at the moment, such as the parser Google uses to parse web pages: transition-based parsing, also called deterministic dependency parsing. There is again a little history here about greedy transition-based parsing, but the lecture does not go into it in detail and instead switches directly to an example, since the example conveys the idea better.

What we have here are two things, a stack and a buffer. Initially, the buffer is nothing but the list of all the words in our sentence, and the stack simply starts with a ROOT element. On the right side we see the start state, where the stack holds ROOT and the buffer holds the words of the sentence, and then the items numbered 1, 2, 3 are the possible actions that can be performed on the stack and buffer. We say we are done when the buffer is empty; that is the finish state.

Here is one example. Initially the stack holds ROOT and the words of our sentence, "I ate fish", are in the buffer. The first action performed is a shift operation: shift takes the first element of the buffer and pushes it onto the stack, which is why we see the word "I" move from the buffer to the stack. Since that is the only word on the stack, and we need more words to build dependency arcs, another shift is performed, pushing the next word from the buffer onto the stack. Now we have two words on the stack, so we can apply either of the other two actions, called left-arc and right-arc. Here a left-arc operation is performed, which says that for the word "ate", the subject is "I"; that is one dependency figured out from this state. Once that is done, there is another shift, which pushes "fish", the only word left in the buffer, onto the stack. Then a right-arc gives us another dependency: "fish" is the object of "ate". Finally, one more right-arc operation attaches "ate" to ROOT, and the process ends there. If you haven't gone through the lecture, you may not get it the first time, and that's fine; the important thing to note is that it terminates, and along the way we obtain the dependency graph, that is, how these words depend on one another, using only a small list of possible actions. Humans are pretty good at identifying how words relate to each other and can quickly use these operations to see how the tree can be constructed.
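Here is a minimal sketch of that shift/left-arc/right-arc loop on "I ate fish", with the action sequence supplied by hand as an oracle; a real parser would have to predict each action, which is exactly the machine-learning question taken up next.

```python
def parse(words, actions):
    """Run an arc-standard transition sequence, collecting (head, dependent) arcs."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))   # move the next word onto the stack
        elif action == "LEFT-ARC":
            dep = stack.pop(-2)           # second-from-top depends on the top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":
            dep = stack.pop()             # top depends on the second-from-top
            arcs.append((stack[-1], dep))
    return arcs

# shift I, shift ate, left-arc (I <- ate), shift fish,
# right-arc (fish <- ate), right-arc (ate <- ROOT)
print(parse(["I", "ate", "fish"],
            ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]))
# [('ate', 'I'), ('ate', 'fish'), ('ROOT', 'ate')]
```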
But now think about this from a machine's perspective. We have these three actions, and the program has to identify which one to pick next. We start with the stack and buffer, and we know the first action will be a shift to get things going, but for the subsequent steps, how does the machine know whether to go with left-arc or right-arc? That is where machine learning comes in, and we will get to the part where we discuss how machine learning makes the selection of the next action easier.

The next slide discusses the conventional feature representation. We are familiar with this: before something like word2vec, we had feature representations that are very sparse and not really that good, and then we moved to representations using word vectors.

Now, to evaluate dependency parsing, for example on the sentence "She saw the video lecture", how do we know how well the parser has done? This again is like the usual ML setup: we have labeled data that gives us the correct parse, we have the predicted parse, and we simply compare the two to see how well the parser did. On the left, the box contains the actual, gold dependencies. Two things to note: all the words are indexed, which makes it easier to interpret and visualize how the dependencies go, and for the words "she" and "saw" there is an arc whose relation is subject, meaning the word "she" is the subject of the word "saw". At the edge of the box we have the dependency labels, such as subject, root, determiner, and object, and on the left side we have which word depends on which other word. The box on the right holds the output of some parsing algorithm, the predicted result.

To evaluate, we can simply go with accuracy over the dependencies, and there are two variants: UAS, the unlabeled attachment score, and LAS, the labeled attachment score. For the unlabeled score we completely ignore the labels like subject and focus only on whether the dependency arcs themselves are right, so just look at the numbers. Gold has a dependency between words 1 and 2, and the predicted result also has 1 and 2, so the parser got that one right; but further down, the predicted head for one of the words differs from the gold head, which is not right, and that is why the unlabeled score here is only 80%. That is one way of measuring accuracy. Then there is the labeled score, in which we also compare the predicted labels against the actual categories; here two more entries are wrong, which is why the labeled accuracy is lower than the unlabeled one. These are some of the metrics that can be used for evaluating dependency parsing.
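Here is a small sketch of the two metrics, with each parse represented as one (head index, label) pair per word; the example numbers are illustrative rather than the slide's exact figures.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for parses given as lists of (head_index, label) pairs."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)  # head AND label
    return uas, las

# "She saw the video lecture" (heads are 1-based, 0 = ROOT).
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "amod"), (2, "dobj")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.0%} LAS={las:.0%}")  # UAS=80% LAS=60%
```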
well why do we have to use this so there are three problems here they are missing not so what we have there and this okay yeah so there are some problems with the conventional dependency parsers which are addressed by numeral one is we saw that in in the conventional approach we have sparse representations whereas with you know once we have a distributed dimensions which makes things easier and so that's one of them so a neural dependency part so this is a paper which was this this some work which was done by chain and Manning Manning is the course instructor for serious 2-24 for this course so they have come up with this work a neural dependency partial which as you can see here so think that week notice here is that we have different parts as here the C is not only giving good results like for unlabeled Scobie and for labels code we have 89 which is comparatively better than all of but the important thing is this which is 654 so this is this is time it is taking this is number of sentence that's per second which is maximum and compared to the other parcel so we can see here that the CM c NM 2014 day april than the previous parsers so we have seen in the previous lectures have the distributed representations of word two back has made l has made a lot of things easier so the same thing is with dependency graph dependency as well so instead of using sparse representation we can go with distributed representations and where we can see here that similar words are being clubbed together was easy so good is the totally is a word which is from it totally different class which is why we can see here that it is it is at a different it comes in a different region in in the vector space vector space sorry and then we have another words here Congo which are verbs so you can see like using distributed representations we have we have the repair person mentally separated from among so these what we see here is that these are some son station file for getting the distributed representations all have put together to form the representations of the cat so these things are converted to vector embeddings so this this is a typical neural network architecture we have and the beginning and then we have a hidden layer and then we have a simple softmax oath so this is the moral architecture for this so dependency power sign for sentence structure yeah so as proposed the transition based mural dependency parsing has outperformed the traditional of the conventional approaches if that's because these neural networks are bigger and there we can see that we we know that the neural networks today are much deeper and bigger than how they used to be and so another thing here is that beam search is also used so what the neural net the way the neural network works in this case is that we have seen earlier where the the model has to choose between what step it has to perform next is it shift or is it left or is it right arc now the model might come up with one action always be right which is why at every state we need some sort of hypothesis which says that okay these are the possible two steps that could happen that can be done using beam search so these are all these are all the things that were discussed in the five I have another blog that I would like to go through quick I'll share the screen for that I hope you're all able to see Google a blog now so this this blog on syntax net which is within which is which is a partial so this is one example here Alice saw Bob Alice is a noun saw is a verb and Bob is a noun again 
I have another blog I would like to go through quickly; I'll share the screen for that. I hope you are all able to see the Google blog now. This is the blog on SyntaxNet, which is a parser. Here is one example, "Alice saw Bob": "Alice" is a noun, "saw" is a verb, and "Bob" is a noun again. We take "saw" as the root; "Alice" is going to be its subject and "Bob" its object. That is one way of parsing a sentence; in this case it is a very simple sentence, and even a simple parser will get it right. Here is another example: "Alice, who had been reading about SyntaxNet, saw Bob all day yesterday." This is a much bigger and more complex sentence.

The blog then asks why parsing is so hard for computers to get right, with this example: "Alice drove down the street in her car." It is the same sentence shown twice, but the parser has produced different structures for it; there are multiple possibilities. Let's focus on the word "in", the preposition. In the first case, on the left, "in" depends on the word "drove", which is a verb, so "drove" and "in" are linked. On the right, "in" depends on the word "street", which is a noun. How does that change things? The parse on the left means that Alice drove down the street while in her car: she is in the car and has driven down the street. The parse on the right, based on the way it is tagged and structured, means that the street is actually located in her car. So this is one example where the structure makes a lot of difference. The blog also walks through the same stack-and-buffer procedure we discussed, using the same three operations: shift, left-arc, and right-arc.

So that is all I had. I will go to the chat window now to see if there are any questions; if anyone wants to add anything or ask anything, we can do that now.

There is a question: "Part of the problem is these are all headlines of newspaper articles, and they do not follow all the rules of English." True, yes, but the problem is not just with newspaper headlines. When we deal with text, it could come from a website whose author has written a well-formed article that follows all the grammar, or it could be just another page full of text that doesn't. So we will see this problem not just with headlines but everywhere. Think about it this way: part-of-speech tagging, sentence structuring, and dependency parsing are a core part of natural language understanding, which means they matter for every application that uses language, whether chatbots or anything similar, so this becomes critical. Do we have any other questions?

[Listener] Hello, I just had a general question. I have a problem at my workplace regarding deriving triplets from sentences; do you think dependency graphs would help with that? Suppose you have a sentence like "Sundar Pichai resigned as the CEO of Google", and I want to extract three things, "Sundar Pichai", "resigned", and "Google",
from any given sentence: given a paragraph, I should be able to extract a triplet from it. I have a set of relations I am interested in, like "resigned" or "joined as director", something like that. Do you think a dependency graph will help with that?

[Host] I think what you are asking about is more of an entity recognition task, if I'm not wrong. [Listener] It's not just entities: there will be a subject, there will be an object, and there will be a relation that connects them. "Sundar Pichai" and "Google" are entities, whereas "resigned" is the relation that connects them. This is a problem I am facing, and I was thinking of using dependency graphs for it. [Host] Yes, I remember reading, here in this same blog in fact, that the grammatical relationships encoded in dependency structures allow us to easily recover answers to questions like "whom did Alice see?" and "who saw Bob?" Those kinds of queries can be answered using these dependency structures, so I think it must be possible to extract that kind of information using them as well; I haven't tried it myself, but I think it is possible. [Listener] Okay, I'll check out this blog. Thanks. [Host] I will be sharing it in the channel.

We have another question: why is the root pointing to "saw" and not to "Alice"? In most of the examples we saw, the main verb was chosen as the root; even in the earlier example, "look" was the root. I'm not sure exactly why the verb is always chosen, but most of the time the main verb of the sentence has some link or connection with all the other parts; in this example, "saw" has a dependency with both "Alice" and "Bob", so maybe that is why it gets picked as the root. [Participant] It's probably because that is the only way you can maintain the tree structure; otherwise you would lose the tree structure and end up with backward arcs and cycles. [Host] Yes, exactly.

If there are no other questions, I think we are done with today's session. The next lecture is going to be on neural networks; from here on we are going to have more neural network material. We also have a segment coming up: if anyone wants to present in any of the upcoming sessions, it does not necessarily have to be related directly to the lectures; if you find any topic relevant to NLP that you would like to discuss, please feel free to let us know so that we can plan accordingly. Thank you all for joining; see you all next week. Happy weekend!
we use collection of words to to make up a sentence and express whatever we want to communicate so words are the building blocks or the starting units of language that we use for communication and then when we put few words together it forms a phrase so friend it is nothing perfect words single word and sentence and hormones can combine together to form bigger phrases so these are like building blocks off of the sentences that we use for communication so what we what we see here is some sort of parts of speech tagging for every word in the sentence and we we have to do that because understanding the structure of the sentence is very important part of natural language understanding so this call is phrase structure and as we can see here every word here is given a certain category label like we can see here the word that is a determinant if you must be familiar with the the parts of speech and these are nothing but each word being given a tag depending on what part of speech it belongs to and then that's for the word level and when it comes to phrases we have these certain categories called n P P P prepositional phrases and all these things which which is some sort of a category or representation for certain kinds of configuration like NP ants for a phrase which which is of this configuration determinant adjective noun that's what we see here the cuddly cat though is a determinant cuddly is an adjective and cat is a noun so this together forms a specific phrase category called NP and similarly we have another example here by the door which is called prepositional phrase and this again one thing that we notice here is you did we I think we lost you you next you'll see the slight snuff and the slides are there the connection is a little choppy it is it still bad or is it okey enough okay okay sure thank you so here we have different categories for phrases as well just like we have for words and what we see here on the left is the curly kak is an example for a phrase category called NP which is nothing but a configuration that that comes with this sort of utter speech see is called we have record lake at here been simple faced and the first of all in its a determinant and then we have an adjective and then we have a noun so similarly we have another phrase category called PP by the door and if we notice something here that that contains another nested phrase category NP so this is also possible where you have phrase categories in which we have nested components and that's what we see here we may see some sort of recursive you shouldn't have phrase categories the kudlick ad by the door so these are we are this is how we are using one or two phrases we are putting them together to form another sentence the cuddly cat by the door it's not complete but I think it can be something it can be small information then each of these phrases do individually so these light if you're wondering these are the same CS 224 course so the slides are in two forms one is where you don't have all these scribblings and these are the sites with with the scrip links that are shown during the lecture so you can find them from the course page itself so as you see here we have one group of words here the I'm a and we have another group of words here cat-and-dog and so the width in order to make things easier and proper communication of proper communication we have like every language has its own classes its own syntax so here we have a certain track of what's called determinants that and a and similarly we have another 
class of words here called nouns so likewise we have other different traces what what's happening here is if we see these are all some examples of how different words can be put together to form different different configuration of sentences so here what we can do is we can say the large cat in a crate that's one example that we can from using Bayes words and then we can say the barking dog by the door so depending on how we put these things together we are going to get different sort of sentences different kinds of sentences and accordingly they're bored and free both boys and screen1 ah sorry again okay so as I was saying these are some of the examples so moving on so what what's happening here is we have have a sentence and each word in it has some sort of category that gets assigned and that somehow conveys how in the sentence is dependent on any other group of words we can see that looking a large crate in the kitchen by the door considering we are taking look as the root and then we are forming here which convey which word is dependent on which words so we here we say that look is a word that that's saying so that is somehow dependent in on these group efforts in the large crate look so we say look so very exactly do we have to look so the next series of words can be they convey the answer which is in the large crate and in the kitchen is also dependent on that so this is one example so why do we need this sort of sentence structuring and word or phrase categories so one of the reasons is that we need to understand those sentence structures so that natural language understanding is possible so we we humans widely and always all the time use language to communicate and speak with other people and intuitively parse the meaning of the syntactic meaning of this spoken by the people around us so that that's something which comes naturally for even sentences but when it comes to mission not easy even sentences which are very trivial for us to understand for the machines to understand so that is why we need intense structuring which will help the machines to better understand and at the end list and the sentences in the way that they have to be understood we when we are communicating with someone we express a lot of things it could be something about something that you did in the morning it could be as simple as that or it could be something that you have that you have been working on or complex idea that you have invented or some it could be as big as that so making machines understand all these all these different levels of communication or language is not that easy so here is one example so San Jose cops kill man with knife so we we were talking about what each of these word is like okay we have word here which is known we have a word here which is a verb so why do we really have to need that now let's say when this sort of sentence is given to the system and we don't have an structuring in place or or a proper part this this sentence could be interpreted in many different ways one way to think about this is Sandra Scott skill man with knife so one liquid to think about the sentence is that there is a man who was who is if and that person that person was K San Jose cops that's one way of interpreting it and then there is another way unjust cops kill man and they did that using a knife so quotation of the same sentence so we can see that there is alone here even with the sentence which is as simple as this now think about sentences which are more complex which have more words and that 
brings in a lot of a lot of possibilities so which is why we need some sort of partial or structure some sort of parser in place which will help the machines to better understand the sentences so that they will be able to get the right results so this is one example and we have another example here scientists count whales from space so this again depends on how have you read the sentence so we we will be easily able to say that the intention of the sentence is that from space we found the number of whales using some sort of technology that's one way of thinking about it but there is another way where we consider count whales from space which means these whales are present in space but not in ceará it's canvassed a total different painting so this is just another example to convey how ambiguity can can be a problem when sentences or human languages here's another example what we have here is the board approved it's by Royal Tuscola limited of Toronto for $27 at its monthly meeting so as we can see here this is a pretty long sentence when compared to the previous ones that we have seen so what we have here is we have different combination of phrases and we have we have almost four prepositional phrases all of these invite all Tresco limited after inter for $27 a share at its monthly meeting each of these is a prepositional phrase and now depending on which verb or which noun it associates or depends on the meaning is going to change so this is one example so now let's say that we have a parser in place and it is trying to figure out which is of structuring the sentence or parsing this sentence if if that parcel the algorithm of the more action is not optimal we are looking at a problem which is of exponential and we know that that is not good anything exponential is not something which which is much better so he again so shuttle veteran and longtime nasa executive credited to board so depending on how we did it even for us period the sentence meaning is going to change here the shuttle butter the executive can be can be one person the same person who is Fred correctly and who is appointed to the board and it can also be interpreted that we have two persons here one is shuttle veteran and another one is a long term NASA executive Fred Kerry and they were appointed to vote so there's another example so we have a few other examples like this too to explain or to show how the sentences can be ambiguous this is another example and you have another example here so and this again so these are there's multiple examples to convey how ambiguity can be very very tricky for natural language understanding this just if you haven't gone through the lecture or the slides mutilated body washes up on your beach to be used beach volleyball so now to be used for Olympics which volleyball bring to it is referring to on Rio beach it but the way one rates it or the way one understands it it could also convert body is being used for the Olympics beach volleyball which is which is not the right right way of interpreting this sentence so all of these examples proved in structuring and are saying is very important natural language understanding so this this is the structure of the dependency or the semantic relations as we can see here there is some sort of tree structure here the example is that the results demonstrated that ice so with this this is another example where we see the tree structure here so the results demonstrated interacts I hope I'm reading this reading tracks that I see it is a meekly yeah the 
results demonstrated that Chi C interacts rhythmically with us as a chi a and Chi B so these got something to do with protein interaction so this is just an example so this how the dependency tree is constructed now we can see here this is just one possibility now depending on what parser you use and how the parts are in turn works for the game for the same sentence we might have many other possible dependency trees as well which means sort of selection or choosing that has to happen like the creation or the different dependency trees for the same sentence and the PAS has to make sure that it picks the right one or the one with the highest concore so here is a little bit of our dependency grammar it goes back to why time back 1962 and even in 5th century where panini has done some work on this for the Sanskrit language and we have several other examples here the works of some of the people's like Chomsky so there are different ways of representing this some people I like the way the arrows are represented so this is just an example for that so this like we have annotated data for images or for other other kinds of data even for this we need some sort of annotated and annotated data only then we can have some sort of model that can be trained on that which can do the dependency which can do the power saying for for the new data so this is example of such annotated data that uses the white tree banks we will see that we will see that so so initially when you are getting stuff treebank it seems to be a very so slow process but several advantages like let's say we the case to have these kind of these kinds of dependent or different dependency trees and as language language let's do the each person might have different way of doing which which is not really that helpful but what these three banks do is that they make their reusable like if I'm a person who is working on something and I if there is already some work that is already done related to those related to that language I think that the the previous work can be reused when it's in the form of three banks so it lets you build other parts of speech taggers using the existing tree banks so those are some of these are some of the advantages so here we have each with some constraints yeah so this here we can see that the here we have an example sentence I will give a talk tomorrow and there is also some sort of overlapping or Criss crossing of some of some of the dependency plots so the this is possible it's not not that that frequent but what's important is that I really want cycles to be seen here like if you have a dependent but slight let's say here we have the difference between the words on bootstrapping and you have you have the arrow from which goes like this and then you don't want another same from the same word to the same words in the reverse direction so you don't want to have cycles you should it should avoid the cycles in this kind of situations there are different ways of doing dependency parsing so as we have seen before if you are simply going by all the all the call all the possible configurations we are going to have okay we're going to have an exponential it's going to take exponentially exponential time which is not good so to avoid that we can use dynamic programming so if you are even so you must be aware of dynamic programming from the normal algorithm algorithmic approach where you use the solutions to subproblems or the Riu you reuse the solutions to subproblems and build on top of that you can use dynamic 
programming in graph algorithms so there are different to do it in the lecture these things are not discussed in detail but rather at the partials that are being used at the moment when the part is that Google uses to parse the web pages and websites but the transition based parsing or deterministic dependency parsing so here again is a little bit of history like greedy transition based parsing in the lecture with its own it was not the detail rather we in the lecture we see that they directly switch to this example as it's believed that the example canvas Knesset better so what we have here is we have two things one is a stack and a buffer to initially start with buffer it's nothing but the list of all the words that we have in our data or let's say in our sentence initially start stack is simply going to have a root element and then we have so here we can see on the right side so we have start where the stack starts with root and then we have a buffer which is nothing but a list of all the words in our sentence and then what we see here 1 2 3 are this are the possible actions that could be performed on stack and buffer and we say that we are done when the buffer is empty so that's a finished state or the indicator that we are done so here is one example so what's happening here is that initially we get started with this where root in our stack and we have the words in our sentence I ate buffer so the first action that is performed here fixed operation so what shift does is it takes the first element in the buffer and that gets pushed to the stack so that is why we see the word I moving from buffer to stack that's what happened then we have since this is the only word that we have and in order to build dependency girls will be needing more words so we have another shift operation which means another word from the buffer is being pushed into the stack so we now have two words which means they can do and so any of the other two actions which is left are called right out and what's happening here is left arc operation is performed which means we are saying that the word 8 is for the word aide the subject is I so this is one dependency that was figured out from this from this state that we have here so here left arc is performed and then we once that is done we have another shift operation which means we now have only fish in the buffer and we have the word 8 in stack so again a similar we have where we have a right ox here which means we have another dependency here the word fish is dependent on 8 like fish is the object of 8 so that's what this means object 8 so then we have another writer cooperation here so even if if you store if you haven't gone to the lecture you may not get it or the first time you are going it's so that's fine the important thing to note here is that it ends here and we somehow we somehow need to do the dependency graph or how these words are dependent a monkey so for that we see that we have a list of actions that are possible so sins are pretty good at identifying how these things are related to each other quickly able to use these operations and see how can be constructed but now think about this from a mission perspective we have these three actions here and the program of the machine has to identify which one to pick next so we started its stack and buffer and we know that initially it would be shift to begin things with but for the subsequent steps how does the machine know whether to go with left arc or right arc so that's where the machine learning comes in 
so this are we will get to the part where we discussed how machine learning makes that make the selection of action or the next action easier so we have another in this slide what we see here is that in that this discusses the conventional feature representation we are familiar with this where initially before using something like word to back we had we had wooden bearings which are very sparse something moorings which are not really that good and then we move to the presentation using word works so to evaluate the dependency parsing for example given a sentence she saw the video lecture how do we know that the partial has this again is is more like how the how ml we have labeled data which give us the correct label will have the predicted labels and then we are simply going to compare these two and have good the parser has done so on the left here what we see is this this box contains the actual or the right dependences there are two things here one is we can see the all these words are indexed so that it makes it easier for us to interpret or visualize how these dependences go go about so we see that we the words she and saw there is some they there is there's a curve here for these two words and the relation between those two is subject so she the word she is the subject here for the word saw so that's what this this conveys so on the extreme end of this this box we have these word categories like subject root determinant and in an object and what we have here on the left side is how which word is dependent on which other word so that's what we have here the right side box we have the part let's say you have you some parsing algorithm and these are the results that we we and now to evaluate the dependency parsing one we can simply go with accuracy like correct dependence graph dependences that is one way of doing it and then we also have something called you a a S which is unlabeled score and labeled score so comes from unlabeled score what we are going to see is you're not going to consider this half here and subject to you are completely going to ignore this your instead of only going to focus on whether the dependency plots are accurately done or not so just focus on the numbers here so we have one two here which means there is some sort of dependencies between the words one and two and the past or the predicted result also has one and two which means it got the you it's right in the in but here in the third example we see that for the words da and for the words three and you three and four so which is not right so which is why for the unlabeled score we get 80% we only get 80% accuracy so that's one way of measuring the accuracy and they have something called labelled score in which just just give me a minute please you yep what we are going to compare is we are going to compare the predicted categories actual categories and we share route debt whereas here we see that four and five wrong which is why the accuracy here is not that good as compared to you but these are some of the metrics that can be used for evaluating dependency parsing oh why train a neural dependency parser well why do we have to use this so there are three problems here they are missing not so what we have there and this okay yeah so there are some problems with the conventional dependency parsers which are addressed by numeral one is we saw that in in the conventional approach we have sparse representations whereas with you know once we have a distributed dimensions which makes things easier and so that's one 
So, why train a neural dependency parser? Why do we have to use this at all? There are some problems with the conventional dependency parsers that are addressed by the neural one; three problems are listed here. One is that, as we saw, the conventional approach uses sparse feature representations, whereas with a neural parser we have dense, distributed representations, which makes things easier; that's one of them (the slide also lists that the hand-designed features are incomplete and that computing them is expensive).

So, a neural dependency parser. This is work done by Chen and Manning; Manning is the course instructor for CS224n, this course. What we notice here is that among the different parsers compared, theirs is not only giving good results, with 89 for labeled score, which is comparatively better than all the others, but the important thing is this number, 654: the number of sentences parsed per second, which is the maximum compared to the other parsers. So we can see that Chen and Manning 2014 does better than the previous parsers on both counts.

We have seen in the previous lectures how the distributed representations of word2vec made a lot of things easier, and the same applies to dependency parsing as well: instead of sparse representations we can go with distributed representations, where similar words are clubbed together. "Good" is a word from a totally different class, which is why it sits in a different region of the vector space, and then we have other words here, like "come" and "go", which are verbs and cluster together. So the features we extract from a parser configuration, words, part-of-speech tags, and arc labels, are all converted to vector embeddings and put together to form the input representation. On top of that is a typical neural network architecture: the embedding input at the beginning, then a hidden layer, and then a simple softmax output over the actions. That is the whole model architecture.

Dependency parsing for sentence structure: as proposed, transition-based neural dependency parsing has outperformed the traditional, conventional approaches. That's partly because neural networks today are much deeper and bigger than they used to be. Another thing here is that beam search is also used. The way this works in the neural setting: we saw earlier that the model has to choose which step to perform next, shift, left-arc, or right-arc. The model's single top choice might not always be right, which is why at every state we keep a small set of hypotheses, the few most promising possible next steps, and that can be done using beam search.
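A toy sketch of that architecture, assuming concatenated feature embeddings as input and the cube activation from the Chen and Manning paper; the dimensions and random weights here are placeholders for illustration, not the paper's trained values:

import numpy as np

# Toy Chen & Manning-style scorer: concatenated feature embeddings
# -> hidden layer with cube activation -> softmax over 3 actions.
rng = np.random.default_rng(0)

n_features, emb_dim, hidden_dim, n_actions = 18, 50, 200, 3
E = rng.normal(0, 0.01, (5000, emb_dim))   # toy embedding table
W1 = rng.normal(0, 0.01, (hidden_dim, n_features * emb_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(0, 0.01, (n_actions, hidden_dim))

def score_actions(feature_ids):
    x = E[feature_ids].reshape(-1)         # concatenate the feature embeddings
    h = (W1 @ x + b1) ** 3                 # cube activation, as in the paper
    logits = W2 @ h
    p = np.exp(logits - logits.max())
    return p / p.sum()                     # softmax over SHIFT / LEFT / RIGHT

probs = score_actions(rng.integers(0, 5000, n_features))
print(probs)  # e.g. pick probs.argmax() as the next transition

At each parser state the features would be the words, POS tags, and arc labels read off the current stack and buffer; greedy parsing takes the argmax action, while beam search keeps the top few scored action sequences instead.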
what's happening here is that as it's been here yeah the example so why is passing so hard for computers to get right able to see this clearly we have this example analyst who had been reading about yeah Alice drove down the street in her car this and it's it's the same center again here but we can see that the dependency the partial has different sort of tagging for both for the same sentence we have multiple possibilities here this let's focus on the word in which is the preposition here in the first case on the Left we see that the word n has a dependency with the word drove which is a verb so these two are linked here drove and in there isn't the right what we have is the word in has a dependency with the word Street which is a noun so however how is that going to change things so the first in the first sentence we see that Alice drove down the street in her car which means that Alice is driving and she has driven down down the street so she is in the car and she has gone down the street that's how it that's what it means now the sentence on the right depends based on the way it's tagged or structured it means that Alice has had a lisa has driven down the street and the street in here it conveys that the street is actually in her car so this is one example we have the way the a structure makes a lot of difference here we have in a Dan stack and buffer thing that we discussed using the same three shift left talk and write up so this is an example and yeah so this is all that was just I I will go to the chat window now to see if there are any questions so if anyone wants to add anything or ask any questions I think we can do that now so there is a question a part of the problem is these are all headlines of newspaper articles and they do not follow all the rules of English true yes so the problem is not just with the headlines of newspaper articles so when we are dealing with texts they could be from website in which the author is someone who has written every who who has a very well-written article that follows all the grammar whereas it could be just another web page which is full of text but it doesn't follow all the grammar so I think the problem the problem is we will be seeing just with headlines of news for particles but everywhere in which so think about think about it this way so for your part of speech tagger or for any other finality sentence structuring and dependent something which is the core part of natural language understanding so which means for every action that uses language like it could be it could be for chat boards or it could be praying or it could be for any of such examples I think so becomes critical there do we have any other questions you hello hello yeah um I I just had a general question so I have a problem in my workplace so it's regarding like deriving triplets from sentences so do you think in this dependency graph would help in that so suppose you have a sentence like sundar pichai resigned as the CEO of Google and I want to derive I mean extract the leg sundar pichai and then resigned and then Google these three things so I want to extract from a sentence so given any Center given a paragraph I should be able to extract a triplet from that paragraph it could be so I have a set of relations which I am interested in like resigned or joined as director something like that so do you think dependency graph will help in that I think what you're asking about is more of an entity recognition if I'm not wrong it's not just entity so I have to extract entered 
So we have another question: why is the root pointing to "saw" and not to "Alice"? In most of the examples, the root was chosen as the main verb; even in the example we just saw, the verb was the root, and in the other examples too the root is often the verb. I'm not sure exactly why it is always the verb, but I'd guess it's because, in this example, the main verb has some sort of dependency with both "Alice" and "Bob"; the main verb of the sentence has some link or connection with both sides, so maybe that's why it gets picked as the root. From the audience: it's probably because of the tree structure, because otherwise you won't be able to maintain the tree structure. Was there a question? "Hi, I was saying it's probably because that's the only way you can maintain the tree structure." Yes, otherwise you would lose the tree structure and end up with back arcs and a more complex graph.

So if there are no more questions, I think we are done with today's study group. The next session is going to be on recurrent networks, I think; from here on we're going to have more neural network material. We also have a segment coming up, so if anyone wants to present any of the upcoming sessions, please do; it doesn't necessarily have to be directly related to the lectures. If you find any topic that is relevant to NLP and you would like to discuss it, please feel free to let us know so that we can plan accordingly. So yeah, thank you all for joining, see you all next week, and happy weekend.