DeepMind Reinforcement Learning

**The Power of Generative Query Networks**

These features aren't hand-coded. The generation network is then asked to predict, aka imagine, a scene from previously unobserved viewpoints, given the scene representation created by the first network. The generator essentially learns how to fill in the details from that highly compressed, abstract representation, inferring likely relationships between objects and regularities in the environment.

**Understanding the Relationship Between Two Networks**

I liken the relationship between these two networks to the relationship between a crime scene witness and a sketch artist. The witness remembers fragments of a criminal: their height, their hair color, their choice of Linux distro. The sketch artist must discern the full picture of the criminal from those few details, inferring the other likely traits based on what the witness provides.

Put more formally: first the algorithm collects a set of different viewpoints from the training scene. Each viewpoint is an image, and each is fed sequentially into the representation network, a convolutional neural network, the architecture best known for image classification tasks. An image is a matrix of numbers, and through a series of matrix operations the convolutional network continually transforms that input matrix. The result is a representation. It creates as many representations as there are viewpoints, then performs a summation over them to produce a single scene representation, r.

This representation is then fed to the generation network. For the generation network they used a recurrent neural network, since recurrent networks are capable of processing sequences of data. During training, recurrent networks aren't just fed the next data point in a data set; they are also fed the learned state from the previous time step, which is what gives them knowledge of the past. They learn from what they've learned before, allowing for a contextual understanding that incorporates time into their predictions.
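To make that data flow concrete, here is a minimal sketch in PyTorch. It is illustrative, not DeepMind's actual architecture: the class name `RepresentationNet`, the layer sizes, and the 64x64 image size are all assumptions, and the real GQN also conditions each observation on its camera pose, which is omitted here for brevity. What it does show is the key step from the paragraph above: one representation per viewpoint, then a summation into a single r.

```python
import torch
import torch.nn as nn

class RepresentationNet(nn.Module):
    """Toy convolutional encoder: one viewpoint image -> one representation."""
    def __init__(self, repr_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                                # 16x16 -> 1x1
        )
        self.fc = nn.Linear(64, repr_dim)

    def forward(self, image):
        h = self.conv(image).flatten(1)  # (batch, 64)
        return self.fc(h)                # (batch, repr_dim)

rep_net = RepresentationNet()

# Five observed viewpoints of one scene (batches of one 3x64x64 image).
viewpoints = [torch.randn(1, 3, 64, 64) for _ in range(5)]

# One representation per viewpoint, then a summation into a single r.
r = torch.stack([rep_net(v) for v in viewpoints]).sum(dim=0)  # (1, 256)
```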

**The Role of Latent Variables**

Since they wanted an agent that could predict the next frame in a sequence of 3D environment frames, they needed a sequence model. The generator network also used what's called a latent variable to mathematically vary its output. The generator produced a likely image for a given viewpoint, and that generated image was compared to the actual viewpoint: an error value was computed from the mathematical difference between the two images. They then used that error to update both networks via the popular backpropagation technique, adjusting the weight values of each so they'd be a bit more accurate on the next training iteration. Because this optimization improved the representation network and the generation network together, at the same time, as the agent navigated whatever environment it was in, this is an end-to-end approach.
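Continuing the sketch above, and again only as an illustration: the paper's generator is a DRAW-style recurrent model trained with a variational objective, whereas this toy version swaps in a plain feedforward decoder and a simple reconstruction loss so the end-to-end idea stays visible. The generator conditions on r, a query pose, and a sampled latent z; its output is compared to the real image; and one backward pass updates both networks at once.

```python
class GenerationNet(nn.Module):
    """Toy generator: scene representation + query pose + latent -> image."""
    def __init__(self, repr_dim=256, pose_dim=7, latent_dim=32):
        super().__init__()
        self.fc = nn.Linear(repr_dim + pose_dim + latent_dim, 64 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=4),              # 16x16 -> 64x64
        )

    def forward(self, r, query_pose, z):
        h = self.fc(torch.cat([r, query_pose, z], dim=1))
        return self.deconv(h.view(-1, 64, 8, 8))

gen_net = GenerationNet()
optimizer = torch.optim.Adam(
    list(rep_net.parameters()) + list(gen_net.parameters()), lr=1e-4
)

query_pose = torch.randn(1, 7)            # camera pose to "imagine" from
target_image = torch.randn(1, 3, 64, 64)  # the actual view from that pose

z = torch.randn(1, 32)                    # latent variable varies the output
predicted = gen_net(r, query_pose, z)

# Error between the imagined and the actual viewpoint, backpropagated
# through BOTH networks in one step: that is what makes it end-to-end.
loss = nn.functional.mse_loss(predicted, target_image)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```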

**From Simple Environments to Complex Ones**

They first trained it on a few simple 7x7 square maps with a few objects in them. Over time, it rapidly learned to predict what an entire map looked like. So they gave it a more complex maze instead, and over time it learned how to represent that as well. At first it was a bit uncertain about some parts of the map, but with more observations, and by more I mean only five total, its uncertainty almost entirely disappeared.

**Deep Reinforcement Learning**

They wanted to use it to control a robotic arm to grab a colored object in a simulated environment. Because YOLO (no, not the algorithm). Deep reinforcement learning is a combination of deep learning, aka learning a mapping, and reinforcement learning, aka learning from trial and error in an environment. It's been behind some of the big AI successes of the past few years, like AlphaGo and DeepMind's notorious Atari-playing deep Q-learner. The idea is that the AI agent learns a policy for playing a game directly from the raw pixels of the game frames, with no hints as to what the objective of the game is or what the controls mean.
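As a rough illustration of the "trial and error from pixels" idea, here is a single Q-learning update. This is a generic sketch, not DeepMind's full DQN (no replay buffer or target network), and the frame size, action count, and reward values are made up: a convolutional network maps a game frame to one value per action, and the temporal-difference error nudges the taken action's value toward a reward-consistent target.

```python
import torch
import torch.nn as nn

# Tiny Q-network: raw pixels in, one value per possible action out.
q_net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=8, stride=4),  # 84x84 -> 20x20
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 20 * 20, 4),  # 4 actions, e.g. up/down/left/right
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

frame = torch.randn(1, 3, 84, 84)       # current game frame
next_frame = torch.randn(1, 3, 84, 84)  # frame after taking the action
action, reward, gamma = 2, 1.0, 0.99    # one piece of trial-and-error experience

# TD target: reward now, plus the discounted best value available next.
with torch.no_grad():
    target = reward + gamma * q_net(next_frame).max()

# Nudge the value of the taken action toward that target.
loss = (q_net(frame)[0, action] - target) ** 2
optimizer.zero_grad()
loss.backward()
optimizer.step()
```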

**The Power of Data Efficiency**

The problem with this approach is that it requires a very long training time to converge to good results. So they conducted an experiment: first they trained the GQN to learn how to represent observations of the environment, then they used its learned representations as input to a policy algorithm that learned how to control the arm. The representation encapsulated what it saw, the arm's joint angles, the position and color of the object, the colors of the walls, in a much more compressed form than the raw input pixels. Because of this, the approach was substantially more data efficient, requiring only a quarter of the training time of a raw-pixel version. Very impressive indeed.

GQN is exciting because a major limiting factor on what it can do is computing power. Given enough computing power, who knows what kind of amazingly detailed environments it could generate. And that's exciting for anybody, designers, artists, engineers, scientists, who could use a tool to help them visualize and create things.
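For concreteness, the two-stage setup from that experiment might look like this, under the same illustrative assumptions as the earlier sketches (the action count of 9 and the frozen-encoder detail are assumptions, not from the paper): the trained representation network is reused as a fixed feature extractor, and only a small policy head has to learn from scratch.

```python
# Reuse the representation network trained in the sketches above, frozen.
for p in rep_net.parameters():
    p.requires_grad = False

# Small policy head: compressed scene representation -> action scores.
policy = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 9))

observation = torch.randn(1, 3, 64, 64)  # raw pixels from the arm's camera
r = rep_net(observation)                 # compact summary of the scene
action_logits = policy(r)                # only this small head needs training
```

Because the policy sees a 256-number summary instead of 12,288 raw pixel values, it has far less to untangle on its own, which is one intuition for the four-fold reduction in training time.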

**Three Key Takeaways**

1. DeepMind's Generative Query Network learned how to perceive and interpret an environment without labels.
2. It consists of a representation network, which encodes image frames, and a generation network, which generates frames from those representations.
3. On a deep reinforcement learning task, it required only a fourth of the training time that a raw-pixel-focused algorithm would need.

AI is never boring. If you want to stay up to date with the latest advancements in machine learning and artificial intelligence, this is an area worth following. The Generative Query Network is a powerful tool with the potential to influence many fields, including computer vision, robotics, and more.