Hardware vs Software & Digital Video - Computerphile

The Evolution of Video Compression: From Computational Complexity to Specialist Hardware

The original RISC principle, reduced instruction set computing, is that you should never do something in hardware that you can do in software: you keep the hardware simple, low power and efficient, and you don't waste effort and joules doing things software could do. We've since put a few wrinkles on that, because some things are too complex to get right in software, and video is a good example. We can build specialist hardware to decode and encode video something like ten times more efficiently than a CPU running software. These things change all the time: when we were in the realm of MPEG and CPUs got really good, it looked as if it wasn't worth doing any more, but then we went from MPEG to H.264, and from H.264 to HEVC (H.265), and we've traded computational complexity for compression ratios. So now nobody would think of doing HEVC in software; it's just too hard, and you would have a very low frame rate, or very high power consumption, or it would just be horrible. Now it's done in a specialist hardware block, and you can guarantee that the hardware block will decode a frame of HEVC in exactly that number of milliseconds, come hell or high water. You sometimes can't do that with software. Because you know the statistics, you can build in assumptions, and we do: we work absolutely to the worst case. We ask: what is the most complex frame we could reasonably receive? It's this. Can we decode it in less than a frame's worth of time? Yes, we can, and we prove it. That's harder to do with software, so to be honest it's better to use specialist hardware.
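As a rough illustration of that worst-case check, the arithmetic is simply whether a guaranteed worst-case decode time fits inside one frame period. The numbers below (30 fps, a 28 ms worst-case bound) are made up for the sketch and aren't figures from the interview.

```python
FRAME_RATE_HZ = 30                          # assumed display/frame rate
FRAME_BUDGET_MS = 1000 / FRAME_RATE_HZ      # ~33.3 ms available per frame

# Hypothetical guaranteed bound for decoding the most complex frame the
# hardware block could reasonably receive (an invented number for illustration).
WORST_CASE_DECODE_MS = 28.0

if WORST_CASE_DECODE_MS < FRAME_BUDGET_MS:
    print("Worst-case frame fits in the frame period: real-time decode guaranteed")
else:
    print("Cannot guarantee real-time decode at this frame rate")
```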

Digital Video Compression: Understanding the Basics

Digital video works by compressing an image. Start with a 1080p frame: that's roughly two million pixels (1920 by 1080), each with a color expressible as 24 bits or so. That is a large quantity of data; if you streamed it raw, it would be a huge number of megabits per second, which is completely unfeasible and just doesn't work. To address this, we compress the video down, and to do that we make an observation: take a picture, look at it, and ask how much of it actually changes in a thirtieth of a second. Say you're looking at my head and shoulders. What is actually changing? Mainly my mouth, and I'm waving my fingers from time to time; the shirt isn't changing, and the wall in the background behind me certainly isn't changing.
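To put a number on "a lot of megabits per second", here is the raw arithmetic for an uncompressed 1080p stream at 24 bits per pixel; the 30 frames per second comes from the "thirtieth of a second" above.

```python
WIDTH, HEIGHT = 1920, 1080     # 1080p: about two million pixels per frame
BITS_PER_PIXEL = 24            # 8 bits each for red, green and blue
FRAME_RATE_HZ = 30             # a new frame every thirtieth of a second

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
bits_per_second = bits_per_frame * FRAME_RATE_HZ

print(f"{bits_per_frame / 1e6:.1f} Mbit per frame")        # ~49.8 Mbit
print(f"{bits_per_second / 1e6:.0f} Mbit/s uncompressed")  # ~1493 Mbit/s
```

That is roughly 1.5 Gbit/s before any compression, which is why streaming raw frames simply doesn't work.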

What we do is work out which bits change and which bits don't. Often, what happens in video is that the bits that change don't really change: they move. So you say, that bit was there on this frame and it's there on that frame, and you encode the difference by saying it moved in that direction, that far, and by the time it got there the colors were slightly different. You produce the deltas of the colors, the direction of the motion vector and the length of the motion vector, encode all of that down into a bunch of numbers, and transmit those. That's how digital video works.
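A minimal sketch of that idea, using a toy full-search motion estimator on grayscale frames. The block size, search range and frame contents are made up for illustration and are not how any particular codec partitions a frame.

```python
import numpy as np

def best_motion_vector(prev, curr, block_y, block_x, block=8, search=4):
    """Toy full-search motion estimation for one block: find the offset into
    the previous frame that best matches this block of the current frame,
    then return that motion vector plus the remaining color deltas."""
    target = curr[block_y:block_y + block, block_x:block_x + block].astype(int)
    best_cost, best_vec, best_residual = np.inf, (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = block_y + dy, block_x + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue                      # candidate falls outside the frame
            candidate = prev[y:y + block, x:x + block].astype(int)
            residual = target - candidate     # the color deltas
            cost = np.abs(residual).sum()     # sum of absolute differences
            if cost < best_cost:
                best_cost, best_vec, best_residual = cost, (dy, dx), residual
    return best_vec, best_residual

# A bright square moves two pixels to the right between frames; the encoder
# only needs to send the motion vector and the (zero) residual, not the pixels.
prev = np.zeros((32, 32), dtype=np.uint8)
curr = np.zeros((32, 32), dtype=np.uint8)
prev[8:16, 8:16] = 200
curr[8:16, 10:18] = 200

vec, residual = best_motion_vector(prev, curr, block_y=8, block_x=10)
print("motion vector (dy, dx):", vec)                     # (0, -2): fetch from 2 px left
print("residual energy:", int(np.abs(residual).sum()))    # 0: nothing else to send
```

A real encoder then quantises and entropy-codes the motion vectors and residuals; here the residual comes out as zero because the block genuinely just moved.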

The Elements of Compression Standards

There are lots of different compression standards, but the elements are the same. The eye doesn't see most of those changes, and it's much more sensitive to certain types of change than to others. If something gets brighter or darker, the eye is quite sensitive to that; if it subtly changes color, the eye is less sensitive; and if something moves, the eye is incredibly sensitive to that. So we trade those things off against each other in different compression standards. That's why, if somebody sees a glitch when they're watching a streamed video, some of the screen stays where it is while other parts update: it's basically a problem with receiving the data.
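One concrete way the brightness-versus-color trade-off shows up in practice is chroma subsampling: brightness (luma) is kept at full resolution, while the color-difference (chroma) channels are stored at reduced resolution. The sketch below is an illustration of the idea, not the exact pipeline of any particular codec; the BT.601 weights and simple 2x2 averaging are assumptions made for the sketch.

```python
import numpy as np

def chroma_subsample_420(rgb):
    """Illustrative 4:2:0-style subsampling: full-resolution luma,
    half-resolution chroma in each direction."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b   # luma (BT.601 weights)
    cb = 0.564 * (b - y)                     # blue-difference chroma
    cr = 0.713 * (r - y)                     # red-difference chroma
    # Keep luma full size; average chroma over 2x2 blocks (half resolution).
    cb_sub = cb.reshape(cb.shape[0] // 2, 2, cb.shape[1] // 2, 2).mean(axis=(1, 3))
    cr_sub = cr.reshape(cr.shape[0] // 2, 2, cr.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, cb_sub, cr_sub

frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
y, cb, cr = chroma_subsample_420(frame)
full = frame.size                      # 3 samples per pixel
kept = y.size + cb.size + cr.size      # 1.5 samples per pixel
print(f"samples kept: {kept / full:.0%}")   # 50%, before any further compression
```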

What usually happens is that the decoder leaves the previous frame up and then replaces it with each bit of new data as it is received. If the data stream gets corrupted, you usually end up with bits of the new frame drawn on top of bits of the last one. Your eye doesn't like that, because if you're watching motion you expect my hand to move smoothly across like that, and if it jumps, you know it's wrong. And if there's any sort of sparkle, points of bright white light in the middle of the picture and things like that, the eye is incredibly sensitive to those areas. Better compression standards and better implementations have special features to try and smooth out those errors.
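A minimal sketch of that "old frame with new strips drawn on top" behaviour, assuming a made-up decoder that updates the display buffer one strip of rows at a time and simply leaves stale pixels wherever data never arrived; real decoders do far more sophisticated error concealment.

```python
import numpy as np

def apply_update(screen, new_rows, received):
    """Overwrite each strip of the display buffer only if its data arrived;
    a dropped strip leaves the previous frame's pixels on screen."""
    out = screen.copy()
    for i, ok in enumerate(received):
        if ok:                      # this strip of the new frame arrived intact
            out[i] = new_rows[i]
        # else: keep the stale strip, which the eye notices as a glitch
    return out

prev_frame = np.zeros((4, 8), dtype=np.uint8)        # last good frame
next_frame = np.full((4, 8), 9, dtype=np.uint8)      # the frame being received
shown = apply_update(prev_frame, next_frame, received=[True, True, False, True])
print(shown)    # the third strip still shows the old frame's pixels
```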

Better Compression Standards: Tailored for Human Eyes

These compression standards are tailored for human eyes. If you talk to the people who invent them, they talk about the psychovisual understanding of how the eye works. There's still fundamental research going on into how the eye and the ear work (we do the same sort of thing for audio), and into how the brain actually works, because if we knew how the brain really worked, as opposed to how we think it works, we could probably do a better job of this.

There's plenty of scope for getting this better in future, and there are a lot of very clever scientists working on it.

"WEBVTTKind: captionsLanguage: enfrom what i understand from professor steve ferber part of the idea of reducing destruction sets was to do more stuff in software is that am i going along the right lines there you absolutely are so the original risk principle reduce instruction set computing is you should never do something in hardware that you can do in software because uh you keep the hardware simple uh you keep the hardware low power high efficiency you don't waste time and effort and watts joules doing things that you could just do in software well we've put a few wrinkles on that since we needed to put things in hardware that were too complex to get right in software so if it's really really hard to guarantee that you've got it correct probably the best place to do that is the hardware about 12 years ago i came to work directly for arms so i'm not one of the really early boys at all i knew several of the founders myself they were friends and still are but i only joined about 12 years ago at the point of which arm was realizing they had to take software a whole bunch more seriously and we just designed a multi-processor cpu so we had more you know multi-core cpus and putting those together um was introducing some complexities in the software and the linux movement was really taking off big time and so companies had to work out who was going to write this who was going to support it how do we encourage other people to use this software once we've sent it out there in the world and it was it was a really interesting time it's the right time to come in for that and i worked very much at the junction of software and hardware for many years so my counterpart on the hardware side you know he and i would have long discussions about well we could do this and why can't you just do that in software and you know it would always go around in arguments like that my understanding of that from a video point of view is quite often it's easier to have a hardware decoder to make sure that you can play video smoothly is that the sort of thing we're talking about absolutely so we have a business in the media processing group doing hardware decode and hardware encode and the answer is that we can build specialist hardware to do it much more efficiently than you could do it in software and cpu is 10x 10x the sort of now 10 times is that yeah 10 times so it's interesting these things change all the time so when um when we were in the realm of mpeg and cpus got really good it looked like oh it's not worth doing this anymore but actually we then went from mpeg to h.264 and then went from h264 to huvc or h.265 and we've traded computational complexity for compression ratios so now nobody would think of doing hevc and software is just just too hard and you would have very low frame rate or very high power or it would just be horrible now you do it's doing a specialist hardware block and you can guarantee you know my hardware block will decode you a frame of hevc in exactly that number of milliseconds come hello high water so uh you sometimes can't do that with software so you can basically build in assumptions because you know the statistics you you can and we do we work absolutely to the worst case and we say well what is the most complex frame we could reasonably receive it's this okay can we decode that in less than a frame's worth of time yes we can you know and we prove it it's harder to do that with sulfur what does the most complicated video frame look oh um so all right digital video uh works on 
compressing an image so you start with if it's a 1080p frame you've got 2000 pixels all of which has a color which is expressible as 24 bits or something and so that's quite a large quantity of data if you were streaming that data that would be a lot of megabits per second completely unfeasible just doesn't work so what we do is we compress the video down and to do that what we do is we observe that from if you right take a picture and you look at that how much of that picture actually changes in a 30th of a second so you're i don't know what you're looking at let's say you're looking at my you know my head and shoulders yeah um what of that is actually changing well actually it's mainly my mouth and i'm waving my fingers from time to time you know the shirts not changing in the background behind me the wall certainly isn't changing and so what what we do in digital videos i was using a tripod to be honest if you were using a tripod your videos would be smaller what we do is we work out which bits change work out which bits don't change and often what happens in video is the bits that change they don't really change they move so you go well that bit was there on this frame and that bit is there on that frame and so what we do is we encode the difference by saying well it moved in that direction that far and then when it by the time it got there the colors were slightly different so you you produce the deltas of the colors the direction of the motion vector the length of the motion vector and you encode all of that down into a bunch of numbers and you transmit those that's how digital video works and it's all much of a muchness there are lots of different compression standards but the elements are the same which are that the eye doesn't see most of those changes are very and it's much more sensitive to certain types of things than it is others so if it's brighter or darker the eyes quite sensitive to that if it subtly changes color the eye is less sensitive to that and if it moves the eye is incredibly sensitive to that so we trade those things off against each other in different compression standards so that's why if somebody sees a glitch when they're watching a streamed video you see some of the screens stay where it is and maybe some yeah it basically is a problem with receiving the data also yeah so what usually happens is um you leave the previous frame up and then replace it with each bit of new data that you receive so if the data stream gets corrupted you usually get a bit of the last frame on and this one on top of the last one your eye doesn't like that because if you particularly if you're watching motion you think yes my hand should move slowly across like that and if my hand goes like that you think that's wrong um and if there's any sort of sparkle in it so you get sort of points of bright white light in the middle of it and things like that eye is incredibly sensitive to those sort of areas so better compression standards and better implementations um have special features in it to try and smooth out the errors so basically these compression standards are tailored for human eyes i'm guessing yes if you talk to the guys who invent this they talk about the psycho visual understa understanding of how the eye works and there's still fundamental research going on into how the eye and the ear because we do the same sort of thing for audio is how the brain actually works because if we knew how the brain really worked as opposed to how we thought it worked we could probably do a 
better job of this there's plenty of scope for getting this better in future a lot of very clever scientists working on it by subtracting these frames from each other we can virtually shroud the influence of daylight so we can even eliminate cast shadows which would be cast by direct sunlightfrom what i understand from professor steve ferber part of the idea of reducing destruction sets was to do more stuff in software is that am i going along the right lines there you absolutely are so the original risk principle reduce instruction set computing is you should never do something in hardware that you can do in software because uh you keep the hardware simple uh you keep the hardware low power high efficiency you don't waste time and effort and watts joules doing things that you could just do in software well we've put a few wrinkles on that since we needed to put things in hardware that were too complex to get right in software so if it's really really hard to guarantee that you've got it correct probably the best place to do that is the hardware about 12 years ago i came to work directly for arms so i'm not one of the really early boys at all i knew several of the founders myself they were friends and still are but i only joined about 12 years ago at the point of which arm was realizing they had to take software a whole bunch more seriously and we just designed a multi-processor cpu so we had more you know multi-core cpus and putting those together um was introducing some complexities in the software and the linux movement was really taking off big time and so companies had to work out who was going to write this who was going to support it how do we encourage other people to use this software once we've sent it out there in the world and it was it was a really interesting time it's the right time to come in for that and i worked very much at the junction of software and hardware for many years so my counterpart on the hardware side you know he and i would have long discussions about well we could do this and why can't you just do that in software and you know it would always go around in arguments like that my understanding of that from a video point of view is quite often it's easier to have a hardware decoder to make sure that you can play video smoothly is that the sort of thing we're talking about absolutely so we have a business in the media processing group doing hardware decode and hardware encode and the answer is that we can build specialist hardware to do it much more efficiently than you could do it in software and cpu is 10x 10x the sort of now 10 times is that yeah 10 times so it's interesting these things change all the time so when um when we were in the realm of mpeg and cpus got really good it looked like oh it's not worth doing this anymore but actually we then went from mpeg to h.264 and then went from h264 to huvc or h.265 and we've traded computational complexity for compression ratios so now nobody would think of doing hevc and software is just just too hard and you would have very low frame rate or very high power or it would just be horrible now you do it's doing a specialist hardware block and you can guarantee you know my hardware block will decode you a frame of hevc in exactly that number of milliseconds come hello high water so uh you sometimes can't do that with software so you can basically build in assumptions because you know the statistics you you can and we do we work absolutely to the worst case and we say well what is the most complex frame we could 
reasonably receive it's this okay can we decode that in less than a frame's worth of time yes we can you know and we prove it it's harder to do that with sulfur what does the most complicated video frame look oh um so all right digital video uh works on compressing an image so you start with if it's a 1080p frame you've got 2000 pixels all of which has a color which is expressible as 24 bits or something and so that's quite a large quantity of data if you were streaming that data that would be a lot of megabits per second completely unfeasible just doesn't work so what we do is we compress the video down and to do that what we do is we observe that from if you right take a picture and you look at that how much of that picture actually changes in a 30th of a second so you're i don't know what you're looking at let's say you're looking at my you know my head and shoulders yeah um what of that is actually changing well actually it's mainly my mouth and i'm waving my fingers from time to time you know the shirts not changing in the background behind me the wall certainly isn't changing and so what what we do in digital videos i was using a tripod to be honest if you were using a tripod your videos would be smaller what we do is we work out which bits change work out which bits don't change and often what happens in video is the bits that change they don't really change they move so you go well that bit was there on this frame and that bit is there on that frame and so what we do is we encode the difference by saying well it moved in that direction that far and then when it by the time it got there the colors were slightly different so you you produce the deltas of the colors the direction of the motion vector the length of the motion vector and you encode all of that down into a bunch of numbers and you transmit those that's how digital video works and it's all much of a muchness there are lots of different compression standards but the elements are the same which are that the eye doesn't see most of those changes are very and it's much more sensitive to certain types of things than it is others so if it's brighter or darker the eyes quite sensitive to that if it subtly changes color the eye is less sensitive to that and if it moves the eye is incredibly sensitive to that so we trade those things off against each other in different compression standards so that's why if somebody sees a glitch when they're watching a streamed video you see some of the screens stay where it is and maybe some yeah it basically is a problem with receiving the data also yeah so what usually happens is um you leave the previous frame up and then replace it with each bit of new data that you receive so if the data stream gets corrupted you usually get a bit of the last frame on and this one on top of the last one your eye doesn't like that because if you particularly if you're watching motion you think yes my hand should move slowly across like that and if my hand goes like that you think that's wrong um and if there's any sort of sparkle in it so you get sort of points of bright white light in the middle of it and things like that eye is incredibly sensitive to those sort of areas so better compression standards and better implementations um have special features in it to try and smooth out the errors so basically these compression standards are tailored for human eyes i'm guessing yes if you talk to the guys who invent this they talk about the psycho visual understa understanding of how the eye works and there's 
still fundamental research going on into how the eye and the ear because we do the same sort of thing for audio is how the brain actually works because if we knew how the brain really worked as opposed to how we thought it worked we could probably do a better job of this there's plenty of scope for getting this better in future a lot of very clever scientists working on it by subtracting these frames from each other we can virtually shroud the influence of daylight so we can even eliminate cast shadows which would be cast by direct sunlight\n"