The Problem with Benchmarks

**The Limitations of Benchmarking: Understanding the Real-World Performance of High-End Computing Hardware**

Cinebench, a popular benchmarking tool, is often used to measure the performance of high-end computing hardware. However, Cinebench is deliberately written to split its rendering work into threads that can be spread across every available core, and not all software is written that way. In fact, some tasks simply cannot be broken down into smaller, threadable parts. This limitation becomes apparent when comparing a 32-core workstation CPU with a processor that has far fewer cores. The workstation wins the benchmark by a wide margin, yet a number of day-to-day tasks may still run faster on the smaller machine.

For instance, a software application may only be able to spread its work across four cores. In that case, a 32-core CPU will make little difference to that application, and a processor with six or eight cores might actually be considerably quicker for that particular task. That said, an operating system can multitask more efficiently with additional cores by running several applications concurrently. For some users, then, more cores are still beneficial even when individual software packages cannot use all of the available performance.
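To make the thread-count argument concrete, here is a deliberately simplified model. It assumes, hypothetically, that an application can use at most a fixed number of threads and that each core processes work at a constant speed. This is a cap-on-threads model rather than a full treatment like Amdahl's law. The `task_time` helper and the machine figures are illustrative, not measurements.

```python
def task_time(work_units: float, cores: int, per_core_speed: float, max_threads: int) -> float:
    """Time to finish a task when the software can use at most `max_threads` threads."""
    usable_cores = min(cores, max_threads)  # cores beyond this sit idle for the task
    return work_units / (per_core_speed * usable_cores)


# Hypothetical machines: many slower cores versus fewer, faster cores.
WORKSTATION = {"cores": 32, "per_core_speed": 1.0}
NOTEBOOK = {"cores": 6, "per_core_speed": 1.3}

if __name__ == "__main__":
    for label, max_threads in [("app threaded to 4 cores", 4), ("fully threaded renderer", 64)]:
        ws = task_time(1200, max_threads=max_threads, **WORKSTATION)
        nb = task_time(1200, max_threads=max_threads, **NOTEBOOK)
        print(f"{label}: workstation {ws:.1f}, notebook {nb:.1f} (time units)")
```

With these made-up numbers, the six-core notebook finishes the four-thread task first. The 32-core workstation wins easily once the software can use all of its cores, which mirrors the Cinebench result.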

**The Reality of High-End Hardware: Setting Realistic Expectations**

Headline benchmark comparisons can be misleading when it comes to high-end hardware like Apple's M3 Max and M2 Max chips. In Geekbench 6's multi-core test, the M3 Max scores 21,067 against the M2 Max's 14,786, roughly 42% more peak performance. That figure is accurate, but it comes from a workload designed to keep every core busy, so it is unlikely to translate into a 42% improvement in everyday use. Many applications rely on single-threaded performance rather than multi-core capabilities. For tasks like web browsing or working in spreadsheets, the single-core score may matter more than the multi-core score.
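The headline percentage is simple arithmetic on the two Geekbench 6 multi-core scores quoted in the transcript: 21,067 for the M3 Max and 14,786 for the M2 Max. The following is a minimal sketch, and `percent_gain` is a hypothetical helper name.

```python
def percent_gain(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score / old_score - 1.0) * 100.0


# Geekbench 6 multi-core scores quoted in the transcript.
M3_MAX_MULTI_CORE = 21_067
M2_MAX_MULTI_CORE = 14_786

if __name__ == "__main__":
    gain = percent_gain(M3_MAX_MULTI_CORE, M2_MAX_MULTI_CORE)
    print(f"M3 Max vs M2 Max, multi-core: +{gain:.0f}%")  # prints: M3 Max vs M2 Max, multi-core: +42%
```

Applying `percent_gain` to the two chips' single-core scores instead gives the figure more relevant to single-threaded work such as web browsing or spreadsheets.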

**Optimizations Matter: The Importance of Contextual Benchmarking**

Benchmarks typically measure raw performance in ideal circumstances. They don't always account for other benefits that can significantly affect real-world performance. Two key examples illustrate this point:

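* **Unified Memory Architecture**: Apple's unified memory architecture allows the GPU to access the same memory space as the CPU, which avoids unnecessary data copying. It also lets the GPU draw on all of the system's memory. On a notebook with 32 GB, the GPU can in theory use as much of that as it needs. A PC notebook with a more powerful discrete graphics card will usually be faster and score higher, but its GPU is limited to its own dedicated memory, perhaps 8 GB. Once a job such as a video render needs more than that, the Mac that can access more memory can finish it faster.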

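* **Video Acceleration**: Apple silicon chips include custom media accelerators that can decode and encode 10-bit 4:2:2 video, a format often used in professional video editing. In a side-by-side editing test, an M2 Max notebook came close to a PC with an RTX 4090 when handling multiple streams of 6K RAW. It matched or slightly beat the PC with heavily compressed 10-bit HEVC, and it clearly outperformed the PC once 10-bit 4:2:2 footage was added. The deciding factor is not graphics power. On a PC, this format is normally handed off to an accelerator in the CPU. Recent Intel chips include one, but the workstation's AMD Threadripper Pro does not, so the expensive machine slows to a crawl.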
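The memory argument comes down to one question: does a GPU job's working set fit in the memory the GPU can address? The sketch below models that as a yes/no check using the 32 GB and 8 GB figures quoted in the transcript. Like the transcript, it assumes a unified-memory GPU can in theory use the whole pool, and it ignores operating-system limits. Real drivers and applications may spill to slower memory or fail rather than simply "not fitting". The class name, the PC's system RAM, and the job sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuMemory:
    name: str
    system_ram_gb: float
    dedicated_vram_gb: Optional[float]  # None means unified memory shared with the CPU

    def gpu_addressable_gb(self) -> float:
        # Unified memory: in theory the GPU can use the whole shared pool.
        # Discrete GPU: limited to its own dedicated VRAM.
        if self.dedicated_vram_gb is None:
            return self.system_ram_gb
        return self.dedicated_vram_gb

    def fits(self, working_set_gb: float) -> bool:
        """True if a GPU job's working set fits in memory the GPU can address."""
        return working_set_gb <= self.gpu_addressable_gb()


# 32 GB unified and 8 GB dedicated are the transcript's figures; the PC's 32 GB of system RAM is hypothetical.
UNIFIED_NOTEBOOK = GpuMemory("Apple silicon notebook", system_ram_gb=32, dedicated_vram_gb=None)
DISCRETE_NOTEBOOK = GpuMemory("PC notebook, discrete GPU", system_ram_gb=32, dedicated_vram_gb=8)

if __name__ == "__main__":
    for job_gb in (6, 12):  # hypothetical render working sets
        print(f"{job_gb} GB job -> Mac fits: {UNIFIED_NOTEBOOK.fits(job_gb)}, "
              f"PC fits: {DISCRETE_NOTEBOOK.fits(job_gb)}")
```

Below 8 GB, both machines qualify and the faster discrete GPU should win. Above it, only the unified-memory machine can keep the whole job on the GPU.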

**The Importance of Contextual Research: Avoiding Misleading Benchmarks**

Benchmarks can be misleading if not understood within the context of specific workflows or applications. For instance:

* **GPU vs. CPU**: Rankings of raw GPU performance show PC graphics cards scoring well above Apple silicon GPUs. In real-world work, however, factors like memory access and dedicated acceleration hardware can close much of that gap.

* **Workstation Performance**: When choosing a workstation for professional video editing, it's essential to consider the specific tasks that will be performed. Higher-end hardware might top the benchmark charts, yet a more affordable machine with lower scores can match or even beat it in particular real-world scenarios.
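You can sketch why acceleration hardware can outweigh raw GPU power as a capability lookup. If no fixed-function decoder matches a stream's codec, bit depth and chroma subsampling, decoding falls back to the CPU. The capability sets below are hypothetical and loosely follow the transcript's account of an M2 Max notebook and a Threadripper Pro workstation with an RTX 4090. They are not a list of any vendor's actual supported formats.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StreamFormat:
    codec: str      # e.g. "HEVC"
    bit_depth: int  # e.g. 10
    chroma: str     # chroma subsampling, e.g. "4:2:2"


@dataclass
class Machine:
    name: str
    hw_decode: set = field(default_factory=set)  # formats with a fixed-function decoder

    def decode_path(self, fmt: StreamFormat) -> str:
        # GPU compute power is irrelevant here: no matching decoder means CPU decoding.
        return "hardware" if fmt in self.hw_decode else "software (CPU)"


HEVC_10_420 = StreamFormat("HEVC", 10, "4:2:0")
HEVC_10_422 = StreamFormat("HEVC", 10, "4:2:2")

# Hypothetical capability sets, loosely following the transcript's account.
M2_MAX_NOTEBOOK = Machine("M2 Max notebook", {HEVC_10_420, HEVC_10_422})
THREADRIPPER_4090 = Machine("Threadripper Pro + RTX 4090 workstation", {HEVC_10_420})

if __name__ == "__main__":
    for fmt in (HEVC_10_420, HEVC_10_422):
        label = f"{fmt.codec} {fmt.bit_depth}-bit {fmt.chroma}"
        print(f"{label}: Mac -> {M2_MAX_NOTEBOOK.decode_path(fmt)}, "
              f"PC -> {THREADRIPPER_4090.decode_path(fmt)}")
```

The `frozen=True` dataclass makes `StreamFormat` hashable, so formats can be stored in a set and checked with a simple membership test.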

Ultimately, researching and understanding the strengths and limitations of different computing hardware is crucial for making informed purchasing decisions. By considering a range of factors beyond raw benchmark scores, users can find the best fit for their specific needs and workflows.
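One practical way to look beyond a single headline score is to re-weight a benchmark's sub-scores by the tasks you actually run. The sketch below uses a weighted geometric mean as a stand-in, since that is a common way to combine benchmark sub-scores. Geekbench's actual method for producing its overall score may differ. The test names follow Geekbench 6's workload list, but every score is hypothetical.

```python
import math


def geometric_mean(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean of the sub-scores named in `weights`."""
    total = sum(weights.values())
    return math.exp(sum(w * math.log(scores[test]) for test, w in weights.items()) / total)


# Hypothetical sub-scores for two systems (test names follow Geekbench 6's list).
SYSTEM_A = {"Clang": 3000, "Navigation": 2800, "Photo Filter": 1800, "Ray Tracer": 3200}
SYSTEM_B = {"Clang": 2200, "Navigation": 2100, "Photo Filter": 2600, "Ray Tracer": 2300}

EQUAL_WEIGHTS = {test: 1.0 for test in SYSTEM_A}  # stand-in for one headline number
PHOTO_EDITOR = {"Photo Filter": 1.0}               # what an image editor actually runs

if __name__ == "__main__":
    for label, weights in [("overall", EQUAL_WEIGHTS), ("photo editing", PHOTO_EDITOR)]:
        a = geometric_mean(SYSTEM_A, weights)
        b = geometric_mean(SYSTEM_B, weights)
        print(f"{label}: A={a:.0f}, B={b:.0f} -> {'A' if a > b else 'B'} looks better")
```

Here system A wins the equal-weight "overall" number, while system B is the better choice for someone who mainly runs photo filters.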

Benchmarks are a great way to compare the performance of different computers and components. If you go and watch a review of a new laptop or a graphics card on YouTube, you can pretty much guarantee that the reviewer is going to be using benchmarks in order to draw conclusions. And that's fine, provided that we understand the limitations of benchmarks and then explain those to our viewers, because the simple truth is that benchmark numbers don't always represent real-world experiences. So if you make a purchasing decision based solely on the benchmarks, you might end up with the wrong system for you. So in this quick video, let me just try and explain those limitations so that you can make informed purchasing decisions.

We'll start with the ubiquitous Geekbench 6 CPU test. Now, this is a great benchmark for comparing the performance of different computer processors. There's a whole suite of different tests that are designed to represent a typical workflow. Great, but what are these different tests? Well, if we look at the detail behind the Geekbench CPU score, we can see those individual results. We've got things like file compression, HTML5 browsing, photo library and PDF rendering, and those will likely be part of every user's workflow. But some of the tests are more focused. We've got a navigation test, which assesses the ability of the CPU to generate directions between a sequence of locations, and that's really useful for evaluating a smartphone but perhaps less relevant to a desktop computer. We've also got tests for the Clang compiler, text processing and asset compression, things that would be very typical for a software developer's workflow but not necessarily your workflow. And there are machine learning and image editing workloads, things like object detection, background blur, object removal, horizon detection, photo filters and HDR photo blending. If you're someone who spends a lot of time editing images, then those tests will be relevant to you. And finally, we can also see there are some 3D graphics tests like ray tracing and structure from motion, which might be relevant to your work if, say, you're a 3D artist using CPU rather than GPU for rendering.

So not all of these tests will be applicable to your specific workload, but it's all of these individual scores blended together that create the final overall score. So perhaps you can see the problem already with only using that overall score to make a purchasing decision. It could be that system A has a higher overall score than system B, but perhaps system B is actually better at the things you do on a daily basis. So these fine details are really important to consider, especially when you're comparing two different architectures, for example Apple silicon Macs with Intel PCs.

But here's another issue. It's often the multi-core score that's used to compare the speed difference of two processors. So for example, let's take the newly released M3 Max, which scores 21,067, whereas the previous M2 Max scores 14,786, and that's a huge leap forward in performance. So a tech reviewer might do a graph of both of those scores and then point out that the M3 Max is 42% more performant than the M2 Max, and that would be a true statement. But do you need to rush out to upgrade? What you need to know here is that the benchmark test is designed to fully utilize every processor core.

Now, we can see this really clearly in the Cinebench rendering benchmark. Here, for example, we'll run the test on a modern PC notebook with a 12th-generation Intel i7, and the performance is pretty good. This CPU is interesting because it has two different types of cores. You've got two high-performance dual-threaded cores and eight power-efficient single-threaded cores, but the test makes the most of all of those cores to deliver the best possible result. Now let's run the test on my workstation, which has a 32-core, 64-thread AMD Threadripper Pro CPU. As expected, the test runs significantly quicker, because again it can make use of all of that performance.

If we only look at the final scores and that huge performance difference, we might assume that the workstation is going to be vastly quicker than the notebook in all areas. And we'd be wrong, because in actual fact there are a number of day-to-day tasks that will run faster on the notebook. Why? Because this efficient use of all of the cores doesn't happen automatically. The software has to be written in such a way as to split the work into threads that can be distributed across all of the cores. Cinebench is obviously written in a way to do that, but not all software is written that way. In fact, not all software could be written that way, because there are some tasks that you just can't break apart like this. Now, it may be that a software application can be threaded across just four cores, and if that's the case, having a 32-core CPU isn't really going to make any difference to that piece of software. In fact, a processor with six or eight cores might actually be considerably quicker for that particular task. It's also true that your operating system will be able to multitask more efficiently by using additional cores to run software apps concurrently. So for some users, having more cores is beneficial even if the individual software packages can't use all the performance.

Now, just to come back to the M3 Max and M2 Max chips: yes, the M3 Max has 42% more peak performance, and it's certainly true to say that. But we need to moderate our expectations with reality, because it's likely that we won't see those gains in real-world performance. Many apps actually rely on single-threaded performance, so a faster single-core score might actually be more important than the multi-core score, for example for web browsing or working in a spreadsheet.

Let me highlight one more area where benchmarks don't tell the full story, and that's with optimizations. Benchmarks are normally a measurement of raw performance in ideal circumstances, but they don't always factor in other benefits. Let me give you two examples. If you measure the raw computational performance of the Apple silicon GPUs and look at the rankings list, you can see that there are PC graphics cards that score much higher. But the Apple silicon chips have optimizations that allow them to perform better than expected in the real world, and that's something that the raw benchmark tests don't always factor in.

Take Apple's unified memory architecture. It allows the GPU to access the exact same memory space as the CPU, avoiding unnecessary data copying, but it also allows the GPU to access all of the available memory in the system. This notebook's got 32 gigabytes, and in theory the GPU can access as much of that as it needs, and that gives it a huge advantage in things like video rendering. It is true that you can go and buy a PC notebook with a much more powerful graphics card in it, but that card will also have its own dedicated RAM, perhaps 8 GB. So for the most part the PC would be faster and would score higher. That holds until you need to do a video render that requires more than that 8 GB of video RAM; then the Mac that can access more memory can do it faster.

And let me give you a second example. I recently upgraded my PC workstation with an RTX 4090 GPU, and the performance is incredible. If you put it side by side with even the most powerful Apple silicon Mac, I'd expect the RTX 4090 to win pretty much everywhere. However, the Apple silicon chips have got really great video accelerators, and these close the gap much more than the benchmark numbers would suggest. When editing video with multiple streams of 6K RAW, our M2 Max notebook is able to get close to the experience of the PC. Throw in some heavily compressed 10-bit HEVC content, and the M2 Max starts to offer an experience identical to, if not slightly smoother than, the PC with the 4090. But if we then throw some 10-bit 4:2:2 content into the mix, the M2 Max notebook destroys the 4090-equipped PC. Why? Well, it's nothing to do with graphics power and everything to do with these custom accelerator chips, because Apple silicon has accelerator chips that can decode and encode 10-bit 4:2:2 content. Normally in a PC this would be handed off to a custom accelerator in the CPU, and recent Intel chips have accelerators for exactly this type of content, but my AMD Threadripper Pro doesn't. So the result is that this top-of-the-line, very expensive PC that has all the great benchmark numbers slows to an impossible crawl, while something with lower benchmark numbers actually wins.

Now, just to be clear, I'm highlighting a niche case here, and I'm not saying that Macs are better than PCs. I edit the majority of my videos on that workstation, and I love it; I wouldn't change it. But I also won't be using it if I ever need to work with 10-bit 4:2:2 video content.

So benchmarks are great, and we absolutely need to use them, but they can also be misleading if you don't understand how they apply to your specific workflow. There's no need to upgrade your machine every time some new piece of hardware comes out with a higher benchmark score, because the reality is you might not even notice the difference. So it's really important to do your research and to seek a blend of opinions before you take the plunge. As always, I'm looking forward to your views and comments. Thanks for supporting the channel, and see you again soon for some more geekery.