Compilers Versus Kernels: A False Dichotomy
In recent years, there has been an ongoing debate in high-performance computing between compilers and hand-written kernels. One side argues that compilers are the key to achieving optimal performance, while the other claims that kernels are the way forward. In reality, the dichotomy is not as clear-cut as it seems: compilers and kernels are both essential components of a high-performance system, and combining their strengths can lead to significant improvements in performance.
One area where compilers excel is in producing optimized kernels, or operations, right from the start. The code the compiler generates is specific to the problem at hand and can be tailored to take advantage of particular hardware features, typically by lowering high-level operations onto the intrinsics it already understands for a given target. By building this specialization into the compiler pipeline, developers get kernels designed for a particular workload or deployment, and the compiler produces high-quality, optimized code without requiring manual intervention.
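As a minimal sketch of what that looks like in practice (the function name and build flags here are illustrative, not tied to any particular project), the scalar loop below contains no hand-tuning at all; compiled at a high optimization level, GCC or Clang will typically vectorize it for the target's SIMD units on their own.

```c
/* saxpy.c -- a scalar loop left entirely to the compiler.
 * Built with something like `gcc -O3 -march=native -c saxpy.c`,
 * GCC or Clang will typically auto-vectorize this loop for the
 * target's SIMD units without any manual tuning. */
#include <stddef.h>

void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```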
However, there is a catch. This kind of automatic optimization is not always achievable, and in some cases the compiler may simply fail to find the optimal code sequence. Teaching it to do better is time-consuming and requires significant expertise in low-level programming and compiler internals. Furthermore, as new instruction sets and architectures emerge, compilers must adapt quickly to take advantage of them. This is where hand-written kernels come into play.
Kernels are a lower-level way of expressing the same computation. Instead of relying on the abstractions the compiler provides, the developer works directly against the underlying hardware, often in assembly or with architecture-specific intrinsics. This gives fine-grained control over the optimization process and access to hardware features that the compiler may not yet be able to exploit.
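For contrast, here is a sketch of what such a hand-written kernel can look like on an x86 CPU with AVX2 and FMA. The function name and the tail-handling strategy are illustrative choices rather than anything from a specific library; the point is that the developer, not the compiler, picks the vector width, the instructions, and the reduction order.

```c
/* dot_avx2.c -- a hand-written dot-product kernel for x86 CPUs with
 * AVX2 and FMA, built with e.g. `gcc -O2 -mavx2 -mfma -c dot_avx2.c`. */
#include <immintrin.h>
#include <stddef.h>

float dot_avx2(size_t n, const float *x, const float *y) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    /* Main loop: 8 floats per iteration using fused multiply-add. */
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        acc = _mm256_fmadd_ps(vx, vy, acc);
    }
    /* Horizontal reduction of the 8 partial sums. */
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3]
              + lanes[4] + lanes[5] + lanes[6] + lanes[7];
    /* Scalar tail for the remaining elements. */
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```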
One approach to combining compilers and kernels is a hybrid system in which the compiler generates high-quality code overall and hands the most performance-critical inner operations to custom kernels. This lets developers leverage the strengths of both without being limited by either. By building micro-kernels into the design, the compiler pipeline can target specific hardware units while the innermost pieces of work are optimized by hand.
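The sketch below shows that division of labor with illustrative names: the outer tiling loops stand in for the structure the compiler pipeline would generate, while `matmul_ukernel_4x4` is the hook that a hand-tuned micro-kernel can replace per architecture. A plain-C reference version keeps the system correct and portable; an expert can swap in an assembly or intrinsics implementation of just that one function for a given target.

```c
/* Hybrid matmul sketch: the compiler owns tiling, loop order, and data
 * layout; the inner 4x4 tile is delegated to a micro-kernel that can be
 * hand-optimized per architecture. */
#include <stddef.h>

/* Reference micro-kernel; a hand-tuned version can replace this for a
 * specific target. Computes a 4x4 block of C += A * B. */
static void matmul_ukernel_4x4(const float *a, const float *b, float *c,
                               size_t k, size_t lda, size_t ldb, size_t ldc) {
    for (size_t i = 0; i < 4; ++i) {
        for (size_t j = 0; j < 4; ++j) {
            float acc = c[i * ldc + j];
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * lda + p] * b[p * ldb + j];
            }
            c[i * ldc + j] = acc;
        }
    }
}

/* "Compiler-generated" outer structure: walk the output in 4x4 tiles and
 * hand each tile to the micro-kernel. For brevity, assumes row-major
 * m-by-k A, k-by-n B, m-by-n C with m and n divisible by 4, and that the
 * caller has zero-initialized C. */
void matmul_tiled(size_t m, size_t n, size_t k,
                  const float *a, const float *b, float *c) {
    for (size_t i = 0; i < m; i += 4) {
        for (size_t j = 0; j < n; j += 4) {
            matmul_ukernel_4x4(a + i * k, b + j, c + i * n + j,
                               k, /*lda=*/k, /*ldb=*/n, /*ldc=*/n);
        }
    }
}
```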
This approach has several advantages over relying on the compiler alone. First, it lets developers exploit hardware features that the compiler may not support. Second, it provides fine-grained control over the optimization process, so the strategy can be tailored to the specific requirements of a workload or deployment. Finally, it reduces the complexity and cost of pushing every optimization through the compiler itself.
In addition to reducing complexity and improving performance, hybrid systems are also more flexible and maintainable. Because the hand-written code is confined to small kernels, developers can adapt to new instruction sets and architectures without rewriting large portions of the system. Moreover, with micro-kernels in the design, the compiler-generated and hand-tuned paths can each be switched on or off depending on the requirements of the workload or deployment.
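One concrete way to realize that switch, with illustrative function names, is runtime dispatch: the hand-tuned kernel is used only when the deployment target actually supports it, and everything else falls back to the compiler-generated path.

```c
/* Runtime dispatch between a hand-tuned kernel and a portable,
 * compiler-generated fallback. The two callees are illustrative; the
 * AVX2 path would be compiled only for targets that provide it. */
#include <stddef.h>

float dot_avx2(size_t n, const float *x, const float *y);    /* hand-tuned */
float dot_generic(size_t n, const float *x, const float *y); /* compiler-generated */

float dot(size_t n, const float *x, const float *y) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* GCC/Clang feature check: take the fast path only when the CPU we
     * are actually running on supports AVX2. */
    if (__builtin_cpu_supports("avx2")) {
        return dot_avx2(n, x, y);
    }
#endif
    return dot_generic(n, x, y);
}
```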
The benefits of this approach are not limited to performance improvements alone. By providing fine-grained control over the optimization process and reducing complexity, hybrid systems also improve maintainability and support for new workloads and deployments. Moreover, by leveraging the strengths of both compilers and kernels, developers can create optimized systems that are better suited to specific use cases.
At Eerie, this approach has been successfully applied in several projects. By utilizing high-level intrinsics and custom kernels, developers have achieved significant performance improvements across a range of workloads and deployments. Moreover, by combining the strengths of both compilers and kernels, it is possible to create optimized systems that are more maintainable and easier to adapt to changing requirements.
One notable example of this approach is the development of custom TPU kernels. Historically, TPUs were designed with a limited set of operations in mind, and developers had to rely on software-based workarounds to get good performance. As machine learning models have evolved, however, so have the demands placed on TPU code generation.
To address this challenge, researchers at Eerie introduced custom TPU kernels as a way to enable portability and reduce the development and support burden. While these kernels were initially designed as a fallback solution, they have proven highly effective, often outperforming the software-based workarounds.
Beyond their performance benefits, custom TPU kernels also provide fine-grained control over the optimization process. Because developers write the low-level code themselves, they can tailor the optimization strategy to specific requirements and take advantage of hardware features that software-based solutions cannot reach.
The success of this approach has been recognized within the industry, and many researchers and developers have adopted similar techniques for optimizing TPU-based systems. As machine learning models continue to evolve and require more computational resources, the need for such techniques will only grow.
In conclusion, compilers and kernels are not mutually exclusive; they are two essential components of a high-performance system. By combining their strengths in a hybrid approach, developers can achieve significant improvements in performance, maintainability, and support for new workloads and deployments, and those benefits will only become more important as machine learning models continue to evolve.