Making Deep Learning Go Brrrr From First Principles
2024-04-07
This guide discusses optimising deep learning performance by understanding and acting on the three main components involved in running models: compute, memory bandwidth, and overhead. It explains how to identify whether a model is compute-bound, memory-bandwidth-bound, or overhead-bound, and offers specific strategies for each case, such as maximising GPU utilisation, fusing operators, and minimising Python overhead. The post emphasises the power of first-principles reasoning in making deep learning systems more efficient, and the role of specialised hardware in enhancing compute capabilities.
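To make the memory-bandwidth point concrete, here is a minimal back-of-the-envelope sketch (not from the post itself; the function name and the fp32/transfer-count assumptions are illustrative) estimating global-memory traffic for a chain of pointwise ops with and without operator fusion:

```python
def pointwise_traffic_bytes(n_elems, n_ops, dtype_bytes=4, fused=False):
    """Estimate bytes read + written for n_ops chained pointwise ops.

    Unfused: each op reads its input and writes its output (2 transfers/op).
    Fused:   the whole chain reads the input once and writes the result once.
    Ignores caches and assumes fp32 (4 bytes) by default.
    """
    transfers = 2 if fused else 2 * n_ops
    return transfers * n_elems * dtype_bytes

n = 1_000_000  # a million fp32 elements
unfused = pointwise_traffic_bytes(n, n_ops=2)              # e.g. x.cos().cos()
fused = pointwise_traffic_bytes(n, n_ops=2, fused=True)
print(unfused // fused)  # fusing a 2-op chain halves traffic: prints 2
```

Since the compute for a pointwise op is trivial relative to the memory moved, cutting these transfers is exactly why fusion helps memory-bandwidth-bound models.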