<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gpu on giacolees - Tech Blog</title><link>https://giacolees.github.io/tags/gpu/</link><description>Recent content in Gpu on giacolees - Tech Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 14 Mar 2026 19:30:19 +0100</lastBuildDate><atom:link href="https://giacolees.github.io/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>Hardware-Aware Programming for Dummies!</title><link>https://giacolees.github.io/posts/hardware-aware-programming-for-dummies/</link><pubDate>Sat, 14 Mar 2026 19:30:19 +0100</pubDate><guid>https://giacolees.github.io/posts/hardware-aware-programming-for-dummies/</guid><description>TL;DR: Hardware-aware programming means matching your computational task to the right processor architecture while aggressively minimizing data-movement bottlenecks. CPUs use large caches and complex control logic to minimize latency for sequential tasks, while GPUs use massive arrays of simpler cores to maximize throughput for parallel workloads. However, the ultimate performance killer is often data movement across the PCIe bus between the CPU and GPU; for small workloads, the transfer time can completely eclipse the actual compute time.</description></item></channel></rss>