Architecture Analysis for ETL Processing: CPU vs GPU
Nikolay Sakharnykh, Senior AI Developer Technology Manager
Jason Lowe, Distinguished System Software Engineer
Data+AI Summit 2024

Agenda
Performance limiters of Database Operations
CPU and GPU architectures overview
GPU performance on join and full SQL Queries
RAPIDS Accelerator for Apache Spark
RAPIDS Accelerator Benchmarks

Can GPUs accelerate ETL?

Database Operations and Their Performance Limiters
Sequential access to input/output tables (read/write)
Random fine-grained access to hash tables (CAS/read)
Integer computations for hashing
[Diagram: Hash join - build and Hash join - probe. Input and output tables with Key/Payload rows (k1, p1), (k2, p2), (k3, p3), (k4, p4); hash table holding (k2, p2), (k1, p1), (k3, p3), (k4, p4); operations labeled compare and swap, read, write]

CPU and GPU Architectures
[Diagram labels: SM, L2, XBAR, HBM, HUB, PCIe I/O, NVLINK, PCIe Bus, GPU, CPU]

CPU and GPU Architectures
Memory Speeds
Bandwidth is calculated as the size of access multiplied by the number of accesses divided by time
Intel Xeon Platinum 8470Q (DDR5, 8 channels)
Peak: 307 GB/s
Achieved seq: 238 GB/s
Achieved 8B rnd: 24 GB/s
NVIDIA H100-96GB (HBM3)
Peak: 4023 GB/s
Achieved seq: 3638 GB/s
Achieved 8B rnd: 306 GB/s
PCIe Gen5: 128 GB/s (bi-dir)
[Diagram labels: SM, L2, XBAR, HBM, HUB, PCIe I/O, NVLINK, PCIe Bus, GPU, CPU]

Grace Hopper Superchip
GRACE CPU: CPU LPDDR5X 480 GB
HOPPER GPU: GPU HBM3 96GB
NVLINK C2C: 900 GB/s
Hardware Consistency
NVLINK NETWORK: 256 GPUs
18x NVLINK 4: 900 GB/s
HIGH-SPEED I/O: 4x 16x PCIe-5, 512 GB/s
https:/

Micro-benchmark
Grace Hopper achieved perf is 8-10x faster than projected best x86 performance
Performance limiter: random 64-bit CAS/read

NVIDIA Decision Support-H Benchmark
NVIDIA Decision Support-H (NDS-H) is our adaptation of the TPC-H benchmark often used by database customers