The nanoPU

nanoPU: A Nanosecond Network Stack for Datacenters
Stephen Ibanez, Alex Mallery, Serhat Arslan, Theo Jepsen, *Muhammad Shahbaz, Changhoon Kim, Nick McKeown
Stanford University, *Purdue University
www.SmartNICsS San Jose, CA April 26-28, 2022

The Need to Minimize RPC Latency and Software Overheads

Large Online Interactive Services
- Web Search
- Recommendation systems
- Online transaction processing

Fine-grained Computing
- Video encoding (ExCamera, NSDI '17)
- Object classification (Sprocket, SoCC '18)
- Software compilation (gg, ATC '19)
- MapReduce-style analytics (Locus, NSDI '19)
- Flash Bursts (NSDI '21)

[Figure label: RPC]

Question: What would it take to absolutely minimize RPC median and tail latency as well as software processing overheads?

Previous Approaches are Insufficient

| Approach | Limitation | Wire-to-Wire Latency | RPC Throughput |
| Dataplane operating systems (e.g. Shinjuku, Shenango) | Too coarse grained | Median: 2-5 µs; Tail: 10-100 µs | 100 Mrps |
| RDMA NICs | Need low latency to remote compute, not memory | Median: 700 ns | N/A |
| Integrated NICs (e.g. NeBuLa) | Still room for improvement of latency and throughput | Median: 100 ns; Tail: 2-5 µs | 20 Mrps/core |

The nanoPU

Programmable NIC

Key Features:
- Integrated NIC
- Efficient core selection in HW
- Programmable transport in HW
- Direct path to CPU register file
- Hardware-accelerated thread scheduling

[Figure: nanoPU block diagram. Labels: LLC, Main Memory, Core 0, Core N-1, DMA Path, HW Transport, Core Selection]

Wire-to-wire latency: 69 ns
Single-core throughput: 118 Mrps

The nanoPU Core

[Figure: nanoPU core diagram. Labels: HW Thread S, RX, net, TX, Registers, L1 I$, Core, L1 D$, RX Queue, TX Queue, MV, Swap]

The nanoPU Core

[Figure: nanoPU core diagram with multiple queues. Labels: HW Thread S, RX, net, TX, Registers, L1 I$, Core, L1 D$, RX Queues, TX Queues, MV, P=1, P=0]

The nanoPU Fast Path

Figure labels: PISA, Ingress, Egress, Ethernet MAC+Serial