High-Parallel In-Memory NTT Engine with Hierarchical Structure and Even-Odd Data Mapping

Bing Li (1), Huaijun Liu (2), Yibo Du (3,4), Ying Wang (3,4)
(1) Institute of Microelectronics, Chinese Academy of Sciences
(2) Capital Normal University
(3) Institute of Computing Technology, Chinese Academy of Sciences
(4) University of Chinese Academy of Sciences

Outline
- Background and Motivation
- Proposed Method Overview
- Architecture & Data Mapping
- Evaluation and Results
- Conclusion

Fully Homomorphic Encryption (FHE)
Application scenarios: medical treatment, cloud computing, machine learning, fitness apps (FHE review: Viand A. et al., S&P 2021).
Advantages: data security, powerful functionality.
Drawback: high computational overhead.
Classic NTT: Challenges & Advantages
[Figure: 8-point in-place NTT dataflow. Inputs a0, a4, a2, a6, a1, a5, a3, a7 (bit-reversed order) pass through three butterfly stages (Stage 1 to Stage 3); each butterfly multiplies one input by a twiddle factor and outputs the sum and the difference, producing A0 through A7.]

Algorithm: In-Place Cooley-Tukey-based NTT
Input: a = (a_{n-1}, ..., a_0) in R_q, and omega, the n-th root of unity in Z_q, in bit-reversed order.
Output: A = NTT(a) in bit-reversed order.
Each outer iteration doubles the number of butterfly groups (m = 1, 2, 4, ...) while halving the butterfly span (t = t/2); the inner loops apply one modular multiply, add, and subtract butterfly to every coefficient pair, as made concrete in the sketch below.
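Only the loop skeleton of the pseudocode survives in the extracted text, so the following is a minimal Python sketch of a textbook iterative in-place Cooley-Tukey NTT over Z_q; the function name ntt, the explicit bit-reversal permutation, and the parameter names q and omega are illustrative assumptions rather than the authors' exact formulation.

def ntt(a, q, omega):
    """In-place iterative Cooley-Tukey NTT over Z_q.
    a     : list of n coefficients, n a power of two (modified in place)
    q     : prime modulus with n | q - 1
    omega : primitive n-th root of unity mod q
    Returns a with A[k] = sum_j a[j] * omega^(j*k) mod q."""
    n = len(a)
    # Bit-reversal permutation, mirroring the bit-reversed input order
    # (a0, a4, a2, a6, ...) shown in the butterfly figure.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages (Stage 1..3 for n = 8); in this ordering the
    # butterfly span grows each stage, and every butterfly performs one
    # modular multiply, one add, and one subtract.
    length = 2
    while length <= n:
        w_m = pow(omega, n // length, q)         # twiddle step for this stage
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % q
                a[k] = (u + v) % q                # upper butterfly output
                a[k + length // 2] = (u - v) % q  # lower butterfly output
                w = w * w_m % q
        length <<= 1
    return a

For example, with q = 17 and omega = 2 (a primitive 8th root of unity mod 17), ntt(list(range(8)), 17, 2) returns the length-8 NTT of [0, 1, ..., 7]; the inverse transform would use omega^(-1) mod q followed by a scaling with n^(-1) mod q.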
Barrett Reduction (baseline)
Calculation: r = c mod q, where q is an n-bit modulus and mu = floor(2^(2n) / q) is precomputed.
1. t1 = c >> (n - 1)
2. t2 = t1 * mu
3. t3 = t2 >> (n + 1)
4. r1 = c mod 2^(n+1)
5. r2 = (t3 * q) mod 2^(n+1)
6. r = r1 - r2
Condition: r >= q ? (r - q) : r
Return r

Implementing in CIM
Calculation: r = c mod q (q: n bits). The same reduction starts from x = c >> (n - 1) and ends with the same conditional correction r >= q ? (r - q) : r, but the intermediate steps are expressed with the shift and subtraction primitives the CIM array supports natively; a runnable sketch of the baseline steps follows below.
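To make the listed steps concrete, here is a minimal Python sketch of the baseline Barrett reduction; the precomputation mu = floor(2^(2n) / q) and the final correction loop follow the standard Barrett formulation and are assumptions wherever the slide text is ambiguous.

def barrett_reduce(c, q, n, mu):
    """Barrett reduction r = c mod q for an n-bit modulus q,
    with mu = (1 << (2 * n)) // q precomputed once per modulus.
    Valid for 0 <= c < 2^(2n), e.g. c the product of two values below q."""
    t1 = c >> (n - 1)                     # step 1: drop the low n-1 bits of c
    t2 = t1 * mu                          # step 2: multiply by the precomputed constant
    t3 = t2 >> (n + 1)                    # step 3: estimate of c // q
    r1 = c & ((1 << (n + 1)) - 1)         # step 4: c mod 2^(n+1)
    r2 = (t3 * q) & ((1 << (n + 1)) - 1)  # step 5: (t3 * q) mod 2^(n+1)
    r = r1 - r2                           # step 6: candidate remainder
    if r < 0:                             # wrap back into [0, 2^(n+1))
        r += 1 << (n + 1)
    while r >= q:                         # final conditional subtraction(s)
        r -= q
    return r

# Example with the modulus from the slides' worked example: q = 829 is 10 bits.
q, n = 829, 10
mu = (1 << (2 * n)) // q
assert barrett_reduce(123456, q, n, mu) == 123456 % q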
Mod Algorithm Optimization
Adapt the original Barrett algorithm to an efficient implementation on CIM.
[Figure: worked binary example with the modulus q = 829, showing (a) Shift in CIM, where the operand c is shifted left or right inside the array, and (b) Subtraction in CIM, carried out as an addition involving a bit-complemented operand (t + ~b).]
Benefits: low latency, low energy.
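The figure itself is not recoverable from the extracted text, so the following Python sketch only emulates the two primitives it names; the helper names cim_shift_right and cim_sub, and the complement-plus-carry formulation of the subtraction, are assumptions rather than the paper's RTL.

def cim_shift_right(word, k):
    """Emulate figure (a): one reading of the figure is that a shift amounts to
    accessing the stored word at a shifted bit position; in software this is
    simply word >> k."""
    return word >> k

def cim_sub(x, y, width):
    """Emulate figure (b): x - y computed as x + ~y + 1 within a fixed bit
    width, so the array only ever performs bitwise complement and addition."""
    mask = (1 << width) - 1
    return (x + ((~y) & mask) + 1) & mask

With these two primitives, the r = r1 - r2 step of the Barrett flow on the previous slide maps directly onto r = cim_sub(r1, r2, n + 1), which also absorbs the wrap back into [0, 2^(n+1)).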
6、izationMod Module-Data MappingRTLA0,msbA0,lsbA3,msbA3,lsbSense AmplifierSubArray0SubArray64Read/Write&ComparatorWL Decoder&DriverSense AmplifierSubArray128Sense AmplifierSubArray192Sense AmplifierSubArray191Sense AmplifierSubArray255Sense AmplifierMOD PEMOD PEResult A0qmsbqlsbqmsbqlsbResult A3RTLA25