Simplifying The Threat Surface from a Hacker's Perspective
Dan McInerney, Threat Researcher, Protect AI

Generative AI
THE PLATFORM FOR AI AND ML SECURITY | PROTECT AI RESTRICTED | DO NOT DISTRIBUTE

Attack Surface

Prompt Injection Attacks
Attackers craft inputs that "inject" malicious instructions into the prompt, manipulating the model's behavior or bypassing safety filters.

Jailbreak Attacks
A subset of prompt injection, these are designed to force the model to ignore its built-in ethical or safety guidelines and produce prohibited outputs.

Adversarial Examples
Slightly perturbed or carefully engineered inputs cause the model to generate incorrect, harmful, or unintended outputs.

Model Extraction Attacks
By querying the model extensively (often via public APIs), adversaries attempt to reconstruct a surrogate model or infer proprietary parameters and architecture details.

Membership Inference Attacks
Attackers analyze outputs to determine whether specific data points were included in the model's training dataset, potentially compromising privacy.

Model Inversion Attacks
These attacks aim to reconstruct or reveal sensitive aspects of the training data by "inverting" the model's outputs.

Data Poisoning Attacks
Malicious data is introduced into the training process so that the model learns incorrect or harmful behaviors; this can include backdoor or Trojan triggers.

Backdoor/Trojan Attacks
Similar to data poisoning, but with a focus on embedding hidden triggers that, when activated by specific inputs, cause the model to behave in a controlled (and usually harmful) way.

Evasion Attacks
Inputs are crafted specifically to bypass moderation filters or detection mechanisms, often allowing harmful content to be generated or disseminated.

Adversarial Reprogramming
An adversary repurposes the model to perform tasks it wasn't intended