On Cooperative Self-Explaining NLP Models
2023/06/17, Jun Wang

Summary
- Interpretability
- Cooperative Self-Explaining Framework and Spurious Correlations
- Our Insights on Cooperative Games and Solutions for Spurious Correlations in RNP
- Future Work

Interpretability
- Growing concern about model interpretability in various critical fields

Interpretability in the LLM Era
(GPT-3 & Beyond, Christopher Potts, Stanford, 2023/01)
- Generating post-hoc explanations that seem highly plausible
- LLMs remain a huge black box, which may pose a problem for scenarios that require an interpretable underlying mechanism to ensure trustworthiness.
- The processing cost and speed of LLMs when handling vast amounts of data, such as user reviews on large-scale websites, also pose a challenge.

Expectations on Interpretability
- Both faithful (reflecting the model's actual behavior) and plausible (aligning with human understanding)

Various Methods for Interpretability
- Post-hoc methods require additional surrogate models to explain the existing models being interpreted; it is difficult to ensure faithfulness, especially for black-box models.
- Ante-hoc (self-explaining) models incorporate interpretability into the model design and ensure faithfulness: model predictions are based on the informative explanations generated by the model itself. (Lei et al. 2016, Rationalizing Neural Predictions, EMNLP 2016)

Cooperative Self-Explaining Framework: RNP and Spurious Correlations

Cooperative Self-Explaining Framework: RNP
- Rationalizing Neural Predictions (RNP) utilizes a cooperative game between an explainer (or generator) and a predictor: the explainer identifies a human-interpretable subset of the input (referred to as the rationale) and passes it to the subsequent predictor for making predictions.
- Significant advantage: certification of exclusion. Guarantees that any unselected
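The explainer–predictor game above can be illustrated with a minimal sketch. This is a toy NumPy illustration, not the original RNP implementation (which trains both modules jointly with RNNs and a sparsity regularizer): the `Explainer` and `Predictor` classes, their linear scoring weights, and the top-k selection rule are all simplifying assumptions made for illustration. What the sketch does show is the structure of the game and the certification-of-exclusion property: the predictor only ever sees masked input, so tokens outside the rationale provably cannot change the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

class Explainer:
    """Toy generator: scores each token and keeps the top-k as the rationale.
    (Hypothetical stand-in for RNP's trained generator network.)"""
    def __init__(self, dim, k):
        self.w = rng.normal(size=dim)  # assumed linear scoring weights
        self.k = k

    def select(self, tokens):
        scores = tokens @ self.w
        mask = np.zeros(len(tokens))
        mask[np.argsort(scores)[-self.k:]] = 1.0  # binary rationale mask
        return mask

class Predictor:
    """Toy predictor: sees only the selected (masked-in) tokens."""
    def __init__(self, dim):
        self.w = rng.normal(size=dim)

    def predict(self, tokens, mask):
        # Certification of exclusion: unselected tokens are zeroed out
        # before the predictor touches them, so they cannot influence
        # the prediction.
        rationale = tokens * mask[:, None]
        return float(rationale.sum(axis=0) @ self.w)

dim, k = 8, 3
tokens = rng.normal(size=(10, dim))  # one "sentence" of 10 token embeddings
explainer, predictor = Explainer(dim, k), Predictor(dim)

mask = explainer.select(tokens)
y = predictor.predict(tokens, mask)

# Perturbing an unselected token leaves the prediction unchanged.
perturbed = tokens.copy()
unselected = int(np.flatnonzero(mask == 0)[0])
perturbed[unselected] += 100.0
assert predictor.predict(perturbed, mask) == y
```

In the actual RNP framework the two modules are optimized cooperatively with a shared task loss plus sparsity and continuity penalties on the mask; the sketch fixes random weights only to make the information flow explicit.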