Information Technologies
Data Mining
D.E. Namiot, E.A. Ilyushin, I.V. Chizhov "On the Practical Generation of Counterfactual Examples"
Abstract. 

One of the important elements in evaluating the robustness of machine learning systems is the so-called adversarial example. Adversarial examples are specially selected or artificially created inputs that disrupt the normal operation of a machine learning system: they are interpreted or processed incorrectly. Most often, such data are obtained through formal modifications of real source data. This article considers a different approach to creating such data, one that takes into account the semantic significance (meaning) of the modified data: counterfactual examples. The purpose of this work is to present practical solutions for generating counterfactual examples. The discussion is based on the real use of counterfactual examples in assessing the robustness of machine learning systems.

Keywords: 

machine learning, adversarial examples, counterfactual examples.

Pp. 73-81.

DOI: 10.14357/20790279230109
 
 
References

1. Namiot, Dmitry, Eugene Ilyushin, and Ivan Chizhov. “On a formal verification of machine learning systems.” International Journal of Open Information Technologies 10.5 (2022): 30-34.
2. Li, Huayu, and Dmitry Namiot. “A Survey of Adversarial Attacks and Defenses for image data on Deep Learning.” International Journal of Open Information Technologies 10.5 (2022): 9-16.
3. Artificial Intelligence in Cybersecurity. http://master.cmc.msu.ru/?q=ru/node/3496 (in Russian) Retrieved: May, 2022
4. Buchsbaum, Daphna, et al. “The power of possibility: Causal learning, counterfactual reasoning, and pretend play.” Philosophical Transactions of the Royal Society B: Biological Sciences 367.1599 (2012): 2202-2212.
5. Sterelny, Kim. “Language, gesture, skill: the co-evolutionary foundations of language.” Philosophical Transactions of the Royal Society B: Biological Sciences 367.1599 (2012): 2141-2151.
6. Kasirzadeh, Atoosa and Andrew Smart. “The use and misuse of counterfactuals in ethical machine learning.” Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. 2021.
7. Karimi, Amir-Hossein, Gilles Barthe, Borja Balle, and Isabel Valera. “Model-Agnostic Counterfactual Explanations for Consequential Decisions.” arXiv preprint arXiv:1905.11190 (2019).
8. Barocas, Solon, Andrew D. Selbst, and Manish Raghavan. “The hidden assumptions behind counterfactual explanations and principal reasons.” Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020.
9. Duong, Tri Dung, Qian Li, and Guandong Xu. “Prototype-based Counterfactual Explanation for Causal Classification.” arXiv preprint arXiv:2105.00703 (2021).
10. Yadav, Chhavi, and Kamalika Chaudhuri. “Behavior of k-NN as an Instance-Based Explanation Method.” arXiv preprint arXiv:2109.06999 (2021).
11. Verma, Sahil, John Dickerson, and Keegan Hines. “Counterfactual explanations for machine learning: A review.” arXiv preprint arXiv:2010.10596 (2020).
12. Thiagarajan, Jayaraman J., et al. “Treeview: Peeking into deep neural networks via feature-space partitioning.” arXiv preprint arXiv:1611.07429 (2016).
13. Boz, Olcay. “Extracting decision trees from trained neural networks.” Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. 2002.
14. Santos, Raul T., Júlio C. Nievola, and Alex A. Freitas. “Extracting comprehensible rules from neural networks via genetic algorithms.” Proceedings of the First IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks. IEEE, 2000.
15. Andrews, Robert, Joachim Diederich, and Alan B. Tickle. “Survey and critique of techniques for extracting rules from trained artificial neural networks.” Knowledge-based systems 8.6 (1995): 373-389.
16. Krishnan, Sanjay, and Eugene Wu. “Palm: Machine learning explanations for iterative debugging.” Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics. 2017.
17. Henelius, Andreas, et al. “A peek into the black box: exploring classifiers by randomization.” Data Mining and Knowledge Discovery 28 (2014): 1503-1529.
18. Selvaraju, Ramprasaath R., et al. “Grad-cam: Visual explanations from deep networks via gradient-based localization.” Proceedings of the IEEE international conference on computer vision. 2017.
19. Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “Model-agnostic interpretability of machine learning.” arXiv preprint arXiv:1606.05386 (2016).
20. Gohel, Prashant, Priyanka Singh, and Manoranjan Mohanty. “Explainable AI: current status and future directions.” arXiv preprint arXiv:2107.07045 (2021).
21. Sari, Leda, Mark Hasegawa-Johnson, and Chang D. Yoo. “Counterfactually Fair Automatic Speech Recognition.” IEEE/ACM Transactions on Audio, Speech, and Language Processing (2021).
22. Herrera, Francisco. “Dataset Shift in Classification: Approaches and Problems.” http://iwann.ugr.es/2011/pdf/InvitedTalk-FHerrera-IWANN11.pdf Retrieved: Sep, 2021
23. Teney, Damien, Ehsan Abbasnedjad, and Anton van den Hengel. “Learning what makes a difference from counterfactual examples and gradient supervision.” Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X 16. Springer International Publishing, 2020.
24. Roelofs, Rebecca, et al. “A meta-analysis of overfitting in machine learning.” Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019.
25. Heinze-Deml, Christina, and Nicolai Meinshausen. “Conditional variance penalties and domain shift robustness.” arXiv preprint arXiv:1710.11469 (2017).
26. Meinshausen, Nicolai. “Causality from a distributional robustness point of view.” 2018 IEEE Data Science Workshop (DSW). IEEE, 2018.
27. Das, Abhishek, et al. “Human attention in visual question answering: Do humans and deep networks look at the same regions?.” Computer Vision and Image Understanding 163 (2017): 90-100.
28. Bengio, Yoshua, Yann Lecun, and Geoffrey Hinton. “Deep learning for AI.” Communications of the ACM 64.7 (2021): 58-65.
29. Madaan, Nishtha, et al. “Generate your counterfactuals: Towards controlled counterfactual generation for text.” arXiv preprint arXiv:2012.04698 (2020).
30. Ribeiro, Marco Tulio, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. “Beyond Accuracy: Behavioral Testing of NLP Models with CheckList.” arXiv preprint arXiv:2005.04118 (2020).
31. Dathathri, Sumanth, et al. “Plug and play language models: A simple approach to controlled text generation.” arXiv preprint arXiv:1912.02164 (2019).
32. Vermeire, Tom, and David Martens. “Explainable image classification with evidence counterfactual.” arXiv preprint arXiv:2004.07511 (2020).
33. Dhurandhar, Amit, et al. “Explanations based on the missing: Towards contrastive explanations with pertinent negatives.” arXiv preprint arXiv:1802.07623 (2018).
34. SEDC implementation https://github.com/yramon/edc Retrieved: May, 2022
35. Van der Walt, Stefan, et al. “scikit-image: image processing in Python.” PeerJ 2 (2014): e453.
36. He, Xin, Kaiyong Zhao, and Xiaowen Chu. “AutoML: A survey of the state-of-the-art.” Knowledge-Based Systems 212 (2021): 106622.
37. Namiot, Dmitry, Eugene Ilyushin, and Oleg Pilipenko. “On Trusted AI Platforms.” International Journal of Open Information Technologies 10.7 (2022): 119-127. (in Russian)
38. Ilyushin, Eugene, Dmitry Namiot, and Ivan Chizhov. “Attacks on machine learning systems-common problems and methods.” International Journal of Open Information Technologies 10.3 (2022): 17-22. (in Russian)
39. Dadhich, Abhinav. Practical Computer Vision: Extract Insightful Information from Images Using TensorFlow, Keras, and OpenCV. Packt Publishing Ltd, 2018.
 
 