Publications

AI Debaters are More Persuasive when Arguing in Alignment with Their Own Beliefs

Accepted for presentation at the First Workshop on Multi-Turn Interactions in Large Language Models, NeurIPS 2025, San Diego, United States.

Authors: María Victoria Carro, Denise Alejandra Mester, Facundo Nieto, Oscar Agustín Stanchi, Guido Ernesto Bergman, Mario Alejandro Leiva, Eitan Sprejer, Luca Nicolás Forziati Gangi, Francisca Gauna Selasco, Juan Gustavo Corvalán, Gerardo I. Simari, María Vanina Martinez.


A Conceptual Framework for AI Capability Evaluations

Accepted for presentation at the Technical AI Governance Workshop, ICML 2025, Vancouver, Canada.

Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Alejandro Leiva, Juan Gustavo Corvalán, María Vanina Martinez and Gerardo I. Simari.


Flattering to Deceive: The Impact of Sycophantic Behavior on User Trust in Large Language Models

This work was funded by BlueDot Impact as the final project for the AI Alignment course. After OpenAI withdrew its model over sycophancy issues, the author was interviewed by CNN UK about the findings.

Author: María Victoria Carro.

Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment

Accepted for presentation at the First Workshop on CogInterp: Interpreting Cognition in Deep Learning Models, NeurIPS 2025, San Diego, United States.

Authors: María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Giovanni Franco Gabriel Marraffini, Mario Alejandro Leiva, Gerardo I. Simari, María Vanina Martinez.

Are UFOs Driving Innovation? The Illusion of Causality in Large Language Models

Accepted for presentation at the Causality and Large Models Workshop, NeurIPS 2024, Vancouver, Canada.

Authors: María Victoria Carro, Francisca Gauna Selasco, Denise Alejandra Mester and Mario Alejandro Leiva.
