Research

My research focuses on improving the safety, security, and reliability of artificial intelligence systems. In my Master's work, I study how to identify the datasets used to train large language models, and I develop fairer methods for evaluating those models' capabilities. I am especially interested in applying statistical techniques to machine learning problems and creating methods with provable guarantees based on statistical theory.

My earlier research explored the connection between machine learning and quantum computing. In my Bachelor's thesis, I investigated the use of Boltzmann machines for boson sampling, combining ideas from physics, computation, and learning theory.

More broadly, I am interested in mathematical approaches to AI, particularly optimization methods in neural networks and how they can improve the behavior and performance of modern AI systems.

Publications

Practical Identification of Training Datasets in Large Language Models

Nima Dindarsafa, Bihe Zhao, Piotr Trzaskowski, Filip Szympliński, Krzysztof Wodnicki, Franziska Boenisch, Adam Dziedzic

Under Review · NeurIPS 2026

This research aims to develop a practical black-box Dataset Inference method for determining whether an LLM was trained on a suspect dataset. The method estimates per-token probabilities from label-only generated outputs, which avoids the need for gray-box probability access. It also replaces rigid p-value testing with e-value–based sequential testing, enabling anytime-valid and adaptive evidence accumulation as more samples are queried. Together, these methodological advances allow dataset inference to be applied to real-world LLM APIs under fully black-box access.

Read the paper →

Figure illustrating the 1D-CNN seismic collapse prediction framework

A 1D-CNN deep learning framework for seismic collapse prediction of jacket offshore platforms with Bayesian neural architecture search

Mohammad Zarrin, Liborio Cavaleri, Amirhosein Rezaei, Nima Dindarsafa

Ocean Engineering Journal · 2026

This study proposes a methodology for seismic limit-state and collapse prediction of jacket-type offshore platforms using a 1D-CNN trained directly on earthquake accelerogram time-series data. It develops a systematic model-selection framework based on Bayesian optimization to tune the neural architecture and hyperparameters, reducing reliance on manual trial and error. It further uses nested cross-validation for unbiased performance evaluation and introduces a stochastic optimization approach for stratified K-fold dataset preparation based on incremental dynamic analysis. The optimized 1D-CNN framework is compared with tuned MLP and SVM models and applied to estimate collapse fragility curves for a case-study offshore platform.

Read the paper →