DATA-TRUST - From Big Data to Good Data: Smart-Sized Benchmarking for Trustworthy Artificial Intelligence.

Project Overview

The main goal of the DATA-TRUST project is to develop a framework for smart-sized benchmarking that improves the reliability and generalizability of AI systems.

Current AI systems are often trained on all available data. While large datasets can improve performance, they may also introduce bias, redundancy, and overfitting, which reduces a model's ability to perform well on new, unseen data.

DATA-TRUST addresses this challenge by investigating:

1. How to represent benchmark datasets and optimization problems.
2. How to select representative subsets of data.
3. How to measure generalization ability.
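As a rough illustration of the third question, one simple indicator is the gap between a model's average error on instances it was developed on and its average error on unseen instances. The sketch below is only an assumption about what such an indicator could look like, not the project's actual metric; the function names `generalization_gap` and `relative_gap` are hypothetical.

```python
from statistics import mean


def generalization_gap(seen_errors, unseen_errors):
    """Mean error on unseen instances minus mean error on seen instances.

    A value near zero suggests similar behaviour on both sets; a large
    positive value suggests the model does worse on unseen data.
    (Illustrative indicator only, not the project's definition.)
    """
    if not seen_errors or not unseen_errors:
        raise ValueError("both error lists must be non-empty")
    return mean(unseen_errors) - mean(seen_errors)


def relative_gap(seen_errors, unseen_errors):
    """The same gap, expressed as a fraction of the mean seen error."""
    seen = mean(seen_errors)
    if seen == 0:
        raise ValueError("mean seen error is zero; relative gap is undefined")
    return generalization_gap(seen_errors, unseen_errors) / seen


if __name__ == "__main__":
    # Made-up error values, used purely to show the call pattern.
    seen = [0.1, 0.2, 0.3]
    unseen = [0.3, 0.4, 0.5]
    print(generalization_gap(seen, unseen))
    print(relative_gap(seen, unseen))
```

The absolute gap is easy to read within one task, while the relative form makes gaps comparable across tasks whose error scales differ.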

The framework will be validated across two domains:

1. Single-objective continuous optimization.
2. Time-series analysis.

Although the methods will be validated in these two domains, the concepts developed are expected to apply across a broad range of AI applications.

The project is carried out at the Jožef Stefan Institute through collaboration between:

1. The Computer Systems Department.
2. The Department of Knowledge Technologies.

Objectives

The project pursues the following objectives:

1. Increase trust in AI systems by developing a framework for smart-sized benchmarking.
2. Develop unified meta-representations of benchmark problems and datasets.
3. Design methods for selecting representative benchmark instances using clustering and graph-based techniques.
4. Define quantitative indicators of generalization to evaluate AI performance on unseen data.
5. Validate the framework across multiple AI domains, particularly optimization and time-series analysis.
6. Disseminate results through publications, workshops, training sessions, and open research outputs.

Implementation Phases

The project will be implemented in three overlapping phases.

Phase 1 — Foundations and Representation Learning

(Months 1–12)

Development of unified representations for benchmark datasets and optimization problems.

Phase 2 — Representative Benchmark Selection

(Months 7–24)

Development of clustering- and graph-based methods for selecting representative benchmark data and reducing dataset bias.
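To make the idea of clustering-based selection concrete, the following minimal sketch groups benchmark instances by their feature vectors and keeps, from each group, the instance closest to the group centre. It assumes each instance is already described by a numeric feature vector, and it uses a basic k-means loop with farthest-first seeding as a stand-in; the project's actual selection methods are not specified here, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first_seeds(points, k):
    """Deterministic seeding: start at index 0, then repeatedly add the
    point that is farthest from all seeds chosen so far."""
    seeds = [0]
    while len(seeds) < k:
        best = max(
            range(len(points)),
            key=lambda i: min(euclidean(points[i], points[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def assign(points, centroids):
    """Return, per centroid, the indices of the points nearest to it."""
    clusters = [[] for _ in centroids]
    for idx, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: euclidean(p, centroids[c]))
        clusters[nearest].append(idx)
    return clusters


def select_representatives(points, k, iterations=20):
    """Cluster `points` with a basic k-means loop and return the sorted
    indices of one representative per non-empty cluster (the member
    closest to its cluster centroid)."""
    dim = len(points[0])
    centroids = [list(points[i]) for i in farthest_first_seeds(points, k)]
    for _ in range(iterations):
        clusters = assign(points, centroids)
        updated = []
        for c, members in enumerate(clusters):
            if not members:  # an empty cluster keeps its old centroid
                updated.append(centroids[c])
                continue
            updated.append(
                [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
            )
        if updated == centroids:  # converged
            break
        centroids = updated
    clusters = assign(points, centroids)  # final assignment for the final centroids
    return sorted(
        min(members, key=lambda i: euclidean(points[i], centroids[c]))
        for c, members in enumerate(clusters)
        if members
    )


if __name__ == "__main__":
    # Made-up 2-D feature vectors forming three well-separated groups.
    features = [
        (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.2),
        (10.0, 0.0), (10.2, 0.0), (10.1, 0.1),
    ]
    print(select_representatives(features, k=3))
```

Returning actual instances rather than centroids keeps the reduced benchmark made up of real problems or datasets. A graph-based variant would replace the clustering step but could keep the same interface.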

Phase 3 — Validation and Dissemination

(Months 19–36)

Validation of the framework through experimental studies, with the results shared through scientific publications and other dissemination activities.

Project Team

The DATA-TRUST project involves researchers from the Jožef Stefan Institute.

Dr. Tome Eftimov

Project Lead

Senior researcher specializing in machine learning, statistics, and operational research.

Prof. Dr. Barbara Koroušić Seljak

Senior Researcher

Computer Systems Department, Jožef Stefan Institute

Prof. Sašo Džeroski

Senior Researcher

Machine learning and AI systems

Gjorgjina Cenikj

Researcher

Postdoctoral researcher (meta-learning and graph theory)

Ana Nikolikj

Researcher

Young researcher (machine learning and optimization)

Sintija Stevanoska

Researcher

Young researcher (machine learning)

Jan Drole

Researcher

Master’s student (representation learning)
