Partnership
Updated on 15 January 2026
A distributed (and federated) data mesh for scientific data and data-derived artifacts
Postdoctoral researcher at Forschungszentrum Juelich GmbH
Bielefeld, Germany
About
This project proposes a distributed data mesh that exposes scientific data and data-derived artifacts (such as indexes, containers, reference datasets, and models) through a scalable, cache-optimised filesystem. It provides a single logical, read-oriented namespace that can be mounted as a POSIX file system across clouds, clusters, and HPC systems with minimal local storage and operational overhead.
The mesh is both distributed and federated: multiple data providers publish to a shared space, while institutions operate local caches and gateways near compute resources to ensure low-latency access and efficient bandwidth utilisation. This architecture enables reproducible workflows, stable references to data and artifacts, and consistent environments for large-scale, data-intensive analysis across sites.
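As an illustration only, the following minimal Python sketch shows the fetch-once, reuse-locally behaviour an institutional cache would provide. The function fetch_from_provider, the cache directory, and the dataset identifier are hypothetical placeholders, not part of the project described here.

    from pathlib import Path
    import hashlib

    # Hypothetical location of an institutional cache near the compute resources.
    CACHE_ROOT = Path.home() / ".cache" / "datamesh"

    def fetch_from_provider(dataset_id: str) -> bytes:
        """Placeholder for a wide-area fetch from a publishing data provider."""
        return f"contents of {dataset_id}".encode()

    def read_dataset(dataset_id: str) -> bytes:
        """Serve a dataset from the local cache, fetching it only on a miss."""
        key = hashlib.sha256(dataset_id.encode()).hexdigest()
        cached = CACHE_ROOT / key
        if cached.exists():                     # cache hit: no remote transfer
            return cached.read_bytes()
        data = fetch_from_provider(dataset_id)  # cache miss: fetch exactly once
        CACHE_ROOT.mkdir(parents=True, exist_ok=True)
        cached.write_bytes(data)                # later readers reuse this local copy
        return data

    print(read_dataset("hypothetical/reference-genome"))

In the actual mesh this logic would sit behind the mounted POSIX namespace rather than in application code; the sketch only illustrates why repeated reads at a site do not cause repeated wide-area transfers.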
Combined with rich metadata, persistent identifiers (PIDs), versioning, and policy-driven publishing, this data mesh forms a foundational layer for scientific platforms, workflow engines, and virtual research environments that require reliable, FAIR access to shared data products across organisational and national boundaries.
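To illustrate what a stable, versioned reference could look like, the sketch below resolves a PID and version to an immutable path in the mounted namespace. The catalogue entries, paths, and checksum value are invented for illustration and do not reflect the project's actual metadata model.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ArtifactRecord:
        path: str      # immutable location in the mounted namespace
        sha256: str    # integrity check supporting reproducible workflows

    # Hypothetical catalogue mapping (PID, version) to an immutable record.
    CATALOGUE = {
        ("hypothetical-pid/genome-index", "v2.1"): ArtifactRecord(
            path="/mesh/indexes/genome-index/v2.1/",
            sha256="<checksum>",
        ),
    }

    def resolve(pid: str, version: str) -> ArtifactRecord:
        """Return the immutable namespace path for a given PID and version."""
        return CATALOGUE[(pid, version)]

    print(resolve("hypothetical-pid/genome-index", "v2.1").path)

Because each (PID, version) pair points to an immutable path, workflows can cite exactly the data and artifacts they consumed and re-run against the same inputs at any site that mounts the namespace.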
Many researchers repeatedly download the same datasets, often mirroring identical data even within a single group, and rebuild the same data-derived artifacts. This results in redundant storage, unnecessary network traffic, and additional compute spent recreating the same results. A shared, cache-based data mesh serves common datasets and artifacts from a few well-placed, shared caches instead of many scattered copies, so data is fetched once and then reused close to where it is computed. By reducing redundant transfers and storage, this approach lowers energy use and hardware overhead, helping research infrastructures move toward greener, more resource-efficient operations.
We welcome partnerships and collaborations. If you are interested, let's get in touch.
Looking for
- Hosting
- Onboarding
- Co-development
- Use case
- Piloting
- Scientific workflows and services
- Other
Similar opportunities
Product
Build a distributed data ecosystem with Onedata for seamless collaboration
- VRE
- Hosting
- Use Case
- Piloting
- Onboarding
- Co-development
- Federated sync-and-shares
- Federated Compute & Storage
- Scientific workflows and services
- Integrating scientific data repositories
- Service Catalogues, Interoperability, & Integration
Lukasz Opiola
IT System Engineer & Researcher @ Onedata.org (Cyfronet AGH) at Polish EOSC Node
Kraków, Poland
Service
Onedata for distributed data ecosystems supporting data & metadata management
- Federated sync-and-shares
- Federated Compute & Storage
- Integrating scientific data repositories
Lukasz Opiola
IT System Engineer & Researcher @ Onedata.org (Cyfronet AGH) at Polish EOSC Node
Kraków, Poland
Service
- Federated Compute & Storage
- Scientific workflows and services
- Service Catalogues, Interoperability, & Integration
Enol Fernández
Principal Software Architect at EOSC Data Commons
Amsterdam, Netherlands