For Repository Administrators

Operating Repositories and Services in the NRP

This documentation covers use cases of running repositories and related services in the National Repository Platform (NRP).

Readers should already be familiar with the current conditions for creating repositories in the NRP, which describe the process from an organisational point of view.

This document works at a more technical level: it discusses the procedures and the models for dividing responsibilities when operating services in the NRP. The basic distinction is whether you intend to use one of the core repository systems (i.e. CESNET-Invenio, CLARIN-DSpace, ASEP/ARL) or a specific implementation of a repository or of a related service (we told you that you need to read the conditions linked above first!). Beyond that, even with the core repository systems there are subtleties that depend on the level of customisation your user group requests for your repository.

Generic NRP Architecture

To discuss the usage of the NRP at this level, it is necessary to understand its basic architecture.

NRP Storage

The storage layer of the NRP consists of Ceph clusters exporting an S3 service. The clusters are geographically distributed throughout the Czech Republic. Data in the clusters is accessed by the repositories and services; on a conceptual level, the data is not meant for direct user access. There are technical exceptions in which the repository creates a pre-signed request for the user, but (1) the request is controlled by the repository, and (2) it is an optimisation that doesn’t change the principles.

The clusters are implemented as “Ceph stretch clusters”: the repository uses a single S3 endpoint and the cluster is responsible for replicating the data to separate geographical locations in the requested number of copies. The data is thus made resilient against the concurrent loss of a number of disks.

The current configuration (in 2026, i.e. on NRP1 and on the emerging NRP2 clusters physically located in Ostrava and Ústí nad Labem) is 3 replicas in each location, 6 data replicas in total. This makes the data resilient to the loss of up to 5 disks and, e.g., to the total physical destruction of one of the sites. Additional physical locations will be added later.
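The arithmetic behind these figures can be sketched in a few lines. This is only a toy model for illustration, not a description of how Ceph places or recovers data; the function names are hypothetical and chosen for this example.

```python
def total_replicas(sites: int, replicas_per_site: int) -> int:
    """Total number of copies of each object across all sites."""
    return sites * replicas_per_site


def max_tolerated_copy_losses(sites: int, replicas_per_site: int) -> int:
    """Copies that can be lost while at least one copy still survives."""
    return total_replicas(sites, replicas_per_site) - 1


def copies_after_site_loss(sites: int, replicas_per_site: int,
                           lost_sites: int = 1) -> int:
    """Copies that remain after whole sites are destroyed."""
    return max(sites - lost_sites, 0) * replicas_per_site


# Configuration described above: 2 sites, 3 replicas in each.
print(total_replicas(2, 3))             # 6 copies in total
print(max_tolerated_copy_losses(2, 3))  # 5 lost disks still leave one copy
print(copies_after_site_loss(2, 3))     # 3 copies survive the loss of a site
```

Adding further physical locations later raises both numbers in this model, provided the number of replicas per location stays the same.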

To ensure data integrity, the files are equipped with checksums. In addition to the standard Ceph data scrubbing that ensures data consistency, the checksums are also verified periodically. Should a mismatch be detected, the situation is handled by the infrastructure operators.

For data confidentiality, the content of the disks is encrypted. While the infrastructure itself logically handles the plain text of the files (unless an additional layer of encryption is deployed for sensitive data), disk encryption ensures that the content of disks removed for replacement is completely unusable for whoever removes the equipment. An additional layer of protection comes from supplier contracts, which require the suppliers to destroy the content of the disks before shipping them to the manufacturer etc.
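As an illustration of what such a verification involves, the following minimal sketch recomputes a checksum over a file-like stream and compares it with the stored value. The choice of SHA-256 and the function names are assumptions made for this example; the text above does not state which algorithm or tooling the NRP actually uses.

```python
import hashlib
import io
from typing import BinaryIO


def compute_checksum(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Compute a hex digest of the stream, reading it in chunks.

    SHA-256 is an assumption for this sketch, not the documented NRP choice.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(stream: BinaryIO, expected_hex: str) -> bool:
    """Return True when the recomputed digest matches the stored one."""
    return compute_checksum(stream) == expected_hex.lower()


# Usage: store the checksum at upload time, verify it again later.
payload = b"example dataset content"
stored = compute_checksum(io.BytesIO(payload))
print(verify_checksum(io.BytesIO(payload), stored))              # intact copy
print(verify_checksum(io.BytesIO(b"corrupted content"), stored))  # mismatch
```

Reading in chunks keeps memory use constant, which matters for large files.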

We call this concept a virtually reliable S3.

Again, this storage layer is typically used by the repository, not by the repository end user.

Note that due to the sheer volume of the infrastructure, no standard backups are performed on the S3 storage; we rely solely on the properties of the stretch cluster.

Running Applications in NRP Containers

Applications such as repositories run in Kubernetes clusters. The clusters are co-located with the storage facilities and take advantage of the geographical distribution. The operation of containers running in the NRP is resilient to outages of whole sites; only the throughput of the system would be reduced.

Repositories and Applications

As is obvious at this point, repositories and other applications run in NRP Containers and store their data in the NRP Storage.

Involved Personnel, Roles, and Responsibilities

The storage and containers of the NRP are operated by the infrastructure personnel, directly reachable through L3 and storage support channels.

The repository itself is managed by the repository administrator (you have read the conditions linked at the top, haven’t you? And note that the repo admin is a role and a responsibility, not a single person!). Who exactly operates the technical part of the repository depends on many details. We provide a generic overview here. Keep in mind that the details of the setup always need to be carefully negotiated for each individual case.

Running Specific Applications and/or Other Repository Implementations

For running services and repository implementations that are not directly supported by the NRP, the infrastructure offers the storage and environment to run containers only. The repository administrator must ensure the technical operation of the repository itself (including sufficient personnel to prepare the deployment and to guarantee a reasonable quality of service throughout the whole life cycle).

Running an instance of CLARIN-DSpace

FIXME

Running an instance of ASEP/ARL

FIXME

Running an instance of CESNET-Invenio

There are two basic models of operating CESNET-Invenio-based repositories. Which one applies depends on whether there are customised components developed and maintained by the repository administrator.

With a bit of simplification, the code base of a CESNET-Invenio-based repository consists of

  • the common base maintained by CERN,
  • common parts especially related to the NRP developed by CESNET (those together are known as CESNET-Invenio and may include components useful for various repositories),
  • components specific to a particular repository, such as
    • user interface styling (logos/colour schemes),
    • specific metadata model including metadata validation logic and the user interface especially for metadata input,
    • specific search support, data visualisation components, data validation,
    • anything else specific for the repository.

CESNET-Invenio Fully Managed by the NRP

In general, simple customisations such as look and feel and small extensions of standard metadata models will usually be done by CESNET’s development team, and the full code base of such a repository instance will be completely managed by the infrastructure.

In that case, the repository can be fully managed by the NRP staff from development to full operation. This includes updates and ensuring compatibility of the custom code. In other words, the repository administrator never needs to touch the code or the deployment.

Highly Customised Instances of CESNET-Invenio Repositories

Should the repository administrators require more extensive customisations (large metadata models, especially with complex verification logic, or fully custom components), they must take part in the development of those customisations.

In that case, the repository administrator becomes the top-level integrator of the repository instance. Then

  • the customised code must be kept in a git repository,
  • the NRP will supply test and staging environments for the repository administrator,
  • the NRP will make its best effort to inform the repository administrator about upcoming changes to the main CESNET-Invenio code that may affect compatibility, and will provide documentation and recommendations on how to handle the development and operation of custom components,
  • but the repository administrator is responsible for compatibility of their custom components and for maintaining them (it cannot be expected that the NRP has capacity to take part in their development),
  • this is especially important in the (unlikely but possible) case when a security update must be performed on the infrastructure that cannot be reasonably done without breaking compatibility of CESNET-Invenio interfaces,
  • the production instance of the repository will be operated by the NRP team, i.e. when the repository administrators are happy to release a new version, they will ask the infrastructure support team to do so. The infrastructure team is responsible for day-to-day operation of the repository.

The repository administrators are always advised to discuss their particular cases with the CESNET-Invenio developers to clarify the status of the intended development work. Ready-made components may already exist for some of the functionality, and some components may turn out to be universally useful, in which case the CESNET-Invenio team will implement them. Kindly note that the CESNET-Invenio team reserves the right to decide on the status of each feature in question.
