Storage Resource Manager


Storage Resource Management technology was initiated by the Scientific Data Management Group at Lawrence Berkeley National Laboratory (LBNL) and developed in response to the growing need to manage large datasets on a variety of storage systems.
Dynamic storage management is essential for:
  1. preventing data loss,
  2. reducing error rates in data replication, and
  3. reducing analysis time by ensuring that analysis tasks have the storage space to run to completion.
There are already numerous examples of data from simulations running on leadership-class machines being lost because it was not moved in time to a mass storage system. Storage Resource Managers (SRMs) address such issues by coordinating storage allocation, streaming data between sites, and enforcing secure interfaces to the storage systems. For example, in a production environment, using SRMs reduced the error rate of large-scale replication from 1% to 0.02% in the STAR project.
SRMs can also prevent job failures. When jobs run on clusters, some of the local disks can fill up before a job finishes, causing lost productivity and delaying the analysis. This happens when space was not dynamically allocated and files that were no longer needed were not removed. While there are tools for dynamically allocating compute and network resources, SRMs are the only tool available that provides dynamic space reservation, guarantees secure file availability with lifetime support, and performs automatic garbage collection to prevent clogging of storage systems.
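The idea of dynamic space reservation can be illustrated with a small sketch. This is a toy in-memory model, not the SRM interface: the class `SpaceManager`, its methods, and the token format are all hypothetical names chosen for illustration. It shows only the core idea that a job asks for space before it starts and is refused up front if the space is not available, rather than failing partway through.

```python
class SpaceManager:
    """Toy model of dynamic space reservation (hypothetical, not the SRM API)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.reserved = {}      # reservation token -> reserved bytes
        self._counter = 0       # used only to generate unique tokens

    def free_bytes(self):
        """Bytes not covered by any active reservation."""
        return self.capacity - sum(self.reserved.values())

    def reserve(self, size_bytes):
        """Return a token if the space can be guaranteed, else None."""
        if size_bytes <= 0:
            raise ValueError("size_bytes must be positive")
        if size_bytes > self.free_bytes():
            return None         # refuse up front instead of failing mid-job
        self._counter += 1
        token = f"space-{self._counter}"
        self.reserved[token] = size_bytes
        return token

    def release(self, token):
        """Give the reserved space back; unknown tokens are ignored."""
        self.reserved.pop(token, None)


manager = SpaceManager(capacity_bytes=100)
job_a = manager.reserve(60)     # granted: 60 of 100 bytes are now reserved
job_b = manager.reserve(50)     # refused: only 40 bytes remain
manager.release(job_a)          # job A finishes and frees its space
job_c = manager.reserve(50)     # granted now that the space is free again
```

In this sketch a refused request returns `None` so that the caller can wait or choose another storage site before the job starts.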
The SRM specification has evolved into an international de facto standard, and many projects have committed to using this technology, especially in the high-energy physics (HEP) and high-energy and nuclear physics (HENP) communities. One example is the Worldwide Large Hadron Collider Computing Grid, which supports ATLAS and CMS.
The SRM approach is to develop a uniform standard interface that allows multiple implementations by various institutions to interoperate. This removes the dependence on a single implementation and permits multiple groups to develop SRM systems for their specific storage resources. The approach has become crucial to the interoperation of storage systems in large-scale projects that have to manage and distribute massive amounts of data efficiently and securely. Without a unifying technology of this kind, these projects cannot scale and are bound to fail. The problem will only grow over time as computing facilities move into the petascale regime.
Another important problem that SRMs address is storage clogging. Storage clogging is a critical problem for large-scale shared storage systems, because the removal of files after they have been used is not automated. This increases the cost of storage and slows the analysis and discovery process. SRMs help unclog temporary storage systems by providing lifetime management of accessed files. This capability is crucial to the efficient use of storage under cost constraints.
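Lifetime management can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not an SRM implementation: `CachedFile`, `LifetimeCache`, and their methods are hypothetical names, and time is passed in explicitly as a number of seconds so the behaviour is easy to follow. Each file is stored with an expiry time, and a garbage-collection pass removes every file whose lifetime has run out.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_bytes: int
    expires_at: float           # time (in seconds) after which the file may be removed


class LifetimeCache:
    """Toy temporary store with per-file lifetimes (hypothetical, not the SRM API)."""

    def __init__(self):
        self.files = {}         # file name -> CachedFile

    def put(self, name, size_bytes, lifetime, now):
        """Store a file that stays valid for `lifetime` seconds from `now`."""
        self.files[name] = CachedFile(name, size_bytes, now + lifetime)

    def extend(self, name, extra_lifetime):
        """Lengthen the lifetime of a file that is still in use."""
        self.files[name].expires_at += extra_lifetime

    def collect_garbage(self, now):
        """Remove all expired files and return the number of bytes freed."""
        expired = [n for n, f in self.files.items() if f.expires_at <= now]
        freed = 0
        for name in expired:
            freed += self.files.pop(name).size_bytes
        return freed


cache = LifetimeCache()
cache.put("run1.dat", size_bytes=10, lifetime=5, now=0)
cache.put("run2.dat", size_bytes=20, lifetime=50, now=0)
freed = cache.collect_garbage(now=10)   # run1.dat has expired, run2.dat has not
```

Because removal is driven by the expiry time rather than by users remembering to delete files, the temporary store does not fill up with data that is no longer needed.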
SRMs also serve as gateways for secure data access. By requiring that all external access to storage systems go through a standard SRM interface, one can ensure not only that access is authenticated, but also that authorization to access files is enforced.
SRM technology was highly successful in SciDAC-1 and is currently used in production in several large collaborations. Interoperating SRM implementations have been developed at LBNL, FNAL, and TJNAF, as well as at several sites in Europe. Furthermore, this technology increases scientists' productivity by eliminating the tedious and time-consuming tasks of managing storage, performing robust data movement, and dealing with security requirements at various storage sites.
In addition to leading the development of the SRM standard by coordinating with multiple institutions, the LBNL team has developed SRM systems for disk storage and mass storage systems, including HPSS. These SRMs have been used in several application domains, including multiple projects at the SDM center, the Earth System Grid, the STAR experiment, and the Open Science Grid. As data sets continue to grow and become ever more complex, these projects depend on the continued development and support of the SRM implementations from LBNL.
It is essential to capitalize on the SciDAC-1 successes by sustaining the current projects that depend on SRM technology, further improving and deploying SRMs in additional projects and application domains, and continuing to evolve the SRM standard. Specifically, past experience has identified important features that require further development and coordination. These include more sophisticated resource monitoring that can be used for performance estimation, authorization enforcement, and accounting (tracking and reporting) for the purpose of enforcing usage quotas in SRMs. Another area that needs further development is SRMs for multi-component storage systems. Such systems, which combine multiple disk arrays, parallel file systems, and archival storage, are becoming more prevalent as the volume of data to be managed grows exponentially with petascale computing.

Use of SRMs in real applications

The SRM interfaces have been defined cooperatively, and multiple implementations have been developed in the US and Europe. LBNL introduced the concepts and subsequently led a coordinated effort to define a community-based common interface. Several implementations have been deployed in various applications, including HEP, HENP, and ESG, as well as in new application domains such as fusion simulation and biology. Some specifics of SRM usage to date are:
List of Storage Resource Manager software: