Version 0.15 October 2012 (working draft)

Intended audience: research managers and researchers, information managers and professionals


This Pathway introduces the main areas of technical requirements and planning that need to be addressed in order to develop a repository. Following it should give you an understanding of the key resource areas needed for repository development and of the planning required. It covers software and hardware requirements, metadata and interoperability, ontologies (thesauri), workflows, and the communication strategy needed to keep other stakeholders in the institution informed and supportive of the development. Case studies are provided of successful institutional, national and thematic repository developments.

What do you need to do?

Planning and Resourcing: Planning involves identifying what is needed and what policies the repository will have. Issues to be considered should include:

  • Software, hardware and interface design.
  • Metadata requirements and interoperability.
  • Workflow and how this might fit with existing workflows in your institution.
  • Content type: what will the repository include (research papers, technical papers, conference papers, training materials, books, field reports, etc.)?
  • Quality control – who should be responsible, and how?
  • Archiving policies – what is to be archived and for how long?
  • Copyright and licensing issues.
  • Advocacy strategies.

These requirements will lead to solutions that are specific to your institution or network and its repository. Every institution or network will have its own range of requirements and solutions.

Hardware: You may decide to run your own repository on site, to outsource to a hosting organization, or to join an existing network. If you are developing your own repository on site you will require hardware. Repositories are typically run on servers housed in an institution's computer room, with air conditioning, networking, and so on. Repositories can sit on dedicated servers, on shared services (perhaps a shared web server) or on virtual machines on larger servers. In the first few years of operation, a basic or moderately specified server is likely to be sufficient.

Software and platform: There are three main technical approaches to setting up a repository:

  1. You can program it yourself in-house - the Do It Yourself (DIY) approach;
  2. You can install a standard package in-house (EPrints from the University of Southampton and DSpace from MIT are two widely used off-the-shelf packages);
  3. You can have the repository hosted with an external service provider.  

Each of these options has pros and cons, which are discussed in more detail by the Repositories Support Project (RSP). The RSP pages also compare the different software packages available for repository management. The Information Management Resource Kit (IMARK) also covers many aspects of the technical development of repositories. See References below.

Metadata and interoperability: You will also need to consider metadata (how your data is ‘described’), the file formats in which you store your data, and how your data is harvested by search engines. These issues are central to how interoperable your data is with that held by other repositories around the world.  

Metadata is structured data that provides a short summary of any information resource, print or electronic, and facilitates the description, navigation, presentation, administration, and preservation of that resource.

Metadata describes your content so that it can be found more efficiently. Effective metadata depends on categorizing (indexing) content with controlled keywords; this is best done using accepted agricultural thesauri such as Agrovoc or CAB Thesaurus.
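As a rough illustration of indexing with controlled keywords, the sketch below checks the keywords on a record against a small in-memory vocabulary. The field names, the Record class and the three-term vocabulary are all invented for this example; a real repository would take its terms from a thesaurus such as Agrovoc or CAB Thesaurus.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a controlled vocabulary; not real thesaurus data.
CONTROLLED_TERMS = {"maize", "irrigation", "soil fertility"}


@dataclass
class Record:
    """A minimal, illustrative metadata record (field names are assumptions)."""
    title: str
    creator: str
    keywords: list = field(default_factory=list)


def split_keywords(record, vocabulary=CONTROLLED_TERMS):
    """Return (accepted, rejected) keywords, compared case-insensitively."""
    accepted, rejected = [], []
    for keyword in record.keywords:
        normalized = keyword.strip().lower()
        if normalized in vocabulary:
            accepted.append(normalized)
        else:
            rejected.append(normalized)
    return accepted, rejected
```

Rejected keywords could be passed back to the depositor or to library staff for mapping to an approved term, rather than being silently dropped.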

Metadata is also what makes it possible to access content across many digital repositories. The Open Archives Initiative (OAI) has defined a mechanism, the Protocol for Metadata Harvesting (OAI-PMH), for harvesting metadata from repositories so that it can be made available from a central service.
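To give a feel for what harvesting involves, here is a minimal sketch in Python. It builds an OAI-PMH ListRecords request URL and pulls identifiers and titles out of a response in the standard oai_dc (Dublin Core) format. The endpoint address, the sample response and the function names are assumptions made for the example; a real harvester would also need to fetch the URL, handle resumption tokens and deal with errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core elements.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response for illustration; real responses carry more fields.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Field report on maize trials</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs
```

A central service would send requests like this to each participating repository and merge the results into one searchable index.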

Metadata standards and interoperability are covered in detail, with advice and information, in the Information Management Resource Kit (IMARK); see References below.

Workflow: You should assess whether other workflows for publications and metadata already exist in your organization. It may be that there are established cycles of information management which can be adapted to fit with the operation of the repository. It may also be that metadata about your content is already collected somewhere (for instance in the library) and this could be integrated with the repository.

Selection of material: What material are you going to deposit in the repository and who controls this? Submissions need to fit within the agreed categories of publication (peer-reviewed articles, contributions to professional journals, reports, books, etc.). Publications in external journals are usually included, although the full text cannot always be made accessible immediately because of copyright restrictions.

Value Added Services: A repository can be useful to users in various ways by offering value added services. These may facilitate submission of articles to journals, provide alerting services, and offer citation links. Metadata may also be submitted to internet search engines and other services, and links may be provided between different elements within the repository so as to offer an integrated information service to both internal and external users.

Communication and marketing:  It is important that the repository is used and populated by the staff on your site. For this to succeed your institution should carry out an advocacy campaign to raise awareness and usage. You should consider benchmarking the current status of information management in your institution or network. The results of this analysis will guide you in the development of a marketing campaign and workshops and other training opportunities for staff. 

Case Studies

Summaries of the development and management of the following repositories are given in accompanying documents: 

References

These web sites and resources contain a wide range of useful information about repositories, technical approaches to creating repositories, and other resources to help those who are creating a repository: 

  • Digital Libraries, Repositories and Documents - Learning module.  This updated module covers a wide range of information and instruction on planning and building repositories and the tools needed for their technical management
  • CONTENTdm from OCLC is a repository development software package
  • DSpace provides background and advice on repository development and provides an off the shelf software package
  • EPrints, from Southampton University, provides background and advice on repository development, and off-the-shelf software
  • Information Management Resource Kit (IMARK) Module on ‘Digitization and Digital Libraries’, Lesson 3.1 - General Overview of Metadata Standards, covers metadata and interoperability
  • Open Access Scholarly Information Sourcebook (OASIS) gives information and guidance on open access and repositories
  • RSP (Repositories Support Project) provides detailed information and advice on the issues in this Pathway, and further case histories of successful repository developments. In particular, its pages give detailed guidance on creating a repository and on technical approaches and software options.