1. Introduction
2. User Workflow
3. API Interfaces
4. Metadata
5. Experiences using SWORD
6. Usability
7. Conclusions
Links

EM-Loader Project Documentation

...or Click to send your publications to the repository

The EM-Loader project ("Extracting Metadata to Load for Open Access Deposit") is a collaboration between Textensor Limited (the developers of PublicationsList) and EDINA (the developers of The Depot, a UK-wide repository for academic papers). The project was funded by the JISC research council [EM-Loader page on JISC website] to investigate how convenient, researcher-friendly front-ends like PublicationsList can make it easier for researchers to add their papers to a repository.

The project used web services, including SWORD, to deposit into The Depot's installation of ePrints, but the same interface will also work with other repository systems such as DSpace or Fedora.

EM-Loader project contacts: Fred Howell (Textensor & publicationslist.org); Ian Stuart and Theo Andrews (EDINA & The Depot)
Project home page: http://publicationslist.org/em-loader/


Summary

The EM-Loader project demonstrates a new workflow for batch deposit of a researcher's publications to an institutional repository using the SWORD interface, with a sample implementation connecting PublicationsList to the Depot's installation of ePrints.

To date, researchers have had no particular reason to send their papers to institutional repositories, and as a result many repositories are mostly empty of content or contain only a small fraction of a university's research output. The user interfaces for depositing into repository software (such as ePrints or DSpace) have not helped either: they force users to go through multiple data-entry screens, entering bibliographic metadata by hand for each paper, which deters all but the most enthusiastic advocates of open access.

The EM-Loader project addresses two key problems:

Sites like publicationslist.org, which provide an easy-to-use service to help researchers maintain their own web page, attract researchers without requiring mandates because they save people time on something that is already part of their normal workflow. Every researcher needs an academic web page listing their publications. Researchers also put effort into their profile pages on networking sites like LinkedIn and Facebook (and on academic projects such as myExperiment). If a researcher's profile page can be made the place where they maintain their list of publications (with structured bibliographic data), then that list can be sent in batch mode to appropriate repositories with minimal extra effort.

The lessons learnt are not specific to PublicationsList.org or the Depot's installation of the ePrints repository, and offer a path to substantially more research materials being deposited into repositories.

Why would a researcher want to store their publications in an institutional repository in the first place?

Many institutions have set up or plan to set up institutional repositories, expecting or hoping that their staff members will store copies of their papers there, with several aims for the institution, including showcasing and improving access to the university's research output; making measurements of research impact easier; and making open access versions of preprints available.

These repositories typically represent a significant investment of public money to set up, customise and maintain, but to date only a few of them have been used much (see, for example, Innkeeper at the Roach Motel for a discussion of the issues). Some institutions are introducing mandates in an attempt to force staff to submit papers to their repository, but academics rarely respond well to being forced to do anything, especially if there are no immediate benefits to them.

In some fields, researchers submit preprints to subject-based repositories such as arXiv.org, which offers useful services such as email alerts, and PubMed Central (which is mostly auto-populated by open access journals rather than by author submission). There is a clear reason for submitting a preprint to a subject repository: doing so is likely to make the work more visible to colleagues, who already go to that repository to check for new research.

Most researchers also store copies of papers (or at least the bibliographic details) on their personal or group web pages to present their own work, without being compelled to do so. Nobody forced the thousands of users of PublicationsList.org or academia.edu to sign up and upload their papers.

So what is an institutional repository for, from a researcher's point of view? The collection of papers from people who happen to be at a single institution is not a particularly useful set to search in itself, so one wouldn't expect many people to use the search facilities of a given institution's repository. Searching across an index of all repositories in the world (were they populated), for example via Google Scholar, would be a useful way to find research. "Improving the pagerank of your papers in Google" may therefore be a motivation for storing papers in a repository, especially as university repository sites typically rank higher than individual staff web pages in search results.

The likely research benefits which might prompt researchers to deposit (in an institutional or subject repository) are:

Reasons for sending items to an institutional repository specifically could include:

It is just as important to consider the reasons why a researcher would not use their institution's repository: if they have a bad user experience on their first attempt, they are likely to ignore any future mandates.

If the aim is open access to research for those who are not members of rich institutions, then as long as articles are deposited it doesn't really matter where they are sent initially (website, departmental publications database, university repository or subject repository), provided that other repositories can then harvest their own copies of the material and metadata without too much extra effort.

Which comes first - the web page or the repository entry?

The traditional approach would be to view generation of publications lists by author as an output of the repository database rather than as an input stage. Generating output lists from the repository is indeed a useful function, but it doesn't answer the question of how authors get their structured bibliographic metadata and full text into the repository in the first place.

Making the researcher's web page the focus of deposit has a number of advantages:
