The EM Loader project ("Extracting Metadata to Load for Open Access Deposit") is a collaboration between Textensor Limited, the developers of PublicationsList, and EDINA, the developers of The Depot, a UK-wide repository for academic papers. The project was funded by JISC [EM-Loader page on JISC website] to investigate ways of using convenient, researcher-friendly front-ends such as PublicationsList to make it easier for researchers to add their papers to a repository.
The project used web services including SWORD to deposit into The Depot's installation of ePrints - but the same interface will also work with other repository systems such as DSpace or Fedora.
EM-Loader project contacts: Fred Howell (Textensor & publicationslist.org); Ian Stuart and Theo Andrews (EDINA & The Depot)
Project home page: http://publicationslist.org/em-loader/
Summary
The EM-Loader project demonstrates a new workflow for batch deposit of a researcher's publications to an institutional repository using the SWORD interface, with a sample implementation connecting PublicationsList to the Depot's installation of ePrints.
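As a rough illustration of what a single SWORD deposit might look like on the wire, the sketch below builds an Atom entry from one bibliographic record and prepares an HTTP POST for it, without sending anything. It is an assumption-laden sketch rather than the project's actual code: the endpoint URL, the record layout and the helper names are hypothetical, the headers follow the SWORD v2 style (the protocol version the project used is not stated here), and authentication is left out.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
DC_NS = "http://purl.org/dc/terms/"

# Hypothetical endpoint: a real repository advertises its collection URLs
# in its SWORD service document.
COLLECTION_URL = "https://repository.example.org/sword/collection"


def build_entry(record):
    """Turn one bibliographic record (a dict) into an Atom entry, as bytes."""
    ET.register_namespace("", ATOM_NS)
    ET.register_namespace("dcterms", DC_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = record["title"]
    for name in record["authors"]:
        author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
        ET.SubElement(author, f"{{{ATOM_NS}}}name").text = name
    ET.SubElement(entry, f"{{{DC_NS}}}issued").text = record["year"]
    return ET.tostring(entry, encoding="utf-8")


def build_deposit_request(record, collection_url=COLLECTION_URL):
    """Prepare (but do not send) an HTTP POST carrying one metadata entry."""
    return urllib.request.Request(
        collection_url,
        data=build_entry(record),
        method="POST",
        headers={
            # SWORD v2-style headers; an assumption about the target server.
            "Content-Type": "application/atom+xml;type=entry",
            "In-Progress": "false",
        },
    )


# Batch mode: one prepared request per record in the researcher's list.
records = [
    {"title": "An example paper", "authors": ["A. Smith", "B. Jones"], "year": "2008"},
    {"title": "Another example", "authors": ["B. Jones"], "year": "2009"},
]
batch = [build_deposit_request(r) for r in records]
```

Each prepared request could then be passed to `urllib.request.urlopen` once credentials and a real collection URL are available. Keeping request construction separate from sending makes the batch easy to inspect before anything reaches the repository.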
Researchers have not had a particular reason to send their papers to institutional repositories to date, and as a result many repositories are mostly empty of content, or contain only a small fraction of a university's research output. The user interfaces for depositing into repository software (such as ePrints or DSpace) have not helped either: they force users to go through multiple data-entry screens to enter bibliographic metadata by hand for each paper, which deters all but the most enthusiastic advocates of open access.
The EM-Loader project addresses two key problems:
- the motivation problem, by focussing on tools that save a researcher time in maintaining the personal web page that lists their publications.
- the usability problem, by importing metadata in batch mode where possible (e.g. from PubMed, BibTeX, EndNote, Web of Knowledge etc.) and then sending all items to the repository in batch mode too, with minimal manual intervention.
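To make the batch import step concrete, here is a minimal sketch of reading several BibTeX entries into structured records in one pass. It is illustrative only and not PublicationsList's importer: the function name and record layout are hypothetical, and the regular expressions assume simply formatted entries (one field per line, brace-delimited values, closing brace on its own line). A production importer would need a full BibTeX parser.

```python
import re

# One entry: "@type{key, ... }" with the closing brace on its own line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# One field per line: "name = {value}" with an optional trailing comma.
FIELD_RE = re.compile(r"^\s*(\w+)\s*=\s*\{(.*)\}\s*,?\s*$", re.MULTILINE)


def parse_bibtex(text):
    """Return one dict per BibTeX entry found in text (simple entries only)."""
    records = []
    for entry_type, key, body in ENTRY_RE.findall(text):
        record = {"entry_type": entry_type.lower(), "key": key}
        for name, value in FIELD_RE.findall(body):
            record[name.lower()] = value.strip()
        records.append(record)
    return records


SAMPLE = """
@article{smith2008,
  author = {Smith, A. and Jones, B.},
  title = {An example paper},
  journal = {Journal of Examples},
  year = {2008}
}

@inproceedings{jones2009,
  author = {Jones, B.},
  title = {Another example},
  booktitle = {Proceedings of Examples},
  year = {2009}
}
"""

records = parse_bibtex(SAMPLE)
for record in records:
    print(record["key"], "-", record["title"])
```

Once the whole list is held as structured records like these, sending every item to a repository becomes a loop over the list rather than a form to fill in per paper.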
Sites like publicationslist.org, which provide an easy-to-use service to help researchers maintain their own web page, attract researchers without requiring mandates because they save people time doing something that is part of their normal workflow. Every researcher needs an academic web page with their publications. Researchers also put effort into their profile pages on networking sites like LinkedIn and Facebook (and on academic projects such as myExperiment). If a profile page can be made the place where a researcher maintains a list of publications (with structured bibliographic data), then this list can be sent in batch mode to appropriate repositories with minimal extra effort.
The lessons learnt are not specific to PublicationsList.org or the Depot's installation of the ePrints repository, and offer a path to substantially more research materials being deposited into repositories.
Why would a researcher want to store their publications in an institutional repository in the first place?
Many institutions have set up or plan to set up institutional repositories, expecting or hoping that their staff members will store copies of their papers there, with several aims for the institution, including showcasing and improving access to the university's research output; making measurements of research impact easier; and making open access versions of preprints available.
These repositories typically represent a significant investment of public money to set up, customise and maintain, but to date only a few of them have been used much (e.g. see "Innkeeper at the Roach Motel" for a discussion of the issues). Some institutions are introducing mandates in an attempt to force staff to submit papers to their repository, but academics don't often respond well to being forced to do anything, especially if there are no immediate benefits to them.
In some fields researchers submit preprints to subject-based repositories, such as arXiv.org, which offers useful services such as email alerts, and PubMed Central (which is mostly auto-populated by open access journals rather than by author submission). There is a clear reason for submitting a preprint to a subject repository - doing so is likely to make the work more visible to colleagues, who already go to that repository to check for new research.
Most researchers also store copies of papers (or at least the bibliographic details) on their personal or group web pages, to present their own work (without being compelled to do so - nobody forced the thousands of users of PublicationsList.org or academia.edu to sign up and upload their papers).
So what is an institutional repository for, from a researcher's point of view? The collection of papers from people who happen to be at a single institution is not a particularly useful set in itself for searching, so one wouldn't expect many people to use the search facilities of a given institution's repository. Searching across an index of all repositories in the world (were they populated), e.g. via Google Scholar, would be a useful way to find research, so "improving the pagerank of your papers in Google" may well be a motivation for storing papers in a repository, especially as university repository sites typically rank higher than individual staff web pages in search results.
The likely research benefits which might prompt researchers to deposit (in an institutional or subject repository) are:
- Carrot 1: Improve search index rank for your papers, making it easier for others to find and cite your work
- Carrot 2: Longer term storage than on your personal web page, backed by institutional funding / commitment
Reasons for sending items to an institutional repository specifically could include:
- Stick 1: Forced to do so by an institutional or funder open access mandate
- Stick 2: Forced to do so as part of a research assessment exercise
- Carrot 3: Once they are in the repository, they will appear on the departmental website automatically (if the department has integrated the repository with its website)
It is just as important to consider the reasons why a researcher would not use their institution's repository - if they have a bad user experience on the first attempt, they are likely to ignore any mandates in future.
- Rather plain look of repository websites (although this hasn't proved a problem for arXiv.org, which has a basic look but whose searching and browsing functions work well)
- Effort required to enter metadata by hand into poorly designed forms, or multiple pages required just to submit a single article, a problem afflicting most repository systems including ePrints and DSpace. See a comparison of ePrints and PublicationsList.org.
- Not being able to submit relevant work because of repository restrictions (e.g. journal articles only)
- Uncertainty over journal copyright issues
- Barriers to setting up an account in the first place
If the aim is open access to research for those not members of rich institutions, then as long as articles are deposited it doesn't really matter where they are sent initially - website, departmental publications database, university repository, subject repository - as long as it is then possible for other repositories to harvest their own copies of material and metadata without requiring too much extra effort.
Which comes first - the web page or the repository entry?
The traditional approach would be to view generation of publications lists by author as an output of the repository database rather than as an input stage. Generating output lists from the repository is indeed a useful function, but it doesn't answer the question of how authors get their structured bibliographic metadata and full text into the repository in the first place.
Making the researcher's web page the focus of deposit has a number of advantages:
- part of existing workflow - every researcher needs an academic page anyway
- immediate reward - the researcher can see their web page immediately rather than having to wait for approval of repository submissions or for an overnight batch script to regenerate pages.
- complete control - no need to conform to any restrictions a repository places on types of content
- a manageable initial step - once papers plus structured metadata have been gathered, they can be sent to / fetched by the repository with minimal extra effort
- author name disambiguation - an author will only add their own papers to their page, so entries are tied to particular authors. This has been a problem with automatic generation of pages from repositories where multiple authors share the same name.
- more choice - a researcher could choose from a number of services to maintain their web publications list, rather than being forced to use whichever repository interface their institution has chosen.
- not tied to a particular institution - researchers move between many universities during their career, but they just need one publications list / CV.
- encourages competition between academic web page providers - researchers can choose the one with the most convenient user interface for deposit
- quicker and easier than existing repository deposit interfaces - no need to manually enter metadata information into forms