1. Introduction
2. User Workflow
3. API Interfaces
4. Metadata
5. Experiences using SWORD
6. Usability
7. Conclusions
Links

API Interfaces used

The SWORD interface uses the Atom Publishing Protocol for one task: depositing single items plus metadata into a repository. However, it does not specify which metadata format (e.g. MODS, bibtex, JSON) or packaging format (e.g. .zip, METS) should be used. It is therefore not sufficient to know that a repository has a SWORD module: the module must be able to interpret the particular metadata and packaging format you need to send.

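We needed a number of machine-machine interfaces in addition to SWORD to implement the workflow connecting publicationslist.org to the DEPOT. The interfaces we required were:

1. Single item deposit using SWORD
2. Sending updates / corrections to repository entries
3. Fetching status of deposited items
4. Search repository for existing items by author
5. Creating repository accounts using web services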

SWORD was sufficient for item 1 and could be extended simply for item 2, but all the others needed new interfaces. Most other systems which deposit are likely to need these too, as 'fire and forget' is not sufficient.

1. Single item deposit using SWORD

For single item deposit using SWORD, we used the sample Java SWORD_1.0 client written by Stuart Lewis, while working at Aberystwyth University:

java -cp $CLASSPATH org.purl.sword.client.ClientFactory \
   -cmd -t post -file "$ZIPFILE" -href "$SWORDURL" -formatNamespace JSON \
   -filetype application/zip -u $PLUSER -p $PLPASS -onBehalfOf $REPOUSER

The item was packaged as a .zip file containing METS XML, JSON-bibtex metadata, and the PDF full text version (if available).
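As an illustration of the packaging step, here is a minimal Python sketch that assembles such a .zip in memory. The member names (mets.xml, metadata.json, fulltext.pdf) and the function name are assumptions made for this example; the names the repository's importer actually expects are not stated here.

```python
import io
import json
import zipfile


def build_deposit_package(mets_xml, metadata, pdf_bytes=None):
    """Return the bytes of a .zip holding METS XML, JSON metadata and an optional PDF."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        # Member names are assumptions; use whatever the repository importer expects.
        package.writestr("mets.xml", mets_xml)
        package.writestr("metadata.json", json.dumps(metadata, indent=2))
        if pdf_bytes is not None:
            # The full text is optional: metadata-only deposits simply omit it.
            package.writestr("fulltext.pdf", pdf_bytes)
    return buffer.getvalue()


package_bytes = build_deposit_package(
    mets_xml='<mets xmlns="http://www.loc.gov/METS/"/>',
    metadata={"type": "article", "title": "Tuning the Lucas Distributor"},
)
```

The resulting bytes could be written to the file passed to the SWORD client as "$ZIPFILE".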

We created a privileged user on the repository for publicationslist.org; this user was permitted to submit papers on behalf of other users, so to use SWORD we just needed to know a researcher's user name on the repository system.

Issues we encountered:

Despite these issues, and the somewhat experimental nature of the sample SWORD module for ePrints, we were able to get single item submission working via SWORD.

2. Sending updates / corrections to repository entries

The initial version of SWORD specifically excluded support for updates to existing entries - but this feature was essential for our use, as users would correct / update metadata entries over time, and attach full text some time after initially entering metadata. We implemented this feature by setting an eprintid field when sending an updated version of metadata for an item. This was tied in to the ePrints feature for versioning - which creates a new repository ID for the new version of the item, and maintains copies of all versions. It would be better if updates were supported in SWORD and if versioning were handled more elegantly by ePrints than at present.
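A minimal sketch of the client side of this convention, assuming the metadata is held as a Python dictionary before being serialised to JSON. The function name as_update is hypothetical; only the eprintid field name comes from the description above.

```python
def as_update(metadata, existing_eprintid):
    """Return a copy of the metadata marked as an update to an existing entry."""
    updated = dict(metadata)
    # The repository ID of the entry being corrected; sent as a string,
    # matching how IDs appear in the JSON metadata elsewhere in this report.
    updated["eprintid"] = str(existing_eprintid)
    return updated


original = {"type": "article", "title": "Tuning the Lucas Distributor"}
update = as_update(original, 384)
```

A first-time deposit would send the metadata without an eprintid; a correction sends the same fields plus the ID of the entry it replaces.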

3. Fetching status of deposited items

SWORD is only concerned with sending single items, but we needed to be able to check the status of items deposited by a user, i.e. do a metadata fetch from the repository. Many repositories have an approval process: users submit items, and repository staff check and correct the metadata some time later. The publications list needs to be updated with the correct links once a repository item has been accepted. For this we needed an extra interface: "fetch latest status of items deposited by user X".

We implemented this as a new module for ePrints which returned the status of deposited items as an Atom feed. Sample call using curl to do an HTTP GET with basic auth (shown on multiple lines for clarity):

  curl -u $user:$pass "http://sample.repository.edu/m2m/PLOrg?
       output=PLOrg_Status            // Output format required (Atom/JSON-bibtex)
       &user=a.n.other"               // Repository user name

The return value is an Atom feed like the one below:

<?xml version="1.0" encoding="utf-8" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>the Depot: Authors/Editors is "howell" </title>
  <link href="http://art-abstracts.edina.ac.uk/"></link>
  <updated>2009-03-19T13:47:57Z</updated>
  <id>tag:repository.edina.ac.uk,2009:feed:PublicationsList.Org - status list</id>
  <entry>
    <id>tag:repository.edina.ac.uk:item:/391</id>
    <category>accepted</category>
    <link>repository.edina.ac.uk/391</link>
    <updated>2009-03-03T15:01:52Z</updated>
    <content>
{
  "eprintid":"391",
  "succeeds":"384",
  "author":"Howell, F W; Stuart, I M",
  "type":"article",
  "abstract":"....",
  "title":"Tuning the Lucas Distributor"
  // ... for all fields see metadata page.
}
    </content>
  </entry>
  <entry>
    <!-- ... more entries -->
  </entry>
</feed>

We chose an Atom feed as the format of the response, partly because repositories already include support for generating Atom output. For the structured metadata for each article, we chose a JSON encoding, sent in the content field of each Atom entry. This is simpler and more flexible than devising a new XML format for encoding bibliographic metadata.

We used the standard Atom fields id and updated, and we used the category field to carry the status of the deposited item (accepted, rejected, pending).
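To show how a client might consume this feed, here is a minimal Python sketch using only the standard library. The function name and the shape of the returned dictionaries are assumptions for this example, and it assumes the content of each entry is strict JSON (without the // comment used in the abbreviated sample above).

```python
import json
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Abbreviated feed modelled on the sample response above.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:repository.edina.ac.uk:item:/391</id>
    <category>accepted</category>
    <link>repository.edina.ac.uk/391</link>
    <updated>2009-03-03T15:01:52Z</updated>
    <content>{"eprintid": "391", "succeeds": "384", "type": "article",
      "title": "Tuning the Lucas Distributor"}</content>
  </entry>
</feed>"""


def parse_status_feed(feed_xml):
    """Return one dict per Atom entry: id, status, timestamp and decoded metadata."""
    root = ET.fromstring(feed_xml)
    items = []
    for entry in root.findall(ATOM + "entry"):
        content = entry.findtext(ATOM + "content", default="").strip()
        items.append({
            "id": entry.findtext(ATOM + "id"),
            # The category field carries accepted / rejected / pending.
            "status": entry.findtext(ATOM + "category"),
            "updated": entry.findtext(ATOM + "updated"),
            "metadata": json.loads(content) if content else {},
        })
    return items


items = parse_status_feed(SAMPLE_FEED)
accepted = [item for item in items if item["status"] == "accepted"]
```

The succeeds value in the metadata could then be used to match a new repository ID to the earlier version it replaces.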

4. Search repository for existing items by author

Before submitting papers, it would be useful if a researcher could search the local repository (or even all repositories) to see which of their papers have already been stored by a co-author. This would ideally involve a search and select interface where the researcher can look for publications matching their name, and select the ones which are really theirs (rather than by another author with the same name). A similar interface for searching the PubMed database using the NCBI eUtils web service has proven popular for publications list users [publicationslist.org/pubmed.html].

The interface needs to return rich bibtex-style metadata if the resulting data is to be used to populate a publications list - Dublin Core (e.g. as returned by OAI-PMH) is not sufficient.

We developed an additional module for ePrints to perform this task, which does a simple search on author name and returns an Atom feed with rich JSON metadata for the items.

curl -u $user:$pass "http://repository.edina.ac.uk/cgi/m2m/authors?
                     name=einstein,a"    // Author name supplied as surname,initials

This returns Atom / JSON format search results in the same format as searches by depositor ID (section 3 above). This machine interface would be sufficient for implementing a 'search/select/import' GUI similar to the one used for PubMed for importing data from a repository into publicationslist.org, but we didn't implement a GUI for this within the scope of this project. A production implementation of such a search API would need additional parameters to paginate return values for very large numbers of search results (only likely to be an issue for subject repositories spanning more than one university).
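As a small illustration, the search URL for a given author could be composed as in the Python sketch below. The function name is hypothetical, and leaving the comma unescaped is an assumption based on the sample call above; the base URL is the author search endpoint used in this project, ending in "name=".

```python
from urllib.parse import quote

AUTHOR_SEARCH_URL = "http://repository.edina.ac.uk/cgi/m2m/authors?output=Atom_JSON&name="


def author_search_url(base_url, surname, initials):
    """Append a percent-encoded 'surname,initials' value to the search endpoint."""
    name = f"{surname},{initials}"
    # Keep the separating comma literal, as in the sample call; escape the rest.
    return base_url + quote(name, safe=",")


url = author_search_url(AUTHOR_SEARCH_URL, "einstein", "a")
```

The response could be read with the same Atom / JSON handling as the status feed, since both use the same format.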

5. Creating repository accounts using web services

Users are already authenticated with PublicationsList.org, and typically will not already have an account on their institutional repository. We wanted to be able to automate account creation, so users could just press a 'Click to create repository account' link and not have to fill out registration forms. We created a new module create_acc to create user accounts on the repository.

 curl -u $user:$pass "http://repository.edina.ac.uk/cgi/create_acc
    ?username=test1000             // suggested repository user name
    &email=test1000@example.com    // user's email
    &password=0DMwbBTkPUz8I"       // user's password for new account (crypt() version)

On success, the return value is a simple XML document:

<returns>
  <return>
    <result>OK</result>
    <message>User created</message>
    <code>0</code>
    <email>test1001@example.com</email>
    <username>test1000</username>
  </return>
</returns>

If the user name or the email is already in use, it returns an error entry for each conflict:

<returns>
  <return>
    <result>err</result>
    <message>email in use: test1000@example.com => test1000</message>
    <email>test1000@example.com</email>
    <username>test1000</username>
    <code>3</code>
  </return>
  <return>
    <result>err</result>
    <message>username in use: fwh => fred@textensor.com</message>
    <email>fred@textensor.com</email>
    <username>fwh</username>
    <code>2</code>
  </return>
</returns>

A user interface can use these return values to prompt the user for a different user name (if it is taken), or to find out the existing repository user name for the user's email (if they already have a repository account).
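The handling described above can be sketched in Python as follows. The meaning of the codes (0 for success, 2 for user name in use, 3 for email in use) is inferred from the two sample responses rather than from a specification, and the function name and return shape are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Code meanings inferred from the sample responses above.
USERNAME_IN_USE = "2"
EMAIL_IN_USE = "3"

ERROR_RESPONSE = """<returns>
  <return>
    <result>err</result>
    <message>email in use: test1000@example.com => test1000</message>
    <email>test1000@example.com</email>
    <username>test1000</username>
    <code>3</code>
  </return>
  <return>
    <result>err</result>
    <message>username in use: fwh => fred@textensor.com</message>
    <email>fred@textensor.com</email>
    <username>fwh</username>
    <code>2</code>
  </return>
</returns>"""


def interpret_create_response(xml_text):
    """Summarise a response as (created, username_taken, existing_username)."""
    created = False
    username_taken = False
    existing_username = None
    for ret in ET.fromstring(xml_text).findall("return"):
        code = ret.findtext("code")
        if ret.findtext("result") == "OK":
            created = True
        elif code == USERNAME_IN_USE:
            # The suggested name is taken: prompt for a different one.
            username_taken = True
        elif code == EMAIL_IN_USE:
            # The user already has an account: reuse its user name.
            existing_username = ret.findtext("username")
    return created, username_taken, existing_username


outcome = interpret_create_response(ERROR_RESPONSE)
```

Here the user interface would both ask for a new user name and offer the existing account test1000 that is registered for the email address.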

6. List of the interface modules used

A sample JSON configuration file is given below: these are the web services interfaces needed to configure PublicationsList.org to use a particular repository. One of the links is the SWORD module (the depositurl); the others are for the other machine interfaces needed to implement the workflow.

{
  // The main website of the repository:
  "url" :              "http://repository.edina.ac.uk/",

  // A short description:
  "desc" :             "The DEPOT",

  // The URL for users to create accounts manually:
  "registerurl" :      "http://repository.edina.ac.uk/cgi/register",

  // The SWORD endpoint for deposit:
  "depositurl" :       "http://repository.edina.ac.uk/app/deposit/archive",

  // The interface for getting status of items deposited by a user
  "statusurl" :        "http://repository.edina.ac.uk/cgi/m2m/PLOrg?output=Atom_JSON&user=",

  // The URL for creating new repository accounts
  "createaccounturl" : "http://repository.edina.ac.uk/cgi/m2m/create_acc",
  
  // The URL for searching repository by author name
  "authorsearchurl" :  "http://repository.edina.ac.uk/cgi/m2m/authors?output=Atom_JSON&name=",

  // The username / password for admin user on repository:
  "pluser" :           "admin",
  "plpass" :           "xxxxx"
}
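Because the sample configuration contains // comments, a strict JSON parser would reject it as written. The Python sketch below shows one way a client might load it, assuming comments only ever occupy whole lines; the function name is hypothetical and the embedded sample is abbreviated.

```python
import json

SAMPLE_CONFIG = """{
  // The main website of the repository:
  "url" : "http://repository.edina.ac.uk/",
  // The interface for getting status of items deposited by a user
  "statusurl" : "http://repository.edina.ac.uk/cgi/m2m/PLOrg?output=Atom_JSON&user="
}"""


def load_repository_config(text):
    """Parse the configuration after dropping whole-line // comments."""
    # Only lines that start with // are removed, so the // inside URLs is safe.
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


config = load_repository_config(SAMPLE_CONFIG)
# The status and author search URLs end in "user=" / "name=" so the value is appended.
status_url = config["statusurl"] + "a.n.other"
```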

7. Summary

To implement a smooth workflow for maintaining a publications list on publicationslist.org and synchronising it with deposits in a repository, we were able to use SWORD, but we also needed a small number of extra machine interfaces: for fetching the status of deposited items, searching the repository by author, and creating repository accounts. We also used a simple JSON encoding for sending bibtex-style metadata across the wire, along with Atom feeds.
