1. Introduction
2. User Workflow
3. API Interfaces
4. Metadata
5. Experiences using SWORD
6. Usability
7. Conclusions
Links

Bibliographic Metadata Formats

SWORD itself does not specify which metadata format should be used for the transfer. We needed to transfer citation information in both directions between publicationslist.org and the repository, so a simple format for encoding the metadata was needed.

We looked at a number of possible formats and encodings.

The format we chose is JSON using the field names and schema from BibTeX. This combines the simplicity of JSON with the well-established and tested BibTeX schema, without any of the disadvantages associated with using LaTeX. It also makes it easy to add extra fields without modifying any schema documents.

JSON-Bibtex Bibliographic Metadata Format

The selected bibliographic metadata format is based on BibTeX, encoded using JSON [www.json.org]. This offers a rich, simple and well-supported set of fields for representing citation metadata without the data loss associated with converting to Dublin Core. The fields are documented in the Wikipedia entry on BibTeX [en.wikipedia.org/wiki/BibTeX]. Most repositories already have routines to import from and export to BibTeX, which include the mappings from their internal field names to the BibTeX standard. The fields are:

  address: 
    Publisher's address (usually just the city, but can be the full address 
    for lesser-known publishers)
  author: 
    The name(s) of the author(s) (in the case of more than one author, 
    separated by "and")
  booktitle: 
    The title of the book, if only part of it is being cited
  chapter: 
    The chapter number
  edition: 
    The edition of a book, long form (such as "first" or "second")
  editor: 
    The name(s) of the editor(s)
  eprint: 
    A specification of an electronic publication, often a preprint or a 
    technical report
  howpublished: 
    How it was published, if the publishing method is nonstandard
  institution: 
    The institution that was involved in the publishing, but not necessarily 
    the publisher
  journal: 
    The journal or magazine the work was published in
  month: 
    The month of publication (or, if unpublished, the month of creation)
  note: 
    Miscellaneous extra information
  number: 
    The "number" of a journal, magazine, or tech-report, if applicable. 
    (Most publications have a "volume", but no "number" field.)
  organization: 
    The conference sponsor
  pages: 
    Page numbers, separated either by commas or double-hyphens. For books, 
    the total number of pages.
  publisher: 
    The publisher's name
  school: 
    The school where the thesis was written
  series: 
    The series of books the book was published in (e.g. "The Hardy Boys")
  title: 
    The title of the work
  url: 
    The WWW address
  volume: 
    The volume of a journal or multi-volume book
  year: 
    The year of publication (or, if unpublished, the year of creation)

In addition to the standard fields, the following optional fields are used to provide extra database IDs, links and tags:

  refid:       --- the client's ID for this entry
  urllink:     --- a link to the HTML version
  pdflink:     --- a link to the PDF
  keywords:    --- a list of tags, comma separated
  doi:         --- the DOI number
  pubmed:      --- the pubmed ID {optional}
  articletype: --- 'article', 'book', 'inproceedings' etc
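
As a minimal sketch of how the standard and extra fields can share one record, the Python snippet below separates a record into the standard BibTeX fields listed above and everything else. The names STANDARD_FIELDS and split_fields are illustrative assumptions, not part of the format, and the record values are made up apart from the DOI taken from the sample later in this section.

```python
import json

# Standard BibTeX field names, transcribed from the list above.
STANDARD_FIELDS = {
    "address", "author", "booktitle", "chapter", "edition", "editor",
    "eprint", "howpublished", "institution", "journal", "month", "note",
    "number", "organization", "pages", "publisher", "school", "series",
    "title", "url", "volume", "year",
}


def split_fields(record):
    """Return (standard, extra) dicts for one record.

    Hypothetical helper: any key that is not a standard BibTeX field is
    treated as an extra field, so new fields need no schema change.
    """
    standard = {k: v for k, v in record.items() if k in STANDARD_FIELDS}
    extra = {k: v for k, v in record.items() if k not in STANDARD_FIELDS}
    return standard, extra


record = {"title": "An example title", "year": "2006",
          "refid": "3", "doi": "10.1002/cpe.1044"}
standard, extra = split_fields(record)
print(json.dumps(extra, indent=2))
```

Because the extra fields are ordinary name/value pairs, a receiver that only understands standard BibTeX can ignore them.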

To simplify parsing of the list of authors / editors, these are also supplied as structured JSON list fields, e.g.:

 "author": "Bloggs, Joe M; Other, A.N.",
 "editor": "A Z Other",
 "author_list": [{"first":"Joe", "middle":"M", "last":"Bloggs"},
                 {"first":"A", "middle":"N", "last":"Other"}],
 "editor_list": [{"first":"A", "middle":"Z", "last":"Other"}, {...} ]
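
The structured lists could be derived from the flat strings along the following lines. This is only a sketch in Python: the function names parse_name and parse_name_list are hypothetical, and it assumes names are separated by semicolons and written either as "Last, First Middle" or as "First Middle Last", as in the example above. Real name data (suffixes, multi-word surnames) would need more care.

```python
def parse_name(name):
    """Split one personal name into first/middle/last parts."""
    name = name.strip()
    if "," in name:
        # "Bloggs, Joe M" -> last name before the comma
        last, _, given = name.partition(",")
    else:
        # "A Z Other" -> last name is the final word
        given, _, last = name.rpartition(" ")
    # Treat dots as separators so "A.N." yields the initials "A" and "N".
    parts = given.replace(".", " ").split()
    return {
        "first": parts[0] if parts else "",
        "middle": " ".join(parts[1:]),
        "last": last.strip(),
    }


def parse_name_list(names):
    """Split a ';'-separated string of names into a list of dicts."""
    return [parse_name(n) for n in names.split(";") if n.strip()]


author_list = parse_name_list("Bloggs, Joe M; Other, A.N.")
editor_list = parse_name_list("A Z Other")
```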

The field 'type' is used to denote the type of entry:

article
    An article from a journal or magazine.
    Required fields: author, title, journal, year
    Optional fields: volume, number, pages, month, note
book
    A book with an explicit publisher.
    Required fields: author/editor, title, publisher, year
    Optional fields: volume, series, address, edition, month, note, pages
booklet
    A work that is printed and bound, but without a named publisher or 
    sponsoring institution.
    Required fields: title
    Optional fields: author, howpublished, address, month, year, note
conference
    The same as inproceedings, included for Scribe (markup language) 
    compatibility.
    Required fields: author, title, booktitle, year
    Optional fields: editor, pages, organization, publisher, address, month, note
inbook
    A part of a book, usually untitled. May be a chapter (or section or 
    whatever) and/or a range of pages.
    Required fields: author/editor, title, chapter/pages, publisher, year
    Optional fields: volume, series, address, edition, month, note
incollection
    A part of a book having its own title.
    Required fields: author, title, booktitle, year
    Optional fields: editor, pages, organization, publisher, address, month, note
inproceedings
    An article in a conference proceedings.
    Required fields: author, title, booktitle, year
    Optional fields: editor, pages, organization, publisher, address, month, note
manual
    Technical documentation.
    Required fields: title
    Optional fields: author, organization, address, edition, month, year, note
mastersthesis
    A Master's thesis.
    Required fields: author, title, school, year
    Optional fields: address, month, note
misc
    For use when nothing else fits.
    Required fields: none
    Optional fields: author, title, howpublished, month, year, note
phdthesis
    A Ph.D. thesis.
    Required fields: author, title, school, year
    Optional fields: address, month, note
proceedings
    The proceedings of a conference.
    Required fields: title, year
    Optional fields: editor, publisher, organization, address, month, note
techreport
    A report published by a school or other institution, usually numbered 
    within a series.
    Required fields: author, title, institution, year
    Optional fields: type, number, address, month, note
unpublished
    A document having an author and title, but not formally published.
    Required fields: author, title, note
    Optional fields: month, year 
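
A receiver could use the required-field lists above to check incoming records. The Python sketch below is one possible way to do that; the names REQUIRED and missing_fields are hypothetical, and it assumes that an empty string counts as a missing value, since the sample record in this section uses "" for fields with no value.

```python
# Required fields per entry type, transcribed from the list above.
# A tuple means "at least one of these fields must be present".
_AUTHORED = ["author", "title", "booktitle", "year"]
REQUIRED = {
    "article": ["author", "title", "journal", "year"],
    "book": [("author", "editor"), "title", "publisher", "year"],
    "booklet": ["title"],
    "conference": _AUTHORED,
    "inbook": [("author", "editor"), "title", ("chapter", "pages"),
               "publisher", "year"],
    "incollection": _AUTHORED,
    "inproceedings": _AUTHORED,
    "manual": ["title"],
    "mastersthesis": ["author", "title", "school", "year"],
    "misc": [],
    "phdthesis": ["author", "title", "school", "year"],
    "proceedings": ["title", "year"],
    "techreport": ["author", "title", "institution", "year"],
    "unpublished": ["author", "title", "note"],
}


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    entry_type = record.get("type")
    if entry_type not in REQUIRED:
        raise ValueError(f"unknown entry type: {entry_type!r}")
    missing = []
    for requirement in REQUIRED[entry_type]:
        names = requirement if isinstance(requirement, tuple) else (requirement,)
        # Empty strings are falsy, so they are reported as missing.
        if not any(record.get(name) for name in names):
            missing.append("/".join(names))
    return missing
```

For example, calling missing_fields on a record of type "book" that has an editor and a title but no publisher or year reports "publisher" and "year" as missing.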

Sample JSON-Bibtex file

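JSON (JavaScript Object Notation) is a simple, language-independent notation for structured data. It is based on a subset of JavaScript syntax and includes objects (name/value pairs), arrays, and simple data types (strings, numbers). The JSON-Bibtex structure is simply a set of name/value pairs. In the sample below, the // comment lines and the backslash line continuation in the title are there for readability only; strict JSON allows neither, so they must be removed before the text is passed to a parser: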

{  
  "refid":"3",
  "type":"article",

  // Standard bibtex fields:
  "title":"Catalyzer: a novel tool for integrating, managing and publishing \
           heterogeneous bioscience data",
  "year":"2006",
  "author":"Howell, F W; Cannon, R C; Goddard, N H",
  "journal":"Concurrency Computat.: Pract. Exper.",
  "volume":"19",
  "number":"",
  "pages":"207-221",
  "month":"",

  // Extended bibtex fields:
  "doi":"10.1002\/cpe.1044",
  "pubmed":"",
  "pdflink":"",
  "urllink":"http:\/\/www3.interscience.wiley.com\/...",
  "abstract":"The integrative ambitions of systems biology - (...)",
  "note":"",
  "keywords":"XML, Semantic Web, e-Science"
}
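
To illustrate how little work separates this encoding from plain BibTeX, here is a minimal Python sketch that renders one record as a BibTeX entry. The function name to_bibtex and the SKIP set are hypothetical. The sketch assumes that all emitted values are strings, that empty strings mean "no value", and that authors are separated by semicolons as in the sample. It does not escape characters that are special in BibTeX or LaTeX.

```python
import json

# Keys that this sketch does not emit as BibTeX fields.
SKIP = {"refid", "type", "author_list", "editor_list"}


def to_bibtex(record):
    """Render a JSON-Bibtex record (a dict) as a BibTeX entry string."""
    entry_type = record.get("type", "misc")
    key = record.get("refid", "unknown")
    lines = [f"@{entry_type}{{{key},"]
    for name, value in record.items():
        if name in SKIP or value == "":
            continue
        if name in ("author", "editor"):
            # BibTeX separates names with " and " rather than ";".
            value = " and ".join(part.strip() for part in value.split(";"))
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# A cut-down, strictly valid version of the sample record above.
text = ('{"refid": "3", "type": "article", "year": "2006", '
        '"author": "Howell, F W; Cannon, R C; Goddard, N H", '
        '"volume": "19", "number": "", "pages": "207-221"}')
print(to_bibtex(json.loads(text)))
```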

Summary

We found that the 'JSON-Bibtex' format worked well for exchanging data between publicationslist.org and EPrints. Most repositories already have facilities for importing and exporting BibTeX, so the extra work needed to support a JSON encoding is small. The few extra fields we needed that are not part of standard BibTeX were very easy to add to the JSON objects.
