Abstract: The Open Notebook Science Solubility Challenge
Solubility is an important consideration for many chemistry applications. Synthetic chemists usually use a solvent to perform reactions and knowledge of the solubility of the starting materials or products can be very useful to pick an appropriate solvent. Analytical chemists can use solubility to design separation techniques and factor in dynamic range considerations. Physical chemists can create and evaluate their models of how molecules interact in the solubilization and precipitation processes.
Solubility data can be obtained from a variety of online and offline sources. As with all chemical data, it can be a challenge to evaluate reported measurements. Some databases offer no references while others provide citations to peer-reviewed journal articles. Given the choice, more weight is generally given to the latter. This is reasonable in most cases because more information about the purity of compounds and the methods used is available in peer-reviewed articles.
However, journal articles generally do not report how a specific measurement was obtained. General methods are provided but the raw data for a specific measurement are typically not published. Peer review is not intended to validate individual measurements; its function is to ensure that the authors drew appropriate conclusions based on their processed datasets and the state of knowledge in the field.
The Open Notebook Science Challenge was initiated in the fall of 2008 as the result of a discussion on a train in the UK between Jean-Claude Bradley and Cameron Neylon.[1,2] The concept was very simple: create a crowdsourcing opportunity for the chemistry community to contribute solubility measurements under Open Notebook Science conditions. This method of publication entails providing immediate public access to the chemist's laboratory notebook, as well as all raw data used to compute the measurements.[3,4]
On Sept 3, 2008 the first ONSC measurements were recorded by Bradley and Neylon at the University of Southampton in Neylon's laboratory.[5] The project was soon sponsored by Submeta, offering ten $500 awards for students in the US or the UK who best recorded how they performed their experiments.[6] Furthermore, the first 3 winners also received one year subscriptions to Nature magazine, thanks to a sponsorship from the Nature Publishing Group.[7] Sigma-Aldrich supported the contest by donating chemicals upon request.[8]
Students were evaluated by a group of judges who convened once a month to deliberate the next award. Judges also provided feedback to the students by commenting on their lab notebook pages directly on the wiki. Their expertise ranged from chemistry to mathematics, spectroscopy and molecular biology.
Techniques
Participants in the ONS Challenge were not required to use a specific method to measure solubility, although they were required to properly document their experiments and analyses. Because of its simplicity, the SAMS NMR technique, which requires no volume measurements or calibration curves, was used for most measurements in the past year.[9] Two assumptions are made with this method. The first is that the volumes of solute and solvent are additive, with the error becoming negligible at low solubility values. The second is that NMR integration values are proportional to the amounts of solvent and solute. Some deviations from this have been observed with default NMR parameters, so in later experiments a long relaxation delay (D1 = 50 s) was introduced into the protocol.[10]
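As a rough illustration of how the two assumptions combine, the sketch below converts a proton-normalized NMR integration ratio into a molar concentration. The function name, argument names and example numbers are hypothetical and are not taken from the ONS Challenge notebooks; the arithmetic simply assumes additive volumes and integrals proportional to the number of contributing protons.

```python
def molarity_from_nmr(solute_integral, solute_protons,
                      solvent_integral, solvent_protons,
                      solute_mw, solute_density,
                      solvent_mw, solvent_density):
    """Estimate the molarity (mol/L) of a saturated solution from NMR integrals.

    Illustrative assumptions (not the published protocol):
      * each integral is proportional to moles x protons in the integrated peak
      * the volumes of solute and solvent are additive
    Molar masses are in g/mol and densities in g/mL.
    """
    # Moles of solute per mole of solvent, from proton-normalized integrals.
    mole_ratio = (solute_integral / solute_protons) / (
        solvent_integral / solvent_protons
    )
    # Volumes in mL, taken per mole of solvent.
    solvent_volume = solvent_mw / solvent_density
    solute_volume = mole_ratio * solute_mw / solute_density
    # Additive-volume assumption: total volume is the sum of the two.
    return 1000.0 * mole_ratio / (solvent_volume + solute_volume)


# Hypothetical example: a 1-proton solute peak against the 3-proton methyl
# peak of methanol (32.04 g/mol, 0.792 g/mL); the solute values are made up.
example = molarity_from_nmr(
    solute_integral=1.0, solute_protons=1,
    solvent_integral=300.0, solvent_protons=3,
    solute_mw=150.0, solute_density=1.2,
    solvent_mw=32.04, solvent_density=0.792,
)
```

At low solubility the solute volume term is small compared with the solvent volume, which is why the additivity error becomes negligible in that regime.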
Data Curation
Since an Open Notebook approach is used in this work, those interested in the validity of the measurements can assess the methods used - both for the preparation of saturated solutions and the raw data from the measurements. Over time, values in the database are likely to improve and possibly some errors may be uncovered and corrected. However, on the whole, we feel that the values provided in this work should be of use to chemists trying to gain an appreciation of solubility for most applications. This is especially the case for values that are not obtainable from any other source.
When clearly erroneous data points are discovered, they are flagged in the database as "DONOTUSE". This way, interfaces to the dataset can ignore these values while still allowing anyone to investigate why the data points were flagged. Flagging might be needed when early experiments did not allow for sufficient mixing or when NMR D1 relaxation delays were not long enough to fully integrate the peaks of interest. Out of 681 reported measurements, 51 are currently marked in this way. A shared Google Spreadsheet is used to collect and curate the dataset. This allows easy data entry while providing a simple way to interrogate the database for visualization applications via the Google API.[11]
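An interface that ignores flagged values might look like the minimal sketch below. The row layout, the "flag" column name and the sample values are assumptions made for illustration; the actual spreadsheet columns are not described here.

```python
DO_NOT_USE = "DONOTUSE"


def usable_measurements(rows, flag_field="flag"):
    """Return only the rows that are not flagged as DONOTUSE.

    `rows` is a list of dicts; `flag_field` is a hypothetical column name.
    Flagged rows are left untouched in the input list, so they remain
    available for anyone who wants to investigate why they were flagged.
    """
    return [
        row for row in rows
        if str(row.get(flag_field, "")).strip().upper() != DO_NOT_USE
    ]


# Made-up sample rows, for illustration only.
rows = [
    {"solute": "compound A", "solvent": "methanol", "molarity": 0.52, "flag": ""},
    {"solute": "compound A", "solvent": "methanol", "molarity": 3.10, "flag": "DONOTUSE"},
    {"solute": "compound B", "solvent": "ethanol", "molarity": 0.08, "flag": ""},
]
kept = usable_measurements(rows)
```

Filtering at read time rather than deleting rows keeps the flagged measurements in the record.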
Literature data and format conversions
An additional 400 solubility measurements from the literature are included in the database. These generally correspond to compounds that are structurally identical or similar to the compounds measured by the ONS Challenge participants. These values are averaged in with the values from the participants, with appropriate references provided. In order to compare values, conversions from molar fraction or g solute/100g solvent to molarity were made by assuming that the volumes are additive and obtaining the density of the solutes in most cases from the predicted values in ChemSpider.[12]
For the convenience of chemists with diverse applications, all three formats are provided. For the cases where solutes are miscible with the solvent, the molarity reported is simply that of the pure solute, i.e. the solute's density expressed in moles per liter. The practical interpretation of this is that solutions of any molarity below this value can be prepared.
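The conversions described above can be sketched as follows, assuming additive volumes. The function names and example numbers are hypothetical; molar masses are in g/mol and densities in g/mL, and the solute density would in practice come from a measured or predicted value.

```python
def mole_fraction_to_molarity(x_solute, solute_mw, solute_density,
                              solvent_mw, solvent_density):
    """Convert a solute mole fraction to molarity (mol/L), additive volumes."""
    # Volume in mL of a mixture containing one mole in total.
    volume = (x_solute * solute_mw / solute_density
              + (1.0 - x_solute) * solvent_mw / solvent_density)
    return 1000.0 * x_solute / volume


def g_per_100g_to_molarity(grams_solute, solute_mw, solute_density,
                           solvent_density):
    """Convert g solute per 100 g solvent to molarity (mol/L), additive volumes."""
    moles = grams_solute / solute_mw
    # Volume in mL of the solute plus 100 g of solvent.
    volume = grams_solute / solute_density + 100.0 / solvent_density
    return 1000.0 * moles / volume


def pure_solute_molarity(solute_mw, solute_density):
    """Molarity of the neat solute: its density expressed in mol/L."""
    return 1000.0 * solute_density / solute_mw


# Hypothetical solute: 100 g/mol, 1.0 g/mL, so the neat liquid is 10 mol/L.
limit = pure_solute_molarity(100.0, 1.0)
from_fraction = mole_fraction_to_molarity(1.0, 100.0, 1.0, 32.04, 0.792)
from_mass = g_per_100g_to_molarity(10.0, 100.0, 1.0, 0.8)
```

As the mole fraction approaches 1, the first conversion approaches the neat-solute value, which is the number reported for the miscible case.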
In the process of converting units and averaging heterogeneous data sources, no attempt has been made to track significant figures. Those interested in any information about the precision of measurements should consult each individual data source. This may not be an easy task for measurements only carried out once and where factors such as the quality of spectral peaks and baselines are not optimal.
This collection will be most valuable for those who do not require highly precise measurements for their applications. For example, synthetic chemists can easily use rough estimates of solubility to select appropriate solvents for a reaction. In any case, one would be wise to consider all measurements as provisional, regardless of the source. As more data are collected, subsequent editions of this book will adjust values accordingly.
Searching the database
The values in this database can be accessed and filtered in various ways. More information is available at the ONS Challenge wiki[13] and Chapter 16 of the book "Beautiful Data".[14]
Database version
Archived as Excel Spreadsheet by WebCite on December 11, 2009.[15]
References
[1] Bradley, JC Open Notebook Science Challenge, UsefulChem blog (2008) http://usefulchem.blogspot.com/2008/09/open-notebook-science-challenge.html
[2] Open Notebook Science Challenge Wikipedia entry http://en.wikipedia.org/wiki/Open_Notebook_Science_Challenge
[3] Bradley, JC Open Notebook Science, Drexel CoAS E-Learning Blog (2006) http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html
[4] Open Notebook Science Wikipedia entry http://en.wikipedia.org/wiki/Open_Notebook_Science
[5] Bradley, JC; Neylon, C UsefulChem Experiment 207 http://usefulchem.wikispaces.com/Exp207
[6] Bradley, JC Submeta Open Notebook Science Awards, UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/11/submeta-open-notebook-science-awards.html
[7] Bradley, JC Nature Sponsors Open Notebook Science, UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/11/nature-sponsors-open-notebook-science.html
[8] Bradley, JC Sigma-Aldrich First Official Sponsor of Open Notebook Science Challenge, UsefulChem Blog (2008) http://usefulchem.blogspot.com/2008/09/sigma-aldrich-first-official-sponsor-of.html
[9] Bradley, JC Semi-Automated Measurement of Solubility, UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/03/semi-automated-measurement-of.html
[10] Bradley, JC NMR Integration Progress for Solubility Measurements, UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/06/nmr-integration-progress-for-solubility.html
[11] Bradley, JC Interactive Visualization of ONS Solubility Data, UsefulChem Blog (2009) http://usefulchem.blogspot.com/2009/01/interactive-visualization-of-ons.html
[12] ChemSpider database http://www.chemspider.com
[13] ONS Challenge List of Experiments Page http://onschallenge.wikispaces.com/list+of+experiments
[14] Bradley, J.-C.; Guha, R.; Lang, A.S.I.D.; Lindenbaum, P.; Neylon, C.; Williams, A.J. & Willighagen, E. Chapter 16: Beautifying Data in the Real World, in Beautiful Data. O'Reilly Media, Eds: Segaran, T. & Hammerbacher, J. (2009)
[15] Bradley, Jean-Claude; Lang Andrew. Solubilities Summary Sheet. Open Notebook Science Challenge. 2009-12-11. URL:http://spreadsheets.google.com/pub?key=plwwufp30hfq0udnEmRD1aQ&output=xls. Accessed: 2009-12-11. (Archived by WebCite® at http://www.webcitation.org/5lx5ry3BV)
Abstract: Background
The Blue Obelisk movement was established in 2005 as a response to the lack of Open Data, Open Standards and Open Source (ODOSOS) in chemistry. It aims to make it easier to carry out chemistry research by promoting interoperability between chemistry software, encouraging cooperation between Open Source developers, and developing community resources and Open Standards.
Results
This contribution looks back on the work carried out by the Blue Obelisk in the past 5 years and surveys progress and remaining challenges in the areas of Open Data, Open Standards, and Open Source in chemistry.
Conclusions
We show that the Blue Obelisk has been very successful in bringing together researchers and developers with common interests in ODOSOS, leading to development of many useful resources freely available to the chemistry community.
Abstract: The multi-user virtual environment of Second Life is not limited to the realms of role-play or romance. It can be a platform for the interactive and collaborative visualisation of molecules, spectra and experimental data.
Abstract: The developing “information age” is continually unraveling new ways of discovering, presenting and sharing information. Most new academic material is digitally formatted upon its creation and is thus easy to find and query. However, there remains a good deal of material from times prior to the “information age” that has yet to be converted to digital form. Much of this material can be found in library collections (whether academic, public or private) and thus remains available only to a limited number of locals or willing-and-able sojourners. Using OCR technology, most typeset documents can be digitized and made available online, and there are several projects underway to do exactly this. However, little can be done in this way for handwritten materials. Those who own collections of handwritten documents increasingly want to make their content available to the general public. Unfortunately, traditional transcription models typically prove to be expensive or inefficient, and PDF snapshots are not searchable. We have developed a model for digital transcription using Google Docs and Amazon’s Mechanical Turk. Using this model, one can use an online workforce to efficiently transcribe handwritten texts and perform quality control at a cost much lower than that of professional transcription services. To illustrate the model, we used Amazon’s Mechanical Turk to transcribe and then proofread the Frederick Douglass Diary, which we have made available on a public searchable wiki. The total cost of transcription and proofreading for the 72-page diary was less than $25.00, with some pages being transcribed and proofread for as little as $0.04. Our results show that Amazon’s Mechanical Turk holds great promise for providing an affordable transcription method for handwritten historical documents, making them easily sharable and fully searchable.
Abstract: This article describes the benefits and drawbacks of using Wolfram|Alpha as the platform for teaching calculus concepts in the lab setting. It is a result of our experiences designing and creating an entirely new set of labs using Wolfram|Alpha. We present the reasoning behind our transition from using a standard computer algebra system (CAS) to Wolfram|Alpha in our differential and integral calculus labs, together with the positive results from our experience. We also discuss the current limitations of Wolfram|Alpha, including a discussion on why we still use a CAS for our multivariate calculus labs.
Abstract: We demonstrate the usefulness of Second Life™ as a platform for enlivening major concepts in chemistry education. These concepts include absorption spectra, selection rules, quantum numbers, and atomic orbital shapes. We have built several exhibits in Second Life which provide 3-dimensional interactivity for each of those areas: an interactive experiment showing the absorption spectrum of hydrogen, an interactive model of selection rules showing allowed and forbidden transitions for each state, a 3-dimensional grid of orbitals showing the constraints on the values of quantum numbers, and a large-scale interactive orbital display allowing the user to choose and rotate to-scale atomic orbitals based on quantum numbers.
Abstract: We report on the implementation of the Spectral Game, a web-based game where players try to match molecules to various forms of interactive spectra including 1D/2D NMR, Mass Spectrometry and Infrared spectra. Each correct selection earns the player one point and play continues until the player supplies an incorrect answer. The game is usually played using a web browser interface, although a version has been developed in the virtual 3D environment of Second Life. Spectra uploaded as Open Data to ChemSpider in JCAMP-DX format are used for the problem sets together with structures extracted from the website. The spectra are displayed using JSpecView, an Open Source spectrum viewing applet which affords zooming and integration. The application of the game to the teaching of proton NMR spectroscopy in an undergraduate organic chemistry class and a 2D Spectrum Viewer are also presented.
Abstract: This review focuses on the current level of chemistry research, education, and visualization possible within the multi-user virtual environment of Second Life. We discuss how Second Life has been used as a platform for the interactive and collaborative visualization of data, from molecules and proteins to spectra and experimental data. We then review how these visualizations can be scripted for immersive educational activities and real-life collaborative research. We also discuss the benefits of the social networking affordances of Second Life for both chemists and chemistry students.
Abstract: The frequency spectrum of the Casimir effect between parallel plates is studied. Calculations are performed for both the massless scalar field and the electromagnetic field cases, first using a spectral weight function, and then via the Fourier transform of the renormalized expectation of the Casimir energy-momentum operator. The Casimir force is calculated using the spectrum for two plates which are perfectly transparent in a frequency band. The result of this calculation suggests a way to detect the frequency spectrum of the Casimir effect.
Abstract: This paper presents a mathematical model for basketball free throws. It is intended to be a supplement to an existing calculus course and could easily be used as a basis for a calculus project. Students will learn how to apply calculus to model an interesting real-world problem, from problem identification all the way through to interpretation and verification. Along the way we will introduce topics such as optimization (univariate and multiobjective), numerical methods, and differential equations.
Abstract: The optimal design of a passive, mass-spring-damper device to attenuate the
coning motion of non-rigid, spinning spacecraft under thrust is investigated.
Both one-dimensional and two-dimensional mass motions in a plane that is
perpendicular to the thrust/spin axis are considered. The optimal location of a
one-dimensional device is determined without making use of earlier
approximations on inertia ratio. Necessary and sufficient conditions for
asymptotic stability are presented in terms of fundamental system properties.
These conditions are interpreted in view of the physical system dynamics.
Results are presented from attempts to perform constrained optimization of the
two-dimensional device. Performance is compared with a similar device that is
constrained to one-dimensional motion.
Abstract: The optimal design of a passive, mass-spring-damper device to attenuate the
coning motion of non-rigid, spinning spacecraft under thrust is investigated. In
contrast to earlier studies of this type, the device is capable of two-dimensional
mass motions in a plane that is perpendicular to the thrust/spin axis. Intelligent
choices for device characteristics and location result in significant spacecraft
coning attenuation rates.
The solution to this problem was obtained from studying similar works on the
design of optimal nutation dampers for non-thrusting, spinning spacecraft.
These methods were applied to produce expressions for stiffness and damping
of a near-optimal two-dimensional passive coning attenuator (tuned
mass-spring-damper) for a symmetric, spinning spacecraft under thrust. An
expression was also obtained for the near-optimal relative stability of such a
device. Exact optimality was not achieved since the suggested result was not
physically realizable. A numerical example provides a point of comparison.
Performance is compared with a similar device that is constrained to
one-dimensional motion.
Abstract: We compute the expectations of the squares of the electric and magnetic fields in the vacuum region outside a half-space filled with a uniform non-dispersive dielectric. This gives predictions for the Casimir-Polder force on an atom in the 'retarded' regime near a dielectric. We also find a positive energy density due to the electromagnetic field. This would lead, in the case of two parallel dielectric half-spaces, to a positive, separation-independent contribution to the energy density, besides the negative, separation-dependent Casimir energy. Rough estimates suggest that for a very wide range of cases, perhaps including all realizable ones, the total energy density between the half-spaces is positive.