Heather A Piwowar - Publications List

Journal articles

2011

Heather A Piwowar (2011) Who Shares? Who Doesn't? Factors Associated with Openly Archiving Raw Research Data. PLoS One 6: 7. 07

Abstract: Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn't, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication. Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%-35% in 2007-2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available. First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available. These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let's learn from those with high rates of sharing to embrace the full potential of our research output.

Notes:

Heather A Piwowar, Todd J Vision, Michael C Whitlock (2011) Data archiving is a good investment. Nature 473: 7347. May

Abstract:

Notes:

Alex Garnett, Louise Whiteley, Heather Piwowar, Edie Rasmussen, Judy Illes (2011) Neuroethics and fMRI: mapping a fledgling relationship. PLoS One 6: 4. 04

Abstract: Human functional magnetic resonance imaging (fMRI) informs the understanding of the neural basis of mental function and is a key domain of ethical enquiry. It raises questions about the practice and implications of research, and reflexively informs ethics through the empirical investigation of moral judgments. It is at the centre of debate surrounding the importance of neuroscience findings for concepts such as personhood and free will, and the extent of their practical consequences. Here, we map the landscape of fMRI and neuroethics, using citation analysis to uncover salient topics. We find that this landscape is sparsely populated: despite previous calls for debate, there are few articles that discuss both fMRI and ethical, legal, or social implications (ELSI), and even fewer direct citations between the two literatures. Recognizing that practical barriers exist to integrating ELSI discussion into the research literature, we argue nonetheless that the ethical challenges of fMRI, and controversy over its conceptual and practical implications, make this essential.

Notes:

Lucinda A McDade, David R Maddison, Robert Guralnick, Heather A Piwowar, Mary Liz Jameson, Kristofer M Helgen, Patrick S Herendeen, Andrew Hill, Morgan L Vis (2011) Biology Needs a Modern Assessment System for Professional Productivity BioScience 61: 8. 619-625

Abstract: In Conclusion: The authors and endorsers of this essay commit to taking the following steps immediately: Add nontraditional forms of productivity to our CVs, job applications, and tenure and promotion packages. Include software developed; Web site contributions; openly archived data sets; identification of organisms; numbers of specimens collected, curated, identified; digitized or georeferenced historical museum specimen collection records. Include usage data for these contributions where they are available. Count these annually just as we count other more traditional forms of accomplishment. Encourage junior scientists to do the same. Write favorably about alternative forms of productivity in letters of recommendation and letters of evaluation. Speak directly to the importance of a person's curation efforts; collecting activities; and sharing of images, data, and specimens as appropriate. Commend those who make their products available in the most usable formats for people inside and outside the sciences. Value these alternative forms of productivity when we sit on departmental promotion and tenure committees. Strive to make sure that they are included in reporting on the outcome of our committees' deliberations. Broaden job descriptions so that they better correspond to the ways in which modern systematic biology operates. Cite all forms of research reuse in our publications, including published articles, preprints, blogs, data sets, databases, and software. Attribute the original products (and not just secondary publications that describe them) in the formal citations list. As reviewers and editors, remind colleagues about best-practice attribution practices. As authors, reviewers, and editors, work to achieve the citation of publications in which taxa are described, as well as of revisionary and floristic or faunistic works that enabled the identification of the organisms. Ideally, these should appear in the literature cited along with other contributions that critically underpin the new publication. Formally acknowledge collectors whose efforts support systematics research; include them as coauthors when appropriate. Acknowledge collections from which we borrow material using collections IDs when they are available. If collections data from portals (e.g., GBIF) are used, cite both the portal and the individual collections that provided those data. Seek the implementation of peer-review systems for nontraditional publications and digital resources. Finally, educate administrators about the value of an improved assessment model.

Notes:

2010

Heather A Piwowar, Wendy W Chapman (2010) Recall and bias of retrieving gene expression microarray datasets through PubMed identifiers Journal of Biomedical Discovery and Collaboration [accepted]:

Abstract: Background: The ability to locate publicly available gene expression microarray datasets effectively and efficiently facilitates the reuse of these potentially valuable resources. Centralized biomedical databases allow users to query dataset metadata descriptions, but these annotations are often too sparse and diverse to allow complex and accurate queries. In this study we examined the ability of PubMed article identifiers to locate publicly available gene expression microarray datasets, and investigated whether the retrieved datasets were representative of publicly available datasets found through statements of data sharing in the associated research articles. Results: In a recent article, Ochsner and colleagues identified 397 studies that had generated gene expression microarray data. Their search of the full text of each publication for statements of data sharing revealed 203 publicly available datasets, including 179 in the Gene Expression Omnibus (GEO) or ArrayExpress databases. Our scripted search of GEO and ArrayExpress for PubMed identifiers of the same 397 studies returned 160 datasets, including six not found by the original search for data sharing statements. As a proportion of datasets found by either method, the search for data sharing statements identified 91.4% of the 209 publicly available datasets, compared to 76.6% found by our search for PubMed identifiers. Searching GEO or ArrayExpress alone retrieved 63.2% and 46.9% of all available datasets, respectively. Studies retrieved through PubMed identifiers were representative of all datasets in terms of research theme, technology, size, and impact, though the recall was highest for datasets published by the highest-impact journals. Conclusions: Searching database entries using PubMed identifiers can identify the majority of publicly available datasets. We urge authors of all datasets to complete the citation fields for their dataset submissions once publication details are known, thereby ensuring their work has maximum visibility and can contribute to subsequent studies.

Notes:

Heather A Piwowar, Wendy W Chapman (2010) Public sharing of research datasets: A pilot study of associations Journal of Informetrics 4: 2. 148-156 April

Abstract: The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal and when the first or last authors had high levels of career experience and impact. We estimate the USA's National Institutes of Health (NIH) data sharing policy applied to 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives.

Notes: Earlier version presented at ASIS&T and ISSI Pre-Conference: Symposium on Informetrics and Scientometrics 2009. Raw data: http://www.researchremix.org/wordpress/wp-content/uploads/2009/09/Piwowar_Metrics2009_rawdata.csv Statistics file: http://www.researchremix.org/wordpress/wp-content/uploads/2009/09/Piwowar_Metrics2009_statistics.R

2008

Heather A Piwowar, Michael J Becich, Howard Bilofsky, Rebecca S Crowley (2008) Towards a data sharing culture: recommendations for leadership from academic health centers. PLoS Medicine 5: 9. e183

Abstract: Sharing biomedical research and health care data is important but difficult. Recognizing this, many initiatives facilitate, fund, request, or require researchers to share their data. These initiatives address the technical aspects of data sharing, but rarely focus on incentives for key stakeholders. Academic health centers (AHCs) have a critical role in enabling, encouraging, and rewarding data sharing. The leaders of medical schools and academic-affiliated hospitals can play a unique role in supporting this transformation of the research enterprise. We propose that AHCs can and should lead the transition towards a culture of biomedical data sharing.

Notes:

2007

Heather A Piwowar, Roger S Day, Douglas B Fridsma (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate PLoS ONE 2: 3. e308

Abstract: Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.

Notes: Presentation slides: http://www.slideshare.net/hpiwowar/presentations; Raw bibliometric data used in the analysis (combining data extracted from Thomson ISI Web of Science, PubMed, the Ntzani and Ioannidis 2003 Lancet paper, and the author's own investigations) are at http://www.researchremix.org/data/PLoSONE2007%20Piwowar%20Data.zip

Conference papers

2009

Heather A Piwowar, Wendy W Chapman (2009) Public Sharing of Research Datasets: A Pilot Study of Associations In: ASIS&T and ISSI Pre-Conference: Symposium on Informetrics and Scientometrics [Published with modifications in Journal of Informetrics, 2010] November 7 2009, Vancouver Canada

Abstract: The public sharing of primary research datasets potentially benefits the research community but is not yet common practice. In this pilot study, we analyzed whether data sharing frequency was associated with funder and publisher requirements, journal impact factor, or investigator experience and impact. Across 397 recent biomedical microarray studies, we found investigators were more likely to publicly share their raw dataset when their study was published in a high-impact journal, when their study was published in a journal with an enforceable data-sharing requirement, and when the first or last authors had high levels of career experience and impact. We estimate the NIH data sharing policy applied to only 19% of the studies in our cohort; being subject to the NIH data sharing plan requirement was not found to correlate with increased data sharing behavior in multivariate logistic regression analysis. Studies published in journals that required a database submission accession number as a condition of publication were more likely to share their data, but this trend was not statistically significant. These early results will inform our ongoing larger analysis, and hopefully contribute to the development of more effective data sharing initiatives.

Notes: Raw data: http://www.researchremix.org/wordpress/wp-content/uploads/2009/09/Piwowar_Metrics2009_rawdata.csv Statistics file: http://www.researchremix.org/wordpress/wp-content/uploads/2009/09/Piwowar_Metrics2009_statistics.R Presentation: http://www.slideshare.net/hpiwowar/metrics2009-piwowar-presentation-20091016key

2008

Heather A Piwowar (2008) Proposed Foundations for Evaluating Data Sharing and Reuse in the Biomedical Literature In: JCDL Doctoral Consortium 2008 Published in the Bulletin of IEEE Technical Committee on Digital Libraries Volume 4, Issue 2

Abstract: Science progresses by building upon previous research. Progress can be most rapid, efficient, and focused when raw datasets from previous studies are available for reuse. To facilitate this practice, funders and journals have begun to request and require that investigators share their primary datasets with other researchers. Unfortunately, it is difficult to evaluate the effectiveness of these policies. This study aims to develop foundations for evaluating data sharing and reuse decisions in the biomedical literature by developing tools to answer the following research questions, within the context of biomedical gene expression datasets: What is the prevalence of biomedical research data sharing? Biomedical research data reuse? What features are most associated with an investigator's decision to share or reuse a biomedical research dataset? Does sharing or reusing data contribute to the impact of a research article, independently of other factors? What do the results suggest for developing efficient, effective policies, tools, and initiatives for promoting data sharing and reuse? I suggest a novel approach to identifying publications that share and reuse datasets, through the application of natural language processing techniques to the full text of primary research articles. Using these classifications and extracted covariates, univariate and multivariate analysis will assess which features are most important to data sharing and reuse prevalence, and also estimate the contribution that sharing data and reusing data make to a publication's research impact. I hope the results will inform the development of effective policies and tools to facilitate this important aspect of scientific research and information exchange.

Notes: Presentation slides: http://www.slideshare.net/hpiwowar/presentations

Heather A Piwowar, Wendy W Chapman (2008) Identifying data sharing in biomedical literature. In: AMIA Annual Symposium 2008 596-600

Abstract: Many policies and projects now encourage investigators to share their raw research data with other scientists. Unfortunately, it is difficult to measure the effectiveness of these initiatives because data can be shared in such a variety of mechanisms and locations. We propose a novel approach to finding shared datasets: using NLP techniques to identify declarations of dataset sharing within the full text of primary research articles. Using regular expression patterns and machine learning algorithms on open access biomedical literature, our system was able to identify 61% of articles with shared datasets with 80% precision. A simpler version of our classifier achieved higher recall (86%), though lower precision (49%). We believe our results demonstrate the feasibility of this approach and hope to inspire further study of dataset retrieval techniques and policy evaluation.

Notes: Presentation slides: http://www.slideshare.net/hpiwowar/presentations

Heather A Piwowar, Wendy W Chapman (2008) Linking database submissions to primary citations with PubMed Central In: BioLINK workshop at ISMB July 18-20 2008, Toronto Canada

Abstract: Background: Dataset submissions are growing exponentially. Links between dataset submissions and primary literature that describe the data collection are useful for many reasons: rich documentation, proper attribution, improved information retrieval, and enhanced text/data integration for analysis. Unfortunately, many database submissions do not include primary citation links, as database submissions are often made prior to publication. We suggest that automated tools can be developed to help identify links between dataset submissions and the primary literature. These tools require full text to differentiate cases of data sharing from data reuse and other contexts. In this study, we explore the possibility that deep analysis of full text may not be necessary, thereby enabling the querying of all reports in PubMed Central. Methods: We trained machine learning tree and rule-based classifiers on full-text open-access article unigram vectors, with the existence of a primary citation link from NCBI's Gene Expression Omnibus (GEO) database submission records as the binary output class. We manually combined and simplified the classifier trees and rules to create a query compatible with the interface for PubMed Central. Results: The query identified 40% of non-OA articles with dataset submission links from GEO (recall), and 65% of the returned articles without dataset submission links were manually judged to include statements of dataset deposit despite having no link from the database (applicable precision). Conclusion: We hope this work inspires future enhancements, and highlights the opportunities for simple full-text queries in PubMed Central given the mandated influx of NIH-funded research reports.

Notes: Presentation slides: http://www.slideshare.net/hpiwowar/presentations Evaluation data from manual curation can be found at http://www.google.com/notebook/public/05528518921351292683/BDQkmSwoQ0IGz6pUj

Heather A Piwowar, Wendy W Chapman (2008) A review of journal policies for sharing research data In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0 - Proceedings of the 12th International Conference on Electronic Publishing (ELPUB) June 25-27 2008, Toronto Canada

Abstract: Sharing data is a tenet of science, yet commonplace in only a few subdisciplines. Recognizing that a data sharing culture is unlikely to be achieved without policy guidance, some funders and journals have begun to request and require that investigators share their primary datasets with other researchers. The purpose of this study is to understand the current state of data sharing policies within journals, the features of journals that are associated with the strength of their data sharing policies, and whether the strength of data sharing policies impacts the observed prevalence of data sharing. Methods: We investigated these relationships with respect to gene expression microarray data in the journals that most often publish studies about this type of data. We measured data sharing prevalence as the proportion of papers with submission links from NCBI's Gene Expression Omnibus (GEO) database. We conducted univariate and linear multivariate regressions to understand the relationship between the strength of data sharing policy and journal impact factor, journal subdiscipline, journal publisher (academic societies vs. commercial), and publishing model (open vs. closed access). Results: Of the 70 journal policies, 53 made some mention of sharing publication-related data within their Instructions to Authors statements. Of the 40 policies with a data sharing policy applicable to gene expression microarrays, we classified 17 as weak and 23 as strong (strong policies required an accession number from database submission prior to publication). Existence of a data sharing policy was associated with the type of journal publisher: 46% of commercial journals had a data sharing policy, compared to 82% of journals published by an academic society. All five of the open access journals had a data sharing policy. Policy strength was associated with impact factor: the journals with no data sharing policy, a weak policy, and a strong policy had respective median impact factors of 3.6, 4.9, and 6.2. Policy strength was positively associated with measured data sharing submission into the GEO database: the journals with no data sharing policy, a weak policy, and a strong policy had median data sharing prevalence of 8%, 20%, and 25%, respectively. Conclusion: This review and analysis begins to quantify the relationship between journal policies and data sharing outcomes. We hope it contributes to assessing the incentives and initiatives designed to facilitate widespread, responsible, effective data sharing.

Notes: Presentation slides: http://www.slideshare.net/hpiwowar/presentations Archived Instructions to Authors statements are at http://www.researchremix.org/data/ELPUB2008%20Piwowar%20InstructionsForAuthors.zip Data is at http://www.researchremix.org/data/ELPUB2008%20Piwowar%20Data.zip Statistical analysis code (r script) is at http://www.researchremix.org/data/ELPUB2008%20Piwowar%20Stats.r

Posters

2009

Heather A Piwowar, Wendy W Chapman (2009) Using open access literature to guide full-text query formulation DBMI training retreat 2009, Pittsburgh PA [Posters]

Abstract: Literature searches, systematic reviews, and text mining require identifying articles based on full-text content. The full text of published biomedical articles contains valuable information not found in abstracts or MeSH terms. Full-text literature is increasingly available for query. PubMed Central, Highwire Press and Google Scholar are growing fast, thanks to the NIH public access mandate. However, it is difficult to formulate effective full-text queries manually. Prose and identifiers have large variation, and full-text portals are not designed for query evaluation. Current full-text retrieval research does not address this problem. Cutting-edge systems developed for information retrieval and extraction require complete computational access to a full-text corpus for preprocessing: publisher licenses rarely allow this. We propose using open access literature to formulate queries for use in full-text portals. We can use open access articles to identify synonyms and lexical variants, tune performance, and generate queries compatible with full-text portal query languages.

Notes:

2008

Heather A Piwowar, Wendy W Chapman (2008) Envisioning a Biomedical Data Reuse Registry AMIA Annual Symposium 2008, Washington DC [Posters]

Abstract: Repurposing research data holds many benefits for the advancement of biomedicine, yet is very difficult to measure and evaluate. We propose a data reuse registry to maintain links between primary research datasets and studies that reuse this data. Such a resource could help recognize investigators whose work is reused, illuminate aspects of reusability, and evaluate policies designed to encourage data sharing and reuse.

Notes:

Heather A Piwowar, Wendy W Chapman (2008) Prevalence and Patterns of Microarray Data Sharing PSB 2008, Hawaii [Posters]

Abstract: Sharing research data is a cornerstone of science. Although many tools and policies exist to encourage data sharing, the prevalence with which datasets are shared is not well understood. We report our preliminary results on patterns of sharing microarray data in public databases.

Notes: Data and calculations: http://www.researchremix.org/data/PSB2008%20Piwowar%20Data.zip

2007

Heather A Piwowar, Douglas B Fridsma (2007) Examining the uses of shared data ISMB 2007, Vienna Austria [Posters]

Abstract: Although not all research topics can be addressed by re-using existing data, many can. Identifying areas with frequent re-use can highlight best practices to be used when developing research agendas, tools, standards, repositories, and communities in areas which have yet to receive major benefits from shared data.

Notes: Data and calculations: http://www.researchremix.org/data/ISMB2007%20Piwowar%20Data.zip

Invited Presentations

2010

2009

Heather A Piwowar (2009) Proposal: Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data ASIS&T Annual Meeting 2009, Student Award Presentations [Invited Presentations]

Abstract: Presented at ASIS&T 2009 in the student awards section. The presentation contains an overview of my dissertation proposal, as 2009 winner of the Thomson Reuters Information Science Doctoral Dissertation Proposal Scholarship, administered by the ASIS&T Information Science Education Committee.

Notes:

Heather A Piwowar (2009) Measuring the adoption of Open Science Open Science workshop at PSB 2009, Hawaii [Invited Presentations]

Abstract: Why measure the adoption of Open Science? As we seek to embrace and encourage participation in open science, understanding patterns of adoption will allow us to make informed decisions about tools, policies, and best practices. Measuring adoption over time will allow us to note progress and identify opportunities to learn and improve. It is also just plain interesting to see where we are, where we aren't, and where we might go! What can we measure? Many attributes of open science can be studied, including open access publications, open source code, open protocols, open proposals, open peer-review, open notebook science, open preprints, open licenses, open data, and the publishing of negative results. This presentation will focus on measuring the prevalence with which investigators share their research datasets. What measurements have been done? How? What have we learned? Various methods have been used to assess adoption of open science: reviews of policies and mandates, case studies of experiences, surveys of investigators, and analyses of demonstrated data sharing behavior. We'll briefly summarize key results. Future research? The presentation will conclude by highlighting future research areas for enhancing and applying our understanding of open data adoption.

Notes:

2008

Heather A Piwowar (2008) Why study Data Sharing? (+ why share your data) DBMI Colloquium, University of Pittsburgh [Invited Presentations]

Abstract: A presentation to the DBMI department at the University of Pittsburgh about data sharing and reuse: what this means, why it is important, some of what we've learned, and what we still don't know.

Notes:

2007

Heather A Piwowar (2007) Sharing Detailed Research Data is Associated with Increased Citation Rate NLM Trainee Conference, Stanford University. [Invited Presentations]

Abstract: Heather A Piwowar, Roger S Day, Douglas B Fridsma (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate PLoS ONE 2: 3. e308 Abstract: Sharing research data provides benefit to the general scientific community, but the benefit is less obvious for the investigator who makes his or her data available. We examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data. The 48% of trials with publicly available microarray data received 85% of the aggregate citations. Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin using linear regression. This correlation between publicly available data and increased literature impact may further motivate investigators to share their detailed research data.

Notes:

Working Papers

2010

Heather A Piwowar, Wendy W Chapman (2010) Using open access literature to guide full-text query formulation Available from Nature Precedings, http://hdl.handle.net/10101/npre.2010.4267.2 [Working Papers]

Abstract: Background: Much scientific knowledge is contained in the details of the full-text biomedical literature. Most research in automated retrieval presupposes that the target literature can be downloaded and preprocessed prior to query. Unfortunately, this is not a practical or maintainable option for most users due to licensing restrictions, website terms of use, and sheer volume. Scientific article full-text is increasingly queriable through portals such as PubMed Central, Highwire Press, Scirus, and Google Scholar. However, because these portals only support very basic Boolean queries and full text is so expressive, formulating an effective query is a difficult task for users. We propose improving the formulation of full-text queries by using the open access literature as a proxy for the literature to be searched. We evaluated the feasibility of this approach by building a high-precision query for identifying studies that perform gene expression microarray experiments. Methodology and Results: We built decision rules from unigram and bigram features of the open access literature. Minor syntax modifications were needed to translate the decision rules into the query languages of PubMed Central, Highwire Press, and Google Scholar. We mapped all retrieval results to PubMed identifiers and considered our query results as the union of retrieved articles across all portals. Compared to our reference standard, the derived full-text query found 56% (95% confidence interval, 52% to 61%) of intended studies, and 90% (86% to 93%) of studies identified by the full-text search met the reference standard criteria. Due to this relatively high precision, the derived query was better suited to the intended application than alternative baseline MeSH queries. Significance: Using open access literature to develop queries for full-text portals is an open, flexible, and effective method for retrieval of biomedical literature articles based on article full-text. We hope our approach will raise awareness of the constraints and opportunities in mainstream full-text information retrieval and provide a useful tool for today's researchers.

Notes: See Version section at bottom of Nature Precedings page for the most recent version: http://hdl.handle.net/10101/npre.2010.4267.2

Alexander Garnett, Heather A Piwowar, Edie M Rasmussen, Judy Illes (2010) Formulating MEDLINE queries for article retrieval based on PubMed exemplars Available from Nature Precedings, http://hdl.handle.net/10101/npre.2010.4270.1 [Working Papers]

Abstract: Bibliographic search engines allow endless possibilities for building queries based on specific words or phrases in article titles and abstracts, indexing terms, and other attributes. Unfortunately, deciding which attributes to use in a methodologically sound query is a non-trivial process. In this paper, we describe a system to help with this task, given an example set of PubMed articles to retrieve and a corresponding set of articles to exclude. The system provides the users with unigram and bigram features from the title, abstract, MeSH terms, and MeSH qualifier terms in decreasing order of precision, given a recall threshold. From this information and their knowledge of the domain, users can formulate a query and evaluate its performance. We apply the system to the task of distinguishing original research articles of functional magnetic resonance imaging (fMRI) of sensorimotor function from fMRI studies of higher cognitive functions.

Notes: Check Version section at Nature Precedings for the most recent version: http://hdl.handle.net/10101/npre.2010.4270.1

2009

Heather A Piwowar, Wendy W Chapman (2009) pubmedi: A free, open, flexible, scriptable approach to estimating citation indices for aggregate analysis within biomedicine [Working Papers]

Abstract: When looking for citation indices within biomedicine for aggregate analysis, we suggest calculating them using data from PubMed, PubMed Central, and the Author-ity name disambiguation engine. We call this the "pubmedi" approach, and explore it further here.

Notes:

2008

Heather A Piwowar (2008) Generalizability coefficient for Mechanical Turk annotations ResearchRemix blog on December 29, 2008 [Working Papers]

Abstract: We conducted a pilot annotation study with Amazon's Mechanical Turk to estimate the accuracy with which annotation tasks can be performed by this group of non-experts, the number of independent annotations necessary to get sufficient generalizability, and the cost of annotation.

Notes:

Masters theses

1996

Heather A Campbell (1996) Simulation of quadrature amplitude demodulation in a digital telemetry system Thesis (Masters of Engineering). MIT, Department of Electrical Engineering and Computer Science.

Abstract: The realization of a new wireline acquisition front end has made it possible for Schlumberger to redesign its uphole telemetry receiver. In order to achieve data rates of 500 kbits/second over standard oil well logging cables, the Digital Telemetry System uses Quadrature Amplitude Modulation (QAM) to transmit its measurement data. The demodulator involves timing recovery, filtering, cable equalization, and symbol decoding. The purpose of this thesis is to simulate the demodulator, thereby documenting the demodulation process and creating a design tool that can be used to design future QAM telemetry systems.

Notes: Completed in fulfillment of a Masters of Engineering degree, on co-op assignment at Schlumberger Austin Research Center.