Abstract: Functional data analysis (FDA) is a relatively new branch of statistics devoted to describing and modelling data that are functions. Many relevant aspects of musical performance and perception can be understood and quantified as dynamic processes evolving as functions of time. In this paper, we show that FDA is a statistical methodology well suited for research in the field of quantitative musical performance analysis. To demonstrate this suitability, we consider tempo data for 28 performances of Schumann's Träumerei and analyse them by means of functional principal component analysis (one of the most powerful descriptive tools included in FDA). Specifically, we investigate the commonalities and differences between performances regarding (expressive) timing, and we cluster similar performances together. We conclude that musical data considered as functional data reveal performance structures that might otherwise go unnoticed.
Abstract: We propose new dependence measures for two real random variables not necessarily linearly related. Covariance and linear correlation are expressed in terms of principal components and are generalized for variables distributed along a curve. Properties of these measures are discussed. The new measures are estimated using principal curves and are computed for simulated and real data sets. Finally, we present several statistical applications for the new dependence measures.
Abstract: We consider the problem of nonparametrically predicting a scalar response variable y from a functional predictor X. We have n observations (X_i, y_i). We assign a weight w_i = K(d(X, X_i)/h) to each X_i, where d is a semi-metric, K is a kernel function and h is the bandwidth. Then we fit a Weighted (Linear) Distance-Based Regression, where the weights are as above and the distances are given by a possibly different semi-metric.
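The weighting scheme in this abstract is easy to sketch. The Python fragment below is a minimal illustration, assuming a Gaussian kernel and an L2 semi-metric for curves observed on a common grid (the paper allows any kernel and possibly different semi-metrics), and it finishes with a plain kernel-weighted mean rather than the paper's full weighted distance-based regression:

```python
import math

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def l2_semimetric(x, y):
    """L2 distance between two curves observed on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_weights(X_new, X_obs, h, d=l2_semimetric, K=gaussian_kernel):
    """Weights w_i = K(d(X_new, X_i) / h) for each observed curve X_i."""
    return [K(d(X_new, Xi) / h) for Xi in X_obs]

def nw_predict(X_new, X_obs, y_obs, h):
    """Kernel-weighted mean of the responses (Nadaraya-Watson style);
    the paper instead plugs these weights into a weighted regression."""
    w = kernel_weights(X_new, X_obs, h)
    return sum(wi * yi for wi, yi in zip(w, y_obs)) / sum(w)
```

For a curve equidistant from two observed curves with responses 0 and 2, and closest to a curve with response 1, the prediction is exactly 1 by symmetry.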
Abstract: We propose a methodology to carry out spatial prediction when measured data are curves. Our approach is based on both the kriging predictor and the functional linear point-wise model theory. The spatial prediction of an unobserved curve is obtained as a linear combination of observed functions. We employ a solution based on basis functions to estimate the functional parameters. A real data set is used to illustrate the proposals.
Abstract: Three methods of estimation, namely maximum likelihood, moments and L-moments, are considered when data come from an asymmetric exponential power distribution. This is a very flexible four-parameter family exhibiting a variety of tail and shape behaviours. The analytical expressions of the first four L-moments of these distributions are derived, allowing for the use of L-moment estimators. A simulation study compares the three estimation methods in small samples.
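For readers unfamiliar with L-moments, the sample versions can be computed from probability-weighted moments. The identities below are standard textbook results (Hosking's b_r formulation), not the paper's analytical L-moments of the asymmetric exponential power family:

```python
from math import comb  # Python 3.8+

def sample_l_moments(data):
    """First four sample L-moments via probability-weighted moments b_r."""
    x = sorted(data)
    n = len(x)

    # b_r = n^{-1} * sum_{i=1}^{n} [C(i-1, r) / C(n-1, r)] * x_(i)
    def b(r):
        return sum(comb(i - 1, r) * x[i - 1]
                   for i in range(1, n + 1)) / (n * comb(n - 1, r))

    b0, b1, b2, b3 = b(0), b(1), b(2), b(3)
    l1 = b0                                  # sample mean
    l2 = 2 * b1 - b0                         # scale
    l3 = 6 * b2 - 6 * b1 + b0                # related to skewness
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0    # related to kurtosis
    return l1, l2, l3, l4
```

For a symmetric, evenly spaced sample the third (and here also the fourth) sample L-moment vanishes.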
Abstract: Principal curves are smooth parametric curves passing through the "middle" of a non-elliptical multivariate data set. We model the probability distribution of this kind of data as a mixture of simple nonlinear models and use MCMC techniques to fit the mixture model.
Abstract: This paper deals with the k-sample problem for functional data when the observations are density functions. We introduce test procedures based on distances between pairs of density functions (the L1 distance and the Hellinger distance, among others). A simulation study is carried out to compare the practical behaviour of the proposed tests. Theoretical derivations have been made so as to allow weighted samples in the test procedures. The paper ends with a real data example: for a collection of European regions we estimate the regional relative income densities and then we test the significance of the country effect.
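The two distances named above are straightforward to evaluate numerically for densities given on a common grid. This sketch uses a simple trapezoidal rule and illustrates only the distances, not the test procedures themselves:

```python
import math

def trapezoid(vals, grid):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return sum(0.5 * (vals[i] + vals[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def l1_distance(f, g, grid):
    """L1 distance: integral of |f - g| over the grid."""
    return trapezoid([abs(a - b) for a, b in zip(f, g)], grid)

def hellinger_distance(f, g, grid):
    """Hellinger distance: sqrt(0.5 * integral of (sqrt f - sqrt g)^2)."""
    integrand = [(math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(f, g)]
    return math.sqrt(0.5 * trapezoid(integrand, grid))
```

Both distances are zero for identical densities; for the Uniform(0,1) density against the triangular density 2t on [0,1], the L1 distance is 0.5.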
Abstract: Motivation: This application aims at assisting researchers with the extraction of significant medical and biological knowledge from data sets with complex relationships among their variables. Results: Non-hypothesis-driven approaches like Principal Curves of Oriented Points (PCOP) are very suitable for this objective. PCOP allows a representative pattern to be obtained from a huge quantity of data on independent variables in a very flexible and direct way. A web server has been designed to automatically perform 'non-linear pattern' analysis, 'hidden-variable-dependent' clustering, and 'local-dispersion-dependent' classification of new samples, involving new statistical techniques based on the PCOP calculus. The tools facilitate the management, comparison and visualization of results in a user-friendly graphical interface. Availability: http://ibb.uab.es/revresearch
Abstract: Two existing density estimators based on local likelihood have properties that are comparable to those of local likelihood regression but they are much less used than their counterparts in regression. We consider truncation as a natural way of localising parametric density estimation. Based on this idea, a third local likelihood density estimator is introduced. Our main result establishes that the three estimators coincide when a free multiplicative constant is used as an extra local parameter.
Abstract: All electoral systems have an electoral formula that converts proportions of votes into Parliamentary seats. Pre-electoral polls usually focus on estimating proportions of votes and then apply the electoral formula to give a forecast of Parliamentary composition. We describe the problems that arise from this approach: there will typically be a bias in the forecast. We study the origin of the bias and some methods for evaluating and reducing it. We propose a bootstrap algorithm for computing confidence intervals for the allocation of seats. We show, by Monte Carlo simulation, the performance of the proposed methods using data from Spanish elections in previous years. We also propose graphical methods for visualizing how electoral formulae and Parliamentary forecasts work (or fail).
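As a concrete illustration of the idea above, the sketch below combines the d'Hondt highest-averages formula (one electoral formula used in Spanish elections) with a percentile bootstrap over multinomially resampled ballots; the vote counts are made up, and the paper's own algorithm may differ in detail:

```python
import random
from collections import Counter

def dhondt(votes, n_seats):
    """Allocate seats by the d'Hondt highest-averages formula."""
    seats = [0] * len(votes)
    for _ in range(n_seats):
        quotients = [v / (s + 1) for v, s in zip(votes, seats)]
        seats[quotients.index(max(quotients))] += 1
    return seats

def bootstrap_seat_ci(votes, n_seats, B=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence intervals for seat allocations:
    resample ballots from the observed vote shares, re-apply the formula."""
    rng = random.Random(seed)
    n, k = sum(votes), len(votes)
    shares = [v / n for v in votes]
    sims = []
    for _ in range(B):
        cnt = Counter(rng.choices(range(k), weights=shares, k=n))
        sims.append(dhondt([cnt.get(j, 0) for j in range(k)], n_seats))
    lo_i, hi_i = int(B * alpha / 2), int(B * (1 - alpha / 2)) - 1
    intervals = []
    for j in range(k):
        col = sorted(s[j] for s in sims)
        intervals.append((col[lo_i], col[hi_i]))
    return intervals
```

The discreteness of the seat-allocation map is precisely why interval forecasts are more honest than plugging point estimates of vote shares into the formula.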
Abstract: Random coefficient regression models have been applied in different fields during recent years and they are a unifying frame for many statistical models. Recently, Beran and Hall (Ann. Statist. 20 (1992) 1970) raised the question of the nonparametric study of the coefficient distribution. Nonparametric goodness-of-fit tests were considered in Delicado and Romo (Ann. Inst. Statist. Math. 51 (1999) 125). In this nonparametric framework, the study of parametric families for the coefficient distributions was started by Beran (Ann. Inst. Statist. Math. (1993) 639). Here we propose statistics for parametric goodness-of-fit tests and we obtain their asymptotic distributions. Moreover, we construct bootstrap approximations to these distributions, proving their validity. Finally, a simulation study illustrates our results.
Abstract: We introduce nonparametric density estimators that generalize the classical histogram and frequency polygon. The new estimators are expressed as linear combinations of density functions that are piecewise polynomials, where the coefficients are optimally chosen in order to minimize an approximate version of the integrated square error of the estimator. We establish the asymptotic behaviour of the proposed estimators, and study their performance in a simulation study.
Abstract: Principal curves were introduced by Hastie & Stuetzle (1989) as smooth parametric curves passing through the middle of a multidimensional data set. Delicado (2001) defines Principal Curves of Oriented Points, based on the fixed points of a function from R^p into itself. This definition is nonparametric, and smoothing methods are used to find the principal curves of a data set. Here we extend this work in two directions. First, we propose a bandwidth choice method based on the Minimum Spanning Tree of the data set. Second, we present an object-oriented application that implements the principal curve computation for any dimension in a flexible recursive way. Examples on synthetic and real data are included.
Abstract: This paper shows that the increasing block rate pricing schedules usually applied by water utilities can reduce efficiency and equity levels. To do this, we first present a two-step method to estimate the demand and to recover the distribution of consumer tastes when increasing block rate pricing is used. We show that in this case the tariff induces a pooling equilibrium and customers with different taste parameters will be observed to choose the same consumption level. Second, we show that a two-part tariff that neither reduces the firm's revenue nor increases the aggregate level of water consumption raises welfare and equity levels relative to an increasing block rates schedule.
Abstract: In the context of hypothesis testing simulation studies, this paper advocates using graphical and numerical tools to summarize the results, beyond the conventional practice of just reporting empirical levels. Some of these tools are introduced here. They are mainly based on the computation of distances between the empirical distribution function of the p-values derived from the simulation experiment and the distribution function of the U([0,1]) random variable. Their null distribution is tabulated. A joint study of several distances reveals that important aspects of a test can pass unnoticed if only empirical significance levels are calculated. The proposed tools are applied to two practical examples, demonstrating their usefulness in the discrimination between alternative test procedures and also in the detection of data not conforming to the null hypothesis.
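One distance of the kind described above is the Kolmogorov-Smirnov statistic between the empirical distribution of the simulated p-values and the U(0,1) distribution function F(t) = t. This small function is illustrative only; the paper studies several such distances:

```python
def ks_uniform_distance(pvalues):
    """Kolmogorov-Smirnov distance between the empirical distribution
    function of the p-values and the U(0,1) d.f. F(t) = t."""
    p = sorted(pvalues)
    n = len(p)
    # sup_t |F_n(t) - t| is attained at the order statistics:
    # just before and just after each jump of F_n.
    return max(max(abs((i + 1) / n - p[i]), abs(i / n - p[i]))
               for i in range(n))
```

Under a well-calibrated test the p-values are approximately U(0,1) and this distance is small; a large value signals a level distortion that a single empirical rejection rate might miss.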
Abstract: Principal curves have been defined as smooth curves passing through the "middle" of a multidimensional data set. They are nonlinear generalizations of the first principal component, a characterization of which is the basis of the definition of principal curves. We establish a new characterization of the first principal component and base our new definition of a principal curve on this property. We introduce the notion of principal oriented points and we prove the existence of principal curves passing through these points. We extend the definition of principal curves to multivariate data sets and propose an algorithm to find them. The new notions lead us to generalize the definition of total variance. Successive principal curves are recursively defined from this generalization. The new methods are illustrated on simulated and real data sets.
Abstract: This paper presents a comparative analysis of linear and mixed models for short-term forecasting of a real data series with a high percentage of missing data. Data are the series of significant wave heights registered at regular periods of three hours by a buoy placed in the Bay of Biscay. The series is interpolated with a linear predictor which minimizes the forecast mean square error. The linear models are seasonal ARIMA models and the mixed models have a linear component and a non-linear seasonal component. The non-linear component is estimated by a non-parametric regression of data versus time. Short-term forecasts, no more than two days ahead, are of interest because they can be used by the port authorities to notify the fleet. Several models are fitted and compared by their forecasting behaviour.
Abstract: Random coefficient regressions have been applied in a wide range of fields, from biology to economics, and constitute a common frame for several important statistical models. A nonparametric approach to inference in random coefficient models was initiated by Beran and Hall. In this paper we introduce and study goodness-of-fit tests for the coefficient distributions; their asymptotic behavior under the null hypothesis is obtained. We also propose bootstrap resampling strategies to approximate these distributions and prove their asymptotic validity using results by Giné and Zinn on bootstrap empirical processes. A simulation study illustrates the properties of these tests.
Abstract: OBJECTIVE. To compare the performance of two predictive radiologic models, logistic regression (LR) and neural network (NN), with five different resampling methods. METHODS. One hundred sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for the LR and NN models. Both models were developed with cross-validation, leave-one-out, and three different bootstrap algorithms. The final results of each model were compared with the error rate and the area under the receiver operating characteristic curve (A(z)). RESULTS. The NN obtained statistically higher A(z) values than LR with cross-validation. The remaining resampling validation methods did not reveal statistically significant differences between the LR and NN rules. CONCLUSION. The NN classifier performs better than the one based on LR. This advantage is well detected by three-fold cross-validation but remains unnoticed when leave-one-out or bootstrap algorithms are used.
Abstract: In the fixed design regression model, additional weights are considered for the Nadaraya-Watson and Gasser-Müller kernel estimators. We study their asymptotic behavior and the relationships between the new and classical estimators. For a simple family of weights, and considering the AIMSE as global loss criterion, we show some possible theoretical advantages. An empirical study illustrates the performance of the weighted kernel estimators in theoretically ideal situations and in simulated data sets. Some results concerning the use of weights for local polynomial estimators are also given.
Abstract: We discuss the use of bootstrap methodology in hypothesis testing, focusing on the classical F-test for linear hypotheses in the linear model. A modification of the F-statistic that allows for resampling under the null hypothesis is proposed. This approach is specifically considered in the one-way analysis of variance model. A simulation study illustrating the behaviour of our proposal is presented.
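The general strategy of resampling under the null can be sketched for one-way ANOVA as follows. This is a generic residual-resampling bootstrap, not the paper's specific modification of the F-statistic:

```python
import random

def f_statistic(groups):
    """Classical one-way ANOVA F-statistic for k groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    if ssw == 0.0:  # degenerate resample: treat as an extreme statistic
        return float("inf")
    return (ssb / (k - 1)) / (ssw / (n - k))

def bootstrap_f_pvalue(groups, B=500, seed=0):
    """Bootstrap p-value for H0: all group means are equal.
    Resampling is done under the null: bootstrap samples are drawn
    from the pooled group-centred residuals, so H0 holds for them."""
    rng = random.Random(seed)
    means = [sum(g) / len(g) for g in groups]
    residuals = [x - m for g, m in zip(groups, means) for x in g]
    f_obs = f_statistic(groups)
    exceed = sum(
        1 for _ in range(B)
        if f_statistic([[rng.choice(residuals) for _ in g]
                        for g in groups]) >= f_obs
    )
    return exceed / B
```

Centring each group before pooling is what makes the resampling scheme respect the null hypothesis even when the observed means differ.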