
Carlos A Coelho


cmac@fct.unl.pt

Books

2010
Carlos A Coelho, João T Mexia (2010)  Product and Ratio of Generalized Gamma-Ratio Random Variables: Exact and Near-exact distributions - Applications   Lambert Academic Publishing  
Abstract: Products of ratios of independent Gamma random variables (r.v.'s) are relevant in many tests of hypotheses. Obtaining explicit manageable expressions for their p.d.f. and c.d.f. is a challenging problem. In this monograph we take this challenge. The book tries to illustrate the use of several techniques, exhibiting a balanced blend between theory and a good number of examples. A large number of graphs and tables illustrate several particular aspects of the distributions being studied. Besides the exact distribution, we also consider near-exact ones, obtained through a new concept of approximation of the characteristic function. Computational modules are provided to implement all distributions developed. The approach followed enabled an easy extension to the non-central case and to negative power parameters, greatly widening the domain of application of the results obtained. As particular immediate cases we have the distribution of products and ratios of many known distributions, among which folded T, folded Cauchy, Beta prime or Beta second kind and, of course, F r.v.'s. The book is intended for an audience at the graduate or post-graduate level, with focus on Distribution Theory.
Notes:

Journal articles

2012
Filipe J Marques, Carlos A Coelho (2012)  The block sphericity test -- exact and near-exact distributions for the likelihood ratio test statistic   Mathematical Methods in the Applied Sciences  
Abstract: Using a suitable decomposition of the null hypothesis of the sphericity test for several blocks of variables, into a sequence of conditionally independent null hypotheses, we show that it is possible to obtain the expressions for the likelihood ratio test statistic, for its hth null moment, and for the characteristic function of its logarithm. The exact distribution of the logarithm of the likelihood ratio test statistic is obtained in the form of a sum of a generalized integer gamma distribution with the sum of a given number of independent logbeta distributions, taking the form of a single generalized integer gamma distribution when each set of variables has two variables. The development of near-exact distributions arises, from the previous decomposition of the null hypothesis and from the consequent-induced factorization of the characteristic function, as a natural and practical way to approximate the exact distribution of the test statistic. A measure based on the exact and approximating characteristic functions, which gives an upper bound on the distance between the corresponding distribution functions, is used to assess the quality of the near-exact distributions proposed and to compare them with an asymptotic approximation on the basis of Box's method.
Notes:
Carlos A Coelho, Filipe J Marques (2012)  Near-exact distributions for the likelihood ratio test statistic to test equality of several variance-covariance matrices   Computational Statistics  
Abstract: The exact distribution of the likelihood ratio test statistic to test the equality of several variance-covariance matrices has a non-manageable form. On the other hand, the existing asymptotic approximations do not exhibit the necessary precision for many applications. For these reasons, the development of near-exact approximations to the distribution of this statistic, arising from a different method of approximating distributions, emerges as a desirable goal. These distributions, while being manageable, are much closer to the exact distribution than the usual asymptotic distributions and, opposite to these, are also asymptotic for increasing number of variables and matrices involved. Computational modules to implement the near-exact distributions are made available on a web-site.
Notes:
Barry C Arnold, Carlos A Coelho, Filipe J Marques (2012)  The distribution of the product of powers of independent Uniform random variables   Journal of Multivariate Analysis  
Abstract: What is the distribution of the product of given powers of independent uniform (0, 1) random variables? Is this distribution useful? Is this distribution commonly used in some contexts? Is this distribution somehow related to the distribution of the product of other random variables? Are there some test statistics with this distribution? This paper will give the answers to the above questions. It will be seen that the answer to the last four questions above is: yes! We will show how particular choices of the numbers of variables involved and their powers will result in interesting and useful distributions and how these distributions may help us to shed some new light on some well-known distributions and also how it may help us to address, in a much simpler way, some distributions usually considered to be rather complicated as is the case with the exact distribution of a number of statistics used in Multivariate Analysis, including some whose exact distribution up until now is not available in a concise and manageable form.
Notes:
Luís M Grilo, Carlos A Coelho (2012)  A family of near-exact distributions based on truncations of the exact distribution for the generalized Wilks Lambda statistic   Communications in Statistics - Theory and Methods (accepted for publication)  
Abstract: For the case where at least two sets have an odd number of variables we do not have the exact distribution of the generalized Wilks Lambda statistic in a manageable form, adequate for manipulation. We develop in this paper a family of very accurate near-exact distributions for this statistic for the case where two or three sets have an odd number of variables. We first express the exact characteristic function of the logarithm of the statistic in the form of the characteristic function of an infinite mixture of Generalized Integer Gamma distributions. Then, based on truncations of this exact characteristic function, we obtain a family of near-exact distributions, which, by construction, match the first two exact moments. These near-exact distributions display an asymptotic behaviour for increasing number of variables involved. The corresponding cumulative distribution functions are obtained in a concise and manageable form, relatively easy to implement computationally, allowing for the computation of virtually exact quantiles. We undertake a comparative study for small sample sizes, using two proximity measures based on the Berry-Esseen bounds, to assess the performance of the near-exact distributions for different numbers of sets of variables and different numbers of variables in each set.
Notes:
2011
Carlos A Coelho, Filipe J Marques (2011)  On the exact, asymptotic and near-exact distributions for the likelihood ratio statistics to test equality of several Exponential distributions   AIP Conference Proceedings 1389: 1471-1474  
Abstract: The distribution of the likelihood ratio test statistic to test the equality of several one or two-parameter Exponential distributions either for censored or non-censored samples has been studied by several authors. These statistics are of interest in many areas, namely in reliability and lifetime studies. We propose several near-exact distributions for these statistics, which provide very accurate but yet very manageable approximations to the exact distribution, much adequate for practical purposes.
Notes:
Filipe J Marques, Carlos A Coelho, Barry C Arnold (2011)  A general near-exact distribution theory for the most common likelihood ratio test statistics used in Multivariate Analysis   TEST 20: 1. 180-203 MAY  
Abstract: In this paper we first show how the exact distributions of the most common likelihood ratio test (l.r.t.) statistics, that is, the ones used to test the independence of several sets of variables, the equality of several variance-covariance matrices, sphericity, and the equality of several mean vectors, may be expressed as the distribution of the product of independent Beta random variables or the product of a given number of independent random variables whose logarithm has a Gamma distribution times a given number of independent Beta random variables. Then, we show how near-exact distributions for the logarithms of these statistics may be expressed as Generalized Near-Integer Gamma distributions or mixtures of these distributions, whose rate parameters associated with the integer shape parameters, for samples of size n, all have the form (n-j)/n for j=2,...,p, where for three of the statistics, p is the number of variables involved, while for the other one, it is the sum of the number of variables involved and the number of mean vectors being tested. What is interesting is that the similarities exhibited by these statistics are even more striking in terms of near-exact distributions than in terms of exact distributions. Moreover all the l.r.t. statistics that may be built as products of these basic statistics also inherit a similar structure for their near-exact distributions. To illustrate this fact, an application is made to the l.r.t. statistic to test the equality of several multivariate Normal distributions.
Notes:
Filipe J Marques, Carlos A Coelho (2011)  The multi-sample block-matrix sphericity test   AIP Conference Proceedings 1389: 1479-1482  
Abstract: The multi-sample block-matrix sphericity test and its particular cases have wide applications in several areas. However, the practical implementation of this test has been hindered by difficulties in handling the exact distribution of the associated statistic and the non-availability in the literature of asymptotic distributions. We use a decomposition of the null hypothesis into three null hypotheses to obtain very well-fit and highly manageable near-exact distributions for the likelihood ratio test statistic of this test and its particular cases. These distributions will allow for the easy computation of well-fit near-exact quantiles and p-values.
Notes:
2010
Carlos A Coelho, Filipe J Marques (2010)  Near-exact distributions for the independence and sphericity likelihood ratio test statistics   JOURNAL OF MULTIVARIATE ANALYSIS 101: 3. 583-593 MAR  
Abstract: In this paper we show how, based on a decomposition of the likelihood ratio test for sphericity into two independent tests and a suitably developed decomposition of the characteristic function of the logarithm of the likelihood ratio test statistic to test independence in a set of variates, we may obtain extremely well-fitting near-exact distributions for both test statistics. Since both test statistics have the distribution of the product of independent Beta random variables, it is possible to obtain near-exact distributions for both statistics in the form of Generalized Near-integer Gamma distributions or mixtures of these distributions. For the independence test statistic, numerical studies and comparisons with asymptotic distributions proposed by other authors show the extremely high accuracy of the near-exact distributions developed as approximations to the exact distribution. Concerning the sphericity test statistic, comparisons with formerly developed near-exact distributions show the advantages of these new near-exact distributions. (C) 2009 Elsevier Inc. All rights reserved.
Notes:
Carlos A Coelho, Barry C Arnold, Filipe J Marques (2010)  Near-exact distributions for certain likelihood ratio test statistics   J. Stat. Theory Pract. 4: 4. 711-725  
Abstract: In this paper we will show how, using an expansion of a Logbeta distribution as an infinite mixture of Gamma distributions, we are able to obtain near-exact distributions for the negative logarithm of the l.r.t. (likelihood ratio test) statistics used in Multivariate Analysis to test the independence of several sets of variables, the equality of several mean vectors, sphericity and the equality of several variance-covariance matrices which will match as many of the exact moments as we wish and for which we will be able to have an a priori upper-bound for the difference between their c.d.f. and the exact c.d.f. These near-exact distributions also display very good performance, with an agreement with the exact distribution which may virtually be taken as far as we wish and which it is not possible to obtain with the usual asymptotic distributions. Furthermore, based on the results presented it will be easy to build near-exact distributions for any l.r.t. statistics which may be built as the product of the above l.r.t. statistics.
Notes:
Luís M Grilo, Carlos A Coelho (2010)  Near-exact distributions for the generalized Wilks Lambda statistic   Discuss. Math. Probab. Stat. 30: 1. 53-86  
Abstract: Two near-exact distributions for the generalized Wilks Lambda statistic, used to test the independence of several sets of variables with a multivariate normal distribution, are developed for the case where two or more of these sets have an odd number of variables. Using the concept of near-exact distribution and based on a factorization of the exact characteristic function we obtain two approximations, which are very close to the exact distribution but far more manageable. These near-exact distributions equate, by construction, some of the first exact moments and correspond to cumulative distribution functions which are practical to use, allowing for an easy computation of quantiles. We also develop three asymptotic distributions which also equate some of the first exact moments. We assess the proximity of the asymptotic and near-exact distributions obtained to the exact distribution using two measures based on the Berry-Esseen bounds. In our comparative numerical study we consider different numbers of sets of variables, different numbers of variables per set and different sample sizes.
Notes:
Filipe J Marques, Carlos A Coelho (2010)  The exact and near-exact distributions of the likelihood ratio statistic for the block sphericity test   AIP Conference Proceedings 1281: 1237-1240  
Abstract: Using a suitable decomposition of the null hypothesis of the test of sphericity for k blocks of p_i variables, into a sequence of conditionally independent null hypotheses we show that it is possible to obtain the expression of the likelihood ratio test statistic, the expression for the h-th null moment and the characteristic function of the logarithm of the likelihood ratio test statistic. The exact distribution of the logarithm of the likelihood ratio test statistic is then obtained as the distribution of the sum of a Generalized Integer Gamma random variable (r.v.) with the sum of a number of independent Logbeta r.v.'s. This distribution takes the form of a single Generalized Integer Gamma distribution when each set of variables has two variables. In the general case, the development of near-exact distributions arises, from the previous decomposition of the null hypothesis and the consequent induced factorization on the characteristic function, as a natural and practical way to approximate the exact distribution of the test statistic. A measure based on the exact and approximating characteristic functions, which gives an upper bound on the distance between the corresponding distribution functions, is used to assess the quality of the near-exact distributions proposed and to compare them with an asymptotic approximation based on Box's method.
Notes:
Luís M Grilo, Carlos A Coelho (2010)  The exact and near-exact distributions for the Wilks Lambda statistic used in the test of independence of two sets of variables   Amer. J. Math. Management Sci. 30: 1,2. 111-146  
Abstract: We develop the exact distribution of the Wilks Lambda statistic to test the independence of two sets of variables, both with an odd number of variables, under the form of an infinite mixture of Generalized Integer Gamma distributions. Based on truncations of the exact characteristic function, for the product of independent Beta random variables, we obtain near-exact distributions for such product and then by direct application of these results, and once again based on truncations, we develop near-exact distributions for the Wilks Lambda statistic. These near-exact distributions are finite mixtures of Generalized Integer Gamma and Generalized Near-Integer Gamma distributions. By construction, the two first moments of these approximations are equal to the exact moments. These distributions are manageable and relatively easy to implement computationally, allowing for the computation of near-exact quantiles which may indeed be regarded as virtually exact, given the good convergence properties of the series involved, mainly when the difference between the sample size and the overall number of variables involved is rather small. We assess the proximity between these near-exact distributions and the exact distribution by using two measures based on the Berry-Esseen bounds.
Notes:
2009
Carlos A Coelho, Filipe J Marques (2009)  The advantage of decomposing elaborate hypotheses on covariance matrices into conditionally independent hypotheses in building near-exact distributions for the test statistics   LINEAR ALGEBRA AND ITS APPLICATIONS 430: 10. 2592-2606  
Abstract: The aim of this paper is to show how the decomposition of elaborate hypotheses on the structure of covariance matrices into conditionally independent simpler hypotheses, by inducing the factorization of the overall test statistic into a product of several independent simpler test statistics, may be used to obtain near-exact distributions for the overall test statistics, even in situations where asymptotic distributions are not available in the literature and adequately fit ones are not easy to obtain. (C) 2008 Elsevier Inc. All rights reserved.
Notes: 16th International Workshop on Matrices and Statistics, Windsor, CANADA, 2007
2008
Filipe J Marques, Carlos A Coelho (2008)  Near-exact distributions for the sphericity likelihood ratio test statistic   JOURNAL OF STATISTICAL PLANNING AND INFERENCE 138: 3. 726-741 MAR 1  
Abstract: In this paper three near-exact distributions are developed for the sphericity test statistic. The exact probability density function of this statistic is usually represented through the use of the Meijer G function, which renders the computation of quantiles impossible even for a moderately large number of variables. The main purpose of this paper is to obtain near-exact distributions that lie closer to the exact distribution than the asymptotic distributions while, at the same time, correspond to density and cumulative distribution functions practical to use, allowing for an easy determination of quantiles. In addition to this, two asymptotic distributions that lie closer to the exact distribution than the existing ones were developed. Two measures are considered to evaluate the proximity between the exact and the asymptotic and near-exact distributions developed. As a reference we use the saddlepoint approximations developed by Butler et al. [ 1993. Saddlepoint approximations for tests of block independence, sphericity and equal variances and covariances. J. Roy. Statist. Soc., Ser. B 55, 171-183] as well as the asymptotic distribution proposed by Box. (C) 2007 Elsevier B.V. All rights reserved.
Notes:
Elsa E Moreira, Carlos A Coelho, Ana A Paulo, Luis S Pereira, Joao T Mexia (2008)  SPI-based drought category prediction using loglinear models   JOURNAL OF HYDROLOGY 354: 1-4. 116-130 JUN 15  
Abstract: Loglinear modeling for three-dimensional contingency tables was used with data from 14 rainfall stations located in the Alentejo and Algarve regions, southern Portugal, for short term prediction of drought severity classes. Loglinear models were fitted to drought class transitions derived from Standardized Precipitation Index (SPI) time series computed in a 12-month time scale. Quasi-association loglinear models proved to be the most adequate in fitting all the 14 data series. Odds and respective confidence intervals were calculated in order to understand the drought evolution and to estimate the drought class transition probabilities. The validation of the predictions was performed for the 2004-2006 drought, particularly for periods when the drought was initiating and establishing, and when it was dissipating. Although the contingency tables of drought class transitions present a strong diagonal tendency, three-dimensional loglinear modeling gives good results when comparing predicted and observed drought classes with 1 and 2 months lead for those 14 sites. Only in a few cases did predictions not fully match the observed drought severity, mainly for 2-month lead and when the SPI values are near the limit of the severity class. It could be concluded that loglinear prediction of drought class transitions is a useful tool for short term drought warning. (c) 2008 Elsevier B.V. All rights reserved.
Notes:
2007
Luís M Grilo, Carlos A Coelho (2007)  Development and study of two near-exact approximations to the distribution of the product of an odd number of independent Beta random variables   Journal of Statistical Planning and Inference 137: 5. 1560-1575  
Abstract: Using the concept of near-exact approximation to a distribution we developed two different near-exact approximations to the distribution of the product of an odd number of particular independent Beta random variables (r.v.'s). One of them is a particular generalized near-integer Gamma (GNIG) distribution and the other is a mixture of two GNIG distributions. These near-exact distributions are mostly adequate to be used as a basis for approximations of distributions of several statistics used in multivariate analysis. By factoring the characteristic function (c.f.) of the logarithm of the product of the Beta r.v.'s, and then replacing a suitably chosen factor of that c.f. by an adequate asymptotic result it is possible to obtain what we call a near-exact c.f., which gives rise to the near-exact approximation to the exact distribution. Depending on the asymptotic result used to replace the chosen parts of the c.f., one may obtain different near-exact approximations. Moments from the two near-exact approximations developed are compared with the exact ones. The two approximations are also compared with each other, namely in terms of moments and quantiles.
Notes:
Rui P Alberto, Carlos A Coelho (2007)  Study of the quality of several asymptotic and near-exact approximations based on moments for the distribution of the Wilks Lambda statistic   Journal of Statistical Planning and Inference 137: 5. 1612-1626  
Abstract: In this paper a measure of proximity of distributions, when moments are known, is proposed. Based on cases where the exact distribution is known, evidence is given that the proposed measure is accurate to evaluate the proximity of quantiles (exact vs. approximated). The measure may be applied to compare asymptotic and near-exact approximations to distributions, in situations where although being known the exact moments, the exact distribution is not known or the expression for its probability density function is not known or too complicated to handle. In this paper the measure is applied to compare newly proposed asymptotic and near-exact approximations to the distribution of the Wilks Lambda statistic when both groups of variables have an odd number of variables. This measure is also applied to the study of several cases of telescopic near-exact approximations to the exact distribution of the Wilks Lambda statistic based on mixtures of generalized near-integer gamma distributions.
Notes:
Carlos A Coelho, João T Mexia (2007)  On the distribution of the product and ratio of independent generalized Gamma-ratio random variables   Sankhya 69: 2. 221-255  
Abstract: Using a decomposition of the characteristic function of the logarithm of the product of independent generalized gamma-ratio random variables (r.v.'s), we obtain explicit expressions for both the probability density and cumulative distribution functions of the product of independent r.v.'s with generalized F or generalized gamma-ratio distributions in the form of particular mixtures of generalized Pareto and inverted Pareto distributions. The expressions obtained do not involve any unsolved integrals and are convenient for computer implementation. By considering power parameters which are not required to be positive, we were able to obtain, as particular cases, not only the distributions for the product of folded T and folded Cauchy r.v.'s but also for the ratio of two independent products of generalized gamma-ratio r.v.'s. Theoretical applications of the results as well as simulations are presented.
Notes:
Carlos A Coelho (2007)  The wrapped Gamma distribution and wrapped sums and linear combinations of independent Gamma and Laplace distributions   Journal of Statistical Theory and Practice 1: 1. 1-29  
Abstract: In this paper we first obtain an expression for the probability density function of the wrapped or circular Gamma distribution and then we show how it may be seen, both for integer and non-integer shape parameter, as a mixture of truncated Gamma distributions. Some other properties of the wrapped Gamma distribution are studied and it is shown how this distribution and mixtures of these distributions may be much useful tools in modelling directional data in biology and meteorology. Based on the results obtained, namely the ones concerning mixtures, and on some properties of the distributions of the sum of independent Gamma random variables, the wrapped versions of the distributions of such sums, for both integer and non-integer shape parameters are derived. Also the wrapped sum of independent generalized Laplace distributions is introduced as a particular case of a mixture of wrapped Gamma distributions. Among the particular cases of the distributions introduced there are symmetrical, slightly skewed and highly skewed wrapped distributions as well as the recently introduced wrapped Exponential and Laplace distributions.
Notes:
2006
Carlos A Coelho (2006)  The joint characterization of discrete and continuous 'waiting times' through their reciprocal relationships   Journal of Interdisciplinary Mathematics 9: 2. 297-318  
Abstract: Let us suppose there is an event C (occurrence of a given defect or disease) that is part of a group of events we are interested in, and whose probability of occurrence in that group is known. The distribution of the waiting times for the $r_1$-th event in that group, given that we expect $r$ events C, and the distribution of the number of events in the group that are not events C, given that we waited for a length $y$ waiting for the $r_1$-th event in the group (given that we expected $r$ events C) are derived based on very mild assumptions. Relations of the distributions obtained with known distributions, their expression as mixtures and a limiting case are also studied. Cases where $r_1<r$ and $r_1>r$ are studied in detail, since they correspond to two different situations of interest, the one in which the event C is one of the rarest ones in its group, is not easy to identify or its occurrence $r$ times kills or disables the observation unit, or the case in which it may be rather common in the group and easy to identify. Examples of application in epidemiology, industry, transportation and agriculture are used for illustration.
Notes:
Carlos A Coelho, Rui P Alberto, Luís M Grilo (2006)  A mixture of Generalized Integer Gamma distributions as the distribution of the product of an odd number of independent Beta random variables: applications   Journal of Interdisciplinary Mathematics 9: 2. 229-248  
Abstract: In this paper we show first how the distribution of the logarithm of a random variable with a Beta distribution may be expressed either as a mixture of Gamma distributions or as a mixture of Generalized Integer Gamma (GIG) distributions and then how the exact distribution of the product of an odd number of independent Beta random variables whose first parameter evolves by 1/2 and whose second parameter is the half of an odd integer may be expressed as a mixture of GIG distributions. Some particularities of these mixtures are analysed. The results are then used to obtain the exact distribution of the logarithm of the Wilks Lambda statistic to test the independence of two sets of variables, both with an odd number of variables, and the exact distribution of the logarithm of the generalized Wilks Lambda statistic to test the independence of several sets of variables, in the case where two or three of them have an odd number of variables. A discussion of relative advantages and disadvantages of the use of the exact versus near-exact distributions is carried out.
Notes:
Carlos A Coelho (2006)  The exact and near-exact distributions of the product of independent Beta random variables whose second parameter is rational   Journal of Combinatorics, Information and System Sciences 31: 21-44  
Abstract: In this paper the exact distribution of the logarithm of the product of a given number of independent Beta random variables whose second parameter is rational is obtained under the form of a Generalized Integer Gamma distribution, for some particular cases, and near-exact distributions are obtained either as Generalized Near-Integer Gamma distributions or mixtures of these distributions, for the more general cases. As particular cases of interest we have the exact and near-exact distributions of the generalized Wilks Lambda statistic.
Notes:
2005
Ana A Paulo, Eunice Ferreira, Carlos A Coelho, Luís S Pereira (2005)  Drought class transition analysis through Markov and Loglinear models, an approach to early warning   Agricultural Water Management 77: 59-81  
Abstract: The standardized precipitation index (SPI) based on 68 years of precipitation data was computed for several sites of Alentejo, a drought prone region of southern Portugal. Drought classes were derived from SPI values. Markov chain modelling was used in order to estimate: (a) the probability of different drought severity classes; (b) the expected time in each class of severity; (c) the recurrence time to a particular drought class; (d) the expected time for the SPI to change from a particular class to another. A short-term conditional prediction scheme of drought classes is tested. The non-homogeneous Markov chains formulation produced better predictive results since probabilities are tied to each month. However, the persistence of recent climate conditions tends to dominate, so limiting the prediction capability of Markov chains modelling. Several Loglinear models were fitted to the drought class transition matrices and the computed odds and the respective confidence intervals were used to predict drought class transitions. Generally, the odds show lower values as the drought severity increases for the initial month and decreases for the following months, thus showing that odds of transition to the non-drought class versus transition to any drought class decrease when the drought severity of the present class increases. If the present drought class is moderate or severe, the probability of being 1 month from now in a drought class is higher than the probability of being in the non-drought class. Results show the utility of using the above-mentioned stochastic models to support monitoring the evolution of droughts and to produce early warning in combination with other indicators.
Notes:
2004
Joaquim Silva, João T Mexia, Carlos A Coelho, Gabriel Lopes (2004)  A statistical approach for multilingual document clustering and topic extraction from clusters   Pliska Studia Mathematica Bulgarica 16: 207-228  
Abstract: This paper describes a statistics-based methodology for document unsupervised clustering and cluster topics extraction. For this purpose, multiword lexical units (MWUs) of any length are automatically extracted from corpora using the LiPXtractor - a language independent statistics-based tool. The MWUs are taken as base features to characterize documents. These features are transformed and a document similarity matrix is constructed. From this matrix, a reduced set of features is selected using an approach based on Principal Component Analysis. Then, using the Model Based Clustering Analysis software, it is possible to obtain the best number of clusters. Precision and Recall for document-cluster assignment range above 90 %. Most important MWUs are extracted from each cluster and taken as document cluster topics. Results on new document classification will just be mentioned.
Notes:
Carlos A Coelho (2004)  The generalized near-integer Gamma distribution: a basis for ‘near-exact’ approximations to the distribution of statistics which are the product of an odd number of independent Beta random variables   Journal of Multivariate Analysis 89: 2. 191-218  
Abstract: In this paper the concept of near-exact approximation to a distribution is introduced. Based on this concept it is shown how a random variable whose exponential has a Beta distribution may be closely approximated by a sum of independent Gamma random variables, giving rise to the generalized near-integer (GNI) Gamma distribution. A particular near-exact approximation to the distribution of the logarithm of the product of an odd number of independent Beta random variables is shown to be a GNI Gamma distribution. As an application, a near-exact approximation to the distribution of the generalized Wilks Λ statistic is obtained for cases where two or more sets of variables have an odd number of variables. This near-exact approximation gives the exact distribution when there is at most one set with an odd number of variables. In the other cases a near-exact approximation to the distribution of the logarithm of the Wilks Λ statistic is found to be either a particular generalized integer Gamma distribution or a particular GNI Gamma distribution.
Notes:

Book chapters

2006
Carlos A Coelho (2006)  The joint characterization of discrete and continuous 'waiting times' through their reciprocal relationships   In: Advances in Interdisciplinary Mathematics Edited by:Sat Gupta, B. K. Dass. 69-90 Taru Publications isbn:81-901493-4-2  
Abstract: Let us suppose there is an event C (occurrence of a given defect or disease) that is part of a group of events we are interested in, and whose probability of occurrence in that group is known. The distribution of the waiting times for the $r_1$-th event in that group, given that we expect $r$ events C, and the distribution of the number of events in the group that are not events C, given that we waited for a length $y$ waiting for the $r_1$-th event in the group (given that we expected $r$ events C) are derived based on very mild assumptions. Relations of the distributions obtained with known distributions, their expression as mixtures and a limiting case are also studied. Cases where $r_1r$ are studied in detail, since they correspond to two different situations of interest, the one in which r«the event C is one of the rarest ones in its group, is not easy to identify or its occurrence $r$ times kills or disables the observation unit, or the case in which it may be rather common in the group and easy to identify. Examples of application in epidemiology, industry, transportation and agriculture are used for illustration.
Notes:
Carlos A Coelho, Rui P Alberto, Luís M Grilo (2006)  A mixture of Generalized Integer Gamma distributions as the exact distribution of the product of an odd number of independent Beta random variables: applications   In: Advances in Interdisciplinary Mathematics Edited by:Sat Gupta, B. K. Dass. 1-20 Taru Publications isbn:81-901493-4-2  
Abstract: We show first how the distribution of the logarithm of a random variable with a beta distribution may be expressed either as a mixture of gamma distributions or as a mixture of generalized integer gamma (GIG) distributions and then how the exact distribution of the product of an odd number of independent beta random variables whose first parameter evolves by $1/2$ and whose second parameter is half of an odd integer may be expressed as a mixture of GIG distributions. Some particularities of these mixtures are analysed. The results are then used to obtain the exact distribution of the logarithm of the Wilks $\Lambda$ statistic to test the independence of two sets of variables, both with an odd number of variables, and the exact distribution of the logarithm of the generalized Wilks $\Lambda$ statistic to test the independence of several sets of variables, in the case where two or three of them have an odd number of variables. A discussion of the relative advantages and disadvantages of using the exact versus the near-exact distributions is carried out.
Notes:

Conference papers

2009