Abstract: Over the last decades, the disciplines of Data Mining and Operations Research have worked mostly independently of each other. However, the increasing complexity of today’s applications in areas such as business, medicine, and science requires more and more interaction between the two disciplines. On the one hand, several data mining algorithms are based on optimization methods. On the other hand, in several applications the pure Knowledge Discovery in Databases (KDD) process is not sufficient, since it does not explicitly take the entire decision process into account. This report presents future trends in Business Analytics and Optimization raised at the panel sessions of the workshop on Business Analytics and Optimization (BAO’2010), which covered the future challenges of data mining regarding privacy, cyber-crime, dynamic models, open source/closed source software, model construction, model use and usability, and legal regulations, among others.
Abstract: This paper addresses the problem of probability estimation in multiclass classification tasks by combining two well-known data mining techniques: Support Vector Machines and Neural Networks. We present an algorithm which uses both techniques in a two-step procedure. The first step employs Support Vector Machines within a one-vs-all reduction from multiclass to binary classification to obtain the distances between each observation and the Support Vectors representing the classes. The second step uses these distances as inputs for a Neural Network, built with an entropy cost function and a softmax transfer function for the output layer, where class membership is used for training. Consequently, this network estimates probabilities of class membership for new observations. A benchmark using different databases demonstrates that the proposed algorithm is highly competitive with the most recent techniques for multiclass probability estimation.
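The role of the softmax output layer and the entropy cost in the second step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the one-vs-all distances are already available (the values in `svm_scores` are hypothetical) and it omits the trained network weights, so the softmax is applied to the distances directly.

```python
import math

def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Entropy cost for one observation whose true class index is known."""
    return -math.log(probs[true_class])

# Hypothetical signed distances of one observation to three one-vs-all SVM boundaries.
svm_scores = [1.2, -0.4, -2.0]
probs = softmax(svm_scores)                 # class-membership probabilities
loss = cross_entropy(probs, true_class=0)   # cost that training would minimize
```

In a full version of the procedure, a weight layer would sit between the distances and the softmax, and the weights would be fitted by minimizing this cost over the training observations.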
Abstract: Constrained clustering addresses the problem of creating minimum-variance clusters with the added complexity that a set of constraints must be fulfilled by the elements in each cluster. Research in this area has focused on "must-link" and "cannot-link" constraints, in which pairs of elements must be in the same or in different clusters, respectively. In this work we present a heuristic procedure to perform clustering into two classes when the restrictions affect all the elements of the two clusters in such a way that they depend on the elements present in the cluster. This problem is highly susceptible to outliers in each cluster (extreme values that create infeasible solutions), so the procedure eliminates elements with extreme values in both clusters while achieving adequate performance measures. Experiments performed on a company database reveal a great deal of information, with results that are more readily interpretable than those of classical k-means clustering.
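The pairwise constraints mentioned above can be made concrete with a small sketch that checks a cluster assignment against must-link and cannot-link pairs. It illustrates only the classical pairwise setting, not the cluster-level restrictions or the heuristic proposed in this work; the function name `violations` and the sample data are hypothetical.

```python
def violations(labels, must_link, cannot_link):
    """Return the constraint pairs that a cluster assignment breaks."""
    broken = []
    for a, b in must_link:
        if labels[a] != labels[b]:      # pair was split across clusters
            broken.append(("must-link", a, b))
    for a, b in cannot_link:
        if labels[a] == labels[b]:      # pair ended up in the same cluster
            broken.append(("cannot-link", a, b))
    return broken

# Hypothetical assignment of four elements to two clusters.
labels = [0, 0, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
broken = violations(labels, must_link, cannot_link)
```

An assignment is feasible in the pairwise sense exactly when the returned list is empty.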
Abstract: Data Mining is a widely used discipline with methods that are heavily supported by statistical theory. Game theory, in contrast, develops models with solid economic foundations but has so far seen little application in companies. This work attempts to unify both approaches, presenting a model of price competition in the credit industry. Based on game theory and supported by the robustness of Support Vector Machines to structurally estimate the model, it takes advantage of each approach to provide strong results and useful information. The model consists of a market-level game that determines the marginal cost, demand, and efficiency of the competitors. Demand is estimated using Support Vector Machines, allowing the inclusion of multiple variables and strengthening standard economic estimation through the aggregation of client-level models. The model is being applied by one competitor, where it has created new business opportunities, such as the strategic chance to aggressively cut prices given the acquired market knowledge.
Abstract: Modeling market dynamics is a challenge that has attracted the interest of practitioners and researchers alike. This problem has been addressed from the perspective of Game Theory, in models that explicitly include profit-maximization schemes for the companies, and also from the point of view of data mining, with models that use multivariate functions to model customer demand and related phenomena. In this paper we present a two-stage model that unifies both approaches: a hybrid neural network - support vector machines model estimates multiclass demand at the client level, and its output then serves as input for a game-theoretic model that considers the strategic relationships between costs and demands in price-setting schemes for Bertrand equilibria. The model was applied to a real-life database in a loan-granting institution with good results. New knowledge discovered includes insights about cost structures and the competitive behavior of the institutions, creating new business opportunities.
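As an illustration of what a Bertrand price equilibrium looks like, the sketch below iterates best responses for a textbook differentiated duopoly. Everything in it is an assumption made for illustration and is not taken from the paper: the linear demand q_i = a - b*p_i + d*p_j, the parameter values, and the function names. In the paper the demand would instead come from the client-level model of the first stage.

```python
def best_response(p_other, cost, a=100.0, b=2.0, d=1.0):
    """Price maximizing (p - cost) * (a - b*p + d*p_other) for a fixed rival price."""
    return (a + b * cost + d * p_other) / (2 * b)

def bertrand_equilibrium(c1, c2, tol=1e-10, max_iter=1000):
    """Iterate simultaneous best responses until prices stop changing."""
    p1, p2 = c1, c2  # start from marginal-cost pricing
    for _ in range(max_iter):
        n1 = best_response(p2, c1)
        n2 = best_response(p1, c2)
        if abs(n1 - p1) < tol and abs(n2 - p2) < tol:
            return n1, n2
        p1, p2 = n1, n2
    return p1, p2

# Two firms with the same hypothetical marginal cost of 10.
p1, p2 = bertrand_equilibrium(10.0, 10.0)
```

For this symmetric case the fixed point can be checked by hand: p* = (a + b*c) / (2b - d) = (100 + 20) / 3 = 40. The iteration converges because each best response moves by only d / (2b) = 0.25 of any change in the rival's price.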