
Cagatay Catal


cagataycatal@hotmail.com

Books

2012

Journal articles

2012
2011
Cagatay Catal (2011)  Software Fault Prediction: A Literature Review and Current Trends   Expert Systems with Applications 38: 4626-4636 April  
Abstract: The software engineering discipline contains several prediction approaches such as test effort prediction, correction cost prediction, fault prediction, reusability prediction, security prediction, effort prediction, and quality prediction. However, most of these prediction approaches are still in a preliminary phase and more research should be conducted to reach robust models. Software fault prediction is the most popular research area among these prediction approaches, and several research centers have recently started new projects in this area. In this study, we investigated 90 software fault prediction papers published between 1990 and 2009 and categorized them according to publication year. This paper surveys the software engineering literature on software fault prediction, and both machine learning-based and statistical approaches are included in this survey. The papers explained in this article reflect the outline of what has been published so far, but naturally this is not a complete review of all published papers. This paper will help researchers investigate the previous studies from the perspectives of metrics, methods, datasets, performance evaluation metrics, and experimental results in an easy and effective manner. Furthermore, current trends are introduced and discussed.
Cagatay Catal, Oral Alan, Kerime Balkan (2011)  Class Noise Detection based on Software Metrics and ROC Curves   Information Sciences  
Abstract: Noise detection for software measurement datasets is a topic of growing interest. The presence of class and attribute noise in software measurement datasets degrades the performance of machine learning-based classifiers, and the identification of these noisy modules improves the overall performance. In this study, we propose a noise detection algorithm based on software metrics threshold values. The threshold values are obtained from the Receiver Operating Characteristic (ROC) analysis. This paper focuses on case studies of five public NASA datasets and details the construction of Naive Bayes-based software fault prediction models both before and after applying the proposed noise detection algorithm. Experimental results show that this noise detection approach is very effective for detecting the class noise and that the performance of fault predictors using a Naive Bayes algorithm with a logNum filter improves if the class labels of identified noisy modules are corrected.
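The abstract above says that metric thresholds are obtained from ROC analysis but does not state which criterion is used to pick the cut-off. As a minimal sketch, assuming the threshold is chosen to maximise Youden's J statistic (true positive rate minus false positive rate), which is one common choice and not necessarily the authors' method, the derivation for a single metric could look like this. The function name and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def roc_threshold(values: Sequence[float], faulty: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, J) for the cut-off that maximises Youden's J.

    A module counts as flagged when its metric value is >= threshold.
    Assumption: Youden's J is the selection criterion; the abstract
    does not specify one.
    """
    positives = sum(1 for f in faulty if f)
    negatives = len(faulty) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both faulty and non-faulty modules")

    best_threshold, best_j = None, -1.0
    # Every distinct metric value is a candidate cut-off on the ROC curve.
    for candidate in sorted(set(values)):
        tp = sum(1 for v, f in zip(values, faulty) if f and v >= candidate)
        fp = sum(1 for v, f in zip(values, faulty) if not f and v >= candidate)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Hypothetical metric values (e.g. a complexity metric) and fault labels.
metric_values = [2, 3, 5, 8, 12, 15]
fault_labels = [False, False, False, True, False, True]
threshold, youden_j = roc_threshold(metric_values, fault_labels)
```

Modules whose metric values disagree with their class label relative to such thresholds would then be candidates for class noise; the exact decision rule is described in the paper, not in this abstract.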
Cagatay Catal, Ugur Sevim, Banu Diri (2011)  Practical development of an Eclipse-based software fault prediction tool using Naive Bayes algorithm   Expert Systems with Applications 38: 3. 2347-2353 March  
Abstract: Despite the amount of effort software engineers have been putting into developing fault prediction models, software fault prediction still poses great challenges. Research using machine learning and statistical techniques has been ongoing for 15 years, and yet we still have not had a breakthrough. Unfortunately, none of these prediction models have achieved widespread applicability in the software industry due to a lack of software tools to automate the prediction process. Historical project data, including software faults, together with a robust software fault prediction tool can enable quality managers to focus on fault-prone modules and thus improve the testing process. We developed an Eclipse-based software fault prediction tool for Java programs to simplify the fault prediction process. We also integrated a machine learning algorithm called Naive Bayes into the plug-in because of its proven high performance for this problem. This article presents a practical view of the software fault prediction problem and shows how we combined software metrics with software fault data to apply the Naive Bayes technique inside an open source platform.
2009

Book chapters

2010
Cagatay Catal, Ugur Sevim, Banu Diri (2010)  Metrics-Driven Software Quality Prediction Without Prior Fault Data   In: Electronic Engineering and Computing Technology 189-199 Springer Netherlands  
Abstract: Software quality assessment models are quantitative analytical models that are more reliable compared to qualitative models based on personal judgment. These assessment models are classified into two groups: generalized and product-specific models. Measurement-driven predictive models, a subgroup of product-specific models, assume that there is a predictive relationship between software measurements and quality. In recent years, greater attention in quality assessment models has been devoted to measurement-driven predictive models and the field of software fault prediction modeling has become established within the product-specific model category. Most of the software fault prediction studies focused on developing fault predictors by using previous fault data. However, there are cases when previous fault data are not available. In this study, we propose a novel software fault prediction approach that can be used in the absence of fault data. This fully automated technique does not require an expert during the prediction process and it does not require identifying the number of clusters before the clustering phase, as required by the K-means clustering method. Software metrics thresholds are used to remove the need for an expert. Our technique first applies the X-means clustering method to cluster modules and identifies the best cluster number. After this step, the mean vector of each cluster is checked against the metrics thresholds vector. A cluster is predicted as fault-prone if at least one metric of the mean vector is higher than the threshold value of that metric. Three datasets, collected from a Turkish white-goods manufacturer developing embedded controller software, have been used during experimental studies. Experiments revealed that unsupervised software fault prediction can be automated fully and effective results can be achieved by using the X-means clustering method and software metrics thresholds.
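The labelling rule described in the abstract above (a cluster is fault-prone if at least one component of its mean vector exceeds that metric's threshold) can be sketched as follows. This is a minimal illustration, assuming the cluster assignments have already been produced by some clustering step such as X-means; the function names, metric values, and thresholds are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence


def cluster_mean_vectors(
    modules: Sequence[Sequence[float]], assignments: Sequence[int]
) -> Dict[int, List[float]]:
    """Compute the mean metric vector of each cluster."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for metrics, cluster in zip(modules, assignments):
        if cluster not in sums:
            sums[cluster] = [0.0] * len(metrics)
            counts[cluster] = 0
        for i, value in enumerate(metrics):
            sums[cluster][i] += value
        counts[cluster] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}


def fault_prone_clusters(
    means: Dict[int, List[float]], thresholds: Sequence[float]
) -> Dict[int, bool]:
    """Label a cluster fault-prone if any mean metric exceeds its threshold."""
    return {
        cluster: any(m > t for m, t in zip(mean, thresholds))
        for cluster, mean in means.items()
    }


# Hypothetical data: two metrics per module (e.g. lines of code, complexity).
modules = [[10, 2], [14, 3], [120, 9], [90, 14]]
assignments = [0, 0, 1, 1]   # assumed output of the clustering step
thresholds = [65, 10]        # assumed threshold vector, one value per metric

means = cluster_mean_vectors(modules, assignments)
labels = fault_prone_clusters(means, thresholds)
```

Every module inherits the label of its cluster, so no labelled fault data or expert input is needed once the thresholds are fixed.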

Conference papers

2011
2010
2009
Cagatay Catal (2009)  Codifying Domain-Specific Experience into Software Development Tools: An Eclipse-based Embedded Platform Development Experience   In: The IEEE Region 8 Conference EUROCON 2009, St. Petersburg, Russia, 18-23 May. 392-398  
Abstract: Organizations generally lose their domain experience when key developers leave an organization that lacks a powerful and effective infrastructure to collect, package, validate, and spread experience. In a recent project aimed at building a general purpose embedded application development platform, we developed an Eclipse-based IDE to accelerate our embedded development process, codify our Linux embedded software development knowledge on one extensible platform, and standardize tools, scripts, and libraries within our organization. This paper presents the approach that we used to collect domain-specific experience, the component-based layered architecture of the Eclipse-based platform, and our experiences with Eclipse.
Cagatay Catal, Ugur Sevim, Banu Diri (2009)  Software Fault Prediction of Unlabeled Program Modules   In: International Conference of Computer Science and Engineering, World Congress on Engineering (WCE 2009), London, UK, International Association of Engineers (IAENG)  
Abstract: Software metrics and fault data belonging to a previous software version are used to build the software fault prediction model for the next release of the software. Until now, different classification algorithms have been used to build this kind of model. However, there are cases when previous fault data are not present, and hence supervised learning approaches cannot be applied. In this study, we propose a fully automated technique which does not require an expert during the prediction process. In addition, it is not required to identify the number of clusters before the clustering phase, as required by the K-means clustering method. Software metrics thresholds are used to remove the need for an expert. Our technique first applies the X-means clustering method to cluster modules and identifies the best cluster number. After this step, the mean vector of each cluster is checked against the metrics thresholds vector. A cluster is predicted as fault-prone if at least one metric of the mean vector is higher than the threshold value of that metric. In addition to the X-means clustering-based method, we experimented with a pure metrics thresholds method, fuzzy clustering, and K-means clustering-based methods. Experiments reveal that unsupervised software fault prediction can be fully automated and effective results can be produced using X-means clustering with software metrics thresholds. Three datasets, collected from a Turkish white-goods manufacturer developing embedded controller software, have been used for the validation.
Oral Alan, Cagatay Catal (2009)  An Outlier Detection Algorithm Based on Object-Oriented Metrics Thresholds   In: 24th International Symposium on Computer and Information Sciences, ISCIS 2009, Guzelyurt, Northern Cyprus, September 14-16, 2009, Middle East Technical University  
Abstract: Detection of outliers in software measurement datasets is a critical issue that affects the performance of software fault prediction models built on these datasets. Two necessary components of fault prediction models, software metrics and fault data, are collected from software projects developed with the object-oriented programming paradigm. We proposed an outlier detection algorithm based on thresholds for these kinds of metrics. We used the Random Forests machine learning classifier on two software measurement datasets collected from the jEdit open-source text editor project, and experiments revealed that our outlier detection approach improves the performance of fault predictors based on the Random Forests classifier.
2008
Cagatay Catal (2008)  Predictable Software Quality: Complexity and Security Concerns   In: Quality for Financial Applications (QAFA) and Test Management Summit (TMS) European Test Center Krakow, Poland:  
Abstract: Since today’s software is more complex than ever, software quality should be managed with an engineering approach called Software Quality Engineering (SQE). Even though there are several quality assurance techniques inside SQE, software testing is still the most dominant quality assurance activity in the software sector. Unmanned aerial vehicles (UAV), inter-continental ballistic missiles (ICBM), and combat robots that deal with roadside bombs are some examples of complex systems that must be reliable, secure, available, and safe. The Software Engineering Institute (SEI) published a report in 2006 and proposed a research agenda for the U.S. Department of Defense about ultra-large-scale systems, which will likely have billions of lines of code within 30-50 years. “Adaptable and Predictable System Quality” is a research area proposed by SEI for future ultra-large-scale systems to maintain quality under attacks and failures. Complexity is one of the most important internal quality factors located under the software quality iceberg, and avoiding unnecessary complexity during software development makes systems more secure. Current quality prediction models apply several complexity metrics to predict fault-prone modules, and these models can be adapted to identify vulnerability-prone or attack-prone components by using several security metrics together with complexity metrics. The software engineering discipline contains several prediction approaches such as test effort prediction, correction cost prediction, defect prediction, reusability prediction, and quality prediction. We proposed a software life cycle called the prediction-centric software life cycle, which includes some of these prediction approaches, and we believe this life cycle will improve quality and make software quality predictable. 
We suggest that early prediction of software faults with quality prediction models, early identification of vulnerability-prone components with security prediction models, and a prediction-centric life cycle are strong elements for systems of the future.
Cagatay Catal, Banu Diri (2008)  A Conceptual Framework to Integrate Fault Prediction Sub-process for Software Product Lines   In: 2nd IEEE International Symposium on Theoretical Aspects of Software Engineering Nanjing, China: IEEE Computer Society  
Abstract: Software product line engineering is a recent and growing paradigm for developing similar products using reusable core assets such as architecture and test cases. The general aim is to enhance quality and decrease development costs. Current software product line engineering frameworks apply only a few quality assurance activities, but today’s single-system engineering has many more quality assurance activities that can be adapted to software product lines. In this study, a software fault prediction sub-process is integrated into the software product line engineering framework and the key activities are defined. This approach will improve quality and enhance the testing process for software product lines.
2007
Cagatay Catal, Banu Diri (2007)  An Artificial Immune System Approach for Fault Prediction in Object-Oriented Software,   In: 2nd International Conference on Dependability of Computer Systems (DepCos-Relcomex 2007), Edited by:Wojciech Zamojski, Jacek Mazurkiewicz. 238-245, Szklarska Poreba, Poland: IEEE Computer Society.  
Abstract: The features of real-time dependable systems are availability, reliability, safety and security. In the near future, real-time systems will be able to adapt themselves according to specific requirements, and a real-time dependability assessment technique will be able to classify modules as faulty or fault-free. Software fault prediction models help us develop dependable software and they are commonly applied prior to system testing. In this study, we examine Chidamber-Kemerer (CK) metrics and some method-level metrics for our model, which is based on the Artificial Immune Recognition System (AIRS) algorithm. The dataset is a part of the NASA Metrics Data Program and class-level metrics are from the PROMISE repository. Instead of validating individual metrics, our aim is to improve the prediction performance of our model. The experiments indicate that the combination of CK and lines of code metrics provides the best prediction results for our fault prediction model. The results of this study suggest that class-level data should be used rather than method-level data to construct relatively better fault prediction models. Furthermore, this model can constitute a part of a real-time dependability assessment technique in the future.
Cagatay Catal, Banu Diri (2007)  Software Fault Prediction with Object-Oriented Metrics Based Artificial Immune Recognition System,   In: Product-Focused Software Process Improvement Conference 2007 Springer-Verlag, Lecture Notes in Computer Science 4589, pp. 300-314.  
Abstract: Software testing is a time-consuming and expensive process. Software fault prediction models are used to identify fault-prone classes automatically before system testing. These models can reduce the testing duration, project risks, and resource and infrastructure costs. In this study, we propose a novel fault prediction model to improve the testing process. Chidamber-Kemerer object-oriented metrics and method-level metrics such as Halstead and McCabe are used as independent metrics in our Artificial Immune Recognition System based model. According to this study, a model based on class-level metrics that applies the AIRS algorithm can be used successfully for fault prediction, and its performance is higher than that of a J48-based approach. A fault prediction tool which uses this model can be easily integrated into the testing process.
Cagatay Catal, Banu Diri (2007)  Software Defect Prediction using Artificial Immune Recognition System   In: 25th IASTED International Multi-Conference on Software Engineering 285-290 IASTED Innsbruck, Austria: ACTA Press  
Abstract: Predicting fault-prone modules for software development projects enables companies to build highly reliable systems and minimizes the budget, personnel and resources needed to achieve this goal. Researchers have investigated various statistical techniques and machine learning algorithms until now, but most of them applied their models to different datasets that are not public or used different criteria to decide the best predictor model. Artificial Immune Recognition System is a supervised learning algorithm which was proposed in 2001 for classification problems, and its performance on UCI datasets (University of California machine learning repository) is remarkable. In this paper, we propose a novel software defect prediction model by applying Artificial Immune Recognition System (AIRS) along with the Correlation-Based Feature Selection (CFS) technique. In order to evaluate the performance of the proposed model, we apply it to five public NASA defect datasets and compute G-mean 1, G-mean 2 and F-measure values to discuss the effectiveness of the model. Experimental results show that AIRS has great potential for software defect prediction and that AIRS along with the CFS technique provides relatively better prediction for large-scale projects which consist of many modules.
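The abstract above names G-mean 1, G-mean 2 and F-measure without defining them. As a small sketch, assuming the definitions commonly used in the defect prediction literature (G-mean 1 as the geometric mean of precision and recall, G-mean 2 as the geometric mean of recall and specificity, and F-measure as the harmonic mean of precision and recall), they can be computed from a confusion matrix as follows. The function name and the counts are hypothetical.

```python
import math
from typing import Dict


def evaluation_measures(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute G-mean 1, G-mean 2 and F-measure from confusion-matrix counts.

    Assumed definitions (not stated in the abstract):
      G-mean 1  = sqrt(precision * recall)
      G-mean 2  = sqrt(recall * specificity)
      F-measure = 2 * precision * recall / (precision + recall)
    Ratios with a zero denominator are treated as 0.0.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "g_mean1": math.sqrt(precision * recall),
        "g_mean2": math.sqrt(recall * specificity),
        "f_measure": f_measure,
    }


# Hypothetical counts for a defect predictor on an imbalanced dataset.
scores = evaluation_measures(tp=30, fp=10, tn=50, fn=10)
```

These measures are often preferred over plain accuracy on defect datasets because faulty modules are typically a small minority of all modules.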
2006
2005
2004
2003

Paper in Special Issue (Turkish)

2007

Poster

2008