
George Giannakopoulos


ggianna@iit.demokritos.gr

Journal articles

2010
2008
George Giannakopoulos, Vangelis Karkaletsis, George Vouros, Panagiotis Stamatopoulos (2008)  Summarization system evaluation revisited: N-gram graphs   ACM Trans. Speech Lang. Process. 5: 3. 1-39
Abstract: This article presents a novel automatic method (AutoSummENG) for the evaluation of summarization systems, based on comparing the character n-gram graphs representation of the extracted summaries and a number of model summaries. The presented approach is language neutral, due to its statistical nature, and appears to hold a level of evaluation performance that matches and even exceeds other contemporary evaluation methods. Within this study, we measure the effectiveness of different representation methods, namely, word and character n-gram graph and histogram, different n-gram neighborhood indication methods as well as different comparison methods between the supplied representations. A theory for the a priori determination of the methods’ parameters along with supporting experiments concludes the study to provide a complete alternative to existing methods concerning the automatic summary system evaluation process.

Conference papers

2012
Katsiarina Mirylenka, George Giannakopoulos, Themis Palpanas (2012)  SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise   In: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2012 Kuala Lumpur, Malaysia
Abstract: Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, the choice of a good algorithm for a particular learning problem is crucial. In this paper, we introduce the "Sigmoid Rule" Framework focusing on the description of classifier behavior in noisy settings. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the parameters of the above sigmoid function using five different classifiers, namely, Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier. Our study leads to the definition of intuitive criteria based on the sigmoid parameters that can be used to compare the behavior of learning algorithms in the presence of varying levels of noise. Furthermore, we show that there exists a connection between these parameters and the characteristics of the underlying dataset, hinting at how the inherent properties of a dataset affect learning. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and mining of noisy data series, as in sensor networks.
George Giannakopoulos, Vangelis Karkaletsis, George Vouros (2012)  Detecting Human Features in Summaries - Symbol Sequence Statistical Normality   In: SETN 2012 Lamia, Greece
Abstract: The presented work studies textual summaries, aiming to detect the qualities of human multi-document summaries, in contrast to automatically extracted ones. The measured features are based on a generic statistical regularity measure, named Symbol Sequence Statistical Regularity (SSSR). The measure is calculated over both character and word n-grams of various ranks, given a set of human and automatically extracted multi-document summaries from two different corpora. The results of the experiments indicate that the proposed measure provides enough distinctive power to discriminate between the human and non-human summaries. The results hint at the qualities a human summary holds, increasing intuition related to how a good summary should be generated.
2011
George Papadakis, George Giannakopoulos, Claudia Niederée, Themis Palpanas, Wolfgang Nejdl (2011)  Detecting and exploiting stability in evolving heterogeneous information spaces   In: Proceeding of the 11th annual international ACM/IEEE joint conference on Digital libraries 95-104  
Abstract: Individuals contribute content on the Web at an unprecedented rate, accumulating immense quantities of (semi-) structured data. Wisdom of the Crowds theory advocates that such information (or parts of it) is constantly overwritten, updated, or even deleted by other users, with the goal of rendering it more accurate, or up-to-date. This is particularly true for the collaboratively edited, semi-structured data of entity repositories, whose entity profiles are consistently kept fresh. Therefore, the core information that remains stable with the passage of time, despite being reviewed by numerous users, is particularly useful for the description of an entity. Based on the above hypothesis, we introduce a classification scheme that predicts, on the basis of statistical and content patterns, whether an attribute (i.e., name-value pair) is going to be modified in the future. We apply our scheme on a large, real-world, versioned dataset and verify its effectiveness. Our thorough experimental study also suggests that reducing entity profiles to their stable parts conveys significant benefits to two common tasks in computer science: information retrieval and information integration.
2010
George Giannakopoulos, Vangelis Karkaletsis (2010)  Summarization System Evaluation Variations Based on N-Gram Graphs   In: Text Analysis Conference 2010 NIST Gaithersburg, MD, USA
Abstract: Within this article, we present the application of the AutoSummENG method within the TAC 2010 AESOP challenge. We further present two novel evaluation methods based on n-gram graphs. The first method is called Merged Model Graph (MeMoG) and it uses the n-gram graph framework to represent a set of documents with a single, "centroid" graph, offering state-of-the-art performance. The second method is called Hierarchical Proximity Graph (HPG) evaluation and it uses a hierarchy of graphs to represent texts, aiming to represent different granularity levels under a unified view. The experiments indicate that both novel methods offer very promising performance in different aspects of evaluation, improving on AutoSummENG scores.
George Giannakopoulos, Themis Palpanas (2010)  The Effect of History on Modeling Systems' Performance: The Problem of the Demanding Lord   In: ICDM 2010 809-814 IEEE  
Abstract: In several concept attainment systems, ranging from recommendation systems to information filtering, a sliding window of learning instances has been used in the learning process to allow the learner to follow concepts that change over time. However, no analytic study has been performed on the relation between the size of the sliding window and the performance of a learning system. In this work, we present such an analytic model that describes the effect of the sliding window size on the prediction performance of a learning system based on iterative feedback. Using a signal-to-noise approach to model the learning ability of the underlying machine learning algorithms, we can provide good estimates of the average performance of a modeling system independently of the supervised machine learning algorithm employed. We experimentally validate the effectiveness of the proposed methodology with detailed experiments using synthetic and real datasets, and a variety of learning algorithms, including Support Vector Machines, Naive Bayes, Nearest Neighbor and Decision Trees. The results validate the analysis and indicate very good estimation performance in different settings.
2009
2008

PhD theses

2009

Technical reports

2010
George Giannakopoulos, George A Vouros, Vangelis Karkaletsis (2010)  MUDOS-NG: Multi-document Summaries Using N-gram Graphs (Tech Report)   arXiv.org
Abstract: This report describes the MUDOS-NG summarization system, which applies a set of language-independent and generic methods for generating extractive summaries. The proposed methods are mostly combinations of simple operators on a generic character n-gram graph representation of texts. This work defines the set of used operators upon n-gram graphs and proposes using these operators within the multi-document summarization process in such subtasks as document analysis, salient sentence selection, query expansion and redundancy control. Furthermore, a novel chunking methodology is used, together with a novel way to assign concepts to sentences for query expansion. The experimental results of the summarization system, performed upon widely used corpora from the Document Understanding and the Text Analysis Conferences, are promising and provide evidence for the potential of the generic methods introduced. This work aims to designate core methods exploiting the n-gram graph representation, providing the basis for more advanced summarization systems.