Integrating Statistical Basefunctionality in Interactive Visual Data Analysis

Abstract

Information visualization and statistics both analyse high-dimensional data, but the two disciplines offer different ways to explore datasets. Information visualization is a branch of computer graphics that creates graphical representations of datasets, which generally contain more than three dimensions, to provide insight into the behaviour of the data. Because of the high dimensionality, the data items usually have no inherent spatial reference, which makes visualizing the entire dataset a particular challenge. In addition, interaction facilities let users adapt the graphics to their needs. Together these allow visual exploration of the data and extraction of the information it contains.

In contrast, statistics applies algorithms that produce numerical summaries of the datasets. Grounded in the well-established theory of data exploration, the results of these methods allow statements to be made about the datasets and give an indication of the validity of those statements.

As both disciplines pursue the same aims, it is a logical step to combine methods of information visualization and statistics to achieve a more efficient exploration of multivariate data, which is also called data mining. This work therefore surveys the most important tools that both disciplines provide for analysing high-dimensional data. Furthermore, existing applications that use techniques from both statistics and information visualization are presented.

The main contribution of this work, however, is to provide statistical methods for visual data mining applications. To this end, a library was compiled containing routines that are of high importance for information visualization techniques and whose results can be modified quickly, so that adaptations can be integrated into the visualization. The library can handle datasets containing millions of data items and hundreds of dimensions.

In addition, an example application is introduced that demonstrates one possible way of interweaving statistical methods and information visualization techniques. It incorporates tasks such as the detection of outliers, the grouping of data items and attributes, and the reduction of dimensionality.
