Both information visualization and statistics analyse high-dimensional data, but the two
disciplines provide different ways to explore datasets. Information visualization is
a branch of computer graphics that creates graphical representations of datasets, which
in general contain more than three dimensions, to provide insight into the behaviour
of the data. Because of the high dimensionality, the data items usually have no
inherent spatial reference, which makes visualizing the entire dataset a particular
challenge. In addition, interaction facilities allow the graphics to be adapted to the
needs of the user. This enables visual exploration of the data and the extraction of
its intrinsic information.
In contrast, statistics applies algorithms that provide numerical summaries of
the datasets under analysis. Based on the well-founded theory of data exploration,
the results of those methods allow statements to be made about the datasets and give
an indication of the validity of those statements.
As both disciplines pursue the same aims, it is a logical step to combine
methods of information visualization and statistics to achieve a more efficient exploration
of multivariate data, which is also called data mining. This work therefore
surveys the most important tools provided by both disciplines for analysing high-dimensional
data. Furthermore, existing applications that use techniques from both statistics
and information visualization are presented.
The main contribution of this work, however, is to provide statistical methods for visual
data mining applications. To this end, a library was compiled containing routines
that are of high importance for information visualization techniques and whose results
can be modified quickly, so that adaptations can be integrated into the visualization.
The library is able to work on datasets containing millions of data items and hundreds
of dimensions.
In addition, an example application is introduced that demonstrates one possible way of
interweaving statistical methods and information visualization techniques. Tasks
such as the detection of outliers, the grouping of data items and attributes, and the
reduction of dimensionality were incorporated.