| About | Mondrian is a general-purpose statistical data-visualization system. It features outstanding visualization techniques for data of almost any kind and, compared to other tools, is particularly strong when working with categorical data, geographical data and large data. All plots in Mondrian are fully linked and offer various interactions and queries. Any case selected in one plot is highlighted in all other plots. Currently implemented plots comprise Mosaic Plots, Scatterplots and SPLOM, Maps, Barcharts, Histograms, Missing Value Plots, Parallel Coordinates/Boxplots and Boxplots y by x. Mondrian works with data in standard tab-delimited ASCII files. There is basic support for working directly on data in databases (please contact me for further info). For questions, please write to mondrian@theusRus.de. Bugs may be submitted to the bug-tracker. News:
|
|||||||||||||
|
|
||||||||||||||
| Plots | ||||||||||||||
| Mosaic Plot | Mosaic plots in Mondrian are fully interactive. Interactivity includes the standard operations described in the Conventions section (except zooming), plus reordering of the variables in the plot with the four arrow keys. Use <meta>-r to rotate the direction along which the last variable in the plot is split. Use <meta>-+ and <meta>-- to add and delete interactions during the modeling process using the ModelNavigator. During the modeling process one may want to use the plot option that shows the expected values of the model instead of the observed values. The picture below shows an example of the Titanic dataset, which includes information on the class (1, 2, 3, crew), age (child, adult) and gender (male, female) of the passengers. Surviving passengers are highlighted. |
|||||||||||||
![]() |
||||||||||||||
|
Although there are no labels to decode the cells, the order of the variables is given in the title of the window. With the interactive interrogation it is very easy to query the cells. In this static representation, knowing that there are no children and hardly any women in the crew should be sufficient to decode the plot. |
||||||||||||||
| |
![]() Example of a rotated Mosaic plot, i.e. first variable is split along y not x! |
|||||||||||||
|
In addition to the standard mosaic plot, Mondrian features four variations. The figures below show the same data from the cars data set in all five possible variations. Use the pop-up menu for the plot options:
|
||||||||||||||
| Barcharts | Barcharts in Mondrian follow a vertical layout, not a horizontal layout. Thus the level-names can be printed in full length. Besides standard selection and interrogation techniques, interactivity in a barchart comprises:
|
|||||||||||||
A barchart for the dataset on the Titanic passengers. First class passengers are highlighted. |
||||||||||||||
| When a barchart has very many categories, it can be painful to search for particular items, especially if the barchart is sorted in something other than lexicographical order. As is familiar from lists in many applications, you can type a prefix of the item you are looking for, and Mondrian will automatically scroll to that item. | ||||||||||||||
| Missing Value Plot |
If the dataset has missing values, a missing value plot can be used to analyze the structure of the missing values (monotone missingness etc.). ![]() The options of the missing value plot are similar to those of a barchart (sorting etc.). Missing values MUST BE CODED AS "NA"! |
|||||||||||||
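The structure of missing values that such a plot reveals can also be checked outside Mondrian before loading a file. The following is only a minimal sketch, not Mondrian's own code: it assumes a tab-delimited text with a header row and "NA" as the missing-value code, and the function names and the sample records are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_missing(tab_text, missing_code="NA"):
    """Return {variable name: number of missing values} for tab-delimited text."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    header = next(reader)
    counts = {name: 0 for name in header}
    for row in reader:
        for name, value in zip(header, row):
            if value == missing_code:
                counts[name] += 1
    return counts


def missing_patterns(tab_text, missing_code="NA"):
    """Count how often each combination of missing variables occurs per record."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    header = next(reader)
    patterns = Counter()
    for row in reader:
        missing = tuple(n for n, v in zip(header, row) if v == missing_code)
        patterns[missing] += 1
    return patterns


# Illustrative, made-up records (not one of the Mondrian sample data sets).
sample = "Car\tMPG\tWeight\nA\t16.9\tNA\nB\tNA\tNA\nC\t30\t2.155\n"

print(count_missing(sample))
print(missing_patterns(sample))
```

The per-record patterns are what distinguish, for example, monotone missingness from unrelated gaps; the per-variable counts correspond to the bars of the plot.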
| Maps | Whenever a dataset provides information on polygons, Mondrian can draw interactive maps of this geographical reference. A corresponding data record must be provided for each polygon defined in the dataset. Different polygons might point to the same data record, but multiple records to a set of polygons are ignored. Maps offer the standard selection and interrogation techniques. Additionally the standard zooming function of Mondrian is enabled. All maps have a popup-menu at the top to create a choropleth map of any of the variables, including alphanumerical variables.
|
|||||||||||||
Six choropleth maps of the five Midwest states, shaded according to educational status |
||||||||||||||
The saturation (more precisely, the alpha) of boundaries can be changed with the right arrow key (more saturation) and the left arrow key (less saturation). This can change the perception of the map drastically, so make sure to try it out! US County Map with full saturation: ![]() US County Map with reduced boundaries: ![]() Note the extreme difference between the maps! (We call this technique map-martinizing…) |
||||||||||||||
| Parallel Coordinates |
Mondrian implements parallel coordinate plots for an arbitrary number of variables. Alphanumerical categorical variables are displayed equally spaced according to the currently defined order. Numerical variables are scaled according to their actual values. Besides standard selection and interrogation techniques, interactivity in a parallel coordinate plot comprises:
|
|||||||||||||
A parallel coordinate plot for the olive oil data. |
||||||||||||||
A parallel Box plot for the olive oil data. |
||||||||||||||
The "Sort Axes by" menu offers various
ways to sort axes automatically: ![]() |
||||||||||||||
|
||||||||||||||
| Boxplots | (Parallel) Boxplots y by x include only a single variable, split by a second categorical variable. To invoke a boxplot y by x, select the continuous variable to plot and the categorical variable to split by, and select 'boxplots y by x' in the 'Plots' menu. Manual reordering of the classes is only possible by reordering the levels in a corresponding barchart. The context menu offers to automatically sort the levels by either median or interquartile range (and to reverse the current ordering). |
|||||||||||||
Parallel boxplots for the cars data set - heavy cars selected. |
||||||||||||||
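Sorting the levels by median or interquartile range, as offered in the context menu, can be sketched as follows. This is an illustration under assumptions rather than Mondrian's implementation: the quartiles come from Python's `statistics.quantiles` with its default method, which may differ from the quartile definition Mondrian uses, and the group names and values are invented.

```python
from statistics import median, quantiles


def iqr(values):
    """Interquartile range using the default quartile method of the statistics module."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def sort_levels(groups, key="median", reverse=False):
    """Return the level names ordered by the median or IQR of their values."""
    stat = median if key == "median" else iqr
    return sorted(groups, key=lambda level: stat(groups[level]), reverse=reverse)


# Hypothetical continuous variable split by a categorical variable with levels A, B, C.
groups = {
    "A": [1, 2, 3, 4],
    "B": [10, 10, 10, 11],
    "C": [0, 5, 5, 20],
}

print(sort_levels(groups, key="median"))
print(sort_levels(groups, key="iqr"))
print(sort_levels(groups, key="median", reverse=True))
```

Sorting by median orders the boxes by location, while sorting by IQR orders them by spread, so the two options answer different questions about the same split.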
| Scatterplots | Scatterplots offer the basic interactions: data can be selected and highlighted. In contrast to most other plots in Mondrian, scatterplots offer axes, showing the minimum and maximum for orientation. Interrogation methods inside scatterplots operate on three levels. The first level is a simple overview of the position of the cursor, which is displayed as a projection onto the x- and y-axis. This interrogation is invoked by pressing the alternate key. Pressing control invokes the second level of interrogation: a tooltip shows the data of the variables in the plot for the point closest to the cursor. If more than one point is found at the same distance, a list of the cases is presented in the pop-up. Holding shift and control shows the information for the variables selected in the variables list. |
|||||||||||||
Three levels of
queries in a scatterplot |
||||||||||||||
|
||||||||||||||
When Rserve and R are installed, scatterplots can be enhanced with scatterplot smoothers. Currently the list of smoothers comprises:
There is an option to either compare the highlighted smoother to all data or to the complement. |
||||||||||||||
| SPLOM | SPLOMs (ScatterPLOt Matrices) in Mondrian are "only" a collection of standard scatterplots, efficiently arranged in a single frame. This has the disadvantage that all keyboard shortcuts apply to all panels simultaneously, but the advantage that each panel is a full-featured scatterplot. Hint: SPLOMs are quite effective for a quick 2-d overview, but are very inefficient when working with more than just a few variables. In this case, parallel coordinate plots are far more effective. |
|||||||||||||
![]() |
||||||||||||||
| Histograms | The most crucial point in plotting histograms is choosing the ''right'' origin of the first bin and the ''right'' number of bins. Since there is a vast number of rules and hints about what ''right'' means under different assumptions, the most important interactive manipulation inside histograms is changing the origin and the number (or width) of the bins. This is done with the arrow keys (up and down change the number of bins; left and right move the origin). In order to keep the visual distortion as small as possible, the scale of the histogram axis is not updated during the interactive reparametrization. Obviously the y-scale must then represent probabilities and not counts. <meta>-0 fits the y-scale after the bin-width has been changed. The context menu allows setting fixed bin widths and origins, either by using the suggested values or by entering arbitrary values. When Rserve and R are installed, density estimation will enhance the histograms greatly. |
|||||||||||||
Histogram of the weekly working hours of almost 64,000 household heads
|
||||||||||||||
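How the origin and the bin width determine a histogram can be sketched in a few lines. This is a generic illustration, not Mondrian's code: the function names are hypothetical, bins are taken as half-open intervals, and the density scaling (count divided by n times width) is the usual convention, assumed here as one way to get a y-scale that does not depend on counts.

```python
import math


def histogram(values, origin, width):
    """Count values per bin; bin k covers [origin + k*width, origin + (k+1)*width)."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    counts = {}
    for v in values:
        k = math.floor((v - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    return counts


def densities(counts, width):
    """Scale counts so that the bar areas (density * width) sum to 1."""
    n = sum(counts.values())
    return {k: c / (n * width) for k, c in counts.items()}


values = [1, 2, 2, 3, 7, 8]  # made-up data

# Same bin width, two different origins: the picture changes noticeably.
print(histogram(values, origin=0, width=2))
print(histogram(values, origin=1, width=2))
print(densities(histogram(values, origin=1, width=2), width=2))
```

Shifting the origin by one unit moves three of the six values into the first bin, which is the kind of effect the interactive origin and bin-width controls make visible.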
![]() A linked spinogram showing the household heads' income (the right plot actually shows a CD-plot)
|
||||||||||||||
Histograms can now be weighted. Select two continuous variables (the weights should usually be positive, although Mondrian will not complain about negative weights) and choose weighted Histogram from the plot menu. ![]() The example above shows a typical situation for weighting in a histogram. The left plot shows the distribution of %blacks for the US Midwest counties. The right plot is weighted by the total population, thus showing the number of people living in areas with a certain % of blacks. |
||||||||||||||
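The difference between a plain and a weighted histogram can be sketched as follows: instead of adding 1 per case, each case adds its weight to its bin. This is a minimal illustration with a hypothetical function name and invented numbers, not the Midwest data or Mondrian's implementation.

```python
import math
from collections import defaultdict


def weighted_histogram(values, weights, origin, width):
    """Sum the weights of all cases falling into each bin of the given width."""
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        k = math.floor((v - origin) / width)
        totals[k] += w
    return dict(totals)


# Made-up example: a percentage per region and the region's population.
percent = [5, 12, 18, 45]
population = [1000, 200, 300, 50000]

unweighted = weighted_histogram(percent, [1] * len(percent), origin=0, width=10)
weighted = weighted_histogram(percent, population, origin=0, width=10)

print(unweighted)  # number of regions per bin
print(weighted)    # number of people per bin
```

With unit weights the result is an ordinary histogram of regions; with population weights a single large region can dominate a bin that looks almost empty in the unweighted view.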
|
|
||||||||||||||
| Selections | Selections in Mondrian can be made in two ways.
Simple selections are performed like any selection on the operating system's desktop. A new selection replaces the current selection. Holding down the <shift> key will combine the new selection with the currently selected data in XOR-mode. Holding down <shift> and <alt> will perform a selection in extended mode, which is AND by default, but can be changed to OR in the Options menu. When using Selection Sequences, every selection is recorded. The selection is represented by a transparent rectangle with 8 handles. Use any of these handles to resize the rectangle (slice) or click-drag the rectangle to move it (brush). The popup context menu on a selection rectangle indicates the selection step and offers the choice of changing to a different selection mode (union, intersection, negation, xor), or of deleting this step or the complete sequence. Deleting a single step can also be performed with <backspace>. Use <meta-backspace> to delete the complete sequence. To query objects covered by a selection rectangle, hold down the <shift> key to click through the rectangle. Selection Sequences can span across plots, and more than one selection can be made per plot. To keep track of the selections made, all selections are annotated in the Window menu, just behind the window title, e.g. "Scatterplot(x,y) [2] [4]" tells us that selection steps 2 and 4 have been made in the scatterplot of the variables x and y. Use <meta>-a to select all cases. |
|||||||||||||
![]() |
||||||||||||||
| A map with two sample selections of a
Selection Sequence. The first selection (south east states) is always
performed in "replace" mode. The second selection (north west states)
is queried with the context menu and the mode is switched from XOR to
OR. NOTE: Deleting all selections is not limited to the current plot window. |
||||||||||||||
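The way a selection sequence combines its steps can be sketched with plain sets of case numbers. This is an illustration under assumptions, not Mondrian's implementation: the function names are hypothetical, only the replace, union (OR), intersection (AND) and XOR modes are modeled, and negation is left out because its exact semantics are not described here.

```python
def combine(current, new, mode):
    """Combine the currently selected cases with a new selection step."""
    if mode == "replace":
        return set(new)
    if mode == "or":       # union
        return current | new
    if mode == "and":      # intersection
        return current & new
    if mode == "xor":      # cases in exactly one of the two sets
        return current ^ new
    raise ValueError(f"unknown selection mode: {mode}")


def evaluate_sequence(steps):
    """Apply the steps in order; the first step always acts in replace mode."""
    selected = set()
    for index, (mode, cases) in enumerate(steps):
        if index == 0:
            mode = "replace"
        selected = combine(selected, set(cases), mode)
    return selected


# Two made-up steps that overlap in case 3.
steps_xor = [("replace", {1, 2, 3}), ("xor", {3, 4})]
steps_or = [("replace", {1, 2, 3}), ("or", {3, 4})]

print(evaluate_sequence(steps_xor))  # case 3 drops out under XOR
print(evaluate_sequence(steps_or))   # case 3 stays selected under OR
```

Because every step is stored together with its mode, changing the mode of one step or moving its rectangle only requires re-evaluating the sequence from the start.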
|
|
||||||||||||||
| Color Brushing |
Whereas selections are a more transient technique to mark a subgroup of interest, color brushing persistently assigns colors to cases. There are three ways to define persistent colors in Mondrian:
![]()
![]()
Although color brushing looks very useful at first sight, one should keep two critical issues in mind:
|
|||||||||||||
| Conventions | The key to a smooth and efficient user interface is conventions. Once the user has learned the basic set of operations, such as selection, interrogation, zooming and alteration, she/he can perform these operations within any plot. In an interactive graphical system, interactions are performed by mouse and keyboard. Since Java programs are not bound to a specific platform, Mondrian tries to use only features that can be found on all platforms. There are some restrictions, like the one-button mouse for most Mac users (Steve give us the right button!!). The most commonly found modifier keys are SHIFT, CTRL, ALTERNATE and META. CTRL is blocked as the popup-trigger on the Macintosh, META is abused under Windows and ALT is blocked by many window-managers under Linux. The interactions in Mondrian are assigned as follows:
|
|||||||||||||
|
|
||||||||||||||
| α-Channel | The α-channel can be used to specify the transparency of a painted object. This is very useful when plotting a very large number of objects, which would otherwise result in heavy overplotting. With transparency, the density of objects can be easily displayed. The figures below show an application using the well-known "pollen" dataset. |
|||||||||||||
![]() |
||||||||||||||
| The darker string in the parallel coordinate plot above is actually the word "EUREKA", which was put into the artificial dataset. Zooming the scatterplot below would show us the 6 letters of the word. | ||||||||||||||
![]() |
||||||||||||||
| There is a new option to invert the density scheme. (Does not work on Windows) |
||||||||||||||
![]() |
||||||||||||||
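Why transparency turns overplotting into a density display can be made concrete with a small calculation. The sketch below assumes the standard "source-over" compositing rule for identically colored objects; whether Mondrian composites in exactly this way is an assumption, and the function name is hypothetical.

```python
def stacked_opacity(alpha, n):
    """Resulting opacity when n objects of opacity alpha are painted on top of each other."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return 1.0 - (1.0 - alpha) ** n


# With a low alpha, a single point is faint and dense regions become gradually darker.
for n in (1, 5, 10, 50):
    print(n, round(stacked_opacity(0.1, n), 4))
```

With full opacity (alpha = 1) one object already saturates the pixel, so one point and a thousand points look the same; with a small alpha the darkness keeps growing with the number of overlapping objects.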
|
|
||||||||||||||
| Modeling | Although Mondrian was not designed to support statistical modeling of datasets, a graphical modeling technique for categorical data using Mosaic Plots is built in. The so-called ModelNavigator allows stepwise graphical modeling of loglinear models. The ModelNavigator basically inverts the usage of graphs and models. Whereas packages like R or S-Plus usually assume a model, for which diagnostic plots can be plotted, the approach in Mondrian starts with a graph to set up a model, and uses the statistical measures as diagnostics to reinforce the graphical implications. For a more precise description of this technique see the paper on Visualization of Loglinear Models. |
|||||||||||||
| The figure below shows the ModelNavigator used to model the Detergent data, often used to illustrate loglinear models for 3 and more variables. | ||||||||||||||
![]() |
||||||||||||||
| Data | ||||||||||||||
| ASCII Data | Mondrian supports the standard ASCII data format, which consists of a header of variable names and tab-delimited columns. Numerical and alphanumerical data may be used. See the example below: |
|||||||||||||
|
|
Country	Car	MPG	Weight	Horsepower
U.S.	Buick Estate Wagon	16.9	4.36	155
U.S.	Ford Country Squire Wgn	15.5	4.054	142
U.S.	Chevy Malibu Wagon	19.2	3.605	125
U.S.	Chrysler LeBaron Wagon	18.5	3.94	150
U.S.	Chevette	30	2.155	68
Japan	Toyota Corona	27.5	2.56	95
Japan	Datsun 510	27.2	2.3	97
U.S.	Dodge Omni	30.9	2.23	75
Germany	Audi 5000	20.3	2.83	103
Sweden	Volvo 240 GL	17	3.14	125
Sweden	Saab 99 GLE	21.6	2.795	115
France	Peugeot 694 SL	16.2	3.41	133
... |
|||||||||||||
| Mondrian detects the format of a column automatically (continuous or categorical). The detection can be overridden by putting a '/C' for continuous or a '/D' for discrete as a prefix in front of the variable name. The mode of non-numerical variables can be set interactively:
To get a fast and effective overview of which variables have missing values, there are 'white' versions of the icons of all three types, indicating at least one missing value in the particular variable.
Polygon data must be stored in a separate map file. The format for map data: The dataset must include one variable of references that the polygons can refer to. This variable must start with /P. If a dataset refers to a polygon, there must be an empty line after the data matrix, followed by the relative path and filename of the file containing the map data. In the map file, each polygon must be defined as follows: An example for Union county: 1761 /Pnew jersey,union 25 1762 /Pnew jersey,warren 33 |
||||||||||||||
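A file in the tab-delimited format described above can be produced with a few lines of code. The sketch below is an illustration under assumptions: the function name `to_mondrian_text` is hypothetical, the records and the "Cylinders" column are made up, the '/C' and '/D' prefixes are assumed to be attached directly to the variable name, and missing values are written as "NA" as required for the missing value plot.

```python
import csv
import io


def to_mondrian_text(header, rows, missing_code="NA"):
    """Render a header and records as tab-delimited text, writing None as the missing code."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter="\t",
        lineterminator="\n",
        quoting=csv.QUOTE_NONE,  # plain fields; fields must not contain tabs themselves
    )
    writer.writerow(header)
    for row in rows:
        writer.writerow([missing_code if value is None else value for value in row])
    return buffer.getvalue()


# "/CMPG" forces MPG to be continuous, "/DCylinders" forces Cylinders to be discrete.
header = ["Country", "Car", "/CMPG", "/DCylinders"]
rows = [
    ("U.S.", "Example Wagon", 16.9, 8),    # made-up record
    ("Japan", "Example Coupe", None, 4),   # MPG is missing and becomes "NA"
]

text = to_mondrian_text(header, rows)
print(text)
```

Writing the result with `open("cars.txt", "w", newline="")` (a hypothetical file name) keeps the tabs and line endings exactly as generated.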
| Database Connections |
The research version of Mondrian allows the connection to databases via the JDBC interface. Currently this type of connection, which leaves the data entirely inside the database, is under further development and thus not released with the latest releases. The figure below shows the database connection dialog: |
|||||||||||||
![]() |
||||||||||||||
| Data Sets | Here are some sample data sets, which are ready to load and test with Mondrian (make sure to save the link directly to preserve the tabs!):
- Titanic: data set on the 2201 passengers of the Titanic. Purely categorical, with data on class, gender, age and survival.
- Pollen
- Olive Oils
- Berlin (old map format)
- US Election 2004 (new map format) |
|||||||||||||
| Preferences | Mondrian now features a Preferences box to set your favorite background and highlight colors. Three schemes are preset. If you have some other intriguing color scheme, please let me know so that I can integrate it. |
|||||||||||||
![]() |
||||||||||||||
| Downloads | By downloading any version of Mondrian, you accept the following license:

Copyright (c) 1997-1998 AT&T Labs Research, 2002-2006 University of Augsburg.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, see <http://www.gnu.org/licenses/>.

Read-only svn-access to the source code: svn://svn.rforge.net/org/trunk/rosuda/Mondrian/

Binaries for Windows, MacOSX and Linux:

1.0 beta11 as of 03/19/2008. Windows (exe-file) UNIX (JAR-file) Mac OS X (Disk-Image containing application and demo data)
Changes:
- Images can be used in extended queries for URL variables
- New color scheme in maps
- Search in barcharts by typing a prefix of the level
- Fixes and clean-ups

1.0 beta10 as of 12/16/2007. Windows (exe-file) UNIX (JAR-file) Mac OS X (Disk-Image containing application and demo data)
Changes:
- More consistent menu entries and menu labels for plot windows
- An 'Open Recent ...' menu entry
- Indication of missingness in the variable window icons
- Window sizes can now be set in the scale dialog box
- Censored zooming in barcharts (shift up/down-arrow) consistent with mosaic plots

1.0 beta7 as of 05/13/2007. Windows (exe-file) UNIX (JAR-file) Mac OS X (Disk-Image containing application and demo data)
Changes:
- Rserve start-up compatible with Rserve for R2.5.x
- SPLOMs are available now (for those who like'm ...)
- histograms are more consistent now (weighted histograms support densities (needs Rserve), spinograms now work at any zoom level)
- better scaling and queries in parallel boxplots (still incomplete)
- several fixes and enhancements ...

Changes in Version 1.0 beta3 as of 10/31/2006.
- simple transformations (+, *, -, /, log, 1/x, ...)
- selection order of variables in variable window is reflected in all multivariate plots!
- many minor fixes and enhancements ...

Changes in Version 1.0 beta1 as of 05/24/2006.
- new much faster loader (note: maps are now expected to be in a separate file)
- missing values (coded as "NA") are supported in all graphics
- missing value plot can be used to investigate the structure of the missing values.
- custom scaling (<meta>-j), scatterplot only, other plots to follow
- color brushing (<meta>-b) in barcharts, mosaic plots and histograms (rainbow)
- <meta>-1...9 sets persistent colors for the current selection
- derived variables from selection- and color-state
- painting, via "OR"-mode in the first selection step of a selection sequence
- many minor fixes and enhancements ...

Changes in Version RC 1.0m as of 11/29/2005.
- Using 1.4.x JVM on all platforms.
- '<-' and '->' can be used to change the saturation of boundaries in maps.
- "Boxplot y by x" is now a separate menu item.
- Levels can now be sorted in boxplots y by x according to median or IQ-range.
- Plotting of 2-dim MDS (input is not carefully checked yet)

Changes in Version RC 1.0f as of 04/06/2005.
- Queries are now implemented via ToolTips.
- Further improvements to Parallel Coordinate Plots. See section for details!
- Maps now feature six different color schemes for shading choropleth maps.
- Under MacOSX you can now drop files on Mondrian to start the application and load the data.
- If you have R and Simon's Rserve installed on your machine, you find new features in
  + Histograms
  + Scatterplots

Changes in Version RC 1.0 as of 09/24/2004:
- Vast improvements to Parallel Coordinate Plots. See section for details!
- Printing works via <meta>-P in all plots. In MacOS X use "Preview" to save as PDF.
- Additional sorting options in Barcharts.
- Histogram parameters can now be set manually as well.
- Choropleth maps can now be inverted and colored by rank.
- Yet another update to the L&F of selection sequences.

Changes in Version 0.99a as of 03/11/2004:
- an updated version of selection sequences. See the section for details.
- "window" menu and more intelligent window placement
- new controls to set width and origins in histograms
- zooming for all platforms (use middle mouse button on all other machines than mac)

Changes in Version 0.99 as of 11/18/2003
Changes in Version 0.98 as of 03/22/2003:
Changes in Version 0.97a as of 11/21/2002:
Changes in Version 0.97 as of 7/12/2002:
First public release 0.96 as of 4/9/2002 |
|||||||||||||
The best reference to cite Mondrian, apart from the website, is the JSS paper. Here is the BibTeX entry:
@article{Theus:2002:JSSOBK:v07i11,
  author = "Martin Theus",
  title = "Interactive Data Visualization using Mondrian",
  journal = "Journal of Statistical Software",
  volume = "7",
  number = "11",
  pages = "1--9",
  day = "22",
  month = "11",
  year = "2002",
  CODEN = "JSSOBK",
  ISSN = "1548-7660",
  bibdate = "2002-11-22",
  URL = "http://www.jstatsoft.org/v07/i11",
  accepted = "2002-11-22",
  acknowledgement = "",
  keywords = "",
  submitted = "2002-07-11",
}
Once 'the book' is out, it will be the canonical reference to cite. |
||||||||||||||
Starting Rserve
The small warning message that appears after starting Mondrian only indicates that there is no connection to R. This will NOT harm Mondrian's core functionality; one can happily live without the R connection. Rserve is now a regular R package and can be installed as such. Right now, Mondrian will start Rserve automatically only under MacOSX. For the time being, Windows and Linux users need to start Rserve manually. Detailed instructions on how to do this can be found here.
File Formats
If you have trouble getting data loaded into Mondrian, load the file into MS Excel first, and check whether
Starting a JAR File under Windows (if one can't use "Mondrian.exe" for some reason)
After an installation of Sun's latest JRE (or JDK), .jar files can be started by a simple double click. If this does not work, the following two problems might be the cause
Other Problems
Please mail to either mondrian@theusRus.de or the mailing list stats-rosuda-devel, or submit your issue at the Bugzilla-based bug-tracker. Issues of general interest will be posted on this page. |
||||||||||||||
| Martin Theus, 4/24/2008 | ||||||||||||||