micromap: A Package for Linked Micromaps

The R package micromap is used to create linked micromaps, which display statistical summaries associated with areal units, or polygons. Linked micromaps provide a means to simultaneously summarize and display both statistical and geographic distributions by linking statistical summaries to a series of small maps. The package contains functions dependent on the ggplot2 package to produce a row-oriented graph composed of different panels, or columns, of information. These panels at a minimum typically contain maps, a legend, and statistical summaries, with the color-coded legend linking the maps and statistical summaries. We first describe the layout of linked micromaps and then the structure required for both the spatial and statistical datasets. The function create_map_table in the micromap package converts the input of an sp SpatialPolygonsDataFrame into a data frame that can be linked with the statistical dataset. Highly detailed polygons are not appropriate for display in linked micromaps, so we describe how polygon boundaries can be simplified, decreasing the time required to draw the graphs, while retaining adequate detail for detection of spatial patterns. Our worked examples of linked micromaps use public health data as well as environmental data collected from spatially balanced probabilistic surveys.


Introduction
A series of recent articles discusses the merits, limits, and opportunities for collaboration in the fields of statistical graphics and information visualization, or InfoVis (Few 2011; Gelman and Unwin 2013a,b; Kosara 2013; Murrell 2013; Wickham 2013). InfoVis is the part of computer science that explores, analyzes, and presents large amounts of data (Kosara 2013; Wickham 2013). A type of graphic that draws on strengths from both fields is the linked micromap. In "Visualizing Data Patterns with Micromaps", Carr and Pickle (2010) first present principles of data visualization design, such as the use of small multiples (Tufte 1983), and then apply them to georeferenced statistical summaries. Such summaries occur when environmental data are collected and aggregated over areal units, or polygons. Examples include crop statistics per state, public health data such as the number of cancer cases per county, or monitoring data collected among reporting regions from a spatially balanced probabilistic survey (Olsen, Kincaid, and Payton 2012). For these data, displaying both the geographic and statistical distributions is of interest, and linked micromap plots, often just called micromaps, are a way to visualize such geographically referenced data (Carr and Pierson 1996; Carr, Olsen, Courbois, Pierson, and Carr 1998; Carr, Wallin, and Carr 2000; Symanzik and Carr 2008; Carr and Pickle 2010).
Micromaps are row-oriented graphs with different panels, or columns, of information, which typically contain at a minimum maps, a color-coded legend, and statistical summaries (Figure 1), and are a more informative alternative to choropleth maps (Carr and Pierson 1996; Symanzik and Carr 2008). Unlike a choropleth map, in which color is associated with the magnitude of a variable, the color-coded legend in a micromap provides the link between the maps and statistical summaries. Two features of a micromap are the sorting and grouping in the display of the data. The sorting is done on the values of a statistical variable, and the data are grouped so that only a few, typically four to six, areal units and their statistical summaries are presented across a row as a perceptual group (Carr and Pickle 2010). This use of perceptual groups allows the viewer to discern cumulative geographical patterns. For example, in the poverty and education micromap (Figure 1) the accumulation of gray polygons in the map panel reveals that, except for Virginia, all of the southeastern states have poverty levels greater than the median poverty level. Another significant advantage of this type of visualization over choropleth maps is that multiple variables can be displayed simultaneously.
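The sorting and grouping just described can be sketched in a few lines of base R. This is a minimal illustration and not code from the micromap package: the data frame `stat`, its columns `unit` and `pov`, and all values are made up for the example.

```r
# Made-up values for ten hypothetical areal units.
stat <- data.frame(
  unit = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  pov  = c(12.1, 9.4, 17.8, 14.0, 10.2, 21.3, 8.7, 15.5, 11.9, 13.2),
  stringsAsFactors = FALSE
)

# Sort on the statistical variable, then cut the ordered rows into
# perceptual groups of five units each.
group_size  <- 5
stat        <- stat[order(stat$pov), ]
stat$rank   <- seq_len(nrow(stat))
stat$pgroup <- ceiling(stat$rank / group_size)

# Position within the group: the index of the color that would be shared
# by a unit's label, its statistical summary, and its polygon.
stat$color_idx <- (stat$rank - 1) %% group_size + 1

print(stat)
```

Because the color index restarts in every perceptual group, the same small palette can be reused from row to row.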
The means by which users can produce micromaps have varied. Some United States agencies have web sites or downloadable software to make micromaps, but analysts cannot access the software itself (http://statecancerprofiles.cancer.gov/micromaps/, http://gis.cancer.gov/tools/micromaps/, http://nass.usda.gov/Charts_and_Maps/). Analysts can use geographic information system (GIS) software to make the map component of micromaps and statistical software to produce the statistical summaries, and then bring those two components together to produce a micromap. Symanzik and Carr (2008) noted that most major statistical packages could produce micromaps, the requirements being: availability of polygons, or boundary files, ease of production, quality of appearance, and control of output format. In R (R Core Team 2014), the grid package (Murrell 2005) has been used to produce micromaps of large financial datasets (Blunt 2006). A new package, micromapST, allows users to create linked micromaps specifically for US states and the District of Columbia (Carr, Pearson, and Pickle 2013).
Figure 1: A micromap of poverty and education level in the United States illustrating the components of a micromap, which include panels, or columns, a color legend, labels for the polygons, statistical summaries of the polygons, and maps that share a color code across the panels giving a spatial context to the summary statistics. The data are sorted by poverty level and displayed by perceptual groups of five states per group. The data are from the 2006-2010 American Community Survey and were obtained from: http://statecancerprofiles.cancer.gov/micromaps/.
Here we describe the micromap package (Payton and Olsen 2014), available on CRAN, which enables R users to create linked micromaps for any areal spatial dataset. Similar to other packages such as GGally (Schloerke, Crowley, Cook, Hofmann, Wickham, Briatte, and Marbach 2013), the micromap package was designed to take advantage of the graphical functionality of the ggplot2 package. The ggplot2 package allows for a predictable alignment of panels and perceptual groups, leaving the user free to concern themselves purely with the finer aesthetic details of the plot. A challenge to working with statistical and spatial data noted by Wickham (2009) is the need to match the identifiers in the statistical data to the corresponding identifiers in the spatial data. The general idea behind the micromap package is to make this as intuitive as possible while allowing the user to make aesthetic changes to the entire plot (perceptual group number and size, color coding, header sizes, etc.) rather than panel by panel. This should create a seamless feel among the various panels being produced and assemble them effortlessly into a single display.
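Before plotting, it can help to check that the identifiers in the two data frames actually match. The following base R sketch shows one way to do that. The helper `check_link` is hypothetical and is not part of the micromap package; the column names and values are made up, and the two-element vector only mimics the idea of naming one linking column in each data frame.

```r
# Hypothetical helper: report identifiers that fail to match between the
# statistical data and the map table.
# link = c(<column in stat.data>, <column in map.data>)
check_link <- function(stat.data, map.data, link) {
  stat_ids <- unique(as.character(stat.data[[link[1]]]))
  map_ids  <- unique(as.character(map.data[[link[2]]]))
  list(
    stat_only = setdiff(stat_ids, map_ids),  # no polygon to draw
    map_only  = setdiff(map_ids, stat_ids)   # polygon without statistics
  )
}

# Made-up data: one region is spelled differently in the two sources.
stat.data <- data.frame(StateAb = c("AL", "AK", "DC"),
                        value   = c(1, 2, 3),
                        stringsAsFactors = FALSE)
map.data  <- data.frame(ID = c("AL", "AL", "AK", "D.C."),
                        x  = c(0, 1, 5, 9),
                        y  = c(0, 1, 5, 9),
                        stringsAsFactors = FALSE)

res <- check_link(stat.data, map.data, c("StateAb", "ID"))
print(res)
```

Empty vectors in both list elements would indicate that every statistical record has a polygon and vice versa.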
Our objective with the micromap package is to simplify the making of linked micromaps so this type of visualization can readily be used by analysts to represent georeferenced statistics (Carr and Pickle 2010). We meet that objective by describing four steps to create micromaps. First, we identify what geoprocessing needs to be done to the spatial data prior to making a micromap. Second, we describe how in the micromap package the spatial polygon data frame is restructured so the spatial data can be linked to the statistical data. Third, we use the mmplot function to call the two data frames, spatial and statistical, to create the different panels and rows of perceptual groups for a draft micromap. Producing this draft micromap allows the user to check that the data frames are structured correctly and to decide if coding for a more refined, publication quality micromap is warranted. Finally, we provide examples of three types of micromaps, and also describe how users can create their own types. Section 2 of this paper describes the preparation of the spatial data. Section 3 covers the main functions in the micromap package. Section 4 gives examples of the three micromap plot types, including the new type called a group categorized micromap. Finally, Section 5 describes challenges and future developments to making micromaps.

Map simplification
The polygons displayed in the maps of micromaps only need to be detailed enough to convey shape and relative position of the areas and provide a spatial framework by identifying neighboring polygons (Carr and Pickle 2010). In fact, overly detailed polygons will create difficulty in display and interpretation of linked micromap plots. The early applications of micromaps linked statistical estimates to administrative polygons such as countries and states (Carr and Pierson 1996; Carr, Olsen, Courbois, Pierson, and Carr 1998). An example of simplified administrative polygons suitable for use in micromaps is the United States State Visibility Base Map developed by Monmonier (1993), available in the maps package (Becker, Wilks, Brownrigg, and Minka 2013). These generalized polygons, or map caricatures, are faster to draw for micromaps (Carr and Pierson 1996; Carr, Wallin, and Carr 2000).
For users working with other areal units such as watershed polygons or ecoregions, those data are often overly detailed and will render the map difficult to read (Gebreab, Gillies, Munger, and Symanzik 2008). Such data will also slow the mmplot function in drawing the map panel. Several approaches can be used to generalize, or smooth, the polygons so they contain fewer vertices. One approach relies on working with GIS software, such as ArcMap (ESRI 2013), that has simplification functions. The resulting simplified shapefile can be read into R using the readOGR function from the rgdal package (Bivand, Keitt, and Rowlingson 2013), which is loaded with the micromap package, or one of the functions in the other R spatial libraries that read in shapefile data. Another approach is to use some of the topology-preserving simplification tools available in JavaScript such as mapshaper (Bloch 2014). Also see descriptions at http://www.jasondavies.com/maps/simplify/ and http://bost.ocks.org/mike/simplify/. Finally, one may simplify polygons within R using polygon simplification functions such as the thinnedSpatialPoly function in maptools (Bivand and Lewin-Koh 2013), dp in the shapefiles package (Stabler 2013), generalize.polys in GISTools (Brunsdon and Chen 2012), or the gSimplify function in rgeos (Bivand and Rundel 2013). The simplification functions in both ArcGIS and R use the Douglas-Peucker algorithm (Douglas and Peucker 1973) for point simplification and require a weeding tolerance for simplification of polygons. The point simplification algorithm is fast and simplifies lines by keeping critical points to depict shapes and removing other points. Additionally, in ArcGIS a bend simplify algorithm is available (Wang and Muller 1998) that applies shape recognition techniques to simplify lines and tends to produce more cartographically pleasing results than the point remove algorithm.
Simplifying polygons without preserving topology can result in visual artifacts in the maps such as overlapping polygons, slivers and gaps. Due to the diversity of simplification functions in ArcGIS, we have found that to be the most efficient way to simplify polygons.
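To make the role of the weeding tolerance concrete, the following is a minimal base R sketch of the Douglas-Peucker idea applied to a single line. It is a generic textbook version written for illustration, not the implementation used by any of the packages named above; the function names `perp_dist` and `douglas_peucker` are our own, and the sketch makes no attempt to preserve topology between neighboring polygons.

```r
# Perpendicular distance from each row of pts to the line through a and b.
# If a and b coincide (a closed ring), fall back to the distance from a.
perp_dist <- function(pts, a, b) {
  d   <- b - a
  len <- sqrt(sum(d^2))
  if (len == 0) return(sqrt(rowSums(sweep(pts, 2, a)^2)))
  abs(d[1] * (pts[, 2] - a[2]) - d[2] * (pts[, 1] - a[1])) / len
}

# Recursively keep the vertex farthest from the current chord whenever it
# lies farther away than the tolerance; otherwise keep only the end points.
douglas_peucker <- function(xy, tol) {
  n <- nrow(xy)
  if (n < 3) return(xy)
  inner <- xy[2:(n - 1), , drop = FALSE]
  d <- perp_dist(inner, xy[1, ], xy[n, ])
  k <- which.max(d)
  if (d[k] <= tol) return(xy[c(1, n), , drop = FALSE])
  split <- k + 1
  left  <- douglas_peucker(xy[1:split, , drop = FALSE], tol)
  right <- douglas_peucker(xy[split:n, , drop = FALSE], tol)
  rbind(left[-nrow(left), , drop = FALSE], right)  # drop duplicated vertex
}

# A nearly straight line with small wiggles.
xy <- cbind(x = c(0, 1, 2, 3, 4), y = c(0, 0.1, -0.1, 0.1, 0))
coarse <- douglas_peucker(xy, tol = 0.5)   # wiggles are weeded out
fine   <- douglas_peucker(xy, tol = 0.01)  # wiggles are retained
```

A larger tolerance removes more vertices, which is why the choice of tolerance trades drawing speed against recognizable shape.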

Simplification approaches
Both the GIS software and R approaches are illustrated in the micromap package user guide.
In the GIS software approach, and when using the thinnedSpatialPoly function in R, a user can specify a minimum area of polygons to retain in simplifying, and care is needed to avoid mistakenly deleting any polygons that should appear on the micromap.
Another geoprocessing step that may be necessary is to manipulate the layout and size of the polygons that are to be brought into the micromap package. For the map caricature, small polygons may need to be enlarged while retaining sufficient features so they are recognizable (Carr and Pierson 1996). For example, in Figure 1 notice how Alaska and Hawaii are offset to be below the southwest border of the United States, and that Washington, D.C. is enlarged and set just off the east coast. These steps may require some degree of editing within a GIS application or R, and functions for offsetting and enlargement of polygons may be included in future versions of the micromap package.

Functions in the micromap package
The user accessible functions contained in the micromap package can be classified into two categories: two main plot creation functions and a small number of utility functions for more meticulous control over that process. The plot creation functions, mmplot and mmgroupedplot, allow the user to create the most commonly created linked micromaps with all aesthetic preferences being specified using a single list of arguments provided to a single function call. The utility functions within the micromap package allow for ad hoc adjustments and fine tuning.
The micromap package is aimed at facilitating the consolidation of statistical data, in the form of a data frame, with GIS spatial objects. To simplify this process, we have included the create_map_table function which transforms a spatial object into a flat data frame object.
The create_map_table function is similar to the ggplot2 package's fortify function. The create_map_table function creates a data frame object which is specifically designed for use with the mmplot and mmgroupedplot functions. This entails organizing collections of polygons which include "holes", "islands", and what we refer to as "plugs", that is, islands that overlay the "hole" from another collection of polygons. Plotting each of these polygons can require a significant amount of memory; therefore we have attempted to eliminate the redundancy inherent in plotting multiple polygons where only one will be visible. These features of create_map_table offer significant advantages in comparison to the fortify function. The output from create_map_table is used with the mmplot and mmgroupedplot functions to draw the polygons for the map panel.
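The idea of a flat map table can be illustrated with a small base R sketch. This is not the create_map_table implementation and does not reproduce its output columns: the input structure (a named list of rings with a `hole` flag), the helper `flatten_polys`, and the column names `ID`, `poly`, `hole`, `x`, and `y` are all assumptions made for the example.

```r
# Two hypothetical regions. R1 is a square with a triangular hole, and R2
# is a triangle that sits in that hole (a "plug" in the sense used above).
tri <- rbind(c(1, 1), c(2, 1), c(2, 2), c(1, 1))
regions <- list(
  R1 = list(
    list(xy = rbind(c(0, 0), c(4, 0), c(4, 4), c(0, 4), c(0, 0)), hole = FALSE),
    list(xy = tri, hole = TRUE)
  ),
  R2 = list(
    list(xy = tri, hole = FALSE)
  )
)

# Flatten the nested structure into one row per vertex, carrying the region
# identifier, the ring number within the region, and the hole flag.
flatten_polys <- function(regions) {
  rows <- list()
  for (id in names(regions)) {
    rings <- regions[[id]]
    for (i in seq_along(rings)) {
      xy <- rings[[i]]$xy
      rows[[length(rows) + 1]] <- data.frame(
        ID = id, poly = i, hole = rings[[i]]$hole,
        x = xy[, 1], y = xy[, 2],
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

map_table <- flatten_polys(regions)
print(map_table)
```

Once the polygons are in this long form, the `ID` column can be matched against an identifier column in the statistical data frame.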
There is a considerable amount of variation in the level of complexity a user may prefer in presenting their statistical data graphically. With this in mind, the micromap package has been designed to facilitate as many needs as possible. For those wishing to make simple use of the most common graphs associated with spatial data (box plots, dot plots, and bar plots), these graph types have been included along with various easily modifiable customizations. However, in consideration of users who may desire more unique or specialized graphs for their data, a series of specialized functions are included to allow a user to create new and unique graph types. The user guide that accompanies the micromap package walks through the process in great detail. Users are encouraged to submit any unique graph additions to the creators for inclusion in further updates of the micromap package.
The print method associated with mmplot objects is designed to allow users to make post hoc changes and reformat output files of the plots they have created. The ggplot2 package is designed to allow a user to create objects which may then be adjusted (altering color schemes, axis labels, etc.) without having to rerun the code used to create the plot in its entirety. The mmplot and mmgroupedplot functions return these ggplot2 objects as a list within a custom mm class object. Each ggplot2 object in this list may then be adjusted, which allows the user to reassemble the linked micromap easily. The print method also allows a user to take a previously designed micromap and print directly to a file without needing to recreate the plot in R.

Examples
The following examples encompass the three basic types of plots available in the micromap package, which include a box plot, dot plot, and bar plot. The dot and bar plots can also be displayed with standard errors or confidence limits. Displaying such measures of variation is an advantage micromaps have over choropleth maps (Symanzik and Carr 2008). The first example is from the US EPA National Lakes Assessment (NLA) in which during the summer of 2007 the US Environmental Protection Agency, states, tribes, and other partners conducted a nationwide survey of the condition of lakes in the conterminous US (US Environmental Protection Agency 2009). The NLA used a spatially balanced probabilistic survey of 1028 lakes in the lower 48 states and as part of that design lakes were stratified by nine National Aquatic Resources Survey (NARS) reporting regions. A variable measured at each lake was pH, with lower values indicating greater acidity and a pH of 7 being neutral. We used the micromap package to produce a micromap showing two panels of statistical summaries that are linked to the nine reporting regions (Figure 2).
The polygons of the reporting regions were simplified as in Section 2, and a map table was created from them with the create_map_table function. In the statistical data NLA_pH.ds5, ecoregions is the variable that corresponds to the ID variable (Table 1).
The lines of code below produce Figure 2. The first few lines of code (through the specification of the map.link) can be used to create a draft micromap which a user may use to explore data and evaluate the usefulness of a more detailed micromap display. In the micromap code, the sequential order of the panel types corresponds directly to the sequential order of the list of panel data. That is, the "dot_legend" panel type does not need input data and so is associated with an NA in the data list, "labels" is associated with "Ecoregions", "box_summary" is associated with the five-number summary of pH from the statistics data frame, etc. The map also has an NA designation in the panel data list, since the source of that data is specified in the map.data argument. The map.link connects the reporting regions polygon labels in stat.data to their corresponding ID polygons in map.data.
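The positional pairing of panel types and panel data can be checked without drawing anything. The base R sketch below only builds the two objects and tabulates how they line up; it does not call mmplot. The panel type names "dot_legend", "labels", and "box_summary" and the label column "Ecoregions" come from the text, "map" is assumed to be the name of the map panel type, the five summary column names are hypothetical placeholders, and the second statistical panel is omitted for brevity.

```r
# Panel types, in display order from left to right.
panel.types <- c("dot_legend", "labels", "box_summary", "map")

# Panel data, in the same order. NA marks panels that need no column from
# the statistical data frame. The five summary column names are
# hypothetical placeholders.
panel.data <- list(
  NA,
  "Ecoregions",
  list("min", "q1", "median", "q3", "max"),
  NA
)

# The two objects must have the same length for the pairing to make sense.
stopifnot(length(panel.types) == length(panel.data))

# Tabulate which columns feed which panel.
spec <- data.frame(
  panel   = panel.types,
  columns = vapply(panel.data,
                   function(p) paste(unlist(p), collapse = ", "),
                   character(1)),
  stringsAsFactors = FALSE
)
print(spec)
```

Laying the specification out this way makes an off-by-one mistake between the two arguments easy to spot before a plot is attempted.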
The first statistical panel displays the five-number summary of pH values from the sampled lakes as a box plot for each of the nine ecoregions. The second statistical panel displays the population estimates of the median pH and confidence limits, with these estimates and interval bounds calculated using the spsurvey package (Stevens and Olsen 2004; Kincaid and Olsen 2012). The micromap is sorted from the lowest to highest estimated median pH. The nine NARS reporting regions are split into perceptual groups of three regions each. That grouping, along with the linking through color of labels, statistical summaries, and ecoregion polygons, assists the viewer in noting that estimated median pH values of approximately 7.5 or less occur in the eastern portion of the United States. We can also see from the box plot that increases in medians of the sampled pH values are associated with smaller interquartile ranges. Although this micromap displays just a small number of polygons, it allows viewers to discern both statistical and spatial patterns.
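A five-number summary per region, sorted by the median, can be computed in base R as sketched below. The pH values and region names are made up for illustration and are not NLA data. The sketch summarizes the raw sample only; it does not compute the survey-weighted population estimates and confidence limits that the text attributes to the spsurvey package.

```r
# Made-up pH readings for three hypothetical regions, five lakes each.
lakes <- data.frame(
  region = rep(c("East", "Plains", "West"), each = 5),
  pH = c(6.1, 6.8, 7.0, 7.3, 7.9,
         8.0, 8.2, 8.4, 8.5, 8.9,
         7.2, 7.6, 7.8, 8.1, 8.6),
  stringsAsFactors = FALSE
)

# Tukey five-number summary (min, lower hinge, median, upper hinge, max)
# for each region, stacked into one matrix with a row per region.
five <- do.call(rbind, lapply(split(lakes$pH, lakes$region), fivenum))
colnames(five) <- c("min", "q1", "median", "q3", "max")

# One row per region, ordered from lowest to highest median.
summary_tbl <- data.frame(region = rownames(five), five,
                          row.names = NULL, stringsAsFactors = FALSE)
summary_tbl <- summary_tbl[order(summary_tbl$median), ]
print(summary_tbl)
```

A table of this shape, one row per region with one column per summary value, is the kind of input a box-summary panel would draw from.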
Statistical and spatial data that are composed of a large number of polygons, such as states that have many counties, can be a challenge to display in a typical micromap layout of maps, region labels, and statistical panels (Carr 2001). For our second example, we examine public health data for the 254 counties of Texas, specifically incidence of leukemia and emissions data from roads and airports at the county level (Senkayi, Sattler, Rowe, and Chen 2014). Displaying all the counties for the state of Texas in a standard micromap layout would extend far beyond the vertical margins of a page layout. Creating such a micromap requires subsetting the data (done through row indexing the ordered data) in order to plot the data in groups over a series of micromap panels, as well as modifying micromap parameters for text and element sizes to work with the multi-panel layout (Figure 3). The ability to produce multi-panel micromaps extends the utility of micromaps for display of data composed of many regions.
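The row-indexing step can be sketched in base R as follows. The data frame, its columns `id` and `rate`, and the page size of ten units are made up for the example; the sketch only shows how ordered data can be cut into consecutive subsets, each of which would then be plotted as its own micromap panel series.

```r
# Made-up data: 23 hypothetical units, each with a distinct rate.
counties <- data.frame(
  id   = sprintf("C%02d", 1:23),
  rate = (1:23 * 7) %% 23,
  stringsAsFactors = FALSE
)

# Order once on the sorting variable, then assign consecutive rows to pages.
ordered  <- counties[order(counties$rate), ]
per_page <- 10
page     <- ceiling(seq_len(nrow(ordered)) / per_page)
pages    <- split(ordered, page)

# Number of units that would be drawn on each page.
print(vapply(pages, nrow, integer(1)))
```

Ordering before splitting keeps the overall sort intact across pages, so the last row of one page is followed directly by the first row of the next.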
Whereas the previous examples had a one-to-one relationship between a polygon and a statistical estimate, this last example, which uses the NLA data, has a one polygon to many statistical estimates relationship in which each reporting region is crossed with a categorical group that identifies a disturbance condition. The structure of the dataset is shown in Table 2.
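A one-polygon-to-many-estimates layout can be pictured as a long data frame with one row per region and category combination. The base R sketch below builds such a frame. The region names, category labels, column names, and percentages are all made up for illustration and do not reproduce the NLA dataset.

```r
# Every hypothetical region is crossed with every disturbance category.
grouped <- expand.grid(
  category = c("Least disturbed", "Intermediate", "Most disturbed"),
  region   = c("Region A", "Region B"),
  stringsAsFactors = FALSE
)

# Made-up percent estimates, listed in the row order produced above
# (category varies fastest within each region).
grouped$estimate <- c(10, 20, 70,
                      50, 30, 20)

# Each region has one row per category, and its estimates sum to 100.
per_region <- tapply(grouped$estimate, grouped$region, sum)
print(grouped)
print(per_region)
```

In this layout the region identifier repeats across rows, so a single polygon is linked to several statistical estimates at once.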
Visualizing these data requires a new style of micromap. We believe this "group categorized" micromap was first used in the Wadeable Streams Assessment (US Environmental Protection Agency 2006), and it provides an alternative to pie charts or stacked bar charts. The mmgroupedplot function has been included in the micromap package to allow a user to create this type of map. Ten maps are displayed in the micromap in Figure 4, with the map at the top presenting the national summary statistics for the percent estimates, with confidence limits, of lakes that occurred in the categories of least disturbed, intermediate disturbance, and most disturbed based on their values in total phosphorus, total nitrogen, and turbidity.
The remaining maps show a specific reporting region and the estimates among the three categories. This comparison of a national summary to a region summary lets the viewer note that most of the lakes in the Northern Plains are in the most disturbed category based on their total phosphorus and total nitrogen values.
Besides the micromaps described here, users can create their own micromap by creating a new panel type. The micromap package user guide provides an example detailing the creation of an arrow linked micromap panel type which may be used to illustrate a temporal change in statistical estimates.

Challenges and future developments
A future challenge with the micromap package is to expand its capabilities to include plot types such as histograms, time series data, and possibly symbol plots. Shortly after micromaps were introduced, Carr (2001) noted the value of having micromaps in an interactive mapping environment. While that intent goes beyond the scope of the micromap package, some efforts have been made to create dynamic micromaps (Wang, Chen, Carr, Bell, and Pickle 2002; Hurst, Symanzik, and Gunter 2003). In addition, the authors of a package for exploratory spatial data analysis mentioned that a new tool for that package might include a micromap display (Laurent, Ruiz-Gazen, and Thomas-Agnan 2012).
We believe publication quality micromaps can be made using the micromap package based on the four steps we have outlined. First, working in GIS software or R, users need to simplify polygons for display in a micromap. Fortunately, once such map caricatures are made they can be reused (Carr, Wallin, and Carr 2000). Second, in the micromap package, the spatial polygon data frame is converted to a map table that includes an identifying variable to link the map table to the statistical data frame. Third, only approximately ten lines of code need to be used with the mmplot function to render a draft micromap. Finally, users have detailed control over the layout, quality of appearance, and output format of the final micromap, and they can create their own types of micromaps. The four steps make for convenient production of micromaps, and we hope that visualizing georeferenced statistics in this manner will become more common (Peterson 2011; Robbins 2012). The versatility of the micromap package is that analysts can use it on a range of areal units such as watersheds and ecoregions, as well as administrative units from other countries. This flexibility may require some modification of both the geospatial data and the micromap plot aesthetics in order to ultimately produce a publication quality figure.