Journal of Statistical Software http://www.jstatsoft.org/rss Fri, 16 May 2008 12:21:51 GMT
Most recent publications from the Journal of Statistical Software

lattice: Multivariate Data Visualization with R http://www.jstatsoft.org/v25/b02/paper Vol. 25, Book Review 2, May 2008

lattice: Multivariate Data Visualization with R
Deepayan Sarkar
Springer-Verlag, 2008
ISBN: 978-0-387-75968-5

Thu, 08 May 2008 07:00:00 GMT /v25/b02 Gabor Grothendieck
A statnet Tutorial http://www.jstatsoft.org/v24/i09/paper Vol. 24, Issue 9, May 2008

Abstract:

The statnet suite of R packages contains a wide range of functionality for the statistical analysis of social networks, including the implementation of exponential-family random graph (ERG) models. In this paper we illustrate some of the functionality of statnet through a tutorial analysis of a friendship network of 1,461 adolescents.

Thu, 08 May 2008 07:00:00 GMT /v24/i09 Martina Morris, David R. Hunter, Mark S. Handcock, Steven M. Goodreau, Carter T. Butts
networksis: A Package to Simulate Bipartite Graphs with Fixed Marginals Through Sequential Importance Sampling http://www.jstatsoft.org/v24/i08/paper Vol. 24, Issue 8, May 2008

Abstract:

The ability to simulate graphs with given properties is important for the analysis of social networks. Sequential importance sampling has been shown to be particularly effective in estimating the number of graphs adhering to fixed marginals and in estimating the null distribution of graph statistics. This paper describes the networksis package for R and how its simulate and simulate_sis functions can be used to address both of these tasks as well as to generate initial graphs for Markov chain Monte Carlo simulations.

Thu, 08 May 2008 07:00:00 GMT /v24/i08 Mark S. Handcock, Ryan Admiraal
Prototype Packages for Managing and Animating Longitudinal Network Data: dynamicnetwork and rSoNIA http://www.jstatsoft.org/v24/i07/paper Vol. 24, Issue 7, May 2008

Abstract:

Work with longitudinal network survey data and the dynamic network outputs of the statnet ERGMs has demonstrated the need for consistent frameworks and data structures for expressing, storing, and manipulating information about networks that change in time. Motivated by our requirements for exchanging data among researchers and various analysis and visualization processes, we have created an R package dynamicnetwork that builds upon previous work in the network, statnet and sna packages and provides a limited functional implementation. This paper discusses design issues and considerations, describes classes and forms of dynamic data, and works through several examples to demonstrate the utility of the package. The functionality of the rSoNIA package that uses dynamicnetwork to exchange data with the Social Network Image Animator (SoNIA) software to create animated movies of changing networks from within R is also demonstrated.

Thu, 08 May 2008 07:00:00 GMT /v24/i07 Skye Bender-deMoll, Martina Morris, James Moody
Social Network Analysis with sna http://www.jstatsoft.org/v24/i06/paper Vol. 24, Issue 6, May 2008

Abstract:

Modern social network analysis---the analysis of relational data arising from social systems---is a computationally intensive area of research. Here, we provide an overview of a software package which provides support for a range of network analytic functionality within the R statistical computing environment. General categories of currently supported functionality are described, and brief examples of package syntax and usage are shown.

Thu, 08 May 2008 07:00:00 GMT /v24/i06 Carter T. Butts
Fitting Latent Cluster Models for Networks with latentnet http://www.jstatsoft.org/v24/i05/paper Vol. 24, Issue 5, May 2008

Abstract:

latentnet is a package to fit and evaluate statistical latent position and cluster models for networks. Hoff, Raftery, and Handcock (2002) suggested an approach to modeling networks based on positing the existence of a latent space of characteristics of the actors. Relationships form as a function of distances between these characteristics as well as functions of observed dyadic-level covariates. In latentnet, social distances are represented in a Euclidean space. It also includes a variant of the extension of the latent position model to allow for clustering of the positions developed in Handcock, Raftery, and Tantrum (2007).
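
As a sketch of the distance model described above, in the spirit of Hoff, Raftery, and Handcock (2002), the log-odds of a tie can be written as below. The symbols are assumptions introduced here for illustration: Y_ij is the tie indicator for actors i and j, x_ij the observed dyadic covariates, beta their coefficients, and z_i the latent position of actor i in Euclidean space.

```latex
\operatorname{logit} \Pr(Y_{ij} = 1 \mid z_i, z_j, x_{ij}, \beta)
  = \beta^{\top} x_{ij} - \lVert z_i - z_j \rVert
```

Under this form, a larger distance between two latent positions lowers the log-odds of a tie, so actors placed close together are more likely to be connected.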

The package implements Bayesian inference for the models based on a Markov chain Monte Carlo algorithm. It can also compute maximum likelihood estimates for the latent position model and a two-stage maximum likelihood method for the latent position cluster model. For latent position cluster models, the package provides a Bayesian way of assessing how many groups there are, and thus whether or not there is any clustering (since if the preferred number of groups is 1, there is little evidence for clustering). It also estimates which cluster each actor belongs to. These estimates are probabilistic, and provide the probability of each actor belonging to each cluster. It computes four types of point estimates for the coefficients and positions: maximum likelihood estimate, posterior mean, posterior mode and the estimator which minimizes Kullback-Leibler divergence from the posterior. The goodness of fit of the model can be assessed via posterior predictive checks. The package also has a function to simulate networks from a latent position or latent position cluster model.

Wed, 07 May 2008 07:00:00 GMT /v24/i05 Pavel N. Krivitsky, Mark S. Handcock
Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects http://www.jstatsoft.org/v24/i04/paper Vol. 24, Issue 4, May 2008

Abstract:

Exponential-family random graph models (ERGMs) represent the processes that govern the formation of links in networks through the terms selected by the user. The terms specify network statistics that are sufficient to represent the probability distribution over the space of networks of that size. Many classes of statistics can be used. In this article we describe the classes of statistics that are currently available in the ergm package. We also describe means for controlling the Markov chain Monte Carlo (MCMC) algorithm that the package uses for estimation. These controls affect either the proposal distribution on the sample space used by the underlying Metropolis-Hastings algorithm or the constraints on the sample space itself. Finally, we describe various other arguments to core functions of the ergm package.
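
For reference, the general form of an ERGM can be sketched as follows. The notation is an assumption chosen here for illustration: Y is the random network, y a realization in the sample space \mathcal{Y}, g(y) the vector of network statistics defined by the selected terms, and \theta the corresponding coefficient vector.

```latex
\Pr_{\theta}(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{z \in \mathcal{Y}} \exp\{\theta^{\top} g(z)\}
```

The normalizing constant \kappa(\theta) sums over every network in the sample space, which is intractable for all but very small networks; this is why estimation relies on MCMC, and why constraints on the sample space change the model itself.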

Wed, 07 May 2008 07:00:00 GMT /v24/i04 Martina Morris, Mark S. Handcock, David R. Hunter
ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks http://www.jstatsoft.org/v24/i03/paper Vol. 24, Issue 3, May 2008

Abstract:

We describe some of the capabilities of the ergm package and the statistical theory underlying it. This package contains tools for accomplishing three important, and inter-related, tasks involving exponential-family random graph models (ERGMs): estimation, simulation, and goodness of fit. More precisely, ergm has the capability of approximating a maximum likelihood estimator for an ERGM given a network data set; simulating new network data sets from a fitted ERGM using Markov chain Monte Carlo; and assessing how well a fitted ERGM does at capturing characteristics of a particular network data set.

Mon, 05 May 2008 07:00:00 GMT /v24/i03 David R. Hunter, Martina Morris, Steven M. Goodreau, Carter T. Butts, Mark S. Handcock
network: A Package for Managing Relational Data in R http://www.jstatsoft.org/v24/i02/paper Vol. 24, Issue 2, May 2008

Abstract:

Effective memory structures for relational data within R must be capable of representing a wide range of data while keeping overhead to a minimum. The network package provides a class which may be used for encoding complex relational structures composed of a vertex set together with any combination of undirected/directed, valued/unvalued, dyadic/hyper, and single/multiple edges; storage requirements are on the order of the number of edges involved. Some simple constructor, interface, and visualization functions are provided, as well as a set of operators to facilitate employment by end users. The package also supports a C-language API, which allows developers to work directly with network objects within backend code.

Mon, 05 May 2008 07:00:00 GMT /v24/i02 Carter T. Butts
statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data http://www.jstatsoft.org/v24/i01/paper Vol. 24, Issue 1, May 2008

Abstract:

statnet is a suite of software packages for statistical network analysis. The packages implement recent advances in network modeling based on exponential-family random graph models (ERGM). The components of the package provide a comprehensive framework for ERGM-based network modeling, including tools for model estimation, model evaluation, model-based network simulation, and network visualization. This broad functionality is powered by a central Markov chain Monte Carlo (MCMC) algorithm. The coding is optimized for speed and robustness.

Mon, 05 May 2008 07:00:00 GMT /v24/i01 Mark S. Handcock, David R. Hunter, Martina Morris, Carter T. Butts, Steven M. Goodreau
Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage http://www.jstatsoft.org/v25/b01/paper Vol. 25, Book Review 1, May 2008

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage
Zdravko Markov and Daniel T. Larose
John Wiley & Sons, 2007
ISBN: 978-0-471-66655-4

Thu, 01 May 2008 07:00:00 GMT /v25/b01 Patrick Mair
GEEQBOX: A MATLAB Toolbox for Generalized Estimating Equations and Quasi-Least Squares http://www.jstatsoft.org/v25/i14/paper Vol. 25, Issue 14, May 2008

Abstract:

The GEEQBOX toolbox analyzes correlated data via the method of generalized estimating equations (GEE) and quasi-least squares (QLS), an approach based on GEE that overcomes some limitations of GEE that have been noted in the literature. GEEQBOX is currently able to handle correlated data that follows a normal, Bernoulli or Poisson distribution, and that is assumed to have an AR(1), Markov, tri-diagonal, equicorrelated, unstructured or working independence correlation structure. This toolbox is for use with MATLAB.

Thu, 01 May 2008 07:00:00 GMT /v25/i14 Sarah J. Ratcliffe, Justine Shults
SAS Macros for Analysis of Unreplicated 2^k and 2^(k-p) Designs with a Possible Outlier http://www.jstatsoft.org/v25/i13/paper Vol. 25, Issue 13, May 2008

Abstract:

Many techniques have been proposed for judging the significance of effects in unreplicated 2^k and 2^(k-p) designs. However, relatively few methods have been proposed for analyzing unreplicated designs with possible outliers. Outliers can be a major impediment to valid interpretation of data from unreplicated designs. This paper presents SAS macros which automate a manual method for detecting an outlier and performing an analysis of data from an unreplicated 2^k or 2^(k-p) design when an outlier is present. This method was originally suggested by Cuthbert Daniel and is based on the normal or half normal plot of effects. This automated version was shown in simulation studies to perform better than other procedures proposed to do the same thing.

Thu, 01 May 2008 07:00:00 GMT /v25/i13 John Lawson
GillespieSSA: Implementing the Gillespie Stochastic Simulation Algorithm in R http://www.jstatsoft.org/v25/i12/paper Vol. 25, Issue 12, Apr 2008

Abstract:

The deterministic dynamics of populations in continuous time are traditionally described using coupled, first-order ordinary differential equations. While this approach is accurate for large systems, it is often inadequate for small systems where key species may be present in small numbers or where key reactions occur at a low rate. The Gillespie stochastic simulation algorithm (SSA) is a procedure for generating time-evolution trajectories of finite populations in continuous time and has become the standard algorithm for these types of stochastic models. This article presents a simple-to-use and flexible framework for implementing the SSA using the high-level statistical computing language R and the package GillespieSSA. Using three ecological models as examples (logistic growth, Rosenzweig-MacArthur predator-prey model, and Kermack-McKendrick SIRS metapopulation model), this paper shows how a deterministic model can be formulated as a finite-population stochastic model within the framework of SSA theory and how it can be implemented in R. Simulations of the stochastic models are performed using four different SSA Monte Carlo methods: one exact method (Gillespie's direct method); and three approximate methods (explicit, binomial, and optimized tau-leap methods). Comparison of simulation results confirms that while the time-evolution trajectories obtained from the different SSA methods are indistinguishable, the approximate methods are up to four orders of magnitude faster than the exact methods.
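
To make the exact method concrete, here is a minimal sketch of Gillespie's direct method for a stochastic logistic growth model. It is written in Python rather than R and does not use the GillespieSSA package; the function name, the parameter values, and the particular birth and death propensities are assumptions chosen for illustration.

```python
import random


def gillespie_direct(n0, birth, death, capacity, t_end, seed=0):
    """Simulate one trajectory of a logistic birth-death process
    with Gillespie's direct method (all parameters are illustrative)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while n > 0:
        # Propensities of the two reaction channels (assumed model).
        a_birth = birth * n
        a_death = (death + (birth - death) * n / capacity) * n
        a_total = a_birth + a_death
        # The waiting time to the next event is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose a channel with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_direct(n0=10, birth=2.0, death=1.0,
                                 capacity=100, t_end=5.0)
print(f"{len(times) - 1} events, final population {counts[-1]}")
```

Each iteration simulates exactly one reaction event, which is what makes the method exact and also what makes it slow for large populations; tau-leap methods gain speed by firing many events per step at the cost of an approximation.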

Wed, 30 Apr 2008 07:00:00 GMT /v25/i12 Mario Pineda-Krch
Invariant and Metric Free Proximities for Data Matching: An R Package http://www.jstatsoft.org/v25/i11/paper Vol. 25, Issue 11, Apr 2008

Abstract:

Data matching is a typical statistical problem in nonexperimental and/or observational studies or, more generally, in cross-sectional studies in which one or more data sets are to be compared. Several methods are available in the literature, most of which are based on a particular metric or on statistical models, either parametric or nonparametric. In this paper we present two methods to calculate a proximity which have the property of being invariant under monotonic transformations. These methods require at most the notion of ordering. Open-source software in the form of an R package is also presented.
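
The invariance property can be illustrated with a small Python sketch. This is not the authors' proximity measure; it only shows why a quantity built from ordering alone is unchanged by a strictly increasing transformation. The `ranks` helper and the sample values are hypothetical.

```python
import math


def ranks(values):
    """Return the 0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


incomes = [12.0, 250.0, 48.5, 3.2, 97.0]
log_incomes = [math.log(x) for x in incomes]  # strictly increasing transform

# Ranks depend only on ordering, so they agree before and after the transform.
print(ranks(incomes) == ranks(log_incomes))  # prints True
```

A distance computed on the raw values would change after taking logarithms, whereas anything computed from the ranks would not.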

Wed, 30 Apr 2008 07:00:00 GMT /v25/i11 Stefano Iacus, Giuseppe Porro