Constructing and Modifying Sequence Statistics for relevent Using informR in R

The informR package greatly simpliﬁes the analysis of complex event histories in R by providing user friendly tools to build suﬃcient statistics for the relevent package. Historically, building suﬃcient statistics to model event sequences (of the form a (cid:47) (cid:47) b ) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simpliﬁes the construction of the complex list of arrays needed by the rem() model ﬁtting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper in-troduces these tools using examples from real data extracted from the American Time Use Survey.


Introduction
The relational event framework (Butts 2008) is an approach to modeling complex interaction sequences (e.g., among social actors, or between actors and their environments) as discrete events in continuous time. The relevent package (Butts 2015) for R (R Core Team 2014) provides one implementation of this framework, with specialized functionality for easily fitting dyadic relational event models, as well as a general purpose function, rem(), that supports a broader class of models (including models with multiple event types, endogenous support constraints, etc.). Unfortunately, rem() requires the use of user-generated inputs that can be quite difficult to construct, historically limiting its usefulness. The informR package (Marcum 2015) for R contains a set of functions that assist in the creation and manipulation of sufficient statistics to be modeled using the rem() function, thereby broadening the class of models that can be easily fit using relevent. In the simplest case, and given some data, the package creates the two objects required by the rem() regression function for ego-centric relational event model fitting: eventlist and statslist. These two objects can be complex and non-intuitive to create without specialized code; informR greatly simplifies the procedure. The package allows users to specify sufficient statistics for general families of event sequences (or patterns, called s-forms as introduced below) to include in statslists. Additionally, tools for adding and dropping covariates from statslists are included in the package. This paper illustrates how to use the informR package using included data from a subset of the US Department of Labor Statistics' American Time Use Survey. Package informR is available from the Comprehensive R Archive Network (CRAN) at http://CRAN.R-project.org/package=informR.
While informR is intended to interface with relevent, we note that various other packages for R support event modeling through survival analysis (often called event history analysis in the social sciences, or failure/reliability analysis in engineering fields; Blossfeld and Rohwer 2002). Mills (2011) provides a good introduction to those packages and methods involving the estimation of event hazards for, e.g., the timing of death or the likelihood of marrying by a particular age. Additionally, in the CRAN Task View on "Survival Analysis" (Allignol and Latouche 2014) a list is maintained that describes packages for the analysis of time to event data. Packages such as survival (Therneau and Grambsch 2000;Therneau 2014) and muhaz (Gentleman 2010) can be used to fit closely related models, although they do not currently support the full range of models described e.g., by Butts (2008). Users of these packages may be able to employ informR to help generate inputs for their associated functions as well, although we do not explore these potential applications in this paper.

The relational event modeling framework
Before discussing the use of the informR package, it is useful to briefly review the relational event modeling framework. A relational event as discussed by Butts (2008) is a discrete event (taken to have zero duration in continuous time) in which some actor (e.g., person, organization, or other entity) directs some action towards some target (e.g., another person, a location or object, or even the actor him/her/itself). 1 Events may be of distinct types, but are here treated as otherwise unvalued (though see e.g., Brandes, Lerner, and Snijders (2009) for valued extensions); the risk set of events at any given time point (a subset of the Cartesian product of the set of possible senders, possible receivers, and possible event types) may be fixed, or may itself evolve as a function of the past event history. These features make the relational event framework useful for modeling complex social dynamics (e.g., conversation, radio or email communication, or adversarial interactions) or behavioral sequences involving complex interactions between an individual and his or her environment (so-called "egocentric" relational events).
Most relational event modeling to date has focused on the case in which event hazards can be taken to be piecewise constant functions of sender, receiver, or higher-order covariates, and the past event history. The inputs to these functions are taken to represent factors that enhance or inhibit the propensity for actors to take particular actions, as determined by their associated parameters. A typical development (per Butts 2008, pp. 159-166) is as follows. Let S be the set of all potential senders, R the set of all potential receivers, and C the set of all potential event types. A relational event a is defined as a tuple (s, r, c, t) ∈ S ×R×C ×R, with the last element representing the time at which the event occurs. For notational convenience, we also define s(a) ∈ S, r(a) ∈ R, c(a) ∈ C, and τ (a) ∈ R to be the event's sender, receiver, type, and time (respectively). Let A t = (a 1 , . . . , a M ) be the set of all events to have occurred by time t (i.e., the event history at t); the set of possible (i.e., non-zero hazard) events at t is then designated by A(A t ) ⊆ S × R × C × {t}. The hazard of an arbitrary event, a, at time t is then parameterized as λ(a, θ, X t ) = exp θ u (s(a), r(a), c(a), A t , X t ) a ∈ A(A t ) 0 a ∈ A(A t ) (1) where u is a p-vector of statistics, θ ∈ R p , and X t is a set of covariates. In the piecewise constant model, it is assumed that hazards change only when events occur (hence λ does not depend directly on t), and hence X t may likewise change only at realized events. The impact of this latter restriction may be easily relaxed by the introduction of exogenous events (not discussed in Butts 2008), as described below. We note that while the log-linear form is a natural choice for hazard modeling (as it guarantees non-negativity and allows coefficients to be interpreted in terms of logged rate multipliers), other functions can also be employed; since the relevent package supports only the log-linear case, this is our focus here.
Given the above formulation, the development of the likelihood for an observed event history is fairly straightforward. We extend the above-cited development only by the introduction of exogenous events, which are implemented in package relevent but were not discussed in the above paper. We assume that we observe an event history A t over time interval [0, t), treating time 0 as the onset of risk for all potential events; for notational convenience, we define the null event a 0 with τ (a 0 ) = 0 to mark the onset of the observation period. In general, we model the event history as being the result of a continuous time event process in which the hazard of any potential event is given by Equation 1. Some events, however, may be considered exogenous, in the sense that (1) their hazards are assumed unrelated to the observed events, and (2) their hazards are not a function of θ. Exogenous events may represent the actions of environmental or other entities that affect (but are not affected by) the system under study, or they may be used as "book keeping" devices to model systems in which hazards change for reasons other than endogenous events. For instance, "clock events" that occur deterministically at fixed periods in time (e.g., every hour, day, etc.) can be used to model changes in hazards due to time period, and exogenous events can also be used to capture the impact of changes in covariates (i.e., X t changing at times other than those associated with endogenous events). We here denote endogeneity via the indicator , such that (a i ) = 1 if a i ∈ A t is endogenous, and 0 otherwise. Subject to this minor extension, the joint likelihood of A t under the above-described process becomes (per ibid., Equation 2) Butts (2008) also considers the case in which the temporal order of events is known, but the event times are not. The observed data likelihood of A t here becomes a product of categorical distributions with outcome probabilities proportional to the event hazards. With the minor addition of exogenous events, this likelihood is given by (per ibid., Equation 3) which depends only on the hazards of the potential and realized events (and not the interevent times). Note that, like Equation 3, the likelihood of Equation 4 depends upon the assumption of piecewise constant hazards.
The estimation function rem() in the relevent package, along with its dyadic-data cousin rem.dyad(), fits both ordinal and interval time relational event models (Butts 2015). As discussed below, the interval time likelihood model is fit by supplying data on event types and complete timing of events to rem() with timing argument timing = "interval", while the ordinal time model is fit by supplying only the ordered events with timing = "ordinal". The scope of the discussion here will cover how to construct model statistics and to format data in both cases using the informR toolkit.
As there are no "canned" sufficient statistics for the rem() model-fitting routine, users must supply their own properly formatted data structure to eventlist and array of sufficient statistics to statslist. The purpose of package informR is to simplify the construction of the eventlist and statslist objects for a variety of egocentric relational event models, greatly reducing the difficulty of specifying and fitting such models in practice. The balance of the paper focuses on the informR tools, and how they may be used to assist with common modeling tasks of sequence analysis in the REM framework.

Getting started
The informR package can be downloaded and installed from CRAN; the package depends upon the abind (Plate and Heiberger 2011) and relevent packages for R, which can also be obtained from CRAN as of the time of this writing. Load the package in the usual way using library("informR"). The example data used in this paper is packaged in informR and can be loaded with data("atus80ord", package = "informR") and data("atus80int", package = "informR"), respectively for the ordinal and interval cases. The only individual level covariate included in this subset is the sex of the respondent (1 = Male).
The example data come from the pooled 2003-2008 American Time Use Survey (ATUS). The ATUS is a nationally representative sample of the non-institutionalized adult US population. The survey collects information on how people spend their time during a single randomly selected day. The type of each activity and its duration is recorded. The included subset represents event histories from respondents aged 80 and above. In addition to the activity variables, unique identifiers and respondent's gender are also included in the subset. The two data.frames that hold the event history data differ in that atus80ord contains only sequences of activity spells and atus80int restructures each activity spell observation into two event types, a starting event and a stopping event. Thus, in atus80ord, the first informant's (20030101031049) first activity spell is coded as Sleeping but in atus80int, this same activity spell is recorded as two observations Sleeping|START and Sleeping|STOP and associated with event timing in the Time column. Naturally, nrow(atus80int) == 2 * nrow(atus80ord).
There are 62,352 and 124,704 respective observations on 3,430 individuals. To make this clear, consider the print output of the first five activity spells from the following code snippet: R> library("informR") R> data("atus80ord", package = "informR") R> data("atus80int", package = "informR") R> atus80ord [1:5, ] Activities TUCASEID SEX 1348 Sleeping 20030101031049  The Time column of the atus80int data set records the time, in cumulative minutes, that the event took place in each respondent's event history. The duration elapsed between each pair of "starting" and "stopping" events constitute a spell of that type of activity. Each "stopping" event is offset from the next starting event by 0.001 minutes, which enforces the "no simultaneous events" assumption of the relational event framework. For example, actor 20030101031049 started sleeping at time 0, woke up 4 hours later (240/60), then she began to eat breakfast a fraction of a second later, which took about 30 minutes to eat etc.
Having summarized the structure of the example data, we next discuss how package informR can be used to construct sequence statistics for the rem() function. The first step is to create an eventlist object using the gen.evl() function. For ordinal time relational event models, the gen.evl() function takes two arguments: (1) eventlist, which is a two-column matrix or a data frame consisting of event observations in the first column and an event history grouping factor in the other (e.g., a unique informant id), and (2) an optional parameter called null.events which is a character vector of event types that should be treated as exogenous or otherwise not modeled directly. In this case, we are going to use the respondent id variable, TUCASEID, as the grouping factor and since there are no truly exogenous events in this data, we treat missing spell types (NA) as exogenous (i.e., for illustrative purposes only): R> atus80ord[which(is.na(atus80ord[, "Activities"])), "Activities"] <-+ "MISSING" R> rawevents <-cbind(atus80ord$Activities, atus80ord$TUCASEID) R> evls <-gen.evl(rawevents, null.events = "MISSING") R> names(evls) [1] "eventlist" "event.key" "null.events" The resulting object evls stores the eventlist in evls$eventlist. Inevls$eventlist each element is a vector of numeric unique event type ids, which have a corresponding alphabetic attribute that is used in regular expression matching in other methods -the character vector attribute of the first eventlist can be obtained using attr(evls$eventlist[ [1]], "char").
The package currently supports event histories with 52 event types or less. The event types are mapped to their respective alphabetic ids in evls$event.key. Because we passed a vector to null.events in gen.evl(), the evls object contains a list of the exogenous events in evls$null.events. Correspondingly, any event type listed in null.events is assigned a numeric id value of 0 in the eventlist for consistency with how rem() handles exogenous events. 2 Because event histories with fewer than two events cannot identify standard relational event models, any event history in the data having fewer than two events is dropped and a warning is issued. Additionally, gen.evl() can be used to build eventlist objects for the interval time relational event model; the only difference is that eventlist is passed as a three-column matrix where the first two columns index the events and the temporal information, respectively, and the third column indexes the event history grouping factor. Finally, as gen.evl() conditions on the observed event types to generate the event key, users may need to append non-observed yet potential events to it after the fact. We revisit both of these latter points below.
With the eventlist stored in evls$eventlist, we can create the second object required to fit a model using rem(), the statslist. A simple statslist object will store the intercept sufficient statistics for the egocentric relational event model. This is accomplished through the gen.intercepts() function. This function takes four arguments, evl, basecat, type and contr. The evl argument expects an object passed from gen.evl(), as above. The basecat parameter allows a user to specify which event type should be treated as the baseline category for the ordinal relational event model -if not specified, the default behavior is to take the first event type in alphabetic order. The type argument is used to indicate whether effects specified within the statslist are to be considered "global" (type = 1) or "local" (type = 2) by rem(); global effects are assumed to be homogeneous (i.e., pooled) across event histories, while local effects are allowed to vary for each event history. The default behavior is to assign all statistics to be global. Though not used in this example, the logical argument contr is used to indicate whether or not the resulting statslist should include contrasts by dropping a baseline category. When generating statistics for the interval relational event model, this parameter should usually be passed as contr = FALSE -which also overrides any value passed to basecat.

R> alpha.ints[[1]][[1]][1, , ]
Eating Once both an eventlist and a statslist is generated, we can fit a baseline egocentric discrete relational event model using rem(). In this example, we employ the Bayesian posterior mode (BPM) estimator, along with a weakly informative independent Student's t prior: R> alpha.fit <-rem(eventlist = evls$eventlist, statslist = alpha.ints, + estimator = "BPM", prior.param = list(mu = 0, sigma = 100 , nu = 4)) R> summary (  The Bayesian "p values" here can be interpreted as two times the posterior probability that the true value of the associated parameter has a sign opposite that of the posterior mode (under an asymptotic z-approximation); this is equivalent to the result that would be obtained via a standard (frequentist) two-sided z-test of the hypothesis that the parameter value is zero, under the assumption that the posterior standard deviation is the standard error.
The results of this crude baseline model suggest that all endogenous events except Household Production and Leisure are significantly less likely to occur than Sleeping, and that Travel is not significantly more likely to occur than Sleeping. Note that, in the ordinal timing case of the egocentric relational event model, an intercept-only model such as this is equivalent to a series of Poisson models, and the sample means are approximately equal to the BPM estimates in this case:

Sequence statistics
While intercept-only models are interesting, the real strength of the informR tools comes from their ability to generate sufficient statistics to model complex event sequences using rem().
Here, we introduce the term s-form (or "sequence form") to describe canonical sequences of events that we wish to model. Each s-form has a prefix and suffix. The prefix may consist of any combination of valid event types and represents the event sequence that precedes an event to be predicted. Correspondingly, the suffix should be a single event type representing the event to be predicted by the preceding events of the prefix. The suffix is always the final event type in the s-form framework.
For example, a simple s-form might consist of the event sequence a / / b / / c, which is read as "event a leads to event b which predicts event c" -or, less deterministically, "event a followed by event b predicts event c" Here, the prefix is "a / / b" and the suffix is event "c." As discussed below, it is possible to parameterize s-forms in several different ways using package informR.
The informR package uses simple regular expressions (regex) based on the alphabetic ids mapped to the event types in the event.key element in an eventlist object to represent s-forms. Two special regex operators are permitted in s-form expressions: the "|" character (alternation operator) and the "+" character (repetition operator) (Friedl 2006, pg. 18). The | operator is used to differentiate between two possible event paths and the + operator is used to indicate repetition, or persistence, of the preceding event. A single + operator may be nested within an | statement (see Section 3.1 on complex sequences, below, for example code) but nested | statements are not currently supported. The columns of Table 1 respectively report various useful canonical s-forms, their corresponding regex form, and their definitions.
Building sufficient statistics using the s-form framework is accomplished through the functions gen.sformlist() and glb.sformlist(). The first function, gen.sformlist(), can be used to construct statistics for specific sequences. For example, digram and trigram s-forms (e.g, a / / b and a / / b / / c) are easily constructed using this function. The second function, glb.sformlist(), pools multiple s-form regular expressions into a single statistic. This is particularly handy for modeling, say, a class of behaviors that is characterized by multiple s-forms sharing a common feature. Both functions are wrappers for an internal function, gen.sform(), which has limited functionality to the user. The mandatory arguments for s-form regex Definition a / / a aa Inertial term: s-form of the type "event a predicts event a".
a / / b ab Basic digram transition term: s-form of the type "event a predicts event b".
Transition term with persistence: s-form of the type "some series of events a predicts event b".
Basic trigram transition term: s-form of the type "event a followed by event b predicts event c". a c b @ @ (a|b)c Transition term with disjunction: s-form of the type "event a OR event b predicts event c". gen.sformlist are: evl, an eventlist object, and sforms, a character vector of regex sforms. The mandatory arguments for glb.sformlist are: evl, an eventlist object, sforms, a list of character vectors of regex s-forms, and new.names, which is a character vector of variable names that has the same length as sforms. Additional optional arguments for both functions include: cond = FALSE, interval = FALSE, and warn = TRUE, which I will discuss below.
Using the examples provided in Table 1, generating sufficient statistics for modeling inertial (or persistence) effects of the endogenous events is done by: R> a1 <-paste(evls$event.key [-9, 1], evls$event.key[-9, 1], sep = "") R> beta.sforms <-gen.sformlist(evls, a1) The beta.sforms object is a list of three dimensional arrays, the length of which is equal to the number of valid event histories in the dataset (i.e., the length of evls$eventlist). Each element of beta.sforms consists of an i, j, k array where i indexes the event, j indexes the event type, and k indexes the sufficient statistic to a corresponding element passed to gen.sformlist by sforms. Examining the first event of the first element of beta.sforms reveals the structure of the output: which shows that the row and column labels of each event matrix correspond to the alphabetic ids of the event types in event.key and their inertial s-forms, respectively. For completeness, it is useful to walk through the relationship between the inertial sufficient statistics and the event history. Let us take the first three events from the event history associated with atus80ord respondent "20030101033341" as our example.

R> evls$eventlist$"20030101033341"
[1] 1 2 2 7 1 attr(,"char") [1] "a" "b" "b" "g" "a" R> sapply(attr(evls$eventlist$"20030101033341", "char")[1:3], sf2nms, event.key = evls$event.key) This subsequence consists of the actor sleeping followed by two instances of eating. The sf2nms() is a convenience function that substitutes the character representation of the event types with their respective names. The corresponding inertial s-forms for these two event types are a / / a, b / / b. These are located in the beta.sforms$"20030101033341" sformlist: Thus, the inertial statistics relate back to the eventlist in the following way: the first event has alphabetic id "a", so the "a" column in the next event row (row 2) is equal to 1 in the "aa" element of the array. Likewise, the second and third events have alphabetic id "b," so the "b" column in the third and fourth rows are set equal to 1 in the "bb" element of the array. This is the general framework for the underlying statistic construction methods of the informR package.
To use the newly created sufficient statistics in a relational event model, the beta.sforms sformlist object needs to be appended to a statslist. This is easily done using the slbind() function, which takes an sformlist and a statslist in arguments sformstats and statslist, respectively. Optionally, slbind() can "translate" the alphabetic ids back into event type names (i.e., from "bb" to "EatingEating") if optional arguments new.names = TRUE and event.key = evls$event.key are passed to it. For convenience, the function sfl2statslist() can be used to directly convert an sformlist object to a statslist (i.e., without appending it to an already existing statslist as above), though name translation is not currently supported.

Constructing complex sequences
The inertial s-forms provide a relatively simple example of how to construct sequence statistics for rem(). Let us now consider a couple more complex cases using some gerontological theories about the older US population to motivate our statistics. For example, the older population is believed to have their sleep frequently interrupted by the need to use the bathroom or perform some other personal care activity, or interrupted in order to help another older person (usually a spouse) with similar needs (Barker and Mitteness 1988). These types of activities are captured in the data by "Private personal care", "Personal Care", and "Caregiving" spell types. Without loss of generality, we can use the | operator to combine information from the two slightly different personal care spell types in our s-form construction. Thus, we can model the likelihood of sleep being interrupted by personal care by using the s-form: a , h > > and the likelihood of sleep being interrupted by providing care with the s-form: a / / l / / a.
We use the persistence + operator after the "l" in the latter s-form because the inertial effect of caregiving was very large and significant in the previously fit model (beta.fit, above), which gives us reason to suspect that if caregiving is going to interrupt sleep, it will do so as a sequence of caregiving spells rather than a single instance. Specifying the special operators causes gen.sformlist() to use a slower algorithm for identifying valid subsequences and a note is issued: R> a2 <-c("a(c|h)a", "al+a") R> gamma.sforms <-gen.sformlist(evls, a2) The results suggest that, ceteris paribus, spells of sleep tend to be interrupted by personal care activities (β = 1.216 * * * ) but not by caregiving (β = −0.370 n.s. ). However, overall tendencies for sleep interruption to occur from any activity are unaccounted for in this model. To control for this, we could construct individual sufficient statistics for all possible trigrams (or, if theoretically meaningful, higher order sequences) and estimate their individual effects. Alternatively, we could pool those trigrams into a single sufficient statistic and save degrees of freedom. To accomplish this, use the glb.sformlist() function: R> sleep.sfs <-paste("a", letters[2:14], "a", sep = "") R> sleep.sforms <-glb.sformlist(evls, sforms = list(sleep.sfs), + new.names = "InterSleep") R> sleep.ints <-slbind(sleep.sforms, gamma.ints) R> sleep.fit <-rem(evls$eventlist, sleep.ints, estimator = "BPM", + prior.param = list(mu = 0, sigma = 100, nu = 4)) R> round ( Interestingly, the negative coefficient for the overall tendency for sleep interruption trigrams is negative (−1.291 * * * ), strongly suggesting that interruptions to sleep are not likely. That is, whatever happens following a spell of sleep is not likely to result in more sleeping. When sleep interruptions do occur, however, they are most likely to be the result of some personal care activity.

Additional s-form parametrizations
The preceding exercise demonstrates how the informR package can handle statslist construction of most combinations of canonical s-forms. For example, consider a relatively complex s-form: which contains both a persistence term and a divergence term preceding the suffix and which would be passed as "a + (b|c)d." One exception to this is when an s-form contains more than one persistence term inside a divergence statement, such as: To obtain a properly formed sufficient statistic for such cases, the best strategy is to write down each possible valid outcome of the s-form regex and pass them to glb.sformlist with the sform parameter. Using this s-form as an example, there are two possible paths from "a" to "d" that the internal gen.evl function can interpret. These are expressed by the following s-form regexes: "a(b + |c)d" and "a(b|c+)d." R> tmp.sforms <-glb.sformlist(evls, list(c("a(b+|c)d", "a(b|c+)d")), + new.names = "a(b+|c+)d") Note: a(b+|c)d S-form(s) contains special regex syntax. Using slow search methods. Note: a(b|c+)d S-form(s) contains special regex syntax. Using slow search methods.

R> evls$eventlist[[1]]
[1] 1 2 3 4 2 4 1 attr(,"char") [1] "a" "b" "c" "d" "b" "d" "a" Examine the sformlist output that corresponds with the "a(b + |c+)d" s-form: From the example output, we see that the first three events in the event history are "a", "b", "c" (corresponding to spells of sleeping, eating, then personal care activities). This selected sub-sequence of the event history, not coincidentally, fulfills one of the many possible outcomes from the complex s-form that we specified. Following our prior discussion on the relationship between the s-form, the event history, and the statslist, we can see that since "a" occurs the sufficient statistic for the s-form predicts that "b" or "c" will happen next and both respective columns are incremented in the second event row of the statslist. The second event is "b", and since we predict at least one "b" with the "+" character in the s-form, both "b" and the suffix event "d" columns are incremented in the third event row. Event "c" is no longer a predicted outcome because "c" did not occur after "a." Recall that there may be more than one way to specify s-form regexes. Here, we could have achieved the same result with c("ab+d", "ac+d") in the sform parameter. Creative use of the above strategy may provide a work-around for many cases that are not natively supported by package informR, such as s-forms with nested disjunctions.

Conditioning out the prefix
The default settings of gen.sformlist() and glb.sformlist() produce sufficient statistics that allow the model to influence the hazards of each non-initial element in the s-form, given what has happened up to that point in the event history. For example, an s-form term a / / b / / c will add its corresponding parameter value to the hazard of b given that a has just transpired and to the hazard of c given that a / / b has just transpired. Thus, this term's contribution to the likelihood of the event history grows larger as events of a occur and the subsequence of events a / / b occur. For certain applications, however, it is useful to estimate only the hazard of the suffix event given the preceding events (i.e., without realizing contributions to the likelihood from any prefix elements). To accomplish this, set cond = TRUE in gen.sformlist() and glb.sformlist(). The names of the s-forms in the output will be slightly different when using gen.sformlist with cond = TRUE as the prefix is enveloped between parentheses to indicate a conditional s-form. Here is an example comparing the a / / b / / c s-form with and without the prefix contributions to the likelihood: It should be clear that, in the case of simple digrams, there is no distinction between a / / b and (a) / / b because only the hazard of the suffix event is affected in both cases. By contrast, a / / b / / c and (a / / b) / / c, are different: specifically, the former term is equivalent to the combination of terms (a) / / b and (a / / b) / / c with the paramaters for the latter constrained to be equal. Unconditional s-forms are thus a parsimonious way to capture the tendency for a given chain of events to take place, while conditional s-forms allow for the specification of terms that affect only the end of a complex event sequence.

Modifying statslists
The above examples illustrate how easy it is to build complex statslists using the informR package, including how multiple statslist models may be combined using the slbind() function. This section demonstrates how to modify statslists to create new models by dropping and adding elements. Dropping elements of statslists is easy using the sldrop() function. This function takes three arguments, statslist, varname, and type and returns a new statslist that omits any sufficient statistic column matching any name in varname. The type argument is used to indicate whether the new statslist should be "global" or "local" -as in previously discussed methods.
The function slbind.cond() allows to add interaction variables to a statslist object. The slbind.cond() function adds actor-by-event conditional or interaction variables to a statslist object. This function is useful when an event history, or part thereof, depends upon some variable (actor characteristics, temporality, etc). The function takes seven arguments: intvar, a numeric variable; statslist, the statslist object; var.suffix, a character string to append to the name of the new variable; sl.ind, the column indexes of the events to be interacted with the intvar in each matrix element of the statslist; who.evs, for local statistic construction, the indexes of the actors to receive the new variable; and, type, an indicator for local or global statistics. Additional arguments can be passed to abind() using R's ellipsis arguments (...).
In the following example, a model that interacts an indicator for whether or not the respondent is female with the terms in the sleep interruption s-forms modeled above is fit to the data. This is motivated by the hypothesis that older males, due to declining bladder and prostate function, are more likely to have their sleep interrupted to urinate (Tikkinen, Tammela, Huhtalal, and Auvinen 2006) and that older females are more likely to be caretakers of their afflicted spouses (Willette-Murphy, Todera, and Yeaworth 2006).

Using interval time data
The rem() function can fit models for sequences of events when the exact (or nearly exact) timing of the events is known by providing the appropriate data in eventlist and setting the timing argument to "interval." In general, generating s-form sufficient statistics for interval time models is the same as the ordinal time case, though there are subtle differences. We will review how to construct s-forms for the interval time likelihood relational event model in this section.
First, recall that the interval time data must contain three columns where the first column indexes the events, the second column indexes the temporal information of when the event transpired, and the third column indexes an event history grouping factor. In the example data, atus80int, an "event" is recorded when an actor either starts or stops a particular activity spell. Eventlist objects and statslist objects for intercept models are created for this data using gen.evl() and gen.intercepts(), respectively, but with the interval time data arguments needing to be specified. As the interval timing models take slightly longer to fit, the example uses only a random subset (N = 500) of the data: 2 1650.001 attr(,"char") [1] "a" "b" "c" "d" "g" "h" "m" "n" "m" "n" "c" "d" "a" "b" "m" "n" "m" "n" [19] "a" "b" R> int.base <-gen.intercepts(evlsint, type = 1, contr = FALSE) Note that we now have twice as many event types as we did during the ordinal data example. Each event list in evlsint$eventlist is now a two-column matrix where the first column indexes the numeric code of the event type and the second column indexes the timing information. Failing to set contr = FALSE in gen.intercepts() in the interval case may result in an improperly identified model.
Before we can proceed with fitting the intercepts-only model, we need to consider that not all sequences of events are possible -the event support, A(A t ), is not in this case fixed. Since events in the present formulation indicate onsets and terminii of activity spells, the most trivial constraint we must enforce is that a START event for a given activity cannot occur if there has been a previous START event for that activity without the most recent event of that activity being a STOP spell (i.e., we cannot initiate an activity in which we are already engaged). Likewise, we cannot terminate an activity without starting it: a STOP event for a given activity cannot occur unless the most recent event for that activity is a START event. Likewise, there are obvious physical constraints that should induce additional restrictions on event ordering, such as the fact that one cannot, e.g., initiate eating without first ceasing to sleep. In the particular case of the ATUS, many of these issues are simplified by the nature of the study, which defines activities such that subjects can be engaged in at most one activity at any given time, and that activity begins with the onset of the first spell of the day. This amounts to a blanket constraint that (1) no START event can occur unless the previous event was a STOP event and (2) the only event that can immediately follow a START event is a STOP event of the same activity type. rem() allows for the specification of arbitrary constraints of this type, using the supplist argument (which allows users to specify at each point in the event history those events that were at risk for occuring). As documented in help("rem"), supplist is defined as follows: If desired, support constraints for the event histories can be specified using supplist. supplist should be a list with one element per history, each of which should be an event order by event type logical matrix. The ijth cell of this matrix should be TRUE if an event of type j was a possible next event given the preceding code i − 1 events, and FALSE otherwise. (By default, all events are assumed to be possible at all times.) As with the model statistics, the elements of the support list must be user supplied, and will often be history-dependent. (E.g., in a model for spell-based data, event types will come in onset/termination pairs, with terminal events necessarily being preceded by corresponding onset events.) (Butts 2015) Failure to specify support constraints where applicable can badly bias coefficient estimates, and it is hence important that they be considered when fitting complex relational event models.
Turning back to our example data, we can create a support list satisfying the constraints of the ATUS data with the following R code: Per Equation 1, the coefficients for the intercept model are estimates of the log hazards (log λ) for each event type, at each point in the event history for which the corresponding event type is possible (i.e., in A(A t )). Under this model (and due to the exclusivity constraint of the ATUS), the starting event coefficient for a given activity is proportional to the log probability that any given spell in the event history will be of the corresponding activity type. This can be verified by simple tabulation: R> bpm.start <-(exp(intfit.base$coef[grep("START", + names(intfit.base$coef))]) / + sum(exp(intfit.base$coef[grep("START", names(intfit.base$coef))]))) R> in.ids <-which(atus80int$TUCASEID %in% names(evlsint$eventlist) & + grepl("START", atus80int$Events)) R> mle.start <-prop. By turns, the inverse exponents of the stopping event coefficients can here be interpreted as the average duration of a particular spell of activity. For example, the average length of a spell of sleeping in this population is 1 e −5.690678 = 296.0943 minutes, or roughly 5 hours. This follows from the piecewise constant hazard assumption, which (together with the support constraints) results in an exponential waiting time distribution for STOP events following a corresponding START event.
Sequence statistics for interval time models are created and manipulated in the usual way (see above for details). Caution should be taken to ensure that sequences of events are properly specified in the s-form to include the onset and termination of each spell. Here, we illustrate the power of rem() to estimate highly complex conditional sequence statistics with the help of package informR via our sleep interruption example. Specifically, we will construct a model with terms for sleep interrupted by any other spell of activity and by personal care spells. We are defining the sleep interruption effects to be conditional on (1) sleeping, then (2) waking, then (3) doing something else, and finally, (4) going back to sleep. The sufficient statistics will be parametrized to model both the onset of sleeping and the duration of the spell, given that sleep followed by the other activities had previously occurred. This may be represented by a pair of s-forms: (Sleeping|Start → Sleeping|Stop → Something|Start → Something|Stop) → Sleeping|Start, for the onset of a conditional sleeping spell, and (Sleeping|Start → Sleeping|Stop → Something|Start → Something|Stop → Sleeping|Start) → Sleeping|Stop for the duration of the conditional spell. To construct the model, we need to use glb.sformlist() as both s-forms pool information from multiple spells into single elements of the statistic: R> sleepA <-paste("ab", c(letters[seq (1,26,2)], "A"), + c(letters[seq(2, 26, 2)], "B"), "a", sep = "") R> sleepB <-paste("ab", c(letters[seq (1,26,2)], "A"), + c(letters[seq(2, 26, 2)], "B"), "ab", sep = "") R> sleep.glbs <-glb.sformlist(evlsint, + list(sleepA, sleepB, c("abefa", "abopa"), c("abefab", "abopab")), + cond = TRUE, new.names = c("Inter.Sl.Ons", "Inter.Sl.Dur", + "PerCare.Sl.Ons", "PerCare.Sl.Dur")) R> sleep.int <-slbind(sleep.glbs, int.base) R> intfit.sleep <-rem(evlsint$eventlist, sleep.int, supplist = supplist, + timing = "interval", estimator = "BPM") R> summary(intfit.sleep) We again note that, overall, sleep interruption is rare, the evidence for which is captured by the negative coefficient for Inter.Sl.Ons (β = −0.891 * * * ), which, is interpreted as the conditional log hazard of returning to sleep versus doing some thing else; this has a correspondingly low conditional probability of 0.0568. However, given that sleep interruption occurs at all, it tends to decrease the average sleeping spell by a factor of 0.7024 ( 1 e 0.353266 ), or roughly 90 minutes.
R> exp(-0.890707 + 4.989059) / + sum(exp(intfit.sleep$coef[grep("START", names(intfit.sleep$coef))])) [1] 0.05677088 Moving onto interruptions directly caused by personal care needs, we see that, net of the overall tendency for interruption to occur, a spell of personal care that follows a sleeping spell increases the chances that the next event will indeed be sleeping. Ergo, personal care activities, such as using the bathroom, or changing clothes, interrupt spells of sleeping in this population (with conditional probability 0.5151). However, such interruptions do not significantly affect the average duration of sleep.

Dyadic data
As we have discussed throughout this paper, the general modeling function in the relevent package is rem(). For dyadic relational event data (where events are sender, receiver, action type tuples), the rem.dyad() function is appropriate for modeling large numbers of dyadic relational event statistics (currently, 34 classes of sufficient statistics are available, though users could specify more than that using the covariate terms). In principle, rem() can be used to fit models for dyadic relational event data as well as the ego-centric data demonstrated here; this allows support for a wider range of event types, self-directed events, support constraints etc., at the cost of the simplicity and computational optimizations afforded by rem.dyad().
While the primary purpose of the informR package is to aid in the construction of models for ego-centric data, some of its functions may be useful for constructing such dyadic models to use with rem(). As a proof-of-concept, a simple example of how informR tools can be used to construct dyadic models based on the example in the rem.dyad() documentation from the relevent package is provided in the Appendix A.

Conclusion
This paper has outlined the basic functionality of the informR package for R. Sufficient statistics to model event sequences for both ordinal and interval time data were constructed and employed in relational event models using real data and the relevent package. In addition, code for manipulating these statistics and constructing conditional models was introduced. The informR package greatly simplifies the use of the rem() modeling function, specifically, and the modeling of event histories with dependent sub-features more generally. It is hoped that this toolkit will lead to wider use of relational event models for complex event sequences, particularly by researchers engaged with non-dyadic data.

Computational details
The version of package informR used in this tutorial is 1.0-5 and the version of package relevent is 1.0-4.
The average batch run-time (n = 30) of the examples employed in this tutorial was 12.5 minutes on a 2.2 GHz Intel T6600 processor with 3.8 GB of RAM (absolute maximum memory consumption was 1.8 GB as estimated by the bourne-shell program top) using Linux kernel v2.6.32.

A. Dyadic models
In this appendix, we demonstrate a proof-of-concept for fitting a model of dyadic data with rem() from the relevent package. Normally, dyadic data is modeled using rem.dyad() and the examples here are derived from the example code in that relevent package function. Since rem.dyad() and rem() are optimized using different routinesoptim() and trust (Geyer 2014), respectively -results may differ slightly between models. The example is simple: we fit a relational event model of fixed-effects for sending and receiving. Extension to other sorts of dyadic relational event terms can be performed using techniques similar to those shown here.
In this example, we estimate fixed effects for sending and receiving ties among an artificial network of five actors using both estimating functions: rem.dyad() and rem(). With rem.dyad(), these terms are conveniently added to the dyadic relational events model by passing c("FESnd", "FERec") to the argument effects of that function. With rem(), however, the burden of generating the sufficient statistics to add to the model falls entirely on the user. Like the egocentric (non-dyadic) cases described in the main text of this manuscript, the informR tools are also helpful in this case. Specifically, we can use gen.evl() and gen.intercepts(), with minor modifications, to facilitate model fitting with rem(). For completeness, we fit both ordinal and interval likelihood models. Also, we demonstrate the use of maximum likelihood estimation (MLE) to fit these models.
As can be seen by the output, the results of the two ordinal model fitting routines are quite consistent. As we discussed above, there are minor differences in precision between the output of rem.dyad() and rem(). These small discrepancies arise due to the fact that the two functions use different optimizer routines: optim(), and trust(), respectively.