Network Coincidence Analysis: The netCoin R Package

The aim of the R package netCoin is to explore data structures using a number of statistical techniques that share the handling of interdependent variables. The main objective of this analysis is to detect events, characters, objects, attributes or characteristics that tend to appear together within a given set of scenarios. Its most notable feature is the combination of traditional multivariate statistical analysis and network analysis supported by topological graph theory. In addition, netCoin produces HTML graphs using the D3.js visualization library to support the interactive exploration of networked data. Among its many applications, netCoin can be used for the analysis of multiple responses in questionnaires to explore relevant associations, for the development of textual networks, for the study of ecological communities, for audience analysis, for mining large databases or for market basket analysis.


Introduction
netCoin (Escobar, Barrios, Prieto, and Martinez-Uribe 2020) is an R package which performs network coincidence analysis, whose aim is to find out the structure and the degree to which a series of events (subjects, objects or characteristics) tends to occur together within certain limits called scenarios. To discover these patterns, this package generates visualizations of the coincidences through interactive network graphs via a web browser.
Graphs represent elements (nodes) that may or may not be connected (edges). Coincidence graphs consist of two types of information: a set of nodes or vertices (events), $N = (n_1, n_2, \ldots, n_J)$, and a set of lines, links or edges (coincidences), $L = (l_1, l_2, \ldots, l_L)$, that interconnect them (Wasserman and Faust 1994). One of the typical applications of these structures is affiliation networks, which represent the connections between actors and a set of social situations (Wasserman and Faust 1994). For example, bimodal networks could help study the membership of an executive group in a company or the events that the inhabitants of a certain village attend.
Those relationships can be studied using bipartite graphs or hypergraphs as well as dual hypergraphs, although in most cases the representation of only one of the sets is of interest (such as the actors or the inhabitants in the previous examples), and any bimodal network can be transformed into a unimodal one, which leads to the preference for the co-participation matrix. Precisely, the main operation behind netCoin is generating a coincidence matrix (unimodal) from an incidence matrix (bimodal) to convert the former into a graph.
Another application of network coincidence analysis is the study of species within different ecosystems. The coexistence of bird species in the Galápagos Islands is one highly popular case among biologists (Sanderson 2000). For this study, a probabilistic co-occurrence method based on the hypergeometric distribution, which is also included in netCoin, has been developed (Veech 2013).

Similar software
There are a variety of tools within the statistics and data analysis domain that perform similar operations to netCoin to visually analyze the structure of binary data.
It is also possible to find common ground with machine learning techniques, especially association rules (Borgelt 2012), which have binary matrices as their starting point. However, netCoin focuses on the associations between pairs of events, while the apriori() and eclat() procedures, available through the package arules (Hahsler, Grün, and Hornik 2005), seek higher-order connections.
Tools with a specific focus on network analysis and visualization include four major packages: igraph (Csardi and Nepusz 2006; Csardi 2020), network by Butts (2008, 2019), the graphical complement networkD3 (Grandrud, Allaire, and Rusell 2016; Allaire, Grandrud, Rusell, and Yetman 2015) and visNetwork by Almende, Benoit, and Titouan (2019). The first two are powerful tools for analyzing networks and can represent them in a non-interactive way, unless they are used in conjunction with tcltk2 (Grosjean 2014), but they lack the analytic instruments to study coincidences and the ability to create HTML graphs. Another similar package is RJSplot (Prieto and Barrios 2017), which produces interactive and dynamic graphics widely used in DNA structure data analysis. The last three are more similar to netCoin. However, they lack statistical tools to produce the coincidence graphs. Outside of the R environment, a variety of social network analysis tools exist such as Gephi (Bastian, Heymann, and Jacomy 2009), Pajek (Batagelj and Mrvar 1998) (Schult and Swart 2008).
It is important to mention those packages which specialize in co-occurrence in community structures. Griffith, Veech, and Marsh (2016a) created the cooccur package (Griffith, Veech, and Marsh 2016b), with incidence matrices similar to those of netCoin, but only using the hypergeometric distribution. They refer to other packages to detect pairs of species that share some space with one another such as picante (Kembel et al. 2010), spaa (Zhang 2016) and vegan (Oksanen et al. 2019).
In terms of similarity and distance calculations, packages like stats (R Core Team 2020), proxy (Meyer and Buchta 2019) and even parallelDist (Eckert 2018) cover most of the metrics that netCoin calculates. However, they do not include some of the coincidence analysis metrics needed, such as frequency, conditional frequency or statistical significance. In addition to this, netCoin allows the calculation of more than one metric at the same time just by calling one function. This reduces calculation time and thus improves performance.
A similar package to netCoin is qgraph (Epskamp, Cramer, Waldorp, Schmittmann, and Borsboom 2012;Epskamp, Costantini, Haslbeck, and Isvoranu 2020), which provides an interface to visualize data through network modeling techniques. However, qgraph is intended to represent a correlation matrix or a factor analysis statically, while netCoin is specialized in the representation of qualitative variables transformed into dichotomies and its parameters can be interactively changed through a web page.
In sum, despite the fact that there are many packages and software tools to analyze binary metrics and represent networks, netCoin adds value by providing the possibility to efficiently calculate a series of distance and similarity measures, including their statistical significance, and allowing the generation of interactive graphic output in HTML.

Coincidence analysis
Co-occurrences have been widely studied in many fields, especially in the content analysis of texts (Carley 1993; Lund and Burgess 1996; Popping 2000, 2003; Matsuo and Ishizuka 2004) and in the study of ecological communities (Diamond and Gilpin 1982; Connor and Simberloff 1983; Veech 2013). In addition to this, there is extensive literature that focuses on applications and many R packages that facilitate their analysis as seen in the previous section.
netCoin focuses on a particular form of dealing with co-occurrences, which is called coincidence analysis, and whose aim is to detect which people, subjects, objects, attributes or events tend to appear at the same time in different limited spaces (Diaconis and Mosteller 1989;Baumgartner 2009;Escobar 2015).
An event (j) is a potential outcome of a random experiment. The set of possible outcomes is called a sample space and is composed of a series of elementary mutually exclusive events.
A scenario (i) is each one of the results of a complex experiment made up of a set of events (X j ) with varying degrees of dependence between each other. A scenario can also be defined as a spatial and temporal set in which the researcher collects information on the events that take place.
Since the events of the scenarios are not mutually exclusive, they can be represented using dichotomous vectors (they can either occur or not) or vectors containing natural numbers (number of times each event occurs in a given scenario).

Table 1: Incidence matrix.
Scenarios  Head  Tail
I          1     0
II         1     1
III        1     1
IV         0     1
Therefore, the set of observed n scenarios can be represented as an incidence matrix (I = (x ij )). In one dimension (generally the rows) the matrix contains the scenarios (i) and in the other dimension (commonly the columns) it contains the events (j). This matrix consists of 1s and 0s indicating if the events occurred or not, respectively, within the scenario. Alternatively, the occurrence matrix, which records the number of appearances of the event in every scenario, can be employed.
This distinction will be better understood with this simple example: If two coins are tossed four times, each toss represents a scenario where the events heads and tails are of interest. The three possible results for each toss of the two coins are: a) two heads, no tails; b) a head and a tail, and c) two tails and no head. The incidence matrix can be presented as shown in Table 1. On the other hand, the occurrence matrix must reflect the two heads or two tails obtained when the result is not head and tail (see Table 2).
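As a minimal sketch in base R, the two matrices of this example can be written by hand. The object names `occurrence` and `incidence` are illustrative, and the outcomes of the four tosses are assumed to be those of Table 1 (two heads, a head and a tail twice, two tails).

```r
# Four tosses of two coins. Assumed outcomes: I = two heads,
# II and III = a head and a tail, IV = two tails.
scenarios <- c("I", "II", "III", "IV")

# Occurrence matrix: number of times each event appears in each toss.
occurrence <- matrix(c(2, 0,
                       1, 1,
                       1, 1,
                       0, 2),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(scenarios, c("Head", "Tail")))

# Incidence matrix: 1 if the event appears at least once, 0 otherwise.
incidence <- (occurrence > 0) * 1

print(occurrence)
print(incidence)
```

The incidence matrix discards how many times an event happened, which is why the first and last rows differ between the two matrices.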
Coincidence and co-occurrence matrices can be calculated from the incidence and occurrence matrices.
Definition. Two coincident events (j and k) are those which occur together in the same scenario i.
Along with the basic coincidence in a given scenario i, when considering whether two events coincide in a multiple set of scenarios, the total number of coincidences of the events j and k can be obtained.
In addition, we can distinguish different degrees of coincidence. Thus, the most basic classification of coincidences would distinguish between:
a. No coincidence: Two events that never occur in the same scenario, i.e., they are mutually exclusive ($f_{jk} = 0$).
b. Simple coincidence: Two events are merely coincident if they occur together in at least one scenario ($f_{jk} > 0$).
c. Total coincidence: Two events that always occur together in the same scenarios. If one of them occurs, then the other does too ($f_{jk} = f_{jj} = f_{kk}$). A special case is the subtotal coincidence, in which one event (j) occurs only if the other (k) occurs, but not vice versa ($f_{jk} = f_{jj} < f_{kk}$), i.e., the occurrence of the more frequent event (k) does not necessarily imply the occurrence of the less frequent event (j).
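The classification above can be sketched as a small base R function. The name `classify_coincidence` and its return labels are hypothetical and not part of netCoin; the function returns the strongest degree that applies, given the joint frequency of two events and their individual frequencies.

```r
# Hypothetical helper (not a netCoin function): strongest degree of
# coincidence for a pair of events j and k.
classify_coincidence <- function(f_jk, f_jj, f_kk) {
  if (f_jk == 0) {
    return("none")                      # the events never meet
  }
  if (f_jk == f_jj && f_jk == f_kk) {
    return("total")                     # they always occur together
  }
  if (f_jk == min(f_jj, f_kk)) {
    return("subtotal")                  # the rarer event implies the other
  }
  "simple"                              # they meet at least once
}

classify_coincidence(f_jk = 0, f_jj = 3, f_kk = 2)
classify_coincidence(f_jk = 2, f_jj = 3, f_kk = 3)
classify_coincidence(f_jk = 3, f_jj = 3, f_kk = 3)
classify_coincidence(f_jk = 2, f_jj = 2, f_kk = 5)
```

Total and subtotal coincidences are also simple coincidences, so the order of the checks matters.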
From the incidence matrix, the coincidence matrix $F = (f_{jk})$ can be calculated using this expression: $F = I^{\top} I$. This is an example of how to project a bimodal network onto a unimodal one. The elements of this matrix are either univariate ($f_{jj}$) or bivariate ($f_{jk}$) frequencies of the different events in the set of scenarios (i) contained in the rows of I.
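A minimal base R sketch of this projection, assuming the incidence matrix of the coin example (scenarios in rows, events in columns); the object names are illustrative.

```r
# Incidence matrix of the coin example: rows are tosses, columns are events.
incidence <- matrix(c(1, 0,
                      1, 1,
                      1, 1,
                      0, 1),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(c("I", "II", "III", "IV"),
                                    c("Head", "Tail")))

# crossprod(x) computes t(x) %*% x, i.e., the events-by-events projection.
coincidences <- crossprod(incidence)
print(coincidences)
```

The diagonal holds the frequency of each event and the off-diagonal cells hold the number of scenarios in which both events appear.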
From the coincidence matrix (F) three probabilistic measures can be derived:
a. The marginal probability of $X_j$, denoted as $P(X_j)$, can be obtained by dividing the frequency of each event ($f_{jj}$) by the total number of scenarios (n) in which it could have occurred:
$$P(X_j) = \frac{f_{jj}}{n}$$
b. The joint probability of two events $X_j$ and $X_k$, expressed as $P(X_{jk})$, is given by their frequency of occurrence in the same scenario divided by the number of scenarios considered:
$$P(X_{jk}) = \frac{f_{jk}}{n}$$
c. The conditional probability, denoted as $P(X_j|X_k)$, expresses the probability that a certain event occurs when the second event has already occurred. It is obtained by dividing the joint probability by the marginal probability of the conditioning event:
$$P(X_j|X_k) = \frac{P(X_{jk})}{P(X_k)} = \frac{f_{jk}}{f_{kk}}$$
With the conditional probability, we can create a coincidence gradient, the probable coincidence, between two events when their conditional probability is greater than 50%:
$$P(X_j|X_k) > 0.5$$
When working with samples of scenarios instead of the whole universe, the upper limit of the confidence interval can be estimated under the alternative hypothesis of $P(X_j|X_k) < 0.5$ using $t_{\alpha, f_{kk}-1}$, the value of the Student distribution for $f_{kk} - 1$ degrees of freedom with a significance level of α.
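These three probabilities can be sketched in base R from a coincidence matrix. The example below assumes the coincidence matrix of the coin example (n = 4 tosses); the object names are illustrative and the code is not taken from netCoin.

```r
events <- c("Head", "Tail")
coincidences <- matrix(c(3, 2,
                         2, 3),
                       nrow = 2, dimnames = list(events, events))
n <- 4  # number of scenarios

# Marginal probability: f_jj / n.
marginal <- diag(coincidences) / n

# Joint probability: f_jk / n.
joint <- coincidences / n

# Conditional probability of the row event given the column event:
# f_jk / f_kk, so each column k is divided by f_kk.
conditional <- sweep(coincidences, 2, diag(coincidences), "/")

# Probable coincidence: conditional probability above 50%.
probable <- conditional > 0.5

print(marginal)
print(conditional)
print(probable)
```

The diagonal of `conditional` is always 1 and carries no information, so only the off-diagonal cells are of interest.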
The conditional coincidence is another coincidence gradient. It is derived from the concept of independence of events. Two events are independent if Equation 1 is true:
$$P(X_j|X_k) = P(X_j) \qquad (1)$$
Therefore, for that condition to be met, the following condition needs to be verified:
$$\frac{f_{jk}}{f_{kk}} = \frac{f_{jj}}{n}$$
From this equation, two events have a conditional coincidence when their frequency is greater than the expected frequency ($f^*_{jk}$) under the assumption of independence:
$$f_{jk} > f^*_{jk} = \frac{f_{jj} f_{kk}}{n}$$
It is also known (Haberman 1973) that the difference between $f_{jk}$ and $f^*_{jk}$ asymptotically follows a normal distribution with the following standard error:
$$\sigma_{jk} = \sqrt{f^*_{jk} \left(1 - \frac{f_{jj}}{n}\right) \left(1 - \frac{f_{kk}}{n}\right)}$$
which can be used to standardize ($r_{jk}$) the difference between the empirical frequency of coincident events ($f_{jk}$) and the expected frequency ($f^*_{jk}$) under the assumption of mutual independence:
$$r_{jk} = \frac{f_{jk} - f^*_{jk}}{\sigma_{jk}}$$
For small samples, the one-sided Fisher exact test, which employs the hypergeometric distribution, should be used instead (Fisher 1935; Finney 1948).
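A hedged numerical sketch of this reasoning in base R follows. The frequencies are invented for illustration (20 scenarios, events occurring 10 and 8 times, 7 coincidences), the standard error is the usual adjusted-residual form attributed to Haberman, and the small-sample alternative is computed with `phyper()` as the upper tail of the hypergeometric distribution.

```r
# Illustrative frequencies (assumed, not taken from any data set).
n    <- 20   # scenarios
f_jj <- 10   # scenarios with event j
f_kk <- 8    # scenarios with event k
f_jk <- 7    # scenarios with both events

# Expected coincidences under independence.
expected <- f_jj * f_kk / n

# Conditional coincidence: observed frequency above the expected one.
conditional_coincidence <- f_jk > expected

# Standard error of the difference and standardized (Haberman) residual.
std_error <- sqrt(expected * (1 - f_jj / n) * (1 - f_kk / n))
residual  <- (f_jk - expected) / std_error

# One-sided p value from the normal approximation.
p_normal <- pnorm(residual, lower.tail = FALSE)

# One-sided hypergeometric p value: probability of f_jk or more coincidences.
p_hyper <- phyper(f_jk - 1, f_jj, n - f_jj, f_kk, lower.tail = FALSE)

print(c(expected = expected, residual = residual,
        p_normal = p_normal, p_hyper = p_hyper))
```

With these figures the expected frequency is 4 and the residual is about 2.74, so both the normal approximation and the hypergeometric tail point to more coincidences than independence would produce.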
The degrees of coincidence that can be detected between each pair of events are summarized in Table 4.

Coincidence metrics
In addition to classifying coincidences into different types, they can be measured using binary proximity metrics (Hubálek 1982; Gower 1985). These measures have a maximum value of one when there is total coincidence between two dichotomous events and a value of 0 when there is total independence between them. Some of them can take negative values, in which case the minimum value could be −1 when two incompatible events are implied.
For the calculation of these metrics each element ($f_{jk}$) of the coincidence matrix can be split into the following system of equivalences:
$$a = f_{jk}, \quad b = f_{jj} - f_{jk}, \quad c = f_{kk} - f_{jk}, \quad d = n - f_{jj} - f_{kk} + f_{jk}$$
Therefore, for each pair of events, Table 5 can be elaborated. With these four figures (a, b, c, d), representing the frequencies of the four states of presence/absence of two events in the set of scenarios studied, binary proximity measures are obtained.

Table 5: Contingency table.
               X_k present  X_k absent
X_j present    a            b
X_j absent     c            d
These coefficients or binary proximity metrics can be classified into four types: The first one includes metrics that are similar to that of matching (Rogers and Tanimoto 1960; also known as Rogers and Tanimoto). They are the result of divisions with a numerator with both positive coincidences (the two events occur in the same scenario) and negative coincidences (the two events are absent in the same scenario), and a denominator where all scenarios are considered with different weights. The metrics belonging to this category are Rogers (Rogers and Tanimoto 1960), Sneath (Sneath and Sokal 1962), Anderberg (1973) and Gower (1985). These measurements should be used when considering coincidence both when two events are present in the same scenario, as well as when both are not present.
In the second type of metrics there is Jaccard (1901). Here, the scenarios in which neither of the two events whose degree of coincidence we intend to measure occurs (d) are excluded. Therefore, neither the numerator nor the denominator includes those scenarios without either of the two events.
Metrics of this type also include Dice (Jaccard 1901), Antidice (Anderberg 1973), Ochiai (1957) and Kulczynski (1927). In this case, events that are both absent from the same scenario are not considered to be coincident, and only those scenarios where at least one event has occurred are taken into account.
The third type of similarity metrics for binary data only includes Russell and Rao (1940). It only considers those scenarios to be similar in which both events occur. It excludes from the numerator those in which none of the events occurs, considering that this does not indicate that the scenarios are similar. However, unlike the similarity metrics such as Jaccard's, all the possible scenarios are present in the denominator of the equation. This coincidence measure only takes into account coincident events and contemplates all scenarios, including those in which both events are not present. Logically, if there are no scenarios where neither of the two is present, then both are equal. However, if within an infinite number of scenarios neither of the two events existed, the value of Russell and Rao would be zero, while Jaccard would be 1 by convention.

Finally, in the fourth type we may include all metrics in which frequencies of coincidences (whether the events occur or not) are compared (subtracted) with frequencies of no coincidences (scenarios where an event occurs but the other one does not). Thus, these measurements can be positive if coincident events predominate, or negative otherwise, i.e., when the scenarios in which the events do not coincide predominate. Metrics of this type include Pearson (1900), Yule (1900) and Hamann (1961). This modality is similar to the correlation coefficients and has the advantage of presenting both positive and negative values. Positive values imply that whenever an event is present, the other is as well; while negative ones evidence that in most cases, the presence of an event implies the absence of the other.
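To make the four families concrete, the following base R sketch computes one or two representatives of each from the frequencies of a pair of events. The helper `binary_metrics` is hypothetical, and the formulas are the common textbook definitions, which are assumed here and may differ in detail from the ones netCoin implements.

```r
# Hypothetical helper: textbook binary proximity metrics for one pair of
# events, from the joint frequency, the two event frequencies and the
# number of scenarios.
binary_metrics <- function(f_jk, f_jj, f_kk, n) {
  both    <- f_jk                       # both events present
  only_j  <- f_jj - f_jk                # only event j present
  only_k  <- f_kk - f_jk                # only event k present
  neither <- n - f_jj - f_kk + f_jk     # both events absent
  c(
    # Type 1: joint presences and joint absences both count as agreement.
    matching = (both + neither) / n,
    rogers   = (both + neither) / (both + neither + 2 * (only_j + only_k)),
    # Type 2: joint absences are ignored altogether.
    jaccard  = both / (both + only_j + only_k),
    dice     = 2 * both / (2 * both + only_j + only_k),
    # Type 3: joint absences only enlarge the denominator.
    russell  = both / n,
    # Type 4: agreements minus disagreements.
    hamann   = ((both + neither) - (only_j + only_k)) / n
  )
}

# Same pair of events observed within 10 and within 4 scenarios.
many_absences <- binary_metrics(f_jk = 2, f_jj = 3, f_kk = 3, n = 10)
no_absences   <- binary_metrics(f_jk = 2, f_jj = 3, f_kk = 3, n = 4)
print(round(rbind(many_absences, no_absences), 3))
```

In the second call there is no scenario without both events, so the Jaccard and Russell and Rao values coincide, whereas in the first call the six joint absences pull Russell and Rao down and leave Jaccard unchanged.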
All the previous expressions are called similarity metrics. To turn them into distance measurements, the following expression can be used: distance = 1 − similarity. If the metric has a range between 0 and 1, then these limits are preserved, although with a different meaning, as the 0 indicates complete coincidence. Nevertheless, if the metric range is between −1 and +1, the new distance metric will be between 0 and 2, with 1 indicating complete independence and higher values meaning that two events coincide less often than by mere chance.
An outline of these measures and the abbreviations to obtain them with netCoin can be found in Table 6. These abbreviations include, among others, ham (Hamann), od (odds ratio), z (p value of Haberman) and hyp (hypergeometric p greater value).

Adjacency matrix
Coincidence and distance matrices have been covered. Both types can be transformed into adjacency matrices. An adjacency matrix connects each pair of events depending on whether their coincidence metric is above a certain value. Thus, it is a square matrix with as many rows and columns as the number of events being studied, and formed by elements representing the number of coincidences between every pair of events. Using all the previous metrics, adjacency matrices can be formed in the following ways:
a. With the simple coincidences, so that there will be a connection between two events provided that they have coincided in at least one scenario.
b. With total or subtotal coincidences, so that two completely overlapping events will be connected. In the case of total coincidences, it will be a symmetrical connection, and in the case of subtotal coincidences, it will only connect the less frequent event to the more frequent one.
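Both constructions can be sketched in base R on a small invented incidence matrix (six scenarios, four events named A to D). The object names and the threshold `min_coincidences` are illustrative and do not correspond to netCoin arguments.

```r
# Invented incidence matrix: 6 scenarios (rows) by 4 events (columns).
incidence <- matrix(c(1, 1, 0, 0,
                      1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 0, 1, 1,
                      0, 0, 1, 1,
                      0, 1, 0, 1),
                    ncol = 4, byrow = TRUE,
                    dimnames = list(NULL, c("A", "B", "C", "D")))
coincidences <- crossprod(incidence)
f <- diag(coincidences)

# a. Simple coincidences: connect events that coincide at least once.
min_coincidences <- 1
simple <- (coincidences >= min_coincidences) * 1
diag(simple) <- 0

# b. Total or subtotal coincidences: connect j to k when every scenario
#    with j also contains k, i.e., f_jk equals f_jj (row_freq[j, k] = f_jj).
row_freq <- matrix(f, nrow = length(f), ncol = length(f))
contained <- (coincidences == row_freq) * 1
diag(contained) <- 0

print(simple)
print(contained)
```

Here A appears only in scenarios where B also appears, but not the other way round, so `contained` holds a single directed link from A to B, while `simple` is symmetric.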

Layouts
The same way that a series of coincidences can become an adjacency matrix, the latter can be converted into a graph. As previously said, a graph G consists of "two sets of information: a set of nodes (events), N = {n 1 , n 2 , . . . , n g }, and a set of lines (coincidences), L= {l 1 , l 2 , . . . , l L } between a pair of nodes" (Wasserman and Faust 1994).
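The conversion from an adjacency matrix to the two sets that define a graph can be sketched in base R as follows. The adjacency matrix is invented for illustration, and the column names `Source` and `Target` are an assumption about how an edge list might be labeled.

```r
# Invented symmetric adjacency matrix for four events.
events <- c("A", "B", "C", "D")
adjacency <- matrix(0, nrow = 4, ncol = 4, dimnames = list(events, events))
adjacency["A", "B"] <- 1
adjacency["A", "C"] <- 1
adjacency["B", "C"] <- 1
adjacency["B", "D"] <- 1
adjacency["C", "D"] <- 1
adjacency <- adjacency + t(adjacency)   # mirror the upper triangle

# Set of nodes: one row per event.
nodes <- data.frame(name = events)

# Set of lines: one row per connected pair, reading the upper triangle only
# so that each undirected edge is listed once.
idx <- which(upper.tri(adjacency) & adjacency == 1, arr.ind = TRUE)
edges <- data.frame(Source = events[idx[, "row"]],
                    Target = events[idx[, "col"]])

print(nodes)
print(edges)
```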
An additional problem is where to draw each node, i.e., the spatial distribution of the nodes. Thanks to igraph, netCoin graphs can be laid out according to the criteria in Table 7.
If none of these layouts is indicated, netCoin uses a dynamic Fruchterman-Reingold algorithm by default. Alternatively, the user can provide a matrix with two columns indicating the coordinates of those nodes that are going to be fixed in the representation. The remaining nodes should be stated as NA and will be placed according to a force-directed mechanism.

Communities
Cluster analysis is "a set of methods for constructing a (hopefully) sensible and informative classification of an initially unclassified set of data, using the variable values observed on each individual" (Everitt 2003). In agglomerative hierarchical clustering methods, there are various procedures to join cases using dendrograms: single, complete, average, median, Ward, etc. In the coincidence analysis, clustering could be useful to classify events according to their concurrences, using the Haberman residuals (r jk ) or another distance matrix (geodesic, matching, Jaccard, . . . ) as inputs to the clustering method.
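As a minimal sketch of this idea in base R, events can be clustered hierarchically from a Jaccard distance matrix. The incidence matrix is invented, the Jaccard formula is the textbook one, and `hclust()` with average linkage is only one of the possible agglomerative procedures.

```r
# Invented incidence matrix: 6 scenarios (rows) by 4 events (columns).
incidence <- matrix(c(1, 1, 0, 0,
                      1, 1, 0, 0,
                      1, 1, 1, 0,
                      0, 0, 1, 1,
                      0, 0, 1, 1,
                      0, 1, 0, 1),
                    ncol = 4, byrow = TRUE,
                    dimnames = list(NULL, c("A", "B", "C", "D")))
coincidences <- crossprod(incidence)
f <- diag(coincidences)

# Jaccard similarity: f_jk / (f_jj + f_kk - f_jk).
jaccard <- coincidences / (outer(f, f, "+") - coincidences)

# Distance = 1 - similarity, then agglomerative clustering of the events.
tree <- hclust(as.dist(1 - jaccard), method = "average")
groups <- cutree(tree, k = 2)
print(groups)
```

With these data the events A and B fall in one group and C and D in the other, which mirrors the blocks visible in the incidence matrix.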
Events j and k are structurally equivalent if, for all events l = 1, 2, . . . , g (l ≠ j, k), and for all associations r = 1, 2, . . . , R, event j has a relation to l if and only if k also has a relation to l. Consequently, structurally equivalent events are those that have identical edges with the rest of the events. Structural equivalence can imply "community", but it does not have to (e.g., if each community consists of a standard set of hierarchical actors), and community does not have to imply structural equivalence. Events can be partitioned into subsets of structural equivalence using hierarchical clustering or a similar classification algorithm. netCoin gives access to the igraph procedures listed in Table 8.
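The definition of structural equivalence can be checked directly on an adjacency matrix. The base R sketch below assumes a single type of relation (R = 1) and an invented undirected network; the function name `structurally_equivalent` is hypothetical.

```r
# Hypothetical helper: TRUE when events j and k have identical ties to
# every other event l (l different from j and k).
structurally_equivalent <- function(adjacency, j, k) {
  others <- setdiff(rownames(adjacency), c(j, k))
  all(adjacency[j, others] == adjacency[k, others])
}

# Invented symmetric adjacency matrix for four events.
events <- c("A", "B", "C", "D")
adjacency <- matrix(0, nrow = 4, ncol = 4, dimnames = list(events, events))
adjacency["A", "B"] <- 1
adjacency["A", "C"] <- 1
adjacency["B", "C"] <- 1
adjacency["B", "D"] <- 1
adjacency["C", "D"] <- 1
adjacency <- adjacency + t(adjacency)

structurally_equivalent(adjacency, "A", "D")  # both tied to B and C only
structurally_equivalent(adjacency, "A", "B")  # B is tied to D, A is not
```

Note that A and D are structurally equivalent although they are not connected to each other, which illustrates why equivalence and community need not coincide.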

The R package netCoin
Some of netCoin's statistical and graphical features were originally implemented in Stata (StataCorp 2019) as the coin ado program (Escobar 2015). This initial Stata program lacked the graphical interactivity which provides agile data exploratory capabilities. That is the main reason why R was chosen to generate an extended version of the original coin program.
Firstly, the shiny (Chang, Cheng, Allaire, Xie, and McPherson 2020) and igraph packages were used to achieve graph results, but what provided the solution to accomplish the desired interactivity was the integration with the D3.js data visualization library (Bostock, Ogievetsky, and Heer 2011). In addition to this, R code has been written to obtain the coincidence metrics and their significance.

Installation
The netCoin package is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=netCoin and depends on three other R packages, igraph (Csardi and Nepusz 2006), Matrix (Bates and Mächler 2019) and haven (Dusa and Thiem 2020), which are loaded with netCoin.

Overview with three simple examples
The netCoin package incorporates every coincidence analysis element detailed in Section 3. The functions included help the analyst convert the data into an incidence matrix that is suitable for the analysis, produce the coincidence matrix, calculate all the statistical indicators, generate the nodes and edges of the graph, produce interactive network visualizations and export those networks as 'igraph' objects.
The basic input is a binary incidence matrix. If it is not available, it can be obtained with the function dichotomize(). This function can be applied to both character variables and factor variables. In addition, for character variables it is able to split fragments separated by a constant string, whose default value is the empty string ("").

Argument        Meaning
sep = ""        The separator in case that the variables are composed.
min = 1         Minimum frequency of the value of a variable to be considered as an event.
length = Inf    Maximum number of events to be considered.
values = NULL   Events to be converted into dichotomies (not for multiple composed variables).
sparse = FALSE  Produce a sparse matrix instead of a data frame.
add = TRUE      Add the new columns to the original data frame.
sort = TRUE     Order the new columns by their frequencies.
Table 9: Arguments of dichotomize().

In addition to the data frame and the variable or variables to be dichotomized, the arguments of this function are given in Table 9.
The simplest example can be applied to the dice data frame included in the package:
R> data("dice", package = "netCoin")
R> events <- dichotomize(dice, "dice", add = FALSE, sort = FALSE)
R> head(events)
Thus, a new data frame with 6 columns corresponding to the six possible events of throwing a die would be obtained.
We would have to add the argument sep = in case of factor variables composed of several events. As a second example, imagine that we tossed two coins in unison ten times into the air. The results could be "H,H", "T,H", "H,T", "T,T", each with the same probability. Therefore, to convert the events of each toss into elementary events, we use dichotomize() with the argument sep = ",".
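To illustrate what splitting by a separator amounts to, the following base R sketch builds the dichotomous columns for four such tosses by hand. It is not the implementation of dichotomize(); the data frame `tosses` and its column `toss` are invented for the example.

```r
# Four invented results of tossing two coins at once.
tosses <- data.frame(toss = c("H,H", "T,H", "H,T", "T,T"),
                     stringsAsFactors = FALSE)

# Split every composed value into its elementary events.
parts <- strsplit(tosses$toss, ",", fixed = TRUE)

# One dichotomous column per distinct elementary event.
events <- sort(unique(unlist(parts)))
incidence <- t(vapply(parts,
                      function(p) as.integer(events %in% p),
                      integer(length(events))))
colnames(incidence) <- events

print(incidence)
```

Each row is a toss and each column records whether heads or tails appeared at least once in it, which is the incidence matrix that the rest of the analysis starts from.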
The same function handles other separators, as in the following data frame:
R> frame <- data.frame(A = c("Man; Woman", "Woman; Woman", "Man; Man",
+    "Undet.; Woman; Man"))
R> data <- dichotomize(frame, "A", sep = "; ")
Finally, the function netCoin() can mix the nodes (extracted from the 'coin' object) with the edge list data frames in order to produce a 'netCoin' object, and if the argument dir = "directory" is used, a directory will be created with a graph within a web page whose main file is named index.html.
The 'netCoin' object has three methods: print() shows a sample (up to 6) of nodes and links with their attributes, summary() shows the basic statistics of the nodes, and plot() shows the corresponding graph in the computer's default browser.

R> plot(net)
Alternatively, the 'netCoin' object can be obtained directly from a binary incidence matrix with the allNet() function, where more than 40 arguments can be controlled, although the only required argument is the incidence matrix. However, if we want to obtain the directory with the graph, we must add the dir = "directory" argument. Other sources to obtain a 'netCoin' object are an 'igraph' object with the function fromIgraph(), and a data set with factor variables with the function surCoin(). See the four functions that obtain a 'netCoin' object in Figure 1.

Multigraph coincidence analysis with data of families from Renaissance Italy
The following example uses data about families from Renaissance Italy from Padgett and Ansell (1993). It consists of a data frame (families) with information about Italian families of the Renaissance, and another data frame (links) with the marriage and business bonds between families.
Two networks are generated representing the business and marriage bonds between the families with the following commands.
R> G <- allNet(incidence = links[links$link == "Marriage", -17],
+    nodes = families, layout = "md", criteria = "f", minL = 1,
+    size = "frequency", color = "seat",
+    main = "Marriage links between Italian families",
+    note = "Data source: Padgett & Ansell (1983)")

Argument   Meaning
incidence  A data frame that contains the incidence matrix.
nodes      A data frame with at least one vector of names.
layout     The algorithm selected for the network topology.
criteria   The statistical criteria to be used for the selection of the edges.
minL       Minimum value of the statistic to represent the edge in the graph.
size       Name of the vector with size in the nodes data frame.
color      Name of the vector with color variable in the nodes data frame.
main       Upper title of the graph.
note       Lower title of the graph.

Function          Description
dichotomize       Function to convert factor or character column(s) in a data frame into a set of dichotomous columns. Their names will correspond to the labels or text of every category.
coin              This function generates a 'coin' object from an incidence matrix data frame. A 'coin' object consists of a list with two elements: the number of scenarios, and a coincidence matrix of events, whose main diagonal contains the frequencies of the events and whose off-diagonal cells contain the joint frequencies of these events.
asNodes           From a 'coin' object, this function generates a data frame of nodes.
edgeList (sim)    Function to convert a coincidence matrix into an edge list calculating a variety of coincidence (proximity) metrics. The sim function produces the same information, but as a list of proximity matrices instead.
netCoin           The netCoin function produces an interactive 'netCoin' object from two data frames: one including nodes with attributes, and another one containing edges also with their own attributes.
multigraphCreate  This function produces an interactive multinetwork with several 'netCoin' objects.
fromIgraph        From an 'igraph' object, this function generates a 'netCoin' object.
toIgraph          With this function an 'igraph' object is generated from a 'netCoin' object.
allNet            Produces a 'netCoin' object from a data frame or a matrix with dichotomous values.
surCoin           Produces a 'netCoin' object from a data frame with factor variables, also accepting a 'tbl_df' class (see package haven).

R> H <- allNet(incidence = links[links$link == "Business", -17],
+    nodes = families, layout = "md", criteria = "f", minL = 1,
+    size = "frequencb", color = "seat",
+    main = "Business links between Italian families",
+    note = "Data source: Padgett & Ansell (1983)")
The 'netCoin' object G (as well as H, which is not shown) is composed of two data frames.
In the first (nodes) there are the families' attributes: frequency of marriage links (f.Marriages), frequency of business links (f.Business), a wealth index (wealth), number of priories held (priorates) and holding of at least one priorate (seat). In every row of the links data frame there are two families with a column indicating the existence of a link (coincidence) between them.
Once the two networks are ready, the function multigraphCreate() generates both graphs in the specified directory (see Figure 2).

Sanderson's analysis of species co-occurrences
This section uses one of the most renowned data examples in ecology. Charles Darwin compiled data about 13 species of finches and the 17 Galápagos Islands on which they could be found (Sanderson 2000).
We prepare the nodes' attributes (finches) and their incidences in the islands (Galapagos). Afterwards, we have to add the images in a specific directory in order to refer to them in the allNet() function.
• maxL = 0.05: Maximum value of the statistic to include the edge in the list.
• lwidth = "Haberman": Name of the vector with width variable in the links data frame.
• lweight = "Haberman": Name of the vector with weight variable in the links data frame.
• image = "file": Name of the vector with image files in the nodes data frame.

[Printed 'netCoin' object: title "Species coincidences in Galapagos Islands", 13 nodes, source note Sanderson (2000).]
In this example, the only attributes of the nodes are frequency, percentage (%) and type. The column specs has been suppressed because it is used to create the images from the image file names. More importantly, the attributes of the links are 1) frequencies, that is, the number of coincidences of the source and target finches, and 2) p (Fisher), which is the probability of error when rejecting the null hypothesis that two species do not coincide on the islands (scenarios) in favor of the one-sided alternative.
Once the 'netCoin' object is ready, the function plot() generates its graphical representation in a temporary directory (see Figure 3), or in the directory specified in the dir argument. In this way, all the necessary files to be deposited in a web server are saved so that anyone can view them and interact with them using a browser.

R> plot(Net)

Graphical comparison of two networks
netCoin can also be used to graphically compare networks of co-occurrences. For instance, the previous graph of the Galápagos Islands finches (Net) can be compared with a random null model obtained from the same data with the function cooc_null_model() of the EcoSimR package (Gotelli, Hart, and Ellison 2015). Among the possibilities offered by this program, we opted for the nullity of co-occurrences and the Sim9 algorithm, which is a sequential swap (Gotelli 2000; Strona, Nappo, Boccacci, Fattorini, and San-Miguel-Ayanz 2014).
Once the theoretical or null model is randomly obtained (nullData), it can be analyzed and represented with the command allNet(), assessing the significance of its co-occurrence links. Beforehand, in order to better compare the empirical data obtained by Darwin with the random null model data, the positions of the nodes of the null model are set using those of the empirical model. After using the hypergeometric distribution (criteria = "hyp") and a level of significance of 0.05 (maxL = 0.05), the new graph (NullNet) only has two co-occurrences out of the possible 78 (paired combinations of 13 finches).

Survey analysis
Another interesting use for netCoin is that of survey analysis applied to explore relationships between variables, including those from multiple choice questions. The straightforward analysis shown below uses the package haven (Dusa and Thiem 2020) to read an SPSS (IBM Corporation 2017) survey demo file. Three variables are selected for the analysis: gender, inccat (income category in thousands) and carcat (primary vehicle price category).
The plot() function is applied to the result of the surCoin() function with those three variables as inputs. This produces the graph in Figure 5, where the male node is connected to the lowest and highest incomes as well as the economy and luxury vehicle categories. On the other hand, the female node is linked to income categories in the middle range and either the standard or the luxury vehicle price category.
R> library("haven")
R> survey <- read_spss(file = "demo.sav")
R> variables <- c("gender", "inccat", "carcat")
R> plot(surCoin(survey, variables, communities = "Louvain"))
The results show faster times for parallelDist when the number of cases or events is smaller. But when the number of cells (cases times events) grows, then netCoin offers better results as shown by Figure 6. As time grows exponentially with the number of cells, time is represented by its logarithmic values in this figure.
The package produces interactive graphs that work well with up to 1500 edges. Using more than 1500 edges makes the interaction with the graph slow due to browser memory limitations.

Concluding comments
The netCoin package offers an opportunity for the interactive analysis and visualization of data sets composed of every kind of data insofar as variables are dichotomized. It contains a large variety of similarity measures to connect the events that co-occur in the same scenarios.
In order to select the relevant coincidences, netCoin incorporates two models of probability: the normal distribution through the Haberman residuals for a large number of scenarios, and the hypergeometric model for small data collections. Its main aim is to represent coincidences through a graph, which is particularly useful when many events are to be analyzed. By means of routines from igraph, netCoin can reproduce different types of layouts and obtain communities with various algorithms, which facilitate the analysis and interpretation of coincidences. Data are then converted into D3 interactive graphs with controls enabling an interactive event analysis that can be shared with users online.