AMR: An R Package for Working with Antimicrobial Resistance Data

Antimicrobial resistance is an increasing threat to global health. Evidence for this trend is generated in microbiological laboratories through testing microorganisms for resistance against antimicrobial agents. International standards and guidelines are in place for this process as well as for reporting data on (inter-)national levels. However, there is a gap in the availability of standardized and reproducible tools for working with laboratory data to produce the required reports. It is known that extensive efforts in data cleaning and validation are required when working with data from laboratory information systems. Furthermore, the global spread and relevance of antimicrobial resistance demand the incorporation of international reference data into the analysis process. In this paper, we introduce the AMR package for R, which aims to close this gap by providing tools that simplify antimicrobial resistance data cleaning and analysis while incorporating international guidelines and scientifically reliable reference data. The AMR package enables standardized and reproducible antimicrobial resistance analyses, including the application of evidence-based rules, determination of first isolates, translation of various codes for microorganisms and antimicrobial agents, determination of (multi-drug) resistant microorganisms, and calculation of antimicrobial resistance, prevalence, and future trends. The AMR package works independently of any laboratory information system and provides several functions to integrate into international workflows (e.g., WHONET software provided by the World Health Organization).


Introduction
Antimicrobial resistance is a global health problem and of great concern for human medicine, veterinary medicine, and the environment alike. It is associated with significant burdens for both patients and health care systems. Current estimates show the immense scale of the problem: antimicrobial resistance already claims at least 50,000 lives each year across Europe and the United States alone (O'Neill 2014). Although estimates of the burden of antimicrobial resistance and their projections are disputed (De Kraker, Stewardson, and Harbarth 2016), the rising trend is undeniable (CDC 2019), calling for worldwide efforts to tackle this problem.
Surveillance programs and reliable data are key for controlling and streamlining these efforts. Surveillance data of antimicrobial resistance at higher levels (national or international) usually comprise aggregated numbers. The basis of this information is generated and stored at local microbiological laboratories where isolated microorganisms are tested for their susceptibility to a whole range of antimicrobial agents. The efficacy of these agents against microorganisms is nowadays interpreted as follows (EUCAST 2019):
• S: Susceptible, standard dosing regimen. There is a high likelihood of therapeutic success using a standard dosing regimen of an antimicrobial agent.
• I: Susceptible, increased exposure. There is a high likelihood of therapeutic success, but only when exposure to an antimicrobial agent is increased by adjusting the dosing regimen or its concentration at the site of infection.
• R: Resistant. There is a high likelihood of therapeutic failure.
Generally, antimicrobial resistance is defined as the proportion of resistant microorganisms (R) among all tested microorganisms of the same species (R + S + I). Today, the two major guideline institutes that define the international standards on antimicrobial resistance are the European Committee on Antimicrobial Susceptibility Testing (EUCAST, Leclercq et al. 2013) and the Clinical and Laboratory Standards Institute (CLSI, Clinical and Laboratory Standards Institute 2014). The guidelines from these two institutes are adopted by 94% of all countries reporting antimicrobial resistance to the WHO (World Health Organization 2018a).
Although these standardized guidelines are in place on the laboratory level for the data generation process, data stored in laboratory information systems are often not yet suitable for data analysis. Laboratory information systems are often designed to fit billing purposes rather than epidemiological data analysis. Furthermore, (inter-)national surveillance is hindered by inadequate standardization of epidemiological definitions and by differences in the types of samples and data collection, the settings included, microbiological testing methods (including susceptibility testing), and data sharing policies (Tacconelli et al. 2018). The necessity of accurate data analysis in the field of antimicrobial resistance was recently underlined again (Limmathurotsakul et al. 2019). Antimicrobial resistance analyses require a thorough understanding of microbiological tests and their results, the biological taxonomy of microorganisms, the clinical and epidemiological relevance of the results, their pharmaceutical implications, and (inter-)national standards and guidelines for working with and reporting antimicrobial resistance.
incorporating scientifically reliable reference data about valid laboratory outcomes, antimicrobial agents, and the complete biological taxonomy of microorganisms. The AMR package provides solutions and support for these aspects while being independent of underlying laboratory information systems, thereby democratizing the analysis process. Developed in R and available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=AMR since February 22, 2018 (Berends et al. 2022), the AMR package enables reproducible workflows as described in other fields, such as environmental science (Lowndes et al. 2017). The AMR package provides a new technical instrument to aid in curbing the global threat of antimicrobial resistance. Furthermore, local and regional data in the laboratories can now become relevant for public health in any setting.
While no other R package for dealing with antimicrobial resistance data is available on CRAN or Bioconductor, the AMR package may be integrated into workflows of related packages. For example, the R Epidemics Consortium (RECON) provides high-quality packages for data analysis in infectious disease outbreaks or epidemics (for example incidence and epicontacts; Jombart et al. 2020; Nagraj, Jombart, Randhawa, Sudre, Campbell, and Crellen 2021). In addition, on the laboratory side, the antibioticR package provides approaches to work with disk diffusion zone diameter and minimum inhibitory concentration data from environmental samples (Petzoldt 2021). We aim to provide a comprehensive and standardized toolbox for antimicrobial resistance data processing and analysis, with a focus on microbiological, clinical, and epidemiological purposes, which was previously missing.
The following sections describe the functionality of the AMR package according to its core functionalities for transforming, enhancing, and analyzing antimicrobial resistance data using scientifically reliable reference data.

Antimicrobial resistance data
Microbiological tests can be performed on different specimens, such as blood or urine samples or nasal swabs. After arrival at the microbiological laboratory, the specimens are traditionally cultured on specific media, such as blood agar. If a microorganism can be isolated from these media, it is tested against several antimicrobial agents. Based on the minimal inhibitory concentration (MIC) of the respective agent and interpretation guidelines, such as those by EUCAST (Leclercq et al. 2013) and CLSI (Clinical and Laboratory Standards Institute 2014), test results are reported as "resistant" (R), "susceptible" (S), or "susceptible, increased exposure" (I). A typical data structure is illustrated in Table 1 (Leclercq et al. 2013).

patient  date        test_no  specimen  mo        PEN         AMC           CIP
1        2019-03-08  100      blood     esccol    R           I             S
1        2019-03-09  101      blood     esccol    R           I             S
2        2019-03-08  102      blood     StaAur    > 8 (R)*    < 0.01 (S)*   .
3        2019-03-08  103      urine     P. aeru.  R           S**           S

For the first two rows, the information should be read as: Escherichia coli (mo = esccol) was isolated from the blood of patient 1 and was found to be resistant to penicillin and susceptible to amoxicillin/clavulanic acid and ciprofloxacin. However, data are often reported in ambiguous formats (especially when sources are merged), as exemplified in Table 2. It is crucial that source data can be analyzed in a reliable way, especially when the outcome will be used to evaluate patient treatment options. This requires reproducible, field-specific data cleaning and transformation.
The AMR package aims to provide a standardized and automated way of cleaning, transforming, and enhancing these typical data structures (Tables 1 and 2), independent of the underlying data source. Processed data would then be similar to Table 3, which highlights several package functionalities described in the sections below.

Working with taxonomically valid microorganism names
Coercion is the computational process of converting an input into a value of a specific type or class. For microorganism names, coercing user input to taxonomically valid microorganism names is crucial to ensure correct interpretation and to enable grouping based on taxonomic properties. To this end, the AMR package includes all microbial entries from The Catalogue of Life (https://www.catalogueoflife.org/), the most comprehensive and authoritative global index of species currently available (Bánki et al. 2022). It holds essential information on the names, relationships, and distributions of more than 1.9 million species. Its integration into the AMR package is described in Appendix A.
The as.mo() function uses these underlying data to transform a character vector into the new class 'mo', which represents taxonomically valid microorganism names. The resulting values are microbial IDs, which are human-readable for the trained eye and contain information about the taxonomic kingdom, genus, species, and subspecies (Figure 1).
The as.mo() function compares the user input with taxonomically valid microorganism names, rates each match with a score, and returns the result with the highest score. Matching scores are based on the pathogenicity of a microorganism in humans and on the resemblance between the input and its full taxonomic name.
Figure 1: The structure of a typical microbial ID as used in the AMR package. An ID consists of two to four elements, separated by an underscore. The first element is the abbreviation of the taxonomic kingdom. The remaining elements consist of abbreviations of the lowest taxonomic levels of every microorganism: genus, species (if available) and subspecies (if available). Abbreviations used for the microbial IDs of microorganism names were created using the base R function abbreviate().
As a result, "E. coli" returns the microbial ID of Escherichia coli (m = 0.688, a highly prevalent microorganism in humans) and not Entamoeba coli (m = 0.079, a less prevalent microorganism in humans), although the latter would come first alphabetically. The matching score function is available to users as mo_matching_score().
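To illustrate why a prevalence-aware score can override plain string similarity, the following base R sketch ranks two candidate names for the input "E. coli". It is a hypothetical toy score, not the formula behind mo_matching_score(): resemblance is derived from the Levenshtein distance computed by utils::adist(), and the prevalence weights (1 for common in humans, 3 for rare) are invented for illustration.

```r
# Hypothetical toy matching score; NOT the AMR package's mo_matching_score().
# resemblance = 1 - Levenshtein distance / length of the longer string
# score       = resemblance / prevalence_weight (assumed: 1 = common, higher = rarer)
toy_match_score <- function(input, candidates, prevalence_weight) {
  dist <- as.vector(utils::adist(input, candidates, ignore.case = TRUE))
  resemblance <- 1 - dist / pmax(nchar(input), nchar(candidates))
  score <- resemblance / prevalence_weight
  names(score) <- candidates
  sort(score, decreasing = TRUE)
}

candidates <- c("Escherichia coli", "Entamoeba coli")

# String resemblance alone slightly favours Entamoeba coli ...
unweighted <- toy_match_score("E. coli", candidates, prevalence_weight = c(1, 1))
# ... but an (assumed) prevalence weight puts Escherichia coli first.
weighted <- toy_match_score("E. coli", candidates, prevalence_weight = c(1, 3))

print(round(unweighted, 3))
print(round(weighted, 3))
```

In this toy, "Entamoeba coli" is closer to "E. coli" in edit distance (8 versus 10 edits), so only the prevalence weight flips the ranking; the package's real score combines these ingredients in its own way.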
If any coercion rules are applied, a warning is printed to the console, and scores can be reviewed by calling mo_uncertainties(), which prints all other matches with their matching scores. Users can furthermore control the coercion rules by setting the allow_uncertain argument of the as.mo() function. The following values or levels can be used:
• 0: no additional rules are applied;
• 1: allow previously accepted (but now invalid) taxonomic names and minor spelling errors;
• 2: allow all of level 1, strip values between brackets, reverse the word order of the input, and strip off text elements from the end while keeping at least two elements;
• 3: allow all of levels 1 and 2, strip off text elements from the end, and allow any part of a taxonomic name;
• TRUE (default): equivalent to 2;
• FALSE: equivalent to 0.
To support organization-specific microbial IDs, users can specify a custom reference 'data.frame' using as.mo(..., reference_df = ...). Users can also automate this process with the set_mo_source() function.

Working with antimicrobial names or codes
The AMR package includes the antibiotics data set, which comprises common laboratory information system codes, official names, anatomical therapeutic chemical (ATC) codes, defined daily doses (DDD), and more than 5,000 trade names of 464 antimicrobial agents (see Appendix A). The ATC code system and the reference list for DDDs have been developed and made available by the World Health Organization Collaborating Centre for Drug Statistics Methodology (WHOCC) to standardize pharmaceutical classifications (WHO Collaborating Centre for Drug Statistics Methodology 2018). All agents in the antibiotics data set have a unique antimicrobial ID, which is based on abbreviations used by the European Antimicrobial Resistance Surveillance Network (EARS-Net), the largest publicly funded system for antimicrobial resistance surveillance in Europe (European Centre for Disease Prevention and Control 2018). Furthermore, the AMR package includes the antivirals data set containing antiviral agents, which is also described in Appendix A.

Properties of antimicrobial agents
It is a common task in microbiological data analyses (and other clinical or epidemiological fields) to work with different antimicrobial agents. The AMR package provides several functions to translate inputs such as ATC codes, abbreviations, or names in any direction. Using as.ab(), any input will be transformed to an antimicrobial ID of class 'ab'. Helper functions are available to get specific properties of antimicrobial IDs, such as ab_name() for the official name, ab_atc() for the ATC code, or ab_cid() for the compound ID (CID) used by PubChem (Kim et al. 2019). Trade names can also be used as input. For example, the input values "Amoxil", "dispermox", "amox", and "J01CA04" all return the ID of amoxicillin (AMX).
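At its core, this kind of translation is a lookup against a reference table. The base R sketch below shows the idea with a hypothetical two-row table; the function toy_as_ab() and the objects ab_ref and ab_synonyms are invented for illustration and are far smaller than the package's antibiotics data set. The ATC codes J01CA04 (amoxicillin) and J01GB03 (gentamicin) are the WHO codes; the synonym lists are illustrative.

```r
# Hypothetical miniature reference table (illustration only)
ab_ref <- data.frame(
  ab   = c("AMX", "GEN"),
  name = c("Amoxicillin", "Gentamicin"),
  atc  = c("J01CA04", "J01GB03"),
  stringsAsFactors = FALSE
)
# Illustrative synonyms and trade names per ID (lower case)
ab_synonyms <- list(AMX = c("amoxil", "dispermox", "amox"),
                    GEN = c("garamycin", "gent"))

# Map any input (ID, name, ATC code, or synonym) to an ID; NA if unknown
toy_as_ab <- function(x) {
  key <- tolower(trimws(x))
  vapply(key, function(v) {
    hit <- tolower(ab_ref$ab) == v | tolower(ab_ref$name) == v |
      tolower(ab_ref$atc) == v |
      vapply(ab_synonyms[ab_ref$ab], function(s) v %in% s, logical(1))
    if (any(hit)) ab_ref$ab[which(hit)[1]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

toy_as_ab(c("Amoxil", "dispermox", "amox", "J01CA04", "Gentamicin"))
```

Because every column of the table is searched, the same lookup works "in any direction"; retrieving a property such as the name or ATC code is then a matter of indexing ab_ref by the returned ID.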

Selecting and filtering data based on classes of antimicrobial agents
The application of the ATC classification system also enables grouping of antimicrobial agents for data analyses. Data sets with microbial isolates can be filtered on isolates with specific results for tested antimicrobial agents in a specific antimicrobial class. For example, carbapenems() can be used to select columns or filter rows based on any of the 14 available antimicrobial agents in the group of carbapenems according to the antibiotics data set.
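Selecting columns and filtering rows by antimicrobial class can likewise be sketched in base R, assuming a hypothetical lookup vector ab_class that maps antimicrobial IDs to their class (IPM = imipenem, MEM = meropenem). This is not how carbapenems() is implemented; it only shows the selection and filtering logic.

```r
# Hypothetical mapping of antimicrobial IDs to classes (illustration only)
ab_class <- c(IPM = "carbapenems", MEM = "carbapenems",
              AMX = "penicillins", GEN = "aminoglycosides")

isolates <- data.frame(
  mo  = c("esccol", "klepne", "pseaer"),
  AMX = c("R", "R", "R"),
  GEN = c("S", "S", "R"),
  IPM = c("S", "R", NA),
  MEM = c("S", "S", "R"),
  stringsAsFactors = FALSE
)

# Column selection: keep only columns whose ID belongs to the carbapenem class
carbapenem_cols <- intersect(names(isolates), names(ab_class)[ab_class == "carbapenems"])
print(isolates[, c("mo", carbapenem_cols)])

# Row filtering: isolates resistant to at least one carbapenem
# (%in% treats NA as "not R", so no NA row indices are produced)
any_carbapenem_r <- Reduce(`|`, lapply(isolates[carbapenem_cols], function(v) v %in% "R"))
resistant <- isolates[any_carbapenem_r, ]
print(resistant)
```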

Working with antimicrobial susceptibility test results
Minimal inhibitory concentrations (MICs) are susceptibility test results, measured with microbiological laboratory equipment, that indicate the lowest concentration of an antimicrobial drug that inhibits the visible growth of a microorganism. These concentrations are often capped at a minimum and maximum, for example ≤0.02 µg/ml and ≥32 µg/ml, respectively. The 'mic' class, an ordered 'factor' containing valid MIC values, keeps these operators while still ordering all possible outcomes correctly, so that, e.g., "<= 0.02" is considered lower than "0.04".
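The ordering idea behind the 'mic' class can be approximated in base R by stripping the operators, sorting by the remaining number, and using the sorted strings as the levels of an ordered factor. This is a minimal sketch, not the package's implementation; in particular, it does not decide how a capped value such as "<= 0.02" should rank against a plain "0.02".

```r
# Minimal sketch, not the AMR package's 'mic' class: build an ordered factor
# whose levels follow the numeric value behind each MIC string.
mic_values <- c("0.04", "<= 0.02", ">= 32", "1", "0.5", "0.04")

# Strip the operators and spaces to recover the numeric part
mic_numeric <- as.numeric(gsub("[<>= ]", "", mic_values))

# Levels sorted by numeric value (ties such as "<= 0.02" vs "0.02"
# are not resolved in this sketch)
mic_levels <- unique(mic_values[order(mic_numeric)])
mic <- factor(mic_values, levels = mic_levels, ordered = TRUE)

print(levels(mic))
print(mic[2] < mic[1])  # "<= 0.02" compared with "0.04"
```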
Another susceptibility testing method is the use of drug diffusion disks, which are small tablets containing a specified concentration of an antimicrobial agent. These disks are applied onto a solid growth medium or a specific agar plate. After 24 hours of incubation, the diameter of the growth inhibition around a disk can be measured in millimeters with a ruler. The 'disk' class can be used to clean these kinds of measurements, since they should always be valid numeric values between 6 and 50. The supported minima and maxima of valid values for both classes, 'mic' and 'disk', are displayed in Table 4.
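The cleaning rule for disk diffusion diameters stated above (valid values lie between 6 and 50) can be sketched in a few lines of base R. The function toy_as_disk() is hypothetical and is not the package's as.disk(); it simply extracts the numeric part and sets anything outside the valid range to NA.

```r
# Minimal sketch, not the AMR package's 'disk' class: keep only values that
# parse as numbers between 6 and 50 (mm); everything else becomes NA.
toy_as_disk <- function(x) {
  value <- suppressWarnings(as.numeric(gsub("[^0-9.]", "", as.character(x))))
  value[!is.na(value) & (value < 6 | value > 50)] <- NA
  value
}

toy_as_disk(c("18 mm", "25", "3", "60", "not tested"))
```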
The higher the MIC or the smaller the growth inhibition diameter, the more active substance of an antimicrobial agent is needed to inhibit cell growth, i.e., the higher the antimicrobial resistance against the tested antimicrobial agent. At high MICs and small diameters, guidelines interpret the microorganism as "resistant" (R) to the tested antimicrobial agent. At low MICs and wide diameters, guidelines interpret the microorganism as "susceptible" (S) to the tested antimicrobial agent. In between, the microorganism is classified as "susceptible, increased exposure" (I). For these three interpretations the 'rsi' class has been developed. When using as.rsi() on MIC values (of class 'mic') or disk diffusion diameters (of class 'disk'), the values will be interpreted according to the guidelines from the CLSI or EUCAST (Clinical and Laboratory Standards Institute 2019; The European Committee on Antimicrobial Susceptibility Testing 2020, all guidelines between 2011 and 2021 are included in the AMR package). Guidelines can be changed by setting the guidelines argument.
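Interpretation itself amounts to comparing each measurement with two breakpoints, in opposite directions for MICs (higher means more resistant) and zone diameters (smaller means more resistant). The sketch below uses the "S ≤ x, R > y" (MIC) and "S ≥ x, R < y" (zone diameter) notation of EUCAST breakpoint tables. The breakpoint values are hypothetical placeholders; as.rsi() instead looks up the guideline-, organism-, and agent-specific breakpoints.

```r
# Hypothetical breakpoints for illustration only; NOT from EUCAST or CLSI tables.
toy_interpret_mic <- function(mic, s_max, r_above) {
  # S if MIC <= s_max, R if MIC > r_above, otherwise I
  out <- ifelse(mic <= s_max, "S", ifelse(mic > r_above, "R", "I"))
  factor(out, levels = c("S", "I", "R"))
}

toy_interpret_disk <- function(diameter, s_min, r_below) {
  # S if diameter >= s_min, R if diameter < r_below, otherwise I
  out <- ifelse(diameter >= s_min, "S", ifelse(diameter < r_below, "R", "I"))
  factor(out, levels = c("S", "I", "R"))
}

mic_result  <- toy_interpret_mic(c(0.25, 1, 4, 16), s_max = 0.5, r_above = 8)
disk_result <- toy_interpret_disk(c(30, 20, 12), s_min = 25, r_below = 15)
print(mic_result)
print(disk_result)
```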

Class <rsi>
[1] R
When applied to existing antimicrobial interpretations, the as.rsi() function tries to coerce the input to the values "R", "S", or "I". These values can in turn be used to calculate the proportion of antimicrobial resistance.

Interpretative rules by EUCAST
Next to supplying guidelines to interpret raw MIC values, EUCAST has developed a set of expert rules to assist clinical microbiologists in the interpretation and reporting of antimicrobial susceptibility tests (Leclercq et al. 2013). The rules comprise assistance on intrinsic resistance, exceptional phenotypes, and interpretive rules. The AMR package covers the intrinsic resistance and interpretive rules for data transformation and standardization purposes. The former prevent false susceptibility reporting by providing a list of organisms with known intrinsic resistance to specific antimicrobial agents (e.g., cephalosporin resistance of all enterococci). Interpretative rules apply inference from established resistance mechanisms (Winstanley and Courvalin 2011; Courvalin 1992, 1996; Livermore, Winstanley, and Shannon 2001). Both groups of rules are based on classic IF-THEN statements (e.g., IF Enterococcus spp. are resistant to ampicillin, THEN also report them as resistant to imipenem). Some rules provide assistance for further actions when certain resistance has been detected, e.g., performing additional testing of the isolated microorganism. The AMR package function eucast_rules() can apply all EUCAST rules that do not rely on additional clinical information, such as information on patients' diagnoses. Tables 2 and 3 highlight the transformation of the reported AMX = S in patient_id = 000003 to the correct report according to EUCAST rules, AMX = R. Of note, however, EUCAST rules overwrite original data to correct for the difference in how antimicrobial agents affect the tested microorganism in vitro (in the laboratory) and in vivo (in the human body). This requires users to collaborate closely with the data source provider to ensure correct versioning, backward compatibility, and reproducibility, and to take into account specific local regulations for resistance reporting.
Typical scenarios where changes to the original data points apply include in vitro test results indicating susceptibility when resistance in vivo is known. The changes are based on scientific consensus to ensure reliable high-quality reporting of antimicrobial susceptibility results. All changes to the data are printed to the console and can also be reviewed in detail by setting the argument eucast_rules(..., verbose = TRUE).
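A single IF-THEN rule of the kind quoted above can be sketched in base R to show what "overwriting original data" and "printing all changes" mean in practice. The helper apply_toy_rule() is hypothetical and applies only the example rule from the text (Enterococcus spp. resistant to ampicillin, AMP, are also reported as resistant to imipenem, IPM); eucast_rules() applies the full, versioned rule set.

```r
isolates <- data.frame(
  mo  = c("Enterococcus faecalis", "Enterococcus faecium", "Escherichia coli"),
  AMP = c("R", "S", "R"),
  IPM = c("S", "S", "S"),
  stringsAsFactors = FALSE
)

# One rule only: IF Enterococcus spp. resistant to ampicillin (AMP)
# THEN also report as resistant to imipenem (IPM).
apply_toy_rule <- function(df) {
  hit <- startsWith(df$mo, "Enterococcus ") & df$AMP %in% "R" & !(df$IPM %in% "R")
  if (any(hit)) {
    message(sprintf("Rule changed %d imipenem result(s) to R", sum(hit)))
  }
  df$IPM[hit] <- "R"
  df
}

result <- apply_toy_rule(isolates)
print(result)
```

Because R copies the data frame on modification, the original isolates object is left untouched, which makes it easy to compare the data before and after rule application.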
EUCAST rules are subject to regular updates, which are implemented in the AMR package by the AMR maintenance team shortly after publication. The eucast_rules() function supports versioning of the rules: the arguments version_breakpoints and version_expertrules can be set to current or previous versions. By default, the eucast_rules() function uses the latest implemented version.

Working with defined daily doses (DDD)
DDDs are essential for standardizing the analysis of antimicrobial consumption and for enabling inter-institutional or international comparison. The official DDDs are published by the WHOCC (WHO Collaborating Center for Drug Statistics Methodology 2019). Updates to the official publication are monitored by the AMR maintenance team and implemented in the antibiotics data set included in the AMR package. Other metrics exist, such as the recommended daily dose (RDD) or the prescribed daily dose (PDD). However, DDDs are the only metric that is independent of a patient's disease and therapeutic choices and are thus suitable for the AMR package.
Functions from the atc_online_*() family take as input any text that can be coerced with as.ab() (i.e., to class 'ab'). Next, the functions access the WHOCC online registry (WHO Collaborating Center for Drug Statistics Methodology 2019, internet connection required) and download the property defined in the arguments (e.g., administration = "O" or administration = "P" for oral or parenteral administration, and property = "ddd" or property = "groups" to get the DDD or the group of the selected antimicrobial as defined by its ATC code).

Determining first isolates
Determining antimicrobial resistance or susceptibility can be done for a single agent (monotherapy) or multiple agents (combination therapy). The calculation of antimicrobial resistance statistics is dependent on two prerequisites: the data should only comprise the first isolates and a minimum required number of 30 isolates should be met for every stratum in further analysis (Clinical and Laboratory Standards Institute 2014).
An isolate is a microorganism strain cultivated on specified growth media in a laboratory so that its phenotype can be determined. First isolates are the first isolates of each species found in a patient per episode, regardless of the body site or the type of specimen (such as blood or urine) (Clinical and Laboratory Standards Institute 2014). Selecting first isolates (using the function first_isolate()) is important to prevent selection bias, since including repeated isolates would lead to overestimated or underestimated resistance to an antimicrobial agent. For example, if a patient is admitted with a multi-drug resistant microorganism and that microorganism is found in five different blood cultures the following week, including all isolates in the analysis would overestimate resistance. The episode length in days can be set with the argument episode_days, which defaults to 365 as suggested by the Clinical and Laboratory Standards Institute (2014) guideline.
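The core of the first-isolate selection can be sketched in base R: sort by patient, microorganism, and date, then keep an isolate only if it opens a new episode. The helper toy_first_isolate() is hypothetical and much simpler than first_isolate(); as an assumption of this sketch, a new episode starts once at least episode_days have passed since the start of the previous one. The example data mirror the scenario above: one admission culture, five follow-up cultures in the same week, and one isolate more than a year later.

```r
# Hypothetical helper, not the package's first_isolate(): flags the first
# isolate per patient and microorganism within each episode of episode_days.
toy_first_isolate <- function(patient, mo, date, episode_days = 365) {
  date <- as.Date(date)
  keep <- logical(length(date))
  last_key <- ""
  episode_start <- as.Date(NA)
  for (i in order(patient, mo, date)) {
    key <- paste(patient[i], mo[i], sep = "|")
    # New patient/microorganism combination, or the previous episode has ended
    if (key != last_key || as.numeric(date[i] - episode_start) >= episode_days) {
      keep[i] <- TRUE
      episode_start <- date[i]
      last_key <- key
    }
  }
  keep
}

patient <- c(rep(1, 7), 2)
mo      <- c(rep("esccol", 7), "staaur")
date    <- c("2019-03-08", "2019-03-09", "2019-03-10", "2019-03-11",
             "2019-03-12", "2019-03-14", "2020-04-01", "2019-03-08")
first <- toy_first_isolate(patient, mo, date)
print(first)
```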

Determining multi-drug resistant organisms (MDRO)
Definitions of multi-drug resistant organisms (MDRO) are regulated by national and international expert groups and differ between nations. The AMR package provides the functionality to quickly identify MDROs in a data set using the mdro() function. Guidelines can be set with the argument guideline. By default, it applies the guideline proposed by Magiorakos et al. (2012). Their work describes the definitions for bacteria being "MDR" (multidrug-resistant), "XDR" (extensively drug-resistant), or "PDR" (pan-drug-resistant). These definitions are widely adopted (Abat, Fournier, Jimeno, Rolain, and Raoult 2018) and known in the field of medical microbiology.
Some guidelines require a minimum availability of tested antimicrobial agents per isolate. This is needed to prevent false negatives, since no reliable determination can be made from only a few test results. This required minimum defaults to 50%, but can be set by the user with the pct_required_classes argument. Isolates that do not meet this requirement are skipped and return NA (not available), with an informative warning printed to the console.
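The combination of a minimum-availability check and a per-isolate classification can be sketched in base R for the MDR criterion of Magiorakos et al. (2012): non-susceptibility to at least one agent in at least three antimicrobial categories, where both I and R count as non-susceptible. The helper toy_mdr(), its argument min_class_fraction, and the four-agent category mapping are hypothetical; the sketch also ignores intrinsic resistance, which the published definitions exclude, and is not the package's mdro().

```r
# Hypothetical agent-to-category mapping (four agents only, for illustration)
agent_class <- c(AMX = "penicillins", CIP = "fluoroquinolones",
                 GEN = "aminoglycosides", SXT = "folate pathway inhibitors")

toy_mdr <- function(results, agent_class, min_class_fraction = 0.5) {
  tested <- names(results)[!is.na(results)]
  n_tested <- length(unique(agent_class[tested]))
  n_required <- floor(min_class_fraction * length(unique(agent_class)))
  if (n_tested < n_required) {
    warning("too few antimicrobial categories tested; returning NA")
    return(NA_character_)
  }
  # Count I and R as non-susceptible, as in the Magiorakos et al. definitions
  nonsusceptible <- names(results)[results %in% c("I", "R")]
  if (length(unique(agent_class[nonsusceptible])) >= 3) "MDR" else "Negative"
}

iso1 <- c(AMX = "R", CIP = "R", GEN = "I", SXT = "S")  # 3 categories non-susceptible
iso2 <- c(AMX = "R", CIP = "S", GEN = "S", SXT = NA)   # only 1 category
iso3 <- c(AMX = "R", CIP = NA,  GEN = NA,  SXT = NA)   # too few categories tested
toy_mdr(iso1, agent_class)
toy_mdr(iso2, agent_class)
```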
The rules are applied per row of the data. The mdro() function automatically identifies the variables containing the microorganism codes and antimicrobial agents based on the guess_ab_col() function. Following the guideline set by the user, it analyzes the specific antimicrobial resistance of a microorganism and flags that microorganism accordingly. The outcome is demonstrated in Table 5, where the first row is an MDRO according to the Dutch guidelines (Werkgroep Infectiepreventie 2011).
The returned value is an ordered 'factor' with the levels Negative < Positive, unconfirmed < Positive. For some guideline rules that require additional testing (e.g., molecular confirmation), the level Positive, unconfirmed is returned.

Multi-drug resistant tuberculosis
Tuberculosis is a major threat to global health caused by Mycobacterium tuberculosis (MTB) and is one of the top ten causes of death worldwide (World Health Organization 2018b). Exceptional antimicrobial resistance in MTB is therefore of special interest. To this end, the international WHO guideline for the classification of drug resistance in MTB (World Health Organization 2014) is included in the AMR package. The mdr_tb() function is a convenient wrapper around mdro(..., guideline = "TB"), which returns an ordered 'factor' with different levels than mdro() returns for other guidelines. The output contains the 'factor' levels Negative < Mono-resistant < Poly-resistant < Multi-drug-resistant < Extensive drug-resistant, following the WHO guideline.

Calculation of antimicrobial resistance
The AMR package contains several functions for fast and simple resistance calculations of bacterial or fungal isolates. A minimum number of available isolates is needed for the reliability of the outcome. The CLSI guideline suggests a minimum of 30 available first isolates irrespective of the type of statistical analysis (Clinical and Laboratory Standards Institute 2014). Therefore, this number is used as the default setting for any function in the package that calculates antimicrobial resistance or susceptibility, which can be changed with the minimum argument in all applicable functions.

Counts
The AMR package relies on the concept of tidy data (Wickham 2014), although it does not strictly follow its rules (one row per test rather than one row per observation). The functions that count available isolates are named after the resistance interpretation standards: count_S(), count_I(), and count_R() count isolates interpreted as "S", "I", and "R", respectively. Combinations of antimicrobial resistance interpretations can be counted with count_SI() and count_IR(). All these functions take a vector of interpretations of class 'rsi' (as discussed above) or a vector that is internally transformed with as.rsi(). The returned value is the number of isolates with the respective interpretation in the selected test column. All count_*() functions support quasi-quotation with pipes and grouped variables, and can be used with dplyr::summarize() (Wickham, François, Henry, and Müller 2022).

Proportions
Calculation of antimicrobial resistance is carried out by counting the number of first resistant isolates (interpretation of "R") and dividing it by the number of all first isolates, see Equation 1. This is implemented in the proportion_R() function. To calculate antimicrobial susceptibility, the number of susceptible first isolates (interpretation of "S" and "I") has to be counted and divided by the number of all first isolates, which is implemented in the proportion_SI() function. For convenience, the resistance() function is an alias of the proportion_R() function, and the susceptibility() function is an alias of the proportion_SI() function.
The functions proportion_R(), proportion_IR(), proportion_I(), proportion_SI(), and proportion_S() follow the same logic as the count_*() functions and all return a vector of class 'double' with a value between 0 and 1. The argument minimum defines the minimal allowed number of available (tested) isolates (default: minimum = 30). Any number below the set minimum will return NA with a warning.
For calculating the proportion (P) of antimicrobial resistance or susceptibility to one antimicrobial agent, the following equation is used:

$$ P(x, o) = \frac{\sum_{i=1}^{k} [x_i \in o]}{\sum_{i=1}^{k} [x_i \in \{R, S, I\}]} \qquad (1) $$

where P is the proportion of outcome o (that is either "R", "S", "I", or a combination of two of them), x is a character vector of length k only consisting of values "R", "S", or "I", and [x_i ∈ o] is the indicator function, returning 1 if the condition is true and 0 otherwise. The denominator only counts elements in the collection {R, S, I}, so that "wrong" elements in x (i.e., elements not being "R", "S", or "I") will not be counted. Thus, the theoretical antimicrobial susceptibility of the vector x = {S, S, I, R, R} is:

$$ P(x, o = \{S, I\}) = \frac{3}{5} = 0.6 $$

For the proportion of empiric susceptibility (s) for more than one antimicrobial agent, the calculation can be carried out in two ways (Table 6). The first method is to count the total number of first isolates where at least one agent was tested as "S" or "I" and divide it by the number of first isolates where any of the agents was tested (Equation 2). This method is used when setting only_all_tested = FALSE in the susceptibility() function:

$$ s(x, y) = \frac{\sum_{i=1}^{k} \big[x_i \in \{S, I\} \lor y_i \in \{S, I\}\big]}{\sum_{i=1}^{k} \big[x_i \in \{R, S, I\} \lor y_i \in \{R, S, I\}\big]} \qquad (2) $$

where x is a character vector only consisting of values "R", "S", or "I" (i.e., "agent A") and y is another character vector only consisting of values "R", "S", or "I" (i.e., "agent B").
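Equation 1, together with the minimum number of isolates described earlier, can be written in a few lines of base R. The helper toy_proportion() is a hypothetical sketch, not the package's proportion_*() code; it reproduces the worked value 3/5 = 0.6 and shows that entries outside {R, S, I} are ignored.

```r
# Equation 1 in base R (a sketch, not the package's proportion_*() code):
# numerator counts x_i in o, denominator counts x_i in {R, S, I} only.
toy_proportion <- function(x, o, minimum = 30) {
  n_valid <- sum(x %in% c("R", "S", "I"))
  if (n_valid < minimum) {
    warning("fewer than ", minimum, " valid results; returning NA")
    return(NA_real_)
  }
  sum(x %in% intersect(o, c("R", "S", "I"))) / n_valid
}

x <- c("S", "S", "I", "R", "R")
toy_proportion(x, o = c("S", "I"), minimum = 1)           # susceptibility
toy_proportion(c(x, "not tested"), o = "R", minimum = 1)  # resistance; invalid entry ignored
```

With the default minimum = 30, the five-element vector returns NA with a warning, matching the behaviour the text describes for too few isolates.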
The second method is to count the total number of first isolates where at least one agent was tested as "S" or "I" and where all agents were tested, and to divide it by the number of first isolates where all of the agents were tested (Equation 3). This method is used when setting only_all_tested = TRUE in the susceptibility() function:

$$ s(x, y) = \frac{\sum_{i=1}^{k} \big[(x_i \in \{S, I\} \lor y_i \in \{S, I\}) \land x_i \in \{R, S, I\} \land y_i \in \{R, S, I\}\big]}{\sum_{i=1}^{k} \big[x_i \in \{R, S, I\} \land y_i \in \{R, S, I\}\big]} \qquad (3) $$

Based on Equation 1, the overall resistance and susceptibility of antimicrobial agents like gentamicin (GEN) and amoxicillin (AMX) can be calculated using the following syntax. The example_isolates object is an example data set included in the AMR package, see Appendix A. The n_rsi() function is analogous to the n() function of the dplyr package: it counts the number of available isolates, but only includes observations with valid antimicrobial results (i.e., "R", "S", or "I"). The results show that combining gentamicin with amoxicillin would cover s(x = GEN, y = AMX) = 93.2% based on 1,921 out of 2,000 available isolates, which is 17.8 percentage points more than when treating with gentamicin alone (P(x = GEN, o = {S, I}) = 75.4%). With these functions, exact calculations can be done to evaluate the empiric success of treating infections with one or more antimicrobial agents.
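The difference between the two methods for combination therapy is easiest to see on a small example. The helper toy_combination_susceptibility() below is a hypothetical base R sketch of both equations, not the package's susceptibility(). In the example, isolate 4 was tested only for agent B (and found "S"): the first method counts it, whereas the second method drops it because not all agents were tested.

```r
# Equations 2 and 3 in base R (sketch only, not the package's susceptibility()).
toy_combination_susceptibility <- function(x, y, only_all_tested = FALSE) {
  si  <- c("S", "I")
  rsi <- c("R", "S", "I")
  covered <- x %in% si | y %in% si          # at least one agent S or I
  if (only_all_tested) {
    both_tested <- x %in% rsi & y %in% rsi  # Equation 3
    sum(covered & both_tested) / sum(both_tested)
  } else {
    any_tested <- x %in% rsi | y %in% rsi   # Equation 2
    sum(covered) / sum(any_tested)
  }
}

agent_a <- c("S", "R", "R", NA, "R")
agent_b <- c("R", "S", "R", "S", NA)
method_1 <- toy_combination_susceptibility(agent_a, agent_b)
method_2 <- toy_combination_susceptibility(agent_a, agent_b, only_all_tested = TRUE)
print(c(method_1 = method_1, method_2 = method_2))
```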

Design decisions
The AMR package follows the rationale of tidyverse packages as authored by Wickham et al. (2019). Most functions take a 'data.frame' or 'tibble' as input, support piping (%>%) operations, can work with quasi-quotations, and can be integrated into dplyr workflows, such as mutate() to create new variables and group_by() to group by variables. Although the AMR package integrates well into tidyverse workflows, it can also be used with base R only.
To this end, the AMR package was developed to be independent of any other R package, which helps ensure and maintain its sustainability.
The AMR package supports multiple languages. Currently supported languages are Danish, Dutch, English, French, German, Italian, Portuguese, Russian, Spanish and Swedish.
The system language is used if it is supported, but it can be overridden with options(AMR_locale = ...). Multi-language support affects the language-dependent output of functions such as mo_name(), mo_gramstain(), mo_type(), and ab_name().
The AMR package uses S3 classes, one of the object-oriented systems available in R. S3 classes allow functions to produce different types of output depending on the class of the input. The AMR package introduces five S3 classes ('mo', 'ab', 'rsi', 'mic', and 'disk') to make working with antimicrobial susceptibility data more convenient.
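How an S3 class changes the behaviour of generic functions can be shown with a minimal, hypothetical class toy_rsi: a constructor that cleans the input and sets the class attribute, and a print() method that R dispatches to automatically. This only illustrates the mechanism; the package's own 'rsi' class has its own constructor and methods.

```r
# Hypothetical S3 class 'toy_rsi' (illustration only; not the AMR 'rsi' class)
new_toy_rsi <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x[!x %in% c("S", "I", "R")] <- NA
  structure(factor(x, levels = c("S", "I", "R"), ordered = TRUE),
            class = c("toy_rsi", "ordered", "factor"))
}

# print() dispatches to this method because of the class attribute
print.toy_rsi <- function(x, ...) {
  cat("Class <toy_rsi>\n")
  print(noquote(as.character(x)))
  invisible(x)
}

res <- new_toy_rsi(c("s", " R", "x"))
print(res)
```

Because "ordered" and "factor" remain in the class vector, all other factor methods (levels(), as.character(), comparisons) keep working unchanged.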

Reproducible example
We consider the problem of working with antimicrobial resistance data from three different hospitals between 2011-01-01 and 2020-01-01. After loading the AMR package and additional tidyverse packages for data transformation and plotting, we load the example_isolates_unclean example data from the AMR package into the global environment and assign it a new name.
R> data <- data %>% mutate(
+   bacteria = as.mo(bacteria),
+   bacteria_name = mo_name(bacteria))
R> mo_uncertainties()
Matching scores are based on pathogenicity in humans and the resemblance between the input and the full taxonomic name. See `?mo_matching_score`.
In the next step, we can further enrich the data with additional microbial taxonomic data based on the bacteria variable, such as Gram stain and microorganism family.
R> data <- data %>%
+   mutate(gram_stain = mo_gramstain(bacteria),
+          family = mo_family(bacteria))
R> data %>% count(gram_stain)
The variables AMX, AMC, CIP, and GEN contain antimicrobial susceptibility test results. Each variable name is the abbreviation of the tested antimicrobial agent. The official names of the agents and additional information about them can be retrieved with the ab_info() function from the AMR package.

We then apply the EUCAST guidelines to the test results with eucast_rules():
R> data <- data %>% eucast_rules()
The output to the console lists the changes made to data:
The rules affected 508 out of 3,000 rows, making a total of 657 edits
=> added 0 test results
=> changed 657 test results
 - 11 test results changed from "S" to "I"
 - 473 test results changed from "S" to "R"
 - 85 test results changed from "I" to "R"
 - 19 test results changed from "I" to "S"
 - 33 test results changed from "R" to "I"
 - 36 test results changed from "R" to "S"
The data is now clean and ready for further analysis, for example, the identification of multi-drug resistant microorganisms. In this example, we use the Dutch guideline to determine multi-drug resistance (Werkgroep Infectiepreventie 2011).
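The following base R sketch is a simplified illustration of how such a determination can work. It counts the antimicrobial categories in which an isolate has non-susceptible results. It does not use the Dutch guideline from the text. Instead it uses the criterion of Magiorakos et al. (2012): non-susceptibility to at least one agent in at least three categories. The agent-to-category mapping and the function is_mdr() are assumptions made for illustration. The mdro() function of the AMR package implements the actual guidelines.

```r
# Hypothetical interpretations for three isolates
isolates <- data.frame(
  AMX = c("R", "R", "S"),
  AMC = c("R", "S", "S"),
  CIP = c("R", "R", "S"),
  GEN = c("S", "S", "S")
)

# Assumed mapping of agents to antimicrobial categories
categories <- c(AMX = "penicillins",
                AMC = "penicillins + beta-lactamase inhibitors",
                CIP = "fluoroquinolones",
                GEN = "aminoglycosides")

# An isolate counts as multi-drug resistant here if it is non-susceptible
# ("I" or "R") to at least one agent in at least `min_categories` categories.
is_mdr <- function(df, categories, min_categories = 3) {
  m <- as.matrix(df[names(categories)])
  nonsusceptible <- matrix(m %in% c("I", "R"), nrow = nrow(m))
  apply(nonsusceptible, 1, function(row) {
    length(unique(categories[row])) >= min_categories
  })
}

is_mdr(isolates, categories)  # TRUE FALSE FALSE
```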
As described in Section 4.1, the identification of first isolates is essential for reporting resistance patterns. Using the filter_first_isolate() function and proportion_df() in combination with group_by(), we obtain a complete resistance analysis per hospital, microorganism, first isolate, and tested antimicrobial agent in one call. The console output reports how many first isolates were identified and used in the filter.
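The principle behind first isolates can be illustrated with a simplified base R sketch. It flags first isolates per patient and microorganism and assumes a fixed episode of 365 days, which restarts at each new first isolate. The episode length is an illustrative choice. The function flag_first_isolates() and the example data are hypothetical. The first_isolate() and filter_first_isolate() functions of the AMR package implement the guideline-based algorithm, which may differ in its details.

```r
# Hypothetical isolates: one row per isolate
isolates <- data.frame(
  patient_id = c("P1", "P1", "P1", "P2", "P1"),
  mo = c("E. coli", "E. coli", "E. coli", "E. coli", "S. aureus"),
  date = as.Date(c("2019-01-01", "2019-03-01", "2020-02-01",
                   "2019-03-01", "2019-03-01"))
)

flag_first_isolates <- function(df, episode_days = 365) {
  # Sort so that isolates of one patient and microorganism are in date order
  df <- df[order(df$patient_id, df$mo, df$date), ]
  df$first <- FALSE
  key <- paste(df$patient_id, df$mo)
  for (k in unique(key)) {
    episode_start <- NULL
    for (i in which(key == k)) {
      # Days since the start of the current episode (Inf for the first isolate)
      days_since <- if (is.null(episode_start)) Inf else
        as.numeric(df$date[i] - episode_start, units = "days")
      if (days_since >= episode_days) {
        df$first[i] <- TRUE
        episode_start <- df$date[i]
      }
    }
  }
  df
}

result <- flag_first_isolates(isolates)
first_only <- result[result$first, ]  # keep first isolates only
```

In this example, the E. coli isolate of patient P1 from 2019-03-01 is excluded, because it falls within the 365-day episode that started on 2019-01-01.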

Discussion
For the first time, a free and open-source software solution is available that covers all aspects of working with antimicrobial resistance data. The AMR package provides functionality that enables standardized and reproducible workflows from raw laboratory data to publishable results, for research and clinical workflows alike. In the field of clinical microbiology and infectious diseases, research and clinical workflows are closely linked. For example, a research study on the prevalence of antimicrobial-resistant bacteria can have direct implications for the choice of antimicrobial agents for the treatment of patients. The AMR package was developed to be used in any research or clinical setting where data analysis of microorganisms, antimicrobial resistance, or antimicrobial agents is required.
Both researchers and clinicians rely on data from electronic laboratory information systems (LIS), in which laboratory test results are processed, stored, and archived. Although some commercial solutions exist for medical microbiological data analysis, they are not comprehensive enough to support antimicrobial resistance analysis in every clinical or research setting. Their costs are a further constraint in resource-limited settings. Moreover, researchers and clinicians who require data from multiple LIS sources for multi-center studies face major barriers that available commercial solutions cannot resolve.
Firstly, simple codes for microorganisms differ substantially between LIS, and presumably correct taxonomic names are often misspelled or outdated. We analyzed the taxonomic names of bacteria used in reports from seven different public health institutions that perform microbiological diagnostics in the Netherlands. We compared these names with an official, scientific, and up-to-date source for microbial taxonomy, the Catalogue of Life (Bánki et al. 2022). These institutions provide microbiological diagnostics for hospitals and primary care, covering 15% of the total Dutch population. All institutions reported outdated taxonomic names, with a maximum lag ranging between 34 and 41 years. Antimicrobial resistance guidelines are strongly based on microbial taxonomy: some rules apply only to a specific genus, others to a specific family. It is therefore crucial that this information is correct and updated in a timely manner. All institutions acknowledged that they had no standard operating procedure for maintaining their taxonomic reference data. Implementing and maintaining taxonomic data for these and other institutions has been challenging, since no common machine-readable, reliable, and up-to-date resource for microbial taxonomy was publicly available. The same holds true for reliable reference data on antimicrobial agents. The AMR package provides machine-readable reference data files for the complete microbial taxonomy and for more than 500 antimicrobial agents. Functions starting with mo_* and ab_* translate the names of microorganisms and antimicrobial agents between different LIS codes and other text codes. As a result, data sets from different sites can be merged with little effort.
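The following base R sketch shows why a shared reference is needed. Two hypothetical laboratories use different local codes for the same species, and their data sets can only be merged after these codes have been translated to a common name. The codes and the hand-written lookup table are hypothetical. In the AMR package, this translation is performed by as.mo() against the included taxonomic reference data, not by a manual table.

```r
# Hypothetical isolate counts from two laboratories using different local codes
site_a <- data.frame(code = c("ECO", "STAU"), n_isolates = c(120, 45),
                     stringsAsFactors = FALSE)
site_b <- data.frame(code = c("E.COLI", "S.AUR"), n_isolates = c(98, 30),
                     stringsAsFactors = FALSE)

# Hypothetical lookup table that maps every local code to one common name
lookup <- data.frame(
  code = c("ECO", "STAU", "E.COLI", "S.AUR"),
  name = c("Escherichia coli", "Staphylococcus aureus",
           "Escherichia coli", "Staphylococcus aureus"),
  stringsAsFactors = FALSE
)

# Translate the local codes, stack both sites, and count per microorganism
combined <- rbind(merge(site_a, lookup, by = "code"),
                  merge(site_b, lookup, by = "code"))
totals <- aggregate(n_isolates ~ name, data = combined, FUN = sum)
totals
```

Without the translation step, the two sites would report four distinct "microorganisms" instead of two.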
Secondly, antimicrobial resistance interpretation guidelines (Leclercq et al. 2013; Clinical and Laboratory Standards Institute 2014) and taxonomic definitions of microorganisms change constantly, and updates are continually published in dedicated peer-reviewed journals. This is further complicated by differences between local, regional, and national guidelines. Yet comparability and reproducibility across settings and over time are key in research and clinical practice. Three functions of the AMR package address the need for standardization and comparability: eucast_rules() applies guidelines to data, mdro() checks for multi-drug resistance according to guidelines, and first_isolate() determines first isolates according to guidelines. Together, they empower data analysts beyond the capabilities of their local LIS. The AMR package can also be used as an extra layer of data validation when retrieving raw data from an LIS. Overall, the functionality of the AMR package has the potential to improve data validity in clinical settings, to ease multi-center study workflows, and to improve research reporting practices. The inherently global nature of antimicrobial resistance requires researchers, clinicians, and policy makers to reach beyond the borders of their local laboratory. The AMR package can bridge these sources and further encourages open science principles through its open-source approach.
The AMR package also has limitations. It does not introduce novel statistical tests or models, nor does it add new analytical approaches to AMR research. The calculation of the proportion of isolates susceptible to more than one antimicrobial agent simultaneously (see Section 5.1) seems simple but is subject to unclear reporting in clinical practice (Schechner, Temkin, Harbarth, Carmeli, and Schwaber 2013; Ma et al. 2017). The lack of clearly defined algorithms can lead to co-resistance rates for more than one antimicrobial agent not being reported at all (Baur et al. 2017). The calculation can include isolates tested for only some of the agents (only_all_tested = FALSE) or only isolates tested for all agents (only_all_tested = TRUE). This choice can have an immediate clinical impact on patient care if it leads to one combination of antimicrobial agents being preferred over another. Therefore, the AMR package provides different algorithms to standardize this crucial calculation. Unfortunately, the scientific literature lacks an unambiguous methodology for determining the right algorithm. An analysis of the algorithms used in the AMR package and their clinical impact is in preparation.
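A simplified base R sketch shows how this choice alone can change the result. It assumes that an isolate counts as susceptible to a combination if it is susceptible ("S" or "I") to at least one of the agents. It also assumes that only_all_tested = FALSE includes every isolate with at least one test result. The function combined_susceptibility() and the data are hypothetical, and the exact inclusion rules of the AMR package may differ.

```r
# Hypothetical test results for two agents; NA means "not tested"
results <- data.frame(
  drug_a = c("S", "R", "R", NA, "S", "R"),
  drug_b = c("R", "R", NA, "S", NA, "R"),
  stringsAsFactors = FALSE
)

# Proportion of isolates susceptible to at least one of the agents.
# Assumption: "S" and "I" both count as susceptible.
combined_susceptibility <- function(df, only_all_tested = FALSE) {
  m <- as.matrix(df)
  n_tested <- rowSums(!is.na(m))
  keep <- if (only_all_tested) {
    n_tested == ncol(m)  # tested for every agent
  } else {
    n_tested > 0         # tested for at least one agent
  }
  susceptible <- apply(m, 1, function(r) any(r %in% c("S", "I")))
  sum(susceptible[keep]) / sum(keep)
}

combined_susceptibility(results, only_all_tested = FALSE)  # returns 0.5
combined_susceptibility(results, only_all_tested = TRUE)   # returns 1/3
```

With these six isolates, the proportion is 3 of 6 when isolates tested for only one agent are included, but 1 of 3 when only isolates tested for both agents are considered.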
Reliable information about antimicrobial resistance is vital for clinical decision-making in infectious diseases, since the outcomes of local antimicrobial resistance analyses support clinicians in choosing treatments for their patients. Moreover, when this information can be reliably stratified by, for example, year, hospital, and type of patient, it can lead to new insights for choosing the best antimicrobial therapy for patients suffering from infections. The AMR package enables this by providing all required analysis tools and can therefore support decision-making in infection management. The AMR package is already being applied to this end in six hospitals in the Netherlands. In at least one Dutch hospital, the choice of empirical antimicrobial treatment for septic non-post-surgical patients has been altered based on antimicrobial resistance data analyzed with the AMR package. Empirical treatment is the initial therapy, chosen before the infection-causing pathogen is known. The clinical effect of this adjustment is currently being studied. To improve the quality of such analyses, planned future developments include the implementation of an imputation algorithm specifically for antimicrobial agents, and methodological guidance for applying prediction modelling in a health care setting based on patient-specific properties.
Since the first package release, users from different public and private settings have been suggesting additional functionality, in particular the incorporation of country- or time-specific guidelines (e.g., Magiorakos et al. 2012). This community-centered development will be continued and maintained by researchers at the University Medical Center Groningen and data scientists at Certe Medical Diagnostics and Advice, both non-profit public health organizations located in Groningen, the Netherlands. Moreover, a group of contributors from five different Dutch health care institutions has been formed at the Dutch Association for Medical Microbiology (Nederlandse Vereniging voor Medische Microbiologie, NVMM). This group also peer-reviews major changes to the package, including the implementation of guideline updates. In this way, updates required by scientific developments are made while reproducibility is consistently maintained. Updates to the databases and guidelines included in the AMR package are incorporated on a regular and automated basis, while preserving version control. Any function making use of guidelines (e.g., eucast_rules()) refers to the latest implemented version of the guideline by default.
The aim of the AMR package is to provide a comprehensive toolbox for antimicrobial resistance data processing and analysis in clinical practice and research, on an institution- and country-independent scale. The toolbox offers solutions that are required according to international standards but were not available to date.

Summary
This paper demonstrates the AMR package and its use for working with antimicrobial resistance data. It can be used to clean, enhance, and analyze such data according to (inter)national recommendations and guidelines while incorporating scientifically reliable reference data on microbiological laboratory test results, antimicrobial agents, and the biological taxonomy of microorganisms. Consequently, it allows for reproducible analyses, regardless of the many possible ways in which raw and uncleaned data are stored in laboratory information systems.
While the burden of antimicrobial resistance is increasing worldwide, reliable data and data analyses are needed to better understand current and future developments. Open-source approaches, such as the AMR package for R, have the potential to help democratize the required tools in the field for researchers, clinicians, and policy makers alike. In organizations or countries with very limited resources, this free and open-source package could also overcome a financial limitation that would otherwise hinder antimicrobial resistance analysis. Across settings, we believe the AMR package can be used to support clinical decision-making in infection management by providing improved insight into current local and regional resistance levels. Furthermore, data analysis approaches based on individual patient or microbiological data, which the AMR package enables, foster the empowerment of laboratory staff, infection control practitioners, and public health services.

Computational details
The results in this paper were obtained using R 4.2.0 in RStudio 2022.07.1 (RStudio, PBC 2022) with the AMR package 1.8.2, running under macOS Monterey 12.5.
R itself and all packages used are available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/. All development versions of the AMR package are available at https://github.com/msberends/AMR/.

A. Included data sets
• microorganisms: A 'data.frame' containing 70,760 (sub)species with 16 columns comprising their complete microbial taxonomy according to the Catalogue of Life (Bánki et al. 2022). Each included microorganism comes with its complete taxonomic tree from kingdom to subspecies, the year of scientific publication, and the responsible author(s). The data set covers:
  - All 59,024 (sub)species from the kingdoms of Archaea, Bacteria, Chromista, and Protozoa.
  - All 9,582 (sub)species from these orders of the kingdom of Fungi: Eurotiales, Onygenales, Pneumocystales, Saccharomycetales, Schizosaccharomycetales, and Tremellales.
  - All 2,153 (sub)species from 47 other relevant genera from the kingdom of Animalia (such as Strongyloides and Taenia).
  - All 14,338 previously accepted names of included (sub)species that have been taxonomically renamed.
  The kingdom of Fungi is a very large taxon with almost 300,000 different (sub)species, most of which are not microbial but macroscopic, such as mushrooms. Therefore, not all fungi fit the scope of the AMR package. By including only the aforementioned taxonomic orders, the most relevant fungi are covered (such as all species of Aspergillus, Candida, Cryptococcus, Histoplasma, Pneumocystis, Saccharomyces, and Trichophyton).
• antibiotics: A 'data.frame' containing 464 antibiotic agents with 14 columns. Every entry in this data set has three different identifiers:
  - a human-readable EARS-Net code, as used by ECDC (European Centre for Disease Prevention and Control 2010) and WHONET (WHO Collaborating Centre for Surveillance of Antimicrobial Resistance 2019). This is the code primarily used by this package.
  - an ATC code, as used by the WHO (WHO Collaborating Centre for Drug Statistics Methodology 2018).
  - a CID code (Compound ID), as used by PubChem (Kim et al. 2019).
  The data set contains more than 5,000 official brand names from many different countries, as found in PubChem. Other properties in this data set are derived from one or more of these codes, such as the official names of pharmacological and chemical subgroups and the defined daily doses (DDD).
• antivirals: A 'data.frame' containing 102 antiviral agents with 9 columns. Like the antibiotics data set, it contains ATC codes (as used by the WHO; WHO Collaborating Centre for Drug Statistics Methodology 2018) and CID codes (Compound ID, as used by PubChem; Kim et al. 2019), as well as the official name and defined daily dose (DDD) of each antiviral agent.
• example_isolates: A 'data.frame' containing test results of 2,000 microbial isolates. The data set reflects real patient data and can be used to practice AMR analysis. It is structured in the typical format of laboratory information systems, with one row per isolate and one column per tested antimicrobial agent (i.e., an antibiogram).
• example_isolates_unclean: A 'data.frame' containing test results of 3,000 microbial isolates that require cleaning before they can be used for analysis. This data set can be used to practice AMR analysis and is featured in Section 7.
• WHONET: A 'data.frame' containing 500 observations and 53 columns, with the same structure as an export file from the WHONET 2019 software (WHO Collaborating Centre for Surveillance of Antimicrobial Resistance 2019). As this example data set demonstrates, such files can be used with the AMR package. The antibiotic test results are taken from the example_isolates data set. All patient names were created using online surname generators and serve practice purposes only.
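Some LIS exports use a long format instead, with one row per isolate and tested agent. The following minimal base R sketch converts such data into the antibiogram layout described for example_isolates. The column names and data are hypothetical, and stats::reshape() is used here only as one possible approach.

```r
# Hypothetical long-format export: one row per isolate and tested agent
long <- data.frame(
  isolate_id = c(1, 1, 1, 2, 2),
  agent = c("AMX", "CIP", "GEN", "AMX", "GEN"),
  result = c("R", "S", "S", "S", "S"),
  stringsAsFactors = FALSE
)

# Reshape to one row per isolate and one column per agent (an antibiogram)
wide <- reshape(long, idvar = "isolate_id", timevar = "agent",
                v.names = "result", direction = "wide")
names(wide) <- sub("^result\\.", "", names(wide))  # "result.AMX" -> "AMX"
rownames(wide) <- NULL
wide
```

Agents that were not tested for an isolate (here CIP for isolate 2) become NA.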