Main Article Content
In many applications such as transaction data analysis, the classification of long chains of sequences is required. For example, brand purchase history in customer transaction data is in a form like AABCABAA, where A, B, and C are brands of a consumer product. The decision tree-based package mbonsai is designed to handle sequence data of varying lengths using one or multiple variables of interest as predictor variables. This software package uses tree growing and pruning strategies adopted from C4.5 and CART algorithms, and includes new features for handling sequence data and indexing for classification purpose. The software uses a simple command line program for learning and predicting processes, and has the ability to generate user-friendly graphics depicting decision trees. The underlying C++ codes are designed to efficiently process large data sets in ASCII files. Two examples from transaction data sets are used to illustrate the application of mbonsai.