The R Package bgmm: Mixture Modeling with Uncertain Knowledge

Przemyslaw Biecek; Ewa Szczurek; Martin Vingron; Jerzy Tiuryn

doi:10.18637/jss.v047.i03

Abstract

Classical supervised learning enjoys the luxury of access to the true labels of all observations in the modeled dataset. Real-life problems, however, often come with labels that are only partially defined, i.e., uncertain and given only for a subset of observations. Such partial labels can occur regardless of the knowledge source. For example, experimental assessment of labels may have limited capacity and is prone to measurement errors. Likewise, expert knowledge is often restricted to a specialized area and is thus unlikely to provide trustworthy labels for all observations in the dataset. Partially supervised mixture modeling can process such sparse and imprecise input. Here, we present an R package called bgmm, which implements two partially supervised mixture modeling methods: soft-label and belief-based modeling. For completeness, the package also provides unsupervised, semi-supervised, and fully supervised mixture modeling. Using real data, we demonstrate basic model fitting with bgmm in all modeling variants. The package can also be used to select the best-fitting model from a set of models with different numbers of components or different constraints on their structure. This functionality is demonstrated on an artificial dataset, which can be simulated in bgmm from a distribution defined by a given model.

Files:

Paper; R package (bgmm); R example code from the paper

Published:

Apr 17, 2012
