\name{BHC-package}
\alias{BHC-package}
\alias{BHC}
\docType{package}
\title{
Bayesian Hierarchical Clustering
}
\description{
The BHC method performs bottom-up (agglomerative) hierarchical clustering. It uses a Dirichlet process mixture (an infinite mixture model) to model uncertainty in the data, and Bayesian model selection to decide at each step which clusters to merge. This addresses two limitations of traditional hierarchical clustering: the number of clusters does not have to be fixed in advance, and no ad hoc distance metric has to be chosen, because merges are decided by model-based posterior probabilities rather than by distances. This implementation accepts multinomial data (i.e. discrete data with two or more categories).
}
\details{
\tabular{ll}{
Package: \tab BHC\cr
Type: \tab Package\cr
Version: \tab 1.0\cr
Date: \tab 2008-04-02\cr
License: \tab GPL-3\cr
}
The Bayesian Hierarchical Clustering (BHC) algorithm is run as a black box via the \code{\link{bhc}} function (see its help page for usage details). All other functions in this package are called internally by \code{bhc()}; users should not need to call them directly.
}
\author{
Rich Savage (C++ code originally written for the binomial case by Yang Xu)

Maintainer: Rich Savage
}
\references{
Heller and Ghahramani (2005) \emph{Bayesian Hierarchical Clustering}. Gatsby Unit Technical Report GCNU-TR 2005-002; a shorter version appears in ICML 2005.

Savage et al. (2009) \emph{R/BHC: fast Bayesian hierarchical clustering for microarray data}. BMC Bioinformatics 10:242.
}
\keyword{package}
\examples{
require(graphics)
require(BHC)
require(affydata)
require(gcrma)

##LOAD AND PREPROCESS THE EXAMPLE DATA
data(Dilution)
ai <- compute.affinities(cdfName(Dilution))
Dil.expr <- gcrma(Dilution, affinity.info=ai, type="affinities")
testData <- exprs(Dil.expr)

##KEEP GENES WITH NON-ZERO VARIANCE ACROSS ARRAYS, THEN TAKE THE FIRST 100
##(apply() gives per-gene SDs; sd() no longer works column-wise on a matrix)
keep <- apply(testData, 1, sd) > 0
testData <- testData[keep, ]
testData <- testData[1:100, ]
geneNames <- row.names(testData)
nGenes <- nrow(testData)
nFeatures <- ncol(testData)
##rank-based discretisation below yields nFeatures distinct values per gene,
##so nFeatureValues must equal the number of arrays (here 4)
nFeatureValues <- 4

##NORMALISE EACH EXPERIMENT TO ZERO MEAN, UNIT VARIANCE
for (i in 1:nFeatures){
  newData <- testData[,i]
  newData <- (newData - mean(newData)) / sd(newData)
  testData[,i] <- newData
}
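
##DISCRETISE THE DATA ON A GENE-BY-GENE BASIS
##(each gene's values are replaced by their ranks 0, ..., nFeatures - 1,
## so with 4 arrays each value falls in its own quartile bin; ties are
## broken by order so that every value is an integer category)
for (i in 1:nGenes){
  newData <- testData[i,]
  newData <- rank(newData, ties.method="first") - 1
  testData[i,] <- newData
}

##PERFORM THE CLUSTERING
hc <- bhc(testData, geneNames, nFeatureValues=nFeatureValues)
plot(hc, axes=FALSE)
}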
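\section{Merge criterion}{
As a rough guide to the Bayesian model selection step mentioned in the description, the merge score defined by Heller and Ghahramani (2005) can be sketched as follows. This is a summary of that report, not a transcription of this package's C++ code, and the Dirichlet process concentration parameter \eqn{\alpha} shown here is not necessarily exposed as an argument of \code{bhc()}. For a candidate merge of clusters \eqn{D_i} and \eqn{D_j} into \eqn{D_k} containing \eqn{n_k} items, with \eqn{d = \alpha} at each leaf, define
\deqn{d_k = \alpha\Gamma(n_k) + d_i d_j, \qquad \pi_k = \frac{\alpha\Gamma(n_k)}{d_k}.}
The marginal likelihood of the merged tree mixes the hypothesis \eqn{H_1^k}, that all items in \eqn{D_k} come from a single cluster, with the alternative that they are partitioned as in the two subtrees:
\deqn{p(D_k \mid T_k) = \pi_k\, p(D_k \mid H_1^k) + (1 - \pi_k)\, p(D_i \mid T_i)\, p(D_j \mid T_j).}
The posterior probability of the merged hypothesis is then
\deqn{r_k = \frac{\pi_k\, p(D_k \mid H_1^k)}{p(D_k \mid T_k)}.}
At each step the pair with the highest \eqn{r_k} is merged; cutting the resulting tree wherever \eqn{r_k < 0.5} gives a number of clusters without fixing it in advance.
}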