# Principal Coordinate Analysis (PCoA) using R programming

Principal Coordinate Analysis (PCoA) is a method for representing, on a 2- or 3-dimensional chart, objects described by a square matrix of resemblance indices (similarities or distances) between those objects.

This method is due to Gower (1966). It is sometimes called metric MDS (MDS: multidimensional scaling), as opposed to non-metric MDS, which is often simply called MDS. Both methods have the same objective and produce similar results if the distances in the matrix are metric and if the dimensionality is sufficient.
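
To get a feel for how close the two approaches can be, here is a small sketch that runs both on the built-in `eurodist` road-distance data set. It assumes the `MASS` package is available, since it uses `isoMDS()` for the non-metric variant; the choice of data set and of comparing inter-point distances is mine, for illustration only.

```r
library(MASS)  # provides isoMDS() for non-metric MDS

# Built-in road distances between 21 European cities (class "dist")
d <- eurodist

# Metric MDS / PCoA in two dimensions
metric <- cmdscale(d, k = 2)

# Non-metric MDS in two dimensions; by default isoMDS() starts
# from the cmdscale() configuration
nonmetric <- isoMDS(d, k = 2, trace = FALSE)

# Axis orientation is arbitrary in both methods, so compare the
# inter-point distances of the two configurations instead of raw coordinates
r <- cor(as.vector(dist(metric)), as.vector(dist(nonmetric$points)))
r
```

A correlation close to 1 means the two ordinations place the objects in nearly the same relative positions.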

PCoA works roughly like this; see the following example:

• You start with a distance matrix D.
• You transform D into A using A = (-1⁄2)*D², where the square is taken element by element.
• You center matrix A, resulting in matrix Delta, so that Delta[j,i] = A[j,i] - mean(j) - mean(i) + mean(A).
• mean(j) is the mean of row j of A.
• mean(i) is the mean of column i of A.
• You calculate the eigenvectors and eigenvalues of Delta (this results in n eigenvectors, with n being the number of samples).
• The eigenvectors are scaled using their corresponding eigenvalues.
• NewVector <- Eigenvector*sqrt(Eigenvalue)
• When the scaled eigenvectors are written as columns, the rows of the resulting table are the coordinates of the objects in PCoA space.
• In other words, the columns given by, for example, cmdscale are the scaled eigenvectors.

Here is the R code:

    > # Distance matrix for five objects
    > D<-matrix(c(0,3.16228, 3.16228, 7.07197, 7.07197, 3.16228, 0, 4.47214, 4.47214,
    + 6.32456, 3.16228, 4.47214, 0, 6.32456, 4.47214,
    + 7.07197, 4.47214, 6.32456, 0, 4.47214,
    + 7.07197, 6.32456, 4.47214, 4.47214, 0), nrow = 5)
    > D
            [,1]    [,2]    [,3]    [,4]    [,5]
    [1,] 0.00000 3.16228 3.16228 7.07197 7.07197
    [2,] 3.16228 0.00000 4.47214 4.47214 6.32456
    [3,] 3.16228 4.47214 0.00000 6.32456 4.47214
    [4,] 7.07197 4.47214 6.32456 0.00000 4.47214
    [5,] 7.07197 6.32456 4.47214 4.47214 0.00000
    > # Transform D into A (element-wise square)
    > A <- -1/2*D^2
    > A
               [,1]       [,2]       [,3]      [,4]      [,5]
    [1,]   0.000000  -5.000007  -5.000007 -25.00638 -25.00638
    [2,]  -5.000007   0.000000 -10.000018 -10.00002 -20.00003
    [3,]  -5.000007 -10.000018   0.000000 -20.00003 -10.00002
    [4,] -25.006380 -10.000018 -20.000030   0.00000 -10.00002
    [5,] -25.006380 -20.000030 -10.000018 -10.00002   0.00000
    > # Center A: subtract row and column means, add the overall mean
    > RowM <- rowMeans(A)
    > ColM <- colMeans(A)
    > RowM
    [1] -12.002555  -9.000015  -9.000015 -13.001289 -13.001289
    > ColM
    [1] -12.002555  -9.000015  -9.000015 -13.001289 -13.001289
    > M <- mean(A)
    > Delta <- A
    > for (i in 1:5){
    +   for (j in 1:5) {
    +     Delta[i,j] <- Delta[i,j] - RowM[i] - ColM[j] + M
    +   }
    + }
    > M
    [1] -11.20103
    > Delta
              [,1]       [,2]       [,3]        [,4]        [,5]
    [1,]  12.80408  4.8015296  4.8015296 -11.2035683 -11.2035683
    [2,]   4.80153  6.7989968 -3.2010213   0.8002532  -9.1997583
    [3,]   4.80153 -3.2010213  6.7989968  -9.1997583   0.8002532
    [4,] -11.20357  0.8002532 -9.1997583  14.8015458   4.8015277
    [5,] -11.20357 -9.1997583  0.8002532   4.8015277  14.8015458
    > # Eigen decomposition of the centered matrix
    > Eigen <- eigen(Delta)
    > Eigen
    $values
    [1]  3.600795e+01  2.000003e+01  6.582800e-06  4.729290e-15 -2.820049e-03

    $vectors
               [,1] [,2]          [,3]       [,4]       [,5]
    [1,] -0.5963431  0.0 -8.610832e-12 -0.4472136  0.6666145
    [2,] -0.2235630 -0.5  5.000000e-01  0.4472136 -0.5000196
    [3,] -0.2235630  0.5 -5.000000e-01 -0.4472136 -0.5000196
    [4,]  0.5217346 -0.5 -5.000000e-01 -0.4472136  0.1667123
    [5,]  0.5217346  0.5  5.000000e-01 -0.4472136  0.1667123

    > FirstEigenvector <- Eigen$vectors[,1]
    > FirstEigenvalue <- Eigen$values[1]
    > SecondEigenvector <- Eigen$vectors[,2]
    > SecondEigenvalue <- Eigen$values[2]
    > FirstEigenvector
    [1] -0.5963431 -0.2235630 -0.2235630  0.5217346  0.5217346
    > FirstEigenvalue
    [1] 36.00795
    > # Scale each eigenvector by the square root of its eigenvalue
    > PrincipalCoords <- data.frame(First = FirstEigenvector*sqrt(FirstEigenvalue),
    +                               Second = SecondEigenvector * sqrt(SecondEigenvalue))
    > PrincipalCoords
          First   Second
    1 -3.578454  0.00000
    2 -1.341526 -2.23607
    3 -1.341526  2.23607
    4  3.130753 -2.23607
    5  3.130753  2.23607
    > # The same coordinates come out of the built-in cmdscale()
    > CmdScale <- cmdscale(D, k =2)
    > CmdScale
              [,1]     [,2]
    [1,] -3.578454  0.00000
    [2,] -1.341526 -2.23607
    [3,] -1.341526  2.23607
    [4,]  3.130753 -2.23607
    [5,]  3.130753  2.23607
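
The loop-based centering above can also be written without explicit loops. The following is a minimal sketch of the same steps wrapped in a function; `pcoa_coords()` is a hypothetical helper name I made up for this post, not a function from base R or any package, and it assumes the first `k` eigenvalues are positive.

```r
# Distance matrix from the worked example
D <- matrix(c(0,       3.16228, 3.16228, 7.07197, 7.07197,
              3.16228, 0,       4.47214, 4.47214, 6.32456,
              3.16228, 4.47214, 0,       6.32456, 4.47214,
              7.07197, 4.47214, 6.32456, 0,       4.47214,
              7.07197, 6.32456, 4.47214, 4.47214, 0), nrow = 5)

# Hypothetical helper: PCoA coordinates on the first k axes
pcoa_coords <- function(D, k = 2) {
  A <- -0.5 * D^2
  # Double centering: subtract row means, subtract column means, add grand mean
  Delta <- sweep(sweep(A, 1, rowMeans(A)), 2, colMeans(A)) + mean(A)
  e <- eigen(Delta, symmetric = TRUE)
  # Scale the first k eigenvectors by the square roots of their eigenvalues
  e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
}

coords <- pcoa_coords(D, k = 2)
ref <- cmdscale(D, k = 2)

# The sign of an eigenvector is arbitrary, so compare the inter-point
# distances of both solutions rather than the raw coordinates
max(abs(dist(coords) - dist(ref)))
```

If the last value is practically zero, the hand-rolled version and `cmdscale` agree up to a possible flip of the axes.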

## Warnings

If a PCoA axis has a negative eigenvalue associated with it, scaling its eigenvector by the square root of that eigenvalue generates imaginary numbers, which prevents a Euclidean representation. Such eigenvalues may arise when using certain (dis)similarity measures that are semi-metric or non-metric, or that are non-Euclidean in some other way. To correct for these, the original dissimilarities need to be transformed so that small dissimilarities become larger relative to large dissimilarities. Taking the square root of the dissimilarities, or adding to all dissimilarities a constant large enough to remove the negative eigenvalues, are viable options (Legendre and Legendre 1998).
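
As a rough illustration, the distance matrix from the worked example happens to produce one slightly negative eigenvalue, so it can be used to try both corrections. This sketch relies on the `eig` and `add` arguments of `cmdscale`; with `add = TRUE`, `cmdscale` computes an additive constant and returns it as `ac`. Whether the square-root transformation removes every negative eigenvalue depends on the data, so inspect the result rather than assume it.

```r
# Distance matrix from the worked example
D <- matrix(c(0,       3.16228, 3.16228, 7.07197, 7.07197,
              3.16228, 0,       4.47214, 4.47214, 6.32456,
              3.16228, 4.47214, 0,       6.32456, 4.47214,
              7.07197, 4.47214, 6.32456, 0,       4.47214,
              7.07197, 6.32456, 4.47214, 4.47214, 0), nrow = 5)

# eig = TRUE returns all eigenvalues along with the coordinates
pcoa <- cmdscale(D, k = 2, eig = TRUE)
pcoa$eig          # one eigenvalue is slightly negative

# Option 1: square-root transformation of the dissimilarities
pcoa_sqrt <- cmdscale(sqrt(D), k = 2, eig = TRUE)
pcoa_sqrt$eig     # check whether any negative eigenvalues remain

# Option 2: let cmdscale() add a constant to the off-diagonal dissimilarities
pcoa_add <- cmdscale(D, k = 2, eig = TRUE, add = TRUE)
pcoa_add$ac       # the additive constant that was used
pcoa_add$eig      # eigenvalues after the correction
```

Either correction changes the dissimilarities, so the resulting coordinates are no longer directly comparable with those of the uncorrected analysis.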

Objects whose variable values introduce a large amount of variation into the overall data set may strongly influence the ordination, making patterns among the other objects less visible. It may be instructive to examine a PCoA solution that excludes such objects.
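
One simple way to do this is to drop the object's row and column from the distance matrix and rerun the analysis. The sketch below uses made-up coordinates (four nearby objects and one far-away object) purely for illustration.

```r
# Hypothetical data: objects 1-4 lie close together, object 5 is far away
X <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(10, 10))
D_all <- as.matrix(dist(X))

# PCoA on all objects: the first axis mostly separates object 5 from the rest
full <- cmdscale(D_all, k = 2)
full

# PCoA without object 5: remove its row and its column, then rerun
reduced <- cmdscale(D_all[-5, -5], k = 2)
reduced
```

In the full solution, objects 1 to 4 are squeezed together relative to the distance to object 5; in the reduced solution their arrangement fills the plot.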

The values of the objects along a PCoA axis of interest may be correlated (using an appropriate measure) with those of environmental variables to assess association. However, PCoA is a form of indirect gradient analysis; therefore, other methods, such as distance-based redundancy analysis (db-RDA, which I will explain in the next tutorial), are likely to offer more utility in assessing the influence of environmental variables.
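
A minimal sketch of such a correlation is shown here, using the distance matrix from the worked example. The environmental variable `env` is invented for illustration (one made-up value per object), and Spearman's rank correlation is just one possible choice of measure.

```r
# Distance matrix from the worked example
D <- matrix(c(0,       3.16228, 3.16228, 7.07197, 7.07197,
              3.16228, 0,       4.47214, 4.47214, 6.32456,
              3.16228, 4.47214, 0,       6.32456, 4.47214,
              7.07197, 4.47214, 6.32456, 0,       4.47214,
              7.07197, 6.32456, 4.47214, 4.47214, 0), nrow = 5)

# Object scores on the first two PCoA axes
scores <- cmdscale(D, k = 2)

# Hypothetical environmental variable, one made-up value per object
env <- c(2.1, 3.4, 3.0, 5.8, 6.1)

# Rank correlation between the first axis and the environmental variable;
# the sign depends on the arbitrary orientation of the axis
rho <- cor(scores[, 1], env, method = "spearman")
rho
```

Because the orientation of a PCoA axis is arbitrary, only the strength of the correlation is meaningful, not its sign.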

If you have any doubts, please mention them in the comments or send me an email at irrfankhann29@gmail.com