We load up library vegan
and we source (i.e. read in) a file that has some useful functions for exploring PCA outputs (this file is in the folder for Aula T23).
library(vegan)
source("brocardfunctions.R")
Recolheram-se dados de abundância de várias espécies de anfíbios em vários distritos de Portugal (DataTP10anfibios.csv). Efectue uma análise de componentes principais aos dados e retire as principais conclusões.
As usual, we begin by reading the data
anfibios<-read.csv2("DataTP10anfibios.csv")
and we check it all looks OK
str(anfibios)
## 'data.frame': 15 obs. of 7 variables:
## $ Distritos : Factor w/ 15 levels "AVEI","BEJA",..: 7 2 8 13 11 10 5 14 9 1 ...
## $ Rana : num 4.44 2.18 0.47 1.22 4.66 2.68 1.45 0 1 2.35 ...
## $ Triturus : num 0 0 0 2.45 0 0 1.88 0 0 0 ...
## $ Salamandra : num 35.96 41.1 35.38 20.92 5.12 ...
## $ Hyla : num 3.2 0 0.71 0 1.82 0 0 1.5 1.71 6.47 ...
## $ Bufo : num 4.8 16.4 14.8 44.1 23.6 ...
## $ Pleurodeles: num 16 0.48 9.3 11.46 20.76 ...
We have 15 observations for 7 variables, which correspond respectively to locations and taxa (genera of amphibians). The first column does not represent a genera, but it contains the labels of the locations
anfibios[,1]
## [1] EVOR BEJA FARO SETU LISB LEIR CBRA VISE GUAR AVEI COIM PORT BRAG BRCA
## [15] VREA
## 15 Levels: AVEI BEJA BRAG BRCA CBRA COIM EVOR FARO GUAR LEIR LISB ... VREA
The names of the genera are
names(anfibios)[2:7]
## [1] "Rana" "Triturus" "Salamandra" "Hyla" "Bufo"
## [6] "Pleurodeles"
We can also see and if some locations have much more amphibians than others
nbyloc=data.frame(loc=anfibios[,1],n=rowSums(anfibios[,2:7]))
nbyloc
## loc n
## 1 EVOR 64.40
## 2 BEJA 60.12
## 3 FARO 60.68
## 4 SETU 80.13
## 5 LISB 56.00
## 6 LEIR 64.44
## 7 CBRA 55.34
## 8 VISE 59.52
## 9 GUAR 44.71
## 10 AVEI 61.30
## 11 COIM 67.15
## 12 PORT 72.92
## 13 BRAG 63.86
## 14 BRCA 58.43
## 15 VREA 0.00
with Vila Real representing a really weird location, with no data (might that be a mistake?).
We can also evaluate what are the most abundant taxa,
colSums(anfibios[,2:7])
## Rana Triturus Salamandra Hyla Bufo Pleurodeles
## 21.99 5.44 146.68 36.63 471.83 186.43
By now, on lecture TP13, we should be fast at doing all the above.
Given that a PCA is a dimension reduction technique, we can see if the genera are highly correlated or not. The analysis will be more efficient if there are high correlations to explore (e.g. two variables with perfect correlation can be summarized in a single component!)
panel.hist <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(usr[1:2], 0, 1.5) )
h <- hist(x, plot = FALSE)
breaks <- h$breaks; nB <- length(breaks)
y <- h$counts; y <- y/max(y)
rect(breaks[-nB], 0, breaks[-1], y, col = "cyan", ...)
}
panel.corP <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
panel.corS <- function(x, y, digits = 2, prefix = "", cex.cor, ...)
{
usr <- par("usr"); on.exit(par(usr))
par(usr = c(0, 1, 0, 1))
r <- abs(cor(x, y,method="spearman"))
txt <- format(c(r, 0.123456789), digits = digits)[1]
txt <- paste0(prefix, txt)
if(missing(cex.cor)) cex.cor <- 0.8/strwidth(txt)
text(0.5, 0.5, txt, cex = cex.cor * r)
}
pairs(anfibios[,2:7], lower.panel = panel.smooth, upper.panel = panel.corP,
gap=0, row1attop=FALSE,diag.panel = panel.hist)
As it turns out, there is actually no correlation between any two variables larger than 0.65, and this is hinting to the fact that the PCA might not be very efficient to reduce the dimensionality of this data set.
For the PCA plots it might be useful to define row.names
in the object anfibios
, as these will be used by default in the plotting of biplots, hence making the plots easier to interpret.
row.names(anfibios)=anfibios[,1]
Now, we implement the PCA itself. There are several possible functions to do, and we begin by using princomp
. Note that we must remove the first column, as otherwise the labels would be used as a (factor) covariate in the PCA, which would be unreasonable.
acp1<-princomp(anfibios[,-1])
acp1
## Call:
## princomp(x = anfibios[, -1])
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 21.3241377 13.8202048 8.8379189 2.0249230 1.0881376 0.5173327
##
## 6 variables and 15 observations.
and we visualize the summary of the analysis
summary(acp1)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 21.324138 13.8202048 8.8379189 2.02492300
## Proportion of Variance 0.623434 0.2618644 0.1070897 0.00562166
## Cumulative Proportion 0.623434 0.8852984 0.9923880 0.99800971
## Comp.5 Comp.6
## Standard deviation 1.088137650 0.5173326744
## Proportion of Variance 0.001623361 0.0003669335
## Cumulative Proportion 0.999633067 1.0000000000
As we can see, we were able to extract 6 principal components, which is the number of genera available in the dataset. This is always the case, at most we extract as many PCs as original variables.
We can also explore some of the components of the PCA output, namely the loadings (i.e. the correlations between the PC’s and the original variables)
acp1$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Rana 0.375 0.920
## Triturus -0.239 0.195 -0.951
## Salamandra 0.471 -0.710 0.521
## Hyla 0.893 -0.330 -0.292
## Bufo -0.861 -0.255 0.434
## Pleurodeles 0.171 0.655 0.732
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.167 0.167 0.167 0.167 0.167 0.167
## Cumulative Var 0.167 0.333 0.500 0.667 0.833 1.000
Note there is an argument to the function that prints the loadings that can be used to force all loadings to appear, since by default only those above 0.1 are displayed
print(acp1$loadings,digits = 2, cutoff = 0.05)
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## Rana 0.06 0.38 0.92 0.10
## Triturus -0.24 0.20 -0.95
## Salamandra 0.47 -0.71 0.52
## Hyla -0.08 0.89 -0.33 -0.29
## Bufo -0.86 -0.26 0.43 -0.06
## Pleurodeles 0.17 0.66 0.73 -0.07
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## SS loadings 1.00 1.00 1.00 1.00 1.00 1.00
## Proportion Var 0.17 0.17 0.17 0.17 0.17 0.17
## Cumulative Var 0.17 0.33 0.50 0.67 0.83 1.00
the scores (the coordinates of the locations in the principal components space) - note one could use these to manually plot the results of a PCA
acp1$scores
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## EVOR 35.9372068 -9.447106 4.8185073 3.725788576 0.39758994
## BEJA 25.9322705 -26.132662 1.0883920 -1.005602272 0.47170637
## FARO 25.9645802 -15.938199 3.7883005 -0.737356805 -1.71864114
## SETU -5.6126292 -11.697835 10.6619966 -3.377098293 0.85090691
## LISB 6.1061455 10.813623 0.4620701 1.348352884 2.53012718
## LEIR 8.9848858 25.213559 10.5964813 -0.466393130 0.23644253
## CBRA 5.5366438 16.597692 2.8113376 -1.813620589 0.31169056
## VISE 6.2936806 19.570766 6.8724397 -0.427544499 -2.25862036
## GUAR -0.4735304 8.222119 -5.4756022 -0.578501670 -0.04296380
## AVEI -13.9550085 1.242643 -2.2288006 3.256310122 0.24966376
## COIM -30.6908939 -6.783789 -0.2957481 -0.008096814 -0.54770479
## PORT -28.3591494 -3.467991 2.1077881 2.418836310 -0.07153958
## BRAG -31.5838299 -8.565286 -1.7535031 -2.719565086 0.72190123
## BRCA -24.5776694 -6.527845 -5.5590466 1.756994063 -1.07362658
## VREA 20.4972974 6.900310 -27.8946126 -1.372502796 -0.05693222
## Comp.6
## EVOR -0.534750516
## BEJA 0.673293918
## FARO 0.200646515
## SETU -0.826266873
## LISB 0.511082088
## LEIR 0.709903936
## CBRA -1.076690623
## VISE 0.118997893
## GUAR 0.355027113
## AVEI -0.365408342
## COIM 0.462117161
## PORT -0.008871033
## BRAG 0.208693187
## BRCA -0.265844705
## VREA -0.161929720
and the standard deviation associated with each PC
acp1$sdev
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 21.3241377 13.8202048 8.8379189 2.0249230 1.0881376 0.5173327
We can plot the objects produced by a PCA in two different ways. The default plot
function presents a display of the amount of variation explained by each principal component. remember that ideally, the first few components will explain most of the variability!
par(mfrow=c(1,1),mar=c(4,4,0.5,0.5))
plot(acp1)
A second plot, representing sites and species in a biplot, is obtained via function biplot
biplot(acp1,cex=0.7)
Finally, we can use one of the bespoke functions in file brocardfunctions.R
to plot the variances associated with each PC and evaluate two criteria to choose the number of PCs to interpret
evplot(acp1$sdev^2)
Both criteria seem to agree that the more sensible would be to interpret just the first two PCs.
Note that we could have implemented the same analysis using function rda
in library vegan
, which produces slightly different output, but the same results
acpbyrda1<-rda(anfibios[,-1])
acpbyrda1
## Call: rda(X = anfibios[, -1])
##
## Inertia Rank
## Total 781.5
## Unconstrained 781.5 6
## Inertia is variance
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6
## 487.2 204.6 83.7 4.4 1.3 0.3
The summary of this object gives us direct access to something that was hidden above, namely the coordinates of the covariates in the PC space
summary(acpbyrda1)
##
## Call:
## rda(X = anfibios[, -1])
##
## Partitioning of variance:
## Inertia Proportion
## Total 781.5 1
## Unconstrained 781.5 1
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5
## Eigenvalue 487.1988 204.6408 83.6880 4.393193 1.268618
## Proportion Explained 0.6234 0.2619 0.1071 0.005622 0.001623
## Cumulative Proportion 0.6234 0.8853 0.9924 0.998010 0.999633
## PC6
## Eigenvalue 0.2867497
## Proportion Explained 0.0003669
## Cumulative Proportion 1.0000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 10.2273
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Rana -0.24279 0.04961 -0.18772 0.287830 0.378957 -0.0188070
## Triturus 0.04546 -0.01272 -0.08416 -0.183302 0.080555 0.1862655
## Salamandra -3.80663 -3.71720 -1.74307 0.008117 -0.018248 -0.0002021
## Hyla 0.63170 -0.17200 0.06577 0.685012 -0.135978 0.0572872
## Bufo 6.95334 -1.33604 -1.45330 -0.042191 0.009485 -0.0070138
## Pleurodeles -1.38213 3.42864 -2.45037 0.021880 -0.028093 0.0008849
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## EVOR -4.45029 -1.8051 -1.43972 4.85876 0.9649 2.72958
## BEJA -3.21133 -4.9933 -0.32520 -1.31139 1.1447 -3.43677
## FARO -3.21533 -3.0454 -1.13190 -0.96158 -4.1708 -1.02418
## SETU 0.69504 -2.2351 -3.18569 -4.40403 2.0650 4.21760
## LISB -0.75616 2.0662 -0.13806 1.75837 6.1401 -2.60877
## LEIR -1.11264 4.8176 -3.16612 -0.60822 0.5738 -3.62364
## CBRA -0.68563 3.1714 -0.84000 -2.36512 0.7564 5.49587
## VISE -0.77938 3.7395 -2.05341 -0.55756 -5.4812 -0.60741
## GUAR 0.05864 1.5710 1.63605 -0.75442 -0.1043 -1.81220
## AVEI 1.72812 0.2374 0.66594 4.24651 0.6059 1.86519
## COIM 3.80061 -1.2962 0.08837 -0.01056 -1.3292 -2.35883
## PORT 3.51186 -0.6626 -0.62978 3.15437 -0.1736 0.04528
## BRAG 3.91119 -1.6366 0.52393 -3.54655 1.7519 -1.06526
## BRCA 3.04358 -1.2473 1.66098 2.29127 -2.6055 1.35698
## VREA -2.53829 1.3185 8.33462 -1.78986 -0.1382 0.82656
We can use a function in brocardfunctions.R
to make a clean plot of this object
cleanplot.pca(acpbyrda1)
and this representation is slightly more useful as it shows us that the variables most important (those which the arrows go outside the “control” circle) are Pleurodeles
, Salamandra
and Bufo
(question: what do these 3 taxa share? They were the most abundant! This is not good.). This last genera name is not printed in the plot, for reasons which are unclear (a possible bug in the function). Nonetheless, the rda
function creates an object that can help us, because it provides the coordinates of the variables in the PC space. We could therefore plot these in PC space if we so wanted and check that the variable that was not printed above was Bufo
plot(summary(acpbyrda1)$species[,1:2],type="n",xlim=c(-5,7))
text(summary(acpbyrda1)$species[,1:2],names(anfibios[,-1]))
Note that the results of the princomp
and of rda
might look different, for example, the sites scores. The sites scores in the first component are
#princomp
acp1$scores[,1]
## EVOR BEJA FARO SETU LISB LEIR
## 35.9372068 25.9322705 25.9645802 -5.6126292 6.1061455 8.9848858
## CBRA VISE GUAR AVEI COIM PORT
## 5.5366438 6.2936806 -0.4735304 -13.9550085 -30.6908939 -28.3591494
## BRAG BRCA VREA
## -31.5838299 -24.5776694 20.4972974
#rda
summary(acpbyrda1)$sites[,1]
## EVOR BEJA FARO SETU LISB LEIR
## -4.45028832 -3.21132583 -3.21532690 0.69504061 -0.75615526 -1.11264441
## CBRA VISE GUAR AVEI COIM PORT
## -0.68563095 -0.77937869 0.05863969 1.72812015 3.80061052 3.51185867
## BRAG BRCA VREA
## 3.91118736 3.04357864 -2.53828528
but what we should keep in mind is that signs in scores are not possible to interpret, and in fact magnitudes are not easily interpreted either. It is only relative to each other that they can be interpreted. And relative to each other, the correlations between the sites scores across methods is perfect - as we can see below all 1 or -1 - so the results just look different at first, but are exactly the same!
cor(acp1$scores[,1],summary(acpbyrda1)$sites[,1])
## [1] -1
cor(acp1$scores[,2],summary(acpbyrda1)$sites[,2])
## [1] 1
cor(acp1$scores[,3],summary(acpbyrda1)$sites[,3])
## [1] -1
cor(acp1$scores[,4],summary(acpbyrda1)$sites[,4])
## [1] 1
cor(acp1$scores[,5],summary(acpbyrda1)$sites[,5])
## [1] 1
cor(acp1$scores[,6],summary(acpbyrda1)$sites[,6])
## [1] -1
Finally, we should note that it might be more sensible, given the data were abundances, to scale the data prior to the analysis. If we do that, say with the rda
function, this is what we get
acpbyrda2<-rda(scale(anfibios[,-1]))
acpbyrda2
## Call: rda(X = scale(anfibios[, -1]))
##
## Inertia Rank
## Total 6
## Unconstrained 6 6
## Inertia is variance
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6
## 2.3287 1.3351 1.1423 0.8014 0.2688 0.1238
As noted above, the summary of this object gives us direct access to the coordinates of the covariates in the PC space
summary(acpbyrda2)
##
## Call:
## rda(X = scale(anfibios[, -1]))
##
## Partitioning of variance:
## Inertia Proportion
## Total 6 1
## Unconstrained 6 1
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Eigenvalue 2.3287 1.3351 1.1423 0.8014 0.26875 0.12379
## Proportion Explained 0.3881 0.2225 0.1904 0.1336 0.04479 0.02063
## Cumulative Proportion 0.3881 0.6106 0.8010 0.9346 0.97937 1.00000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 3.0274
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Rana -0.70518 -0.5605 0.35220 -0.71704 0.26273 0.09418
## Triturus -0.05266 1.0845 -0.09913 -0.54777 0.10755 -0.16477
## Salamandra -0.74030 -0.2325 -0.81806 -0.34136 -0.37349 -0.01298
## Hyla 0.95051 -0.6409 0.15236 -0.31626 -0.09727 -0.28401
## Bufo 1.07507 0.2143 0.12125 -0.43920 -0.22054 0.26382
## Pleurodeles -0.67071 0.1960 0.95069 0.02149 -0.36379 -0.05137
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## EVOR -1.11472 -1.16822 -0.2980 -0.9783 -0.05659 -1.1615
## BEJA -0.80364 -0.48583 -1.5122 -0.1381 -0.18387 1.0847
## FARO -0.59632 -0.16407 -1.1444 0.5814 -1.21466 -0.1498
## SETU -0.19929 1.71236 -0.6032 -1.2792 -0.12597 -0.1142
## LISB -0.64957 -0.67438 0.9097 -0.6987 1.13161 1.0599
## LEIR -0.83183 0.04880 1.4780 0.3464 -0.54002 0.9706
## CBRA -0.50210 1.43906 0.6390 -0.3091 0.37601 -1.0968
## VISE -0.21402 0.29417 0.8576 1.1450 -1.39824 -0.4538
## GUAR 0.03154 -0.01817 0.2578 0.6898 0.30029 0.3949
## AVEI 0.63058 -0.78602 0.4346 -0.5638 0.33398 -0.7674
## COIM 1.18052 -0.15962 -0.1215 0.1496 -0.42285 0.6488
## PORT 1.07042 -0.62408 0.3282 -0.5670 -0.21849 -0.1281
## BRAG 0.94564 0.88104 -0.3997 -0.2379 0.34510 1.0314
## BRCA 1.26232 -0.41002 -0.2108 0.1312 -0.16822 -0.7446
## VREA -0.20951 0.11499 -0.6152 1.7287 1.84190 -0.5739
Make a clean plot of the output
cleanplot.pca(acpbyrda2)
The results are now considerably different, and this means that prior results were (inadvertently) dominated by the most abundant species. Typically, that is not desired, and therefore this might be a better analysis to interpret.
In this case we might need more than 2 PCs to interpret. Can you explain why this is a better analysis and yet it does not manage to represent as well the original information in just 2 PC’s?
evplot(acpbyrda2$CA$eig)
Efectue uma análise de componentes principais aos dados das características de diferentes serras portuguesas (DataTP10serras.csv), com base na matriz de variância/covariância e na matriz de correlação. Comente os principais resultados obtidos em cada uma das análises e refira em que circunstâncias cada uma destas abordagens seria mais adequada.
We read in the data
serras<-read.csv("DataTP10serras.csv", sep=";")
as before, check it is all OK
str(serras)
## 'data.frame': 15 obs. of 6 variables:
## $ Serra : Factor w/ 15 levels "Acor","Alvao",..: 5 8 13 1 15 11 12 3 2 7 ...
## $ Altitude : int 1993 1548 1486 1418 1416 1415 1382 1362 1283 1227 ...
## $ Perc.Floresta: int 20 21 18 17 16 15 25 32 45 54 ...
## $ Dispon.agua : int 5 6 7 8 9 5 4 10 12 14 ...
## $ Matos : int 75 48 76 98 56 47 56 35 34 54 ...
## $ Incendios : int 3 4 6 3 6 5 4 12 15 14 ...
We have 15 observations for 6 variables, which correspond respectively to serras and their characteristics. The first column does not represent a a characteristic, but it contains the labels of the serras.
As above, we begin with exploring the data
serras[,1]
## [1] Estrela Geres Montesinho Acor Soajo Marao
## [7] Montemuro Amarela Alvao Gardunha Lousa Bornes
## [13] Padrela Falperra Gralheira
## 15 Levels: Acor Alvao Amarela Bornes Estrela Falperra Gardunha ... Soajo
The names of the serras are
names(serras)[2:6]
## [1] "Altitude" "Perc.Floresta" "Dispon.agua" "Matos"
## [5] "Incendios"
What are the correlations between descriptors?
pairs(serras[,2:6], lower.panel = panel.smooth, upper.panel = panel.corP,gap=0, row1attop=FALSE,diag.panel = panel.hist)
Some of the correlations are quite high, as high as 0.95, so the PCA technique might be efficient.
As before, we add the right labels to the row.names
row.names(serras)=serras[,1]
Now, we implement the PCA itself, first on the covariance matrix
acp.serras<-princomp(serras[,-1])
#this is the same as
#acp.serras<-princomp(serras[,-1],cor=FALSE)
acp.serras
## Call:
## princomp(x = serras[, -1])
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## 215.332725 15.664152 11.295886 2.235848 1.010098
##
## 5 variables and 15 observations.
We can look at the summary of the resulting object
summary(acp.serras)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 215.3327250 15.664151562 11.295885593 2.2358479945
## Proportion of Variance 0.9918929 0.005248781 0.002729514 0.0001069373
## Cumulative Proportion 0.9918929 0.997141723 0.999871237 0.9999781741
## Comp.5
## Standard deviation 1.010098e+00
## Proportion of Variance 2.182589e-05
## Cumulative Proportion 1.000000e+00
We can also look at some of the components of the object created
acp.serras$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Altitude 0.997
## Perc.Floresta 0.555 0.779 -0.263 0.116
## Dispon.agua 0.130 0.792 0.593
## Matos -0.805 0.589
## Incendios 0.186 0.171 0.551 -0.795
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## SS loadings 1.0 1.0 1.0 1.0 1.0
## Proportion Var 0.2 0.2 0.2 0.2 0.2
## Cumulative Var 0.2 0.4 0.6 0.8 1.0
Apparently, the main driver of the variability would be Altitude
. But in fact, the truth is that the altitude is measured with a higher order of magnitude (1000’s) than the other variables (10’s or 100’s). This suggests that the analysis based on the correlation matrix is much more sensible than one based on the covariances.
We can look at the scores of the analysis
acp.serras$scores
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Estrela 637.772883 18.577037 15.8304839 0.6159163 0.401905823
## Geres 192.883507 8.458511 -11.0379411 -1.6814589 1.333573829
## Montesinho 132.409285 -19.853092 1.9160680 0.9369192 -1.609093482
## Acor 65.680113 -43.598861 11.8774196 0.1551196 -0.022792148
## Soajo 61.852382 -9.868268 -13.0582419 2.3130933 0.366487802
## Marao 60.605771 -3.704939 -19.8566483 -1.2639212 -0.835996979
## Montemuro 27.529712 -8.078386 -7.9617387 -5.3448661 0.003224666
## Amarela 6.004445 13.130049 -13.2810904 1.5715964 -0.852477944
## Alvao -73.671336 16.039050 -5.1146234 0.8413669 -0.563993596
## Gardunha -129.183895 0.768920 12.2499784 -0.6179373 1.312161511
## Lousa -152.331717 13.870548 16.2403720 0.8775077 -1.907480327
## Bornes -155.527864 -3.763524 -4.0235285 4.6340288 1.106989884
## Padrela -210.395347 1.503908 9.0297213 -1.4836588 1.111213174
## Falperra -222.481752 5.347614 -0.1898093 0.9765820 0.807563740
## Gralheira -241.146186 11.171433 7.3795786 -2.5302879 -0.651285955
and the plot it
par(mfrow=c(1,2),mar=c(4,4,0.5,0.5))
plot(acp.serras)
biplot(acp.serras,cex=0.7)
No, we implement the PCA on the correlation matrix, which will decrease the influence of altitude in the analysis
acp.serras2<-princomp(serras[,-1],cor=TRUE)
which we note in passing is equivalent with an analysis on the covariance matrix of the scaled data (scaled data have, by definition, variances of 1, and hence the covariances and correlations are the same)
#which is the same as
acp.serras2<-princomp(scale(serras[,-1]))
Then we can look at the object resulting
acp.serras2
## Call:
## princomp(x = serras[, -1], cor = TRUE)
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## 1.9774983 0.7738141 0.5699272 0.3709750 0.1681455
##
## 5 variables and 15 observations.
summary(acp.serras2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 1.9774983 0.7738141 0.5699272 0.37097499
## Proportion of Variance 0.7820999 0.1197576 0.0649634 0.02752449
## Cumulative Proportion 0.7820999 0.9018575 0.9668209 0.99434542
## Comp.5
## Standard deviation 0.168145492
## Proportion of Variance 0.005654581
## Cumulative Proportion 1.000000000
acp.serras2$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Altitude 0.436 0.867 0.218
## Perc.Floresta -0.472 -0.171 0.401 -0.617 0.454
## Dispon.agua -0.463 -0.367 0.739 0.313
## Matos 0.359 -0.907 -0.157 -0.153
## Incendios -0.494 0.281 -0.820
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## SS loadings 1.0 1.0 1.0 1.0 1.0
## Proportion Var 0.2 0.2 0.2 0.2 0.2
## Cumulative Var 0.2 0.4 0.6 0.8 1.0
the summary of the object
summary(acp.serras2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 1.9774983 0.7738141 0.5699272 0.37097499
## Proportion of Variance 0.7820999 0.1197576 0.0649634 0.02752449
## Cumulative Proportion 0.7820999 0.9018575 0.9668209 0.99434542
## Comp.5
## Standard deviation 0.168145492
## Proportion of Variance 0.005654581
## Cumulative Proportion 1.000000000
and some of its components, as before, the loadings
acp.serras2$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Altitude 0.436 0.867 0.218
## Perc.Floresta -0.472 -0.171 0.401 -0.617 0.454
## Dispon.agua -0.463 -0.367 0.739 0.313
## Matos 0.359 -0.907 -0.157 -0.153
## Incendios -0.494 0.281 -0.820
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## SS loadings 1.0 1.0 1.0 1.0 1.0
## Proportion Var 0.2 0.2 0.2 0.2 0.2
## Cumulative Var 0.2 0.4 0.6 0.8 1.0
and the scores
acp.serras2$scores
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Estrela 3.3843786 -0.1701555286 1.72232398 0.07514264 0.07994403
## Geres 1.6882639 0.9496873031 0.06625469 0.01739621 0.24404943
## Montesinho 1.9511841 -0.6533639671 -0.19031455 -0.02727515 -0.28748056
## Acor 2.4374279 -1.9059101727 -0.65104457 -0.07036841 -0.02629920
## Soajo 1.2117037 0.2167791652 -0.43885015 0.52022847 -0.01106022
## Marao 1.5864603 1.0781288710 -0.57641648 -0.08816721 -0.12645700
## Montemuro 1.6429874 0.5896419318 -0.57088183 -0.72723636 0.10318056
## Amarela -0.3735097 1.0037137166 0.04488904 0.26602348 -0.17258861
## Alvao -1.3746378 0.6836330476 0.20072489 0.09967202 -0.10143626
## Gardunha -1.4495942 -0.6636659398 0.13075576 -0.08309614 0.23338604
## Lousa -2.6682495 -0.4638399285 0.65613646 -0.24793196 -0.29699170
## Bornes -1.4855502 -0.3421301766 -0.25978905 0.83528254 0.06845791
## Padrela -1.9024131 -0.4209728434 -0.07063063 -0.21719952 0.21754209
## Falperra -2.1819737 -0.0009751006 -0.14613111 0.20411953 0.11004382
## Gralheira -2.4664777 0.0994296218 0.08297355 -0.55659014 -0.03429033
Now, all variables have a reasonable influence on the first component. We can plot the resulting object
par(mfrow=c(1,2),mar=c(4,4,0.5,0.5))
plot(acp.serras2)
biplot(acp.serras2,cex=0.7)
In this case, given all the variables are measured in different units, it would only make sense to interpret the analysis over the correlation matrix, since the analysis based on the covariances suffers a strong impact from the measuring units of each variable (in other words, the analysis would return different results if say altitude was measured in km and in meters!). Leveraging on the nice clean biplot function (cleanplot.pca
), for the scaled analysis we get the following biplots
rda.serras2s<-rda(scale(serras[,-1]))
cleanplot.pca(rda.serras2s)
showing that matos
seems to be the main driver of the variability in Serras. Notice that the conclusions would be extremely different if interpreting the analysis based on the covariance and the correlation matrix.
Just to finish, note how the different principal components are indeed independent (i.e. uncorrelated, and while not shown here for both, this obviously happens for both covariance and correlation based analysis)
round(cor(acp.serras$scores),4)
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Comp.1 1 0 0 0 0
## Comp.2 0 1 0 0 0
## Comp.3 0 0 1 0 0
## Comp.4 0 0 0 1 0
## Comp.5 0 0 0 0 1
This happens by construction, i.e. a PCA is the mathematical procedure that obtains the linear combinations of the original covariates which simultaneous respect two conditions:
Recolheu dados relativos a densidades de várias espécies de anfíbios e répteis em três tipos florestais (pinhal, eucaliptal e sobro) (DataTP10herpetofauna.csv). Realize uma análise de correspondências e comente os resultados obtidos. Efectue uma análise de componentes principais aos mesmos dados compare os resultados com os obtidos na análise de correspondências.
We read the data in
herpeto <- read.csv2("DataTP10herpetofauna.csv")
and we look at the data
str(herpeto)
## 'data.frame': 22 obs. of 11 variables:
## $ Habitat : Factor w/ 22 levels "Eucalip1","Eucalip2",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Chalcides : num 0.05 0.02 0.01 0.01 0.02 0.08 0.03 0.02 0 0.01 ...
## $ Triturus : num 0.22 0.18 0.18 0.15 0.12 0.03 0 0.02 0.03 0 ...
## $ Bufo : num 0 0.04 0.05 0.01 0.04 0 0 0 0.01 0.01 ...
## $ Rana : num 0.2 0.13 0.11 0.1 0.02 0.02 0.02 0.02 0.01 0 ...
## $ Salamandra : num 0.03 0 0 0.01 0 0.02 0 0 0.01 0.01 ...
## $ Lacerta : num 0.06 0.15 0.13 0.17 0.37 0.21 0.19 0.19 0.2 0.23 ...
## $ Podarcis : num 0.16 0 0.09 0.1 0.12 0.12 0.34 0.2 0.26 0.11 ...
## $ Coluber : num 0.03 0.05 0.08 0.04 0.08 0.16 0.11 0.31 0.25 0.21 ...
## $ Malpolon : num 0.06 0.04 0.06 0.05 0.02 0.05 0.1 0.03 0.08 0.1 ...
## $ Psamodromus: num 0.16 0.36 0.32 0.33 0.16 0.32 0.17 0.2 0.14 0.27 ...
We can check upfront how many observations (replicates) exist in each habitat by using function substr
(subset a string, i.e. a word), then table
to count how many different habitats, then barplot
to make a suitable plot of the result
barplot(table(substr(herpeto$Habitat,1,4)))
Given the way the material was presented in the lectures, we first implement the PCA.
The labels of the locations are
herpeto[,1]
## [1] Eucalip1 Eucalip2 Eucalip3 Eucalip4 Eucalip5 Eucalip6 Eucalip7
## [8] Eucalip8 Pinhal1 Pinhal2 Pinhal3 Pinhal4 Pinhal5 Pinhal6
## [15] Pinhal7 Pinhal8 Sobro1 Sobro2 Sobro3 Sobro4 Sobro5
## [22] Sobro6
## 22 Levels: Eucalip1 Eucalip2 Eucalip3 Eucalip4 Eucalip5 ... Sobro6
and this seems to imply there are 3 types of habitats, Eucaliptal, Pinhal e Sobro. We can create a separate variable to hold the type of habitat - this could be useful for plotting say points with different colors per habitat
herpeto.hab=substr(herpeto[,1],1,4)
herpeto.hab
## [1] "Euca" "Euca" "Euca" "Euca" "Euca" "Euca" "Euca" "Euca" "Pinh" "Pinh"
## [11] "Pinh" "Pinh" "Pinh" "Pinh" "Pinh" "Pinh" "Sobr" "Sobr" "Sobr" "Sobr"
## [21] "Sobr" "Sobr"
The names of the genera are
names(herpeto)[-1]
## [1] "Chalcides" "Triturus" "Bufo" "Rana" "Salamandra"
## [6] "Lacerta" "Podarcis" "Coluber" "Malpolon" "Psamodromus"
We can also see and if some locations have much more animals than others
nbyloc=data.frame(loc=herpeto[,1],n=rowSums(herpeto[,-1]))
nbyloc
## loc n
## 1 Eucalip1 0.97
## 2 Eucalip2 0.97
## 3 Eucalip3 1.03
## 4 Eucalip4 0.97
## 5 Eucalip5 0.95
## 6 Eucalip6 1.01
## 7 Eucalip7 0.96
## 8 Eucalip8 0.99
## 9 Pinhal1 0.99
## 10 Pinhal2 0.95
## 11 Pinhal3 0.98
## 12 Pinhal4 0.96
## 13 Pinhal5 0.96
## 14 Pinhal6 0.92
## 15 Pinhal7 0.92
## 16 Pinhal8 0.95
## 17 Sobro1 0.93
## 18 Sobro2 0.92
## 19 Sobro3 0.90
## 20 Sobro4 0.91
## 21 Sobro5 0.91
## 22 Sobro6 1.00
and in fact these numbers are extremely suspicious to me, as they are way too close to 1 to being true… Not sure what happened here, but I suspect some scaling must have happened at some level.
We can also evaluate what are the most abundant taxa, and show them in increasing order
sort(colSums(herpeto[,-1]))
## Bufo Chalcides Rana Triturus Salamandra Podarcis
## 0.18 0.34 0.67 0.95 1.17 1.66
## Malpolon Psamodromus Lacerta Coluber
## 1.89 2.70 5.52 5.97
Given that a PCA is a dimension reduction technique, we can see if the genera are highly correlated or not. The analysis will be more efficient if there are high correlations to explore (e.g. two variables with perfect correlation can be summarized in a single component!)
pairs(herpeto[,-1], lower.panel = panel.smooth, upper.panel = panel.corP,
gap=0, row1attop=FALSE,diag.panel = panel.hist)
As it turns out, there are a couple of large correlations, but many low correlations too, so it’s unclear how the PCA might perform.
For the PCA plots it might be useful to define row.names
in the object herpeto
, as these will be used by default in the plotting of biplots, hence making the plots easier to interpret.
row.names(herpeto)=herpeto[,1]
Now, we implement the PCA itself
acph1<-princomp(herpeto[,-1])
acph1
## Call:
## princomp(x = herpeto[, -1])
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 0.222253341 0.116857752 0.084798534 0.065174317 0.049421709 0.027323770
## Comp.7 Comp.8 Comp.9 Comp.10
## 0.019466369 0.012803054 0.007711547 0.005515373
##
## 10 variables and 22 observations.
and we visualize the summary
sumacp1=summary(acph1)
sumacp1
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 0.2222533 0.1168578 0.08479853 0.06517432
## Proportion of Variance 0.6307611 0.1743747 0.09182164 0.05424020
## Cumulative Proportion 0.6307611 0.8051358 0.89695745 0.95119765
## Comp.5 Comp.6 Comp.7 Comp.8
## Standard deviation 0.04942171 0.027323770 0.019466369 0.012803054
## Proportion of Variance 0.03118917 0.009533439 0.004838807 0.002093127
## Cumulative Proportion 0.98238683 0.991920266 0.996759072 0.998852199
## Comp.9 Comp.10
## Standard deviation 0.0077115470 0.0055153726
## Proportion of Variance 0.0007593664 0.0003884347
## Cumulative Proportion 0.9996115653 1.0000000000
We can also explore some of the components of the PCA output, namely the loadings (i.e. the correlations between the PC’s and the original variables)
print(acph1$loadings,digits = 2, cutoff = 0.2)
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## Chalcides 0.75 0.58 0.31
## Triturus 0.25 -0.38 -0.45 -0.34 0.33 0.35
## Bufo -0.43 0.33
## Rana -0.25 -0.25 -0.28 0.31 -0.64 0.36
## Salamandra -0.27 0.33 -0.81 0.29
## Lacerta -0.87 0.36
## Podarcis 0.25 0.87 0.31
## Coluber -0.71 0.41 0.36 0.35
## Malpolon -0.40 0.64 0.53 0.33
## Psamodromus 0.51 0.68 0.36 0.30
## Comp.10
## Chalcides
## Triturus -0.45
## Bufo 0.81
## Rana 0.31
## Salamandra
## Lacerta
## Podarcis
## Coluber
## Malpolon
## Psamodromus
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## SS loadings 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
## Proportion Var 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1
## Cumulative Var 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
## Comp.9 Comp.10
## SS loadings 1.0 1.0
## Proportion Var 0.1 0.1
## Cumulative Var 0.9 1.0
the scores (the coordinates of the locations in the principal components space) - note one could use these to manually plot the results of a PCA
acph1$scores
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Eucalip1 0.33036012 0.1190746337 -0.0536501941 -0.173598501 -0.090526132
## Eucalip2 0.34584060 0.0517842147 -0.1721385189 0.058846301 0.009082210
## Eucalip3 0.32476098 0.0779560507 -0.0841678556 0.019993831 -0.005314638
## Eucalip4 0.34143688 0.0258898542 -0.0655258483 0.025832367 0.013281281
## Eucalip5 0.17800655 -0.1704727058 -0.0023542392 0.002300882 -0.068136960
## Eucalip6 0.20429760 0.0187811424 0.0293535323 0.106670305 0.061945854
## Eucalip7 0.21143613 -0.0009541545 0.2399409213 -0.066183602 0.019471640
## Eucalip8 0.06037370 0.0814274760 0.1267152444 0.075954562 -0.032648361
## Pinhal1 0.07974812 0.0415702696 0.1736002759 -0.017164678 -0.022469305
## Pinhal2 0.12084691 0.0036613757 0.0441539446 0.089594180 0.082344779
## Pinhal3 -0.03808968 -0.0671599482 -0.0092753287 -0.076925610 0.044006065
## Pinhal4 -0.10939491 -0.2021268920 -0.0189188778 -0.001490877 0.002516037
## Pinhal5 -0.11011557 -0.1101631123 0.0179693053 0.027027647 -0.028576384
## Pinhal6 -0.16187489 -0.1856887962 -0.0266235105 -0.018859311 -0.003899037
## Pinhal7 -0.20549424 -0.0722419492 -0.0272707540 0.078608606 -0.073942208
## Pinhal8 -0.09507087 -0.1578061781 -0.0281760532 -0.060250364 0.050353478
## Sobro1 -0.25176387 0.0667264114 -0.0264528531 -0.007841541 -0.006905388
## Sobro2 -0.12468269 -0.0152867492 -0.0482293582 -0.075773264 0.091258354
## Sobro3 -0.20139685 -0.0139815965 -0.0293047807 -0.030779206 0.009969235
## Sobro4 -0.29326921 0.1501800341 0.0001913402 0.040198823 -0.043555560
## Sobro5 -0.27961522 0.0673598930 -0.0156762094 0.045156030 -0.062687574
## Sobro6 -0.32633958 0.2914707263 -0.0241601824 -0.041316579 0.054432614
## Comp.6 Comp.7 Comp.8 Comp.9
## Eucalip1 0.0003849084 0.0340361565 -0.0057888571 -0.0002534738
## Eucalip2 0.0119767752 -0.0030303004 -0.0061535217 -0.0058286363
## Eucalip3 0.0168104838 -0.0340484229 0.0052779494 0.0111880085
## Eucalip4 -0.0099672205 -0.0091393928 -0.0152754544 -0.0041853267
## Eucalip5 -0.0261431923 -0.0267621951 0.0395666453 -0.0044229935
## Eucalip6 -0.0198959109 0.0530281565 0.0213244051 0.0070892394
## Eucalip7 -0.0027819539 0.0040587949 -0.0018174578 -0.0099902259
## Eucalip8 0.0012641350 0.0018305978 -0.0121284771 0.0067218530
## Pinhal1 0.0064195679 -0.0319496192 -0.0026199560 0.0049651031
## Pinhal2 0.0190603464 -0.0059887697 -0.0101173348 -0.0098615361
## Pinhal3 0.0159393830 0.0123083101 -0.0081942702 0.0105938377
## Pinhal4 0.0241696557 -0.0097383806 -0.0060933523 0.0095122430
## Pinhal5 -0.0201931467 0.0136854862 -0.0031822026 0.0080259948
## Pinhal6 0.0516448822 -0.0013358857 -0.0074970018 0.0016667964
## Pinhal7 -0.0479037860 0.0075133298 -0.0171264087 -0.0014230612
## Pinhal8 -0.0100291625 0.0110885427 0.0065037999 0.0035186253
## Sobro1 -0.0258301614 -0.0041949108 -0.0013418082 -0.0033168842
## Sobro2 -0.0248019521 -0.0072512355 0.0027701275 -0.0114785828
## Sobro3 0.0016150264 -0.0030127952 0.0002010651 -0.0116901771
## Sobro4 0.0773930484 0.0168840405 0.0195055872 -0.0065897458
## Sobro5 -0.0142070233 0.0005152552 -0.0058044624 -0.0063703662
## Sobro6 -0.0249247029 -0.0184967623 0.0079909856 0.0121293084
## Comp.10
## Eucalip1 -0.0008319545
## Eucalip2 0.0092156639
## Eucalip3 0.0061611483
## Eucalip4 -0.0159141142
## Eucalip5 -0.0025343131
## Eucalip6 -0.0015222878
## Eucalip7 0.0090531127
## Eucalip8 -0.0030570981
## Pinhal1 -0.0044147046
## Pinhal2 0.0005058102
## Pinhal3 0.0023686708
## Pinhal4 0.0083093312
## Pinhal5 -0.0002895905
## Pinhal6 -0.0085563999
## Pinhal7 0.0028535561
## Pinhal8 -0.0019751886
## Sobro1 0.0015824262
## Sobro2 -0.0007066841
## Sobro3 -0.0002147896
## Sobro4 -0.0010813382
## Sobro5 0.0025031725
## Sobro6 -0.0014544285
We can plot the objects produced by the PCA
par(mfrow=c(1,1),mar=c(4,4,0.5,0.5))
plot(acph1)
biplot(acph1,cex=0.7)
as for previous exercises, and not surprisingly given the unscaled analysis (i.e. using the covariance matrix), Lacerta
and Coluber
, the most abundant species, are the ones driving most of the variability on the first PC.
Finally, we can use one of the bespoke functions in file brocardfunctions.R
to plot the variances associated with each PC and two criteria to choose the number of PCs to interpret
evplot(acph1$sdev^2)
Both criteria seem to agree that the more sensible would be to interpret just the first two PCs.
Note that we could have implemented the same analysis using function rda
in library vegan
, which produces slightly different output, but the same results
acphbyrda1<-rda(herpeto[,-1])
acphbyrda1
## Call: rda(X = herpeto[, -1])
##
## Inertia Rank
## Total 0.08204
## Unconstrained 0.08204 10
## Inertia is variance
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
## 0.05175 0.01431 0.00753 0.00445 0.00256 0.00078 0.00040 0.00017 0.00006
## PC10
## 0.00003
The summary of this object gives us direct access to something that was hidden above, namely the coordinates of the covariates in the PC space
summary(acphbyrda1)
##
## Call:
## rda(X = herpeto[, -1])
##
## Partitioning of variance:
## Inertia Proportion
## Total 0.08204 1
## Unconstrained 0.08204 1
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Eigenvalue 0.05175 0.01431 0.007533 0.00445 0.002559 0.0007821
## Proportion Explained 0.63076 0.17437 0.091822 0.05424 0.031189 0.0095334
## Cumulative Proportion 0.63076 0.80514 0.896957 0.95120 0.982387 0.9919203
## PC7 PC8 PC9 PC10
## Eigenvalue 0.000397 0.0001717 0.0000623 3.187e-05
## Proportion Explained 0.004839 0.0020931 0.0007594 3.884e-04
## Cumulative Proportion 0.996759 0.9988522 0.9996116 1.000e+00
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 1.145681
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Chalcides -0.03776 0.008854 0.0098076 0.002762 -0.0003448 0.002596
## Triturus -0.23020 0.055851 -0.1308192 -0.049064 0.0904079 0.003909
## Bufo -0.03385 -0.006658 -0.0220420 0.012559 0.0069510 0.006214
## Rana -0.16181 0.060539 -0.0870802 -0.066131 0.0566041 0.015235
## Salamandra 0.16429 0.049853 -0.0560908 -0.072646 -0.0673632 -0.090889
## Lacerta 0.17234 -0.416903 -0.0031983 0.053345 0.0222889 -0.005150
## Podarcis -0.22365 0.031512 0.3006997 -0.047010 0.0342179 -0.012577
## Coluber 0.64874 0.197801 0.0315186 0.097320 0.0332375 0.014769
## Malpolon 0.07940 -0.014905 0.0002864 -0.106045 -0.1296873 0.059417
## Psamodromus -0.46076 0.073078 -0.0259752 0.182206 -0.0733715 -0.005168
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Eucalip1 -0.36307 0.248894 -0.1545380 -0.650611 0.44741 0.003441
## Eucalip2 -0.38008 0.108241 -0.4958405 0.220544 -0.04489 0.107066
## Eucalip3 -0.35692 0.162946 -0.2424433 0.074933 0.02627 0.150277
## Eucalip4 -0.37524 0.054116 -0.1887455 0.096814 -0.06564 -0.089102
## Eucalip5 -0.19563 -0.356328 -0.0067813 0.008623 0.33676 -0.233706
## Eucalip6 -0.22453 0.039257 0.0845521 0.399778 -0.30616 -0.177859
## Eucalip7 -0.23237 -0.001994 0.6911435 -0.248043 -0.09624 -0.024869
## Eucalip8 -0.06635 0.170202 0.3649999 0.284662 0.16136 0.011301
## Pinhal1 -0.08764 0.086892 0.5000510 -0.064330 0.11105 0.057388
## Pinhal2 -0.13281 0.007653 0.1271843 0.335781 -0.40698 0.170389
## Pinhal3 0.04186 -0.140380 -0.0267173 -0.288301 -0.21749 0.142490
## Pinhal4 0.12023 -0.422492 -0.0544953 -0.005588 -0.01244 0.216064
## Pinhal5 0.12102 -0.230267 0.0517601 0.101294 0.14123 -0.180516
## Pinhal6 0.17790 -0.388133 -0.0766883 -0.070681 0.01927 0.461678
## Pinhal7 0.22584 -0.151003 -0.0785527 0.294609 0.36545 -0.428234
## Pinhal8 0.10448 -0.329852 -0.0811604 -0.225806 -0.24887 -0.089655
## Sobro1 0.27669 0.139474 -0.0761967 -0.029388 0.03413 -0.230908
## Sobro2 0.13703 -0.031953 -0.1389234 -0.283983 -0.45103 -0.221716
## Sobro3 0.22134 -0.029225 -0.0844116 -0.115354 -0.04927 0.014437
## Sobro4 0.32231 0.313911 0.0005512 0.150657 0.21527 0.691853
## Sobro5 0.30730 0.140798 -0.0451549 0.169236 0.30982 -0.127003
## Sobro6 0.35865 0.609242 -0.0695928 -0.154846 -0.26903 -0.222814
We can make a clean plot of this object
cleanplot.pca(acphbyrda1)
The corresponding analysis based on the correlation matrix is perhaps more sensible
sacphbyrda1<-rda(scale(herpeto[,-1]))
sacphbyrda1
## Call: rda(X = scale(herpeto[, -1]))
##
## Inertia Rank
## Total 10
## Unconstrained 10 10
## Inertia is variance
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
## 4.937 1.387 1.315 0.800 0.692 0.421 0.224 0.198 0.021 0.006
The summary of this object gives us direct access to something that was hidden above, namely the coordinates of the covariates in the PC space
summary(sacphbyrda1)
##
## Call:
## rda(X = scale(herpeto[, -1]))
##
## Partitioning of variance:
## Inertia Proportion
## Total 10 1
## Unconstrained 10 1
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 4.9365 1.3874 1.3152 0.79974 0.69153 0.42064 0.22441
## Proportion Explained 0.4937 0.1387 0.1315 0.07997 0.06915 0.04206 0.02244
## Cumulative Proportion 0.4937 0.6324 0.7639 0.84389 0.91304 0.95510 0.97754
## PC8 PC9 PC10
## Eigenvalue 0.19814 0.020642 0.0057846
## Proportion Explained 0.01981 0.002064 0.0005785
## Cumulative Proportion 0.99736 0.999422 1.0000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 3.806754
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Chalcides -0.6148 -0.66989 0.28721 -0.020219 0.65342 -0.2655
## Triturus -1.0417 0.47923 0.24165 0.014366 0.10989 0.2037
## Bufo -0.7323 0.69684 -0.42256 0.062473 -0.18893 -0.3362
## Rana -0.9712 0.32579 0.47437 0.091677 0.16752 0.3383
## Salamandra 0.9168 0.30657 0.56875 0.018431 0.03683 -0.0234
## Lacerta 0.5502 0.04452 -0.92847 0.232372 0.42470 0.1977
## Podarcis -0.6334 -0.80017 -0.08305 0.144766 -0.53992 0.2077
## Coluber 1.0287 0.02834 0.21519 -0.506741 -0.11054 -0.1168
## Malpolon 0.6740 0.02272 0.34552 0.902175 -0.07640 -0.1687
## Psamodromus -1.0743 -0.05322 -0.05785 0.002388 -0.08965 -0.3810
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Eucalip1 -1.4296 -0.17228 1.859965 0.18866 0.996881 2.079766
## Eucalip2 -1.4284 1.46077 0.081130 -0.17814 0.333498 -0.786716
## Eucalip3 -1.3878 1.43203 -0.083985 0.16053 -0.787980 -0.814436
## Eucalip4 -1.0597 0.38443 0.208352 0.02952 -0.238384 0.581143
## Eucalip5 -0.7728 0.51739 -1.530892 -0.20115 0.198427 -0.063827
## Eucalip6 -0.7151 -1.69389 0.267106 -0.41076 1.747837 -1.880656
## Eucalip7 -0.4685 -1.88450 -0.100348 0.79593 -1.224439 0.344216
## Eucalip8 -0.3374 -1.07102 -0.212962 -0.99714 -0.774187 0.180882
## Pinhal1 -0.2211 -0.70609 -0.396428 0.08884 -1.867745 0.523299
## Pinhal2 -0.1692 -0.39642 -0.461122 0.33845 -0.749594 -1.098069
## Pinhal3 0.3565 -0.13499 0.364301 1.30791 0.482541 0.177416
## Pinhal4 0.4695 0.55664 -1.255823 0.82222 0.237081 0.002148
## Pinhal5 0.3795 -0.37481 -0.620047 -0.41217 0.718626 0.394531
## Pinhal6 0.7116 0.17490 -0.857397 0.92070 0.523985 0.668720
## Pinhal7 0.5600 0.26689 -0.791365 -1.67646 0.452221 0.934844
## Pinhal8 0.6562 -0.07702 -0.144873 1.19855 0.940616 -0.018770
## Sobro1 0.8416 0.40817 0.448639 -0.60596 -0.180739 0.134376
## Sobro2 0.8648 0.46235 0.643822 1.14046 -0.136818 -0.281960
## Sobro3 0.7979 0.33708 0.134159 0.15440 -0.009382 0.217083
## Sobro4 0.5456 -0.37114 0.508297 -0.86208 0.443939 -0.800235
## Sobro5 0.7097 0.29638 -0.001117 -1.50433 -0.086574 0.445470
## Sobro6 1.0966 0.58513 1.940588 -0.29798 -1.019809 -0.939224
We can make a clean plot of this object
cleanplot.pca(sacphbyrda1)
and we can see clearly that the separation on the first PC is between sites from Eucaliptal versus Pinhal versus Sobro, without any clear separation regarding habitat on the second component.
Making a biplot by hand has the advantage of letting us use bespoke colors as well as fine tune the limits of the plot. Check where are the species and locations scores
names(summary(sacphbyrda1))
## [1] "species" "sites" "call" "tot.chi" "unconst.chi"
## [6] "cont" "scaling" "digits" "inertia" "method"
and then use those to make the biplot
#a couple of useful objects to make the code more readable
myacp=summary(sacphbyrda1)
plot(myacp$sites[,1:2],type="n",ylim=c(-2,2),xlim=c(-2,2))
text(myacp$sites[,1:2],labels=substr(herpeto[,1],1,4),col=as.numeric(as.factor(substr(herpeto[,1],1,4))))
#species
arrows(x0=0,y0=0,x1=myacp$species[,1],y1=myacp$species[,2],col=2)
text(myacp$species[,1:2],labels=names(herpeto[-1]),cex=0.5)
We now implement the CA
ca.herp<-cca(herpeto[,-1])
summary(ca.herp)
##
## Call:
## cca(X = herpeto[, -1])
##
## Partitioning of scaled Chi-square:
## Inertia Proportion
## Total 0.7634 1
## Unconstrained 0.7634 1
##
## Eigenvalues, and their contribution to the scaled Chi-square
##
## Importance of components:
## CA1 CA2 CA3 CA4 CA5 CA6
## Eigenvalue 0.4453 0.1260 0.06964 0.04409 0.03147 0.02391
## Proportion Explained 0.5833 0.1651 0.09122 0.05776 0.04123 0.03132
## Cumulative Proportion 0.5833 0.7484 0.83963 0.89739 0.93862 0.96993
## CA7 CA8 CA9
## Eigenvalue 0.01227 0.008908 0.001775
## Proportion Explained 0.01607 0.011669 0.002326
## Cumulative Proportion 0.98601 0.997674 1.000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
##
##
## Species scores
##
## CA1 CA2 CA3 CA4 CA5 CA6
## Chalcides -0.5736 -0.22736 0.20757 0.04381 0.14096 0.908077
## Triturus -1.4050 0.73688 0.05765 -0.16664 -0.22457 -0.089970
## Bufo -1.0305 0.47098 -0.79879 0.31354 -0.20087 -0.851458
## Rana -1.4129 0.76139 0.37817 -0.35195 -0.10910 0.163806
## Salamandra 0.7689 0.40083 0.33082 -0.09962 0.30992 -0.110610
## Lacerta 0.2275 -0.06517 -0.35909 -0.13754 -0.04889 0.050076
## Podarcis -0.6532 -0.92747 0.34514 -0.19702 -0.10903 -0.126929
## Coluber 0.5715 0.06924 0.16295 0.16178 -0.13829 0.002801
## Malpolon 0.2864 0.02326 0.06220 -0.22884 0.34774 -0.078629
## Psamodromus -0.8750 -0.09060 -0.11225 0.36752 0.19678 0.005858
##
##
## Site scores (weighted averages of species scores)
##
## CA1 CA2 CA3 CA4 CA5 CA6
## Eucalip1 -1.8378 1.24107 1.9682714 -2.2844 -0.8864 1.46841
## Eucalip2 -1.6903 1.70073 -0.7683320 1.1289 0.3499 -0.18031
## Eucalip3 -1.5521 0.95266 -0.2880435 0.6300 -0.1375 -1.59192
## Eucalip4 -1.4784 0.51800 -0.1493270 0.3650 0.5072 -0.16045
## Eucalip5 -0.7855 -0.21678 -1.6392392 -0.5009 -1.2758 -0.90399
## Eucalip6 -0.6798 -0.90007 -0.1168264 1.5494 1.3665 2.67892
## Eucalip7 -0.6587 -2.68435 1.0167252 -0.9794 0.2880 -0.42184
## Eucalip8 -0.3277 -1.34982 0.6323698 0.9567 -0.8997 0.14569
## Pinhal1 -0.3174 -1.67710 0.7070600 -0.2640 -0.7725 -1.62401
## Pinhal2 -0.2729 -0.98691 -0.5607742 1.4227 1.2752 -0.38206
## Pinhal3 0.2864 0.10054 0.0322563 -1.2341 1.0312 0.42323
## Pinhal4 0.5922 -0.03768 -1.6523057 -0.7708 -0.1172 -0.43146
## Pinhal5 0.5368 -0.35369 -0.6373104 -0.2149 -0.5060 0.84825
## Pinhal6 0.8310 0.06962 -1.3868831 -1.1190 -0.2211 0.38989
## Pinhal7 0.8616 0.28254 -0.7186440 0.5525 -1.5195 0.52699
## Pinhal8 0.6973 0.07484 -0.7684806 -1.2209 1.2522 0.57878
## Sobro1 1.0841 0.60666 0.6022968 0.2312 -0.1758 -0.36525
## Sobro2 0.8779 0.55443 0.1752309 -0.6501 2.0473 -0.72125
## Sobro3 0.9984 0.44945 -0.0006649 -0.3582 0.1627 -0.22688
## Sobro4 0.9901 0.30751 0.7987011 1.0692 -1.3731 1.22595
## Sobro5 1.0784 0.48887 0.3728823 0.7538 -1.4952 0.06327
## Sobro6 1.2374 0.96757 2.2301775 0.8152 0.9128 -1.24361
plot(ca.herp)
we could have used package ca
, but we note these analysis are exactly the same, and as before for different PCA implementations, only the structure of the output objects is slightly different
library(ca)
caca.herp<-ca(herpeto[-1])
summary(caca.herp)
##
## Principal inertias (eigenvalues):
##
## dim value % cum% scree plot
## 1 0.445292 58.3 58.3 ***************
## 2 0.126030 16.5 74.8 ****
## 3 0.069639 9.1 84.0 **
## 4 0.044091 5.8 89.7 *
## 5 0.031472 4.1 93.9 *
## 6 0.023908 3.1 97.0 *
## 7 0.012269 1.6 98.6
## 8 0.008908 1.2 99.8
## 9 0.001775 0.2 100.0
## -------- -----
## Total: 0.763383 100.0
##
##
## Rows:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Ecl1 | 46 746 137 | -1226 661 156 | 441 85 71 |
## 2 | Ecl2 | 46 938 105 | -1128 729 132 | 604 209 133 |
## 3 | Ecl3 | 49 923 82 | -1036 834 118 | 338 89 44 |
## 4 | Ecl4 | 46 937 65 | -987 906 101 | 184 31 12 |
## 5 | Ecl5 | 45 443 37 | -524 433 28 | -77 9 2 |
## 6 | Ecl6 | 48 460 42 | -454 307 22 | -320 152 39 |
## 7 | Ecl7 | 46 899 73 | -440 158 20 | -953 741 329 |
## 8 | Ecl8 | 47 732 23 | -219 126 5 | -479 605 86 |
## 9 | Pnh1 | 47 766 32 | -212 86 5 | -595 680 132 |
## 10 | Pnh2 | 45 458 20 | -182 97 3 | -350 360 44 |
## 11 | Pnh3 | 47 252 9 | 191 244 4 | 36 9 0 |
## 12 | Pnh4 | 46 399 23 | 395 398 16 | -13 0 0 |
## 13 | Pnh5 | 46 686 13 | 358 611 13 | -126 75 6 |
## 14 | Pnh6 | 44 561 31 | 555 559 30 | 25 1 0 |
## 15 | Pnh7 | 44 635 31 | 575 617 32 | 100 19 3 |
## 16 | Pnh8 | 45 563 23 | 465 561 22 | 27 2 0 |
## 17 | Sbr1 | 44 933 35 | 723 857 52 | 215 76 16 |
## 18 | Sbr2 | 44 691 32 | 586 621 34 | 197 70 13 |
## 19 | Sbr3 | 43 982 27 | 666 928 43 | 160 53 9 |
## 20 | Sbr4 | 43 595 43 | 661 579 42 | 109 16 4 |
## 21 | Sbr5 | 43 830 37 | 720 784 50 | 174 46 10 |
## 22 | Sbr6 | 48 640 78 | 826 546 73 | 343 94 44 |
##
## Columns:
## name mass qlt inr k=1 cor ctr k=2 cor ctr
## 1 | Chlc | 16 257 31 | -574 222 12 | -227 35 7 |
## 2 | Trtr | 45 956 156 | -1405 750 200 | 737 206 194 |
## 3 | Bufo | 9 399 36 | -1030 330 20 | 471 69 15 |
## 4 | Rana | 32 875 123 | -1413 678 143 | 761 197 146 |
## 5 | Slmn | 56 692 79 | 769 544 74 | 401 148 71 |
## 6 | Lcrt | 262 265 72 | 227 245 30 | -65 20 9 |
## 7 | Pdrc | 79 872 152 | -653 289 76 | -927 583 538 |
## 8 | Clbr | 284 818 151 | 571 806 208 | 69 12 11 |
## 9 | Mlpl | 90 259 38 | 286 257 17 | 23 2 0 |
## 10 | Psmd | 128 802 162 | -875 793 221 | -91 9 8 |
plot(caca.herp)
We note that the separation per habitat is perhaps not as clean as for the PCA. Again, we can make our own biplot
#a couple of useful objects to make the code more readable
names(summary(ca.herp))
## [1] "species" "sites" "call" "tot.chi" "unconst.chi"
## [6] "cont" "scaling" "digits" "inertia" "method"
mycas=summary(ca.herp)
#the biplot
plot(mycas$sites[,1:2],type="n",ylim=c(-2,2),xlim=c(-2,2))
text(mycas$sites[,1:2],labels=substr(herpeto[,1],1,4),col=as.numeric(as.factor(substr(herpeto[,1],1,4))))
#species
arrows(x0=0,y0=0,x1=mycas$species[,1],y1=mycas$species[,2],col=2)
text(mycas$species[,1:2],labels=names(herpeto[-1]),cex=0.5)
The results from both analysis are not that different in this case. While this is not a standard procedure (i.e. I have never seen anyone else doing it explicitly), we could compare the relations between the scores in the PCA and the CA, for the first and second axis.
par(mfrow=c(1,2),mar=c(4,4,0.5,0.5))
plot(myacp$sites[,1],mycas$sites[,1],xlab="PCA score 1st CP",ylab="CA score 1st CP")
abline(0,1)
text(-1,1,paste0("R= ",round(cor(myacp$sites[,1],mycas$sites[,1]),2)))
plot(myacp$sites[,2],mycas$sites[,2],xlab="PCA score 2nd CP",ylab="CA score 2nd CP")
text(-1,1,paste0("R= ",round(cor(myacp$sites[,2],mycas$sites[,2]),2)))
abline(0,1)
With a correlation of over 99% in the first axis and over 80% in the second, there is nothing structurally different in both analysis.
In general, a CA might be more sensible to use directly over count data.
Actually, in this case, similar plots to the pair above comparing the PCA with and without scaling (i.e., over the correlation and over the covariance matrices) and between the covariance-based PCA and the CA would also be informative.
par(mfrow=c(2,2),mar=c(4,4,0.5,0.5))
plot(myacp$sites[,1],sumacp1$scores[,1],xlab="PCA score 1st CP",ylab="PCA unscaled score 1st CP")
abline(0,1)
text(-1,1,paste0("R= ",round(cor(myacp$sites[,1],sumacp1$scores[,1]),2)))
plot(myacp$sites[,2],sumacp1$scores[,2],xlab="PCA score 2nd CP",ylab="PCA unscaled score 2nd CP")
text(-1,1,paste0("R= ",round(cor(myacp$sites[,2],sumacp1$scores[,2]),2)))
abline(0,1)
# PCA unscaled vs CA
plot(mycas$sites[,1],sumacp1$scores[,1],ylab="PCA unscaled score 1st CP",xlab="CA score 1st CP")
abline(0,1)
text(-1,1,paste0("R= ",round(cor(sumacp1$scores[,1],mycas$sites[,1]),2)))
plot(mycas$sites[,2],sumacp1$scores[,2],ylab="PCA unscaled score 2nd CP",xlab="CA score 2nd CP")
text(-1,1,paste0("R= ",round(cor(sumacp1$scores[,2],mycas$sites[,2]),2)))
abline(0,1)
These plots show us what we already knew. The results of the unscaled PCA would not be sensible, being very different from those of the CA and the scaled PCA. This was as expected, as both the CA and the unscaled PCA are not dominated by the raw abundances, unlike the unscaled PCA (covariance matrix based).
Efectue uma análise canónica de correspondências aos dados da alínea anterior, incluindo também as características dos locais (DataTP10herpamb.csv). Discuta os resultados. Em que situações esta análise é preferível em relação às anteriores técnicas?
First we read the environmental variables
herpamb <- read.csv("DataTP10herpamb.csv", sep=";")
and take a quick peak at it to make sure all is good
str(herpamb)
## 'data.frame': 22 obs. of 6 variables:
## $ Habitat : Factor w/ 22 levels "Eucalip1","Eucalip2",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Perc.floresta : int 75 85 49 65 68 67 59 58 45 45 ...
## $ Agua : int 2 1 1 1 1 0 0 0 0 0 ...
## $ Altura : int 12 10 9 8 12 15 14 15 13 14 ...
## $ Idade : int 9 8 9 8 7 8 9 8 8 9 ...
## $ Activ.agricola: int 1 1 1 2 2 1 1 1 2 2 ...
Let’s look at the different variables as a function of habitat
hab=substr(herpamb$Habitat,1,4)
par(mfrow=c(3,2))
boxplot(herpamb$Perc.floresta~hab)
boxplot(herpamb$Agua~hab)
boxplot(herpamb$Altura~hab)
boxplot(herpamb$Idade~hab)
boxplot(herpamb$Activ.agricola~hab)
Then, we implement the CCA using function cca
from package vegan
cca.herp<-cca(herpeto[,-1]~.,data=herpamb[,-1])
We can look at the summary of the results
summary(cca.herp)
##
## Call:
## cca(formula = herpeto[, -1] ~ Perc.floresta + Agua + Altura + Idade + Activ.agricola, data = herpamb[, -1])
##
## Partitioning of scaled Chi-square:
## Inertia Proportion
## Total 0.7634 1.0000
## Constrained 0.4915 0.6439
## Unconstrained 0.2719 0.3561
##
## Eigenvalues, and their contribution to the scaled Chi-square
##
## Importance of components:
## CCA1 CCA2 CCA3 CCA4 CCA5 CA1
## Eigenvalue 0.3658 0.09177 0.0200 0.009837 0.00413 0.1080
## Proportion Explained 0.4792 0.12022 0.0262 0.012885 0.00541 0.1415
## Cumulative Proportion 0.4792 0.59938 0.6256 0.638467 0.64388 0.7853
## CA2 CA3 CA4 CA5 CA6 CA7
## Eigenvalue 0.05750 0.03331 0.02914 0.01788 0.01235 0.008604
## Proportion Explained 0.07532 0.04363 0.03818 0.02343 0.01618 0.011271
## Cumulative Proportion 0.86067 0.90430 0.94247 0.96590 0.98208 0.993351
## CA8 CA9
## Eigenvalue 0.004061 0.001015
## Proportion Explained 0.005320 0.001329
## Cumulative Proportion 0.998671 1.000000
##
## Accumulated constrained eigenvalues
## Importance of components:
## CCA1 CCA2 CCA3 CCA4 CCA5
## Eigenvalue 0.3658 0.09177 0.02000 0.009837 0.004130
## Proportion Explained 0.7442 0.18671 0.04069 0.020012 0.008402
## Cumulative Proportion 0.7442 0.93090 0.97159 0.991598 1.000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
##
##
## Species scores
##
## CCA1 CCA2 CCA3 CCA4 CCA5 CA1
## Chalcides -0.6113 -0.29111 -0.572651 0.19521 -0.33274 -0.06196
## Triturus -1.2902 0.71449 -0.009936 0.02656 0.02877 -0.48284
## Bufo -0.9295 0.23703 0.280710 -0.36696 0.15084 -0.15629
## Rana -1.3053 0.71249 -0.114317 0.18142 0.06693 -0.54825
## Salamandra 0.7705 0.43184 0.118288 0.02470 -0.11919 0.08011
## Lacerta 0.1164 -0.10010 0.173742 0.06916 -0.01033 0.34121
## Podarcis -0.5020 -0.63199 -0.091435 0.11269 0.07508 -0.61205
## Coluber 0.5613 0.06190 -0.116579 -0.04956 0.03247 0.11979
## Malpolon 0.2333 0.07466 -0.055202 0.01425 0.01962 0.15423
## Psamodromus -0.7509 -0.19039 0.031421 -0.17627 -0.05380 -0.40467
##
##
## Site scores (weighted averages of species scores)
##
## CCA1 CCA2 CCA3 CCA4 CCA5 CA1
## Eucalip1 -2.0152 1.87412 -2.8930 4.81834 1.10819 -0.81906
## Eucalip2 -1.8793 1.65598 0.6424 -3.91983 -1.31085 0.50931
## Eucalip3 -1.6983 1.00010 0.2777 -3.16262 2.05190 -1.49156
## Eucalip4 -1.6230 0.50958 0.4475 -1.65147 -0.43073 -0.87193
## Eucalip5 -0.9153 -0.38134 2.3268 0.35012 0.95296 0.60554
## Eucalip6 -0.7387 -1.32824 -1.5781 -1.51391 -7.43262 0.02136
## Eucalip7 -0.6709 -2.79755 -1.5910 2.85385 2.85197 -0.95704
## Eucalip8 -0.3082 -1.53374 -1.5519 -0.66192 2.00687 -0.37185
## Pinhal1 -0.2947 -1.69826 -0.7898 0.65662 5.24818 -1.38288
## Pinhal2 -0.2811 -1.37301 0.3499 -3.18278 -0.73277 -0.78594
## Pinhal3 0.2879 0.21061 0.2525 2.17733 -0.94590 0.23520
## Pinhal4 0.5812 -0.19994 2.5969 0.87950 1.02682 1.61298
## Pinhal5 0.5601 -0.48814 0.9088 1.36086 -1.23298 0.69684
## Pinhal6 0.8351 -0.01716 1.9636 1.96754 1.23469 2.41756
## Pinhal7 0.9260 0.17637 1.5098 0.04353 -0.52020 0.46418
## Pinhal8 0.7119 0.06910 1.6312 2.19994 -3.05349 0.94164
## Sobro1 1.2139 0.79587 -0.1443 -0.24031 -0.24692 0.48075
## Sobro2 0.9624 0.73900 1.0770 0.24228 -2.53746 -0.19991
## Sobro3 1.0866 0.57212 0.5625 0.55470 0.02748 0.65598
## Sobro4 1.1086 0.34669 -2.9852 -0.86866 1.34425 0.86836
## Sobro5 1.2042 0.57493 -0.3852 -0.55785 1.13225 -1.35850
## Sobro6 1.4536 1.41329 -2.2943 -1.95167 -0.46756 -0.91294
##
##
## Site constraints (linear combinations of constraining variables)
##
## CCA1 CCA2 CCA3 CCA4 CCA5 CA1
## Eucalip1 -1.80552 1.5084 -0.832575 1.58192 0.21614 -0.81906
## Eucalip2 -2.05667 0.8323 -0.195676 -1.03400 -1.38874 0.50931
## Eucalip3 -0.77643 0.5318 0.902637 -0.84872 1.90616 -1.49156
## Eucalip4 -1.20256 1.0830 0.685743 -0.89444 -0.16342 -0.87193
## Eucalip5 -0.99294 0.4472 -0.227171 0.80894 -0.71615 0.60554
## Eucalip6 -0.74313 -1.4356 -0.617769 -0.20748 -0.87298 0.02136
## Eucalip7 -0.53871 -1.3844 -0.292870 -0.54116 -0.05337 -0.95704
## Eucalip8 -0.39734 -1.5486 -0.369172 -0.04068 -0.07734 -0.37185
## Pinhal1 0.18782 -1.2099 0.318893 -0.03086 0.52915 -1.38288
## Pinhal2 0.23282 -1.4062 -0.061265 0.33855 0.52477 -0.78594
## Pinhal3 0.27892 -0.2193 -0.638832 3.05769 0.44809 0.23520
## Pinhal4 0.06598 -0.9885 0.643808 -0.43734 0.35672 1.61298
## Pinhal5 0.29016 -0.7996 0.599201 0.22062 -0.44519 0.69684
## Pinhal6 -0.35292 0.6012 1.057962 0.42225 0.97228 2.41756
## Pinhal7 0.77362 -0.7442 1.476561 0.14841 0.65444 0.46418
## Pinhal8 0.33392 0.1281 0.987506 0.12912 -2.17349 0.94164
## Sobro1 0.76593 1.3335 -0.064933 0.28099 2.07343 0.48075
## Sobro2 1.19962 0.6491 -0.228898 -0.72637 -0.21643 -0.19991
## Sobro3 0.82461 0.6593 0.002506 -1.84708 0.46705 0.65598
## Sobro4 0.87917 0.1031 -3.775111 -0.95601 0.39607 0.86836
## Sobro5 1.99442 1.0410 0.400087 0.92582 -1.50583 -1.35850
## Sobro6 1.41722 1.0218 0.134275 -0.41928 -0.92556 -0.91294
##
##
## Biplot scores for constraining variables
##
## CCA1 CCA2 CCA3 CCA4 CCA5 CA1
## Perc.floresta -0.9402 -0.1418 0.06674 0.1045 -0.28363 0
## Agua -0.6362 0.6024 -0.01142 0.3892 0.28406 0
## Altura -0.5474 -0.7157 0.06631 0.4279 0.02579 0
## Idade 0.5211 0.3345 -0.70051 -0.3370 0.11056 0
## Activ.agricola 0.8294 0.4350 -0.26062 -0.1403 -0.18782 0
and finally we plot the triplot that contains the information on the sites and the species information, as the previous analysis did, but now it also contains directly the information about the environmental variables. The ecological interpretation would now have to be done considering the 3 sets of variables, as the environmental variables were used to constrain the axis.
plot(cca.herp)
Obteve dados relativos à abundância de diversas espécies de aves em três tipos de habitat. Efectue várias análises de ordenação (análise de componentes principais, análise de correspondências e análise canónica de correspondências). A análise canónica de correspondências permite integrar informação de uma segunda matriz de dados. Indique em que situações esta análise poderá ser mais informativa e como poderia tentar obter informação análoga nas outras duas análises efectuadas.
We read the data in, both the species matrix and the environmental variables matrix
aves <- read.csv("DataTP10aves.csv", sep=";")
avesamb <- read.csv("DataTP10avesamb.csv", sep=";")
row.names(aves)=row.names(avesamb)=aves[,1]
We look at both of these, first the species data
str(aves)
## 'data.frame': 22 obs. of 11 variables:
## $ Habitat : Factor w/ 22 levels "Bosq 1","Bosq 2",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Erithacus : num 0.05 0.02 0.01 0.01 0.02 0.08 0.03 0.02 0 0.01 ...
## $ Turdus : num 0.22 0.18 0.18 0.15 0.12 0.03 0 0.02 0.03 0 ...
## $ Passer : num 0 0.04 0.05 0.01 0.04 0 0 0 0.01 0.01 ...
## $ Parus : num 0.2 0.13 0.11 0.1 0.02 0.02 0.02 0.02 0.01 0 ...
## $ Athene : num 0.03 0 0 0.01 0 0.02 0 0 0.01 0.01 ...
## $ Anas : num 0.06 0.15 0.13 0.17 0.37 0.21 0.19 0.19 0.2 0.23 ...
## $ Sylvia : num 0.16 0 0.09 0.1 0.12 0.12 0.34 0.2 0.26 0.11 ...
## $ Carduelis : num 0.03 0.05 0.08 0.04 0.08 0.16 0.11 0.31 0.25 0.21 ...
## $ Motacilla : num 0.06 0.04 0.06 0.05 0.02 0.05 0.1 0.03 0.08 0.1 ...
## $ Dendrocopos: num 0.16 0.36 0.32 0.33 0.16 0.32 0.17 0.2 0.14 0.27 ...
We have 22 observations for 11 variables, which correspond respectively to locations and taxa (genera of amphibians). The first column does not represent a genera, but it contains the labels of the locations, which are separated by habitats, of which there are the following 3
unique(substr(aves[,1],1,4))
## [1] "Bosq" "Prad" "Ripi"
which we assume correspond to “Bosque”, “Pradaria”, and “Ripícola”.
We can see if there are locations with much more abundance than others
sort(rowSums(aves[,-1]))
## Ripic 3 Ripic 4 Ripic 5 Prado 6 Prado 7 Ripic 2 Ripic 1 Bosq 5 Prado 8
## 0.90 0.91 0.91 0.92 0.92 0.92 0.93 0.95 0.95
## Prado 2 Prado 5 Bosq 7 Prado 4 Bosq 1 Bosq 2 Bosq 4 Prado 3 Bosq 8
## 0.95 0.96 0.96 0.96 0.97 0.97 0.97 0.98 0.99
## Prado 1 Ripic 6 Bosq 6 Bosq 3
## 0.99 1.00 1.01 1.03
(as before, some standardization must have been made before analysis)
or if there are species which dominate the communities.
sort(colSums(aves[,-1]))
## Passer Erithacus Parus Turdus Athene Sylvia
## 0.18 0.34 0.67 0.95 1.17 1.66
## Motacilla Dendrocopos Anas Carduelis
## 1.89 2.70 5.52 5.97
In fact, Anas and Carduelis seem to dominate the community in terms of abundance.
Now we look at environmental variables
str(avesamb)
## 'data.frame': 22 obs. of 4 variables:
## $ Habitat : Factor w/ 22 levels "Bosq 1","Bosq 2",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Perc.veg: int 75 85 62 56 59 89 84 86 12 12 ...
## $ Agua : int 25 23 32 34 35 36 34 28 7 8 ...
## $ Solo : int 50 52 62 52 54 58 59 63 23 25 ...
We have 22 observations for 4 variables, which correspond respectively to locations and environmental variables (genera of amphibians). The first column does not represent a variable, but it contains the labels of the locations.
A number of different analysis could be attempted on this data, as shown below. We use the PCA unscaled, but this is wrong given the raw abundances (for the environmental variables analysis might be equivalent scaled or unscaled, as they seem to be measured in similar scales). Your task is to repeat the analysis with the correlation matrix and compare the results.
We could see how the sites would group as a function of the environmental variables, both considering a PCA and a CA
pca.avesamb<-princomp(avesamb[,-1])
summary(pca.avesamb)
## Importance of components:
## Comp.1 Comp.2 Comp.3
## Standard deviation 37.473629 25.5582232 5.3671214
## Proportion of Variance 0.673092 0.3131008 0.0138072
## Cumulative Proportion 0.673092 0.9861928 1.0000000
biplot(pca.avesamb,cex=0.7,xlabs=substr(avesamb[,1],1,4))
We can see a very clear separation of the 3 habitat types based on the sites environmental variables.
ca.avesamb<-ca(avesamb[,-1])
ca.avesamb
##
## Principal inertias (eigenvalues):
## 1 2
## Value 0.093586 0.011924
## Percentage 88.7% 11.3%
##
##
## Rows:
## Bosq 1 Bosq 2 Bosq 3 Bosq 4 Bosq 5 Bosq 6
## Mass 0.050744 0.054127 0.052774 0.048038 0.050068 0.061908
## ChiDist 0.404234 0.475416 0.200678 0.167570 0.177074 0.365468
## Inertia 0.008292 0.012234 0.002125 0.001349 0.001570 0.008269
## Dim. 1 -1.315845 -1.549911 -0.641840 -0.536668 -0.568257 -1.162157
## Dim. 2 0.338420 0.317972 -0.379574 0.307247 0.308511 0.775381
## Bosq 7 Bosq 8 Prado 1 Prado 2 Prado 3 Prado 4
## Mass 0.059878 0.059878 0.014208 0.015223 0.016576 0.019621
## ChiDist 0.343336 0.384992 0.316218 0.319674 0.179729 0.197294
## Inertia 0.007058 0.008875 0.001421 0.001556 0.000535 0.000764
## Dim. 1 -1.108038 -1.258329 -0.113654 0.021691 0.034577 -0.632347
## Dim. 2 0.499884 -0.054639 -2.878278 -2.926849 -1.643051 -0.355071
## Prado 5 Prado 6 Prado 7 Prado 8 Ripic 1 Ripic 2
## Mass 0.020636 0.023681 0.020974 0.015900 0.069012 0.067997
## ChiDist 0.247898 0.226933 0.368830 0.228426 0.338865 0.322988
## Inertia 0.001268 0.001220 0.002853 0.000830 0.007925 0.007094
## Dim. 1 0.136803 0.628879 0.555023 0.403498 1.104777 1.051996
## Dim. 2 -2.237598 -1.102222 -2.998450 -1.760127 0.225210 0.250809
## Ripic 3 Ripic 4 Ripic 5 Ripic 6
## Mass 0.071719 0.066644 0.072057 0.068336
## ChiDist 0.301397 0.269222 0.430147 0.286161
## Inertia 0.006515 0.004830 0.013332 0.005596
## Dim. 1 0.985218 0.864708 1.266927 0.913179
## Dim. 2 -0.005636 -0.458255 1.708621 0.568026
##
##
## Columns:
## Perc.veg Agua Solo
## Mass 0.316982 0.279432 0.403586
## ChiDist 0.428703 0.362222 0.161991
## Inertia 0.058257 0.036663 0.010590
## Dim. 1 -1.391381 1.109118 0.324890
## Dim. 2 0.467768 1.161271 -1.171424
plot(ca.avesamb,labels=2)
Again, the separation across habitats is very clear.
pca.aves<-princomp(aves[,-1])
summary(pca.aves)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4
## Standard deviation 0.2222533 0.1168578 0.08479853 0.06517432
## Proportion of Variance 0.6307611 0.1743747 0.09182164 0.05424020
## Cumulative Proportion 0.6307611 0.8051358 0.89695745 0.95119765
## Comp.5 Comp.6 Comp.7 Comp.8
## Standard deviation 0.04942171 0.027323770 0.019466369 0.012803054
## Proportion of Variance 0.03118917 0.009533439 0.004838807 0.002093127
## Cumulative Proportion 0.98238683 0.991920266 0.996759072 0.998852199
## Comp.9 Comp.10
## Standard deviation 0.0077115470 0.0055153726
## Proportion of Variance 0.0007593664 0.0003884347
## Cumulative Proportion 0.9996115653 1.0000000000
biplot(pca.aves,cex=0.7,xlabs=substr(avesamb[,1],1,4))
ca.aves<-ca(aves[,-1])
ca.aves
##
## Principal inertias (eigenvalues):
## 1 2 3 4 5 6 7
## Value 0.445292 0.12603 0.069639 0.044091 0.031472 0.023908 0.012269
## Percentage 58.33% 16.51% 9.12% 5.78% 4.12% 3.13% 1.61%
## 8 9
## Value 0.008908 0.001775
## Percentage 1.17% 0.23%
##
##
## Rows:
## Bosq 1 Bosq 2 Bosq 3 Bosq 4 Bosq 5 Bosq 6
## Mass 0.046081 0.046081 0.048931 0.046081 0.045131 0.047981
## ChiDist 1.508272 1.321314 1.133841 1.036599 0.796308 0.818294
## Inertia 0.104828 0.080451 0.062906 0.049516 0.028618 0.032128
## Dim. 1 -1.837804 -1.690296 -1.552097 -1.478447 -0.785535 -0.679849
## Dim. 2 1.241071 1.700734 0.952661 0.517997 -0.216784 -0.900073
## Bosq 7 Bosq 8 Prado 1 Prado 2 Prado 3 Prado 4
## Mass 0.045606 0.047031 0.047031 0.045131 0.046556 0.045606
## ChiDist 1.107109 0.615837 0.721845 0.583780 0.386926 0.625996
## Inertia 0.055898 0.017837 0.024506 0.015380 0.006970 0.017872
## Dim. 1 -0.658662 -0.327748 -0.317424 -0.272950 0.286386 0.592153
## Dim. 2 -2.684345 -1.349816 -1.677101 -0.986909 0.100543 -0.037683
## Prado 5 Prado 6 Prado 7 Prado 8 Ripic 1 Ripic 2 Ripic 3
## Mass 0.045606 0.043705 0.043705 0.045131 0.044181 0.043705 0.042755
## ChiDist 0.458169 0.741372 0.732269 0.620944 0.781261 0.743498 0.691484
## Inertia 0.009573 0.024022 0.023436 0.017401 0.026966 0.024160 0.020443
## Dim. 1 0.536833 0.830987 0.861648 0.697250 1.084080 0.877864 0.998400
## Dim. 2 -0.353691 0.069624 0.282541 0.074841 0.606658 0.554429 0.449455
## Ripic 4 Ripic 5 Ripic 6
## Mass 0.043230 0.043230 0.047506
## ChiDist 0.868291 0.812487 1.117646
## Inertia 0.032593 0.028538 0.059341
## Dim. 1 0.990091 1.078401 1.237441
## Dim. 2 0.307511 0.488870 0.967568
##
##
## Columns:
## Erithacus Turdus Passer Parus Athene Anas
## Mass 0.016152 0.045131 0.008551 0.031829 0.055582 0.262233
## ChiDist 1.217291 1.622363 1.792900 1.715545 1.042239 0.459361
## Inertia 0.023934 0.118787 0.027487 0.093676 0.060377 0.055334
## Dim. 1 -0.859618 -2.105453 -1.544222 -2.117302 1.152327 0.340921
## Dim. 2 -0.640447 2.075667 1.326694 2.144726 1.129084 -0.183568
## Sylvia Carduelis Motacilla Dendrocopos
## Mass 0.078860 0.283610 0.089786 0.128266
## ChiDist 1.214767 0.636643 0.565053 0.982436
## Inertia 0.116370 0.114951 0.028667 0.123800
## Dim. 1 -0.978853 0.856413 0.429250 -1.311213
## Dim. 2 -2.612553 0.195041 0.065533 -0.255203
par(mfrow=c(1,2),mar=c(4,4,0.5,0.5))
plot(ca.aves,labels=2)
Note that in a CA we can also overlay on the ordination biplot, if available, a set of environmental covariates. We do this using function envfit
(to do so we need to fit the CA with function cca
in package vegan
)
ca.aves2<-cca(aves[,-1])
ca.aves2env<-envfit(ca.aves2,avesamb[,-1])
par(mfrow=c(1,1),mar=c(4,4,0.5,0.5))
plot(ca.aves2)
plot(ca.aves2env)
cca.aves<-cca(aves[,-1]~., data=avesamb[,-1])
summary(cca.aves)
##
## Call:
## cca(formula = aves[, -1] ~ Perc.veg + Agua + Solo, data = avesamb[, -1])
##
## Partitioning of scaled Chi-square:
## Inertia Proportion
## Total 0.7634 1.0000
## Constrained 0.3587 0.4699
## Unconstrained 0.4047 0.5301
##
## Eigenvalues, and their contribution to the scaled Chi-square
##
## Importance of components:
## CCA1 CCA2 CCA3 CA1 CA2 CA3
## Eigenvalue 0.3280 0.02924 0.001425 0.1688 0.08045 0.05299
## Proportion Explained 0.4297 0.03830 0.001866 0.2211 0.10539 0.06941
## Cumulative Proportion 0.4297 0.46802 0.469886 0.6910 0.79640 0.86581
## CA4 CA5 CA6 CA7 CA8 CA9
## Eigenvalue 0.03458 0.03081 0.01695 0.01137 0.007464 0.001256
## Proportion Explained 0.04530 0.04037 0.02221 0.01489 0.009778 0.001646
## Cumulative Proportion 0.91111 0.95148 0.97368 0.98858 0.998354 1.000000
##
## Accumulated constrained eigenvalues
## Importance of components:
## CCA1 CCA2 CCA3
## Eigenvalue 0.3280 0.02924 0.001425
## Proportion Explained 0.9145 0.08152 0.003972
## Cumulative Proportion 0.9145 0.99603 1.000000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
##
##
## Species scores
##
## CCA1 CCA2 CCA3 CA1 CA2 CA3
## Erithacus -0.7768 -0.19685 -0.065959 0.18436 -0.077779 0.24382
## Turdus -0.9648 -0.23347 -0.002096 -1.24513 0.026119 0.10185
## Passer -0.6850 0.02306 0.113574 -0.95215 -0.142186 -0.92793
## Parus -1.0307 -0.21659 -0.029620 -1.20241 0.033001 0.55886
## Athene 0.7737 -0.22104 -0.123545 0.06307 -0.227752 0.37841
## Anas 0.1512 0.24331 -0.011338 0.13870 -0.189073 -0.14366
## Sylvia -0.7323 0.08601 -0.011922 0.35166 0.873994 0.09333
## Carduelis 0.5073 -0.13729 0.031330 0.23570 -0.106902 0.06677
## Motacilla 0.2809 0.06105 0.017772 0.07177 0.002034 0.18827
## Dendrocopos -0.7739 -0.03457 0.011154 -0.32177 0.184739 -0.35045
##
##
## Site scores (weighted averages of species scores)
##
## CCA1 CCA2 CCA3 CA1 CA2 CA3
## Bosq 1 -1.9920 -3.130118 -8.8184 -1.69906 0.3412784 3.09366
## Bosq 2 -1.7911 -1.888582 2.5962 -1.45636 -1.6684158 -0.61482
## Bosq 3 -1.6737 -1.516559 4.0742 -1.83093 0.4307769 -0.96136
## Bosq 4 -1.6444 -0.864148 -0.9645 -1.65590 0.8319174 -0.71830
## Bosq 5 -0.9269 1.788843 1.0347 -0.36205 -0.0350761 -1.66804
## Bosq 6 -0.9199 -0.002147 -1.9069 1.26653 -0.7150326 -0.64443
## Bosq 7 -0.9902 1.794148 -1.2132 1.99500 1.8709653 0.41297
## Bosq 8 -0.4996 0.098511 4.2427 1.85373 -0.1426594 0.08602
## Prado 1 -0.4851 0.884286 3.5376 0.06650 2.6371451 0.02591
## Prado 2 -0.4075 1.058940 4.9426 -0.23050 1.3348065 -1.39378
## Prado 3 0.3345 0.626284 -3.9496 -0.13035 -0.1440932 1.23049
## Prado 4 0.6681 2.554465 2.2132 0.55476 -1.3515627 -0.33915
## Prado 5 0.5626 1.356450 -1.4583 0.46104 -0.3868010 0.06131
## Prado 6 0.9488 2.468812 2.6743 0.36616 -1.0540694 -0.03454
## Prado 7 0.9990 0.584783 -0.4481 0.35392 -1.1336281 -0.06426
## Prado 8 0.7921 1.646708 -5.7737 0.27631 -0.8460425 0.70205
## Ripic 1 1.3131 -1.080066 -1.8547 -0.03093 0.0005633 0.05120
## Ripic 2 1.0748 -0.077492 -6.5715 -0.22769 0.1093549 0.01896
## Ripic 3 1.1950 0.085138 -1.0070 0.04955 -0.1143608 -0.24653
## Ripic 4 1.1537 -1.536710 9.1861 0.35968 -0.1316139 0.07247
## Ripic 5 1.2864 -0.961352 2.2643 -0.11797 0.1860379 -0.67368
## Ripic 6 1.5511 -3.599707 -2.6757 0.22163 -0.1628344 1.52354
##
##
## Site constraints (linear combinations of constraining variables)
##
## CCA1 CCA2 CCA3 CA1 CA2 CA3
## Bosq 1 -1.31648 -0.24662 -0.2960733 -1.69906 0.3412784 3.09366
## Bosq 2 -1.66810 -0.43045 -0.1310699 -1.45636 -1.6684158 -0.61482
## Bosq 3 -0.65583 -0.47859 1.0734735 -1.83093 0.4307769 -0.96136
## Bosq 4 -0.50655 -0.09351 -0.3818036 -1.65590 0.8319174 -0.71830
## Bosq 5 -0.57171 -0.19692 -0.3174287 -0.36205 -0.0350761 -1.66804
## Bosq 6 -1.52015 -0.70390 -0.9236776 1.26653 -0.7150326 -0.64443
## Bosq 7 -1.38010 -0.66702 -0.3701555 1.99500 1.8709653 0.41297
## Bosq 8 -1.51834 -0.80233 0.8309498 1.85373 -0.1426594 0.08602
## Prado 1 0.23409 1.44511 0.1563554 0.06650 2.6371451 0.02591
## Prado 2 0.26899 1.37955 0.3271438 -0.23050 1.3348065 -1.39378
## Prado 3 0.24667 1.37789 -0.2362737 -0.13035 -0.1440932 1.23049
## Prado 4 -0.04446 1.29317 -0.8153716 0.55476 -1.3515627 -0.33915
## Prado 5 0.28423 1.09568 0.6085555 0.46104 -0.3868010 0.06131
## Prado 6 0.47614 1.02432 0.0006532 0.36616 -1.0540694 -0.03454
## Prado 7 0.46977 1.01766 1.2083258 0.35392 -1.1336281 -0.06426
## Prado 8 0.36439 1.41311 -0.2461690 0.27631 -0.8460425 0.70205
## Ripic 1 1.32532 -1.11375 0.3241837 -0.03093 0.0005633 0.05120
## Ripic 2 1.23941 -1.05818 0.2342329 -0.22769 0.1093549 0.01896
## Ripic 3 1.21733 -1.31891 1.0163511 0.04955 -0.1143608 -0.24653
## Ripic 4 1.03051 -1.18296 1.9005736 0.35968 -0.1316139 0.07247
## Ripic 5 1.48182 -0.84024 -3.4193528 -0.11797 0.1860379 -0.67368
## Ripic 6 1.03817 -0.99541 -0.5173174 0.22163 -0.1628344 1.52354
##
##
## Biplot scores for constraining variables
##
## CCA1 CCA2 CCA3 CA1 CA2 CA3
## Perc.veg -0.7484 -0.6559 -0.09883 0 0 0
## Agua 0.5344 -0.8183 -0.21145 0 0 0
## Solo 0.2911 -0.9544 0.06580 0 0 0
plot(cca.aves)
We conclude that with the direct gradient analysis (i.e. the CCA) the same separation across habitats is present, and we can relate the environmental variables to the abundances. Naturally, the percentage of vegetation is larger on Bosques than on the Pradaria or Ripicola areas, and there is more water near rivers (Ripicola). It is however not easy to see in the CCA output what are the species contributing more to this segregation. That is simpler to do in the CA/PCA plots, where it seems clear that Turdus and Parus are characteristic of Bosques, as is Sylvia (segregating sites on the second component), and Athene of Ripicola. Prados are harder to characterize in terms of species in the CA, but the PCA shows that Anas seems frequent.
All of these multivariate techniques reduce the information in a large matrix (or sets of matrices) to explore the relationships between the sites and the species, or in the case of the CCA, to see how the species distribute themselves over the sites using linear combinations of the environmental variables to explain that variation.
This process of dimension reduction inevitably leads to distortions. Try to find examples of points that were well represented in the resulting reduced space. A good starting point might be e.g. site Eucaliptal 7 in exercise 3. Look at its position in the biplot and try to imagine what might be the values of the different genera in it. What do you expect to be the most abundant genera say? Then go to the original data frame and check if your predictions were correct. Do the same with other sites to check the distortion that some sites suffered in the reduced space.