Introduction

Hello everyone!

This is called an “RMarkdown” file. You can do all sorts of amazing stuff using RMarkdown, including preparing pedagogic documents/websites, showing off the work you do in R, etc. For more details on using R Markdown see http://rmarkdown.rstudio.com.

I’ll be using RMarkdown to prepare the summaries. You can skip the R sections if you are not interested. But they are much easier to get into than you’d probably think. Let us start with some preliminaries:


Why use it at all?

There are two main reasons why you would want to use factor analysis: (1) to reduce the number of variables in your data set without losing much of the information contained in them (for computational reasons), and (2) to uncover the latent, unobservable characteristics that may lie behind your data, which gives you a bigger-picture view (summarisation/for drawing insights) while making it easier to make sense of the chaos you have in your hands.

Your data set may allow you to indirectly get a measure of those unobservable characteristics. So you’re basically killing two birds with one stone here. After running a factor analysis and obtaining factors that adequately explain the variation in your original data set, you end up with a much more manageable data set, with similar/correlated variables grouped together and represented by a factor that can be named as you see fit, depending on the context.


Variance accounted for/explained

You’ve seen the same logic at work in regression analysis. Variance means information, and covariance measures the degree of similarity between the information carried by two variables. Covariance is the most basic building block of factor analysis, so it is important to grasp what it means.

If we have variables with high correlation, then we can assume that they have some information in common. You might even say that you have some duplicate information stored by two variables that you could compress without losing much, pretty similar (but not identical) to what WinRAR, WinZip and co. do: we compress the data set while also making sure that we don’t lose much (unique) information in the process. Communality (\(h^2\)) therefore represents the variance in a particular variable that is explained by the generated factors, whereas uniqueness (\(u^2\)) represents the variance that the factors are not able to explain, you could say the “information lost” during compression. If you had as many factors as you have variables, you could perfectly explain the variance and you would have a communality of 1 for each variable. But then it would not make any sense to use factor analysis, would it?

The resulting system of equations in matrix notation looks like the following. Assume there are \(m\) observations and \(n\) variables in your data set, and you have reduced your variables to \(r\) factors. This is the system we want to achieve; we want to decompose our variables in the following way:

\[ X = {\Lambda}F + {\Psi}E\]

where \(X\) is the \(n \times m\) matrix that stores all the variables we are interested in, \(\Lambda\) is the \(n \times r\) loading matrix, which stores the correlations between the \(n\) variables and the \(r\) factors, \(F\) is the \(r \times m\) matrix of extracted factors, \(\Psi\) is the \(n \times 1\) loading vector for the “unique” factors (which do not appear as factors when we run an analysis), and \(E\) represents the unique factors, or error terms. This is not a formula you can evaluate directly (without having the full factor analysis results in your hands); there are numerous algorithms that do it for you, mostly working with the eigenvalues of the correlation matrix of the variables in the data set. The variance is decomposed into explained/unexplained as follows:

\[ \Sigma = \Lambda\Lambda' + \Psi\]

\(\Lambda\) again represents the loading matrix and \(\Sigma\) is the covariance matrix of the data set (note that with \(\Lambda\) being \(n \times r\), the product \(\Lambda\Lambda'\) is \(n \times n\), matching \(\Sigma\)). Our goal when running factor analysis is to minimize \(\Psi\) with the input we are given. And since all our variables are standardized, each has a variance of 1, so on the diagonal of this decomposition we get

\[ 1 = \Lambda\Lambda' + \Psi\]

which, for each individual variable, can also be written as follows.

\[ 1 = h^2 + u^2\]
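As a tiny sanity check of that identity, here is a sketch in R using made-up loadings of 0.95 and 0.17 on two factors (they happen to be close to what we will later see for prevents_cavities, but treat them as hypothetical for now):

example_loadings <- c(0.95, 0.17) #hypothetical loadings of one variable on two factors
h2 <- sum(example_loadings^2) #communality: variance explained by the factors (about 0.93)
u2 <- 1 - h2 #uniqueness: variance the factors cannot explain (about 0.07)
h2 + u2 #adds up to 1 for a standardized variable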


Distinction - Supervised vs Unsupervised Methods

We don’t have a model to work with in factor analysis, which makes it an unsupervised method. While it may make sense to take a look at the correlation matrix beforehand to see if it is worthwhile to run a factor analysis, we don’t have to pre-group variables, or even name factors, before running it. You could argue that the expression I showed above might qualify as a model, but no, that’s not what I mean. We have no notion of what the factors must look like, contrary to regression, where we are trying to make the combination of our explanatory variables look as much as possible like the dependent variable. In factor analysis you can have a look at what the factors could look like, but you have no say whatsoever in what they must look like. You can think of a dependent variable as a supervisor that you should obey; we have no dependent variables in factor analysis.

Yet it still makes sense to have some expectations, and if you don’t have any variables that can be grouped together (read: highly correlated), you might have to conclude that it is not worth using factor analysis. Even if you reduce your data set to a handful of factors, if the variance explained by these factors is only around 30%, it will not have made sense to use factor analysis.


Application

Reminder: even if you are not interested in coding, it makes sense to read through what’s happening here. Just look at the tables/results/comments and factor analysis should start making sense, quite fast.

Let us see how easy it is to run a factor analysis. First, we’ll load the psych package (install it first if you don’t have it, as shown below), which gives us the factor analysis function, and the openxlsx package, which lets us import Excel files into R.

install.packages("psych") #install packages, you only need to do it once
install.packages("openxlsx")
install.packages("ggplot2") #for visualization. I strongly recommend installing and learning how to use the "tidyverse", which ggplot2 is a part of.
install.packages("tidyr") #required for visualization, not necessary for the analysis
require(psych) #attach the package so that we can work with what it offers to us
require(openxlsx)

Now to use the toothpaste data set! This version is slightly different from the one we saw in the lecture, so we’ll get rid of the extra columns. There are packages (e.g. dplyr) that do this much more elegantly, but let us stick to the basics for now, as much as possible.

toothpaste <- read.xlsx("https://samuelfranssens.files.wordpress.com/2022/03/toothpaste.xlsx")
toothpaste <- toothpaste[,-c(1,8,9)] #drop the columns we won't use in the analysis

Take a look at the dataset below:

prevents_cavities shiny_teeth strengthens_gums freshens_breath decay_prevention_unimportant attractive_teeth
7 3 6 4 2 4
1 3 2 4 5 4
6 2 7 4 1 3
4 5 4 6 2 5
1 2 2 3 6 2
6 3 6 4 2 4
5 3 6 3 4 3
6 4 7 4 1 4
3 4 2 3 6 3
2 6 2 6 7 6
6 4 7 3 2 3
2 3 1 4 5 4
7 2 6 4 1 3
4 6 4 5 3 6
1 3 2 2 6 4
6 4 6 3 3 4
5 3 6 3 3 4
7 3 7 4 1 4
2 4 3 3 6 3
3 5 3 6 4 6
1 3 2 3 5 3
5 4 5 4 2 4
2 2 1 5 4 4
4 6 4 6 4 7
6 5 4 2 1 4
3 5 4 6 4 7
4 4 7 2 2 5
3 7 2 6 4 3
4 6 3 7 2 7
2 3 2 4 7 2
7 3 6 4 2 4
1 3 2 4 5 4
6 2 7 4 1 3
4 5 4 6 2 5
1 2 2 3 6 2
6 3 6 4 2 4
5 3 6 3 4 3
6 4 7 4 1 4
3 4 2 3 6 3
2 6 2 6 7 6
6 4 7 3 2 3
2 3 1 4 5 4
7 2 6 4 1 3
4 6 4 5 3 6
1 3 2 2 6 4
6 4 6 3 3 4
5 3 6 3 3 4
7 3 7 4 1 4
2 4 3 3 6 3
3 5 3 6 4 6
1 3 2 3 5 3
5 4 5 4 2 4
2 2 1 5 4 4
4 6 4 6 4 7
6 5 4 2 1 4
3 5 4 6 4 7
4 4 7 2 2 5
3 7 2 6 4 3
4 6 3 7 2 7
2 3 2 4 7 2

OK, now that we have the data set, let us take a look at the correlations. R needs only a single function to give you the correlation matrix! But before that, let us make sure that we standardize all the variables in the data set. apply() is one of the ways to implement a loop in R; it is not specific to factor analysis. What we are doing is the usual z-standardization:

\[ z = \frac{x - {\mu}_x}{{\sigma}_x}\]

toothpaste <- apply(toothpaste,MARGIN = 2, function(x) (x -mean(x))/sd(x))
cor_mat <- cor(toothpaste) #finding the correlation matrix
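As a side note (not needed for the analysis), base R also has a built-in shortcut for the same z-standardization; this is just an alternative to the apply() call above:

toothpaste_alt <- scale(toothpaste) #scale() subtracts each column's mean and divides by its standard deviation
#since toothpaste is already standardized at this point, this returns (numerically) the same values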

The following part deals with visualization and is by no means essential. If you are into data visualization, ggplot2 is an amazing library to learn, as it gives endless customization options! Don’t worry about the details of this part; it just reshapes the correlation matrix and draws it as a heatmap.

require(tidyr) #for pivot_longer(), used to reshape the correlation matrix
require(ggplot2) #for the heatmap
cor_mat <- data.frame(cor_mat) #convert the correlation matrix to a data frame
cor_mat$var1 <- rownames(cor_mat)
cor_mat <- cor_mat %>% pivot_longer(cols = prevents_cavities:attractive_teeth)


ggplot(data = cor_mat, aes(var1, name, fill = value))+
 geom_tile(color = "white",alpha=0.75)+
 geom_text(aes(label=round(value,2),color=value)) +
 scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
   midpoint = 0, limit = c(-1,1), space = "Lab",
   name="Pearson\nCorrelation") +
 scale_color_gradient2(low = "darkblue", high = "darkred", mid = "gray", 
   midpoint = 0, limit = c(-1,1), space = "Lab",guide = "none") +
 theme_minimal()+ 
 theme(axis.text.x = element_text(angle = 45, vjust = 1, 
     hjust = 1))+
 coord_fixed() + labs(x="",y="")

You can see that, as discussed in class, there is a strong (positive or negative) correlation between the variables strengthens_gums, decay_prevention_unimportant and prevents_cavities, which we can group under “health concerns”, and between attractive_teeth, freshens_breath, and shiny_teeth, which we can group under “aesthetic concerns”. This grouping is not necessary for factor analysis; however, as I mentioned above, it is nice to see whether it is worthwhile to do a factor analysis at all, as the correlations are the building blocks of factor analysis. With no strong correlations, we cannot expect to have factors that can cover the variance in the variables well enough.
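If you want a quick numerical check on top of eyeballing the heatmap, here is a small sketch (not part of the lecture) that uses the psych package’s KMO() function, the Kaiser-Meyer-Olkin measure of sampling adequacy:

mean(abs(cor(toothpaste)[lower.tri(cor(toothpaste))])) #average absolute pairwise correlation
psych::KMO(toothpaste)$MSA #overall sampling adequacy; values well above ~0.6 suggest factoring is worthwhile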

Reminder: Factor analysis first calculates the eigenvalues, loadings and communalities before actually extracting the factors themselves. Luckily, the psych package gives us a function to create a scree plot without extracting factors one by one.

In the figure below, we are interested in the dashed line. Don’t get confused: algorithms for factor analysis deal with eigenvalues of modified correlation matrices at each iteration. There is another method called Principal Component Analysis that is in some ways similar to factor analysis, but we are not interested in that at the moment.

Anyway, the dashed line gives us the eigenvalues we are interested in. After the second factor, the marginal explanatory power of each factor dips dramatically. So it would be wise to go with 2 factors for this dataset.

psych::scree(toothpaste) # the part before "::" is the package name; it is sometimes necessary to use it to avoid naming conflicts. I used it here for two reasons: 1) to show you that such functionality exists, i.e. you can call a function without attaching its package first; 2) because the built-in "stats" package has a function with the same name.
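If you would like a second opinion on the number of factors, the psych package also offers parallel analysis; this is just a cross-check on the scree plot above, not something we rely on here:

psych::fa.parallel(toothpaste, fa = "fa") #compares our eigenvalues with those from random data of the same size and suggests how many factors to retain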

We are now ready to go through with 2 factors as our selection. Running a factor analysis, again, is very simple with R, as you can see just below. Let us first choose the no-rotation version and use that as a basis; this comes with the assumption that the factors are uncorrelated with each other (we don’t really know at the moment whether that assumption has any truth to it).

Reminder: we can pass the entire data set to the fa() function because we want to use all the variables for the analysis; if we did not need all the variables, we would first have to subset the data frame to the variables we want, as in the sketch below.
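Purely for illustration (we don’t need this for the toothpaste analysis), such subsetting could look like the following; the choice of the three “health” items here is arbitrary:

health_only <- toothpaste[, c("prevents_cavities", "strengthens_gums", "decay_prevention_unimportant")] #keep only some of the columns
#we would then call fa(health_only, nfactors = 1) on that subset instead of the full data set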

Below we have a factor analysis output. Let us investigate it. Don’t panic, we are not interested (at least not immediately) in everything that is written in the report.

fact_result_nr <- fa(toothpaste,nfactors = 2,rotate = "none")
fact_result_nr
## Factor Analysis using method =  minres
## Call: fa(r = toothpaste, nfactors = 2, rotate = "none")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                                MR1   MR2   h2    u2 com
## prevents_cavities             0.95  0.17 0.93 0.071 1.1
## shiny_teeth                  -0.21  0.72 0.56 0.439 1.2
## strengthens_gums              0.91  0.04 0.84 0.164 1.0
## freshens_breath              -0.25  0.73 0.60 0.401 1.2
## decay_prevention_unimportant -0.85 -0.26 0.79 0.211 1.2
## attractive_teeth             -0.10  0.85 0.72 0.275 1.0
## 
##                        MR1  MR2
## SS loadings           2.57 1.87
## Proportion Var        0.43 0.31
## Cumulative Var        0.43 0.74
## Proportion Explained  0.58 0.42
## Cumulative Proportion 0.58 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  4.25 with Chi Square of  238.93
## The degrees of freedom for the model are 4  and the objective function was  0.22 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.05 
## 
## The harmonic number of observations is  60 with the empirical chi square  1.1  with prob <  0.89 
## The total number of observations was  60  with Likelihood Chi Square =  11.82  with prob <  0.019 
## 
## Tucker Lewis Index of factoring reliability =  0.866
## RMSEA index =  0.18  and the 90 % confidence intervals are  0.067 0.306
## BIC =  -4.56
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.98 0.92
## Multiple R square of scores with factors          0.95 0.85
## Minimum correlation of possible factor scores     0.91 0.70

The first table gives us the most vital values: loadings (MR1 and MR2), communalities (h2) and uniqueness (u2).

The second table presents different measures of variance explained. The metric of interest is “cumulative variance”. We can see here that two factors explain 74% of the total variance in the data set, which is a great result!

Let’s talk about loadings and how central they are to a factor analysis model. Remember that each loading is the correlation between a given variable and a factor. Let us extract the loadings. I’ll also demonstrate how you can get communalities and eigenvalues, which correspond to how well a variable is explained and how explanatory a factor is, respectively. We can square the loadings, then sum the squared loadings for each variable to find the communalities, and sum the squared loadings for each factor to find the eigenvalues.

ldg <- as.data.frame(fact_result_nr$loadings[]) #the empty [] looks like weird syntax; it simply strips the special "loadings" class so we get a plain matrix to work with
knitr::kable(ldg,col.names = c("Factor 1","Factor 2")) %>% kableExtra::kable_styling()
Factor 1 Factor 2
prevents_cavities 0.9493111 0.1679548
shiny_teeth -0.2062724 0.7201383
strengthens_gums 0.9134100 0.0382759
freshens_breath -0.2461427 0.7336414
decay_prevention_unimportant -0.8493783 -0.2591322
attractive_teeth -0.1010156 0.8452070
# the pipe operator ( %>% ) is an essential component in modern R: it passes the result on its left as the first argument of the function on its right, e.g. x %>% sum() is the same as sum(x).
as.matrix(ldg)^2 %>%  knitr::kable(col.names = c("Squared F1","Squared F2")) %>% kableExtra::kable_styling()
Squared F1 Squared F2
prevents_cavities 0.9011915 0.0282088
shiny_teeth 0.0425483 0.5185991
strengthens_gums 0.8343179 0.0014650
freshens_breath 0.0605862 0.5382298
decay_prevention_unimportant 0.7214435 0.0671495
attractive_teeth 0.0102041 0.7143748

The first table above shows us the loadings, the second the squared loadings. Squared loadings correspond to the variance in a particular variable explained by a particular factor, whereas loadings correspond to the correlations between each variable and each factor. The table below, on the other hand, shows us the sum of squared loadings for each factor. Notice that they are identical to the results in the factor analysis output.

as.matrix(ldg)^2  %>% apply(MARGIN = 2,sum) %>% knitr::kable(col.names = c("Eigenvalues")) %>% kableExtra::kable_styling()
Eigenvalues
MR1 2.570292
MR2 1.868027

For communalities you can take a look at the table below. Notice, again, that they are identical to \(h^2\) results in the original factor analysis output above.

as.matrix(ldg)^2  %>% apply(MARGIN = 1,sum) %>% knitr::kable(col.names = c("Communalities")) %>% kableExtra::kable_styling()
Communalities
prevents_cavities 0.9294004
shiny_teeth 0.5611474
strengthens_gums 0.8357829
freshens_breath 0.5988160
decay_prevention_unimportant 0.7885930
attractive_teeth 0.7245790


Rotations

Refer to http://static.esmt.org/pub/edtech/customer_perceptions.html for a detailed account of factor rotations.


Factor Scores

Factor scores are readily available in the factor analysis output object (unless otherwise stated) and can be accessed as follows:

fac_scores <- fact_result_nr$scores
knitr::kable(fac_scores) %>% kableExtra::kable_styling() %>%  kableExtra::scroll_box(width = "100%", height = "200px")
MR1 MR2
1.3371546 0.0777128
-1.2042479 -0.5292726
1.3008765 -0.5992375
-0.0574518 0.9478862
-1.1395018 -1.6829693
1.0325559 -0.0398628
0.6441449 -0.8122027
1.1751724 0.1242133
-0.6560086 -0.7243674
-1.3677483 1.1414842
1.1696141 -0.4827841
-1.0194113 -0.3512548
1.4857131 -0.4212196
-0.1482193 1.2575362
-1.1669198 -0.9556045
0.9566932 -0.1001608
0.7074943 -0.3944878
1.5351708 0.0650375
-0.8408452 -0.9023852
-0.6532255 1.1651669
-1.1315521 -1.0885031
0.5527955 0.0797553
-0.9435486 -0.2909567
-0.2991692 1.7689997
0.8760687 0.1037265
-0.5483681 1.4746727
0.5883985 -0.1673218
-0.8390729 0.4692681
-0.3202141 2.1142582
-1.0263481 -1.2471268
1.3371546 0.0777128
-1.2042479 -0.5292726
1.3008765 -0.5992375
-0.0574518 0.9478862
-1.1395018 -1.6829693
1.0325559 -0.0398628
0.6441449 -0.8122027
1.1751724 0.1242133
-0.6560086 -0.7243674
-1.3677483 1.1414842
1.1696141 -0.4827841
-1.0194113 -0.3512548
1.4857131 -0.4212196
-0.1482193 1.2575362
-1.1669198 -0.9556045
0.9566932 -0.1001608
0.7074943 -0.3944878
1.5351708 0.0650375
-0.8408452 -0.9023852
-0.6532255 1.1651669
-1.1315521 -1.0885031
0.5527955 0.0797553
-0.9435486 -0.2909567
-0.2991692 1.7689997
0.8760687 0.1037265
-0.5483681 1.4746727
0.5883985 -0.1673218
-0.8390729 0.4692681
-0.3202141 2.1142582
-1.0263481 -1.2471268

We have now reduced our data set of 6 variables to effectively just 2 variables, while still retaining 74% of the variance that exists in the original data set. We are now ready to use these factors as inputs in a different analysis! Note that all the factors have mean 0 and variance 1, as this is how they are constructed.
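To make the “inputs in a different analysis” point concrete, here is a sketch with a completely made-up outcome variable (purchase_intent does not exist in the real data set; it is fabricated here only to show the mechanics):

set.seed(1)
purchase_intent <- rnorm(nrow(fac_scores)) #placeholder outcome, NOT real data
summary(lm(purchase_intent ~ fac_scores[, "MR1"] + fac_scores[, "MR2"])) #regress the (fake) outcome on the two factor scores instead of the six original variables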


Questions

These are the questions I gathered from the whiteboard in section 2.7

  • Are communality and eigenvalue the same?

No, communalities represent how well the original variables are explained by the factors constructed, whereas eigenvalues exist to show us how explanatory a factor is.

  • Why do common factors have a mean = 0 and variance = 1?

Aside from standardization across factors (e.g. making it easier to compare the first factor to the second), having a zero mean and a variance of 1 makes it much easier to compute the different components of factor analysis. If you are interested, you can take a look at any textbook dealing with factor analysis. Moreover, not only do common factors have zero mean and unit variance, the unique factors have the same properties as well.

  • What is the sequential order in factor analysis? (eg: first determine which factors to extract by looking at eigenvalue, second give factors a name (if ambiguous apply rotation), third validate if total variance explained by selected factors is satisfactory)

The workflow is pretty much given above, but yes, this is more or less how it happens. Note that we don’t have to extract the factors to see how much variance is explained by a given number of factors; that’s why we use the scree plot in the first place.

  • Could you share an example where oblique rotation is more useful rather than orthogonal rotation?

This depends on whether the factors are correlated with each other or not. If the underlying constructs are plausibly correlated (think of factors like “verbal ability” and “quantitative ability” in a skills survey, which tend to go together), an oblique rotation is the more useful choice, since an orthogonal rotation would force the factors to be uncorrelated.

  • Why is it assumed that the total number of possible factors equals the total number of variables?

Well, that is because the correlation matrix of the variables has dimension \(n \times n\), where \(n\) is the number of variables we have. Recall that the correlation matrix is the cornerstone of building a factor analysis model. We simply don’t have any information to build more factors than there are variables in our data set.
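You can see this directly from the eigen-decomposition of the correlation matrix; a quick sketch:

eigen(cor(toothpaste))$values #exactly six eigenvalues for six variables, so at most six factors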

  • Why do we assume from the beginning that there are just two factors in the toothpaste problem? Shouldn’t we have considered that there might be 6 possible factors until we have extracted the eigenvalues and looked at the scree plot?

We can see how many factors would make sense to use by taking a look at the scree plot; that doesn’t require us to extract the factors. Besides, if we go for \(n\) factors when there are \(n\) variables, we are not really pursuing any goal that can be pursued with factor analysis: we’re not reducing dimensions and we’re not summarizing anything.

  • Do we work with the factor scores from before or after the rotation?

We can still have factor scores without applying any rotation; that would be the “no rotation” case. With a rotation, however, yes, we would first have to apply the rotation and then work with the rotated factor scores, as in the sketch below.
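For instance (a sketch, using the varimax rotation purely as an example), the rotated scores come out of the same fa() call once a rotation is specified:

fact_result_vm <- fa(toothpaste, nfactors = 2, rotate = "varimax") #same analysis, now with an orthogonal rotation
head(fact_result_vm$scores) #these are the rotated factor scores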

  • How could we work with them if there are more than two factors?

We are not only interested in visualization here. Imagine you want to run a regression model: the principle of parsimony tells us that we should use as few variables as possible when building a model, and factor analysis provides us with a method for exactly that. This becomes even more crucial if you are dealing with huge data sets with many variables. Bonus concept: https://en.wikipedia.org/wiki/Occam%27s_razor

  • Should we use the orthogonal rotation as our preferred one and only go to the oblique rotation if the first one does not work?

Well, this is somewhat related to my answer just above: if you can, don’t make it more complicated than it needs to be. If the factors are uncorrelated, then it does not really make sense to use an oblique rotation; it would just go against the principle of parsimony. There could be some cases in which it would be useful, but in general, I would not do it.
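Here is a sketch of how you could check this in practice: run an oblique rotation and look at the estimated correlation between the factors, which psych stores in the Phi element of the result (you may need the GPArotation package installed for the oblimin rotation). If that correlation is close to zero, the simpler orthogonal rotation is perfectly adequate.

fact_result_ob <- fa(toothpaste, nfactors = 2, rotate = "oblimin") #oblique rotation; may require the GPArotation package
fact_result_ob$Phi #estimated factor correlation matrix under the oblique rotation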