Script Use of Lexicometry in Sensometrics

4.4. MFACT. Global significance of the results

MFACT allows us to obtain an average of the individual configurations, and to place each one in relation to this average, thereby providing elements for comparison.

We will use the MFA function of the FactoMineR package:

MFA (base, group, type = rep("s",length(group)), excl = NULL, ind.sup = NULL, ncp = 5, name.group = NULL,
num.group.sup = NULL, graph = TRUE, weight.col.mfa = NULL, row.w = NULL, axes = c(1,2), tab.comp=NULL)

base parameter

base is a data frame joining column-wise:

1.- Data frame TMul24 (multiple table, 8 wines x 393 words). 393 is the number of columns (variables) obtained by juxtaposing the 24 DocumentTerm tables. These variables are the active variables.

dim(TMul24)

2.- The DocumentTerm matrix for 15 French judges (8 wines x 137 terms) ; as.matrix(sum.TD.Fr15$DocTerm)

sum.TD.Fr15$DocTerm

3.- The DocumentTerm matrix for 9 Catalan judges (8 wines x 95 terms) ; as.matrix(sum.TD.Cat9$DocTerm)

sum.TD.Cat9$DocTerm

4.- The average scores for each wine from the French judges (FrScore, position 25) and the Catalan judges (CatScore, position 26)

base[,25:26]

We join these four objects in the multiple table (8 x 627) MFA.Data24:

MFA.Data24 <- cbind(TMul24, as.matrix(sum.TD.Fr15$DocTerm),
as.matrix(sum.TD.Cat9$DocTerm), base[,25:26])
dim(MFA.Data24)
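
As a minimal sketch of this column-wise join, the toy example below uses small hypothetical blocks (act.tab, sum.a, sum.b and scores are illustrative names, not objects from the analysis). It checks that the number of columns of the result is the sum of the block widths, in the same way that 393 + 137 + 95 + 2 = 627:

```r
# Hypothetical toy blocks sharing the same 8 rows (wines)
wines   <- paste0("W", 1:8)
act.tab <- matrix(1, 8, 5, dimnames = list(wines, paste0("a", 1:5)))
sum.a   <- matrix(2, 8, 3, dimnames = list(wines, paste0("fr", 1:3)))
sum.b   <- matrix(3, 8, 2, dimnames = list(wines, paste0("cat", 1:2)))
scores  <- data.frame(FrScore = 1:8, CatScore = 8:1, row.names = wines)

# cbind() starting from a data frame returns a data frame
# with all the columns placed side by side
toy <- cbind(data.frame(act.tab), sum.a, sum.b, scores)
dim(toy)

# Expected width: sum of the block widths
ncol(act.tab) + ncol(sum.a) + ncol(sum.b) + ncol(scores)
```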

TMul24 will be the active table (24 active groups). The rest (French sum table sum.TD.Fr15$DocTerm, Catalan sum table sum.TD.Cat9$DocTerm, and scores) will be supplementary groups.

 

group parameter

group is a vector with the number of variables in each group.

We have 24 groups or DocumentTerm tables.

French panel

The first 15 groups correspond to the 15 French judges. To build a vector with the number of words used by each French judge:

cols.Fr15 <- unlist(lapply(1:15, function(i) res.TD.Fr.list[[i]]$DocTerm$ncol))
cat(cols.Fr15)

The names of the 15 French judges:

cat(names(res.TD.Fr.list))

For example, the first juxtaposed table corresponds to the French judge FE5:

cat(names(baseFr)[1])

The texts of this judge are:

baseFr[1]

They contain 23 different words:

cat(res.TD.Fr.list[[1]]$DocTerm$ncol)

colnames(res.TD.Fr.list[[1]]$DocTerm)

with a total of 43 occurrences:

sum(res.TD.Fr.list[[1]]$DocTerm)
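
The distinction between distinct words (columns) and occurrences (sum of the counts) can be sketched on a small hypothetical document-term table. A plain matrix is assumed here for simplicity, whereas the DocTerm objects above are accessed through their $ncol component:

```r
# Hypothetical document-term counts for one judge: 3 wines x 4 words
# (matrix() fills column by column, so each row below is one word)
dt <- matrix(c(1, 0, 2,
               0, 1, 0,
               1, 1, 0,
               0, 0, 3),
             nrow = 3,
             dimnames = list(paste0("W", 1:3),
                             c("fruity", "dry", "oak", "acid")))

ncol(dt)  # number of distinct words
sum(dt)   # total number of occurrences
```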

The total number of words for the French active group is 262:

cat(sum(cols.Fr15))

Catalan panel

The second group is the Catalan group (9 judges) with the number of words for each judge:

cols.Cat9 <- unlist(lapply(1:9, function(i) res.TD.Cat.list[[i]]$DocTerm$ncol))
cat(cols.Cat9)

The total number of columns for the Catalan group is 131:

sum(cols.Cat9)

Grouping the two vectors, we have 24 judges and 393 words:

ColTab24 <- c(cols.Fr15, cols.Cat9)
length(ColTab24)

sum(ColTab24)

The number of columns is 137 for the French DocTerm and 95 for the Catalan DocTerm.
The last group contains two columns: the average scores of the French and Catalan judges.

posit.groups <- c(ColTab24,
ncol(sum.TD.Fr15$DocTerm), # 137
ncol(sum.TD.Cat9$DocTerm), # 95
2)
cat(posit.groups)
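
Since the group vector only stores group sizes, it can help to see which column range each group covers. The following sketch, using a hypothetical vector grp rather than the real posit.groups, derives the first and last column of each group with cumsum():

```r
# Hypothetical group sizes (number of columns in each group)
grp <- c(3, 2, 4, 2)

last  <- cumsum(grp)        # index of the last column of each group
first <- last - grp + 1     # index of the first column of each group

data.frame(group = seq_along(grp), first = first, last = last)
```

Applied to posit.groups, the same two lines would give the column ranges of the 24 judges, the two sum tables and the scores.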

type parameter

There are four possibilities to select the type of groups of variables (columns) of the tables:

"c" or "s" for quantitative variables/groups (the difference is that for "s" variables are scaled to unit variance), "n" for categorical variables and "f" for frequencies (from a contingency table).
By default, all variables are quantitative and scaled to unit variance.

In our case there are 24 "f" groups (one frequency table per judge), two further "f" groups for the French and Catalan sum tables, and finally one "s" group for FrScore and CatScore, quantitative variables scaled to unit variance, which will be named "Liking.score":

type=c(rep ('f',24),"f","f","s")

ncp parameter

ncp is the number of dimensions kept in the results, in this case 8.

name.group parameter

name.group is a vector containing the names of the groups.

num.group.sup parameter

The indexes of the illustrative (supplementary) groups (by default NULL, so that no group is illustrative).

In our case the two sum tables and the scores are supplementary: num.group.sup=c(25,26,27)
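
Because group, type, name.group and num.group.sup must all describe the same set of groups, a quick consistency check before calling MFA can catch mismatches early. The helper below is a hypothetical sketch (check.mfa.args is not part of FactoMineR) run on toy values:

```r
# Hypothetical helper, not a FactoMineR function:
# stops with an error if the arguments are inconsistent
check.mfa.args <- function(data, group, type, name.group, num.group.sup = NULL) {
  stopifnot(sum(group) == ncol(data),                 # sizes cover every column
            length(type) == length(group),            # one type per group
            length(name.group) == length(group),      # one name per group
            all(num.group.sup %in% seq_along(group))) # valid group indexes
  invisible(TRUE)
}

# Toy data: 8 rows, 9 columns split into groups of 3, 4 and 2 columns
toy <- as.data.frame(matrix(0, 8, 9))
ok <- check.mfa.args(toy, group = c(3, 4, 2), type = c("f", "f", "s"),
                     name.group = c("J1", "J2", "Score"), num.group.sup = 3)
ok
```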

The complete MFA function:

res.mfact.24 <- FactoMineR::MFA(MFA.Data24,group=posit.groups,
ncp=8,
type=c(rep ('f',24),"f","f","s"),
name.group=c(names(res.TD.Fr.list), names(res.TD.Cat.list),
"SumTable_Fr", "SumTable_Cat","Liking.score"),
num.group.sup=c(25,26,27),graph=FALSE)

Results for MFA and 24 judges

Correlation between judges and dimensions:

res.mfact.24$group$correlation

The preliminary analysis led us to assign a non-active role to the French judge FP2, who does not share the first global dispersion direction common to all the other judges, as shown by the very low value of the corresponding canonical correlation computed by MFACT. Thus, only 23 individual tables are kept.

Correlation between FP2 judge and dimensions:

res.mfact.24$group$correlation["FP2",]

As can be seen in the following table, sorted by increasing correlation with the first dimension, judge FP2 comes first, with a very low value (0.2218152).

res.mfact.24$group$correlation[order(res.mfact.24$group$correlation[,"Dim.1"]),]
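
The same screening can be sketched on a small hypothetical correlation table. The judge names, the values and the 0.5 cut-off below are all illustrative assumptions, not results from the analysis:

```r
# Hypothetical canonical correlations (rows = judges, columns = dimensions)
corr <- matrix(c(0.91, 0.85, 0.22, 0.78,
                 0.80, 0.64, 0.71, 0.88),
               ncol = 2,
               dimnames = list(c("J1", "J2", "J3", "J4"),
                               c("Dim.1", "Dim.2")))

# Sort the judges by their correlation with the first dimension
corr[order(corr[, "Dim.1"]), ]

# Judges below an illustrative cut-off on the first dimension
threshold <- 0.5
low <- rownames(corr)[corr[, "Dim.1"] < threshold]
low
```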

For this reason we repeat the MFA without the French judge FP2.

The multiple table TMul24 now becomes TMul23. Since FP2 ranks 15th (last) among the French judges in the data frame, it suffices to keep the first 14 French tables. The dimensions are 8 wines x 256 words for the French table TMulFr14 and 8 x 387 for the joint French and Catalan table TMul23:

TMulFr14<-do.call(cbind, lapply(lapply(1:14, function(i) as.matrix(res.TD.Fr.list[[i]]$DocTerm)), unlist))
TMulFr14 <- data.frame(TMulFr14, check.names=TRUE)
cat(dim(TMulFr14))

TMul23 <- cbind(TMulFr14, TMulCat9)
cat(dim(TMul23))

Sum table for the 14 French judges:

sum.TD.Fr14 <- TextData(baseFr, var.text=c(1:14), Fmin=1,stop.word.user = str.Fr.stopworduser)
sum.TD.Fr14$DocTerm

It has 595 occurrences and 135 distinct words retained after removing stopwords:

sum.TD.Fr14$summGen

To build the data frame MFA.Data23 with 8 rows (wines) and 619 columns (variables):

MFA.Data23 <- cbind(TMul23, as.matrix(sum.TD.Fr14$DocTerm),
as.matrix(sum.TD.Cat9$DocTerm), base[,25:26])
cat(dim(MFA.Data23))

To compute the positions (in this case the same vector as cols.Fr15 but without the last position, which corresponds to FP2):

cols.Fr14 <- unlist(lapply(1:14, function(i) res.TD.Fr.list[[i]]$DocTerm$ncol))
cat(cols.Fr14)

Grouping the two vectors (French and Catalan) gives 23 judges and 387 words:

ColTab23 <- c(cols.Fr14, cols.Cat9)
cat(length(ColTab23))

cat(sum(ColTab23))

Joining the group positions:

posit.groups.23 <- c(ColTab23,
ncol(sum.TD.Fr14$DocTerm),
ncol(sum.TD.Cat9$DocTerm),
2)
cat(posit.groups.23)

cat(length(posit.groups.23))

The new MFA for the 23 judges:

res.mfact.23 <- FactoMineR::MFA(MFA.Data23,group=posit.groups.23,
ncp=8,
type=c(rep ('f',23),"f","f","s"),
name.group=c(names(res.TD.Fr.list)[1:14], names(res.TD.Cat.list),
"SumTable_Fr","SumTable_Cat","Liking.score"),
num.group.sup=c(24,25,26),graph=FALSE)
cat(dim(MFA.Data23))

names(res.mfact.23)

 

The inertia of the first factor is equal to 13.3 (13.268381). It should be noted that the maximum value would be 23, that is, the number of active judges (= active groups). Thus, the first global axis of the MFACT does not correspond to the first direction of inertia in each of the 23 active individual configurations.
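
To illustrate why the first eigenvalue cannot exceed the number of active groups, here is a minimal base R sketch of the MFA weighting. It assumes quantitative groups (a plain PCA within each group) rather than the frequency tables handled by MFACT, and all data and names are hypothetical:

```r
set.seed(1)
n     <- 8              # rows (e.g. wines)
sizes <- c(3, 4, 2)     # hypothetical number of columns per group

# Random centred blocks, one per group
blocks <- lapply(sizes, function(p) scale(matrix(rnorm(n * p), n, p), scale = FALSE))

# First eigenvalue of the PCA of a centred table (uniform row weights 1/n)
first.eig <- function(X) svd(X)$d[1]^2 / nrow(X)

# MFA weighting: divide each block by the square root of its own first eigenvalue
weighted <- lapply(blocks, function(X) X / sqrt(first.eig(X)))
sapply(weighted, first.eig)   # each weighted group now has first eigenvalue 1

# Global analysis of the juxtaposed weighted blocks
lambda1 <- first.eig(do.call(cbind, weighted))
lambda1                       # lies between 1 and the number of groups
```

The upper bound is reached only when all groups share the same first direction, which is why a value of 13.3 out of 23 indicates partial rather than complete agreement.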

Eigenvalues and barplot:

res.mfact.23$eig

barplot(res.mfact.23$eig[,1], main="Eigenvalues")

Nevertheless, the correlation coefficient between the first factor and the projection of each of the 23 configurations on this axis, called the canonical correlation coefficient in MFACT, is over 0.70 for 19 of the judges.

round(res.mfact.23$group$correlation,2)

cat(nrow(res.mfact.23$group$correlation[res.mfact.23$group$correlation[,1] > 0.70,]))

The inertia of the second factor is equal to 12.1. The canonical correlation coefficients are over 0.70 for 18 judges, so this second axis also corresponds to the direction of inertia present in the majority of the individual configurations.

cat(nrow(res.mfact.23$group$correlation[res.mfact.23$group$correlation[,2] > 0.70,]))
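
As a side note on the counting idiom, nrow() on a subset returns NULL when exactly one row matches, because R drops the result to a vector. A sketch on hypothetical values shows an alternative based on colSums(), which counts the judges above the threshold for every dimension at once:

```r
# Hypothetical canonical correlations (rows = judges, columns = dimensions)
corr <- matrix(c(0.91, 0.65, 0.74,
                 0.30, 0.82, 0.55),
               ncol = 2,
               dimnames = list(c("J1", "J2", "J3"), c("Dim.1", "Dim.2")))

# Number of judges above 0.70 on each dimension
counts <- colSums(corr > 0.70)
counts

# Only one judge exceeds 0.70 on Dim.2, so the subset is a vector and nrow() is NULL
nrow(corr[corr[, 2] > 0.70, ])
```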

Some numerical results of MFACT

RV coefficients. The respective RV values are 0.96 (French sum table) and 0.98 (Catalan sum table); a value of 1 indicates a perfect homothety:

round(res.mfact.23$group$RV,2)

round(res.mfact.23$group$RV[24:27, 24:27],2)
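
To make the meaning of "perfect homothety" concrete, the sketch below computes the RV coefficient from its usual definition for two quantitative tables observed on the same rows. The function rv.coef and the data are hypothetical, and the weighting that MFACT applies to frequency tables is not reproduced here:

```r
# Hypothetical helper: RV coefficient between two tables with the same rows
rv.coef <- function(X, Y) {
  X  <- scale(X, scale = FALSE)   # centre the columns
  Y  <- scale(Y, scale = FALSE)
  Wx <- tcrossprod(X)             # X X' (scalar products between rows)
  Wy <- tcrossprod(Y)
  # trace(Wx Wy) equals sum(Wx * Wy) because both matrices are symmetric
  sum(Wx * Wy) / sqrt(sum(Wx * Wx) * sum(Wy * Wy))
}

set.seed(1)
X <- matrix(rnorm(8 * 3), 8, 3)
Z <- matrix(rnorm(8 * 3), 8, 3)

rv.homothetic <- rv.coef(X, 2.5 * X)  # same configuration up to a scale factor
rv.unrelated  <- rv.coef(X, Z)        # unrelated random configuration
c(rv.homothetic, rv.unrelated)
```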

Lg coefficients:

round(res.mfact.23$group$Lg,2)

Contributions:

round(res.mfact.23$group$contrib,2)

And the inertia ratio:

round(res.mfact.23$inertia.ratio,2)