Script Use of Lexicometry in Sensometrics

4.1.2. Mean number of words used by the judges

In order to know the words used by each taster, it is necessary to transpose the DocTerm matrix (documents x terms) to TermDoc (terms x documents).
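The transposition step can be illustrated on a toy table. This is only a sketch in base R: the data frame `toy`, its wine labels (`W1` to `W3`), its judge labels (`J1`, `J2`) and its descriptions are hypothetical and are not taken from the wine data.

```r
# Hypothetical table: rows are wines, columns are judges, cells are free text
toy <- data.frame(
  J1 = c("fruity woody", "acid", "light"),
  J2 = c("round", "tannic long", "fresh"),
  row.names = c("W1", "W2", "W3"),
  stringsAsFactors = FALSE
)

# t() returns a character matrix; as.data.frame() turns it back into a data frame
t.toy <- as.data.frame(t(toy), stringsAsFactors = FALSE)

print(dim(toy))          # wines x judges
print(dim(t.toy))        # judges x wines
print(t.toy["J2", "W2"]) # description given by judge J2 for wine W2
```

After transposition each row holds everything one judge wrote, so a judge becomes the unit of analysis.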

To load Xplortext:

library(Xplortext)

Loading required package: FactoMineR
Loading required package: ggplot2
Loading required package: tm
Loading required package: NLP

French panel

15 French judges and 8 wines:

t.baseFr <- as.data.frame(t(baseFr)) # transposing French DocTerm to TermDoc
cat(dim(t.baseFr))

15 8

The 15 French judges use between 6 and 43 different words each.

Number of different words for the 15 French judges:

res.TD.TasterFr <-TextData(t.baseFr,var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE,Fmin=1)
cat(res.TD.TasterFr$summDoc$DistinctWords.before)

24 10 12 24 8 9 43 15 27 34 20 12 21 14 6

Minimum number of words used by French judges:

cat(min(res.TD.TasterFr$summDoc$DistinctWords.before))

Maximum number of words used by French judges:

cat(max(res.TD.TasterFr$summDoc$DistinctWords.before))
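What a count of distinct words per judge amounts to can be sketched in base R. This is a simplified illustration on hypothetical data (`toy`, `J1`, `J2`, `W1`, `W2` are invented names); it assumes a plain lowercase-and-split-on-whitespace tokenization, which may differ from what `TextData` does internally.

```r
# Hypothetical descriptions: one row per judge, one column per wine
toy <- data.frame(
  W1 = c("fruity and woody", "round"),
  W2 = c("fruity acid", "tannic long"),
  row.names = c("J1", "J2"),
  stringsAsFactors = FALSE
)

# Merge each judge's cells into one document, lowercase it, split on whitespace
docs <- apply(toy, 1, paste, collapse = " ")
tokens <- strsplit(tolower(docs), "[[:space:]]+")

# Number of distinct words per judge, then the minimum and maximum
distinct <- vapply(tokens, function(w) length(unique(w)), integer(1))
print(distinct)
print(range(distinct))
```

Repeated words count once per judge: here `J1` writes "fruity" twice but it contributes a single distinct word.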

Overall, the French judges produce 655 occurrences (611 retained after deleting stopwords) of 149 distinct words (137 retained after deleting stopwords).

res.TD.TasterFr.Before <-TextData(t.baseFr,var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE,Fmin=1)
res.TD.TasterFr.Before$summGen # 655 occurrences, 149 distinct words

             Before  After
Documents     15.00  15.00
Occurrences  655.00 655.00
Words        149.00 149.00
Mean-length   43.67  43.67

To select the French vocabulary without stopwords:

str.Fr.stopworduser <-c("à","d","de","du","en","et","la","le","par","sur","un","une")
res.TD.TasterFr.After <-TextData(t.baseFr, var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE, stop.word.user = str.Fr.stopworduser,Fmin=1)
res.TD.TasterFr.After$summGen # 611 occurrences, 137 distinct words retained after removing stopwords

            Before  After
Documents    15.00  15.00
Occurrences 655.00 611.00
Words       149.00 137.00
Mean-length  43.67  40.73
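The effect of a user stoplist on the Occurrences and Words rows can be sketched in base R. The token vector and the two-word stoplist below are hypothetical, chosen only to show the mechanism; this is not how Xplortext implements the filter.

```r
# Hypothetical token stream for one judge (not taken from the wine data)
tokens <- c("fruit", "de", "bois", "et", "fruit", "de", "cassis")
stopwords.user <- c("de", "et")  # illustrative subset of a user stoplist

# Drop every token that appears in the stoplist
kept <- tokens[!tokens %in% stopwords.user]

# Occurrences count every token; words count distinct tokens only
summary.toy <- c(
  occ.before   = length(tokens),
  occ.after    = length(kept),
  words.before = length(unique(tokens)),
  words.after  = length(unique(kept))
)
print(summary.toy)
```

Removing a stopword lowers the occurrence count by its frequency but lowers the distinct-word count by exactly one, which is why the two rows shrink by different amounts.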

Catalan panel

9 Catalan judges and 8 wines:

t.baseCat <- as.data.frame(t(baseCat)) # transposing Catalan DocTerm to TermDoc
cat(dim(t.baseCat))

9 8

The 9 Catalan judges use between 10 and 22 different words each:

res.TD.TasterCat <-TextData(t.baseCat,var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE,Fmin=1)
cat(res.TD.TasterCat$summDoc$DistinctWords.before)

17 14 12 14 16 15 22 10 13

Minimum number of words used by Catalan judges:

cat(min(res.TD.TasterCat$summDoc$DistinctWords.before))

Maximum number of words used by Catalan judges:

cat(max(res.TD.TasterCat$summDoc$DistinctWords.before))

Overall, the Catalan judges produce 323 occurrences (319 retained after deleting stopwords) of 97 distinct words (95 retained after deleting stopwords).

res.TD.TasterCat.Before <-TextData(t.baseCat,var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE,Fmin=1)
res.TD.TasterCat.Before$summGen # 323 occurrences, 97 distinct words

            Before  After
Documents     9.00   9.00
Occurrences 323.00 323.00
Words        97.00  97.00
Mean-length  35.89  35.89

To select the Catalan vocabulary without stopwords:

str.Cat.stopworduser <-c("de", "en", "la","le","un","une","amb","i")
res.TD.TasterCat.After <-TextData(t.baseCat, var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE, stop.word.user = str.Cat.stopworduser,Fmin=1)
res.TD.TasterCat.After$summGen # 319 occurrences, 95 distinct words retained after removing stopwords

            Before  After
Documents     9.00   9.00
Occurrences 323.00 319.00
Words        97.00  95.00
Mean-length  35.89  35.44
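The two panels can be compared with a few lines of base R. The sketch below only re-enters the figures printed in the summGen tables; nothing is recomputed from the raw data. It assumes that Mean-length is the number of occurrences divided by the number of documents, which is consistent with the printed values. The data frame `panels` and its column names are introduced here for illustration.

```r
# Figures copied from the summGen tables of the French and Catalan panels
panels <- data.frame(
  panel        = c("French", "Catalan"),
  judges       = c(15, 9),
  occ.before   = c(655, 323),
  occ.after    = c(611, 319),
  words.before = c(149, 97),
  words.after  = c(137, 95)
)

# Occurrences and distinct words removed by the user stoplists
panels$occ.removed   <- panels$occ.before - panels$occ.after
panels$words.removed <- panels$words.before - panels$words.after

# Mean length per judge after stopword removal (assumed: occurrences / documents)
panels$mean.after <- round(panels$occ.after / panels$judges, 2)

print(panels)
```

The French panel loses 12 distinct words, the same number as the entries in its stoplist, whereas the Catalan panel loses only 2 although its stoplist has 8 entries, which suggests that most of those entries do not occur in the Catalan descriptions.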