4.1.2. Mean number of words used by the judges
To find the words used by each taster, the DocTerm matrix (documents x terms) must be transposed into a TermDoc matrix (terms x documents).
To load Xplortext:
library(Xplortext)
Loading required package: FactoMineR
Loading required package: ggplot2
Loading required package: tm
Loading required package: NLP
French panel
15 French judges and 8 wines:
t.baseFr <- as.data.frame(t(baseFr)) # transposing French DocTerm to TermDoc
cat(dim(t.baseFr))
15 8
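The transposition step can be tried without the wine data. The following is a minimal base R sketch on a hypothetical toy table (`toy`, with invented judges `J1` to `J3` and two wines); it only assumes that, like `baseFr`, the table has one row per wine and one column per judge.

```r
# Hypothetical toy stand-in for baseFr: rows are wines, columns are judges
toy <- data.frame(
  J1 = c("fruit bois", "acide"),
  J2 = c("fruit", "tanin bois"),
  J3 = c("fleur", "fruit acide"),
  row.names = c("wine1", "wine2"),
  stringsAsFactors = FALSE
)

# t() returns a character matrix; as.data.frame() turns it back into a data frame
t.toy <- as.data.frame(t(toy), stringsAsFactors = FALSE)

cat(dim(toy), "\n")    # wines x judges
cat(dim(t.toy), "\n")  # judges x wines
rownames(t.toy)        # each judge is now a row, i.e. a document
```

After the transposition each judge is a row, so passing all columns as `var.text` treats the descriptions of all wines by one judge as a single document.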
15 French judges use between 6 and 43 different words.
Number of different words for the 15 French judges:
res.TD.TasterFr <-TextData(t.baseFr,var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE,Fmin=1)
cat(res.TD.TasterFr$summDoc$DistinctWords.before)
24 10 12 24 8 9 43 15 27 34 20 12 21 14 6
Minimum number of words used by French judges:
cat(min(res.TD.TasterFr$summDoc$DistinctWords.before))
6
Maximum number of words used by French judges:
cat(max(res.TD.TasterFr$summDoc$DistinctWords.before))
43
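To make explicit what "number of different words per judge" means, here is a rough base R sketch on hypothetical toy data. It splits on whitespace only, which is an assumption; the tokenization performed by `TextData` may differ (for example in how punctuation is handled), so this is an illustration of the idea rather than a reimplementation.

```r
# Hypothetical toy table: one row per judge, one column per wine
t.toy <- data.frame(
  wine1 = c("fruit bois", "fruit", "fleur fruit"),
  wine2 = c("acide fruit", "tanin", "fruit acide tanin"),
  row.names = c("J1", "J2", "J3"),
  stringsAsFactors = FALSE
)

# One document per judge: paste the descriptions of all wines together
docs <- apply(t.toy, 1, paste, collapse = " ")

# Lower-case, split on whitespace, drop empty tokens
tokens <- lapply(strsplit(tolower(docs), "[[:space:]]+"),
                 function(w) w[nzchar(w)])

# Number of distinct words per judge, then the minimum and maximum
distinct <- vapply(tokens, function(w) length(unique(w)), integer(1))
distinct
range(distinct)
```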
Overall, the French judges produce 655 occurrences of 149 distinct words; after deleting stopwords, 611 occurrences of 137 distinct words are retained.
res.TD.TasterFr.Before <-TextData(t.baseFr,var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE,Fmin=1)
res.TD.TasterFr.Before$summGen # 655 occurrences, 149 distinct words
            Before  After
Documents    15.00  15.00
Occurrences 655.00 655.00
Words       149.00 149.00
Mean-length  43.67  43.67
To select the French vocabulary without stopwords:
str.Fr.stopworduser <-c("à","d","de","du","en","et","la","le","par","sur","un","une")
res.TD.TasterFr.After <-TextData(t.baseFr, var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE, stop.word.user = str.Fr.stopworduser,Fmin=1)
res.TD.TasterFr.After$summGen # 611 occurrences, 137 distinct words retained after removing stopwords
            Before  After
Documents    15.00  15.00
Occurrences 655.00 611.00
Words       149.00 137.00
Mean-length  43.67  40.73
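The effect of a user stoplist on the Before/After summary can be sketched in base R. The documents and the stoplist below (`docs`, `stopwords.toy`) are hypothetical, and the whitespace tokenization is an assumption, so the sketch mimics the layout of `summGen` without claiming to reproduce how `TextData` computes it.

```r
# Hypothetical toy documents (one per judge) and a toy stoplist
docs <- c(J1 = "fruit de bois et acide",
          J2 = "un fruit de la cave",
          J3 = "fleur et fruit")
stopwords.toy <- c("de", "et", "la", "un")

# Tokenize on whitespace, then drop the stopwords
tokens <- strsplit(tolower(docs), "[[:space:]]+")
kept <- lapply(tokens, function(w) w[!(w %in% stopwords.toy)])

# Documents, occurrences and distinct words, before and after filtering
summ <- rbind(
  Documents   = c(length(tokens), length(kept)),
  Occurrences = c(sum(lengths(tokens)), sum(lengths(kept))),
  Words       = c(length(unique(unlist(tokens))), length(unique(unlist(kept))))
)
colnames(summ) <- c("Before", "After")

# Mean length is the number of occurrences per document
summ <- rbind(summ, "Mean-length" = summ["Occurrences", ] / summ["Documents", ])
round(summ, 2)
```

The number of documents is unchanged by the filtering; only occurrences, distinct words and mean length drop, which is the same pattern as in the French summary above.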
Catalan panel
9 Catalan judges and 8 wines:
t.baseCat <- as.data.frame(t(baseCat)) # transposing Catalan DocTerm to TermDoc
cat(dim(t.baseCat))
9 8
9 Catalan judges use between 10 and 22 different words:
res.TD.TasterCat <-TextData(t.baseCat,var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE,Fmin=1)
cat(res.TD.TasterCat$summDoc$DistinctWords.before)
17 14 12 14 16 15 22 10 13
Minimum number of words used by Catalan judges:
cat(min(res.TD.TasterCat$summDoc$DistinctWords.before))
10
Maximum number of words used by Catalan judges:
cat(max(res.TD.TasterCat$summDoc$DistinctWords.before))
22
Overall, the Catalan judges produce 323 occurrences of 97 distinct words; after deleting stopwords, 319 occurrences of 95 distinct words are retained.
res.TD.TasterCat.Before <-TextData(t.baseCat,var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE,Fmin=1)
res.TD.TasterCat.Before$summGen # 323 occurrences, 97 distinct words
            Before  After
Documents     9.00   9.00
Occurrences 323.00 323.00
Words        97.00  97.00
Mean-length  35.89  35.89
To select the Catalan vocabulary without stopwords:
str.Cat.stopworduser <-c("de", "en", "la","le","un","une","amb","i")
res.TD.TasterCat.After <-TextData(t.baseCat, var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE, stop.word.user = str.Cat.stopworduser,Fmin=1)
res.TD.TasterCat.After$summGen # 319 occurrences, 95 distinct words retained after removing stopwords
            Before  After
Documents     9.00   9.00
Occurrences 323.00 319.00
Words        97.00  95.00
Mean-length  35.89  35.44
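The Mean-length rows reported for both panels are consistent with occurrences divided by the number of judges. The short check below uses only the figures printed in the summaries; the data frame `panel` and its column names are introduced here for illustration.

```r
# Figures copied from the summGen outputs of the two panels
panel <- data.frame(
  judges     = c(15, 9),
  occ.before = c(655, 323),
  occ.after  = c(611, 319),
  row.names  = c("French", "Catalan")
)

# Mean number of occurrences per judge, before and after removing stopwords
panel$mean.before <- round(panel$occ.before / panel$judges, 2)
panel$mean.after  <- round(panel$occ.after / panel$judges, 2)
panel
```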