Script Use of Lexicometry in Sensometrics

4.1.2. Mean words used by the judges

To find the words used by each taster, the DocTerm matrix (documents x terms) must be transposed into a TermDoc matrix (terms x documents).
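As a minimal sketch of this transposition, the toy data frame below is flipped so that each judge becomes one row. It assumes that each cell holds the free text given by one judge for one wine, which is what passing every column as `var.text` further down suggests. The names `toy`, `t.toy`, `J1`, `W1` and so on are hypothetical and are not the real `baseFr` data.

```r
# Toy data (hypothetical): 3 wines (rows) described by 2 judges (columns)
toy <- data.frame(
  J1 = c("fruity light", "woody", "fruity acid"),
  J2 = c("acid", "woody tannic", "light"),
  row.names = c("W1", "W2", "W3"),
  stringsAsFactors = FALSE
)

# t() returns a character matrix; as.data.frame() turns it back into a data frame
t.toy <- as.data.frame(t(toy), stringsAsFactors = FALSE)

dim(toy)         # 3 rows (wines) x 2 columns (judges)
dim(t.toy)       # 2 rows (judges) x 3 columns (wines)
rownames(t.toy)  # the judges are now the row names
```

After the transposition, each row can be treated as one document per judge.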

To load Xplortext:

library(Xplortext)

French panel

15 French judges and 8 wines:

t.baseFr <- as.data.frame(t(baseFr)) # transposing French DocTerm to TermDoc
dim(t.baseFr)

15 French judges use between 6 and 43 different words.

Number of different words for the 15 French judges:

res.TD.TasterFr <-TextData(t.baseFr,var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE,Fmin=1)
cat(res.TD.TasterFr$summDoc$DistinctWords.before)

Minimum number of different words used by French judges:

cat(min(res.TD.TasterFr$summDoc$DistinctWords.before))

Maximum number of different words used by French judges:

cat(max(res.TD.TasterFr$summDoc$DistinctWords.before))
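To make the meaning of these per-judge counts concrete, here is a rough base R sketch on toy data. It only illustrates the idea of counting different words per judge with a naive whitespace split; it is not how `TextData` tokenizes internally, and the object names (`t.toy`, `texts`, `tokens`, `distinct`) are hypothetical.

```r
# Toy judge-by-wine table (hypothetical): each cell is one free-text description
t.toy <- data.frame(
  W1 = c("fruity light", "acid"),
  W2 = c("woody", "woody"),
  W3 = c("fruity acid", "acid"),
  row.names = c("J1", "J2"),
  stringsAsFactors = FALSE
)

# Paste the wine columns together for each judge, then split on whitespace
texts  <- apply(t.toy, 1, paste, collapse = " ")
tokens <- strsplit(tolower(texts), "[[:space:]]+")

# Number of different words per judge, then the minimum and maximum
distinct <- vapply(tokens, function(w) length(unique(w)), integer(1))
distinct
range(distinct)
```

In this toy case the first judge uses 4 different words and the second uses 2, so the range is 2 to 4.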

Overall, the French judges produce 655 occurrences (611 retained after deleting stopwords) from 149 distinct words (137 retained after deleting stopwords).

res.TD.TasterFr.Before <- TextData(t.baseFr, var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE, Fmin=1)
res.TD.TasterFr.Before$summGen # 655 occurrences, 149 distinct words

To select the French vocabulary without stopwords:

str.Fr.stopworduser <-c("à","d","de","du","en","et","la","le","par","sur","un","une")
res.TD.TasterFr.After <-TextData(t.baseFr, var.text=c(1:ncol(t.baseFr)), stop.word.tm=FALSE, stop.word.user = str.Fr.stopworduser,Fmin=1)
res.TD.TasterFr.After$summGen # 611 occurrences, 137 distinct words retained after removing stopwords
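The effect of a user stoplist on the two totals (occurrences and distinct words) can be sketched in base R as follows. The tokens and the short stoplist are invented for illustration, and the filtering with `%in%` is a stand-in for the idea, not the `TextData` implementation.

```r
# Toy tokens (hypothetical), pooled over all judges
tokens <- c("fruit", "de", "bois", "et", "fruit", "de", "acide")

# Illustrative stoplist; "la" never occurs, so it removes nothing
stopworduser <- c("de", "et", "la")

# Keep only the tokens that are not in the stoplist
kept <- tokens[!tokens %in% stopworduser]

# Occurrences and distinct words, before and after removing stopwords
before <- c(occurrences = length(tokens), distinct = length(unique(tokens)))
after  <- c(occurrences = length(kept),   distinct = length(unique(kept)))
rbind(before, after)
```

A stopword that never appears in the corpus changes neither total, which is why the number of distinct words removed can be smaller than the length of the stoplist.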

 

Catalan panel

9 Catalan judges and 8 wines:

t.baseCat <- as.data.frame(t(baseCat)) # transposing Catalan DocTerm to TermDoc
cat(dim(t.baseCat))

9 Catalan judges use between 12 and 22 different words:

res.TD.TasterCat <-TextData(t.baseCat,var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE,Fmin=1)
cat(res.TD.TasterCat$summDoc$DistinctWords.before)

Minimum number of different words used by Catalan judges:

cat(min(res.TD.TasterCat$summDoc$DistinctWords.before))

Maximum number of different words used by Catalan judges:

cat(max(res.TD.TasterCat$summDoc$DistinctWords.before))

Overall, the Catalan judges produce 323 occurrences (319 retained after deleting stopwords) from 97 distinct words (95 retained after deleting stopwords).

res.TD.TasterCat.Before <- TextData(t.baseCat, var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE, Fmin=1)
res.TD.TasterCat.Before$summGen # 323 occurrences, 97 distinct words

To select the Catalan vocabulary without stopwords:

str.Cat.stopworduser <-c("de", "en", "la","le","un","une","amb","i")
res.TD.TasterCat.After <-TextData(t.baseCat, var.text=c(1:ncol(t.baseCat)), stop.word.tm=FALSE, stop.word.user = str.Cat.stopworduser,Fmin=1)
res.TD.TasterCat.After$summGen # 319 occurrences, 95 distinct words retained after removing stopwords
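As a quick arithmetic check on the figures reported in the code comments for both panels, the short sketch below computes how many occurrences and distinct words the stoplists remove. The data frame `panels` and its column names are hypothetical; the numbers are the ones quoted in this section.

```r
# Totals quoted in this section, before and after removing stopwords
panels <- data.frame(
  panel        = c("French", "Catalan"),
  occ.before   = c(655, 323),
  occ.after    = c(611, 319),
  words.before = c(149, 97),
  words.after  = c(137, 95)
)

# What the stoplists remove, in absolute terms and as a share of occurrences
panels$occ.removed     <- panels$occ.before - panels$occ.after
panels$words.removed   <- panels$words.before - panels$words.after
panels$pct.occ.removed <- round(100 * panels$occ.removed / panels$occ.before, 1)

panels[, c("panel", "occ.removed", "words.removed", "pct.occ.removed")]
```

For the French panel, the 12 distinct words removed match the 12 entries of `str.Fr.stopworduser`. For the Catalan panel only 2 distinct words disappear, which suggests that most of the 8 entries of `str.Cat.stopworduser` do not occur in the Catalan descriptions.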