Spanish Discourses _Pg4

4. Cluster from CA coordinates (three factors ncp=3)

Previous syntax

swu <- c("consiguiente", "ello", "hacia", "punto", "Señorías", "si", "Sus", "vista", "A", "B", "C", "D", "E",
"F", "a", "b", "c", "d")
TD <- TextData(SpanishDisc, var.text=c(1), context.quanti="year", Fmin=10, Dmin=2, idiom="es", lower=FALSE,
remov.number=TRUE, stop.word.user=swu, graph=FALSE)


To select only the first 3 factors:

resLexCA <- LexCA(TD, ncp=3, graph=FALSE)
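
To make the meaning of "keeping the first 3 factors" concrete, here is a minimal base-R sketch of a correspondence analysis computed by SVD on a small made-up table. The table N, the variable names and the choice to compute row principal coordinates by hand are assumptions for illustration only; this is not how LexCA is implemented internally, and it does not use the SpanishDisc data.

```r
# Toy documents-by-words contingency table (made-up counts).
N <- matrix(c(12, 3,  5, 1, 7,
              10, 4,  6, 2, 5,
               2, 9,  3, 8, 1,
               1, 8,  2, 9, 3,
               4, 2, 11, 3, 9),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("doc", 1:5), paste0("w", 1:5)))

P      <- N / sum(N)      # correspondence matrix
r_mass <- rowSums(P)      # row (document) masses
c_mass <- colSums(P)      # column (word) masses

# Standardized residuals from independence, then their SVD.
S   <- diag(1 / sqrt(r_mass)) %*% (P - r_mass %o% c_mass) %*% diag(1 / sqrt(c_mass))
dec <- svd(S)

ncp <- 3                  # number of factors kept, as in ncp=3
# Row principal coordinates, restricted to the first ncp axes.
row_coord <- (diag(1 / sqrt(r_mass)) %*% dec$u %*% diag(dec$d))[, 1:ncp, drop = FALSE]
rownames(row_coord) <- rownames(N)
row_coord
```

Only these ncp columns would then be passed on to the clustering step; the remaining axes are discarded.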


The nb.clust argument controls where the tree is cut. With nb.clust=0 (the default), the tree is cut at the level the user clicks on; in this case, a suggested level is provided.

If nb.clust=-1, the tree is cut automatically at the suggested level.

If nb.clust is a positive integer, the tree is cut into nb.clust clusters (for example, nb.clust=4 gives in this case the same result as the automatic cut nb.clust=-1):

res.ccah <- LexCHCca(resLexCA, nb.clust=4, graph=TRUE)
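
The idea of a contiguity constraint can be sketched in base R as follows: only clusters that are neighbours in chronological order are allowed to merge. The helper constrained_cut is hypothetical, the coordinates are made up, and the centroid distance used as merging criterion is an assumption chosen for brevity; LexCHCca may use a different aggregation method.

```r
# Hypothetical helper: merge only chronologically adjacent clusters
# until k clusters remain. x has one row per document, in time order.
constrained_cut <- function(x, k) {
  clusters <- as.list(seq_len(nrow(x)))        # each document starts alone
  while (length(clusters) > k) {
    # Centroid of every current cluster (one row per cluster).
    cen <- do.call(rbind, lapply(clusters, function(idx) {
      colMeans(x[idx, , drop = FALSE])
    }))
    # Distance between each cluster and its right-hand neighbour only.
    gap <- sqrt(rowSums((cen[-1, , drop = FALSE] -
                         cen[-nrow(cen), , drop = FALSE])^2))
    j <- which.min(gap)                        # closest adjacent pair
    clusters[[j]] <- c(clusters[[j]], clusters[[j + 1]])
    clusters[[j + 1]] <- NULL
  }
  rep(seq_along(clusters), lengths(clusters))  # cluster label per document
}

# Made-up coordinates: documents 5 and 6 resemble documents 1 and 2,
# but they are separated from them in time by documents 3 and 4.
x <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5.1, 5), c(0, 0.1), c(0.1, 0.1))
constrained_cut(x, 3)
```

Because documents 5 and 6 are not adjacent to documents 1 and 2, they end up in a cluster of their own even though their coordinates are nearly identical.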

Cluster description using hierarchical words

The LabelTree function extracts the hierarchical characteristic words associated with the nodes of a chronological hierarchical tree: the characteristic words of each node are extracted, and each word is then assigned to the node that it best characterizes. The argument proba is a threshold on the p-value used to select the characteristic words (0.05 by default):

res.Label <- LabelTree(res.ccah, proba=0.0005)
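
As an illustration of what a p-value threshold such as proba does, the sketch below tests whether one word is over-represented in one cluster using a hypergeometric model in base R. All counts are made up, and the use of this particular test is an assumption for illustration; the exact computation performed by LabelTree is not shown here.

```r
# Made-up counts for a single word and a single cluster (node).
n_corpus  <- 1000   # total word occurrences in the corpus
n_word    <- 30     # occurrences of the word in the whole corpus
n_cluster <- 100    # total word occurrences in the cluster
k_in      <- 12     # occurrences of the word inside the cluster

# P(X >= k_in) if the cluster were a random sample of the corpus.
p_over <- phyper(k_in - 1, n_word, n_corpus - n_word, n_cluster,
                 lower.tail = FALSE)

proba <- 0.0005     # same threshold as in the call above
p_over < proba      # TRUE means the word is kept as characteristic
```

Lowering proba makes the selection stricter, so fewer words are retained for each node.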

Hierarchical Agglomerative Clustering without Contiguity Constraint

res.HCca <- LexHCca(resLexCA, nb.clust=-1)
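
Without the contiguity constraint, any two clusters may merge regardless of their position in time. A minimal base-R sketch on made-up coordinates, using hclust and cutree from the stats package; the Ward criterion chosen here is an assumption and is not necessarily the method LexHCca applies:

```r
# Made-up coordinates, one row per document in chronological order.
# Documents 5 and 6 resemble documents 1 and 2 but come later in time.
x <- rbind(c(0, 0), c(0.1, 0), c(5, 5), c(5.1, 5), c(0, 0.1), c(0.1, 0.1))
rownames(x) <- paste0("doc", 1:6)

hc     <- hclust(dist(x), method = "ward.D2")  # unconstrained tree
groups <- cutree(hc, k = 2)                    # cut into 2 clusters
groups
```

Here documents 1, 2, 5 and 6 fall in the same cluster although documents 3 and 4 lie between them chronologically, which a contiguity-constrained clustering would not allow.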