JADT 2022. 2. Cluster analysis and contiguity constrained clustering methods

2. Cluster analysis and contiguity constrained clustering methods

Hierarchical clustering (HC) of the documents can take as input the coordinates of the documents in the Euclidean space produced by CA. Differences among the clusters can then correspond to thematic variations over time. If the chronology effect is strongly dominant, even unconstrained HC will build a dendrogram whose nodes group only contiguous elements. To force the chronology to be taken into account, we can resort to hierarchical contiguity-constrained clustering methods (HCCC; Lebart, 1978; Legendre and Legendre, 2012; Borcard et al., 2018; Gançarski et al., 2020), where contiguity is defined by the chronology: HCCC merges only nodes that are contiguous in time. In general, the dendrograms built by HC and HCCC differ, and these differences reflect the evolution of the vocabulary over time. In any case, it is very important to interpret the information conveyed by these differences in relation to the results provided by CA.
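The comparison between HC and HCCC can be illustrated with a minimal sketch, assuming Python with NumPy. Documents are first placed in the CA space through an SVD of the standardized residuals. They are then clustered with Ward's criterion, either freely (HC) or with merges restricted to clusters that are adjacent in chronological order (HCCC). Ward's criterion, the function names `ca_row_coordinates`, `ward_merges` and `partition`, and the toy table are illustrative assumptions, not necessarily the choices made in the cited works.

```python
import numpy as np


def ca_row_coordinates(table):
    """Principal row coordinates of a correspondence analysis (CA).

    `table` is a documents x words contingency table with no empty row or
    column. Euclidean distances between the returned rows equal chi-square
    distances between the documents' word profiles.
    """
    P = np.asarray(table, dtype=float)
    P = P / P.sum()
    r = P.sum(axis=1)  # document (row) masses
    c = P.sum(axis=0)  # word (column) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U * s) / np.sqrt(r)[:, None]  # F = D_r^{-1/2} U Sigma


def ward_merges(X, contiguous=False):
    """Agglomerative clustering of the rows of X with Ward's criterion.

    With contiguous=True the rows are taken to be in chronological order and
    only clusters adjacent in time may merge (HCCC). Returns the successive
    merges as (members_a, members_b, ward_cost) tuples.
    """
    X = np.asarray(X, dtype=float)
    clusters = [[i] for i in range(len(X))]  # document indices per cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters) - 1):
            # Under the constraint every cluster is an interval of consecutive
            # documents, so clusters[a + 1] is the next cluster in time.
            candidates = [a + 1] if contiguous else range(a + 1, len(clusters))
            for b in candidates:
                A, B = X[clusters[a]], X[clusters[b]]
                na, nb = len(A), len(B)
                # Increase in within-cluster sum of squares caused by the merge.
                cost = na * nb / (na + nb) * np.sum((A.mean(axis=0) - B.mean(axis=0)) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        merges.append((clusters[a], clusters[b], cost))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return merges


def partition(merges, n, k):
    """Groups of document indices obtained after the first n - k merges."""
    label = list(range(n))
    for members_a, members_b, _ in merges[: n - k]:
        joined = members_a + members_b
        new = min(label[i] for i in joined)
        for i in joined:
            label[i] = new
    groups = {}
    for i, lab in enumerate(label):
        groups.setdefault(lab, []).append(i)
    return sorted(groups.values())


# Toy documents x words counts, rows in chronological order (illustrative data).
table = [
    [10, 2, 1, 1],
    [9, 3, 1, 1],
    [2, 10, 2, 1],
    [1, 9, 3, 1],
    [10, 2, 1, 2],  # a late document whose vocabulary resembles the earliest ones
    [1, 1, 2, 10],
]
F = ca_row_coordinates(table)
n = len(table)
hc_groups = partition(ward_merges(F), n, k=3)
hccc_groups = partition(ward_merges(F, contiguous=True), n, k=3)
print("HC:  ", hc_groups)
print("HCCC:", hccc_groups)
```

With three groups, the unconstrained run places document 4 together with documents 0 and 1 because their word profiles are close. The constrained run can only return chronological intervals, so it cannot reproduce that grouping. Such a discrepancy is the kind of difference between the two dendrograms that should be read against the CA results. Under the contiguity constraint, the successive Ward merge costs may also fail to increase monotonically, so the constrained dendrogram heights should be interpreted with care.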