This script will be presented at JADT2022 http://jadt2022.vadistat.org/
Order constrained clustering with local stopping rules. Application in textual analysis
Ramón Alvarez-Esteban, University of León
Mónica Bécue-Bertaut, Universitat Politècnica de Catalunya
The documents of a corpus can be related to an order variable, such as chronology, and this order may explain part of the variability in their vocabulary. When the vocabulary is strongly related to the order, correspondence analysis (CA) displays the trajectory of the documents on the first factorial axes. Hierarchical contiguity-constrained clustering (HCCC), applied to the document coordinates on these axes, then groups documents only if they are contiguous in the order. However, because it gives too much weight to the order, HCCC may group documents whose contents are heterogeneous. To obtain a partition that respects thematic differences, we propose including in the algorithm a local rule that stops the fusion of two nodes when they are considered heterogeneous, that is, when a discontinuity has been detected. HCCC with this local stopping rule is applied to a corpus of Spanish political speeches to illustrate how it works in textual analysis.
Keywords: Hierarchical contiguity constrained clustering, local stopping rule, textual clustering.
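The abstract describes the procedure only in words, so the Python sketch below illustrates the idea under explicit assumptions. It takes the documents as already represented by their coordinates on the first CA axes and listed in their natural order. Fusions between adjacent nodes follow Ward's criterion. The local stopping rule is a permutation test that refuses a fusion when the two nodes differ significantly at level `alpha`. Ward's criterion, the permutation test and all names (`hccc_local_stop`, `permutation_p_value`, `alpha`, `n_perm`, `max_exact`) are hypothetical stand-ins, not necessarily the authors' exact rule.

```python
import itertools
import math
import random
from typing import Sequence

Point = Sequence[float]  # coordinates of one document on the retained CA axes


def centroid(points: list[Point]) -> list[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(axis) / n for axis in zip(*points)]


def ward_cost(a: list[Point], b: list[Point]) -> float:
    """Increase in within-group sum of squares caused by merging groups a and b."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


def permutation_p_value(a, b, n_perm=999, max_exact=5000, seed=0):
    """Hypothetical local test: p-value of the observed Ward cost of (a, b) when
    the pooled points are relabelled into two groups of the same sizes.
    All splits are enumerated when there are at most max_exact of them;
    otherwise n_perm random splits are drawn."""
    pooled = list(a) + list(b)
    n, na = len(pooled), len(a)
    observed = ward_cost(list(a), list(b))
    tol = 1e-12  # guards against rounding differences between equivalent splits

    def stat(idx_a) -> float:
        chosen = set(idx_a)
        ga = [pooled[i] for i in idx_a]
        gb = [pooled[i] for i in range(n) if i not in chosen]
        return ward_cost(ga, gb)

    if math.comb(n, na) <= max_exact:
        splits = list(itertools.combinations(range(n), na))
        hits = sum(stat(s) >= observed - tol for s in splits)
        return hits / len(splits)
    rng = random.Random(seed)
    hits = sum(stat(rng.sample(range(n), na)) >= observed - tol for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)


def hccc_local_stop(coords: list[Point], alpha: float = 0.05) -> list[tuple[int, int]]:
    """Contiguity-constrained Ward clustering with a (hypothetical) local stopping rule.

    coords must be listed in the order of the order variable (e.g. chronology).
    Returns the final segments as (first, last) document indices."""

    def pts(seg: tuple[int, int]) -> list[Point]:
        return list(coords[seg[0]:seg[1] + 1])

    segments = [(i, i) for i in range(len(coords))]
    # A boundary is identified by the first document of the segment on its right;
    # blocked boundaries are detected discontinuities and are never crossed.
    blocked: set[int] = set()
    while True:
        # Only adjacent segments may merge (contiguity constraint).
        candidates = [
            (ward_cost(pts(segments[j]), pts(segments[j + 1])), j)
            for j in range(len(segments) - 1)
            if segments[j + 1][0] not in blocked
        ]
        if not candidates:
            return segments
        _, j = min(candidates)  # cheapest admissible fusion
        left, right = segments[j], segments[j + 1]
        if permutation_p_value(pts(left), pts(right)) < alpha:
            blocked.add(right[0])  # heterogeneous nodes: refuse this fusion only
        else:
            segments[j:j + 2] = [(left[0], right[1])]


if __name__ == "__main__":
    # Toy 1-D coordinates: two well-separated runs of five ordered documents.
    docs = [(0.0,), (0.2,), (0.1,), (0.3,), (0.1,), (5.0,), (5.2,), (5.1,), (5.3,), (5.1,)]
    print(hccc_local_stop(docs))  # [(0, 4), (5, 9)]
```

A refused fusion only marks the boundary between the two nodes as a discontinuity; the other adjacent nodes keep merging. The partition is therefore decided locally, node pair by node pair, rather than by cutting the whole hierarchy at a single level. In the toy example, the documents within each run are merged. The final fusion between the two runs gets an exact p-value of 2/252 and is refused.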
4. Application to a corpus of Spanish investiture speeches
4.1. The corpus