JADT 2022

1. Introduction

A corpus frequently consists of a series of ordered documents, sorted by their production date, by the age of the respondents in the case of open-ended questions, or by some other ordering variable. Hereafter, we use the term chronology generically for any such ordering. Whatever the chronological corpus, new words appear, others disappear, and the frequencies of most of them change over time.

Correspondence analysis (CA) is a fundamental tool for displaying lexical similarities between documents and thus for identifying chronological relationships. When the chronology is strongly related to the vocabulary, the first factorial axes reflect the evolution of the documents. However, assessing how much the chronology shapes the corpus structure is difficult when the number of axes is large (that is, when there are many documents and words), since the chronological pattern can be spread over several factorial axes.

To address this problem, Section 2 discusses the contribution of hierarchical contiguity-constrained clustering (HCCC). Section 3 details a stopping rule that prevents heterogeneous contiguous documents from being grouped together. Section 4 illustrates the procedure in textual analysis with an application to Spanish investiture speeches.
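The idea behind clustering under a contiguity constraint can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions not stated in the text: the documents are the rows of a documents × words frequency table already sorted chronologically, every document and every word has a nonzero total, dissimilarities are chi-square distances between row profiles (the metric underlying CA), and the aggregation criterion is Ward's. The only departure from ordinary agglomerative clustering is that a merge is allowed only between clusters that are adjacent in the chronological order, so every cluster is a contiguous period. The function names are hypothetical, and no stopping rule is applied: the whole hierarchy is returned.

```python
from typing import List, Sequence, Tuple

# A cluster is a contiguous run of documents: (first_doc, last_doc, mass, centroid_profile).
Cluster = Tuple[int, int, float, List[float]]


def masses_and_profiles(table: Sequence[Sequence[float]]):
    """Row masses, row profiles and column masses of a documents x words table.

    Assumes every document and every word has a nonzero total.
    """
    total = sum(sum(row) for row in table)
    row_mass = [sum(row) / total for row in table]
    col_mass = [sum(row[j] for row in table) / total for j in range(len(table[0]))]
    profiles = [[x / sum(row) for x in row] for row in table]
    return row_mass, profiles, col_mass


def ward_cost(wa, ga, wb, gb, col_mass) -> float:
    """Ward aggregation cost of two clusters under the chi-square metric."""
    d2 = sum((x - y) ** 2 / c for x, y, c in zip(ga, gb, col_mass))
    return wa * wb / (wa + wb) * d2


def contiguity_constrained_ward(table):
    """Agglomerative Ward clustering in which only chronologically adjacent clusters may merge.

    Returns the merge history as ((first, last), (first, last), cost) triples.
    """
    row_mass, profiles, col_mass = masses_and_profiles(table)
    clusters: List[Cluster] = [(i, i, row_mass[i], profiles[i]) for i in range(len(table))]
    merges = []
    while len(clusters) > 1:
        # Contiguity constraint: only neighbouring clusters k and k + 1 are candidates.
        costs = [ward_cost(a[2], a[3], b[2], b[3], col_mass)
                 for a, b in zip(clusters, clusters[1:])]
        k = min(range(len(costs)), key=costs.__getitem__)
        a, b = clusters[k], clusters[k + 1]
        mass = a[2] + b[2]
        # Profile of the merged period = mass-weighted mean of the two profiles.
        centroid = [(a[2] * x + b[2] * y) / mass for x, y in zip(a[3], b[3])]
        clusters[k:k + 2] = [(a[0], b[1], mass, centroid)]
        merges.append(((a[0], a[1]), (b[0], b[1]), costs[k]))
    return merges


if __name__ == "__main__":
    # Toy counts (hypothetical): rows are documents in chronological order.
    docs = [[10, 0, 2], [8, 1, 3], [1, 9, 2], [0, 10, 3]]
    for left, right, cost in contiguity_constrained_ward(docs):
        print(left, right, round(cost, 4))
```

On this toy table, the sketch first joins documents 2 and 3, then documents 0 and 1, and finally the two resulting periods. In a real application, a stopping rule such as the one of Section 3 would decide where to cut this hierarchy.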