Spanish Discourses

1. The Database

Download the database

First of all, we must download the SpanishDisc R data file from the following URL and save it on our computer: http://www.xplortext.org/Rdata/SpanishDisc.RData

Load RData file in R

A database that is already in R format (a data frame saved in an .RData file) can be loaded directly; a database in another format must be imported. In our case, the SpanishDisc.RData file was saved in the C:/RData directory. It contains a single data frame called SpanishDisc.

load("C:/RData/SpanishDisc.RData")

Or:

load(url("http://www.xplortext.org/Rdata/SpanishDisc.RData"))
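Because load() restores objects under the names they were saved with, and can silently overwrite objects of the same name in the workspace, it can be useful to load a file into a separate environment first and check what it contains. The following is a minimal sketch of that idea. It uses a small stand-in data frame and a temporary file rather than the real SpanishDisc.RData, and the names toy_file and toy_env are illustrative only.

```r
# Stand-in for the real file: a tiny data frame saved to a temporary .RData file.
# (Hypothetical content, used only to illustrate how load() behaves.)
SpanishDisc <- data.frame(year = c(1979, 1982), stringsAsFactors = FALSE)
toy_file <- tempfile(fileext = ".RData")
save(SpanishDisc, file = toy_file)
rm(SpanishDisc)

# Load into a fresh environment so nothing in the workspace is overwritten.
toy_env <- new.env()
loaded_names <- load(toy_file, envir = toy_env)

# load() returns (invisibly) the names of the objects it restored.
print(loaded_names)
print(is.data.frame(toy_env$SpanishDisc))
```

With the real file, the same pattern would apply by replacing toy_file with the path or url() connection shown above.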

Data frame structure

Given the SpanishDisc data frame object, the best way to understand the data structure is to use str().

str(SpanishDisc)

Here we are interested in the “investiture speeches” delivered by the candidates for the presidency of the Spanish government since the re-establishment of democracy. The date of this re-establishment is generally considered to be that of the referendum ratifying the Spanish Constitution, that is, December 6, 1978. For the record, the long dictatorship of General Franco ended with his death on November 20, 1975, at the age of 82.

We present 11 investiture speeches (rows) and 7 variables (columns):

SpanishDisc$text

The text of each speech. These texts were taken from the version published in the Diario de Sesiones del Congreso de los Diputados (Journal of Sessions of the Congress of Deputies, published by the Spanish Parliament).

SpanishDisc$title

Title of the speech, defined as a factor. It combines the number of the speech, the name of the politician and the date.

SpanishDisc$chronology

The number of the speech, defined as a numeric variable.

SpanishDisc$acronym

Label with four characters (name of the politician and year), useful for representing the speech in tables and graphs.

SpanishDisc$name

Name of the politician, defined as a factor. There are 6 different politicians (Aznar, CalvoSotelo, González, Rajoy, Suárez, Zapatero) for the 11 investiture speeches.

SpanishDisc$politicparty

Name of the political party, defined as a factor. There are 3 different political parties (PP, PSOE, UCD).

SpanishDisc$year

The year of the speech, defined as a numeric variable ranging from 1979 to 2011.
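The variable descriptions above can be checked with a few base R calls. The sketch below builds a toy data frame with the same seven column names so that it runs without the real file. All values are illustrative placeholders, and storing text and acronym as character vectors is an assumption, since only title, name and politicparty are explicitly described as factors.

```r
# Toy data frame with the same seven column names as SpanishDisc.
# All values are placeholders, not rows from the real database.
toy <- data.frame(
  text         = c("placeholder text one", "placeholder text two", "placeholder text three"),
  title        = factor(c("speech 1", "speech 2", "speech 3")),
  chronology   = c(1, 2, 3),
  acronym      = c("Aa96", "Bb04", "Cc11"),   # hypothetical four-character labels
  name         = factor(c("Aznar", "Zapatero", "Rajoy")),
  politicparty = factor(c("PP", "PSOE", "PP")),
  year         = c(1996, 2004, 2011),
  stringsAsFactors = FALSE
)

print(dim(toy))                  # number of rows (speeches) and columns (variables)
print(sapply(toy, class))        # storage class of each column
print(nlevels(toy$name))         # number of distinct politicians
print(table(toy$politicparty))   # number of speeches per party
print(range(toy$year))           # first and last year
```

Run against the real SpanishDisc object, the same calls should agree with the counts stated above (11 rows, 7 columns, 6 politicians, 3 parties).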

Pretreatment

The “Diario de Sesiones del Congreso de los Diputados” transcribes the speeches with capital letters, some of which carry semantic information. To preserve that information, capital letters at the beginning of sentences have been manually converted to lower case in the database. The capitals that remain serve, in general, to differentiate homographs. Thus, “Gobierno” (the government) is differentiated from “gobierno” (I govern). The script must therefore specify that capital letters are to be kept.
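The effect of keeping or discarding capital letters can be illustrated with base R alone. The following sketch uses a made-up vector of five word occurrences and is not part of the original script. In the Xplortext package this choice is, as far as we know, controlled by the lower argument of TextData(); that should be checked against the package documentation.

```r
# Made-up occurrences of two homographs that differ only by the initial capital.
words <- c("Gobierno", "gobierno", "Gobierno", "gobierno", "gobierno")

# Keeping capitals: the two forms are counted separately.
counts_case_kept <- table(words)

# Folding to lower case: both forms collapse into a single entry.
counts_lowered <- table(tolower(words))

print(counts_case_kept)
print(counts_lowered)
```

With capitals kept, the noun and the verb form remain two distinct entries in the vocabulary. After lowercasing, they can no longer be told apart.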