Inventories

Legislative text corpora inventory

Contributors: Miklós Sebők, Sven-Oliver Proksch, Christian Rauh, Anna Székely, Ágnes Dinnyés, Eszter Lancsár, Jan Schwalbach, Alexander Dalheimer

The long-term objective of WP5 is to compile a database which provides a comprehensive, easy-to-use collection of legislative speeches and legislative documents covering all EU member countries, the most important EU institutions, as well as the United Kingdom and Israel.

During the first six months of the project, the following parts of this database were compiled. The core product of this period is an inventory in the form of a spreadsheet, which provides an overview of existing collections of legislative texts. We collected data from all available sources, such as parliamentary websites and secondary sources created by scholars or NGOs. We identified the set of currently available sources (covering both primary archives and secondary data collections) by reviewing relevant academic literature, by scoping extant linguistic infrastructures (such as CLARIN), and by surveying the computational social science community via social media.

Users interested in more detail than is provided here may also consult the respective codebook or the technical reports, which specify the major primary and secondary sources for each country or supranational institution.

Further information on the individual parts can be found in Deliverable 5.1, available for download here.

Legislative text corpora inventory

Expansion of the inventory

We welcome additional information on existing databases. We will update the database with your submission during our scheduled maintenance periods. Thank you for your understanding.

Your submission to the inventory

Inventory for text corpora by political organizations

Contributors: Zachary Greene, Christoph Ivanusch, Pola Lehmann, Thomas Schober, Anthea Alberto, Tobias Burst, Swen Hutter, Heike Klüver, Sven Regel, Bernhard Weßels, Lisa Zehnter

WP4 aims to create a comprehensive database of texts by political parties and interest groups. The database should provide an overview of existing text collections, links to these collections and information on their content.

We have created an inventory that serves as the initial version of the planned database. To create it, we searched large-scale comparative projects (e.g., Comparative Agendas Project, Manifesto Project), data repositories (e.g., Harvard Dataverse, GESIS), national archives, and potentially relevant literature for existing text collections. Links to these text collections, as well as further information, are stored in the inventory. We regard this inventory as a living document that will continue to grow over the lifetime of OPTED.

The inventory is available on this website and comes in the form of two spreadsheets. One spreadsheet focuses on texts by political parties, the other on texts by interest groups.

Users interested in more detail than is provided here may also consult the respective codebook.

Further information on the individual parts can be found in Deliverable 4.2, available for download here.

Inventory for text corpora by political organizations

Inventory

Codebook

Expansion of the inventory

We welcome additional information on existing databases. We will update the database with your submission during our scheduled maintenance periods. Thank you for your understanding.

Your submission to the inventory

Inventory for news media

Contributors: Paul Balluff, Hajo Boomgaarden, Fabienne Lind, Annie Waldherr

The aim of WP3 is to provide an extensive overview of news sources, publicly available data collections, and methods specifically designed to obtain and work with journalistic, mass-mediated political texts.

The WP3 inventory for media enables quick and easy querying of news sources and media organisations. All entries are interlinked, so users can find out which databases hold the text data they are looking for. The inventory is available at meteor.opted.eu.

Further information on the preparatory work towards the inventory can be found in Deliverable 3.1, available for download here.

Inventory for journalistic, mass-mediated political texts

Meteor - Media Text Open Registry
Guidelines for inclusion of sources
Related Publication

Expansion of the inventory

We invite you to contribute to the inventory by adding entries to the database. For detailed information on adding new sources, please visit here.

Your submission to the inventory

Living hub for textual research in a multilingual world

Contributors: Christian Baden, Alona Dolinsky, Farzam Fanitabasi, Fabienne Lind, Christian Pipal, Martijn Schoonvelde, Guy Shababo, Mariken A.C.G. van der Velden

The living hub created by WP6 is a one-stop shop for a wide variety of resources for multilingual textual research, including validity benchmarks and benchmarking data sets, tools, and an inventory of key issues that arise when computational tools are applied to different languages. In its present, first stage, it enables users to rapidly identify and access existing research that has applied different textual analysis methods to textual corpora in different languages. In particular, users can browse existing research by language, methodological approach, validation strategy, and the kind of variable under investigation. The living hub addresses both novice and experienced users of computational text analysis.

Further information on the individual parts can be found in Deliverable 6.1, available for download here.