Course content

The goal of the course is to introduce fundamental methodological skills for sampling, collecting, preparing, annotating and analyzing textual data in the social sciences and the humanities. Students learn fundamental theoretical concepts and gain practical experience with tools and methods, including the following:

  • the relation between research questions, data collection and types of corpora
  • text processing in the Unix shell
  • regular expressions
  • quantitative properties of language
  • frequencies, occurrences and co-occurrences
  • manual and data-driven annotation
  • automatic annotation and analysis of text using existing tools

Learning outcomes

For a grade pass on the course, students should be able to:

  • describe and apply methods for sampling, collecting and pre-processing (digitized) material for text corpora
  • describe and apply digital methods for annotation and analysis of textual data, in order to answer a given set of research questions

Practical information

Attendance of at least 90% of all lectures and lab sessions is mandatory.

The course is examined through written lab reports.

Teaching activities include lectures and lab sessions.

NB. The course is offered on campus and online, in a hybrid environment.

Period: 2023-03-06 - 2023-05-07

Course dates: TBA

Language of instruction: English

Course director: Robert Östling

Course title in Swedish: Metoder för hantering av textdata

The course is offered by the Department of Linguistics.