Course content

The goal of the course is to introduce fundamental methodological skills for sampling, collecting, preparing, annotating and analyzing textual data in the social sciences and humanities. Students learn core theoretical concepts and gain practical experience with tools and methods, including the following:
- the relation between research questions, data collection and types of corpora
- text processing in the Unix shell
- regular expressions
- quantitative properties of language
- frequencies, occurrences and co-occurrences
- manual and data-driven annotation
- automatic annotation and analysis of text using existing tools
Learning outcomes
For a grade pass on the course, students should be able to:
- describe and apply methods for sampling, collecting and pre-processing of (digitized) material for text corpora
- describe and apply digital methods for annotation and analysis of textual data, in order to answer a given set of research questions
Practical information
Attendance of at least 90% of all lectures and lab sessions is mandatory.
The course is examined through written lab reports.
Teaching activities include lectures and lab sessions.
NB: The course is offered in a hybrid format, both on campus and online.
Period: 2023-03-06 - 2023-05-07
Course dates: see the schedule in TimeEdit
Language of instruction: English
Course director: Robert Östling
Course title in Swedish: Metoder för hantering av textdata
The course is offered by the Department of Linguistics.