Constructing and interrogating corpora using heterogeneous datasets (half-day workshop)
This 2-hour workshop aims to outline a novel approach to the construction and analysis of multi-modal natural language datasets. It draws upon developments made as part of two projects based at the
During the workshop we will give a real-time demonstration of key features of DRS and discuss how this free software can provide an ideal platform for constructing bespoke multi-modal corpus datasets. In particular, we will guide participants through the processes of organising, coding and arranging datasets for re-use within the tool (using the DRS 'track viewer'), and show how the data can then be navigated and manipulated for Corpus Linguistics (CL) analysis.
We will further showcase the novel multi-modal concordancing facility that has been integrated into the DRS interface. In addition to the standard mono-modal concordancing commonly available for current corpora (i.e. text-based searches of the data), this concordancer can interrogate data constructed from textual transcriptions anchored to video and/or audio, and from coded annotations of specific features of gesture-in-talk. This means that once presented with a list of occurrences (concordance lines) and their surrounding context, the analyst can jump directly to the temporal location of each occurrence within the video or audio clip to which the annotation pertains.
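The idea of a concordancer whose lines carry media time offsets can be illustrated with a minimal Python sketch. It is not the DRS implementation: the `Segment` and `ConcordanceLine` structures, the `tier` labels and the `concordance` function are hypothetical, and context is simplified to the characters either side of the match within a single time-anchored segment. Each hit keeps the start time of its segment, which a media player could use as a seek position; transcribed speech and coded gesture annotations are searched the same way.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """A unit of data anchored to a media timeline (hypothetical structure)."""
    start: float   # seconds from the beginning of the video/audio clip
    end: float
    tier: str      # e.g. "speech" for transcription, "gesture" for coded annotations
    speaker: str
    text: str      # transcribed words or annotation code


@dataclass
class ConcordanceLine:
    left: str      # context before the node
    node: str      # the matched search term
    right: str     # context after the node
    tier: str
    speaker: str
    start: float   # seek position for the media player


def concordance(segments, node, width=30):
    """Return KWIC lines for whole-word, case-insensitive matches of `node`.

    Each line carries the start time of the segment it came from, so the
    analyst can jump to that point in the clip. Context is limited to the
    matching segment's own text.
    """
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        for m in pattern.finditer(seg.text):
            lines.append(ConcordanceLine(
                left=seg.text[max(0, m.start() - width):m.start()],
                node=m.group(0),
                right=seg.text[m.end():m.end() + width],
                tier=seg.tier,
                speaker=seg.speaker,
                start=seg.start,
            ))
    return lines


if __name__ == "__main__":
    # Illustrative, made-up data: two speech segments and one gesture annotation.
    data = [
        Segment(12.0, 15.5, "speech", "A", "yeah I think that's right"),
        Segment(14.0, 15.0, "gesture", "A", "head nod"),
        Segment(31.2, 34.0, "speech", "B", "yeah okay so what next"),
    ]
    for line in concordance(data, "yeah"):
        print(f"{line.start:6.1f}s  {line.speaker}  {line.left:>30}[{line.node}]{line.right}")
```

Keeping the time offset on every concordance line, rather than only on the source segment, means the list can be re-sorted or filtered (by speaker or tier, say) without losing the link back to the media.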
Following this demonstration, participants will have the opportunity to test-drive DRS for themselves (guidelines for use will be provided) and to ask any technical questions that arise. A range of practical, methodological and ethical challenges and considerations will also be discussed at this stage. Participants are also encouraged to give feedback on the system (and raise any related questions), and to contribute to discussion of the potential long-term applications of such a tool in CL research, drawing on their own research experience to contextualise their ideas and feedback.
As an extension to the work on multi-modal representation of spoken discourse, we will also briefly discuss how DRS is currently being adapted, as part of the DReSS II project, to support the collection and collation of more heterogeneous datasets, including SMS messages, MMS messages, interaction in virtual environments, GPS data and face-to-face situated discourse. Preliminary datasets collected as part of this new project will be showcased within the DRS environment. The focus of this work is on enabling a more detailed investigation of the interface between different communicative modes from an individual's perspective, tracking a specific person's (inter)actions over time (e.g. across an hour, a day or even a week).
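At its simplest, tracking one person's (inter)actions across modes amounts to merging several time-stamped streams into a single chronological record. The Python sketch below illustrates that idea under stated assumptions; it does not describe how DRS or DReSS II actually store or align data. The `Event` record, its `mode` labels and the `timeline` function are hypothetical, each stream is assumed to be pre-sorted by timestamp, and all times are assumed to share one clock and timezone.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One time-stamped record from any mode (hypothetical structure)."""
    time: datetime
    person: str    # participant identifier
    mode: str      # e.g. "sms", "mms", "gps", "virtual", "face-to-face"
    payload: str   # message text, coordinates or transcript excerpt


def timeline(streams, person, start, window):
    """Merge per-mode streams into one chronological record for `person`.

    Each stream must already be sorted by time; heapq.merge then interleaves
    them lazily. Only events in [start, start + window) are kept, so the
    window can be an hour, a day or a week.
    """
    end = start + window
    merged = heapq.merge(*streams, key=lambda e: e.time)
    return [e for e in merged if e.person == person and start <= e.time < end]


if __name__ == "__main__":
    t0 = datetime(2010, 1, 1, 9, 0)  # made-up start time for illustration
    sms = [Event(t0 + timedelta(minutes=5), "P1", "sms", "on my way")]
    gps = [Event(t0, "P1", "gps", "fix 1"),
           Event(t0 + timedelta(minutes=10), "P1", "gps", "fix 2")]
    for e in timeline([sms, gps], "P1", t0, timedelta(hours=1)):
        print(e.time.strftime("%H:%M"), e.mode, e.payload)
```

Using `heapq.merge` keeps each mode in its own sorted stream and avoids re-sorting everything whenever a new stream is added; filtering by `person` and by the time window reflects the individual-centred, time-bounded view described above.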

