Linguistic-computing methods for analysing digital records of learning.
Richard Forsyth, Shaaron Ainsworth, David Clarke, Pat Brundell and Claire O'Malley.
School of Psychology, University of Nottingham, NG7 2RD.
Correspondence: rsf@psychology.nottingham.ac.uk
Social scientists face an overload of digitized information. In particular, they must often spend inordinate amounts of time coding and analyzing transcribed speech. This paper describes a study, in the field of learning science, of the feasibility of semi-automatically coding and scoring verbal data. Transcripts from 48 individual learners, comprising two separate data sets of 44,000 and 23,000 words, were used as test domains for investigating three research questions: (1) how well can utterance-type codes assigned to text segments by humans be predicted from the linguistic characteristics of those segments? (2) how well can learning outcomes be predicted from learners' verbalizations? (3) can the material that students are learning from be identified from their language? Initial results indicate that the answer to the third question is yes, and that the answer to the first two is: well enough to warrant further development of the text-mining techniques employed so far.
