Learning semantic role labeling via bootstrapping with unlabeled data

Rasoul, Samad Zadeh Kaljahi (2010) Learning semantic role labeling via bootstrapping with unlabeled data. Masters thesis, University of Malaya.


Semantic role labeling (SRL) has recently attracted a considerable body of research due to its utility in several natural language processing tasks. Current state-of-the-art semantic role labeling systems use supervised statistical learning methods, which rely heavily on hand-crafted corpora. Creating these corpora is tedious and costly, and the resulting corpora are not representative of the language because of the extreme diversity of natural language usage. This research investigates self-training and co-training, two semi-supervised algorithms that aim to address this problem by bootstrapping a classifier from a smaller amount of annotated data via a larger amount of unannotated data. Due to the complexity of semantic role labeling and the high number of parameters involved in these algorithms, several problems are associated with this task. One major problem is the propagation of classification noise into successive bootstrapping iterations. The experiments show that the selection balancing and preselection methods proposed here are useful in alleviating this problem for self-training (e.g. a 0.8-point improvement in F1 for the best setting). In co-training, a main concern is how to split the problem into distinct feature views so that the classifiers derived from those views can effectively co-train with each other. This work uses constituency-based and dependency-based views of semantic role labeling for co-training and evaluates three variations of the algorithm with three different feature splits based on these views. Balancing the feature split to eliminate the performance gap between the underlying classifiers proved to be important and effective. Also, co-training with a common training set for both classifiers performed better than co-training with separate training sets for each: the latter degraded the base classifier, while the former improved it by 0.9 F1 points for the best setting.
All the results show that much more unlabeled data is needed for these algorithms to be practically useful for SRL.
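
As an illustration only, the self-training loop described in the abstract can be sketched as follows. This is not the thesis implementation: the toy one-dimensional centroid classifier stands in for a real SRL classifier, the role labels "A0" and "A1" are placeholder class names, and the names train_centroids, predict, self_train, per_class and threshold are hypothetical. Selecting an equal number of confident predictions per predicted label is one plausible reading of "selection balancing", assumed here for the sketch.

```python
from collections import defaultdict


def train_centroids(labeled):
    """Toy stand-in for an SRL classifier: one mean feature value per label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, confidence) for x; needs at least two labels.

    Confidence is a margin score in [0.5, 1]: 1.0 when x sits exactly on
    the nearest centroid, 0.5 when the two nearest centroids are equally far.
    """
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    (d1, y1), (d2, _) = dists[0], dists[1]
    conf = 1.0 if d1 + d2 == 0 else d2 / (d1 + d2)
    return y1, conf


def self_train(labeled, unlabeled, iterations=3, per_class=1, threshold=0.6):
    """Bootstrap a classifier by repeatedly adding confident self-labeled samples.

    ASSUMPTION: balancing is modeled as taking at most `per_class` samples
    per predicted label in each iteration, so no label dominates the additions.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(iterations):
        if not pool:
            break
        model = train_centroids(labeled)
        # Group confident predictions by predicted label.
        by_label = defaultdict(list)
        for x in pool:
            y, conf = predict(model, x)
            if conf >= threshold:
                by_label[y].append((conf, x))
        # Balanced selection: the top `per_class` candidates for each label.
        selected = []
        for y, cands in by_label.items():
            cands.sort(reverse=True)
            selected += [(x, y) for _, x in cands[:per_class]]
        if not selected:
            break  # nothing confident enough; stop rather than add noise
        labeled += selected
        for x, _ in selected:
            pool.remove(x)
    return train_centroids(labeled), labeled
```

Low-confidence samples stay in the unlabeled pool, which is the simplest way to limit the propagation of classification noise into later iterations; the threshold and the per-label quota are the kind of parameters the abstract notes must be tuned.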

Item Type: Thesis (Masters)
Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Date Deposited: 23 Jul 2013 06:32
Last Modified: 23 Jul 2013 06:32
URI: http://repository.um.edu.my/id/eprint/574
