Topic models, which detect latent themes in a corpus of documents by grouping co-occurring keywords in thematically comprehensible ways, were generated with the Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) algorithms on three datasets of poetry from different time periods. A close reading of the results, together with a study measuring interpretability, was used to assess which algorithm was more successful at uncovering the specific themes in each dataset that had been established through relevant literary studies. Comparing the two algorithms' performance indicated which method was more successful at modelling this highly figurative language. Our findings indicated that LDA generated the most thematically comprehensible topics, owing to its better handling of context and polysemy in the vocabulary used throughout the corpora, as well as its larger set of parameters available to tune and optimise performance.