Case Study of Analyzing the Variety of ETD Layouts
Date
2023-11-17
Publisher
INFLIBNET Centre, Gandhinagar
Abstract
Segmenting electronic theses and dissertations (ETDs) automatically and accurately is
challenging because ETD layouts vary across majors, disciplines, and universities. To
segment ETDs automatically and determine their chapter boundaries, we need to understand how
document templates vary across disciplines and universities. In this study, we performed a
case study and a manual quantitative analysis of the variation in ETD layouts. We took into
account several factors likely to affect that variation: STEM versus non-STEM field,
university, department, major, and year of publication. We found that layouts tend to be
similar within a university, with slight variation among departments, and tend to vary
significantly across universities. This is likely because each university library or
graduate school typically provides an ETD template. From our analysis of the numbering
style of chapter and section headings, we see that STEM fields (specifically physics) prefer
style 3, whereas non-STEM areas, such as education and English, prefer style 1. We then
performed a chi-square (χ²) test of independence to analyze whether the numbering style
depends on the STEM or non-STEM field. The p-value of the test is <0.001, indicating a
statistically significant dependency of the numbering style on STEM/non-STEM area. The
findings of this study can support further research in document object extraction and
natural language processing for machine reading.
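
The chi-square test of independence mentioned above can be illustrated with a minimal sketch. It assumes a contingency table whose rows are STEM and non-STEM and whose columns are numbering styles. The function names (`chi2_independence`, `chi2_sf`), the three-column layout, and the counts are hypothetical placeholders chosen for illustration; they are not the study's data or code.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof."""
    half = x / 2.0
    if dof % 2 == 0:
        # Closed form for even degrees of freedom.
        terms = sum(half ** i / math.factorial(i) for i in range(dof // 2))
        return math.exp(-half) * terms
    # Closed form for odd degrees of freedom.
    terms = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (dof - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def chi2_independence(table):
    """Pearson chi-square test of independence on a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical placeholder counts, NOT the study's data.
# Rows: STEM, non-STEM. Columns: style 1, style 2, style 3.
counts = [
    [10, 20, 70],
    [60, 25, 15],
]
stat, dof, p_value = chi2_independence(counts)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3g}")
```

A small p-value leads to rejecting the hypothesis that numbering style is independent of STEM/non-STEM area. Where SciPy is available, `scipy.stats.chi2_contingency` performs the same test.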
Description
26th International Symposium, ETD 2023, Gandhinagar, Gujarat, 26-28 October, 2023
Keywords
Document Layout Analysis, ETD Layout Format, Numbering Style