Full metadata record
DC Field | Value | Language
dc.contributor.author | Park, Sung Hee | -
dc.contributor.author | Banerjee, Bipasha | -
dc.contributor.author | Ingram, William A. | -
dc.contributor.author | Fox, Edward A. | -
dc.description | 26th International Symposium, ETD 2023, Gandhinagar, Gujarat, 26-28 October, 2023 | en_US
dc.description.abstract | Segmenting electronic theses and dissertations (ETDs) automatically and accurately is challenging because ETD layouts vary across majors, disciplines, and universities. To automatically segment ETDs and determine their chapter boundaries, we need to understand how document templates vary across disciplines and universities. In this study, we performed a case study and manual quantitative research on the variation of ETD layouts. We took into account several factors likely to affect that variation, such as STEM/non-STEM, university, department, major, and year of publication. We found that layouts tend to be similar within a university, with slight variation among departments, and tend to vary significantly across universities. This is likely because each university library or graduate school typically provides its own ETD template. From our analysis of the numbering style of chapter/section headings, we see that STEM fields (specifically physics) prefer style 3, whereas non-STEM areas, such as education and English, prefer style 1. We then performed a Chi-square (χ²) test of independence to analyze whether the numbering style depends on STEM or non-STEM field. The p-value of the test is <0.001. Thus, the test shows a statistically significant dependency of the numbering style on STEM/non-STEM areas. The findings of this study can be used to further research in document object extraction and natural language processing for machine reading. | en_US
dc.publisher | INFLIBNET Centre, Gandhinagar | en_US
dc.subject | Document Layout Analysis | en_US
dc.subject | ETD Layout Format | en_US
dc.subject | Numbering Style | en_US
dc.title | Case Study of Analyzing the Variety of ETD Layouts | en_US
Appears in Collections: 26th International Symposium ETD 2023
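
The abstract reports a Chi-square test of independence between field (STEM vs. non-STEM) and heading numbering style. As a rough illustration of how such a test works, the sketch below computes the statistic for a 2 x 3 contingency table. The counts are hypothetical placeholders, not data from the study, and the assumption of exactly three numbering styles is ours (the abstract names only styles 1 and 3). The function names are also illustrative. The closed-form p-value used here holds only for 2 degrees of freedom, which is what a 2 x 3 table gives.

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Upper-tail chi-square p-value; valid ONLY for 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# HYPOTHETICAL counts for illustration only (not taken from the study).
# Columns: style 1, style 2, style 3 (three styles is an assumption).
counts = [
    [5, 10, 35],   # STEM
    [30, 15, 5],   # non-STEM
]

stat, dof = chi_square_independence(counts)
p = p_value_dof2(stat)
print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.3g}")
```

For tables of other shapes, the p-value needs the general chi-square survival function, which a statistics library would normally supply.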

Files in This Item:
File | Description | Size | Format
8.pdf | | 613.81 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.