Week 7: The Dangerous Art of Text Mining: A Methodology for Digital History
Text Mining for the Researcher: The Main Point
The Dangerous Art of Text Mining: A Methodology for Digital History is a book by Jo Guldi, published in September 2023, that explores how text mining can be used and understood in historical research and analysis. Guldi examines how text mining, at its simplest the counting of words, helps us understand the frequency of language, and what dangers remain in analysis carried out with digital tools. The main theme of her work is mapping out how researchers can take a digital, quantitative approach to history that does not merely produce idiosyncratic interpretations but instead adds a robustly accurate, original, and profound dimension to this complex discipline.
The Distinctiveness of Certain Eras
One of the chapters I found most interesting was Guldi's Chapter 8 in Part II, "The Distinctiveness of Certain Eras." In this chapter, Guldi asks how the development of language can be traced by the computers used for text mining: can a computer also discern and describe the differences between individual blocks of time (Guldi, 229)?
Figure 8.3: Temporally adjusted tf-idf, or tf-ipf. Found on page 238.
By engaging with the mathematics of distinction, Guldi pinpoints the material most useful for locating when particular forces of reason were at work in the past, and thereby for understanding the social dynamics unfolding in discourse. What sets the tf-ipf model apart is that it measures distinctiveness as well as frequency.
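To make the contrast between frequency and distinctiveness concrete, here is a minimal Python sketch of a tf-ipf score. It assumes the standard tf-idf form with time periods standing in for documents: tf is the raw count of a word within one period, and ipf is log(N / n_w), where N is the number of periods and n_w is the number of periods in which the word appears at least once. Guldi's exact weighting may differ; the function name `tf_ipf` and the three-period toy corpus are hypothetical and made up for illustration.

```python
import math
from collections import Counter


def tf_ipf(periods: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score every word in every period as tf * ipf.

    Assumption (hypothetical, modeled on tf-idf): tf is the raw count of a
    word in one period, and ipf = log(N / n_w), where N is the number of
    periods and n_w is the number of periods containing the word.
    """
    # Word counts within each period.
    counts = {period: Counter(tokens) for period, tokens in periods.items()}
    n_periods = len(counts)

    # Number of periods in which each word appears at least once.
    period_freq = Counter()
    for counter in counts.values():
        period_freq.update(counter.keys())

    return {
        period: {
            word: tf * math.log(n_periods / period_freq[word])
            for word, tf in counter.items()
        }
        for period, counter in counts.items()
    }


# Hypothetical toy corpus: one token list per decade.
periods = {
    "1870s": "land rent tenant land rent parliament reform reform".split(),
    "1880s": "land rent tenant boycotting parliament land rent".split(),
    "1890s": "land rent tenant parliament reform".split(),
}

scores = tf_ipf(periods)
for period, word_scores in scores.items():
    top = max(word_scores, key=word_scores.get)
    print(period, top, round(word_scores[top], 3))
# 1870s reform 0.811
# 1880s boycotting 1.099
# 1890s reform 0.405
```

In this sketch a word that appears in every period gets an ipf of log(1) = 0, so "land" and "rent" score zero everywhere despite being the most frequent words; only words concentrated in some periods are rewarded. The base of the logarithm rescales all scores equally and does not change the ranking.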
Found on page 243.
Guldi asks: "If tf-ipf truly measures 'significance' of a term within a period, how can a highly ranked word turn out to have been quite scarce in its period?" (Guldi, 242). Her answer lies in going back to the raw data. In tf-idf-style weighting, the inverse-frequency component rewards words that appear in only a few periods, so a word can be scarce overall yet rank highly when its appearances cluster in a single period. For example, the word "boycotting" was not frequent in nineteenth-century texts, but it was distinctive, capturing a tendency specific to its moment, and that distinctiveness gave it the highest ipf score (Guldi, 243).
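A small worked calculation shows how this can happen. It assumes the same standard form as tf-idf, with periods in place of documents (Guldi's exact formula may differ), and uses hypothetical numbers: N = 10 periods, a scarce word used 3 times and only in one period, and a common word used 50 times in a period but present in 9 of the 10 periods.

```latex
\mathrm{tf\text{-}ipf}(w, p) = \mathrm{tf}(w, p) \cdot \ln\frac{N}{n_w}

\text{scarce word: } 3 \cdot \ln\frac{10}{1} \approx 3 \times 2.303 \approx 6.91

\text{common word: } 50 \cdot \ln\frac{10}{9} \approx 50 \times 0.105 \approx 5.27
```

Even though the common word occurs almost seventeen times as often, the scarce word ranks higher because its ipf factor is about twenty times larger.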
What stands out to me about the tf-ipf model is that it does not just look at the frequency of words; it also captures who was taking part in a discussion and how intensely, within a larger historical context. A tool like this would be especially useful for evaluating history "from above," given the scarcity of sources from "below," that is, from the lower classes.
References
Guldi, Jo. The Dangerous Art of Text Mining: A Methodology for Digital History. Cambridge University Press, 2023.