Recent Developments

Try out our new Topic Modeling Workbench site. The TMW allows anyone to generate a topic model for a collection of documents by simply uploading two files — a corpus file and a stopwords file — and pressing a button. It is built on top of David Mimno’s jsLDA project, which provides a JavaScript implementation of an LDA-based topic model algorithm. If you are a member of the UVA community, or if you have a guest account on UVaCollab, then you can log in and start working right away.
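For readers curious what an "LDA-based topic model algorithm" actually does under the hood, here is a minimal, illustrative Python sketch of LDA with collapsed Gibbs sampling, the general family of algorithm jsLDA implements. This is not the jsLDA code itself, and the toy documents and parameter values are invented for the example:

```python
import random

def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustration only).

    docs: list of tokenized documents (lists of words).
    Returns the top 3 words for each inferred topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}

    # Count tables maintained by the sampler.
    ndk = [[0] * n_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # tokens per topic
    z = []                                     # topic assignment per token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1
            nkw[k][widx[w]] += 1
            nk[k] += 1
        z.append(zs)

    # Gibbs sweeps: resample each token's topic from its full conditional.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                wv = widx[w]
                ndk[d][k] -= 1; nkw[k][wv] -= 1; nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wv] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                r = rng.random() * sum(weights)
                for t, wt in enumerate(weights):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wv] += 1; nk[k] += 1

    # Report each topic as its three most frequent words.
    return [
        [vocab[i] for i in sorted(range(V), key=lambda i: -nkw[t][i])[:3]]
        for t in range(n_topics)
    ]

# Toy corpus: two documents about astronomy, two about poetry.
docs = [
    "planet star orbit planet star".split(),
    "star orbit planet orbit".split(),
    "poem meter rhyme poem".split(),
    "rhyme meter poem meter".split(),
]
print(lda_gibbs(docs, n_topics=2))
```

On a corpus this small the result is noisy, but the same machinery, run over thousands of documents with a stopwords list applied first, is what produces the topics you see in the Workbench.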

[Screenshot of the Topic Modeling Workbench]

General Principles

  • Focus on effects of big data on knowledge, as opposed to privacy and identity
  • E.g. What happens to models, categories, “interpretation,” traditional forms of reasoning?
  • What kinds of knowledge become legitimate and illegitimate in the system?
  • What kinds of questions can we now ask (and answer) that we could not before?
  • How can we critically assess the data sets and algorithms used to draw conclusions in a particular subject domain?

Specific Examples

  • Brad Pasanek
    • What does a database of a million metaphors of mind tell us about the reality of literary periods, such as romanticism? What happens to our older methods and conclusions?
    • What can the implied social network of bibliographies tell us about the social context of knowledge production? What kind of truth does this information supply? Is it necessary to understand a text?
  • Lisa Messeri
    • How do astronomers construct an imagined world of astronomical phenomena – planets, galaxies – out of the petabytes of data they produce from instruments? What are the processes of selection and interpretation that enable this cultural practice to take place?
  • Bill Pearson
    • What kinds of conclusions can one draw from big genomic data? How do we filter the various sources of data that produce our results? For example, sometimes the patterns we see are the result of how experiments are conducted, as opposed to the underlying genomic reality. Is there a systematic way to parse these layers of meaning from data sets?
  • Rafael Alvarado
    • What are the epistemological premises that go into our construction of data sets in the first place? What consequences does that have on the interpretation of results? For example, in text mining, we place heavy emphasis on the co-occurrence of words within various containers, and we eliminate entirely the sequential structure of language (e.g. its narrative quality). When we reveal non-random correlations in this way, what do they mean? What realities do they model or map?
  • Alison Booth
    • What can we learn about 19th-century British culture, and women’s lives, from a digitized collection of biographical anthologies? How does the method of prosopography – the extraction of general cultural facts from specific personal cases, such as biographies – change when the textual sources become digital and too large for any single person to read? What do the patterns extracted by algorithmic means possibly tell us about culture and society?
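The point about eliminating sequential structure can be made concrete with a toy example. Reducing a text to word counts within a container (the "bag of words" representation common in text mining) makes two sentences with opposite narrative meanings indistinguishable. The function name and example sentences below are invented for illustration:

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to word counts, discarding all sequence information."""
    return Counter(text.lower().split())

a = "the dog bit the man"
b = "the man bit the dog"

# The two sentences tell opposite stories, but their bags are identical.
print(bag_of_words(a) == bag_of_words(b))  # True
```

Whatever correlations a model finds in such data are therefore correlations of co-presence, not of narrative or syntax, which is exactly the interpretive question raised above.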
