I’m delighted to announce that a draft of Data Feminism has been posted on the MIT Press open-access website for online community review. We will be reviewing comments through January 7th, 2019. More information is available on the PubPub site.
What follows is the text of my talk at the 2017 MLA Annual Convention, slightly modified for the web. I spoke on a panel that showcased new forms of nineteenth-century digital scholarship. (Also featured were Mark Algee-Hewitt and Annie Swafford). An essay-length version of the talk is in the works, but since I’m heads-down on my book manuscript at the moment, I’m posting my remarks here.
NB: If you’d like to read some of the more design-oriented work I’ve done on the subject, please see my paper from the 2016 IEEE VIS conference, co-authored with Catherine D’Ignazio, that outlines our principles of feminist data visualization and specifies some design questions for engaging them.
This talk departs from a seemingly simple question: “What is the story we tell about the origins of modern data visualization?” And as a set of follow-ups, “What alternate histories might emerge, what new visual forms might we imagine, and what new arguments might we make, if we told that story differently?”
To begin to answer these questions, I’ll focus on the work of one visualization designer from the nineteenth century, Elizabeth Palmer Peabody, whose images are rarely considered in the standard story we tell about the emergence of modern visualization techniques. When they are mentioned at all, they are typically described as strange—and sometimes even as failures. You can see one of them just below.
Accustomed as we are today to the charts and graphs of Microsoft Excel, or the interactive graphics that we find on The New York Times (dot com) on any given day, we perceive schemas like this as opaque and illegible. They do none of the things that we think that visualization should do: be clear and intuitive, yield immediate insight, or facilitate making sense of the underlying data. But further questions remain: why have we become conditioned to think that visualization should do these things, and only these things? How has this perspective come to be embedded in our visual culture? And most importantly for us here today, what would it mean if we could view images like these, from the archive of data visualization, instead as pathways to alternate futures? What additional visual schemas could we envision, and what additional stories could we tell, if we did?
So I’m going to inhabit my method, and frame my talk today in terms of an alternate history. First, I’ll walk you through the visual schema that you see above-left, proposed by Peabody in 1856. Then I’ll talk about some of the more speculative work I’ve done in attempting to reimagine her schema in both digital and physical form. And then I’ll try to explain what I’m after by describing this work, as you saw in the title of this talk, as feminist—
More specifically, I’ll show how Peabody’s visual method replaces the hierarchical mode of knowledge transmission that standard visualization techniques rest upon with a more horizontal mode, one that locates the source of knowledge in the interplay between viewer, image, and text. I’ll demonstrate how this horizontal mode of knowledge transmission encourages interpretations that are multiple, rather than singular, and how it places affective and embodied ways of knowing on an equal plane with more seemingly “objective” measures. And finally, I’ll suggest that this method, when reimagined for the present, raises the stakes for a series of enduring questions—about the issue of labor (and its relation to knowledge work), the nature of embodiment (and how it might be better attached to digital methods), and the role of interpretation (and how it is bound not only to perception, but also to design).
But I think that’s enough of a preamble. So, to begin.
Elizabeth Palmer Peabody was born in Massachusetts in 1804. Today, she is most famous for her proximity to more well-known writers of the American Renaissance, such as Emerson and Hawthorne. (Hawthorne was actually married to her sister, Sophia). But Peabody had impact in her own right: the bookstore that she ran out of her home, in Boston, functioned as the de facto salon for the transcendentalist movement. She edited and published the first version of Thoreau’s essay on civil disobedience, which appeared in her Aesthetic Papers, in 1849. And interestingly, she’s also credited with starting the first kindergarten in the United States.
But in the 1850s, Peabody set off to ride the rails. She traveled as far north as Rochester, NY; as far west as Louisville, KY; and as far south as Richmond, VA, in order to promote the US history textbook she’d recently published: A Chronological History of the United States. Along with boxes of books, Peabody traveled with a fabric roll the size of a living room rug, which contained floor-sized versions of charts like the one in the image above, which I’ll tell you only now is a visualization of the significant events of the seventeenth century, as they relate to the United States. (This image that you see is a plate from the textbook, measuring, at most, 3 inches square).
In her version of a sales pitch, Peabody would visit classrooms of potential textbook adopters, unroll one of her “mural charts” (as she called them) on the floor, and invite the students to sit around it to contemplate the colors and patterns they perceived.
Peabody’s design was derived from a system developed in Poland in the 1820s, which employed a grid, overlaid with shapes and colors, to visually represent events in time. At left you see, on the bottom left of the page, a numbered grid, with each year in a century marked out in its own box. On the top right you see how each box is subdivided. So, top left corner for wars, battles, and sieges; top middle for conquests and unions; top right for losses and divisions, and so on. And shapes that take up the entire box indicate an event of such magnitude or complexity that the other events in that year didn’t matter.
The basic exercise was to read the narrative account in Peabody’s textbook, and then convert the list of events that followed, like the one you see below-left, into graphical form. And I should note, at this point, that the events are color-coded, indicating the various countries involved in a particular event.
So now I’ll return to the original chart (at left), and you can see now, hopefully, that England is red, the Americas are orange, and the Dutch are teal—those are the colors that dominate the image. The French get in on the act, too, in blue. And if you cross-reference the chart to the table of events, you can see, for instance, the founding of Jamestown in 1607; and the settlement of Plymouth in 1620—that’s the red on the right—and, interestingly, the little teal box stands for the first enslaved Africans arriving in Virginia at that same time.
But I think it’s safe to say that no one in this room could have known this without me explaining how to interpret the chart. And for researchers and designers today, who champion the clarifying capacity of visualization; or for those who believe that datavis is best deployed to amplify existing thought processes—for such people, Peabody’s design would be a complete and utter failure. For Peabody, though, this near-total abstraction was precisely the point. Her charts were intended to appeal to the senses directly, to provide what she called “outlines to the eye.” Her hope was that, by providing only the mental outline of history, and by insisting that each student interpret that outline for herself, each student would conjure her own historical narrative, and in that way, produce historical knowledge for herself.
So this is where the feminist aspects of Peabody’s approach to data visualization begin to emerge. Anticipating some of the foundational claims of feminist theory, Peabody’s schema insists upon a multiplicity of meanings, and locates knowledge in the interplay between viewer, image, and text. Hers is a belief in visualization, not as clarifying or illuminating in its own right, not as evidence or proof of results, but as a tool in the process of knowledge production.
At this point, it also bears mention that for Peabody, the creation of knowledge took place through a second mode: through the act of creating the images themselves. Peabody also printed workbooks, with sheets like the one you can see at left, and she envisioned the exercise, ideally, as one not merely of cross-referencing events to their visual representation, but of constructing the images they would then study. So you can see one student’s attempt below-left. And below on the right is another, by someone who appears to have given up altogether. (These images come from a blog post by the Beinecke, but I’ve traveled to several archives across the northeast U.S. and seen the same thing). I used to show these images to get a laugh, but I know now, because at this point I’ve tried it a number of times, that this method is a very hard thing to actually do.
But that seems to be both a liability of the form, and also the point. Peabody devised her method at a moment of great national crisis—the decade leading up to the Civil War—and she recognized that the nation’s problems would be difficult to solve. Her goal was to prompt an array of possible solutions—one coming from the creator of each chart. And her hope was that, by designing new narratives of the past, her students would also imagine alternate futures.
In fact it was this aspect of Peabody’s system—the idea that, by insisting that each student participate in the creation of each chart, they would also each create their own interpretations of them—that seeded my desire to reimagine these charts for the web.
I’d already recognized, in Peabody’s method of first scanning a list of events, then identifying the type of each event and the nations involved, and then plotting it on the chart, a method that was strikingly procedural. And the process of cross-referencing between text and image seemed to me, even more, like the form of interaction you see all over the web. But it was her insistence on the need to create the charts in order to create knowledge that confirmed my decision to pursue the project. What would we discover, about the past, or about the present, if we were to make Peabody’s scheme accessible to students and scholars today?
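That procedural quality can be made concrete in code. The sketch below is my own minimal rendering of the method, not Peabody's notation: the event-type regions follow her box subdivision as described above, but the dictionary keys, the example categorization, and the color names are hypothetical conveniences.

```python
# A minimal sketch of Peabody's procedural encoding. The region names follow
# her box subdivision (top left for wars, battles, and sieges; top middle for
# conquests and unions; top right for losses and divisions); the type labels,
# color names, and example categorizations are hypothetical.

TYPE_TO_REGION = {
    "war": "top-left",         # wars, battles, and sieges
    "conquest": "top-middle",  # conquests and unions
    "loss": "top-right",       # losses and divisions
}

COUNTRY_TO_COLOR = {
    "England": "red",
    "America": "orange",
    "Holland": "teal",
    "France": "blue",
}

def plot_event(year, event_type, country, century_start=1600):
    """Convert one listed event into a (row, col, region, color) cell spec."""
    offset = year - century_start      # position of the year within the century
    row, col = divmod(offset, 10)      # ten year-boxes per row of the grid
    return {
        "row": row,
        "col": col,
        "region": TYPE_TO_REGION[event_type],
        "color": COUNTRY_TO_COLOR[country],
    }

# The founding of Jamestown in 1607, an English event, lands in the
# first row of the grid, colored red:
print(plot_event(1607, "conquest", "England"))
```

The point of the exercise, for Peabody, was that a student would perform this mapping by hand, event by event; the code only makes visible how rule-bound, and how automatable, that mapping is.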
This was the point at which I enlisted a graduate student in Human-Computer Interaction (HCI), Caroline Foster, who helped design the really beautiful site that you’ll see in a minute. And I was also fortunate to be able to work with two very talented computer science undergrads, Adam Hayward and Shivani Negi, who helped with the technical implementation. And together, we set about creating this site. (NB: Erica Pramer, who graduated last year, helped with an earlier version of the project).
Below you can see some screen-shots from the four interactive modes: “Explore” (top left), which shows you how the events and their visual representations align; “Learn” (top right), which guides you through Peabody’s original lesson; “Play” (bottom left), which gives you a blank grid to color in; and “Compare” (bottom right), which generates a timeline on the basis of the current dataset, and displays it alongside the data as it’s stored in CSV form, as it’s listed in the text, and as it’s displayed on Peabody’s grid. (There are also some narrative components of the site, and while we’re still fixing bugs and tweaking the text, we encourage you to explore the site and send us your feedback).
There’s a lot more to be said about the eerily object-oriented way in which Peabody structures her data, and I touch on that in one of the interactive elements of the site, but in terms of my current argument, the salient point is how the clarity and legibility of the timeline, by contrast to Peabody’s grid, shows just how conditioned we’ve become to certain ideas about what visualization should and should not do. Accustomed as we are to a standard typology of visual forms, we can form a really quick rank order of these images. The timeline is better, we say, because it yields immediate insight. Who the heck knows what is going on with that grid?!
But what if we understood the purpose of visualization differently? What if we were supposed to stop and think hard about what we were seeing, and what it meant?
For my part, I’ve been thinking about Peabody’s charts in their most striking instantiation—those mural charts on the floor—and how they demand a different mode of sensory engagement altogether. They replace the decorative and utilitarian function of a rug with an experience designed to generate knowledge, and in so doing, they require viewers to reconsider the actual position of their bodies in relation to their objects of knowledge.
This feature has prompted me to undertake a second project to implement a floor-size version of Peabody’s charts, which you can see at left in some very early phases. (You can read about our current progress on the DH Lab blog). So in the top image, you see a matrix composed of 900 individually addressable LEDs, and in the bottom, you see the touch interface that we’re developing, which makes use of conductive tape and neoprene in an almost matrix-like keyboard interaction, so that you’ll be able to toggle each square off and on. And as my students and I carefully measure each strip of tape, and solder each part of each circuit together, I think back to Peabody’s own process of fabrication.
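The interaction we're building toward can be sketched in software, setting aside the hardware entirely. The model below is a hypothetical simplification of our design: it treats the 900 LEDs as a 30-by-30 grid of on/off cells and shows only the state logic that a touch on the tape keyboard would trigger, not the LED driver or circuitry.

```python
# A hypothetical sketch of the floor chart's toggle logic, modeling the
# 900-LED matrix as a 30x30 grid of boolean cells. The conductive-tape
# keyboard and LED hardware are abstracted away.

class FloorChart:
    def __init__(self, size=30):
        self.size = size
        # All squares start dark (off).
        self.cells = [[False] * size for _ in range(size)]

    def toggle(self, row, col):
        """Flip one square on or off, as a touch on the tape grid would."""
        self.cells[row][col] = not self.cells[row][col]
        return self.cells[row][col]

chart = FloorChart()
chart.toggle(3, 5)   # a first touch lights the square
chart.toggle(3, 5)   # a second touch turns it off again
```

What matters for the argument is that each square carries exactly one bit of state that a body, seated on the floor, flips directly: the viewer's physical position and gesture are part of the interface.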
When I mentioned that she made floor-sized versions of the charts as a sort of marketing ploy, what I didn’t mention was that, as an additional incentive, she promised a handmade chart to any classroom that purchased the book. Writing to a friend in 1850, Peabody revealed that she was “aching from the fatigue” of making the charts for each school. She described how she would stencil shapes and colors onto a large piece of fabric, and how a single one took her 15 hours. As you can see from the text of the letter, she yearned for her book to become profitable so that she could hire someone to “do this drudgery for [her].”
It speaks both to poor book sales, and to the perceived lack of value of the charts, that none have been preserved. But what we do have, in letters like these, is evidence of the actual physical labor, as well as of the knowledge work, involved in producing these charts. And more specifically, with her references to the fabric and to the drudgery, it’s labor in feminized form.
Let me take some time to unpack this, because it’s this observation that I want to end with, as I return to the question of alternate histories, and their impact on how we produce knowledge.
A lot of people—or, well, the handful who have ever thought to comment on Peabody’s work—observe that Peabody’s charts look like Mondrian paintings. And it’s true that, in their abstraction, they evoke the modernist grid. But thinking about the feminized labor of making the charts brings to mind a second point of reference, which is quilting.
What you see here (above) are two quilts from the area of Alabama known as Gee’s Bend. These quilts, created by a close-knit community of African American women in years that span the twentieth century, have in fact recently been posited as offering an alternate genealogy of modernism. This genealogy derives from folk art and vernacular culture, and centers this community of women who would otherwise be placed far to the side in the story of modernist art.
You might already be starting to guess where I am going with this line of thought—Peabody in relation to standard accounts of data visualization; the women of Gee’s Bend in relation to Mondrian. And it’s true—women’s work of all kinds, be it education or quilting, has long been excised from the dominant accounts of their fields.
But there’s another aspect of this comparison that I want to draw out—which is how both the quilts of Gee’s Bend and the charts of Elizabeth Peabody offer alternative systems of knowledge-making. Both employ shape and color to visually represent events in the world. And both, also, rely upon sense perception—and more specifically, the tactile experiences of the body—in order to assimilate those shapes and colors into knowledge. In her textbook, Peabody even talks about things like pleasure, and she emphatically rejected the idea of a single interpretation of history in favor of an exchange between the subject and object of knowledge. For Peabody, the abstraction of the grid was preferable to a more mimetic form because it “left scope for a little narration.” In other words, she believed that if her visualizations provided the contours of history, the viewer could then—both literally and figuratively—color them in.
And therein lies her principal lesson: about what information constitutes knowledge, about how that knowledge is perceived, and about who is authorized to produce it. That, to me, is why this project—the historical part and the technical one—is a feminist one. Because it brings renewed attention to the role of interpretation, and to the modes of knowing outside of what we’d typically consider to be visualizable, such as intuition, or affect, or embodiment.
As humanists, we’ve been trained to recognize the value of these alternate forms of knowledge, just as we’ve been trained to register the people, like Peabody, who stand on the periphery of the archive. These are often people whose stories we would otherwise lack sufficient evidence to be able to bring to light, whether it’s evidence in the form of data, or just the archival record.
And this is where I see a convergence in the historical and theoretical work surrounding the archive, and the more technical, but equally theoretical work relating to data and its visual display. It’s where I think humanists have real lessons to teach those who design visualizations—and as I begin to speak more with designers and researchers outside of the humanities, I’m increasingly convinced of this fact. But it’s also a space where I think we, as digital humanists, could make an intervention in our own scholarly fields. It’s not only by taking digital methods and applying them to humanistic questions; or even what I’ve demonstrated here today: how humanistic theories allow us to better understand certain digital techniques. Rather, what I’d like to see as we “renew the networks” of nineteenth-century digital studies, to borrow a phrase that Alison employed to introduce this session, is to insist on a richer intersection of the digital with the humanities, as both critical and creative, theoretical and applied, both the contours and the coloring in. That’s what I envision as the shape of things to come.
What follows is the transcript of my talk, “Visualization as Argument,” presented at the Genres of Scholarly Knowledge Production conference held at the Umea University HUMlab in December 2014. The talk is adapted from an essay-in-progress about the theoretical work of some of the earliest data visualization designers in the United States, who also happened to be pioneering educators and champions (to varying degrees) of women’s rights.
My research is concerned, most generally, with the cultural and critical dimensions of data visualization. I’m at work on a book about the history of data visualization, from the eighteenth century to the present. I also design visualizations for a range of scholarly functions (and I’d be happy to talk more about this during the discussion). But in other work—and this is what I’ll be speaking about today—I attempt to theorize the function of visualization, both in terms of its ability to reframe humanities data, however that may be construed, and in terms of its ability to call attention to the various processes of scholarly knowledge production.
What you see here, on the floor [NB: HUMlabX has an amazing floor screen, on which many of these images were displayed], is a visualization of the major events of the seventeenth century United States, designed by Elizabeth Palmer Peabody in 1856. Peabody’s visualization work, and the ideas that underlie it, will serve as my primary example today of how the practice of visualization can demonstrate not only what knowledge we, as scholars, can produce, but also how we come to produce it. As I hope you’ll soon see, this work offers an incredibly generative limit case for thinking through the various functions of visualization, epistemological and otherwise.
For one, this image is—at least initially—totally impenetrable. For another, this impenetrability—or this effect of impenetrability—is deliberate. Peabody had very specific ideas about how visualizations, if properly designed, could facilitate knowledge production in the interplay between viewer and image. (This is sort of a proto-agential realism, if you will). And finally, as a female knowledge worker of feminism’s first wave, Peabody’s example helps to illuminate the feminist and affective dimensions of data visualization. So in each of these ways—Peabody’s emphasis on the importance of interpretation, her insistence on the two-way exchange between subject and object of knowledge, and on the very real women’s work that went into making these images—Peabody’s example helps us become more attuned to the epistemological, ontological, and political arguments that inform the range of visualizations that we presently encounter in our everyday lives.
But first, some background.
Elizabeth Peabody was born in Massachusetts in 1804. Today, she is probably most famous for her proximity to more well-known writers of the American Renaissance, such as Henry David Thoreau and Nathaniel Hawthorne. (Hawthorne was actually married to her sister). But Peabody had impact in her own right. The bookstore that she ran out of her home in Boston functioned as the de facto salon for the transcendentalist movement. She also edited the transcendentalist magazine, The Dial, for some of its most pivotal years. And also, like many women of the nineteenth century who didn’t have great career options, Peabody became an educator. In fact, she is credited with starting the first kindergarten in the United States.
But what you see here, this Mondrian-looking thing on the floor [see image above], is not designed for kindergarteners. I mentioned just a minute ago that it’s a visualization of the significant historical events of the seventeenth century United States. Peabody created this image in 1856 for her textbook, A Chronological History of the United States. Her design was derived from a system developed in Poland in the 1820s, which employed a grid, overlaid with shapes and colors, to visually represent events in time. At left, you see a numbered grid, with each year in a century marked out in its own box.
At left you see how each box is subdivided. So, top left corner for wars, battles, and sieges; top middle for conquests and unions; top right for losses and divisions, and so on. These events were color-coded according to which country was involved. Shapes that take up the entire box indicate an event of such magnitude or complexity that the other events in that year didn’t matter.
The idea was to read the historical account in Peabody’s textbook, and then convert the summary table that followed into graphical form. (Below you can see the table that corresponds to the chart on the floor). The US is orange, and red is Spain—those are the colors that dominate the image. Sweden is also here, in that “bluish-green” you occasionally see. If you cross-reference the chart to the table of events, you can see, for instance, that New Sweden was conquered by New Netherlands in 1655. That’s what the blue-green box with the triangular shading, in fact, represents.
I think it’s safe to say that no one in this room could have known this without me explaining how to interpret the chart. And for current visualization luminaries like Edward Tufte, who champions the clarifying capacity of visualization; or for Stuart Card, who authored the textbook that we use in data visualization courses back at Georgia Tech, who describes datavis as amplifying existing thought processes—for these men, this design would be a complete and utter failure. But for Peabody, the image’s near-total abstraction was precisely the point. Her charts were intended to appeal to the senses directly, to provide “outlines to the eye.” Her hope was that, in requiring her viewers to interpret the image, they would conjure the narrative of history, and therefore produce historical knowledge, for themselves. More than that, even, Peabody intended to evoke pleasure—and she actually uses that word in her account. So one of the things I’d like us to think about today is the affective work that visualization can do.
I said earlier that Peabody presents a limit case, of sorts, and I’m going to return to that notion now. Because all visualizations, of course, entail a degree of abstraction. In fact, that may be the crucial feature that separates visualization from other, more mimetic strategies of representation. But Peabody, here, pushes the idea of abstraction to its limit, and in the process, forces us to ask what forms of knowledge we can, through visualization, actually, in fact, produce.
So, what forms of knowledge do such techniques actually produce? I would argue that, more than any specific conclusion prompted by a single image, visualization methods help us better understand the process of knowledge production. Here, Peabody is again instructive, because she did not merely intend her images to be perceived; she intended them to be created and then perceived. So at left you can see a page from one of the workbooks that Peabody printed and sold alongside her text. This is one student’s attempt to follow Peabody’s instructions.
At left, you see another, by someone who appears to have given up altogether. (These particular images come from Yale’s Beinecke library, although I’ve encountered many similar attempts in my archival research). Informed by the particular historical context of the mid-nineteenth-century United States—with slavery not yet abolished, the union in disarray, and its future in the hands of the citizens—Peabody was quite insistent that her students create charts of their own. In Peabody’s mind, the act of coloring in the little triangles—the act of producing a personal image of history—would enhance that person’s ability to influence national politics. Admittedly, as a political stance, it’s a bit idealistic; but in terms of a theory of knowledge production, it’s ahead of its time. Peabody flattens the relationship between the putative producer of knowledge and its perceiver. Moreover, in placing the image at the center of a multiphase interpretive act, Peabody destabilizes the previously fixed boundary between the subject and object of knowledge itself. In this way, Peabody points to what visualization, when conceived as a feminist method, might allow us to bring into view.
The final point I want to make about Peabody and her visualizations is about labor—and about some of the unseen arguments that can underlie acts of visual display. With this floor-screen configuration, and this particular image on it, it seems particularly apt to note that Peabody herself created a set of oversized charts that she would unroll onto the floor, inviting her students to sit around and study them.
There are a number of points to be made here, about labor, about craft, and about the increasingly rich history of female involvement in technical work, and we can talk more about these later. But I want to end with a return to method, and to the idea of method as argument. For Peabody, the abstraction of the grid was preferable to a more mimetic form because it “left scope for a little narration.” In other words, she believed that if her visualizations provided the contours of history, the viewer could then—both literally and figuratively—color it in. And therein lies her argument—about what constitutes knowledge, about how that knowledge is perceived, and about who is authorized to produce it. We can find some of these ideas in her writing, but most must be gleaned from the images themselves. Indeed, this is the work that visualization can do.
Data visualization is not a recent innovation. Even in the eighteenth century, economists and educators, as well as artists and illustrators, were fully aware of the inherent subjectivity of visual perception, the culturally-situated position of the viewer, and the power of images in general—and of visualization in particular—to convey arguments and ideas.
In this talk, I examine the history of data visualization in relation to feminist theory, which has also long attended to the subjective nature of knowledge and its transmission. Exploring the visualization work of three female educators from the nineteenth century, Emma Hart Willard, Almira Hart Lincoln Phelps, and Elizabeth Peabody, I show how we might recover these women’s contributions to the development of modern data visualization techniques. I contend, moreover, that by conceiving of data visualization as a feminist method, we might better understand its function—in the nineteenth century and today—as a way to present concepts, advance arguments and perform critique.
What follows is the talk I delivered on behalf of the TOME project team at the Digital Humanities 2014 conference. We’re in the process of writing up a longer version with more technical details, but in the interim, feel free to email me with any questions.
NB: For display purposes, I’ve removed several of the less-essential slides, but you can view the complete slidedeck here.
Just over a hundred years ago, in 1898, Henry Gannett published the second of what would become three illustrated Statistical Atlases of the United States. Based on the results of the Census of 1890– and I note, if only to make myself feel a little better about the slow pace of academic publishing today, eight years after the census was first compiled– Gannett, working with what he openly acknowledged as a team of “many men and many minds,” developed an array of new visual forms to convey the results of the eleventh census to the US public.
The first Statistical Atlas, published a decade prior, was conceived in large part to mark the centennial anniversary of the nation’s founding. That volume was designed to show the nation’s territorial expansion, its economic development, its cultural advancement, and social progress. But Gannett, with the centennial receding from view, understood the goal of the second atlas in more disciplinary terms: to “fulfill its mission in popularizing and extending the study of statistics.”
It’s not too much of a stretch, I think, to say that we’re at a similar place in the field of DH today. We’ve moved through the first phase of the field’s development– the shift from humanities computing to digital humanities– and we’ve addressed a number of public challenges about its function and position in the academy. We also now routinely encounter deep and nuanced DH scholarship that is concerned with digital methods and tools.
And yet, for various reasons, these tools and methods are rarely used by non-digitally-inclined scholars. The project I’m presenting today, on behalf of a project team that also includes Jacob Eisenstein and Iris Sun, was conceived in large part to address this gap in the research pipeline. We wanted to help humanities scholars with sophisticated, field-specific research questions employ equally sophisticated digital tools in their research. Just as we can now use search engines like Google or Apache Solr without needing to know anything about how search works, our team wondered if we could develop a tool to allow non-technical scholars to employ another digital method– topic modeling– without needing to know how it worked. (And I should note here that we’re not the first to make this observation about search; Ben Schmidt and Ted Underwood published remarks to this end as early as 2010).
Given this methodological objective, we also wanted to identify a set of humanities research questions that would inform our tool’s development. To this end, we chose a set of nineteenth-century antislavery newspapers, significant not only because they provide the primary record of slavery’s abolition, but also because they were one of the first places, in the United States, where men and women, and African Americans and whites, were published together, on the same page. We wanted to discover whether– and if so, how– these groups of people framed similar ideas in different ways.
For instance, William Lloyd Garrison, probably the most famous newspaper editor of that time (he who began the first issue of The Liberator, in 1831, with the lines, “I will not equivocate — I will not excuse — I will not retreat a single inch — AND I WILL BE HEARD”) decided to hire a woman, Lydia Maria Child, to edit the National Anti-Slavery Standard, the official newspaper of the American Anti-Slavery Society. Child was a fairly famous novelist by that point, but she also wrote stories for children, and published a cookbook, so Garrison thought she could “impart useful hints to the government as well as to the family circle.” But did she? And if so, how effective– or how widely adopted– was this change in topic or tone?
The promise of topic modeling for the humanities is that it might help us answer questions like these. (I don’t have time to give a background on topic modeling today, but if you have questions, you can ask later). The salient feature, for our project, is that these models are able to identify sets of words (or “topics”) that tend to appear in the same documents, as well as the extent to which each topic is present in each document. When you run a topic model, as we did using MALLET, the output typically takes the form of lists of words and percentages, which may suggest some deep insight — grouping, for example, woman, rights, and husband — but rarely offer a clear sense of where to go next. Recently, Andrew Goldstone released an interface for browsing a topic model. But if topic modeling is to be taken up by non-technical scholars, interfaces such as this must be able to do more than facilitate browsing; they must enable scholars to recombine such preliminary analysis to test theories and develop arguments.
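To give a concrete sense of what that output looks like, here is a minimal sketch of parsing a MALLET-style doc-topics file into per-document topic proportions. The column layout, filenames, topic numbers, and proportions below are all assumptions for illustration, not the project’s actual data or code:

```python
# Sketch: parsing MALLET-style doc-topics output into per-document
# topic proportions. The format (doc id, source file, then alternating
# topic-id / proportion columns) follows one common MALLET output
# style; the sample values are invented for illustration.

SAMPLE = """\
0\tliberator_1831_01.txt\t12\t0.41\t59\t0.22\t40\t0.09
1\tstandard_1841_03.txt\t59\t0.38\t40\t0.31\t12\t0.05
"""

def parse_doc_topics(text):
    docs = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        name = fields[1]
        pairs = fields[2:]
        # alternating topic-id / proportion columns
        docs[name] = {int(t): float(p) for t, p in zip(pairs[::2], pairs[1::2])}
    return docs

docs = parse_doc_topics(SAMPLE)
# strongest topic in the (invented) first issue
top = max(docs["liberator_1831_01.txt"].items(), key=lambda kv: kv[1])
print(top)  # (12, 0.41)
```

Even with such a table in hand, the scholar still faces raw numbers; the interface work described below is about making those numbers navigable.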
In fact, the goal of integrating preliminary analytics with interactive research is not new; exploratory data analysis (or EDA, as it’s commonly known) has played a fundamental role in quantitative research since at least the 1970s, when it was described by John Tukey. In comparison to formal hypothesis testing, EDA is more, well, exploratory; it’s meant to help the researcher develop a general sense of the properties of his or her dataset before embarking on more specific inquiries. Typically, EDA combines visualizations such as scatterplots and histograms with lightweight quantitative analysis, serving to check basic assumptions, reveal errors in the data-processing pipeline, identify relationships between variables, and suggest preliminary models. This idea has since been adapted for use in DH– for instance, the WordSeer project, out of Berkeley, frames its work in terms of exploratory text analysis. In keeping with the current thinking about EDA, WordSeer interweaves exploratory text analysis with more formal statistical modeling, facilitating an iterative process of discovery driven by scholarly insight.
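For readers unfamiliar with EDA, here is a toy illustration, in the Tukey spirit, of the “lightweight quantitative analysis” at issue: summary statistics plus a crude text histogram over an invented series of per-issue topic proportions. None of this is the project’s code:

```python
# Toy EDA sketch: summary statistics and a crude text histogram over
# an invented series of per-issue proportions for a single topic.
import statistics

proportions = [0.02, 0.05, 0.04, 0.11, 0.18, 0.21, 0.17, 0.09, 0.06, 0.03]

print("mean:  ", round(statistics.mean(proportions), 3))
print("median:", round(statistics.median(proportions), 3))
print("stdev: ", round(statistics.stdev(proportions), 3))

# crude histogram: one '#' per 2% of topic share, issue by issue
for i, p in enumerate(proportions):
    print(f"issue {i:2d} | " + "#" * int(p * 50))
```

Even this much reveals the shape of the series, here a rise and fall, before any formal modeling begins.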
EDA tends to focus on the visual representation of data, since it’s generally thought that visualizations enhance, or otherwise amplify, cognition. In truth, the most successful visual forms are perceived pre-cognitively; their ability to guide users through the underlying information is experienced intuitively; and the assumptions made by the designers are so aligned with the features of their particular dataset, and the questions that dataset might begin to address, that they become invisible to the end-user.
So in the remainder of my time today, I want to talk through the design decisions that have influenced the development of our tool as we sought to adapt ideas about visualization and EDA for use with topic models of scholarly archives. In doing so, my goal is also to take up the call, as recently voiced by Johanna Drucker, to resist the “intellectual Trojan horse” of humanities-oriented visualizations, which “conceal their epistemological biases under a guise of familiarity.” What I’ll talk through today should, I hope, seem at once familiar and new. For our visual design decisions involved serious thinking about time and space, concepts central to the humanities, as well as about the process of conducting humanities research broadly conceived. In what follows, I’ll present two prototype interface designs, and explain the technical and theoretical ideas that underlie each, before sketching the path of our future work.
Understanding the evolution of ideas– about abolition, or ideology more generally– requires attending to change over time. Our starting point was a sense that whatever visualization we created needed to highlight, for the end-user, how specific topics–such as those describing civil rights and the Mexican-American War, to name two that Lydia Maria Child wrote about– might become more or less prominent at various points in time. For some topics, such as the Mexican-American War, history tells us that there should be a clear starting point. But for other topics, such as the one that seems to describe civil rights, their prevalence may wax and wane over time. Did Child employ the language of the home to advocate for equal rights, as Garrison hoped she would? Or did she merely adopt the more direct line of argument that other (male) editors employed?
To begin to answer these questions, our interface needed to support nuanced scholarly inquiry. More specifically, we wanted the user to be able to identify significant topics over time for a selected subset of documents– not just in the entire dataset. This subset of documents, we thought, might be chosen by specific metadata, such as newspaper title; this would allow you to see how Child’s writing about civil rights compared to other editors’ work on the subject. Alternatively, you might, through a keyword search, choose to see all the documents that dealt with issues of rights. So in this way, you could compare the conversation around civil rights with the one that framed the discussion about women’s rights. (It’s believed that the debates about the two issues developed in parallel, although often with different ideological underpinnings).
At this point, it’s probably also important to note that in contrast to earlier, clustering-based techniques for identifying themes in documents, topic modeling can identify multiple topics in a single document. This is especially useful when dealing with historical newspaper data, which tends to be segmented by page and not article. So you could ask: Did Child begin by writing about civil rights overtly, with minimal reference to domestic issues? Or did Child always frame the issue of civil rights in the context of the home?
Our first design was based on exploring these changes in topical composition. In this design, we built on the concept of a dust-and-magnets visualization. Think of that toy where you could use a little magnetized wand to draw a mustache on a man; this model treats each topic as a magnet, which exerts force on multiple specks of dust (the individual documents). (At left is an image from an actual dust-and-magnets paper).
In our adaptation of this model, we represented each newspaper as a trail of dust, with each speck– or point– corresponding to a single issue of the newspaper. The position of each point, on an x/y axis, is determined by its topical composition, with respect to each topic displayed in the field. That is to say– the force exerted on each newspaper issue by a particular topic corresponds to the strength of that topic in the issue. In the slide below, you can see highlighted the dust trail of the Anti-Slavery Bugle as it relates to five topics, including the civil rights and women’s rights topics previously mentioned. (They have different numbers here). I also should note that for the dust trails to be spatially coherent, we had to apply some smoothing. We also used color to convey additional metadata. Here, for instance, each color in a newspaper trail corresponds to a different editor. So by comparing multiple dust-trails, and by looking at individual trails, you can see the thematic differences between (or within) publications.
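To make the positioning idea concrete, here is a minimal sketch in which each issue’s location is the weighted centroid of the topic “magnets,” with weights given by topic strength. This is a simplification (the actual dust-and-magnets technique applies forces iteratively and interactively), and all coordinates, topic numbers, and weights below are invented:

```python
# Sketch of the positioning idea: each newspaper issue ("dust speck")
# is pulled toward each topic ("magnet") in proportion to that topic's
# strength in the issue. Simplified here to a weighted centroid of the
# magnet positions; all values are invented for illustration.

def place_issue(topic_weights, magnet_positions):
    """topic_weights: {topic_id: strength}; magnet_positions: {topic_id: (x, y)}."""
    total = sum(topic_weights.values())
    x = sum(w * magnet_positions[t][0] for t, w in topic_weights.items()) / total
    y = sum(w * magnet_positions[t][1] for t, w in topic_weights.items()) / total
    return (x, y)

magnets = {59: (0.0, 1.0), 40: (1.0, 0.0)}  # e.g. women's rights, civil rights
issue = {59: 0.75, 40: 0.25}                # one issue's topical composition
print(place_issue(issue, magnets))  # (0.25, 0.75)
```

An issue dominated by topic 59 thus sits close to that magnet; as its composition shifts from issue to issue, the successive points trace out the “dust trail.”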
Another issue addressed by this design is the fact that documents are almost always composed of more than two topics. In other words, for the topics’ force to be represented most accurately, they must be arranged in an n-dimensional space. We can’t do that in the real world, obviously, where we perceive things in three dimensions; let alone on a screen, where we perceive things in two. But while multidimensional information is lost, it’s possible to expose some of this information through interaction. So in this prototype, by adjusting the position of each topic, you can move through a variety of spatializations. Taken together, these alternate views allow the user to develop an understanding of the overall topical distribution.
This model also nicely lends itself to our goal of helping users to “drill down” to a key subset of topics and documents: if the user determines a particular topic to be irrelevant to the question at hand, she can simply remove its magnet from the visualization, and the dust-trails will adjust.
This visualization also has some substantial disadvantages, as we came to see after exploring additional usage scenarios. For one, the topical distributions computed for each newspaper are not guaranteed to vary with any consistency. For instance, some topics appear and disappear; others increase and decrease repeatedly. In these cases, the resultant “trails” are not spatially coherent unless smoothing is applied after the fact. This diminishes the accuracy of the representation, and raises the question of how much smoothing is enough.
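A sketch of what such after-the-fact smoothing might look like: a centered moving average over successive positions in a trail. The window size is precisely the open question raised above (larger windows yield smoother, but less faithful, trails), and the coordinates are invented:

```python
# Sketch of after-the-fact trail smoothing: a centered moving average
# over successive issue positions. Window size trades smoothness
# against fidelity; the coordinates are invented for illustration.

def smooth(points, window=3):
    half = window // 2
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out

# a trail that oscillates between two positions, issue by issue
trail = [(0.0, 0.0), (1.0, 2.0), (0.0, 0.0), (1.0, 2.0)]
print(smooth(trail))
```

On this oscillating trail, smoothing pulls every point toward the middle, which is exactly the accuracy cost described above: the smoothed trail is coherent, but it no longer shows how sharply the composition actually swung.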
Another disadvantage is that while the visualization facilitates the comparison of the overall thematic trajectories of two newspapers, it is not easy to align these trajectories– for instance, to determine the thematic composition of two newspapers at the same point in time. We considered interactive solutions to this problem, like adding a clickable timeline that would highlight the relevant point on each dust trail. However, these interactive solutions moved us further from a visualization that was immediately intuitive.
At this point, we took a step back, returning to the initial goal of our project: facilitating humanities research through technically-sophisticated means. This required more complex thinking about the research process. There is a difference, we came to realize, between a scholar who is new to a dataset, and therefore primarily interested in understanding the overall landscape of ideas; and someone who already has a general sense of the data, and instead, has a specific research question in mind. This is a difference between the kind of exploration theorized by Tukey, and a different process we might call investigation. More specifically, while exploration is guided by popularity—what topics are most prominent at any given time—investigation is guided by relevance: what topics are most germane to a particular interest. We wanted to facilitate both forms of research in a single interface.
With this design, at left, it’s time that provides the structure for the interface, anchoring each research mode– exploration and investigation– in a single view. Here, you see the topics represented in “timeline” form. (The timeline-based visualization also includes smooth zooming and panning, using D3’s built-in zoom functionality). The user begins by entering a search term, as in a traditional keyword search. So here you see the results for a search on “rights,” with each topic that contains the word “rights” listed in order of relevance. This is like the output of a standard search engine, like Google, so each topic is clickable– like a link.
Rather than take you to a web page, however, clicking on a topic gets you more information about that topic: its keywords, its overall distribution in the dataset, its geographical distribution, and, eventually, the documents in the dataset that best encapsulate its use. (There will also be a standalone keyword-in-context view).
Another feature under development, in view of our interest in balancing exploration and investigation, is that the height– or thickness– of any individual block indicates its overall popularity. (We actually have this implemented, although it hasn’t yet been integrated into the interface you see). For example, given the query “rights,” topic 59, centered on women’s rights, represented in blue at the top right, may be most relevant– with “rights” as the most statistically significant keyword. But it is also relatively rare in the entire dataset. Topic 40, on the other hand, which deals with more general civil and political issues, has “rights” as a much less meaningful keyword, yet is extremely common in the dataset. Each of these topics holds significance for the scholar, but in different ways. Our aim is to showcase both.
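The two orderings can be sketched as follows. “Relevance” ranks topics by the query term’s weight within each topic; “popularity” ranks them by overall prevalence in the corpus. The weights and prevalence figures are invented, and this is not the project’s implementation:

```python
# Sketch of the two orderings: relevance (weight of the query term
# within each topic) vs. popularity (the topic's overall prevalence
# in the corpus). All numbers are invented for illustration.

topics = {
    59: {"label": "women's rights", "keyword_weight": {"rights": 0.12}, "prevalence": 0.01},
    40: {"label": "civil/political", "keyword_weight": {"rights": 0.03}, "prevalence": 0.08},
}

def by_relevance(query):
    return sorted(topics, key=lambda t: topics[t]["keyword_weight"].get(query, 0.0),
                  reverse=True)

def by_popularity():
    return sorted(topics, key=lambda t: topics[t]["prevalence"], reverse=True)

print(by_relevance("rights"))  # [59, 40]
print(by_popularity())         # [40, 59]
```

The interface’s job, then, is to surface both rankings at once: relevance in the ordering of results, popularity in the thickness of each block.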
Another feature to demonstrate is a spatial layout of topic keywords. In the course of the project’s development, we came to realize that while the range of connotations of individual words in a topic presents one kind of interpretive challenge, the topics themselves can at times present another– more specifically, when a topic includes words associated with seemingly divergent themes. So for instance, in T56, the scholar might observe a (seemingly) obvious connection, for the nineteenth century, between words that describe Native Americans and those that describe nature. However, unlike the words “antelope” or “hawk,” the words “tiger” and “hyena,” also included in the topic, do not describe animals that are native to North America. Just looking at the word list, it’s impossible to tell whether the explanation lies in a new figurative vocabulary for describing Native Americans, or whether this set of words is merely an accident of statistical analysis.
So here, on the left, you see a spatial visualization of the topic’s keywords using multidimensional scaling, in which each keyword is positioned according to its contextual similarity. Here, the terms “indian”, “indians”, and “tribes” are located apart from “hyena”, “tiger”, and “tigers”, which are themselves closely associated. The spatial layout suggests a relatively weak connection between these groups of terms. For comparison, at right is a spatial visualization for a topic relating to the Mexican-American War, in which terms related to the conduct of the war are spatially distinguished from those related to its outcome.
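The “contextual similarity” underlying this layout can be sketched by comparing words’ co-occurrence profiles, here with cosine similarity over invented counts. (The interface then projects such similarities into two dimensions with multidimensional scaling; that projection step is omitted from this sketch, and the context words and counts are assumptions for illustration.)

```python
# Sketch of contextual similarity: words compared by their
# co-occurrence profiles via cosine similarity. The context words
# and counts are invented for illustration.
import math

cooc = {
    "indian": {"land": 8, "treaty": 6, "jungle": 0, "ferocity": 1},
    "tribes": {"land": 7, "treaty": 5, "jungle": 0, "ferocity": 0},
    "tiger":  {"land": 0, "treaty": 0, "jungle": 7, "ferocity": 5},
    "hyena":  {"land": 0, "treaty": 0, "jungle": 6, "ferocity": 6},
}

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

print(round(cosine(cooc["indian"], cooc["tribes"]), 2))  # high
print(round(cosine(cooc["indian"], cooc["tiger"]), 2))   # low
```

Two tight clusters with a weak link between them, as in the similarities above, is exactly the pattern the MDS layout makes visible at a glance.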
But returning, for a minute, to the overall view, I’ll just note that there are limitations to this interface as well, stemming from the translation of textual and temporal data into a spatial view. Through our design process, though, we came to realize that the goal should not be to produce an accurate spatial representation of what is, after all, fundamentally non-spatial data. Rather, our challenge was to create a spatial transformation, one that conveyed a high density of information while at the same time allowing the scholar to quickly and easily reverse course, moving from space back to the original, textual representation.
Our project is far from concluded, and we have several specific steps we plan to accomplish. In addition to implementing the information about specific topics, our most pressing concern, given our interest in moving from text to space and back to text again, is to implement the KWIC view. We also plan to write up our findings about the newspapers themselves, since we believe this tool can yield new insights into the story of slavery’s abolition.
But I want to end with a more theoretical question that I think our visualization can help to address– in fact, one that our interface has helped to illuminate without our even trying.
I began this presentation by showing you some images from Henry Gannett’s Statistical Atlas of the United States. You’ll notice that one of these images bears a striking similarity to the interface we designed. Believe it or not, this was unintentional! We passed through several intermediary designs before arriving at the one you see, and several of its visual features: the hexagon shape of each block, and the grey lines that connect them, were the result of working within the constraints of D3. But the similarities between these two designs can also tell us something, if we think harder about the shared context in which both were made.
So, what do we have in common with Henry Gannett, the nineteenth-century government statistician? Well, we’re both coming at our data from a methodological perspective. Gannett, if you recall, wanted to elevate statistics in the public view. By integrating EDA into our topic model exploration scheme, our team also aims to promote a statistical mode of encountering data. But that I refer to our abolitionist newspaper data as “data” is, I think, quite significant, because it helps to expose our relation to it. For antislavery advocates at the time– and even more so for the individuals whose liberty was discussed in their pages– this was not data, it was life. So when we are called upon, not just as visualization designers, but as digital humanities visualization designers, to “expose the constructedness of data”—that’s Johanna Drucker again, whom I mentioned at the outset—or, to put it slightly differently, to illuminate the subjective position of the viewer with respect to the data’s display, we might think of these different sets of data, and their similar representations—which owe as much to technical issues as to theoretical concerns—and ask what about the data is exposed, and what remains obscured from view. That is to say, what questions and what stories still remain for computer scientists, and for humanities scholars, working together, to begin to tell?
On March 15th, 2014, I participated in a roundtable, “Networks and the Commons,” at C19: The Society of Nineteenth-Century Americanists biennial conference. My co-panelists were Ryan Cordell, Ellen Gruber Garvey, Kristen Doyle Highland, and Joanne van der Woude. (Ed Whitley provided the opening remarks). What follows are my remarks, slightly expanded and reformatted for the web.
In bringing together these roundtable contributions, Ed [Whitley] observed that our shared assumption is that “the commons as a category of analysis is the product of networks.” This is true– I think– and in the remarks we’ve heard from Ryan [Cordell], Ellen [Gruber Garvey], and Kristen [Doyle Highland], we can already begin to see the range in the kinds of networks that helped to constitute a printed “commons” in the nineteenth-century United States. These are networks of publication and of reprinting, of the circulation of objects and ideas. And in each of these cases, the notion of the network allows us to name—and in some instances, to visualize—an otherwise nebulous set of relations: transmission, circulation, influence, and exchange, to identify just a few.
So what I want to do in my time today is first to briefly describe– and then begin to theorize– a system that I’ve been developing at the Digital Humanities Lab, in collaboration with Jacob Eisenstein, an assistant professor of Interactive Computing, and Iris Sun, a graduate student in Digital Media, also at Georgia Tech. We’re in the process of building a web-based tool that will allow scholars to trace the transmission of ideas across social networks and over time. Our archive (or what others would call a dataset) is a set of nineteenth-century abolitionist newspapers from across the United States. We chose these papers because of the unusual diversity of their authorship, and for the intensity of the debates—social and cultural, as well as political—that took place on their pages. These newspapers, as this audience knows well, were one of the few places where men and women, African and Anglo-Americans, Northerners and Southerners, US citizens and those from abroad, could contribute their views about how to end slavery. And while these writers were united in their common goal, they often disagreed about how best to achieve it.
[SLIDE TK] What you see on the left is a prototype for the design of an interface to help scholars explore this set of newspapers according to the themes expressed in their pages. It relies on a computational technique called topic modeling, which allows a particular newspaper issue’s themes, also called topics, to be automatically identified and summarized. So if you had a general question like, “How did the discourse surrounding voting rights change in the wake of the 1840 Anti-Slavery convention?” (this was when the American Anti-Slavery Society split in two over the issue of whether women should be granted full membership rights), or if you wanted to address a related question– “Did the women’s rights movement borrow language from the contemporaneous anti-slavery campaign?”– you might type “rights” into the search box at the top of the page, and see what topics show up. We’re currently developing the ability to sort the topics by relevance and popularity, and we’ll also eventually add the ability to link back to the original texts. I’d be happy to give more details about the project during the discussion, or even after, if people are interested in hearing more.
I wanted to present the project to you today, though, even if in an abbreviated form, because it illustrates one way of thinking through the constitution– through a network of print– of an ideological commons, of sorts. By looking at the relative positions and popularity of related topics in these newspapers, we can get a basic sense of the origin, evolution, and circulation of anti-slavery ideas. These newspapers, viewed through this tool, also complicate the notion of the commons in productive ways, revealing any shared ideology to be constituted by a continuously changing set of printed texts; of the figures who wrote them; and of the issues, events– and, of course– people who were written about. This visualization method, moreover, draws attention to those people, institutions, ideas, and materials that remain outside of, or obscured by the commons. These other, less visible networks also influenced the printed record, and in my remaining minutes, I’d like to begin to explain how.
One such network is the social network of the editors who compiled these papers, and lobbied each other to print certain items (or not print them, as the case may be). In the Manuscripts and Archives Division of the New York Public Library, where I’ve been working for the past month, you can read the correspondence between Lydia Maria Child and her friend, the abolitionist Ellis Gray Loring. We know Child primarily for her 1824 novel, Hobomok, but between 1841 and 1843, she served as editor of the National Anti-Slavery Standard, the official newspaper of the American Anti-Slavery Society. Writing to Loring in March 1842, after a merger with the Pennsylvania Freeman, another anti-slavery newspaper, required her to publish certain amounts of its content, Child laments: “I cannot manage the paper at all as I would. Public documents of one kind or another crowd upon me so, and since the union with the Freeman I am flooded with communications, mostly of an ordinary character” (see image above). She admits to rewriting almost all of the content she receives from the Freeman in order to make more room for her own editorials, but even then, she cannot find enough space. “I fear to injure the interest of the cause and the paper by omission!” she exclaims.
So here you have an example of a network of influence that operates in the negative. Child’s own arguments—those that she believes will best advance the abolitionist cause—never enter the commons of print. And while we can infer, on the basis of her other writings, what Child might have argued, the “three editorials” that she professes she would rather have composed nevertheless remained unwritten, and are alluded to only in this personal correspondence—a network then, as now, that is (or at least was, until Edward Snowden) assumed to be private.
The library is taking steps to make its archival material more accessible. In a tacit acknowledgment of the private networks that bind together such correspondence, NYPL Labs has developed a tool for visualizing the metadata included in its finding aids. At left, for instance, is a network visualization of the NYPL’s Lydia Maria Child Papers. And you can see not only how subjects, such as “abolitionists,” encompass multiple archival holdings; but also how letter-writers, such as Child and Loring, are linked to each other, and to others whose papers the Library contains. You still don’t see evidence of the Pennsylvania Freeman’s editors, however– those women whom Child characterized, in another letter, as a set of “fussy, ignorant old women” who sent her nothing but the “dullest communications, bad grammar, and detestable spellings.” This is likely because the women were not deemed important enough—by Child, clearly, but also by anyone else who corresponded with them, who might have been able to preserve their letters at the time, which would, in turn, have allowed them to enter the archive today. And all this is to say nothing of the actual enslaved men and women, whose liberty was being argued about in these newspapers’ pages, but who were so rarely given opportunity to speak for themselves, or to have that speech recorded in print.
In his 2011 essay, “Are Some Things Unrepresentable?” (which takes its title, in turn, from Rancière), media theorist Alex Galloway answers this question with an only slightly qualified yes. He writes: “Only one visualization has ever been made of an information network, for there can be only one,” and that visualization always represents the same thing: power.
Galloway’s point is about both politics and aesthetics, but above all, about representation. In contrast to data, which—at least, according to Galloway—is inherently formless, information is “almost tautologically bound up with the concept of form.” What we see, then, in a visualization of an information network—any visualization of an information network—is not the underlying data, but instead the network itself, which processes and shapes the data. In these networked, neoliberal times (or so the argument goes), power is distributed, with no single source of origin. This form of power is unrepresentable– deliberately so. But what we can show with a network visualization—in fact, the only thing we can show—is an abstraction: how power operates (answer: diffusely!); how power is generated and maintained.
I don’t think, however, that the response to this uniformity of network visualizations should be to abandon visualization tools. Rather, I think we might take a cue from nineteenth-century networks of print, networks that we, sitting here today, know so well. In nineteenth-century networks, as we have seen, the commons was constituted—and also contested—through the interplay of ideology and specificity. Visualizing these networks, even through similar means, facilitates a clearer understanding of what was– and who were— included in the commons, and what or who remained outside. It allows us to question the value and veracity of such visualizations, and also ask what they productively fail to show. Public opinion, as expressed in print, is perhaps the easiest way to trace the contours of this particular ideological commons, because it could be pushed through the network of newspapers I’ve just discussed. But individual artifacts—the contents of a private letter like Child’s, the texture and weight of its paper, or the keepsake its envelope enclosed—these things mattered too. As scholars in the digital age, we’re learning how to assimilate the model and the material, artifact and abstraction, readings distant and close. In the spirit of the commons, and of this conference, it’s now up to us to also make sure that our commons– our community– learns to look for this interplay too.
On Thursday, February 27th, 2014, I traveled to Yale to speak about the uses of data visualization, past and present. Trip Kirkpatrick, of the Instructional Technology Group, wrote up a great summary of my talk on the DH Working Group website. (My slides are embedded in that post). I’m hoping to write up a more formal version of my remarks soon.
I’m pleased to announce that my essay, “The Image of Absence: Archival Silence, Data Visualization, and James Hemings,” has been published in the December 2013 issue of American Literature (85.4). You can read my essay, along with many other excellent contributions, here.
On November 20th, 2013, I presented a talk at the Penn Humanities Forum on the long arc of visual display. The abstract is as follows:
We live in what’s been called the “golden age” of data visualization, and yet, the graphical display of information has a long history, one that dates to the Enlightenment and arguably before. This talk will explore the origins and applications (both historical and contemporary) of data visualization techniques. Drawing from the fields of media history, digital humanities, and information visualization, Lauren Klein will introduce several techniques for data visualization, and reflect upon their uses—and their limits—in humanities research and teaching.
I’ve uploaded my slides (minus the embedded movies) to SlideShare, but they can also be viewed below:
On Wednesday, November 20th, I’ll be speaking at the Penn Humanities Forum about the origins and applications (both historical and contemporary) of data visualization techniques. The official abstract is as follows:
We live in what’s been called the “golden age” of data visualization, and yet, the graphical display of information has a long history, one that dates to the Enlightenment and arguably before. This talk will explore the origins and applications (both historical and contemporary) of data visualization techniques. Drawing from the fields of media history, digital humanities, and information visualization, Lauren Klein will introduce several techniques for data visualization, and reflect upon their uses—and their limits—in humanities research and teaching.
For more information, and to RSVP, click here.