Thursday, 15 October 2015

Visualizing data's possibilities and limitations



N.B.: Following up on my presentation (and to outline some of my dissertation’s foci, in the spirit of public and accessible scholarship), I thought I would elaborate on the possibilities of data visualization methods, including the methodological and ethical concerns raised by basic research practices, such as how to collect and filter the data I encounter.

I don't know if my presentation made this clear, but to make my position explicit (and for the benefit of any web readers), I want to borrow from our course reading by Jamie “Skye” Bianco to orient my research and methods, because Bianco’s description of her own work succinctly summarizes the aims of this project: “Let me be clear: I am a digital/multimodal compositionist, a digital media practitioner, a feminist, and a critical media theorist whose ethics lie in progressive affiliation and collaborative social justice.” Sometimes the questions and minutiae that I raise during my research seem completely unnecessary, but one thing I’ve realized about my digital methods is that they democratize data, allowing very large and very small “things”1 to disclose information on equal terms. Even when these large and small objects of analysis do not actually disclose information on equal terms, they are always approached as having the potential to do so, and analysed accordingly. Thus I proposed in class that a more interactive version of Maggie Lee’s or Helen Jackson’s data visualizations could have significant political ramifications: consider the implications of a visualization charting the correspondence made public during the Mike Duffy court case and scandal (or released through a Freedom of Information request). Such a visualization could, hypothetically, reveal a trail of legal culpability. Understanding how networks operate isn’t just the focus of actor-network theorists, but also of network security analysts and detectives who understand the trails emails leave behind and how those trails are tracked. Digital humanities, then, can work towards Bianco’s “collaborative social justice” by providing, among other things, interactive visualizations that report situations and explore their circumstances (some of which unfold in courtrooms, in civil and criminal investigations that reflect a more literal interest in Bianco's "collaborative social justice").

For another brief example, consider Evan Solomon's piece on the “dead cat on the 2015 campaign trail” in Canada's thankfully finished election. The dead cat in this case, credited with changing the electoral tide this year, is the niqab and the racist rhetoric emerging from Canada’s Zero Tolerance for Barbaric Cultural Practices Act (a law that is, unfortunately, as barbaric as its title implies). An adaptation of Mike Bostock, Shan Carter, and Matthew Ericson’s word clouds for the 2012 U.S. national conventions might produce interesting results if it traced buzzwords throughout the 2015 Canadian election. What if a visualization tracked the media’s mentions of Mike Duffy and compared them to uses of the word niqab? Which words are most closely associated with Duffy, and which with the niqab? Who mentions which words, and who mentions them most often? Or, perhaps more tellingly, who began mentioning which words first, and when and where did the volume rise? Did polls reflect these numbers in any way? This approach might be even more useful when comparing two more closely aligned topics, like how different CEOs are treated (Trump vs. Fiorina in the current Republican primaries, for example).
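Before building any word cloud, the questions above (who mentions which word, how often, who said it first, and which words travel together) could be prototyped with plain counting. The sketch below is a minimal, hypothetical Python version: `ARTICLES` holds invented placeholder strings, not real coverage, and counting words that share an article with a term is only one crude measure of "association". A real study would need an actual news corpus and a proper stopword list.

```python
import re
from collections import Counter
from datetime import date

# Invented placeholder records: (publication date, outlet, text).
# These are NOT real headlines; a real study would load a news corpus here.
ARTICLES = [
    (date(2015, 9, 1), "Outlet A", "Duffy trial testimony continues"),
    (date(2015, 9, 15), "Outlet B", "Niqab ruling sparks debate on the campaign trail"),
    (date(2015, 9, 20), "Outlet A", "Leaders clash over niqab and Duffy at debate"),
]

# A deliberately tiny stopword list; a real analysis would need a fuller one.
STOPWORDS = frozenset({"the", "and", "on", "at", "over"})


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def mentions_by_outlet(articles, term):
    """Count how many times `term` occurs in each outlet's articles."""
    counts = Counter()
    for _, outlet, text in articles:
        counts[outlet] += tokenize(text).count(term)
    return counts


def first_mention(articles, term):
    """Return the earliest date on which `term` appears, or None if it never does."""
    dates = [day for day, _, text in articles if term in tokenize(text)]
    return min(dates) if dates else None


def associated_words(articles, term):
    """Count words appearing in the same article as `term` (a crude association measure)."""
    assoc = Counter()
    for _, _, text in articles:
        words = tokenize(text)
        if term in words:
            assoc.update(w for w in words if w != term and w not in STOPWORDS)
    return assoc


print(mentions_by_outlet(ARTICLES, "duffy"))
print(first_mention(ARTICLES, "niqab"))
print(associated_words(ARTICLES, "niqab").most_common(3))
```

Plotting `mentions_by_outlet` per week, rather than in total, would show when and where the volume rose.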

As Bianco points out, “tools may track and compile data around these questions, visualize and configure it through interactive interfaces and porous databases, but what then? What do we do with the data?” And perhaps more importantly, how is that data being compiled? What must I consider when collecting, curating, and filtering that data for analysis and public release? If, as Bianco reports, 87% of Wikipedia editors are male, is that reflected in the data Wikipedia publicizes? The New York Times reported on these dynamics in 2011, and in February 2015, the Association for the Advancement of Artificial Intelligence published a study analysing gender dynamics across Wikipedia’s pages in six languages that found some pretty stark results: “men are indeed significantly more central in all language editions… except in the Spanish one where men and women are equally central” (5). Jenny Kleeman fleshes out the implications of these sorts of findings in The New Statesman, although she explores some points quite problematically. I don't have the space to address those problems appropriately here, although I wish I did. However, her summary of how Hedy Lamarr’s oft-unknown but very important contribution to wireless communications technology figures into Wikipedia’s editing history complements the AAAI’s findings by exploring the cultural significance of the numbers that digital humanities can produce.


Morgan Currie’s article on the Feminism entry in Wikipedia is a great example of what data visualization can tell us about “a collaborative process, continually modifying within a digital environment” (224-5). For Currie, this collaborative editorial process forces an awareness of how the dynamics of “[a]n article can evolve precisely from conflict between editors who disagree, and [how] it may achieve its current state only after years of development” (225). But the collaborative editorial process in Wikipedia is just another specific framework for actor-network considerations, which Currie emphasizes by framing Wikipedia edits as controversies, which Tommaso Venturini defines as actor-network events (Venturini qtd. in Currie 228-9). This idea of controversy, or even the basic act of editing on Wikipedia, is another crystallization of Bennett’s “thing-power,” or the ability of numerous objects operating in various ways to determine a situation. Currie does a good job of summarizing how thing-power works in Wikipedia; on the most basic level, she argues, “a researcher examines not only discussions between editors, but also looks at how wiki software, bots, admin hierarchy, and the wider architecture and mechanical nature of the Web affect disputes” (230), and she elaborates on this throughout the section [230-].

Analysing a Wikipedia page, in other words, requires toggling between different considerations. Wikipedia’s general actions can speak louder than the words hosted on its site, or even the editing of those words, as when Wikipedia attempted to ban editors during a GamerGate “editing war” across its pages. When an online encyclopedia assertively and unapologetically takes a stance in a cultural struggle over who holds power and voice in a community by actually silencing certain voices, it actively disenfranchises those voices from potential power, and begins mediating the event it is supposed to be recording.

To ignore Wikipedia’s explicit gestures in favour of subtler and softer (if not impressive) numbers would be problematically reductive and simplistic. And even Currie’s methodical visualizations run into one basic problem: if Currie’s analyses track editing activity, what happens when a whole group of perspectives is precluded from editing activity to begin with? Here is where I think Ian Bogost’s theories of the unit operation are quite helpful; Bogost proposes a similar “toggling” in the study of objects: one can see objects in terms of unit operations, or system operations. For Bogost, “[t]he difference between systems of units and systems as such is that the former derive meaning from the interrelations of their components, whereas the latter regulate meaning for their constituents” (4). System operations explain dynamics discursively, whereas unit operations are conceived as actors in networks and are consequently more aligned with Bennett’s concession of a thing’s agency. More interestingly, for Bogost, a system can be a unit operation itself: “[o]ften, systems become units in other systems” (5), especially if my analysis requires toggling between various “magnitudes of ‘unit’” (Bennett 227). So if a system can be considered as another unit acting within another larger system, Wikipedia’s censorship is no longer a blind spot in Currie’s data, but can actually become a different set of data to consider alongside Wikipedia’s editing activity.

So one visualization may track editing histories, while another may track major hashtags or media buzzwords that aren’t on Wikipedia, or group Wikipedia articles by gender (of topic, author, etc.) and see which get higher word counts, higher-traffic categorizations, or nicer word associations, as a team of German and Swiss scholars began considering. Yet another might track which projects are declined and which are funded by the Wikimedia Foundation.
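Grouping articles by gender and comparing a metric is, computationally, a simple group-and-average step. The Python sketch below assumes a hypothetical `Article` record with a researcher-assigned gender tag, a word count, and monthly page views; every value is a placeholder, not Wikipedia data, and the hard (and political) part, deciding how each article gets tagged, happens before this code runs.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Article(NamedTuple):
    """Hypothetical record: the gender tag is assigned by the researcher."""
    title: str
    gender: str
    words: int
    views: int


# Placeholder values for illustration only, not real Wikipedia measurements.
ARTICLES = [
    Article("Article 1", "woman", 1200, 300),
    Article("Article 2", "man", 3400, 900),
    Article("Article 3", "man", 2600, 500),
    Article("Article 4", "woman", 1800, 700),
]


def mean_by_group(records, field):
    """Group records by gender tag and average one numeric field (e.g. 'words') per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record.gender].append(getattr(record, field))
    return {tag: mean(values) for tag, values in groups.items()}


print(mean_by_group(ARTICLES, "words"))  # mean word count per group
print(mean_by_group(ARTICLES, "views"))  # mean page views per group
```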

Before I begin to filter my data, I always ask how that data has already been filtered, or even what data no longer exists. Like Bianco’s, my methods are “against the wielding of computation and code as instrumental, socially neutral or benevolent, and theoretically and politically transparent” (n. pag.), because the data I choose can be highly politicized. According to Sam-Chin Li, a government information librarian at U of T, it can be “altered or deleted without notice” and with little transparency. In fact, Anne Kingston, who interviewed Li as part of a “months-long Maclean’s investigation,” concludes that “the government’s ‘austerity’ program… as well as its arbitrary changes to policy, when it comes to data […] has led to a systematic erosion of government records far deeper than most realize, with the data and data-gathering capability we do have severely compromised as a result.” So when I collect data, I remain skeptical of it.

So my project, for example, might use the informative, interactive, and contextual capacities of data visualization to plot the economic relations between specific corporations (e.g., goods contracts) or even nations, but it must always remain aware that it often gets this information from press releases, or from legislation that guarantees information is public rather than information that is made transparently accessible. Given my interest in Disney and Netflix, I might begin by charting different companies' contracts with Netflix, weighting them in the visualization’s code by dollar amount, duration, or market saturation: how many movies and shows are being watched by how many people, and for how long? Does this correlate at all with contract moves and bids, and what is the end result of Netflix's catalogue choices? I can plot the films presently available on Netflix, as well as those not on Netflix. I can tag genres, genders, races, time periods, or formats (16mm, digital, 16:9, 1080p, etc.). What is present, and what is absent? Is there a trend in what became absent? Is this perhaps a consequence of a particular production company with particular values moving to a different streaming platform, like, say, Amazon or Hulu, in light of Netflix gaining exclusive streaming rights to Disney works?
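The "what is present, and what is absent" question amounts to comparing two catalogue snapshots taken at different times. Below is a minimal Python sketch under stated assumptions: the snapshots, titles, and tags are hypothetical placeholders rather than real Netflix data, and each title maps to whatever tags a researcher assigns (genre, studio, format, and so on).

```python
from collections import Counter

# Hypothetical catalogue snapshots: title -> tags a researcher might assign
# (genre, studio, format). Titles and tags are placeholders, not real data.
CATALOGUE_2014 = {
    "Title A": {"animation", "studio_x", "1080p"},
    "Title B": {"documentary", "studio_y", "16mm"},
    "Title C": {"animation", "studio_y", "1080p"},
}
CATALOGUE_2015 = {
    "Title A": {"animation", "studio_x", "1080p"},
    "Title D": {"drama", "studio_x", "1080p"},
}


def removed_titles(before, after):
    """Titles present in the earlier snapshot but absent from the later one."""
    return sorted(set(before) - set(after))


def added_titles(before, after):
    """Titles absent from the earlier snapshot but present in the later one."""
    return sorted(set(after) - set(before))


def tags_of_removed(before, after):
    """Count tags across removed titles, to look for a trend in what became absent."""
    return Counter(tag for title in removed_titles(before, after) for tag in before[title])


print(removed_titles(CATALOGUE_2014, CATALOGUE_2015))
print(tags_of_removed(CATALOGUE_2014, CATALOGUE_2015).most_common())
```

A studio tag that dominates the removed titles would be one signal worth checking against contract news, though it would not by itself show why the titles left.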

These kinds of methods and considerations can also allow me to chart the implications of the RCMP, a government institution tasked with enforcing Canada's federal laws, contracting its "image rights management" to Disney from 1995 to 1999. As the Lawrence Journal reported in 1995:
Under the deal, any company that wants to produce Mountie-related souvenirs or other products will have to sign a licensing agreement with Disney Canada. [...] Disney will control the selection of designs, though the Mounties will retain ultimate veto power if they disapprove of a certain image. Each company awarded a license will pay a 10-percent licensing fee that will be split between Disney and the Mounted Police Foundation. Initially, the foundation will get 51 percent of the proceedings, and its share will rise to 55 percent after five years. (Section 9A)  
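The revenue split in that report is simple arithmetic, and working it through shows what the shift from 51 to 55 percent meant in practice. The sketch below assumes, since the report does not say, that the 10-percent licensing fee is charged on a licensee's sales. The $100,000 sales figure is hypothetical, and whole-dollar floor division is a simplification.

```python
# Assumption: the 10-percent fee is levied on a licensee's sales; the 1995
# report does not state the base. Amounts are whole dollars.
LICENSING_FEE_PCT = 10


def split_fee(sales, foundation_pct):
    """Return (total fee, Mounted Police Foundation's share, Disney's share)."""
    fee = sales * LICENSING_FEE_PCT // 100
    foundation = fee * foundation_pct // 100
    return fee, foundation, fee - foundation


# Hypothetical $100,000 in licensed sales.
print(split_fee(100_000, 51))  # (10000, 5100, 4900): initial split
print(split_fee(100_000, 55))  # (10000, 5500, 4500): after five years
```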
As a folklorist, I'm fascinated by the RCMP's interest in protecting its reputation. Blogger Will Pratt lists numerous circulations of the Mountie image, including pornography, a WWF wrestler with a special move called the Mountie Mash, and Labatt's Malcolm the Mountie. A folklorist sees this as a cultural remediation, a reception and then dissemination of a particular narrative, influenced by the teller in either small or large ways (and sometimes both). But a corporation sees this as bad image management. So, how can folkloric relations be plotted given these kinds of cultural and capitalist contexts?

There are some disturbing consequences of this relationship: amid massive budget cuts, Rabble reported in 2012 that the federal government was continuing to spend $11.3 million a year on the RCMP "'brand,' which is currently subject to 89 licensing agreements and memoranda of understanding as well as 22 national and international 'strategic partnership agreements entered into to promote the RCMP's image." In light of the Conservative Party’s austerity cuts, which obscure the very data my work as a researcher and scholar depends on, the decision to spend that much money on RCMP branding is suspicious, to say the least. It's not just taxpayer dollars that are affected, though: if any of you were a student at UBC, Simon Fraser, or the University of Waterloo during or shortly after 2012, you may also have been recruited for RCMP employment under new policies, which entail "literature reviews on terrorism-related topics, stag[ing] workshops on terrorism and security, [and] develop[ing] an internship program for grad students." As Behrens points out in that 2012 Rabble article, over $115,000 was given to pollsters by the RCMP that year alone. And as Behrens also points out, the political rhetoric of those polls is pretty thinly veiled: "They produced a poll asking certain Canadians whether they 'strongly' or 'somewhat' agree or disagree with the statement, 'Muslims share our values.' [...] A poll produced earlier [in 2012] by ACP also found that 52 per cent of Canadians 'mistrust' Muslims." Three years later, Bill C-51 was passed, and Statistics Canada released polls supporting the RCMP and the majority government’s terrorist fear-mongering rhetoric, during the peak of election frenzy.

With such potential ideological factors at work, the data vectors can become pretty complicated. For example, according to the Associated Press, the RCMP was reportedly “receiving no royalties from sales. So an all-volunteer Mounted Police Foundation was formed to negotiate strict licensing contracts” (AP), which eventually established relations with Disney. How can data visualizations account for those complex negotiations of institutionalization and accountability when volunteers’ decisions have such weighty significance (especially for those sued for copyright/image infringement)?

So I have no real concrete answers to a lot of the questions I presented in class, and to the preliminary questions I glossed over in this post, but hopefully presenting various methods and visualizations was enough because it’s all I have. That’s partially because there are limitations to how I can collect data and archives, on numerous levels: for example, the RCMP Foundation's official statement on their relationship with Disney has been pulled from the web, and I can’t find any digital archives as of yet. So I might have to hope that information was released and preserved somewhere, identify where it is, and then review that literature once it has been physically found and read.

Tracing data surrounding Disney and the RCMP’s Mountie will probably require official Access to Information requests, given that archival information is limited at best. A good place to start for data, though, is the RCMP’s annual Report on Plans and Priorities, which is, for now at least, publicly available. Cross-referencing that with publicly released spending data might produce preliminary visualizations, but I think the bulk of the data my project requires is the kind that journalists pry from reluctant bureaucratic hands.


1. I use the term “thing” in the same way Jane Bennett does, conceding all objects’ “‘vitality,’ […] mean[ing] the capacity of things – edibles, commodities, storms, metals – not only to impede or block the will and designs of humans but also to act as quasi agents or forces with trajectories, propensities, or tendencies of their own” (vii).  In other words, my research addresses all “things” as capable of exerting power within a network, regardless of how we originally perceive the power of a pen and ink (like the pen and ink which sign words into legislation), or a hand (like the ones raised during a parliamentary vote, or the hands ignored when raised because of their colour) to determine a social relation as much as the humanities’ established interest in the subject.

Wednesday, 7 October 2015

Indexes, graphs, maps, and trees: data visualization in the digital humanities


N.B.: This post will serve mostly as context for my interests in data visualization, mapping, and network analysis, for the purpose of the University of Alberta's Digital Humanities course.

Folklorists often focus on narrative: Russian Structuralist Vladimir Propp famously claimed there are 31 functions of a fairy tale. He argues that “functions of characters serve as stable, constant elements in a tale, independent of how and by whom they are fulfilled. They constitute the fundamental components of a tale” (21). My research argues that the advent of relatively new media for narrative transmissions (television, internet, etc.) and their cultural consequences (franchises, fandoms, etc.) have radically affected the ways North American (and to some extent, European/Eurocentric) culture produces, consumes, and receives fairy tales. So analysing a folktale with Propp’s 31 functions may tell me part of the story, but not all of it.

Context/Once Upon A Time

To fully understand the stories fairy tales offer, though, it’s important to understand how fairy tales have been conceptualized in the past, and how I propose fairy tales should be considered in contemporary contexts. Antti Aarne (1910), Stith Thompson (1928, 1961), and later Hans-Jörg Uther (2004) adapted a much more detailed and complicated system to classify folklore than Propp’s 31 functions. The Aarne-Thompson-Uther index (ATU) organizes folklore by categorizing each tale with a number and cross-referencing it with a letter. It’s a bit long (so feel free to skip down at any point), but Uther explains the lettering system in “Classifying Tales: Remarks to Indexes and Systems of Ordering”:

     The letters:
A. Mythological Motifs
B. Animals
C. Tabu
D. Magic
E. The Death
F. Marvels
G. Ogres
H. Tests
J. The Wise and the Foolish
K. Deceptions
L. Reversal of Fortune
M. Ordaining the Future
N. Chance and Fate
P. Society
Q. Rewards and Punishments
R. Captives and Fugitives
S. Unnatural Cruelty
T. Sex
U. The Nature of Life
V. Religion
W. Traits of Character
X. Humor
Z. Miscellaneous Groups of Motifs
According to content, further subdivisions are made, for instance[:]
group M is subdivided as follows:
Ordaining the Future: Judgments and Decrees (Mot. M 0 - M 99)
Vows and Oaths (M 100 - M 199)
Bargains and Promises (M 200 - M 299)
Prophecies (M 300 - M 399)
Curses (M 400 - M 499)

Tormod Kinnes has helpfully and publicly uploaded the very long list detailing the ATU index’s numerical system:

     The numbers:
ANIMAL TALES
  Wild Animals 1-99
     The Clever Fox (Other Animal) 1-69
     Other Wild Animals 70-99
  Wild Animals and Domestic Animals 100-149
  Wild Animals and Humans 150-199
  Domestic Animals 200-219
  Other Animals and Objects 220-299
TALES OF MAGIC
  Supernatural Adversaries 300-399
  Supernatural or Enchanted Wife (Husband) or Other Relative 400-459
     Wife 400-424
     Husband 425-449
     Brother or Sister 450-459
  Supernatural Tasks 460-499
  Supernatural Helpers 500-559
  Magic Objects 560-649
  Supernatural Power or Knowledge 650-699
  Other Tales of the Supernatural 700-749
RELIGIOUS TALES
  God Rewards and Punishes 750-779
  The Truth Comes to Light 780-799
  Heaven 800-809
  The Devil 810-826
  Other Religious Tales 827-849
REALISTIC TALES (NOVELLE)
  The Man Marries the Princess 850-869
  The Woman Marries the Prince 870-879
  Proofs of Fidelity and Innocence 880-899
  The Obstinate Wife Learns to Obey 900-909
  Good Precepts 910-919
  Clever Acts and Words 920-929
  Tales of Fate 930-949
  Robbers and Murderers 950-969
  Other Realistic Tales 970-999
TALES OF THE STUPID OGRE (GIANT, DEVIL)
  Labor Contract 1000-1029
  Partnership between Man and Ogre 1030-1059
  Contest between Man and Ogre 1060-1114
  Man Kills (Injures) Ogre 1115-1144
  Ogre Frightened by Man 1145-1154
  Man Outwits the Devil 1155-1169
  Souls Saved from the Devil 1170-1199
ANECDOTES AND JOKES
  Stories about a Fool 1200-1349
  Stories about Married Couples 1350-1439
     The Foolish Wife and Her Husband 1380-1404
     The Foolish Husband and His Wife 1405-1429
     The Foolish Couple 1430-1439
  Stories about a Woman 1440-1524
     Looking for a Wife 1450-1474
     Jokes about Old Maids 1475-1499
     Other Stories about Women 1500-1524
  Stories about a Man 1525-1724
     The Clever Man 1525-1639
     Lucky Accidents 1640-1674
     The Stupid Man 1675-1724
  Jokes about Clergymen and Religious Figures 1725-1849
     The Clergyman is Tricked 1725-1774
     Clergyman and Sexton 1775-1799
     Other Jokes about Religious Figures 1800-1849
  Anecdotes about Other Groups of People 1850-1874
  Tall Tales 1875-1999
FORMULA TALES
  Cumulative Tales 2000-2100
     Chains Based on Numbers, Objects, Animals, or Names 2000-2020
     Chains Involving Death 2021-2024
     Chains Involving Eating 2025-2028
     Chains Involving Other Events 2029-2075
  Catch Tales 2200-2299
  Other Formula Tales 2300-2399
If you’re looking for a public database of tales organized by ATU type, check out D.L. Ashliman’s Folktexts: A library of folktales, folklore, fairy tales, and mythology.
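Because each top-level ATU group occupies a contiguous numeric range, classifying a tale type is a sorted-range lookup. The Python sketch below transcribes only the top-level ranges from the list above. It strips letter suffixes such as the "C" in 425C (Beauty and the Beast's tale type; Rumpelstiltskin is ATU 500) and does not check gaps inside a group, such as 2101-2199. The function name `atu_group` is illustrative, not part of any existing tool.

```python
import re
from bisect import bisect_right

# Top-level ATU groups with the first tale-type number of each, transcribed
# from the numerical list above. Gaps inside a group are not checked.
ATU_GROUPS = [
    (1, "Animal Tales"),
    (300, "Tales of Magic"),
    (750, "Religious Tales"),
    (850, "Realistic Tales (Novelle)"),
    (1000, "Tales of the Stupid Ogre (Giant, Devil)"),
    (1200, "Anecdotes and Jokes"),
    (2000, "Formula Tales"),
]
ATU_MAX = 2399
_STARTS = [start for start, _ in ATU_GROUPS]


def atu_group(tale_type):
    """Map a tale-type label such as '500' or '425C' to its top-level ATU group."""
    match = re.match(r"\d+", tale_type)
    if match is None:
        raise ValueError(f"not an ATU tale-type label: {tale_type!r}")
    number = int(match.group())
    if not 1 <= number <= ATU_MAX:
        raise ValueError(f"{number} is outside the ATU range 1-{ATU_MAX}")
    # bisect_right finds the last group whose starting number is <= number.
    return ATU_GROUPS[bisect_right(_STARTS, number) - 1][1]


print(atu_group("500"))   # Tales of Magic (Rumpelstiltskin)
print(atu_group("425C"))  # Tales of Magic (Beauty and the Beast)
```

Extending the table with the indented subgroups (e.g., Supernatural Helpers 500-559) would follow the same pattern.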

The problem with these tale-types, though, is that they don’t quite address the narrative complexities of mainstream folklore, like the fairy tales seen, told, and disseminated in ABC’s Once Upon A Time. So my dissertation is interested, among other things, in how ABC, which has been owned by Disney since 1996, has begun to bring various fairy tales previously unassociated with Disney’s World within its purview. Thus the Beast from Disney’s iconic Beauty and the Beast (1991) is both Rumpelstiltskin and Belle’s lover (fig. 1). Or there’s Lana Parrilla, whose role as Regina Mills, the mayor of Storybrooke (the show’s contemporary real-world setting), also means playing the Evil Queen from Snow White (1937) (fig. 2) and Ursula from The Little Mermaid (1989) (fig. 3).
 
Fig. 1
Belle (Emilie de Ravin) and Rumpelstiltskin (Robert Carlyle), Once Upon A Time
Fig. 2
Regina Mills/Snow White's Evil Queen (Lana Parrilla), Once Upon A Time

 Fig. 3
Evil Queen/Ursula (Lana Parrilla), Once Upon A Time

Here I will be borrowing from work I did during my M.A., but I promise it will only be a branching point for further research and discussion, because I think my point comes across with some pretty simple data visualizations (fig. 4).

 Fig. 4

This first visualization, for example, is pretty rudimentary (and aesthetically ugly) but it communicates the gist of my dissertation (embarrassingly enough): ABC’s narrative manipulations are conglomerating Western folklore. The next visualization is aesthetically more sophisticated (fig. 5), but conceptually, it doesn’t communicate too much more, although it does make explicit a few things (namely, that ABC operates as the intermediary between Disney’s fairy tale, Beauty and the Beast, and the more publicly/freely consumed narrative, “Rumpelstiltskin,” and also becomes the force that encompasses what was once a freely consumed narrative into a packaged Disney product ready for en-franchising).

 Fig. 5

In the following graphs (figs. 6-8), I tried to make clear the importance of a network-based analysis, totally indebted to Franco Moretti’s “Network Theory, Plot Analysis.”

 Fig. 6

 Fig. 7

Fig. 8
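The intermediary role described for fig. 5 can also be tested computationally, in the spirit of Moretti’s network analysis (he removes characters from his Hamlet network to see what falls apart): remove a node and check whether the folktale and the Disney films remain connected. The Python sketch below uses a hand-coded, hypothetical edge list drawn only from the examples in this post (figs. 1-3), linking each narrative to a character role it contains. It is an illustration, not a complete dataset.

```python
from collections import defaultdict, deque

# Hand-coded illustration from the examples in this post: each edge links a
# narrative to a character role it contains. Not a complete dataset.
EDGES = [
    ("Rumpelstiltskin (folktale)", "Rumpelstiltskin"),
    ("Beauty and the Beast (1991)", "Belle"),
    ("Beauty and the Beast (1991)", "Beast"),
    ("Snow White (1937)", "Evil Queen"),
    ("The Little Mermaid (1989)", "Ursula"),
    ("Once Upon A Time", "Rumpelstiltskin"),
    ("Once Upon A Time", "Belle"),
    ("Once Upon A Time", "Beast"),
    ("Once Upon A Time", "Evil Queen"),
    ("Once Upon A Time", "Ursula"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph, start, removed=frozenset()):
    """Breadth-first search: every node reachable from `start`, skipping `removed` nodes."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen and nxt not in removed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def degree(graph):
    """Number of connections per node (degree centrality, unnormalized)."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


graph = build_graph(EDGES)
folktale = "Rumpelstiltskin (folktale)"
print("Beauty and the Beast (1991)" in reachable(graph, folktale))
print("Beauty and the Beast (1991)" in reachable(graph, folktale, {"Once Upon A Time"}))
```

In this toy network the folktale reaches Disney’s film only through Once Upon A Time, which also has the highest degree.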


What I think these graphs make clear is the overlap of corporate narratives with folkloric ones. Purnima Bose and Laura E. Lyons argue that
companies are always engaged in a kind of storytelling aimed at improving their public image and justifying their actions. Corporations and their CEOs are in the position of Scheherazade. As long as they have a story to tell that is at least captivating enough they can keep themselves alive for one more day. These stories play a role in suturing or resolving contradictions and in rationalizing seemingly arbitrary and brutal decisions. But there is increasing demand that these narratives be reliable and have some mimetic accuracy. Within the complex web of social, political, and economic relationships that constitute ‘the world of business,’ some stories are getting harder to sell. (3, emphasis mine)
Yet Disney’s stories are, in many ways, becoming easier to sell, with Netflix already beginning to offer exclusive streaming of new Disney releases as well as Disney classics from the Vault. Disney is also negotiating legal conceptualizations of public narratives amid the rise of corporate copyright legislation like Canada’s Copyright Modernization Act (2011), the U.S.A.’s Copyright Term Extension Act (1998), the U.K.’s Copyright and Rights in Performances Regulations (2014), the European Copyright Directive (2014), and reactionary court cases like Belgium’s Deckmyn v. Vandersteen.

Thus my dissertation is interested in the stories companies and conglomerates tell, in line with Bose and Lyons’ own “interest[…] in the stories corporations tell about themselves and the ways that they weave corporate history into the larger narratives of communities and nations as a means of consolidating and justifying their practices” (3). But most importantly, I’m interested, as Bose and Lyons phrase it, in ensuring that those narratives are reliable, that they hold “mimetic accuracy” (3). In many ways, that is at the heart of my dissertation’s agenda: to detail the nature of Disney’s economic “storytelling,” and just what that means for folklore, which is largely regarded as belonging to a public collective, part of a creative or digital commons (Morell), similar to Britain’s economic Commons and its legal definition of common land. And finally, the reason I’m even presenting these findings for a Digital Humanities course, and here on this blog with graphs I’ve previously worked on, is that I believe data visualization can clarify those stories and how they are told.