Thursday 15 October 2015

Visualizing data's possibilities and limitations



N.B.: Following my presentation (and in keeping with public, accessible scholarship, to outline some of my dissertation’s foci), I thought I would elaborate on the possibilities of data visualization, including some of the methodological and ethical concerns it prompts through basic research practices, like deciding how to collect or filter the data I encounter.

I don't know if my presentation made this clear, but to make my position explicit (and for the benefit of any potential web readers), I want to borrow from our course reading by Jamie “Skye” Bianco in orienting my research and methods, because Bianco’s self-description succinctly captures the aims of this project: “Let me be clear: I am a digital/multimodal compositionist, a digital media practitioner, a feminist, and a critical media theorist whose ethics lie in progressive affiliation and collaborative social justice.” Sometimes the questions and minutiae that I raise during my research seem completely unnecessary, but one thing I’ve realized about my digital methods is that they democratize data, allowing very large and very small “things”1 to disclose information on equal terms. Although at times these large and small objects of analysis do not disclose information on equal terms, they are always approached with the potential to speak on equal terms, and analysed accordingly. Thus I proposed in class that a more interactive version of Maggie Lee’s or Helen Jackson’s data visualizations could have significant political ramifications: consider the implications of a visualization charting the correspondence made public during the Mike Duffy court case and scandal (or perhaps made public through a freedom-of-information request). That could, hypothetically, provide a trail of legal culpability. Understanding how networks operate isn’t just the focus of actor-network theorists, but also of network security analysts and detectives, who understand the trails emails leave behind and how they are tracked. Digital humanities, then, can work towards Bianco’s “collaborative social justice” by providing, among other things, interactive visualizations that report situations and explore their circumstances (some of which occur in courtrooms, in civil and criminal investigations that reflect a more literal interest in Bianco’s “collaborative social justice”).
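To make the network idea concrete: a correspondence visualization ultimately reduces to counting who writes to whom, and how often. Here is a minimal sketch in Python using an entirely invented correspondence log; every name below is a hypothetical placeholder, not an actor from the actual Duffy filings.

```python
from collections import Counter

# Hypothetical correspondence log: (sender, recipient) pairs standing in for
# emails made public through court filings or access-to-information requests.
# All names are invented for illustration.
emails = [
    ("aide", "senator"),
    ("senator", "aide"),
    ("aide", "lawyer"),
    ("aide", "senator"),
]

# Weight each directed edge by how often that pair corresponds.
edge_weights = Counter(emails)

# A crude centrality measure: total messages sent or received per actor.
degree = Counter()
for sender, recipient in emails:
    degree[sender] += 1
    degree[recipient] += 1
```

The `edge_weights` and `degree` tallies are exactly what an interactive network diagram would render as line thickness and node size.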

For another brief example, consider Evan Solomon's piece on the “dead cat on the 2015 campaign trail” in Canada's thankfully finished election. The dead cat in this case, credited with changing the electoral tide this year, is the niqab and the racist rhetoric emerging from Canada’s Zero Tolerance for Barbaric Cultural Practices Act (a law that is, unfortunately, as barbaric as its title implies). An adaptation of Mike Bostock, Shan Carter, and Matthew Ericson’s word clouds for the 2012 American national conventions might produce interesting results if word clouds traced buzzwords throughout the 2015 Canadian election. What if there were a visualization that tracked the media’s mentions of Mike Duffy and compared them to uses of the word niqab? Which words are most closely associated with Duffy, and which with the niqab? Who mentions which words, and who mentions them most often? Or perhaps more tellingly, who began mentioning which words first, and when and where did the volume rise? Did polls reflect these numbers in any way? This might be more useful when comparing two more closely aligned topics, like how different CEOs are treated (Trump vs. Fiorina in the current Republican primaries, for example).
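That comparison boils down to counting keyword mentions over time. A minimal sketch under that assumption, using a tiny invented corpus (the headlines and dates below are placeholders, not real coverage):

```python
from collections import Counter
from datetime import date

# Hypothetical mini-corpus of campaign coverage; headlines are invented.
# Real data would come from a scraped archive of 2015 election reporting.
articles = [
    (date(2015, 8, 12), "Mike Duffy trial resumes as the campaign begins"),
    (date(2015, 9, 24), "Niqab debate dominates the French-language leaders' debate"),
    (date(2015, 10, 5), "Niqab ruling appealed as Duffy fades from the headlines"),
]

def monthly_mentions(corpus, keyword):
    """Count articles mentioning a keyword, bucketed by (year, month)."""
    counts = Counter()
    for pub_date, text in corpus:
        if keyword.lower() in text.lower():
            counts[(pub_date.year, pub_date.month)] += 1
    return counts

duffy_counts = monthly_mentions(articles, "Duffy")
niqab_counts = monthly_mentions(articles, "niqab")
```

Plotting the two counters against each other is the time-series version of the word-cloud comparison; adding a speaker field to each record would answer the who-said-it-first questions.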

As Bianco points out, “tools may track and compile data around these questions, visualize and configure it through interactive interfaces and porous databases, but what then? What do we do with the data?” And perhaps more importantly, how is that data being compiled? What must I consider when collecting, curating, and filtering that data for analysis and public release? If, as Bianco reports, 87% of Wikipedia editors are male, is that reflected in the data Wikipedia publicizes? The New York Times reported on these dynamics in 2011, and in February 2015 the Association for the Advancement of Artificial Intelligence released a study analysing gender dynamics across Wikipedia’s pages in six languages, which found some pretty stark data: “men are indeed significantly more central in all language editions… except in the Spanish one where men and women are equally central” (5). Jenny Kleeman fleshes out the implications of these sorts of findings in the New Statesman, although she explores some things quite problematically. I don't have the time to address the problems appropriately, although I wish I did. However, her summary of how Hedy Lamarr’s oft-unknown but very important contribution to wireless communications technology figures into Wikipedia’s editing history complements the AAAI’s findings with an exploration of the cultural significance of the numbers that digital humanities can produce.


Morgan Currie’s article on the Feminism entry in Wikipedia is a great example of what data visualization can tell us about “a collaborative process, continually modifying within a digital environment” (224-5). For Currie, this collaborative editorial process forces an awareness of how “[a]n article can evolve precisely from conflict between editors who disagree, and [how] it may achieve its current state only after years of development” (225). But the collaborative editorial process in Wikipedia is just another specific framework for actor-network considerations, which she emphasizes by framing Wikipedia edits as controversy, which Tommaso Venturini defines as an actor-network event (Venturini qtd. in Currie 228-9). This idea of controversy, or even the basic action of editing on Wikipedia, is just another crystallization of Bennett’s “thing-power,” the ability of numerous objects operating in various ways to determine a situation. Currie summarizes how thing-power works in Wikipedia at the most basic level: “a researcher examines not only discussions between editors, but also looks at how wiki software, bots, admin hierarchy, and the wider architecture and mechanical nature of the Web affect disputes” (230), and she elaborates further throughout the section (230-).
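Currie-style attention to editing activity can start from something as simple as tallying edits and reverts per editor. A minimal sketch under that assumption, with an invented revision log (real histories could be retrieved through the MediaWiki API's revision queries):

```python
from collections import Counter

# Hypothetical slice of one page's revision history as (editor, is_revert)
# pairs; the editor names and flags are invented for illustration.
revisions = [
    ("EditorA", False),
    ("EditorB", True),
    ("EditorA", True),
    ("EditorB", True),
    ("EditorC", False),
]

# Who edits, and how much.
edits_per_editor = Counter(editor for editor, _ in revisions)

# A rough controversy proxy: the share of revisions that revert earlier work.
revert_share = sum(1 for _, is_revert in revisions if is_revert) / len(revisions)
```

A high `revert_share` flags the kind of edit-war "controversy" Venturini describes, though a fuller controversy mapping would also weigh talk-page discussion, bots, and admin interventions.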

Analysing a Wikipedia page, in other words, requires toggling between different considerations. Wikipedia’s actions as an institution speak louder than the words hosted on its site, or even the editing of those words, as when Wikipedia attempted to ban editors during a GamerGate “editing war” across its pages. When an online encyclopedia assertively and unapologetically takes a stance in a cultural struggle over who has power and voice in a community by actually silencing certain voices, it actively disenfranchises those voices and begins mediating the event it is supposed to be recording.

To ignore Wikipedia’s explicit gestures in favour of subtler and softer (if nonetheless impressive) numbers would be problematically reductive. And even Currie’s methodical visualizations run into one basic problem: if Currie’s analyses track editing activity, what happens when a whole group of perspectives is precluded from editing activity to begin with? Here is where I think Ian Bogost’s theory of the unit operation is quite helpful; Bogost proposes a similar “toggling” in the study of objects: one can see objects in terms of unit operations or in terms of system operations. For Bogost, “[t]he difference between systems of units and systems as such is that the former derive meaning from the interrelations of their components, whereas the latter regulate meaning for their constituents” (4). System operations explain dynamics discursively, whereas unit operations are conceived as actors in networks and are consequently more aligned with Bennett’s concession of a thing’s agency. More interestingly, for Bogost, a system can itself be a unit operation: “[o]ften, systems become units in other systems” (5), especially if my analysis requires toggling between various “magnitudes of ‘unit’” (Bennett 227). So if a system can be considered as a unit acting within a larger system, Wikipedia’s censorship is no longer a blind spot in Currie’s data, but can actually become a different set of data to consider alongside Wikipedia’s editing activity.

So one visualization may track editing histories, but another may track major hashtags or media buzzwords that aren’t on Wikipedia, or group Wikipedia articles by gender (of topic, of author, etc.) and see which get higher word counts, higher-traffic categorizations, or nicer word associations, as a team of German and Swiss scholars began considering, or which projects the Wikimedia Foundation declines and which it funds.
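One of those groupings, comparing article lengths across gender tags, can be sketched very simply. The records below are invented stand-ins for the kind of categorization the studies mentioned above perform at scale:

```python
from statistics import mean

# Hypothetical article records; titles, gender tags, and word counts are
# invented placeholders, not measurements of actual Wikipedia pages.
wiki_articles = [
    {"title": "Article A", "subject_gender": "female", "word_count": 900},
    {"title": "Article B", "subject_gender": "male", "word_count": 2400},
    {"title": "Article C", "subject_gender": "female", "word_count": 1100},
    {"title": "Article D", "subject_gender": "male", "word_count": 2000},
]

def mean_length_by_gender(records):
    """Average article length for each subject-gender tag."""
    groups = {}
    for record in records:
        groups.setdefault(record["subject_gender"], []).append(record["word_count"])
    return {gender: mean(lengths) for gender, lengths in groups.items()}
```

The same group-and-aggregate shape would serve for traffic categories or sentiment scores; only the field being averaged changes.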

Before I begin to filter my data, I always ask how that data has already been filtered, or even what data no longer exists. Like Bianco, my methods are “against the wielding of computation and code as instrumental, socially neutral or benevolent, and theoretically and politically transparent” (n. pag.), because the data I choose can be highly politicized. It can, according to Sam-Chin Li, a government information librarian at U of T, be “altered or deleted without notice” and with little transparency. In fact, Anne Kingston, who interviewed Li and conducted a “months-long Maclean’s investigation,” concludes that “the government’s ‘austerity’ program… as well as its arbitrary changes to policy, when it comes to data […] has led to a systematic erosion of government records far deeper than most realize, with the data and data-gathering capability we do have severely compromised as a result.” So when I collect data, I retain skepticism about that data.

So my project, for example, might utilize the informative, interactive, and contextual capacities of data visualization to plot the economic relations between specific corporations (e.g., goods contracts) or even nations, but it must always be aware that it is often getting this information from press releases, or from legislation that guarantees information is public rather than from information that is made transparently accessible. I might begin, in the case of my interest in Disney and Netflix, to chart different companies' contracts with Netflix by weighting their dollar amount, their duration, or their market saturation in the visualization’s code -- how many movies and shows are being watched, by how many people, for how long? Does this correlate at all with contract moves and bids, and what is the end result of Netflix's catalogue choices? I can plot the films presently available on Netflix, as well as those not on Netflix. I can tag genres, genders, races, time periods, or formats (16mm, digital, 16:9, 1080p, etc.). What is present, and what is absent? Is there a trend in what became absent? Is this perhaps a condition of a particular production company with particular values moving to a different streaming platform, like, say, Amazon or Hulu, in light of Netflix gaining exclusive streaming rights to Disney works?
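The presence/absence question can be prototyped before any visualization exists, just by tallying a tagged catalogue. A minimal sketch with an invented catalogue snapshot (titles, studios, availability, and tags are all hypothetical):

```python
# Hypothetical catalogue snapshot; every record here is invented, standing in
# for data a real project would assemble from platform listings over time.
catalogue = [
    {"title": "Film A", "studio": "Disney", "on_netflix": True, "tags": {"animation"}},
    {"title": "Film B", "studio": "Disney", "on_netflix": False, "tags": {"live-action"}},
    {"title": "Film C", "studio": "Other", "on_netflix": True, "tags": {"documentary"}},
]

def presence_by_studio(films):
    """For each studio, tally (titles present, titles absent) on the platform."""
    tallies = {}
    for film in films:
        tally = tallies.setdefault(film["studio"], [0, 0])
        tally[0 if film["on_netflix"] else 1] += 1
    return {studio: tuple(t) for studio, t in tallies.items()}
```

Re-running the tally on snapshots taken at different dates is what would expose a trend in what became absent.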

These kinds of methods and considerations can also allow me to chart the implications of the period from 1995 to 1999 when the RCMP, a government institution tasked with enforcing Canada's federal laws, contracted its "image rights management" to Disney. As the Lawrence Journal reported in 1995:
Under the deal, any company that wants to produce Mountie-related souvenirs or other products will have to sign a licensing agreement with Disney Canada. [...] Disney will control the selection of designs, though the Mounties will retain ultimate veto power if they disapprove of a certain image. Each company awarded a license will pay a 10-percent licensing fee that will be split between Disney and the Mounted Police Foundation. Initially, the foundation will get 51 percent of the proceedings, and its share will rise to 55 percent after five years. (Section 9A)  
As a folklorist, I'm fascinated by the RCMP's interest in protecting its reputation. Blogger Will Pratt lists numerous circulations of the Mountie image: pornography, a WWF wrestler with a special move called the Mountie Mash, and Labatt's Malcolm the Mountie are a few he references. A folklorist sees this as cultural remediation, a reception and then dissemination of a particular narrative, influenced by the teller in small or large ways (and sometimes both). But a corporation sees this as bad image management. So, how can folkloric relations be plotted given these kinds of cultural and capitalist contexts?

There are some disturbing consequences of this relationship; amid massive budget cuts, Rabble reported in 2012 that the federal government was continuing to spend $11.3 million a year on the RCMP "'brand,' which is currently subject to 89 licensing agreements and memoranda of understanding as well as 22 national and international 'strategic partnership agreements entered into to promote the RCMP's image.'" In light of the Conservative Party’s austerity cuts, which obfuscate my job as a researcher/scholar, the decision to spend that much money on RCMP branding is suspicious, to say the least. It's not just taxpayer dollars that are affected, though: if you were a student at UBC, Simon Fraser, or the University of Waterloo during or shortly after 2012, you may also have been recruited for RCMP employment under new policies that entail "literature reviews on terrorism-related topics, stag[ing] workshops on terrorism and security, [and] develop[ing] an internship program for grad students." As Behrens points out in that 2012 Rabble article, over $115,000 was given to pollsters by the RCMP that year alone. And as Behrens also points out, the political rhetoric of those polls is pretty thinly veiled: "They produced a poll asking certain Canadians whether they 'strongly' or 'somewhat' agree or disagree with the statement, 'Muslims share our values.' [...] A poll produced earlier [in 2012] by ACP also found that 52 per cent of Canadians 'mistrust' Muslims." Three years later, Bill C-51 is passed, and Statistics Canada releases polls supporting the RCMP and the majority government’s terrorist fear-mongering rhetoric, at the peak of election frenzy.

With such potential ideological factors at work, the data vectors can become pretty complicated. For example, according to the Associated Press, the RCMP was reportedly “receiving no royalties from sales. So an all-volunteer Mounted Police Foundation was formed to negotiate strict licensing contracts” (AP), which eventually established relations with Disney. How can data visualizations account for those complex negotiations of institutionalization and accountability when volunteers’ decisions have such weighty significance (especially for those sued for copyright/image infringement)?

So I have no real concrete answers to a lot of the questions I presented in class, or to the preliminary questions I glossed over in this post, but hopefully presenting various methods and visualizations was enough, because it’s all I have. That’s partially because there are limitations, on numerous levels, to how I can collect data and archives: for example, the RCMP Foundation's official statement on its relationship with Disney has been pulled from the web, and I can’t find any digital archive of it as of yet. So I might have to hope that the information was released and preserved somewhere, identify where it is, and then review it once it has been physically found.

Tracing data surrounding Disney and the RCMP’s Mountie will probably require official Freedom of Information Act requests, given that archival information is limited at best. A good place to start for data, though, is the RCMP’s annual Report on Plans and Priorities, which are, for now at least, publicly available. Cross-referencing that with publicly released spending data might produce preliminary visualizations, but I think the bulk of data required for my project is the kind that journalists pull from reluctant bureaucratic hands.
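Cross-referencing planned and disclosed spending is, at its simplest, a join on year. A minimal sketch with invented figures (none of these numbers are real RCMP data; they stand in for what the Reports on Plans and Priorities and disclosure releases would supply):

```python
# Hypothetical yearly figures in billions of dollars; all numbers are invented.
planned = {2013: 2.8, 2014: 2.7, 2015: 2.9}    # from planning reports
disclosed = {2014: 2.75, 2015: 2.95}           # from released spending data

# Inner join on year: the gap between disclosed and planned spending.
gaps = {
    year: round(disclosed[year] - planned[year], 2)
    for year in sorted(planned.keys() & disclosed.keys())
}
```

Years present in one source but not the other drop out of the join, and those missing years are themselves data: they mark exactly where journalists would have to pry figures from reluctant bureaucratic hands.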


1. I use the term “thing” in the same way Jane Bennett does, conceding all objects’ “‘vitality,’ […] mean[ing] the capacity of things – edibles, commodities, storms, metals – not only to impede or block the will and designs of humans but also to act as quasi agents or forces with trajectories, propensities, or tendencies of their own” (vii). In other words, my research addresses all “things” as capable of exerting power within a network, regardless of how we originally perceive the power of a pen and ink (like the pen and ink that sign words into legislation) or a hand (like the ones raised during a parliamentary vote, or the hands ignored when raised because of their colour); such things can determine a social relation as much as the subject, the humanities’ more established interest.

3 comments:

  1. It sounds to me like a question any data collector should be asking before the collecting begins: "before I begin to filter my data, I ask how that data has already been filtered." I think you have a really sound working methodology here, and the project itself is very interesting. I especially admire your rigorous "skepticism." You seem to maintain an inward and outward focus that is essential for any research project, but is often directed outward, toward results, rather than inward, toward what made those results possible. As humanists, digital or not, we use tools to think about data. It's important that we know how those tools have been designed for our use, thus shaping our research, as well as how data may have been fingerprinted beforehand.

    Do you think Mike Myers and John Oliver paid the 10% licensing fee?!

    1. Lee, I'm realizing more and more as I create a database just how important some of the smallest data or omissions can be. It's pretty crazy the amount of detailed data that sometimes needs to be accounted for before I'm able to progress onto the next stage of research. I'm constantly evaluating the data and whether that tiny little piece is being addressed appropriately, or if more tiny little pieces need to surround it before setting the data up and analysing it.

      One thing I wanted to get into about databases but couldn't (I should probably just write a post about it) is how one piece of data is often set up/explored in numerous ways, both in terms of a database's structure but also in terms of how it's used for analyses, and what exactly that means for research results, or when conducting research. I think Folsom's article begins addressing the topic (particularly with his interest in the database's possibility to endlessly reorganize Whitman's work), but doesn't fully flesh out the implications of returning to the same piece of data in different ways.

      I have no idea whether Myers and Oliver paid the licensing fee, but they could always argue fair use as well. Oliver's show is, if anything, one of the greatest arguments for intellectual fair use I've come across.

  2. This comment has been removed by the author.
