Shada Sagher

Thursday, 15 October 2015

Visualizing data's possibilities and limitations

N.B.: Pursuant to my presentation (and also to outline some of my dissertation’s foci, in accordance with public and accessible scholarship), I thought I would elaborate on the possibilities of data visualization methods, including some of the methodological and ethical concerns they prompt at the level of basic research practices, like how to collect or filter the data I encounter.

I don't know if my presentation made this clear, but to make explicit my position (and for the benefit of any potential web readers), I want to borrow from our course reading by Jamie “Skye” Bianco in orienting my research and methods, because Bianco’s summary of her own position succinctly captures the aims of this project: “Let me be clear: I am a digital/multimodal compositionist, a digital media practitioner, a feminist, and a critical media theorist whose ethics lie in progressive affiliation and collaborative social justice.” Sometimes the questions and minutiae that I raise during my research seem completely unnecessary, but one thing I’ve realized about my digital methods is that they democratize data, and allow very large and very small “things”¹ to disclose information on equal terms. Although at times these large and small objects of analysis do not disclose information on equal terms, they are always addressed with the potential to speak on equal terms, and analysed accordingly. Thus I proposed in class that a more interactive version of Maggie Lee’s or Helen Jackson’s data visualizations could have very political ramifications: consider the implications of a visualization charting the correspondences made public during the Mike Duffy court case and scandal (or perhaps made public through a Freedom of Information request). That could, hypothetically, provide a trail of legal culpability. Understanding how networks operate isn’t just the focus of actor-network theorists, but also of technological network security analysts and detectives who understand the trails emails leave behind, and how they are tracked. Digital humanities, then, can work towards Bianco’s “collaborative social justice” by providing, among other things, interactive visualizations that report situations and explore their circumstances (some of which occur in court rooms, in civil and criminal investigations that reflect a more literal interest in Bianco's "collaborative social justice").

For another brief example, consider Evan Solomon's piece on the “dead cat on the 2015 campaign trail” in Canada's thankfully finished election. The dead cat controversy in this case, credited with changing the electoral tide this year, is the niqab and the racist rhetoric emerging from Canada’s Zero Tolerance for Barbaric Cultural Practices Act (a law that is, unfortunately, as barbaric as its title implies). An adaptation of Mike Bostock, Shan Carter, and Matthew Ericson’s word clouds for the 2012 American National Conventions might produce interesting results if word clouds traced buzzwords throughout the 2015 Canadian election. What if there were a visualization that tracked the media’s mentions of Mike Duffy, and compared them to the uses of the word niqab? Which words are most closely associated with Duffy, and which with the niqab? Who mentions which words, and who mentions them most often? Or perhaps more tellingly, who began mentioning which words first, and when and where did the volume rise? Did polls reflect these numbers in any way? This might be more useful when considering two more closely aligned topics, like how different CEOs are treated (Trump vs. Fiorina in the current Republican primaries, for example).
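To make those questions a little more concrete, here is a minimal Python sketch of the counting that would sit underneath such a visualization. Everything in it is an assumption for illustration: the records in `MENTIONS` are invented, the outlet names are placeholders, and the helper names (`tokenize`, `counts_by`, `first_mention`) are my own rather than part of any existing tool. A real version would need an actual corpus of media coverage and far more careful text processing.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical records (date, outlet, headline), invented for illustration only.
MENTIONS = [
    (date(2015, 9, 1), "Outlet A", "Duffy trial resumes as the campaign begins"),
    (date(2015, 9, 18), "Outlet B", "Leaders spar over the niqab at the debate"),
    (date(2015, 9, 25), "Outlet A", "Niqab ruling dominates; Duffy fades from view"),
    (date(2015, 10, 2), "Outlet B", "Niqab, niqab, niqab: the word of the week"),
]
TERMS = ("duffy", "niqab")


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [word.strip(".,;:!?\"'()").lower() for word in text.split()]


def counts_by(records, terms, key):
    """Tally tracked terms, grouped by whatever key(day, outlet) returns."""
    tallies = defaultdict(Counter)
    for day, outlet, text in records:
        for word in tokenize(text):
            if word in terms:
                tallies[key(day, outlet)][word] += 1
    return tallies


def first_mention(records, term):
    """Return (date, outlet) of the earliest record containing the term, or None."""
    hits = [(day, outlet) for day, outlet, text in records if term in tokenize(text)]
    return min(hits) if hits else None


# Who mentions which words, and how often?
by_outlet = counts_by(MENTIONS, TERMS, key=lambda day, outlet: outlet)
# When did the volume rise? Group by (ISO year, ISO week).
by_week = counts_by(MENTIONS, TERMS, key=lambda day, outlet: tuple(day.isocalendar())[:2])

if __name__ == "__main__":
    for outlet, tally in sorted(by_outlet.items()):
        print(outlet, dict(tally))
    print("First 'niqab' mention:", first_mention(MENTIONS, "niqab"))
```

Grouping by a `key` function keeps the same tally reusable for "who" and "when" questions; comparing the weekly counts against polling numbers would be a separate step that this sketch does not attempt.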

As Bianco points out, “tools may track and compile data around these questions, visualize and configure it through interactive interfaces and porous databases, but what then? What do we do with the data?” And perhaps more importantly, how is that data being compiled? What must I consider when collecting, curating, and filtering that data for analysis and public release? If, as Bianco reports, 87% of Wikipedia editors are male, is that reflected in the data Wikipedia publicizes? The New York Times reported these dynamics in 2011, and in February 2015, the Association for the Advancement of Artificial Intelligence released a study analysing gender dynamics throughout Wikipedia’s pages in six languages that found some pretty stark data: “men are indeed significantly more central in all language editions… except in the Spanish one where men and women are equally central” (5). Jenny Kleeman fleshes out the implications of these sorts of findings in The New Statesman, although she explores some things quite problematically. I don't have the time to address the problems appropriately, although I wish I did. However, her summary of how Hedy Lamarr’s oft-unknown but very important contribution to wireless communications technology figures into Wikipedia editing history complements the AAAI’s findings with an exploration of the cultural significance of the numbers that digital humanities can produce.

Morgan Currie’s article on the Feminism entry in Wikipedia is a great example of the importance of what data visualization can tell us about “a collaborative process, continually modifying within a digital environment” (224-5). For Currie, this collaborative editorial process forces an awareness of how the dynamics of “[a]n article can evolve precisely from conflict between editors who disagree, and [how] it may achieve its current state only after years of development” (225). But the collaborative editorial process in Wikipedia is just another specific framework for actor-network considerations, which she emphasizes by defining Wikipedia edits as controversy, defined by Tommaso Venturini as an actor-network event (Venturini qtd. in Currie 228-9). This idea of controversy, or even the basic action of editing on Wikipedia, is just another crystallization of Bennett’s “thing-power,” or the ability of numerous objects operating in various ways to determine a situation. In the case of Wikipedia, Currie does a pretty good job of summarizing how thing-power works in Wikipedia: “a researcher examines not only discussions between editors, but also looks at how wiki software, bots, admin hierarchy, and the wider architecture and mechanical nature of the Web affect disputes” (230), she argues on the most basic level (and also provides further elaboration throughout the section [230-]).

Analysing a Wikipedia page, in other words, requires toggling between different considerations. Wikipedia’s general actions speak louder than the words hosted on their site, or even the editing of those words, at least when Wikipedia attempted to ban editors during a GamerGate “editing war” occurring throughout their pages; when an online encyclopedia assertively and unapologetically takes a stance in a cultural struggle over who has the power and voice in a community by actually silencing certain voices, it actively disenfranchises those voices from potential power, and begins mediating the event it is supposed to be recording.

To ignore Wikipedia’s explicit gestures in favour of subtler and softer (if not impressive) numbers would be problematically reductive and simplistic. And even Currie’s methodical visualizations run into one basic problem: if Currie’s analyses track editing activity, what happens when a whole group of perspectives is precluded from editing activity to begin with? Here is where I think Ian Bogost’s theories of the unit operation are quite helpful; Bogost proposes a similar “toggling” in the study of objects: one can see objects in terms of unit operations, or system operations. For Bogost, “[t]he difference between systems of units and systems as such is that the former derive meaning from the interrelations of their components, whereas the latter regulate meaning for their constituents” (4). System operations explain dynamics discursively, whereas unit operations are conceived as actors in networks and are consequently more aligned with Bennett’s concession of a thing’s agency. More interestingly, for Bogost, a system can be a unit operation itself: “[o]ften, systems become units in other systems” (5), especially if my analysis requires toggling between various “magnitudes of ‘unit’” (Bennett 227). So if a system can be considered as another unit acting within another larger system, Wikipedia’s censorship is no longer a blind spot in Currie’s data, but can actually become a different set of data to consider alongside Wikipedia’s editing activity.

So one visualization may track editing histories, but another may track major hashtags or media buzzwords that aren’t on Wikipedia, or group Wikipedia articles based on gender (topic, author, etc.) and see which get higher word counts, higher traffic categorizations, or nicer word associations, as a team of German and Swiss scholars has begun to consider, or track which projects are declined and which are funded by the Wikimedia Foundation.

Before I begin to filter my data, I always ask how that data has already been filtered, or even what data no longer exists. Like Bianco’s, my methods are “against the wielding of computation and code as instrumental, socially neutral or benevolent, and theoretically and politically transparent” (n. pag), because the data I choose can be highly politicized. It can, according to Sam-Chin Li, a government information librarian at U of T, be “altered or deleted without notice” and with little transparency. In fact, Anne Kingston, who interviewed Li and conducted a “months-long Maclean’s investigation,” concludes that “the government’s ‘austerity’ program… as well as its arbitrary changes to policy, when it comes to data […] has led to a systematic erosion of government records far deeper than most realize, with the data and data-gathering capability we do have severely compromised as a result.” So when I collect data, I remain skeptical of it.

So my project, for example, might utilize the informative, interactive, and contextual capacities of data visualization to plot the economic relations between specific corporations (i.e., goods contracts) or even nations, but I must always be aware that this information often comes from press releases, or from legislation that guarantees information is public rather than from information that is made transparently accessible. I might begin, in the case of my interest in Disney and Netflix, to chart different companies' contracts with Netflix by weighting their dollar amount, or their duration, or their market saturation in the visualization’s code: how many movies and shows are being watched by how many people for how long? Does this correlate at all with contract moves and bids, and what is the end result of Netflix's catalogue choices? I can plot the films presently available on Netflix, as well as those not on Netflix. I can tag genres, genders, races, time periods, or formats (16mm, digital, 16:9, 1080P, etc.). What is present, and what is absent? Is there a trend in what became absent? Is this perhaps a condition of a particular production company with particular values moving to a different streaming platform, like, say, Amazon, or Hulu, in light of Netflix gaining exclusive streaming rights to Disney works?
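As a rough illustration of the "what is present, and what is absent?" question, the following Python sketch compares two catalogue snapshots and tallies the tags of whatever disappeared. All of it is hypothetical: the titles, studios, and genres are placeholders I made up, and `became_absent` and `tally_tag` are illustrative helper names, not functions from any Netflix or third-party API. Real catalogue data would have to be collected separately, with the caveats about sourcing described above.

```python
from collections import Counter

# Hypothetical catalogue snapshots (title -> tags), invented for illustration only.
EARLIER = {
    "Film A": {"studio": "Studio X", "genre": "animation"},
    "Film B": {"studio": "Studio X", "genre": "drama"},
    "Film C": {"studio": "Studio Y", "genre": "drama"},
}
LATER = {
    "Film C": {"studio": "Studio Y", "genre": "drama"},
    "Film D": {"studio": "Studio Z", "genre": "documentary"},
}


def became_absent(earlier, later):
    """Titles that appear in the earlier snapshot but not in the later one."""
    return sorted(set(earlier) - set(later))


def tally_tag(titles, catalogue, tag):
    """Count how often each value of one tag occurs among the given titles."""
    return Counter(catalogue[title][tag] for title in titles)


if __name__ == "__main__":
    gone = became_absent(EARLIER, LATER)
    print("No longer available:", gone)
    # Is there a trend in what became absent?
    print("By studio:", dict(tally_tag(gone, EARLIER, "studio")))
    print("By genre:", dict(tally_tag(gone, EARLIER, "genre")))
```

A tally like this only shows that titles from one studio left together; whether that reflects a contract ending, a move to another platform, or something else is exactly the kind of context the numbers cannot supply on their own.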

These kinds of methods and considerations can also allow me to chart the implications of the RCMP, a government institution tasked with enforcing Canada's federal laws, contracting their "image rights management" to Disney from 1995 to 1999. As the Lawrence Journal reported in 1995:

Under the deal, any company that wants to produce Mountie-related souvenirs or other products will have to sign a licensing agreement with Disney Canada. [...] Disney will control the selection of designs, though the Mounties will retain ultimate veto power if they disapprove of a certain image. Each company awarded a license will pay a 10-percent licensing fee that will be split between Disney and the Mounted Police Foundation. Initially, the foundation will get 51 percent of the proceedings, and its share will rise to 55 percent after five years. (Section 9A)

As a folklorist, I'm fascinated by the RCMP's interest in protecting its reputation. Blogger Will Pratt lists numerous circulations of the Mountie image: pornography, a WWF wrestler with a special move called the Mountie Mash, and Labatt's Malcolm the Mountie are a few he references. A folklorist sees this as a cultural remediation, a reception and then dissemination of a particular narrative, influenced by the teller in either small or large ways (and sometimes both). But a corporation sees this as bad image management. So, how can folkloric relations be plotted given these kinds of cultural and capitalist contexts?

There are some disturbing consequences of this relationship; amid massive budget cuts, Rabble reported in 2012 that the federal government was continuing to spend $11.3 million a year on the RCMP "'brand,' which is currently subject to 89 licensing agreements and memoranda of understanding as well as 22 national and international 'strategic partnership agreements entered into to promote the RCMP's image." In light of the Conservative Party’s austerity cuts that mean my job as a researcher/scholar is obfuscated, the decision to spend that much money on RCMP branding is suspicious, to say the least. It's not just taxpayer dollars that are affected, though: if any of you were a student at UBC, Simon Fraser, or University of Waterloo during or shortly after 2012, you also may have been recruited for RCMP employment based on new policies, which entail "literature reviews on terrorism-related topics, stag[ing] workshops on terrorism and security, [and] develop[ing] an internship program for grad students." As Behrens points out in that 2012 Rabble article, over $115,000 was given to pollsters by the RCMP that year alone. And as Behrens also points out, the political rhetoric of those polls is pretty thinly veiled: "They produced a poll asking certain Canadians whether they "strongly" or "somewhat" agree or disagree with the statement, 'Muslims share our values.' [...] A poll produced earlier [in 2012] by ACP also found that 52 per cent of Canadians 'mistrust' Muslims." Three years later, Bill C-51 is passed, and Statistics Canada releases polls supporting the RCMP and the majority government’s terrorist fear-mongering rhetoric, during the peak of election frenzy.

With such potential ideological factors at work, the data vectors can become pretty complicated. For example, according to the Associated Press, the RCMP was reportedly “receiving no royalties from sales. So an all-volunteer Mounted Police Foundation was formed to negotiate strict licensing contracts” (AP), which eventually established relations with Disney. How can data visualizations account for those complex negotiations of institutionalization and accountability when volunteers’ decisions have such weighty significance (especially for those sued for copyright/image infringement)?

So I have no real concrete answers to a lot of the questions I presented in class, and to the preliminary questions I glossed over in this post, but hopefully presenting various methods and visualizations was enough because it’s all I have. That’s partially because there are limitations to how I can collect data and archives, on numerous levels: for example, the RCMP Foundation's official statement on their relationship with Disney has been pulled from the web, and I can’t find any digital archives as of yet. So I might have to hope that information was released and preserved somewhere, identify where it is, and then review that literature once it has been physically found and read.

Tracing data surrounding Disney and the RCMP’s Mountie will probably require official Freedom of Information Act requests, given that archival information is limited at best. A good place to start for data, though, is the RCMP’s annual Report on Plans and Priorities, which is, for now at least, publicly available. Cross-referencing that with publicly released spending data might produce preliminary visualizations, but I think the bulk of data required for my project is the kind that journalists pull from reluctant bureaucratic hands.

1. I use the term “thing” in the same way Jane Bennett does, conceding all objects’ “‘vitality,’ […] mean[ing] the capacity of things – edibles, commodities, storms, metals – not only to impede or block the will and designs of humans but also to act as quasi agents or forces with trajectories, propensities, or tendencies of their own” (vii). In other words, my research addresses all “things” as capable of exerting power within a network, regardless of how we originally perceive the power of a pen and ink (like the pen and ink which sign words into legislation), or a hand (like the ones raised during a parliamentary vote, or the hands ignored when raised because of their colour) to determine a social relation as much as the humanities’ established interest in the subject.

Wednesday, 7 October 2015

Indexes, graphs, maps, and trees: data visualization in the digital humanities

N.B.: This post will serve mostly as context for my interests in data visualization, mapping, and network analysis, for the purpose of the University of Alberta's Digital Humanities course.

Folklorists often focus on narrative: Russian Structuralist Vladimir Propp famously claimed there are 31 functions of a fairy tale. He argues that “functions of characters serve as stable, constant elements in a tale, independent of how and by whom they are fulfilled. They constitute the fundamental components of a tale” (21). My research argues that the advent of relatively new media for narrative transmissions (television, internet, etc.) and their cultural consequences (franchises, fandoms, etc.) have radically affected the ways North American (and to some extent, European/Eurocentric) culture produces, consumes, and receives fairy tales. So analysing a folktale with Propp’s 31 functions may tell me part of the story, but not all of it.

Context/Once Upon A Time

To fully understand the stories fairy tales offer, though, it’s important to understand how fairy tales have been conceptualized in the past, and how I propose fairy tales should be considered in contemporary contexts. Antti Aarne (1910), Stith Thompson (1928, 1961), and later Hans-Jörg Uther (2004) adapted a much more detailed and complicated system to classify folklore than Propp’s 31 functions. The Aarne-Thompson-Uther index (ATU) organizes folklore by categorizing each tale with a number and cross-referencing it with a letter. It’s a bit long (so feel free to skip down at any point), but Uther explains the lettering system in “Classifying Tales: Remarks to Indexes and Systems of Ordering”:

The letters:

A. Mythological Motifs

B. Animals

C. Tabu

D. Magic

E. The Dead

F. Marvels

G. Ogres

H. Tests

J. The Wise and the Foolish

K. Deceptions

L. Reversal of Fortune

M. Ordaining the Future

N. Chance and Fate

P. Society

Q. Rewards and Punishments

R. Captives and Fugitives

S. Unnatural Cruelty

T. Sex

U. The Nature of Life

V. Religion

W. Traits of Character

X. Humor

Z. Miscellaneous Groups of Motifs

According to content, further subdivisions are made, for instance[:] group M is subdivided as follows:

Ordaining the Future: Judgments and Decrees (Mot. M 0 - M 99)

Vows and Oaths (M 100 - M 199)

Bargains and Promises (M 200 - M 299)

Prophecies (M 300 - M 399)

Curses (M 400 - M 499)

Tormod Kinnes has helpfully and publicly uploaded the very long list detailing the ATU index’s numerical system:

The numbers:

ANIMAL TALES

Wild Animals 1-99

The Clever Fox (Other Animal) 1-69

Other Wild Animals 70-99

Wild Animals and Domestic Animals 100-149

Wild Animals and Humans 150-199

Domestic Animals 200-219

Other Animals and Objects 220-299

TALES OF MAGIC

Supernatural Adversaries 300-399

Supernatural or Enchanted Wife (Husband) or Other Relative 400-459

Wife 400-424

Husband 425-449

Brother or Sister 450-459

Supernatural Tasks 460-499

Supernatural Helpers 500-559

Magic Objects 560-649

Supernatural Power or Knowledge 650-699

Other Tales of the Supernatural 700-749

RELIGIOUS TALES

God Rewards and Punishes 750-779

The Truth Comes to Light 780-799

Heaven 800-809

The Devil 810-826

Other Religious Tales 827-849

REALISTIC TALES (NOVELLE)

The Man Marries the Princess 850-869

The Woman Marries the Prince 870-879

Proofs of Fidelity and Innocence 880-899

The Obstinate Wife Learns to Obey 900-909

Good Precepts 910-919

Clever Acts and Words 920-929

Tales of Fate 930-949

Robbers and Murderers 950-969

Other Realistic Tales 970-999

TALES OF THE STUPID OGRE (GIANT, DEVIL)

Labor Contract 1000-1029

Partnership between Man and Ogre 1030-1059

Contest between Man and Ogre 1060-1114

Man Kills (Injures) Ogre 1115-1144

Ogre Frightened by Man 1145-1154

Man Outwits the Devil 1155-1169

Souls Saved from the Devil 1170-1199

ANECDOTES AND JOKES

Stories about a Fool 1200-1349

Stories about Married Couples 1350-1439

The Foolish Wife and Her Husband 1380-1404

The Foolish Husband and His Wife 1405-1429

The Foolish Couple 1430-1439

Stories about a Woman 1440-1524

Looking for a Wife 1450-1474

Jokes about Old Maids 1475-1499
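Because the numerical system is really just a set of nested ranges, it translates quite directly into a lookup table. The sketch below is one possible way to do that in Python, and it rests on a few assumptions: the top-level ranges are inferred from the sub-ranges in the list above, only the Animal Tales and Tales of Magic subgroups are included to keep it short, and the final range stops at 1524 because that is the highest number appearing in the excerpt (the full index continues beyond it). The names `ATU_GROUPS`, `ATU_SUBGROUPS`, `lookup`, and `classify` are my own.

```python
# (low, high, label) ranges copied from the list above.
# Nothing beyond the excerpt is filled in.
ATU_GROUPS = [
    (1, 299, "Animal Tales"),
    (300, 749, "Tales of Magic"),
    (750, 849, "Religious Tales"),
    (850, 999, "Realistic Tales (Novelle)"),
    (1000, 1199, "Tales of the Stupid Ogre (Giant, Devil)"),
    (1200, 1524, "Anecdotes and Jokes"),  # the excerpt ends partway through this group
]

# Partial on purpose: only the first two groups' subdivisions are included.
ATU_SUBGROUPS = [
    (1, 99, "Wild Animals"),
    (100, 149, "Wild Animals and Domestic Animals"),
    (150, 199, "Wild Animals and Humans"),
    (200, 219, "Domestic Animals"),
    (220, 299, "Other Animals and Objects"),
    (300, 399, "Supernatural Adversaries"),
    (400, 459, "Supernatural or Enchanted Wife (Husband) or Other Relative"),
    (460, 499, "Supernatural Tasks"),
    (500, 559, "Supernatural Helpers"),
    (560, 649, "Magic Objects"),
    (650, 699, "Supernatural Power or Knowledge"),
    (700, 749, "Other Tales of the Supernatural"),
]


def lookup(ranges, number):
    """Return the label of the first range containing number, or None."""
    for low, high, label in ranges:
        if low <= number <= high:
            return label
    return None


def classify(number):
    """Return (group, subgroup) for a tale-type number; either may be None."""
    return lookup(ATU_GROUPS, number), lookup(ATU_SUBGROUPS, number)


if __name__ == "__main__":
    # 510 is the number usually given for Cinderella-type tales.
    print(classify(510))
```

A table like this only captures the numbering, not the lettered motif categories, so it tells me where a tale is filed rather than what happens in it.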

Thursday, 24 September 2015

Technology or Archaeology? What am I studying and how should I study it?

As I type away on my keyboard, I feel pretty confident most of us are aware of the effects and affects that various technologies and technological advances have had on our lives; moving from pen to keyboard (N. Katherine Hayles qtd. in Bassett 116), from book to PDF, from newspaper to Facebook news feed, etc. In the short amount of time I’ve experienced this world, it has been concatenated by technological progressions, and my days have been defined by their developments. Hayles argues that the capacities of a simple word processor have radically changed academic research, while other technologies like blogging have created tensions that have yet to be negotiated within institutional practices (e.g., should an academically informal blog with a larger readership be considered above a peer-reviewed article with low or no citation rates?) (“How We Think” 2-5). It affects my studies, allowing me to access databases’ worth of articles without having to take one step, never mind quite a few toward the library and through its shelves. It allows me to watch Netflix on one screen while I search for sources in another.

For Matthew Wilkens, digital practices introduce new advantages to literary studies by allowing easy (i.e., computational) data mining, such as the quantitative visualization which plots the geographical places that are mentioned in American novels published between 1851 and 1875 (“Canons” n. pag, Fig. 14.2). This allows for the interesting proposal of a “revised understanding of American regionalism” (n. pag), although Wilkens rightly admits its hypothetical nature, and the need for further investigation. For Matthew Fuller, our culture’s reliance on digital media necessitates “software studies,” which Bassett summarizes as “an approach capable of exploring digital operations, structures, languages, and their intersections and connections directly” (118). Fuller’s Software Studies/A Lexicon provides a primer, advocating on the most basic level that “[s]oftware structures and makes possible much of the contemporary world. [Software Studies/A Lexicon] proposes an exercise in the prototyping of transversal and critical approaches to such stuff […] and to show […] the conditions of possibility that software establishes” (1-2). I don’t think it was until this morning, when I found an archived post from Vice’s Motherboard, that I realized just how important these kinds of analyses could be.

The stakes of “software studies” literacy are quite high; according to software engineer Brett Thomas, not knowing how to navigate the internet, and not knowing the ramifications of traveling through cyberspace, can result in your porn-viewing habits being released, legally but without your consent. The reason? Because we don’t really understand digital navigation – we can type in an address, but we don’t know and/or fully get what happens when we type that address and press enter. It’s complicated and to be honest, if I were to try to explain it I’d probably muck it all up, but there’s a great comprehensive article about it by Panopticlick. Basically, websites request information from your browser, which freely gives information about you, like your IP address. Even if a porn website promises under its terms and conditions that it does not collect your data, third parties install tracking elements that do (they can range from Google to targeted advertisers).
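For readers who, like me, find it easier to see this than to have it described, here is a small Python sketch of what a third-party tracker might be able to record from a single request for an embedded element. It is an illustration under stated assumptions, not a description of any particular tracker: the request text is invented, the host names use the reserved `.example` domain, the IP address comes from a range set aside for documentation, and `parse_headers` and `tracker_log_entry` are hypothetical helpers. What a real browser sends varies with its settings and the site's referrer policy.

```python
# An invented HTTP request for a tracking pixel embedded in some other page.
RAW_REQUEST = (
    "GET /pixel.gif HTTP/1.1\r\n"
    "Host: tracker.example\r\n"
    "User-Agent: ExampleBrowser/1.0 (ExampleOS)\r\n"
    "Referer: https://site.example/videos/some-page\r\n"
    "Accept-Language: en-CA\r\n"
    "Cookie: id=abc123\r\n"
    "\r\n"
)
CLIENT_IP = "203.0.113.7"  # documentation-only address; seen at the network level


def parse_headers(raw):
    """Turn the header lines of a raw HTTP request into a lowercase-keyed dict."""
    headers = {}
    for line in raw.split("\r\n")[1:]:  # skip the request line
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def tracker_log_entry(raw, client_ip):
    """Collect the details a third party could log without asking the user anything."""
    headers = parse_headers(raw)
    return {
        "ip": client_ip,
        "page_visited": headers.get("referer"),
        "browser": headers.get("user-agent"),
        "language": headers.get("accept-language"),
        "cookie": headers.get("cookie"),
    }


if __name__ == "__main__":
    for field, value in tracker_log_entry(RAW_REQUEST, CLIENT_IP).items():
        print(f"{field}: {value}")
```

The point of the sketch is that the page being viewed can travel in the `Referer` line of a request sent to a different server entirely, which is why a first-party promise not to collect data does not cover what embedded third parties receive.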

Understanding how IP addresses work (or how to get one to look like it’s working somewhere else via VPN) becomes important. But so does, apparently, understanding that “incognito mode” is geared toward our reliance on interface-based digital interactions, and that the Incognito-user’s data is still read and written somewhere, just not within your cache of autofills and hyperlink histories. Knowledge of digital processes, then, is as important as how we use digital media. In fact, when Brian Merchant reports for Motherboard, he repeats no fewer than three times that “incognito modes” in a browser do nothing to stop data tracking (this does not count the two mentions of “private browsing modes”). His story attempts to prime readers in rudimentary software studies literacy, re-assessing how our culture uses digital media, and explaining that digital media is often used in different but no less important ways.

Literary studies might focus on what porn means, or how porn affects culture, or how culture affects porn, etc. Consider The Feminist Porn Book: The Politics of Producing Pleasure (2013); a cursory glance at the table of contents shows that it addresses neither web design nor code, although it focuses on race, gender, queer studies, fat studies, sex education, topics familiar to literary and cultural studies students. Yet vast quantities of porn are available online: according to the Wall Street Journal, 70% of 18-34 year olds have visited a pornographic website within the past month (Wall Street Journal qtd. in Merchant). Even though pornographic data tracking has severe ramifications for queer individuals in jurisdictions where homosexuality is outlawed, queer studies have yet to address the connection between online pornography and the concerns Thomas raises (Thomas believes this data tracking could lead to an Ashley Madison-style data dump), perhaps because that would require a working knowledge of the internet that the average computer-user doesn’t seem to have. Thus I would like to finish by returning to Bassett, who argues for a mode of scholarship which aims to “re-focus the project to re-think, through a newly informed sense of what software is capable of effecting, what actors and what kinds of acts are possible – and perhaps whether it is convergence or its avoidance that might in some way be miraculous” (Bassett 120). What is software capable of effecting, in the case of online porn? What actors and acts are possible? Instead of considering, as The Feminist Porn Book does, how actors of colour fit into the porn industry, or how fat porn stars have carved their own space within popular expressions of sexuality, digital humanities can ask, at the level of code, exactly who is an actor on a webpage – which third party is taking advantage of a porn star’s fame by tracking your habits? which kinds of acts are their codes allowing (are they recording data? which data? are they planting Trojans?)? which kinds of acts are legally possible, and do some need regulation (e.g., revenge porn legislation is finally catching up to the possibilities that the digital world so easily presents)? These are the kinds of questions that aren’t being asked, but as much as certain sexual preferences affect someone’s life, equally important sometimes is how the information about that person’s life is used.

Works Cited

Bassett, Caroline. "Canonicalism and the Computational Turn." Understanding the Digital Humanities. New York: Palgrave Macmillan, 2012. 105-126.

Fuller, Matthew. Introduction. Software Studies \ A Lexicon. Ed. Matthew Fuller. 2008. 1-14.

Hayles, N. Katherine. How We Think: Digital Media and Contemporary Technogenesis. Chicago: University of Chicago Press, 2012.

Wilkens, Matthew. "Canons, Close Reading, and the Evolution of Method." Debates in the Digital Humanities. Minneapolis: University of Minnesota, 2012. <http://dhdebates.gc.cuny.edu/debates/text/17>