Week 4: TEI in the Wild – Dictionary Edition

One TEI-using project close to home is the Dictionary of Old English Web Corpus (DOE), which has its physical headquarters on the 14th floor of Robarts. The DOE is a compilation of all surviving Old English texts, some in more than one copy. Each text has been XML-encoded and complies with TEI guidelines. Essentially, the DOE contains all of the surviving vocabulary of the Old English period (around 600-1150 CE). It is searchable in a variety of ways and is one of the best resources in the study of Old English.

Unfortunately, the website gives very little detail about its encoding strategies other than that they are compatible with the TEI-P5 2007 guidelines. It does not make its code available to others.

The project has also not published anything about its methods or challenges. The DOE’s editor, Antonette diPaolo Healey, wrote an article about the move from “manuscripts to megabytes” and the digital tools used by the DOE, yet she does not go so far as to talk about the code behind it.

However, in looking for information on the DOE, I did come across a short paper on the use of XML to create electronic texts of medieval manuscripts. The author walks through a few examples, and the paper showcases a current use for XML. It also has a great title.

That article can be found in the UTL catalogue: Powell, Kathryn. “XML and Early English Manuscripts: Extensible Medieval Literature.” Literature Compass 1 (2003): 1-5. doi: 10.1111/j.1741-4113.2004.00061.x.

Bibliography

DOE website: http://tapor.library.utoronto.ca.myaccess.library.utoronto.ca/doecorpus/index.html

DOE About Page: http://www.doe.utoronto.ca/pages/pub/web-corpus.html

Healey, Antonette diPaolo. “The Dictionary of Old English: From Manuscripts to Megabytes.” Dictionaries: Journal of the Dictionary Society of North America 23 (2002): 156-179. doi: 10.1353/dic.2002.0009.

Stressing TEI

There are a lot of online scholarly projects that feel like graveyards even when they first launch – at least that’s my experience. The usefulness of some of these sites seems limited, especially in terms of their potential audience. But For Better for Verse is a relatively practical site where all students of poetry can practice scansion. (Maybe my enthusiasm stems from my own difficulties around scansion…) Users of the site can guess where stresses go in a poem – specific to the syllable – and have their guesses checked. As discussed in class, this site uses TEI as a way of tagging something that is not presentational: syllables.

TEI features this site on its “Projects” page, which gives an overview of the project’s use of TEI:

“The several dozen poems in the site are marked up with TEI P5 coding, especially subsection 6 on Verse. We introduced slight modification of the marking of syllable divisions within a word but chiefly followed the TEI protocols.”
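Since the site’s XML is not public, here is only a rough sketch of how a stress pattern recorded in TEI could be used to check a reader’s guess. It leans on the `met` attribute that the TEI Guidelines allow on a verse line (`<l>`). Everything else is my own assumption for illustration: the sample line, the `+`/`-`/`|` notation, and the function names. The TEI namespace is left out to keep the example short. This is not the project’s actual code or encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment. The pattern notation is an assumption:
# '+' = stressed syllable, '-' = unstressed syllable, '|' = foot boundary.
SAMPLE = """
<lg type="couplet">
  <l n="1" met="-+|-+|-+|-+|-+">Shall I compare thee to a summer's day?</l>
</lg>
"""

def expected_stresses(line_element):
    """Return the encoded pattern as a list of '+' / '-' marks, one per syllable."""
    pattern = line_element.get("met", "")
    return [mark for mark in pattern if mark in "+-"]

def check_guess(line_element, guess):
    """Compare a reader's guess with the encoded pattern, syllable by syllable."""
    expected = expected_stresses(line_element)
    if len(guess) != len(expected):
        raise ValueError("guess must have one mark per syllable")
    return [g == e for g, e in zip(guess, expected)]

root = ET.fromstring(SAMPLE)
line = root.find("l")

# A guess that gets the last two syllables the wrong way round.
result = check_guess(line, list("-+-+-+-++-"))
print(result)
```

Each `True` or `False` in the result corresponds to one syllable, which is roughly the kind of per-syllable feedback the site gives its users.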

The actual website of For Better for Verse, however, does not contain any of this information (or even a link to the TEI page), and the XML is not publicly accessible. The site was developed by the University of Virginia Department of English. Contact information is given, so it is possible that the contacts may be able to share the source code with anyone interested in the site’s use of TEI.

Bibliography:

“For Better for Verse.” Text Encoding Initiative. Accessed February 3, 2016. http://www.tei-c.org/Activities/Projects/fo02.xml.

Week 4: TEI in the REEDs

Records of Early English Drama (REED) is an online “international scholarly project” aimed at “establishing for the first time the context from which the drama of Shakespeare and his contemporaries grew” (Records of Early English Drama). It brings together resources (namely transcriptions of historical documents related to the topic) from all over the world, including U of T, and makes them open and accessible from one online location.

The project is fairly transparent about the construction of the resource through a document it makes available called the “Fortune White Paper.” It discusses their TEI-encoded prototype edition in depth and is accessible from the drop-down menu under Online Resources, then Building EREED. It’s quite a lengthy document, but if you scroll down to section 3 (Editorial Work: Technology) it starts to discuss their servers as well as their use of Oxygen in their TEI work. They describe their records as beginning in Microsoft Word and then being converted to TEI-XML. Section 4 details how they chose to work with TEI’s Guidelines for Electronic Text Encoding and Interchange, based on the eXtensible Markup Language. The document goes on to give a detailed breakdown of each decision and method employed in creating the markup for the REED project, and even offers a sample segment of code using a random item.

It doesn’t offer a downloadable XML file from within this document; however, if one returns to the Building EREED page and clicks the Downloadable Script link, a page is made available with two different downloadable TXT files as well as a schema. The first one is described as “A TXT file of a PERL script, which parses REED document text files, converting REED’s markup into TEI-Lite conformant XML, … [which] populates the REED database”, while the second one notes it was prepared as an experiment and is a “TXT file of a drafted PERL script, which queries the REED database and formats the resulting data as Microsoft Word RTF output” (Records of Early English Drama). I get the sense that these are examples of how the database works rather than actual samples of XML from the records made available online, but it does seem fairly in depth overall.

Unlike the Folger Shakespeare example, one cannot download the code directly from the page of the record one is viewing, so the level of transparency is not quite on par, but there is a great deal of information prepared and made accessible about the project online.
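To get a feel for what a script that turns an in-house text format into TEI-style XML might involve, here is a minimal sketch. It is in Python rather than Perl, and it does not reproduce REED’s script or REED’s markup, neither of which I am quoting here. The input format (`@place:`, `@date:`, and asterisks around an expanded word) and the sample record are invented purely for illustration. The only TEI pieces used are the common elements `<div>`, `<head>`, `<p>` and `<hi rend="italic">`.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical plain-text record, invented for this sketch:
#   "@place:" and "@date:" lines carry metadata,
#   other lines are transcription, with *word* marking an expanded abbreviation.
RECORD = """@place: York
@date: 1433
Item payd to the *players* vj d
"""

def record_to_tei(text):
    """Build a small TEI-flavoured <div> from one plain-text record."""
    fields = {}
    body = []
    for line in text.strip().splitlines():
        match = re.match(r"@(\w+):\s*(.+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
        else:
            body.append(line)

    div = ET.Element("div", {"type": "record"})
    head = ET.SubElement(div, "head")
    head.text = f"{fields.get('place', 'unknown')}, {fields.get('date', 'n.d.')}"

    for line in body:
        p = ET.SubElement(div, "p")
        # Splitting on *...* yields plain text at even indexes, marked words at odd ones.
        parts = re.split(r"\*(.+?)\*", line)
        p.text = parts[0]
        for i in range(1, len(parts), 2):
            hi = ET.SubElement(p, "hi", {"rend": "italic"})
            hi.text = parts[i]
            hi.tail = parts[i + 1]
    return div

tei = record_to_tei(RECORD)
xml_string = ET.tostring(tei, encoding="unicode")
print(xml_string)
```

A real conversion would also need a TEI header, namespaces, and validation against the project’s schema, all of which are skipped here.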

Bibliography

Records of Early English Drama. University of Toronto. 2016. Online. February 2016.

TEI in the Wild: Yellow Fever Commission

The U.S. Army Yellow Fever Commission IMLS Digitization Project by the University of Virginia is an online exhibit that explores the work, historical importance and impact of the U.S. Army Yellow Fever Commission. The homepage for the project is available at http://exhibits.hsl.virginia.edu/yellowfever/. The collection consists of thousands of documents, including handwritten and typed notes, newspaper articles, photographs, miscellaneous printed materials and artifacts, which the University has been digitizing.

The project uses XML with TEI attributes to mark up the resources and saves them as TIFF files on CDs. The project has also made digitized, transcribed and marked-up primary materials available online on the website. It provides a rather detailed history of its journey through the digitization process, as well as the mistakes and challenges it faced. The project goes further by sharing the lessons it learned and making recommendations for similar projects. More information on the digitization process is available at http://exhibits.hsl.virginia.edu/yellowfever-new/collection-digitization-report-1999-2004/.

The details of its use of XML are not available, but it appears to be based upon the University of Virginia Library’s TEI Encoding Guidelines, which are available at http://dcs.library.virginia.edu/digital-stewardship-services/tei-encoding-guidelines/.

Week 3: EEBO

When thinking about where I have encountered issues with representation, I thought of my experience using the database Early English Books Online (EEBO). It’s not quite as specific an example as the Beatles and Sgt. Pepper’s, but my experience of this database was probably my first encounter with issues of representing print online, and with how different projects privilege different types of representation. You can find the site here, and should be able to sign in through U of T.

In the third year of my undergraduate degree I was taking a Shakespeare course, and my professor had us do an assignment that asked us to engage with a variety of scholarly resources for a single project. Using EEBO was one of the tasks, and going to see a 1511 edition of Ovid’s Metamorphoses was among the others in our scholarly scavenger hunt (the assignment focused on Shakespeare’s education and the texts he would have read). This was the first time I had ever seen a rare book, and the first time I had consulted a database that replicated early printed text. Going to see the rare book was obviously one of the most incredible experiences of my life and everything sensory about it stayed with me – the colour and feel of the paper, the binding, the woodcuts, the printed Latin I couldn’t read – but when I searched around various texts on EEBO, I naturally felt fairly underwhelmed by comparison. While the content and layout remained, that characteristic aged colour of the paper was gone, and there was only the pure white of a scanned image (see Figure 1).

Figure 1. Screenshot of EEBO.

This makes sense considering EEBO is a database of digital facsimiles rather than a digital preservation project with high-resolution photos of actual rares. However, the juxtaposition was especially jarring after having looked at the real thing. I suddenly became aware that while the representation served a purpose, it was missing a great many aspects of reading an early printed book. It also brought an awareness that the medium of a text could inform a reader’s experience in pretty exceptional ways. What was missing from the digital facsimile helped me appreciate the tactile and visual aspects of early print and the importance of conserving it.

It also gave rise to further thoughts around representation and choice. It’s important to think of digitization as representation because there is a motive behind the act of taking a text and altering it so that it exists in a new state. Digitization both reveals and obscures in the sense that it has an awareness and an opinion about what the end product should say and do, which intentionally guides the reader in subtle (and less subtle) ways that are important to consider when engaging with any text.

Digitization & representation

This week’s question concerning representation brought to mind The Selected Poems of E.J. Pratt: A Hypertext Edition, edited by Sandra Djwa, W.J. Keith, and Zailig Pollock. This text also ties in with Monday’s class discussion on popular perceptions of the Digital Humanities.

To explore this project for yourself, follow this link: http://www.trentu.ca/faculty/pratt/selected/

While I find the design of this project unappealing, including the small window for text and the font choice of Courier New, I appreciate the way the text makes editorial choices visible.

The use of hyperlinks highlights particular sections of the poem. Words and lines associated with a link are highlighted in blue and underlined. This emphasizes the words in ways in which a footnote citation does not. Footnotes, while by no means invisible to the reader, are less obtrusive. In the print version of the text, the font is also all one colour (black). As a result, the editors direct readers’ attention quite obtrusively to specific parts of the poem. This asserts an editorial presence.

Is all of this necessary, though? The intended reader for this text is a scholar. I draw this conclusion because one can only find this text on the Trent website if they already know of its existence, either through searching the site or following a series of hypertext links through the rabbit hole which is the university’s website. When reading Pratt’s “The Titanic,” does a scholar really need to click on a link which brings them to a grainy picture of the vessel’s deck? A Google image search would provide the same result, and more.

I am curious to hear what my Future of the Book colleagues think about a digitization project such as this one.

Week 3: Adventures in Representations

I feel the need to begin this post with a disclaimer. I love ebooks. I don’t mind not having the tactile feeling of a physical book as I read. The tactile feeling of my ereader is enough for me. I have noticed differences in how I read and what I retain when reading ebooks versus physical books, and I have some preferences as to what genres I read in either. Still, I’m generally happy either way.

That being said, I’ve had a very bizarre experience reading manga in print and online. Not all manga, but one specific series. Manga, though I’m sure everyone knows this, are Japanese comics. So for me to even read them at all they must go through a pretty significant change. Not only must they be translated into another language, but Japanese characters (e.g. for sound effects) are often integrated into the art, which forces the translators to choose between leaving them in, often unexplained, or replacing them with Roman letters. I’ve experienced both.

The series I had such a strange time with is Tsubasa: Reservoir Chronicle (hereafter TRC) by the manga artist group Clamp, which was published from 2003 to 2009. As I was reading it, I quickly became impatient with the speed at which the volumes were released. Chapters of the manga were being published serially in Japan and I had to wait months for them to be collected, translated, and delivered into my greedy hands. So, I started reading them in scanlation. Scanlations are fan-made translations, written on top of scans of the Japanese originals and posted on the internet as they are released. They are often dark and smudgy, retaining the physical look of the paper as well as distortion caused by the scanning process. The translations themselves can be awkward.

Despite this, I read buckets of scanlated manga. TRC itself I read and reread multiple times, eventually obtaining image files of the scans and cobbling together my own PDF ebooks.

Eventually, I decided I should read the legitimate, legal, licenced translations. But, when I got the books from the library the experience was disorienting. The pages were too small compared to what I had on my computer – the images weren’t distinct enough, the print was too small. The need to hold the book open wide so that the gutter wouldn’t distort the illustrations annoyed me. The paper itself was too powdery, but it smelled very nice, more gentle than other books, and gave a creamy cast to the illustrations. I’d read other manga in physical form without noticing anything. Somehow, my brain couldn’t handle switching formats with this particular series.

Recently, I started buying TRC in ebook form. That reading experience too is very different. The lines are cleaner and the whites are brighter, which is lovely. And, they come with the licenced translations. Still, one disorienting difference is that the ebooks show the page spreads, both recto and verso. Having read the manga so many times page by page, image file by image file, it’s a strange experience. I am essentially retraining my brain to read this format.

I have read three different representations of this series, and each time it has been like reading a different thing. The scanlations focused on the textual, linguistic content. That was the reason for their existence, after all. The physical and ebook forms showcase the art more. Yet, the gutters of the physical books obscure some of the image content and page transitions, whereas the flatness of the ebooks makes it almost too easy to move from page to page.

In the end, I suppose I will continue to switch between formats. Each one has its benefits and drawbacks, and each supplies a unique and valuable experience.

Shift from Material to Digital

Board games are a popular pastime and lend themselves to socializing among friends and family members. There has been a resurgence in Toronto at places like Snakes and Lattes and their sister location Snakes and Lagers. When we think of the digitization of these types of games, we think of computer or mobile games that transfer all the properties into the digital realm. However, I thought about cases where board games are only partially digitized, and how the digital elements add to or take away from the experience.

A couple of years ago, I purchased a new game of Monopoly with the electronic banking system. Monopoly originated in the United States in 1903 as an educational tool to explain taxation (Wikipedia, 2016). It has evolved into a pastime enjoyed by families, with many versions available, such as Star Wars and Harry Potter. In this new version, you are given a card with preloaded money and all transactions flow through it. I found it difficult to keep track of my money because I was used to seeing it in front of me and had little tricks for how I would spend it. My strategy was to place my $100 and $500 bills under the board out of sight, so that when I was in a pinch I had that extra money to spend. Now, with everything on the card, I could no longer do so. In addition, you could no longer cheat! My brother tried to scam some money as the banker, but was unable to do so because everyone could see the transaction via the machine. I found that this change from a material cash system to an electronic one diminished the experience of the game. I missed the feel of the money, visually keeping count of how much I had, and, yes, even skimming some off the top.

Monopoly Electronic Banking Game © Hasbro Games

The change from a cash system to an electronic one reflects how our society has changed. Rarely do I see people walk around with cash; they rely on their debit and credit cards for everyday transactions. It is harder to keep track of your purchases this way, and it can easily lead to increased debt. It used to be that Monopoly would teach you money skills in the sense that you are given so much and once it’s gone (physically) you know you are bankrupt and need to sell assets to survive. With this change to the cash system, it seems like those lessons are gone and replaced with the consumerism tied to digital money: you receive money electronically, you spend it using a credit card, and all you can see are numbers going up and down, from black to red. You no longer have the physical medium of money to learn the lesson of rational spending; the system becomes hidden, with only a digital counter to hint at its existence.

Citations

Hasbro Games, Monopoly Electronic Banking Game. 2016. http://www.hasbro.com/en-us/product/monopoly-electronic-banking-game:AD4A14AC-5056-900B-10AB-AE9AC7F1AC92

Wikipedia contributors, “Monopoly (game),” Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Monopoly_(game)&oldid=701893421 (accessed January 27, 2016).

 

Cardboard to Circuit Board

Board games are really popular right now (especially in Toronto). At first glance, this tactile, in-person social phenomenon seems to be a rebuke of all things digital, or at least a respite from them. Cardboard over circuit boards. Dice over devices.

But alongside board games are digital apps that recreate these games, ostensibly not changing anything about the game. The rules are the same. The artwork is often exactly the same. It’s just on your iPad – well, kind of…

A photo of the physical version of the game. By Laszlo Molnar. Source: Board Game Geek (https://boardgamegeek.com/image/464308/small-world)

One game to make such a transition is designer Philippe Keyaerts’ Small World (Days of Wonder, 2009), and it’s a game I’ve played in its digital form. This is a game for 2-4 players. It plays like a streamlined Risk, with players competing directly for territories. Here are some of the big differences between the physical and digital versions:

  • There is music in the digital version. This music, to my ears, seems to reinforce the lightheartedness of the game with a lulling, cheery soundtrack.
  • The game has quite a bit of math in it. But the digital version, as you’d expect, handles this automatically. On the one hand, this ensures that the scorekeeping is more accurate. On the other hand, if this is being used as an educational activity for children, then the digital version’s automatic calculations would be a disadvantage compared to the physical version’s need for human brainpower.
  • But the most significant change between the two versions is that the digital version allows AI opponents that a single player can compete against. Unlike the physical version, which requires in-person social interaction between players, the digital version does not. It’s easy to see why that may change the experience of the game. If board games are synonymous with in-person social activities, is any digital port of a board game really a board game?

form and meaning intertwined, or, digital insects, binary graffiti, and google hands

 

Museum of Natural History, Berlin. (http://www.nytimes.com/2015/10/20/science/putting-museums-samples-of-life-on-the-internet.html).

Okay, bear with me on this one. There’s a bunch of wormhole-y tangents that I hope will come together to make some sort of sense. I started this off thinking about attempts to digitize completely physical objects, things that exist undoubtedly in natural form. So, how about nature itself? It turns out that the Berlin Museum of Natural History is ambitiously undertaking a project that uses digital technology to create 3D images of the museum’s entire collection of insects. Erik Olsen describes how these are not simply scanned images of specimen drawers – rather, specimens are placed on “a rotating drum in a lightbox and photographed at many angles with a macro lens,” then, using computer software, the team stitches the photographs together into a composite that can be downloaded and viewed from up to 100 angles (as many as 500 images can be taken of a single angle, and 3,000-5,000 images of a single specimen) (Olsen, 2015). Olsen notes that the team uses compression and an algorithm to load small portions of the resulting massive image at a time. See the result, called ‘ZooSphere’, here: http://www.zoosphere.net/. You can also see a video describing the project here: http://nyti.ms/1kkig5z.

This raised a lot of questions for me. There are numerous degrees of separation/representation going on here – there was the insect living in the wild, then the dead insect physically pinned/preserved in a drawer in the museum, then the digital 3D rendering of the image, posted publicly online – searchable, downloadable, viewable as a panorama. While not a ‘text’ per se (although perhaps one in McKenzie’s broad definition), these insects still speak to Sperberg-McQueen’s assertion that “tools always shape the hand that wields them; technology always shapes the minds that use it. And so as we work more intimately with electronic texts, we will find ourselves doing those things that our electronic texts make easy for us to do” (1991, 34). How is it different for a biologist or researcher to interact with these insect specimens online – ‘three-dimensional’ on a flat, digital screen? What do they ‘make easy for us to do’, or not? Are there aspects that couldn’t be observed in person, interacting with a fragile, precious specimen? Similarly, is there an ‘aura’ to the insect that is lost in digital form? That is untranslatable?

In a strange leap of brainstorming – this question also took me on a couple of separate (but I hope related) tangents.

One is London-based artist Stanza’s ‘The Binary Graffiti Club’, referred to as “a user friendly public participatory spectable [sic] and public engagement event across urban space creating new narratives for the playfull [sic] engagement of the environment, spectacle, performance, politics and art” (Stanza, 2013). The participants, made up of young members of the public, “[encode] the city with messages of binary code” (Stanza, 2013). Check it out here: http://stanza.co.uk/binary_club/index.html.
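To make the idea of a message “in binary code” a little more concrete, here is a minimal Python sketch of one way text can be written as zeros and ones and read back again. I don’t know which encoding, if any, the Club’s tags actually follow; the choice of one 8-bit group per UTF-8 byte, and the function names, are my own assumptions.

```python
def to_binary_tag(message):
    """Encode text as space-separated 8-bit groups, one group per UTF-8 byte."""
    return " ".join(format(byte, "08b") for byte in message.encode("utf-8"))

def from_binary_tag(tag):
    """Decode space-separated 8-bit groups back into text."""
    data = bytes(int(group, 2) for group in tag.split())
    return data.decode("utf-8")

tag = to_binary_tag("Hi")
print(tag)                   # 01001000 01101001
print(from_binary_tag(tag))  # Hi
```

Even a two-letter word becomes sixteen digits, which gives some sense of how much a passer-by has to work to read such a tag by hand.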

People dressed in black and white binary hoodies roam the city, tagging physical objects with messages in binary code. There’s something here, I’m just not sure what, yet. Stanza opened the Frequency Festival of Digital Culture in 2013 in Lincoln, England. The festival co-director noted that, “youths dressed in black hoodies swarmed the historic city streets of Lincoln during Frequency Festival 2013, their backs emblazoned with bold white digits, the zeros and ones. Their ominous presence was marked with a series of binary code graff-tags on official buildings throughout the city; messages of insurrection for a digital cult now active among us or analogue reminders of the digital soup of signals we wade through on a daily basis?  There’s an engaging playfulness and an aesthetic pleasure to Stanza’s work that pays rewards on deeper investigation.  His urban interventions remind us of the invisible occupation of the cyberspace around us and encourages us to ask whose hand manipulates these systems of control.” (Hale, 2013 in Stanza).

Speaking of hands, all of this has also made me think about the flurry of news a few years ago regarding the ghostly figures of Google book scanners’ hands appearing in books. Artist Benjamin Shaykin collected examples of this and other scanning mishaps and published them in a book called “Google Hands” (http://benjaminshaykin.com/Google-Hands). A similar collector, Paul Soulellis, curates ‘Library of the Printed Web’ (http://libraryoftheprintedweb.tumblr.com/), consisting of stuff pulled from the internet and bound into paper books, including a print-on-demand novel by Sean Raspet called “2GFR24SMEZZ2XMCVI5L8X9Y38ZJ2JD 25RZ6KW4ZMAZSLJ0GBH0WNNVRNO7GU 2MBYMNCWYB49QDK1NDO19JONS66QMB 2RCC26DG67D187N9AGRCWK2JIHA7E2 2H1G5TYMNCWYM81O4OJSPX11N5VNJ0 A Novel,” (http://libraryoftheprintedweb.tumblr.com/post/52408927041/raspet-sean-2gfr24smezz2xmcvi5l8x9y38zj2jd) which is “an accumulation of CAPTCHA test results,” which are “designed to verify that a user is human by requiring her to perform a visual recognition task (such as deciphering a distorted string of characters) and input the result into a text field. They thus screen out automated programs or “bots” from exploiting website weaknesses” (http://thehighlights.org/#captcha).

Again, I can’t help but think about the human labour of digitizing Google books (and how, despite attempts to efface it, it still sneaks its way into the final, digital product). Or about how the random results of tests meant to see if a computer user is human are now being compiled by humans, using computers, and printed on demand in physical book form. Or about humans dressed up as physical representations of binary code and tagging the city itself with binary messages that humans, not computers, will process and decipher. Or about 3D digital renditions of insects, created, in part, as a way “of documenting what we are about to lose” (Wheeler in Olsen, 2015). So, what have we lost? What is being reclaimed?

 

Bibliography:

Goldsmith, Kenneth. (2013). The artful accidents of google books. The New Yorker. http://www.newyorker.com/books/page-turner/the-artful-accidents-of-google-books.

Olsen, Erik. (2015). Museum specimens find new life online. The New York Times.  http://www.nytimes.com/2015/10/20/science/putting-museums-samples-of-life-on-the-internet.html.

Olsen, Erik. (2015). Digitizing natural history. The New York Times. http://www.nytimes.com/video/science/100000003978105/digitizing-natural-history.html?smid=pl-share.

Shaykin, Benjamin. (2009). Google hands. http://benjaminshaykin.com/Google-Hands.

Soulellis, Paul. Library of the printed web. http://libraryoftheprintedweb.tumblr.com/.

Sperberg-McQueen, C.M. (1991). Text in the electronic age: Textual study and text encoding, with examples from medieval texts. Literary and Linguistic Computing, 6 (1), 34-46.

Stanza (2013). The binary graffiti club. http://stanza.co.uk/binary_club/index.html.

The highlights. http://thehighlights.org/#captcha.

ZooSphere. http://www.zoosphere.net/.