what it's oops: Google


Sunday, January 16, 2011

Google's Poetry Translation Software

The more my workload increases, the more I find myself dreaming of books, writing, reading, blogging, immersing myself in works of imaginative writing. But there simply is not enough time. Such is the irony of my life these days, but one result is the scarcity of posts here. I don't want to quit blogging, but sometimes I fear it's too difficult to keep it up. Ah well--it's a new year, so I'll keep trying.

***

Desparapluies, one of my brilliant former undergraduate students, works at Google (I think she's still there!), and I was thinking of her and her honors project, of the countless undergraduate and graduate works I've read, and of the vast body of literature out there, including my own modest contributions, that would pose challenges to Google's new Poetry Translation software. Poetry, even the seemingly simplest of it, gives many readers a mental workout, so you need not extrapolate too wildly to see how difficult it remains for artificial intelligence.

But why? Poetic language in almost every language has traditionally involved prosody, figuration, rhetorical devices, rhyme and other sonic devices, allusions and symbolic registers rooted in the language and culture in which it was produced, and the overall, often intricate interplay among all of these elements, in part because poetry arose out of orality, which depends on all of these aspects. And while computers have become increasingly able to perform extraordinarily complex intellectual tasks, including readable, often idiomatic translation of prose, poetry and poetic language present many more, potentially insurmountable, hurdles. Even the idea of paraphrasing poetry, whether in translation or not, can present difficulties; what, for example, is the paraphrase--or, to put it another way, the précis or simple meaning rendered in prose--of Stéphane Mallarmé's famous poem "Ses purs ongles très haut....," a sonnet probably best remembered for its dazzling use of the teleuton "-yx"?

Google software engineer Dmitriy Genzel and his team presented a paper at the Empirical Methods in Natural Language Processing (EMNLP) conference at MIT this past October, focusing on the "purely technical challenges around generating translations with fixed rhyme and meter schemes." Part of the team's debate has centered on the importance of preserving form and meter in translating poetry, and in his blog post Genzel cites Vladimir Nabokov's arguments about the impossibility of maintaining such features, while approvingly noting computer scientist Douglas Hofstadter's arguments on behalf of trying to do so. As anyone who has read my many poetry translations on here or elsewhere knows, I agree wholeheartedly with Hofstadter.

Genzel continues in his Google post:

A Statistical Machine Translation system, like Google Translate, typically performs translations by searching through a multitude of possible translations, guided by a statistical model of accuracy. However, to translate poetry, we not only considered translation accuracy, but meter and rhyming schemes as well. In our paper we describe in more detail how we altered our translation model, but in general we chose to sacrifice a little of the translation’s accuracy to get the poetic form right.
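Genzel's description of trading a little accuracy for the right poetic form can be made concrete with a toy example. What follows is a minimal Python sketch under loudly stated assumptions, not Google's method: by Genzel's own account the team altered the translation model itself, whereas this sketch merely reranks a finished list of candidate lines. The accuracy scores, the crude syllable counter and rhyme proxy, and the names `count_syllables`, `rhyme_key`, `form_score`, `pick_line`, and `form_weight` are all hypothetical, invented for illustration.

```python
import re

# Toy illustration only: the scores, heuristics, and weights below are
# hypothetical and are NOT Google's actual translation model.


def count_syllables(line: str) -> int:
    """Crude English syllable estimate: count runs of vowels in each word."""
    words = re.findall(r"[a-z]+", line.lower())
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)


def rhyme_key(line: str) -> str:
    """Crude rhyme proxy: the last two letters of the line's final word."""
    words = re.findall(r"[a-z]+", line.lower())
    return words[-1][-2:] if words else ""


def form_score(candidate, target_syllables, rhyme_with, form_weight=0.5):
    """Accuracy score minus a weighted penalty for breaking meter or rhyme."""
    accuracy, line = candidate
    meter_penalty = abs(count_syllables(line) - target_syllables)
    rhyme_penalty = 0 if rhyme_key(line) == rhyme_key(rhyme_with) else 1
    return accuracy - form_weight * (meter_penalty + rhyme_penalty)


def pick_line(candidates, target_syllables, rhyme_with, form_weight=0.5):
    """Return the candidate line with the best combined score."""
    best = max(
        candidates,
        key=lambda c: form_score(c, target_syllables, rhyme_with, form_weight),
    )
    return best[1]


# Hypothetical candidates as (accuracy score, English line); higher = more faithful.
candidates = [
    (-0.5, "a dog was barking near"),   # more faithful, but does not rhyme
    (-0.8, "a dog bayed in the park"),  # a little less faithful, but rhymes
]
previous_line = "the night was cold and dark"

print(pick_line(candidates, 6, previous_line))                   # a dog bayed in the park
print(pick_line(candidates, 6, previous_line, form_weight=0.0))  # a dog was barking near
```

With `form_weight` set to 0 the sketch simply prefers the more literal line; raising it lets the rhyming line win despite its lower accuracy score, which is the sort of trade-off Genzel describes.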

One interesting thing to consider here is the team's belief in a baseline fidelity, in terms of the "translation's accuracy"; one thing most translators of poetry in particular recognize, following in the wake of theorists like Walter Benjamin, is that perfect accuracy is an impossibility, that we can never completely capture all the nuances of the source language or recapture an Ursprache in which both languages would be equal. Something is always lost and something else is gained in the process of carrying something across. For poetic language, this raises a host of questions and issues to which some eminent scholars and translators, like Lawrence Venuti or my colleague Reg Gibbons, have devoted careers, but I will just say that in the case of some poems "translation accuracy," which is to say, semantic accuracy and fidelity, may be less important than other elements of the poem, such as rhythm, feeling, figuration, and so forth.

That said, I think this Google project is incredibly important, particularly because of its potential effect on translation software in general. As I heard NPR science consultant Robert Krulwich note today on All Things Considered, since so much online material is no longer just in English, accurate translation software, especially the kind that can minutely and subtly parse a range of languages, will open up even more material to readers all over the world, including us (primarily) Anglophone readers. It probably will not eliminate the need for those devoted to the translation of literature, however; I can think offhand of a host of works of literary fiction, not all of them formally experimental, that would give the best translation software available today a run for its money. But in the future, who knows?

One last point about the Google poetry translator, a feature that I imagine will prove a useful tool for poets and others interested in digital and electronic poetries and natural language processing:

As a pleasant side-effect, the system is also able to translate anything into poetry, allowing us to specify the genre (say, limericks or haikus), or letting the system pick the one it thinks fits best.

This entire blog entry could thus become a poem, and in Urdu or Chinese, with the click of a few buttons. In a year or two, that is.

Tuesday, December 21, 2010

Darnton's The Library: 3 Jeremiads + Brathwaite's Elegguas + National Book Foundation's New Reading Prize

Last spring I checked out from the university's library the esteemed Enlightenment historian and Harvard University librarian Robert Darnton's The Case for Books: Past, Present, and Future (New York: PublicAffairs, 2009), to gauge his arguments about the present and future state of the world of books and literature, both for my own edification and to preview it for a class. Darnton, one of the most important figures in his field, has a gift for subtle argumentation and narration, though I ended up only skimming the book, since it replicates in longer and more polished form a number of the essays he has been publishing along these same lines in the New York Review of Books for the last several years. Many of them concern the role of the computer behemoth Google and its relationship to the publishing and library worlds, and he has also made a passionate case in the pages of the NYR for a national (which would also be an international) digital library, drawing from the resources of private libraries like the one he heads, public ones like the unmatched Library of Congress, and the trove of 7 million (and counting) books that Google has already scanned, with the cooperation of institutions like Harvard and the New York Public Library, but also against the wishes of some publishers and authors, who successfully prosecuted a lawsuit to gain compensation from Google for copyright infringement.

In the current issue of the NYR, in "The Library: Three Jeremiads," Darnton returns to the arguments he has made before, but this time with a trio of "jeremiads," as he calls them, concerning three pressing economic and resource-related problems facing American research libraries. These problems also harm scholarly publishing; university, college, and public library collections; library patrons, which is to say, readers; and, to a degree not yet fully understood, the humanities, intellectual life, and knowledge production themselves. The first two of Darnton's jeremiads focus on the exorbitant cost and terms, verging on extortion, of subscriptions to scholarly journals, especially in the sciences, relative to other kinds of texts, which have forced libraries to cut their purchases of scholarly monographs, straining library budgets and university presses' bottom lines alike. Over the longer haul, this economic problem, combined with constrained university and research library budgets, threatens the sustainability of the academic research enterprise as a whole. The prices some publishers charge are astronomical, have far outpaced inflation, and are largely unknown to many professors: the chemistry journal Tetrahedron costs $39,082 per year, while The Journal of Comparative Neurology costs $27,465 per year, and both, like many journals from a given publisher, must be purchased in bundles, with high kill fees for ending subscriptions to specific journals, and so forth. Humanities and social science journals cost less per year but are still expensive and part of this system; in 2009 the average US journal title cost $2,031 and the average non-US title $4,753, and that year the journal publishing giant Elsevier made $1.1 billion in profits.
Moreover, there is little transparency in this system, according to Darnton, giving the journal publishers an advantage over libraries, which, for the sake of the scholars they serve, cannot opt out.

Scholars and librarians have attempted to respond, with mixed results. The Mellon Foundation-funded Gutenberg-e program, which sought to publish digital monographs of award-winning PhD dissertations in the scholarly areas under greatest threat, showed great potential but did not work out as planned, and the project is now defunct. Digital, open-access scientific journals, by contrast, have had some success since scientists at the University of California-Berkeley and Stanford circulated a petition in 2001 calling on colleagues to patronize only such journals; the publisher BioMed Central, according to Darnton, has shown since 1999 that this model can work. But the larger question of the effects on libraries, and particularly on the humanities and social sciences, remains. Darnton had been holding out hope that the Compact for Open-Access Publishing Equity (COPE), founded this year, would lead universities toward the open-access model of publishing and also subsidize authors who could not get grants or subvention money from their home institutions, with the texts ultimately available in both digital and print form via the Espresso Book Machine, about which I've written on here. But, and this is the core of his third jeremiad, there loometh Google.

A 368-page "settlement" between Google and the authors and publishers who sued the company (the publisher of my first book was party to this agreement, as Annotations, I gather, was scanned without permission) divided up the profits produced by Google Book Search roughly one-third to two-thirds: Google would get 37 percent, and the authors and publishers would get the remaining 63 percent. Fine. But as a result of this, Google has proposed that libraries, some of which (like Harvard's) provided books for scanning free of charge, now pay a subscription fee to access Google's vast digital storehouse, which is now the largest digital library (and, as recent announcements have shown, potentially the largest digital book retailer). Darnton's fear, quite reasonable given the history of such things, is that "cocaine pricing" will occur, which is to say, Google will start out with low subscription fees and then jack them up to unspeakable--unaffordable--rates once it has libraries and everyone else in its clutches.

Of course, most people are completely unaware of all of this, both in terms of what's going on now and what could occur in the future. As he has in the past, Darnton proposes a counterweight to Google: a National Digital Library, which would draw primarily upon the Library of Congress's extraordinary collection, particularly books no longer under copyright, or still in copyright but out of print and with authors who cannot be located, and so forth, but also upon other vast library systems, like Harvard's. Darnton points out that in December 2009 French President Nicolas Sarkozy announced that he would set aside €750 million (roughly $1.1 billion at the time) to digitize France's "cultural 'patrimony,'" and notes that the national libraries of the Netherlands, Japan, Australia, Norway, and Finland are digitizing their complete collections, and that European nations collectively will have digitized over 10 million texts, from libraries, archives, museums, and audiovisual stocks, by the end of 2010. Darnton believes that Google has shown that it is possible, for less than the sum Sarkozy appropriated, to digitize the Library of Congress's complete holdings, a good deal of which are already converted, and that Google itself might be persuaded to share, for free and to a great deal of praise, the 2 million or so public-domain works it has digitized. Even if Google does not participate, Darnton believes private foundations might be able to underwrite this project, especially if its costs were spread out over time, but he does not believe that a Digital Library of America would resolve the interrelated and worsening crises that research libraries, the scholarly profession, and journal publishing face. Rather, this vast digital storehouse, freely available to all, might shift the "ecology" (back?) toward the idea of the public good, or public commons, but even if it didn't do so completely, it would be an important start.

===

Speaking of books and reading, I just noticed the other day that Kamau Brathwaite has published a new book of poems, Elegguas (Wesleyan University Press/UPNE, 2010). Wesleyan's site says of the book:

Elegguas—a play on “elegy” and “Eleggua,” the Yoruba deity of the threshold, doorway, and crossroad—is a collection of poems for the departed. Modernist and post-modernist in inspiration, Elegguas draws together traditions of speaking with the dead, from Rilke’s Duino Elegies to the Jamaican kumina practice of bringing down spirits of the dead to briefly inhabit the bodies of the faithful, so that the ancestors may provide spiritual assistance and advice to those here on earth. The book is also profoundly political, including elegies for assassinated revolutionaries like in the masterful “Poem for Walter Rodney.”

Throughout his poetry, Brathwaite foregrounds “nation-language,” that difference in syntax, in rhythm, and timbre that is most closely allied to the African experience in the Caribbean, using the computer to explore the graphic rendition of nuances of language. Brathwaite experiments using his own Sycorax fonts, as well as deliberate misspellings (“calibanisms”) and deviations in punctuation. But this is never simple surface aesthetic, rather an expression of the turbulence (in history, in dream) depicted in the poems. This collection is a stunning follow-up to Brathwaite’s Born to Slow Horses (Wesleyan, 2005), winner of the Griffin International Poetry Prize.

Kamau is, as the site also notes, one of the major poets of the second half of the 20th century and one of the leading lights in Caribbean, African Diasporic, and Anglophone poetry, and I would add without hesitation that he is one of the most important experimental and political poets alive today. This fall has brought a rich harvest of new books by marvelous poets, and this one surely looks to be among the highlights of that bounty.

===

Speaking of more books and reading, the National Book Foundation is sponsoring an Innovations in Reading Prize. For whom and what is this?

For individuals, institutions, and collaborative programs using innovative approaches to successfully inspire a lifelong love of reading
2011 Innovations due date

POSTMARK DEADLINE: FEBRUARY 22, 2011

The complete application process is available in the Application Form.

PDF Application form to be filled out by hand and faxed or mailed to the Foundation.
PDF Application form to be filled out on your computer using Adobe Acrobat and emailed to the Foundation.

Each year, the National Book Foundation awards a number of prizes of up to $2,500 each to individuals and institutions--or partnerships between the two--that have developed innovative means of creating and sustaining a lifelong love of reading. In addition to promoting the best of American literature through the National Book Awards, the Foundation also seeks to expand the audience for literature in America. Through the Innovations in Reading Prizes, those individuals and institutions that use particularly innovative methods to generate excitement and a passionate engagement with books and literature will be rewarded for their creativity and leadership.

Questions? Contact the Foundation at 212.685.0261.

Monday, May 17, 2010

Celebrity Predictive Text

OK, fine. Since yesterday's post was kind of lame, I'll do two today to make up for it. And don't get me wrong, today isn't a much better news day than yesterday was. But whenst one is bored, there is always a bit of amusement to be had through a little Googling. Today I played around with Google's predictive text suggestions for celebrity searches. Odd, I tell you. Very odd.

Just putting in a celebrity name will yield you some predictive text results, but they're not always that exciting. (Granted, none of this is really "exciting", but I've gotta build this stuff up somehow.) You have to pretend you want to ask something about the celebrity. The use of "was" or "is" or "hates" or "likes" seems to help. Take, for example, the latest teen sensation, a one Justin Bieber. Do a search on "Justin Bieber is" and Google helps you out with ten suggestions of what it thinks you are most likely looking for. And in this instance, the most likely things that you're searching for about Justin Bieber are, in order of obvious importance, "Justin Bieber is....a fag....dead....bi....a jerk....a girl....a tool.....". Hard to believe from that sampling that he is just about the hottest thing out there right now. Hard. To. Believe. (A girl. Heh heh.)

If you were wondering about things that "Ellen DeGeneres has", you're only going to get three predictive answers. But the most common one is about her...big ears? That's the most common? Not that she has a talk show? Has a wife? Has been putting us to sleep with her less than amusing stint on American Idol? No, you people want to know about her ears so much you're at the point of Googling them? You people need to get out more.

What about that dear, sweet Betty White? That lovely 88-and-a-half-year-old woman who hosted the funniest Saturday Night Live last week? Is there anything she doesn't like? Apparently there is, and it's related to you. That's right. "Betty White hates....your grandmother". But we don't know why. Hmmm.

Looking up stuff on Jay Z brought up results I wasn't expecting. He seems like an all right guy. Why, then, when you search for "Jay Z is...", is the most predicted result "devil worshipper"? Really? It also says that he is not only "a freemason" but "a master mason" as well. It also covers the bases if you were thinking he was "the devil" and/or "the antichrist". Whatever the deal is with Jay Z, if he's only one quarter of those things, he's a pretty busy guy.

I was surprised when I read that "Jay Z is the antichrist" because I had been thinking all along that Barack Obama was the antichrist. Nope. Turns out "Barack Obama is your new bicycle". Wait. He's what now? Look, I even clicked on it and I still don't get it. But I was really happy to see that "Barack Obama is a muslim" was down to Number Six on the list, so I feel like there's really been some progress made here. Somewhere.

If you want to know what "Britney Spears was", you'll learn that "Britney Spears was upset unicorns". Um, what now? Unicorns? She was? Apparently, she was.

Now, if you want to know what "Britney Spears is", you'll learn that "Britney Spears is a three headed alien." Of course she is. Wait. What now?

If I were to guess at what the results would be for Googling what "Oprah has", I would guess "a lot of money" would be up there somewhere. "A talk show" might make the list. "Slept with Gayle King" ~~would~~ might even make an appearance. I was not expecting to learn that "Oprah has six toes". That would explain why "Oprah has eleven toes" is further down on the list, but it still really isn't explaining the most basic of all questions "Who are you people who are asking about Oprah's toes?!"

As you can imagine, The Google is not kind about Sarah Palin. Behold!

Yeah, like you were expecting any different.