Sunday, August 11, 2013

National Archives: Customs Records 1841-1922

I'm doing a project on movement between Sydney and San Francisco during the Californian Gold Rush.
Earlier I did a post on One, but many, the project that won the 2013 GovHack competition using the National Archives’ Passenger Arrivals Index (PAI). The Passenger Arrivals Index, however, covers Perth after 1920.
There is a list of NAA Fact Sheets here.
The best one (for me) was
Customs shipping records held in Sydney – Fact sheet 65.
Especially Registers of ships, arrivals and departures, Sydney (arranged chronologically) 1841–1922 (Series Number SP729/1).
Description of series:
17 bound registers, generally, 19" x 15", titled "Arrivals & Departures", being a chronological register of ships arrivals and departures to and from Port Jackson, Sydney, for the period named. On each open page Arrivals are entered on left hand side and Departures on right hand side. The first volume is a variation as Departures start at one end of register and Arrivals at the other.  Entries in the first register are under Date, Name of ship, Captns. Name, Wherefrom, Remarks (eg. cargo, convicts, emigrants, troops, whaling, etc.), and Burthen (tonnage).  Printed register details (1917-22) are as follows: Date, Name of Vessel, Description of Vessel (eg. Steam, Motor, Sail), Master's Name, Tonnage, No. of Crew, No. of Passengers (Male, Female, and Children), Total No. of Passengers, and in case of "Arrivals" column: "Whence from" and "Departures" "Where bound".
History Prior/Subsequent to Transfer:
Prior to transfer: These registers appear to be the forerunner of the Clearing Clerk's Ship's Register, Series 5 and 6. The difference from about 1850 onwards is that the general registers of arrivals and departures are chronological and are chronological to the present, whereas, the Jerquer copy has always been an alphabetical type of index register, chronological under each letter of the alphabet.
A more thorough history of these records is being compiled with the aid of the State Archives Office of NSW and will be issued as a supplement to this accession.
Subsequent to transfer: This series was originally transferred to archival custody in August 1952 as SP70, Series 1 and 2, and has been used quite often for reference. With the microfilming of the Passenger Lists in 1964 it was decided to consolidate all Ships' Registers, Inward and Outward, to make for easier reference by officers in Archives, other Commonwealth Departments, and the general public.
NOTE: On 2 February 1967 Mr. Russel Doust, Archivist, Archives Office of NSW intimated that he had located some 1850's vintage Registers at the Maritime Services Board, Sydney, and would follow-up and advise us further. They were Arrivals and Departures Registers and may well fill in the gaps in this accession.
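For anyone transcribing these registers, the printed column layout (1917-22) quoted in the series description could be captured in a small record type. This is only a sketch: the class name, field names, and the sample entry are all hypothetical, and it assumes the "Total No. of Passengers" column is simply the sum of the Male, Female, and Children columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegisterEntry:
    """One line of a printed register (1917-22), following the quoted column list."""
    entry_date: date
    vessel_name: str
    vessel_description: str  # e.g. "Steam", "Motor", "Sail"
    master_name: str
    tonnage: int
    crew: int
    passengers_male: int
    passengers_female: int
    passengers_children: int
    direction: str  # "arrival" or "departure" (left or right hand page)
    port: str       # "Whence from" for arrivals, "Where bound" for departures

    @property
    def total_passengers(self) -> int:
        # Assumption: the register's total is the sum of the three passenger columns.
        return (self.passengers_male
                + self.passengers_female
                + self.passengers_children)


# Invented sample entry, for illustration only (not taken from SP729/1).
sample = RegisterEntry(
    entry_date=date(1920, 3, 1),
    vessel_name="Example",
    vessel_description="Steam",
    master_name="J. Smith",
    tonnage=4500,
    crew=60,
    passengers_male=40,
    passengers_female=35,
    passengers_children=12,
    direction="arrival",
    port="San Francisco",
)
print(sample.total_passengers)
```

The earlier handwritten volumes use different columns (Remarks, Burthen), so they would need a separate, looser record.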

The Sydney Office is located at
National Archives of Australia,
120 Miller Road,
9am–4.30pm Wednesdays, Thursdays and Fridays
(02) 9645 0110

Tuesday, August 6, 2013

Interesting times for literary theory.

A couple of weeks ago, after reading abstracts from DH2013, I said that the take-away for me was that “literary theory is about to get interesting again” – subtweeting the course of history in a way that I guess I ought to explain.

In the twentieth century, “literary theory” was often a name for the sparks that flew when literary scholars pushed back against challenges from social science. Theory became part of the academic study of literature around 1900, when the comparative study of folklore seemed to reveal coherent patterns in national literatures that scholars had previously treated separately. Schools like the University of Chicago hired “Professors of Literary Theory” to explore the controversial possibility of generalization.* Later in the century, structural linguistics posed an analogous challenge, claiming to glimpse an organizing pattern in language that literary scholars sought to appropriate and/or deconstruct. Once again, sparks flew.

I think literary scholars are about to face a similarly productive challenge from the discipline of machine learning — a subfield of computer science that studies learning as a problem of generalization from limited evidence. The discipline has made practical contributions to commercial IT, but it’s an epistemological method founded on statistics more than it is a collection of specific tools, and it tends to be intellectually adventurous: lately, researchers are trying to model concepts like “character” (pdf) and “gender,” citing Judith Butler in the process (pdf).

At DH2013 and elsewhere, I see promising signs that literary scholars are gearing up to reply. In some cases we’re applying methods of machine learning to new problems; in some cases we’re borrowing the discipline’s broader underlying concepts (e.g. the notion of a “generative model”); in some cases we’re grappling skeptically with its premises. (There are also, of course, significant collaborations between scholars in both fields.)

This could be the beginning of a beautiful friendship. I realize a marriage between machine learning and literary theory sounds implausible: people who enjoy one of these things are pretty likely to believe the other is fraudulent and evil.** But after reading through a couple of ML textbooks,*** I’m convinced that literary theorists and computer scientists wrestle with similar problems, in ways that are at least loosely congruent. Neither field is interested in the mere accumulation of data; both are interested in understanding the way we think and the kinds of patterns we recognize in language. Both fields are interested in problems that lack a single correct answer, and have to be mapped in shades of gray (ML calls these shades “probability”). Both disciplines are preoccupied with the danger of overgeneralization (literary theorists call this “essentialism”; computer scientists call it “overfitting”). Instead of saying “every interpretation is based on some previous assumption,” computer scientists say “every model depends on some prior probability,” but there’s really a similar kind of self-scrutiny involved.
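The claim that “every model depends on some prior probability” can be made concrete with a toy calculation. The sketch below assumes a simple Beta-Binomial model, chosen only for illustration; the function name and the numbers are hypothetical. Two readers see the same evidence but start from different priors, and they end up with different estimates.

```python
from fractions import Fraction


def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a Beta(prior_a, prior_b) prior after binomial evidence."""
    return Fraction(prior_a + successes, prior_a + prior_b + trials)


# Hypothetical evidence: a feature shows up in 3 of 4 sampled passages.
successes, trials = 3, 4

# A flat prior: no strong expectation either way.
flat = posterior_mean(successes, trials, prior_a=1, prior_b=1)

# A skeptical prior: the reader expects the feature to be rare.
skeptical = posterior_mean(successes, trials, prior_a=1, prior_b=9)

print(flat, skeptical)
```

With so little evidence the prior dominates; with hundreds of passages the two estimates would converge, which is roughly the statistical version of an interpretation being revised as reading continues.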

It’s already clear that machine learning algorithms (like topic modeling) can be useful tools for humanists. But I think I glimpse an even more productive conversation taking shape, where instead of borrowing fully-formed “tools,” humanists borrow the statistical language of ML to think rigorously about different kinds of uncertainty, and return the favor by exposing the discipline to boundary cases that challenge its methods.

Won’t quantitative models of phenomena like plot and genre simplify literature by flattening out individual variation? Sure. But the same thing could be said about Freud and Lévi-Strauss. When scientists (or social scientists) write about literature they tend to produce models that literary scholars find overly general. Which doesn’t prevent those models from advancing theoretical reflection on literature! I think humanists, conversely, can warn scientists away from blind alleys by reminding them that concepts like “gender” and “genre” are historically unstable. If you assume words like that have a single meaning, you’re already overfitting your model.

Of course, if literary theory and computer science do have a conversation, a large part of the conversation is going to be a meta-debate about what the conversation can or can’t achieve. And perhaps, in the end, there will be limits to the congruence of these disciplines. Alan Liu’s recent essay in PMLA pushes against the notion that learning algorithms can be analogous to human interpretation, suggesting that statistical models become meaningful only through the inclusion of human “seed concepts.” I’m not certain how deep this particular disagreement goes, because I think machine learning researchers would actually agree with Liu that statistical modeling never starts from a tabula rasa. Even “unsupervised” algorithms have priors. More importantly, human beings have to decide what kind of model is appropriate for a given problem: machine learning aims to extend our leverage over large volumes of data, not to take us out of the hermeneutic circle altogether.

But as Liu’s essay demonstrates, this is going to be a lively, deeply theorized conversation even where it turns out that literary theory and computer science have fundamental differences. These disciplines are clearly thinking about similar questions: Liu is right to recognize that unsupervised learning, for instance, raises hermeneutic questions of a kind that are familiar to literary theorists. If our disciplines really approach similar questions in incompatible ways, it will be a matter of some importance to understand why.

* <plug> For more on “literary theory” in the early twentieth century, see the fourth chapter of Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (2013, hot off the press). The book has a lovely cover, but unfortunately has nothing to do with machine learning. </plug>

** This post grows out of a conversation I had with Eleanor Courtemanche, in which I tried to convince her that machine learning doesn’t just reproduce the biases you bring to it.
*** Practically, I usually rely on Data Mining: Practical Machine Learning Tools and Techniques (Ian Witten, Eibe Frank, Mark Hall), but to understand the deeper logic of the field I’ve been reading Machine Learning: A Probabilistic Perspective (Kevin P. Murphy). Literary theorists may appreciate Murphy’s remark that wealth has a long-tailed distribution, “especially in plutocracies such as the USA” (43).

Thursday, August 1, 2013

Linked Jazz

About Linked Jazz from Linked Jazz on Vimeo.

Market Assessment of Public Sector Information

A report by Deloitte for the Department for Business, Innovation and Skills UK, May 2013.

Upon the release of road construction project data by the Department of Transport in Edmonton, Canada, a local application developer decided to create a mobile app for smart phones and similar devices to access the map interface.

The Deloitte Report cited a report by the Centre for Technology in Government at the University at Albany, State University of New York (SUNY), called The Dynamics of Opening Government Data: A White Paper.

The story of opening street construction projects data began several years prior to its official launch in April 2012. In 2009, the City of Edmonton, a recognized leader in open data initiatives, made a commitment to using “technology to make municipal information more open, transparent and accessible” through the launch of an Open Data Catalogue. The Office of the Chief Information Officer (OCIO) staff were responsible for working with the City’s major agencies— as the business owners of most of the City’s information assets—to identify data that were good candidates for inclusion in the Catalogue.

10 insights for humanities researchers from #ACHRC 2013

This is a re-post from 

#ACHRC 2013: 2 days, 1 keynote, 6 panels.
Between 8-9 July, researchers converged on the University of Western Australia for the annual meeting of the Australian Consortium of Humanities Research Centres (ACHRC).

Humanists came from a range of disciplines and institutions: university research centres, collecting institutions, advocacy groups, media, university administration and funding bodies. The diversity of attendees resulted in a range of perspectives and energetic debate.

All were united by a belief that, in a world facing pressing global challenges, we need the humanities now more than ever. In this context, it seemed appropriate that the keynote was delivered by Alan Liu, a digital humanities innovator and passionate advocate of the humanities.

Here are ten themes that emerged over an inspiring few days:

1. We need to learn how to tell compelling stories about the humanities.
We’re not doing enough to communicate the value of what we do. Alan Liu’s keynote stressed the importance of humanities advocacy and outlined the creation of 4Humanities, a “conceptual laboratory” that provides a space for humanists to tell their stories. Digital storytelling outcomes include:
  • Infographic Friday - visualisations that express the impact of the humanities using statistics
  • Humanities Backpack - mini documentaries that bring the research process to life
  • Humanities Showcase - online portal that allows public audiences to browse a gallery of research case studies

For Liu, public engagement starts with an articulation of fundamental principles which should be long-term, structural and local/global. What do the humanities stand for? Frustrated by the strategic plan outlined by your research institution? Liu and his colleagues at the University of California were. So they sat down and outlined their own vision.

To communicate values you need to know what they are. Several key outcomes of the 4Humanities project have been value-finding exercises such as What everyone says about the humanities and Humanities plain and simple, which challenges contributors to outline why the humanities matter in plain language. Techniques include focus groups, crowd sourcing, and text mining values statements. For a best practice example, Liu pointed to a recently released report and video published by the American Academy of Arts and Sciences, “The Heart of the Matter: Commission on the Humanities and Social Sciences.”

2. And communicate these stories to our audiences more effectively.

Engaging with public audiences is not about meeting research impact funding obligations; it’s a social responsibility. Andrea Whitcomb encouraged researchers to resist the notion that academic work is outside society and to develop research projects that emerge from problems and issues outside of academic debate.

Alan Liu outlined a framework for public engagement that involved (1) articulating a core message, (2) designing a communications plan (involving the selection of spokespersons, not necessarily the researcher, as well as media channels and specific media forms and genres) and (3) communicating this message to a specific target audience (such as fellow academics, practitioners or local communities). Researchers should seek to emulate organisations skilled in communicating for social change, such as the Occupy Movement and Amnesty International. He suggested researchers may like to read Jennifer Earl and Katrina Kimport’s recent book, Digitally Enabled Social Change: Activism in the Internet Age.

Deb Verhoeven felt it was important that researchers feed data and results back to communities. She described a recent project that engaged communities in the research process from the outset using crowd funding. Not only did the Deakin University Pozible campaign, Research My World, raise over $20,000 in research funding, it also helped build the online profiles of the early career researchers who participated.

Jane Davidson, from the ARC Centre of Excellence for the History of Emotions, argued that research impact should include creative outcomes as well as traditional research outputs and outlined recent collaborations with ABC radio, schools and the arts.

Andrew Jaspan, Editor of The Conversation, outlined his vision to provide a new business model for news media. The Conversation has 1.3 million unique views per month, with 80% of the audience outside of academia. It is an Australian success story: over 60% of academic contributors are followed up by other mass media outlets, and the recent UK launch will be followed by launches in the USA and India. Linguist Alan Dench then reminded researchers that, while the written word has primacy in academia, spoken language remains a primary form of communication and is innate to human beings. He urged attendees to speak on the radio and to encourage students to practise communicating their research verbally by participating in Three Minute Thesis competitions.

3. But often, we don’t really understand who our audience is.

Deb Verhoeven reminded researchers that we are not communicating to a homogeneous group of people called “the public”. Powerful storytelling necessitates a genuine understanding of our audience/s and these insights are the bedrock of any communication. Not only will this information assist researchers in crafting an impactful message, this kind of focus can assist in making decisions as to the most effective communication channels - vital when time and money are in short supply. Know your audience and design communications to resonate with specific concerns. Then choose the most appropriate communication channel to reach each group. Provide different touch points that allow people to become involved at various levels. For example, some may watch a video or share information on social media, others might like to make a donation or even examine research data for themselves.

4. It's not enough to start a discussion, we need to change the conversation.

We need to advocate the humanities on our own terms. Christina Parolin, of the Australian Academy of the Humanities, argued that the term "benefit" captures the intention of research assessment better than "impact". Indeed, Alan Liu prefers the term “discovery” to words such as “invention” and “innovation”, as he believes it most aptly expresses the range of human meanings and possibilities associated with technological breakthroughs.
Liu suggested that researchers explore George Lakoff’s strategic frame analysis, a science-based communications methodology that draws from linguistics, psychology, anthropology, political science and communications theory. If you're interested in Lakoff's work you can explore his eWorkshop Changing the public conversation on social problems: a beginners guide to strategic frame analysis or read his article Framing 101: How to Take Back Public Discourse.

5. Continue to question the logic of measurement indicators.

Carmen Lawrence agreed that accountability for publicly funded research projects was important but urged the audience to keep asking the hard questions. Why are you asking us to measure this? What is the likely benefit to the academy/society? How many dollars does it take to get one dollar on the desk of a researcher? Christina Parolin agreed that data collection was an issue, citing the administrative burden and cost of preparing case studies, the risk of over-engineering the effort to measure impact, and the focus on demonstrating and communicating rather than making the research more beneficial.

Lawrence noted the private sector origins of the current obsession with measurement, citing the mantra "If you can't measure it, you can't manage it". She compared dogmatic approaches to measurement with religious devotion, stating "The fact that something is hard to measure does not mean that it is not real or important". Indeed she argued that the more any social indicator is used for social decision making, the more it will corrupt the very policies it aims to manage, quoting Goodhart’s Law, which states "When a measure becomes a target, it ceases to be a good measure".

6. Form richer, mutually beneficial collaborations. 

We’re stronger together. Humanists may have different goals but we tend to share a similar vision. Richard Neville outlined how the State Library of NSW wants to "be a place where conversations happen, either on site or online", an objective which echoed several previous discussions. Yet collaboration between different research centres, disciplines and collecting institutions poses challenges. Neville recognised opportunities to share staff, knowledge and experience based on mutual understanding and frank discussions about expectations, constraints and opportunities. Alec Coles, CEO of the Western Australian Museum, reiterated some of these concerns but was optimistic about ways in which partnerships could add immense value.

7. Think global. 

Is the global turn in higher education merely economic opportunism or a genuine desire to train and educate graduates to become global citizens? Masashi Haneda, Vice President (International) of the University of Tokyo, argued that global issues such as climate change, food and water crises, aging societies, bioethics, security and migration necessitated the development of a new global citizen and that the humanities should play a leading role. Krishna Sen, Dean of the UWA Faculty of Arts, was more cautious about the idea of “Global Humanities”, arguing for the centrality of difference and relativism to the humanities. Yet the discourse surrounding the Asian century highlights the importance of alternative modes of understanding. According to Sen, "We are the miners of ideas and we are the real miners of the Asian century".

8. Embrace new technologies. 

Alan Liu showcased some extraordinary new digital humanities projects such as Open Journal Systems (OJS), the Journal Author Name Estimator (Jane) and Marvin – an iPad app that automatically generates abstracts from ebooks. Other innovations included JoVE, a Peer Reviewed Scientific Video Journal, and Wikipedia Journal Articles. Liu’s visions for the future included a machine learning approach for identifying thesis and conclusion statements in student essays, and a computer program that could support academics in public outreach by automatically generating draft tweets and blog posts based on "business as usual" teaching and research activities.

According to Liu, Research Centres should act as “Think Tanks” for the Humanities:
  • Taking a long term, strategic approach and contributing to public policy
  • Becoming hubs for digital humanities by investing in state-of-the-art infrastructure (such as video conferencing), hosting workshops and conferences and offering support with communications and public outreach, websites and databases, and legal/IP issues.
  • Lobbying for institutional acknowledgment for blog posts and public outreach activity, referencing the recently published (and must-read) Young Researchers in Digital Humanities: A Manifesto. Carmen Lawrence also recognised that the immense pressure to publish and lack of reward for outreach activity discourages community engagement, stating "We are giving young researchers the wrong message".
  • Supporting early career researchers by recognising that there are not enough academic jobs for graduates and helping to develop alt-ac career paths.  Kate Darian-Smith, of the University of Melbourne, also shared examples of how her team are trying to develop new career opportunities and forms of PhD supervision.

9. But be realistic about potential pitfalls and challenges. 

Who is going to pay for it all? Several speakers recognised that the logics of current funding systems presented considerable barriers to making such grand visions a reality. An ACHRC survey of 180 Australian Research Centres led by Tully Barnett revealed that many operate with minimal administrative support - yet this is vital for tracking research impact, planning events and public outreach activity. Training and developing new skills is essential for researchers but this has limits. It is inefficient to stretch competencies too wide. Research Centres should empower humanists to do what they do best and free their time for research as much as possible.

Toby Burrows, Director of eResearch at UWA, urged attendees to be aware of “The Nasties” when planning websites and databases. Blogs, social media and open source platforms are not “free”. They involve a long term commitment of time and effort, as well as the hard costs of backing up data, servers and web hosting. Should you seek open source, commercial or bespoke IT solutions? Burrows outlined pros and cons of working within IT systems or going it alone.

10. And stop complaining.  
Robert Phiddian, Director of ACHRC, didn’t mince words in his closing remarks, stating “It’s our own bloody fault” for taking the brace position rather than getting on the front foot. The meeting ended on an optimistic note, with a challenge to experiment, innovate and think differently about the ways in which we undertake and communicate humanities research.

There is much to be done but also much that can be done.

Open public sector information: from principles to practice

The Information Commissioner's report on the state of open data in Australia, Open public sector information: from principles to practice, was published in February 2013.

The report details the results of a survey conducted by the Office of the Australian Information Commissioner (OAIC) on how 191 Australian Government agencies manage PSI. The survey was structured around the eight Principles on open public sector information (Open PSI principles) that were published by the OAIC in 2011.

The Open Data Policy of Queensland's Department of Transport and Main Roads can be found here.

The Updated Digital Economy

The Updated Digital Economy paper can be found here. Section 10 of the report explicitly discusses open data (including opening up the G-NAF spatial data set) and big data.

Governments are key collectors and producers of large amounts of data that, when released publicly for reuse, can be used in new and innovative ways. In the past, governments would charge a fee for access to this data, but there is increasing evidence that free access will bolster economic activity and efficiency.

‘Open data’ is the commitment that some organisations make to allow data to be freely available for reuse. Many groups and innovators are already ‘mashing up’ government data to create new stories, visualisations, resources and tools.

The results of the recent Public Sector Information Survey suggest that many government agencies possess data that could generate significant value if made available for reuse, including by private sector organisations. The Australian Government has made progress on this front. A recent report by the Office of the Australian Information Commissioner, Open public sector information: from principles to practice, reported that government agencies are actively embracing an open access and proactive disclosure culture. The high response rate to the Public Sector Information Survey and the widespread and growing use of digital and web technologies to support a transformation is another sign of progress.

Gov 2.0 - Recent Updates

See Craig Thomler's professional blog - eGovernment and Gov 2.0 thoughts and speculations from an Australian perspective
Pia Waugh writes:
In other open data news, we have been chatting to open data people in governments all around Australia, from Federal agencies and departments, State/Territory Government representatives and a number of Local Governments. We are pulling together with them something of an informal report on the state of open data in Australia at the moment, as there is a lot happening and a lot planned. Below is a bit of a taste of the good work happening around Australia at the moment, and kudos to the excellent work of all involved!

One But Many

Winner of the 2013 GovHack contest.
The Brisbane-based group, Hack the Evening, was awarded $1,000 for their project
One, but many, which took information from the Archives’ Passenger Arrivals Index
(PAI) and combined it with historical statistics to create an application to help users
learn more about Australia’s migration history. 
The application used a virtual map to show migration departures from ports around the
world during the period 1921-1949; and by using additional high-band video footage,
Wikipedia information and historical data from the Australian Bureau of Statistics, the
group aimed to show what events might have influenced migration patterns, including
those to Australia.

A full report on GovHack can be found here.

A full list of projects is available here.