The LUCERO Project – Linking University Content for Education and Research Online
http://lucero-project.info/lb

LUCERO extension
Posted by Mathieu on Wed, 23 Nov 2011 (http://lucero-project.info/lb/2011/11/lucero-extension/)

We have had quite a few nice new things happening in relation to LUCERO recently, including some updates of the code, initial work around aggregating data from multiple universities, a paper at a linked data workshop with people from several departments of the OU, presentations, etc. In other words, the work is continuing, and quite a lot more will be happening soon. It turns out indeed that we have not spent all of our budget, and so have some more time to spend on synthesising, refactoring and making more directly reusable the work we have done as part of the project (don’t ask me how that happened…).

The idea, therefore, is to work from February 2012 on making our experience in LUCERO, in creating data.open.ac.uk, more directly accessible and reusable by other universities and colleges. The exciting bit about it is that, while we worked mostly internally for the initial duration of the project, we will carry out this new work in direct collaboration with two other universities: one that has already achieved a realisation similar to our own data.open.ac.uk (Southampton, working with Christopher Gutteridge), and one that is at the very first steps of the process (Manchester, working with Sean Bechhofer).

More precisely, here is a quick description of the work, divided into workpackages:

WP1: Technical/Conceptual/Organisational process of deploying linked data in a University

The goal of this workpackage is to rely on our (joint) experience to describe and provide some guidelines regarding the different options related to the deployment, maintenance and sustainability of a linked data platform in a University. This includes in particular tasks such as the choice of vocabularies for data modelling, or the ways to establish links between internal and external datasets.

Deliverable: Report/Guidelines describing the concrete steps of deploying linked data in a university.

Duration: 12 days

WP2: Business case for linked data in universities

Nowadays, everything is driven by business cases, and nothing happens without the direct approval and support of higher management. In this workpackage, we will compile a collection of common case studies demonstrating the benefits of linked data (whether driving innovation, reducing the cost of data management or creating new entry points to the university’s online presence), providing clear demonstrations of the business value of linked data.

Deliverables: A clearly illustrated, online collection of case studies with associated business cases for linked data in universities.

Duration: 12 days

WP3: Liaison with other universities involved in linked data

This workpackage contains the work related to the collaboration with other universities, including the organisation of face-to-face and online meetings, capturing their experiences and requirements, etc.

Deliverable: Meeting reports and descriptions of other universities’ linked data environments in comparison with data.open.ac.uk

Duration: 5 days

WP4: Dissemination and community portal

In this workpackage, we will build on LUCERO’s experience in using blogs and Twitter feeds to disseminate the results of the work. We will also make use of and extend the LinkedUniversities.org portal to host the reports, guidelines and business cases produced as part of the work, and engage with the community around this documentation.

Deliverables: Extensions of the project blog, Twitter feed and of the linkeduniversities.org portal

Duration: 5 days

Final Product Post: Tabloid
Posted by Mathieu on Fri, 01 Jul 2011 (http://lucero-project.info/lb/2011/07/final-product-post-tabloid/)

This is the final, formal post of the LUCERO JISC project. Be reassured, however: this is far from the last post. More and more activities around linked data are happening at the Open University, and this blog will carry on being a primary channel for communication and discussion around these activities.

For this post, we had to choose one “product” of the project which we believed would be most useful and reusable by others. We have done so many things over the last year that choosing one was almost impossible. After a lot of discussion and head scratching, we decided to promote as a product our collection of tools, examples and documentation explaining the why and how of linked data, as well as the benefits one can get from deploying linked data in a higher education institution. We call this toolkit Tabloid: Toolkit ABout Linked Open Institutional Data.

Users

To clarify very quickly, the intended target audience for the Tabloid toolkit is not the end-users of linked data. We focus here on helping people in higher education institutions get involved in promoting, implementing and deploying linked data within their institution. This includes more or less anybody who has a role to play in the management of data and information, from PVCs to researchers, librarians and developers.

Overview

Tabloid is an evolving toolkit made of code, documentation and examples, addressing the various roles involved in the deployment of linked data: from managers who want to quickly understand the benefits, to developers who are expected to work with it, develop applications and integrate it into their technical workflow.

In this sense, Tabloid can be seen as an entry point to institutional linked data, with different parts being relevant to different people at different times. It includes many components, distributed in different ways and put together in a coherent structure on the Tabloid Page. In particular, the toolkit contains documentation giving an overview of the basic principles of linked data, of the way it concretely creates benefits, and of simple examples of how such benefits can be exploited in research and education scenarios (see What is linked data?). It provides an overview of both the technical and organisational workflows necessary to deploy linked data in an institution, and provides some tool support for common tasks in such workflows. Finally, Tabloid puts a particular emphasis on using and consuming linked data, providing documentation and experience reports regarding the use of linked data. It includes many pointers to a large variety of applications developed within the LUCERO project, together with reusable source code.

Link: The Tabloid page


LUCERO blog up to 1st July 2011:

Many parts of the Tabloid toolkit described above have been drawn out or described in blog posts on the LUCERO Blog. Here we give a brief overview of the content of the blog according to (mostly emerging) categories of posts:

Publishing Datasets

One of the major activities in LUCERO has been exposing a number of datasets from the Open University as linked data. The posts in this category explain and describe how we realised this for each of these datasets.

Documentation and Support

The LUCERO blog is also used to provide easily accessible documentation regarding various aspects of the project. This category contains posts and pages that are intended to help people to better understand the principles and technologies related to linked data.

Tools and Applications

This category includes posts that describe tools and applications developed within the project. It is an important part of the activities in LUCERO, demonstrating through examples how one can benefit from linked data, and how to realise such applications.

Experience report – Guest posts

One great success of LUCERO is that it has managed to get people outside the project and the linked data community to engage with linked data, create applications with it and generally use the linked data we exposed for a variety of tasks. The posts in this category show a few such examples.

  • ROLE Widget Consumes Linked Data – This guest post from a member of the ROLE project explains how linked data available on data.open.ac.uk was used to create a widget for the learning environment created by ROLE.
  • Know Thyself – This post, written by a member of the communication services of the Open University, shows how the availability of linked data can be used to quickly answer unexpected queries that aggregate resources from various sources.
  • Putting Linked Data to Work: A Developer’s Perspective – This guest post written by a developer from the IT department of the Open University demonstrates how linked data can be used and integrated to write new and more cost effective applications, despite the initial confusion that linked data technologies often create.
  • Introducing LUCERO – This post summarises the effort realised at the beginning of the project to explain and discuss with a large variety of people the expected benefits of linked data.

Project Plan

The first seven posts on the blog gave the details of the project plan.

Hello World – This uncategorised post summarised, at the very beginning of the project, our expectations and plans for LUCERO.


Description of the Project

What to ask linked data
Posted by Mathieu on Fri, 24 Jun 2011 (http://lucero-project.info/lb/2011/06/what-to-ask-linked-data/)

Publishing linked data is becoming easier, and we now come across new RDF datasets almost every day. One question that keeps being asked, however, is “what can I do with it?” More or less everybody understands the general advantages of linked data, in terms of data access, integration, mash-ups, etc., but getting to know and use a particular dataset is far from trivial: “What does it say? What can I ask it?”

You can look at the ontology to get an idea of the data model used, send a couple of SPARQL queries to `explore’ the data, look at example objects, etc. We also provide example SPARQL queries to help people get the point of our datasets. Of course, not everybody is proficient enough in SPARQL, RDF-S and OWL to really get it using these sorts of clues. Also, datasets might be heterogeneous in the representation of objects, in the distribution of values, or simply very big and broad.
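For instance, a first exploratory query can simply list the classes used in a dataset and how often each occurs. This is a minimal sketch (any SPARQL 1.1 endpoint would do; nothing here is specific to a particular dataset):

```sparql
# List the classes used in a dataset, with their frequency: a quick
# way to see what kinds of things the data describes.
# (Illustrative sketch; results depend entirely on the endpoint queried.)
SELECT ?type (COUNT(?thing) AS ?instances)
WHERE { ?thing a ?type }
GROUP BY ?type
ORDER BY DESC(?instances)
LIMIT 20
```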

To help people who don’t necessarily know or care about SPARQL `get into’ a complex dataset, we developed a system (whatoask) that automatically extracts a set of questions that a dataset is good at answering. The technical aspects of realising this are a tiny bit sophisticated (it uses formal concept analysis) and are detailed in a paper I will present next week at the K-CAP conference. What is interesting, however, is how such a technique can provide a navigation and querying interface on top of a linked dataset, giving a simple overview of the data and a way to drill down to particular areas of interest. In essence, it can be seen as an FAQ for a dataset, presenting not frequently asked questions, but the questions the dataset is especially good at answering.

What the tool does is create a hierarchy of all the simple questions an RDF dataset can answer, and present to the user a subset that, according to a set of metrics described in the paper, are most likely to be of interest. The questions are displayed in a pseudo natural language, in a format where, for example, “What are the (Person/*) that (knows Tom) and that (KMi hasEmployee)?” can be interpreted as the question “What are the people who know Tom and are employed in KMi?”. Questions can be selected and displayed with their answers, and the question hierarchy can be navigated, selecting questions more specific or more general than the selected one.
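As an illustration (this is a sketch, not the tool’s actual output), the example question above corresponds roughly to the following SPARQL query, where the URIs for Tom, KMi and the hasEmployee property are invented for the purpose of the example:

```sparql
# Hypothetical SPARQL rendering of the question
# "What are the people who know Tom and are employed in KMi?"
# (ex:Tom, ex:KMi and ex:hasEmployee are made up for illustration;
# whatoask builds this kind of query internally.)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:   <http://example.org/ontology#>
SELECT ?person
WHERE {
  ?person a foaf:Person ;
          foaf:knows ex:Tom .
  ex:KMi ex:hasEmployee ?person .
}
```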

To clarify what that means, let’s look at what it does on the data.open.ac.uk OpenLearn dataset. The initial screen shows a list of questions, the first one (“What are the (Document/*/OpenLearnUnit) that (subject Concept, relatesToCourse Course, relatesToCourse Module)?”, i.e., “What are the OpenLearn Units that are related to courses and have a topic?”) being selected. More general and more specific questions are also shown, such as “What are the OpenLearn Units that have a topic?” (more general) and “What are the OpenLearn Units that relate to a course and have for topic `Education Research’?” (more specific).

We can select alternative questions, such as the second in the list — “What are the OpenLearn Units in English distributed under a Creative Commons licence and that talk about Science?” — and obtain a new list of answers (quite a few), as well as more general and more specific questions. We can then specialise the question to “What are the OpenLearn Units in English under a CC licence that talk about science and family?”, carry on with a more general question looking at the `family’ topic without science, and finally ask “What are the OpenLearn units about family?” (independently of licence and language).
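To make the drill-down concrete, the specialised question can be pictured as the query below. The prefixes and property names are placeholders, not the actual data.open.ac.uk vocabulary:

```sparql
# Sketch of "OpenLearn units in English, under a CC licence, about
# science and family". Each triple pattern narrows the question;
# dropping patterns generalises it again.
# (Placeholder vocabulary, for illustration only.)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ex:  <http://example.org/openlearn#>
SELECT ?unit
WHERE {
  ?unit a ex:OpenLearnUnit ;
        dct:language "en" ;
        dct:license ex:CC-BY-NC-SA ;
        dct:subject ex:Science ;
        dct:subject ex:Family .
}
# Removing the language, licence and Science patterns leaves the more
# general question "What are the OpenLearn units about family?".
```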

As can be seen from the example, the system is not meant for people who know in advance what they want to ask, but to provide a level of serendipitous navigation amongst the queries the dataset can answer, with the goal of giving a general overview of what the dataset is about and what it can be used for. The same demo is also available using the set of reading experiences from the RED dataset and the datasets regarding buildings and places at the OU. The interface is not the most straightforward at the moment, but we are thinking about ways by which the functionalities of the system could be integrated in a more compelling manner, as a basic `presentation’ layer on top of a linked dataset.

wayOU – mobile location tracking app using linked data
Posted by Mathieu on Mon, 23 May 2011 (http://lucero-project.info/lb/2011/05/wayou-mobile-location-tracking-app-using-linked-data/)

As can be seen from the few previous posts on this blog, one of our main focuses at the moment, in addition to handling all the data that we still have to process, is to develop applications that demonstrate the benefits and the potential of linked data. When we obtained data from our estates department regarding the buildings and spaces on the Open University’s main campus (in Milton Keynes) and in the 13 regional centres, we got quite excited. The data contain details of the buildings and their surroundings (car parks, etc.), with their addresses, floors, spaces, images, etc.

However, these data were not very well connected. We linked the addresses to the postcode unit descriptions in the Ordnance Survey dataset, giving us a general view of the locations of buildings (and so allowing us to build a very crude map of OU buildings in the UK), but we did not have precise locations for buildings. Nor could we relate the buildings to events (e.g., tutorials) or to people (through their workplace, attendance, etc.).
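To make the gap concrete, this is the kind of query we wanted to be able to answer but could not yet (a sketch; the property names are placeholders rather than the actual estates vocabulary):

```sparql
# People and the buildings they work in: the kind of question the
# estates data alone could not answer, because the building-to-person
# links did not exist yet. (Placeholder vocabulary, for illustration.)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex:   <http://example.org/estates#>
SELECT ?person ?building
WHERE {
  ?person a foaf:Person ;
          ex:workplace ?building .
  ?building a ex:Building .
}
```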

We therefore decided to build an application not only to use these data, but also to create some of these missing relations and, especially, to allow OU users to connect to the data.

The application is called wayOU, for “where are you in the OU?”. It can be used to “check in” at specific locations, indicating the “reason” for attending; to keep track of the different places where the user has been; to declare the current location as his or her workplace; and to connect to their network at the Open University, in terms of the people they share activities with. The video below explains the principle of the application better than I can with text.

The application is now being tested and is made available for download (see the QR code below – without guarantee that it will actually work) on data.open.ac.uk. Fouad is going to demonstrate it next week at the Extended Semantic Web Conference (see the abstract of the demonstration), and (perhaps more importantly) the sources of this first release are available in our code base.

[QR code]

Publishing OpenLearn metadata as linked data
Posted by Mathieu on Thu, 21 Apr 2011 (http://lucero-project.info/lb/2011/04/publishing-openlearn-metadata-as-linked-data/)

OpenLearn is a website giving free access to Open University course material. We look especially at the “LearningSpace”, where hundreds of HTML documents, called OpenLearn Units, are made available. These units represent very valuable resources for students, as they provide entry points into specific topics, useful in particular when deciding whether or not to enrol in a course on a topic. A lot of these units relate directly to specific courses, as their content is obtained from the corresponding course material. Being able to query and use such metadata in connection with other sources of information can be very useful in applications supporting students in the discovery of learning resources, as demonstrated by the OpenLearn Linked Data application developed by Fouad Zablith.

An OpenLearn unit is represented through a specific class called OpenLearnUnit, which is a subclass of foaf:Document. Most of the common fields, such as the title, subject and description of the unit, are represented through common Dublin Core properties. A specific property, relatesToCourse, is used to relate a unit to the corresponding course in the Course Description dataset. We also use the Creative Commons Rights Expression vocabulary to express the licence attached to the content of the unit (mostly the Creative Commons Attribution-NonCommercial-ShareAlike 2.0 licence) and the Nice Tag Ontology to connect units to the keywords they have been tagged with.
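Putting these pieces together, a query over this model might look like the sketch below. The ou: prefix is a placeholder for the OU namespace; dct: and cc: are the standard Dublin Core and Creative Commons namespaces:

```sparql
# Illustrative query over the OpenLearn model described above: units
# with their title, related course and licence.
# (The ou: namespace URI is a placeholder; OpenLearnUnit and
# relatesToCourse are the names described in the post.)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX cc:  <http://creativecommons.org/ns#>
PREFIX ou:  <http://example.org/ou-ontology#>
SELECT ?unit ?title ?course ?licence
WHERE {
  ?unit a ou:OpenLearnUnit ;
        dct:title ?title ;
        ou:relatesToCourse ?course ;
        cc:license ?licence .
}
LIMIT 25
```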

While all this information is already available in structured form from the OpenLearn websites (through XML descriptions and RSS feeds), having it directly accessible, Web-addressable and queryable makes it easier to create new interfaces, new links and new processes that facilitate the use of this information for resource discovery. Some elements are still being investigated, regarding in particular the complex connection that might exist between an OpenLearn unit and the corresponding course material as described in the library catalogue.

Results of the KMi Linked Data Application Competition
Posted by Mathieu on Thu, 24 Mar 2011 (http://lucero-project.info/lb/2011/03/results-of-the-kmi-linked-data-application-competition/)

One of the biggest worries we had at the beginning of LUCERO was that we were promising quite a lot: we were not only going to establish the processes to expose public university data as linked data, but also to demonstrate its benefits through applications. Originally, we naively thought that we were going to build two demonstrators, providing obvious and complete illustrations of the ways in which linked data could support students and researchers in better accessing information from the university, and better exploiting it. We quickly discovered that this “killer app” approach wasn’t going to work, as the benefits of linked data appear to lie a lot more in the many “day-to-day” use cases than in large, “clever” application projects. In other words, as clearly shown in both Liam’s post and Stuart’s post, data.open.ac.uk is quickly becoming an essential resource, a piece of the information infrastructure, that benefits use cases, scenarios and applications of all sorts and scales.

That’s when we thought of organising a linked data application competition in KMi. KMi is full of very smart people, researchers and PhD students with the skills, knowledge and energy to build this sort of app: lightweight web or mobile applications that demonstrate one specific aspect and one specific use of the Open University’s linked data. I’m not going to give all the details of the way the competition was organised. We received four incredibly interesting applications (the promise of winning an iPad might have helped). These four applications are now featured on the brand new data.open.ac.uk Application Page, together with other applications currently being developed.

So, congratulations to our winners! The choice was really difficult (and you might not agree with it), as the applications described below are all great examples of the many things that can be achieved through opening up and linking university data.

The Winner: OpenLearn Linked Data (Fouad Zablith)

OpenLearn Linked Data makes use of data from data.open.ac.uk to suggest courses, podcasts and other OpenLearn units that relate to a given OpenLearn unit. The application takes the form of a bookmarklet that, when triggered while browsing the webpage of an OpenLearn unit, adds to the page small windows with links to the relevant course in Study at the OU, to podcasts from the OU podcast repositories, and to units from OpenLearn that share a common tag.
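The “shared tag” suggestion can be pictured as a query of roughly the following shape. This is a sketch with placeholder URIs, not the application’s actual code:

```sparql
# Given one OpenLearn unit, find other units tagged with the same
# keyword. (The unit URI and the tagging property are placeholders
# standing in for the Nice Tag vocabulary used on data.open.ac.uk.)
PREFIX ex: <http://example.org/tags#>
SELECT DISTINCT ?otherUnit ?tag
WHERE {
  <http://example.org/unit/A123> ex:taggedWith ?tag .
  ?otherUnit ex:taggedWith ?tag .
  FILTER (?otherUnit != <http://example.org/unit/A123>)
}
```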

The great thing about this application is that it addresses scenarios directly relevant to students, prospective students and users of OpenLearn in general. It very naturally exploits the way linked data removes the boundaries that exist between different systems within the Open University, without having to change or integrate these systems.

Second Place: OU Expert Search (Miriam Fernandez)

The OU Expert Search system (accessible inside the OU network only) allows users to find academics at the Open University who are experts in a given domain, providing a ranked list of experts based in particular on their research publications. It uses information about publications in ORO and computes complex measures to provide a ranking of the people who are most likely to be experts in the given domain. It also integrates data obtained from the staff directory of the Open University to provide contact details for the people in the system.
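As a much simplified illustration of the kind of evidence such a ranking builds on (not the actual measures used by the system), one could count the ORO publications per author whose titles mention a topic keyword:

```sparql
# Count publications per author matching a topic keyword: a crude
# proxy for expertise. (The real system computes more sophisticated
# measures; the property names follow common bibliographic usage
# rather than the exact ORO vocabulary.)
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?author (COUNT(?paper) AS ?papers)
WHERE {
  ?paper dct:creator ?author ;
         dct:title ?title .
  FILTER (CONTAINS(LCASE(STR(?title)), "volcano"))
}
GROUP BY ?author
ORDER BY DESC(?papers)
LIMIT 10
```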

Here as well, the strong point of the application is its apparent simplicity. It is very easy to use and has already been applied, for example, to find Open University experts on volcanoes (see Stuart’s blog post). Expert search is a complex task, and OU Expert Search, through the use of linked data, makes it look rather simple.

OUExperts (Vlad Tanasescu)

OUExperts is a mobile (Android) application to find Open University experts in a given domain and connect to their social network. Similarly to the OU Expert Search application, it relies on information related to the scientific publications of OU researchers, as available in ORO. It also finds synonyms of the given keywords, and tries to connect to the pages of the listed researchers.

The interesting aspect of OUExperts, apart from being a mobile application, is the clever attempt to connect to social networking websites, so that it is not only possible to find an expert, but also to connect to them on Facebook or LinkedIn.

Buddy Study (Matthew Rowe)

Buddy Study suggests potential contacts and Open University courses to students, based on an analysis of the topics in the user’s Facebook page. The application attempts to extract prominent topics from the user’s Facebook page, which are then matched to the interests of other people and to the topics covered by courses at the Open University.

In this case, it is the social aspect of a user’s presence online which is used to create connections into the data from the Open University, creating easily accessible entry points to the data.

Connecting the Reading Experience Database to the Web of Data
Posted by Mathieu on Tue, 08 Mar 2011 (http://lucero-project.info/lb/2011/03/connecting-the-reading-experience-database-to-the-web-of-data/)

The Reading Experience Database (RED) project is dedicated to collecting and using evidence of reading experiences for teaching and research. The project has created a large and very rich database regarding specific situations in which a person has read a text, and how such an experience was evidenced.

RED is one of the projects from the Open University’s Faculty of Arts working with LUCERO, as an early example of how linked data can be applied to research in the humanities, and in general. And it is really a very good example! We have been working on an initial method to extract the content of the RED database into RDF, combining several well-known vocabularies. While we are still at an early stage in the whole process, this has given us great insight into the challenges and potential of linked data in such a domain.

Data cleaning is clearly one of our biggest issues. The RED database is mostly based on contributions from various people, from researchers in the humanities connected to the project to interested individuals. As a result, many entities are duplicated, misspelled or mistakenly aggregated. A lot of these problems can be addressed automatically through filters, but the major part has to be addressed by the RED team, who are currently involved in a cleaning, normalisation and restructuring process.

Unsurprisingly, where the linked data approach really creates novelty here is in the links. We have published a “preview” of the dataset on data.open.ac.uk, with initial sets of links for people and places to their (supposed) equivalents in DBpedia. For example, Virginia Woolf, who is both an author and a reader in the RED database, is represented as http://data.open.ac.uk/page/red/person/woolf-virginia, which is linked to the corresponding DBpedia resource http://dbpedia.org/page/Virginia_Woolf.

This might not look like much in principle, but in reality it opens up new ways to look at the data that couldn’t be anticipated even by the researchers involved in modelling it. I gave a quick talk at a workshop organised two weeks ago by the RED team, to an audience of researchers and lecturers in the humanities. Showing the benefit of linked data to such an audience is clearly not the most trivial task. I therefore developed a small demonstrator that presents in one page the information about a given person from the RED database (here, Virginia Woolf), together with some information from DBpedia (abstract, categories and influences). Now, where it becomes interesting is that the information from DBpedia can be used to filter and browse the information in the RED database. What this demonstration can do is, through clicking on the corresponding categories, tell you which other people in RED are, according to DBpedia, People from Kensington, People with Bipolar Disorder, Bisexual Writers, Writers Who Committed Suicide, etc. Looking at this, through one simple set of links to one dataset, we can already see brand new research questions and a new set of research practices emerge, together with the data to start exploring them. We can only be overwhelmed thinking about what will happen when the approach is generalised to more links, more datasets and more research projects.
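For readers curious about the mechanics, the category-based filtering can be sketched as a federated query that combines the RED data with DBpedia’s public endpoint. It assumes the owl:sameAs links described above and elides the exact RED vocabulary:

```sparql
# People in the RED dataset whose DBpedia counterparts fall under a
# given category. (A sketch: it assumes owl:sameAs links to DBpedia
# and an endpoint that supports SPARQL 1.1 federation; the category
# URI is one of those mentioned in the post.)
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?redPerson
WHERE {
  ?redPerson owl:sameAs ?dbpPerson .
  SERVICE <http://dbpedia.org/sparql> {
    ?dbpPerson dct:subject
      <http://dbpedia.org/resource/Category:People_from_Kensington> .
  }
}
```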

Putting Linked Data to work: A developer’s perspective
Posted by ostephens on Thu, 09 Dec 2010 (http://lucero-project.info/lb/2010/12/putting-linked-data-to-work-a-developers-perspective/)

This is a guest post by Liam Green-Hughes, a developer at The Open University, relating his experience with Linked Data to date, and his initial use of Linked Data from http://data.open.ac.uk.


Over the last few months I have been on a bit of a journey in the world of Linked Data. It has had highs and lows, frustrations and triumphs, but in the end it was worth it. When you first enter the world of the “Linked Data movement”, the terminology can be baffling: it is a world of “triples”, “RDF”, “SPARQL” and lots of other terms that will come as news to many developers. Yet when you see past all that, it turns out to be a genuinely useful concept for developers like me. Over the past few weeks I have been experimenting with the Open University’s new Linked Data service in a number of scenarios, and have now used it in a production environment.

Without a doubt, the learning curve involved in being able to use Linked Data is significant. Some of this may be down to the fact that it is relatively early days for such technology, so it will take a while to settle down. A lot of the learning curve is to do with learning the SPARQL language, which was designed to be a SQL-like way to obtain data from endpoints. I have heard it described as “more powerful” than SQL, but I am not sure that comparison serves much purpose: the two languages have similarities but exist to do different things. SPARQL often feels more difficult to understand than SQL, less obvious about what is going on. Maybe it needs some more work as a language in order to open it up to more developers. Once you have worked out a few example queries, though, you have enough to get going.

A breakthrough on my journey to being able to use Linked Data was the realisation that I should just stop worrying about the terminology and get on with using the data. When you query a SPARQL endpoint, what you actually get back is XML or JSON. For a developer this is wonderful, as many programming languages include tools to parse these formats. Sure, there are libraries available that will give you richer functionality; it just depends on what you want to do. Parsing incoming results is easy enough. When I realised this I wrote a blog post, “An approach to consuming Linked Data with PHP”, which provided me with a basis to do other things. There are many things that are complicated about Linked Data, but actually sending and receiving information to an endpoint turned out not to be one of them. In fact it turned out to be easier than using a SOAP service!

Many people use Linked Data services to construct reports or create mashups, but I started experimenting with another theme. Linked Data is part of an idea to move from a “web of documents” to a “web of data”. For me this fitted very well with another big change happening on the web: the move away from just looking at web pages on a desktop computer to consuming data on lots of other devices, like mobiles, smart televisions and tablets. Each of these classes of devices has very different user interface requirements, often with dedicated applications. The underlying data driving these applications can be the same, though, so my experiments so far have centred around the idea of bringing OU data into new contexts such as mobiles and the social web.

In the OU’s Linked Data store we are lucky enough to have access to all sorts of data, including podcast information. With an idea in my head about using Linked Data to reach devices, I looked at the new jQuery Mobile library, which is designed to make creating mobile web applications easier. The actual application still runs on a server, but when you view its output in a mobile web browser (or in a thin app wrapper) it looks very much like a mobile application. The advantage of this approach is that you end up with something that works across a variety of mobile devices, for a fraction of the effort it would take to create apps for each mobile operating system. I created the app with navigation menus to drill down to the podcast you require. This was quite easy to do, as I could create SPARQL queries on the fly, using menu selections to filter the information. In fact I could get a prototype together in only a few hours. It would need more work to be production-ready, but being able to use SPARQL on the fly was very useful.
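The queries built from the menus would look roughly like the sketch below; the podcast vocabulary shown is a placeholder, not the actual data.open.ac.uk one:

```sparql
# One menu level deep: list podcasts on the topic the user selected.
# The application rebuilds this query as the selections change.
# (Placeholder vocabulary, for illustration only.)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ex:  <http://example.org/podcasts#>
SELECT ?podcast ?title
WHERE {
  ?podcast a ex:Podcast ;
           dct:title ?title ;
           dct:subject ex:SelectedTopic .  # substituted from the menu choice
}
ORDER BY ?title
```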

Encouraged by this success, I felt it was time to put our Linked Data store to good use with a production requirement. We have an application inside Facebook called Course Profiles, which enables students to advertise what course they are studying with the OU on their profile page. For a while, keeping the course list up to date has been a bit tricky. It used to be done by either manually adding entries or taking input from a file containing all of the course codes. Linked Data came in useful in this case, as we can use it to obtain the live course codes and load them directly into the database through the application. The data was quite easy to parse, and it is then possible to work out which courses are missing and add them. This will make it much easier to keep information up to date, and I know that it is coming from a publicly documented source, making it easier for me as a developer to work with.
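The core of the update is a single query of roughly the following shape (the class and property names are assumptions about the course dataset, not its exact vocabulary):

```sparql
# Fetch the live list of course codes and titles, to be diffed
# against the Course Profiles database.
# (ex:Course and ex:courseCode are placeholders, for illustration.)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX ex:  <http://example.org/courses#>
SELECT ?course ?code ?title
WHERE {
  ?course a ex:Course ;
          ex:courseCode ?code ;
          dct:title ?title .
}
```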

My third experiment was to try to hook up the Linked Data service to Google’s App Inventor for Android. Connecting these two cutting-edge developments seemed like a good opportunity to me: having all of this data readily available means that all sorts of new apps will be possible. App Inventor isn’t just a great teaching tool for programming, it is also a great prototyping tool. Creating an app this way might not lead to as polished a result as coding by hand, but it does mean that you can quickly experiment with building different kinds of apps, with limited effort needed. App Inventor is a little limited in what it can cope with in web data terms, so I wrote a script to enable conversion of the results from an endpoint into a format that App Inventor could deal with. Full details are in my post: Using Linked Data in App Inventor for Android with the help of a bridging script.

So why not just use an API instead of Linked Data? Many developers are used to using APIs, and many sites, e.g. Twitter and Facebook, support an API to extend their functionality. I don’t think Linked Data is necessarily better than an API, but it might offer a better solution in some circumstances. The data you can get back is very flexible: you don’t need to look up the syntax of a new API call every time you want new information. Also, if two sites support a Linked Data endpoint, the way you work with them will be broadly the same, and you don’t have to worry about things like downloading extra software libraries to be able to access the API.

It would be wrong to ignore the issues around the usage of Linked Data, the learning curve involved and the difficulties in making those first steps. My own journey with it has been difficult, and at first I wasn’t persuaded. Now, having experimented with it and forced myself through a bit of the learning curve, it is delivering on its potential and I am encouraged to continue learning about this technology. It is still early days for the Linked Data movement, so it is a great time for developers to get involved and help this idea grow, and work with others to sort out difficulties along the way. The prize is ready access to lots of information about the world around us.

Publishing ORO as Linked Data
Posted by ostephens on Fri, 26 Nov 2010 (http://lucero-project.info/lb/2010/11/publishing-oro-as-linked-data/)

The data

One of the first data sets to be made available on http://data.open.ac.uk is the contents of ORO (Open Research Online), the Open University’s repository of research publications and other research outputs. The software behind ORO is EPrints, open-source software developed at the University of Southampton’s School of Electronics and Computer Science and used widely for similar repositories across UK Higher Education (and beyond).

ORO contains a mixture of metadata for items and full-text items (often as PDF). The repository includes a mixture of journal articles, conference papers, book chapters and theses. The data we are taking and presenting on http://data.open.ac.uk is just the metadata records – not any of the full-text items. Typical information for a record includes (a query sketch follows the list):

  • title
  • author/editor(s)
  • abstract
  • type (e.g. article, book section, conference, thesis)
  • date of publication
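A sketch of what retrieving these typical fields could look like in SPARQL follows; the bibo: and dct: terms are common bibliographic vocabulary choices, not necessarily the exact terms used in the EPrints RDF export:

```sparql
# Pull the typical fields for ORO records: title, creator, date and
# the item type. (bibo:/dct: terms are assumptions; the EPrints RDF
# export may name things differently.)
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?item ?type ?title ?creator ?date
WHERE {
  ?item a ?type ;
        dct:title ?title ;
        dct:creator ?creator ;
        dct:date ?date .
  FILTER (?type IN (bibo:Article, bibo:Thesis))  # e.g. articles, theses
}
LIMIT 25
```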

The process

We had initially expected to extract data from ORO in an XML format (possibly RSS) and transform it into RDF. However, Chris Gutteridge, the lead developer for EPrints, added an RDF export option to version 3.2.1 of EPrints, and since we could get this installed on a test server, we decided we would make use of this native RDF support. We did make a few small changes to the data before we published it, mainly to replace some of the URIs assigned by EPrints with data.open.ac.uk URIs, as noted in the blog post ‘First version of data.open.ac.uk‘.

Issues

In general, the process of publishing the data was quite smooth. However, once we had published the data it quickly became apparent there were some issues with the data. Most notably we found that in some cases two or more individual author details were merged together into a single ‘person’ in the RDF. Investigation showed that the problems were in the source data, and were caused by a number of issues:

Incorrectly allocated author IDs in ORO

ORO allows an (Open University) ID to be recorded for Open University authors, and we use this ID as a way of linking together works by an author. Unfortunately, in some cases incorrect IDs had been entered, leading two separate identities to become con-fused in our data.

Name changes

In some cases the author had changed their name, resulting in two names against the same author ID. While all the information is correct, it leads to a slightly confusing representation in the RDF (e.g. Alison Ault changed her name to Alison Twiner).

Name variations

In some cases the author uses different versions of their name in different publications. Good practice for ORO is to use the name as printed on the publication, which can result in different versions of the same name – for example, in most papers Maria Velasco-Garcia’s name is spelt with a single ’s’ in Velasco, but in one paper it is spelt Velassco, with a double ’s’.

A particularly common inconsistency was around the use of accents on characters, where sometimes a plain character was used instead of the accented character. This seemed to be down to a mixture of data entry errors and variations in the use of accents in publications.

Incorrect publisher data

There were a couple of examples where the publisher had incorrect data in their systems, which had been brought through into ORO. One particular example split a single author with several parts to their name into two separate authors.

Having identified the records affected, the next challenge was correcting them: firstly, investigating each error (this could be challenging – especially where name changes had occurred, it was sometimes difficult to know whether this was the same person or not); secondly, the question of where these are corrected. In this case we were given edit access to ORO so we could make the corrections directly, but the question does arise: what happens if you can’t get the errors corrected in the source data set?

Conclusions

One of the interesting things for me is that these small errors in data would be unlikely to be spotted easily in ORO. For example, when you browse papers by author in ORO, behind the scenes ORO uses the author ID while presenting the user with the names associated with that ID. Because of this, you would be hard pushed to notice a single instance of a mis-assigned identifier. However, once the data was expressed as RDF triples, the problem became immediately apparent. This means that a very low error rate in ORO data is magnified into obvious errors on http://data.open.ac.uk.

I suspect that this ‘magnification’ of errors will lead to some debate over the urgency of fixing errors. While for http://data.open.ac.uk fixing the data errors becomes important (because they are very obvious), it may be that for the contributing dataset (perhaps especially large datasets of heterogeneous data such as bibliographic data) fixing these errors is of lower priority.

On the upside, using the data on data.open.ac.uk we can start to run queries that will help us clean the data – for example, you can find people with more than one family name in ORO.
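That check can be written as a single aggregation query, assuming for the sake of the sketch that family names are modelled with foaf:familyName:

```sparql
# Find people recorded with more than one distinct family name:
# candidates for the name-variation problems described above.
# (foaf:familyName is an assumption about how names are modelled.)
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person (COUNT(DISTINCT ?familyName) AS ?names)
WHERE { ?person foaf:familyName ?familyName }
GROUP BY ?person
HAVING (COUNT(DISTINCT ?familyName) > 1)
```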

Introducing Lucero
Posted by ostephens on Thu, 25 Nov 2010 (http://lucero-project.info/lb/2010/11/introducing-lucero/)

Having made great progress with Lucero in October, with the launch of http://data.open.ac.uk and the publication of our first data sets as Linked Data, we now have something to start talking about and showing to people. We’ve used Twitter extensively for our first wave of dissemination, including the first announcement by Mathieu of data being available at http://data.open.ac.uk.

It is easy to see the impact this had on traffic to the website.

The announcements we made regarding establishing http://data.open.ac.uk and the first data sets were picked up and retweeted extensively, including retweets from Andrew Stott (previously UK Government Director of Digital Engagement at the Cabinet Office) and Professor Jim Hendler (a leading expert in the Semantic Web and related technologies).

Twitter has proved effective for immediate dissemination of project milestones, but there is a lot of detailed information that we want to communicate, and so we have started to present longer-form information, both on this blog and through seminars. Mathieu introduced Lucero in a seminar at the Knowledge Media Institute (KMi) at the Open University on 3rd November 2010. A recording of this is now available to view online at the KMi Stadium, and the slides from this presentation are available on Slideshare. On the same day I gave a presentation to Library staff at the Open University, and the slides are also available on Slideshare.

To reach a wider audience we’ve worked with the Media team at the Open University to issue a press release about the project and data.open.ac.uk. We hope that this will help us reach those who are unlikely to be following the project in detail, but who have an interest in the overall aims and objectives of the project.

We’ll be continuing to disseminate the work of the project through many routes, so keep track through this blog, or by following the team on twitter (@mdaquin, @ostephens, @stuartbrown, @fzablith) and tracking the project hashtag #luceroproject. If you are interested in the type of information you can get from http://data.open.ac.uk you can also look for #queryou where examples of SPARQL queries against the data are being shared (and feel free to add your own).
