Linked data is about data on the Web. It is often seen as a complicated set of technologies and principles, but it is in reality based on a few very simple ideas, which allow the publication and integration of data to happen directly on the Web, relying on the “architecture of the Web”.

First, let’s consider what the Web is: a network of documents connected by hyperlinks. That sounds easy enough, but the main advantage of the Web is that documents are shared and connected independently from where they are stored, where they are managed and where they are read. By “where” here I mean the actual geographical location, the machine on which things are happening, as well as the systems and tools that are used to do these things. You can add your own webpage, and link it to other webpages, without having to take into account such things as “what server it is on” and “what software was used to create it”. In other words, the great strength of the Web is that it is a global network that completely abstracts away from artificial, technology-related boundaries.

It does not take long to realise that dealing with such boundaries is one of the most difficult parts of managing and using data, in and across organisations. Linked data is a movement, mostly led by researchers, technologists and developers, with the goal of applying the same principles the Web is based on to data.

Principle = data on the Web

The very basic idea linked data is based on is that data, instead of staying in database management systems that are “interfaced” to the Web, are put directly on the Web, using the Web architecture. What this means in practice is that every piece of data, every object or resource the data talks about, is identified by a Web address. One such object can be, for example, the course M366: Natural and Artificial Intelligence at the Open University. This object is associated with the Web address (the URI) http://data.open.ac.uk/course/m366. Another object represents the Open University itself, with the URI http://data.open.ac.uk/organization/the_open_university.

The whole point of doing this is that, first, going to these addresses will provide the data about the corresponding objects in a standard (tool-independent) format, called RDF (Resource Description Framework). Most importantly, the representation of these objects, the information we have about them, is purely based on links. RDF is a graph-based data model that connects URIs to the URIs of other objects through relationships that are themselves labelled by Web URIs. In our example, part of the information about the course M366 is that it is taught at the Open University. This is represented by a link saying that http://data.open.ac.uk/course/m366 is taught at http://data.open.ac.uk/organization/the_open_university (this link is what is often called an RDF triple: an expression of the form subject-predicate-object, where the subject and predicate are Web URIs, and the object can be a URI or a literal, such as a number or a string).
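To make the triple idea concrete, here is a minimal sketch using the Python rdflib library. The course and university URIs are the real ones quoted above; the isTaughtAt predicate and its namespace are made up for illustration, as the actual vocabulary used by data.open.ac.uk may differ.

```python
from rdflib import Graph, Namespace, URIRef

# Hypothetical namespace for the "is taught at" relationship;
# the real predicate used by data.open.ac.uk may be different.
EX = Namespace("http://example.org/terms/")

g = Graph()

# One RDF triple: subject, predicate and object are all Web URIs.
g.add((
    URIRef("http://data.open.ac.uk/course/m366"),                       # subject
    EX.isTaughtAt,                                                      # predicate (hypothetical)
    URIRef("http://data.open.ac.uk/organization/the_open_university"),  # object
))

# Serialise the graph in Turtle, one of the standard RDF formats.
print(g.serialize(format="turtle"))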

The powerful thing here is that the entire data modelling is based on this one, simple mechanism: links between Web addresses. This basic principle is what makes the abstracting power of the Web apply to data, connecting resources that can be located anywhere and managed by anyone. A course such as http://data.open.ac.uk/course/m366, for example, is also connected through such links to the countries it is available in, but in this case the URIs used for the countries don’t come from our data, but from other data on the Web, created by the GeoNames system. The URIs from GeoNames are in turn linked to many other datasets, such as DBpedia (the linked data version of Wikipedia). This ability to link to and reuse other URIs, independently from where they were created and in what system, makes it possible to connect, for example, openly available demographic information (from DBpedia) with the courses available in the corresponding country, in a big global data-graph on the Web.
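As a rough illustration of what dereferencing such a URI looks like in practice, the sketch below fetches the RDF served at the course address and prints the links it contains. It assumes the URI responds with an RDF representation via content negotiation, as linked data URIs are expected to.

```python
from rdflib import Graph, URIRef

course = URIRef("http://data.open.ac.uk/course/m366")

# Dereference the course URI; rdflib negotiates an RDF representation.
g = Graph()
g.parse(course)

# Every statement is a link from the course to another URI (or a literal);
# some objects will point into other datasets, such as GeoNames.
for predicate, obj in g.predicate_objects(course):
    print(predicate, obj)
```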

Of course, beyond this summary, there are many subtle challenges in dealing with linked data, at the technical level (e.g., how to create the links; how to query the data across platforms), as well as at the organisational level (e.g., what licence to use for the data; how to keep track of provenance and attributions). However, the value of linked data is now appearing clearly: it makes it easier to work with your own and others’ data without the usual technological obstacles and barriers, and it creates new possibilities in the way we use data, and in the endless connections that can be created amongst them.

What linked data is good for

Besides the large, global vision of linked data, its use in an organisation to expose its public information, or even to manage internal data, brings new possibilities that traditional data management models have been notoriously bad at handling: it provides a model for naturally accessible and integrated data. In addition, the graph model it uses offers a level of flexibility that makes it possible to extend and enrich linked data incrementally, without having to reconsider the entire system: there is no system, only individual contributions.
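The following small sketch, using Python’s rdflib with made-up vocabularies and URIs, illustrates this incremental style: a second contributor enriches an existing resource simply by adding triples, with no schema migration.

```python
from rdflib import Graph, Literal, Namespace, URIRef

# Two hypothetical vocabularies, standing in for contributions
# made independently by different parties.
TEACH = Namespace("http://example.org/teaching/")
PRESS = Namespace("http://example.org/press/")

course = URIRef("http://data.open.ac.uk/course/m366")

g = Graph()
# The original contribution: teaching data about the course.
g.add((course, TEACH.title, Literal("Natural and Artificial Intelligence")))

# Later, a different team enriches the same resource with new
# properties. No schema change is needed: a triple is simply added.
g.add((course, PRESS.mentionedIn, URIRef("http://example.org/articles/42")))

print(g.serialize(format="turtle"))
```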

Within the LUCERO project, we have worked on a number of scenarios and applications demonstrating exactly that: reusing data, wherever they come from, is becoming almost trivial with the availability of resources such as data.open.ac.uk. One example of this is a small application developed by Fouad Zablith that connects an OpenLearn unit online to the relevant course at the Open University, to podcasts that talk about the same topic, and to other OpenLearn units that have been annotated with the same tags (see below). In principle, that does not sound overwhelming, but realising such an application without linked data would have required integrating data from at least three different systems, each with its own format, modelling, access modes, etc. Linked data makes it possible to abstract from these issues, and to build a lightweight mechanism that fits in a small Web application.
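A sketch of the kind of query such an application relies on is given below, using the Python SPARQLWrapper library. The endpoint URL and the Dublin Core properties are assumptions made for illustration; the actual endpoint and vocabulary used by data.open.ac.uk may differ.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed endpoint URL; the real data.open.ac.uk endpoint may differ.
endpoint = SPARQLWrapper("http://data.open.ac.uk/sparql")

# Find podcasts sharing a topic with the course M366. The use of
# dcterms:subject and dcterms:title here is an assumption.
endpoint.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?podcast ?title WHERE {
        ?podcast dcterms:subject ?topic ;
                 dcterms:title ?title .
        <http://data.open.ac.uk/course/m366> dcterms:subject ?topic .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["podcast"]["value"], "-", row["title"]["value"])
```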

Another example, showing how linked data can be used to put together, in a meaningful way, data from very different sources, is the application we built as a demonstrator using the Reading Experience Database (see this post on the LUCERO blog). The idea here is that there is a core of institutionally created data, used by researchers to answer questions related to the connection between people, locations and situations, and what they read across time. Through linked data, it is possible to connect such information, previously sitting in an isolated database, with other sources of information, enriching the core research data with data coming, for example, from DBpedia, concerning the people, places and books involved. In addition to opening up the research data to new entry points and new, mostly unintended uses, this has the potential to surface new research questions that couldn’t be envisioned before.
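As a sketch of how such enrichment can work, the snippet below follows an owl:sameAs link from a reader record to its DBpedia counterpart and loads the openly available description published there. Only the DBpedia URI is real; the reader URI and the sameAs link are hypothetical stand-ins for the Reading Experience Database data.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

# Hypothetical reader resource from the research data, linked by
# owl:sameAs to its DBpedia counterpart (a real, resolvable URI).
reader = URIRef("http://example.org/red/person/virginia_woolf")
g.add((reader, OWL.sameAs, URIRef("http://dbpedia.org/resource/Virginia_Woolf")))

# Follow the sameAs links and load DBpedia's description of the person,
# enriching the local research data with openly available facts.
same_as_links = list(g.objects(reader, OWL.sameAs))
for dbpedia_uri in same_as_links:
    g.parse(dbpedia_uri)

print(len(g), "triples after enrichment")
```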

These two examples demonstrate some key elements of linked data, and how we can benefit from it: by reducing the cost of managing and using our own data, but more importantly, by creating new value out of the connections between previously isolated data. We are still very far from completely understanding what can be achieved with linked data. The simple principles it is based on open up completely new ways to deal with data, to manage them and to use them.

Links: