The LUCERO Project » Tools
Linking University Content for Education and Research Online
http://lucero-project.info/lb

DiscOU: Discoverability of Open Educational Content
Sat, 04 Aug 2012, by Mathieu
http://lucero-project.info/lb/2012/08/discou-discoverability-of-open-educational-content/

If there is one scenario that was prominent in driving the development of Linked Data at the Open University, it is the discovery of educational resources. Indeed, there is a basic assumption that providing structured, open and addressable descriptions of resources helps make these resources more visible. In fact, most of my early presentations of LUCERO (but, for some reason, not the ones that are online) included a picture of somebody saying “I’ve just seen a very interesting BBC programme. What is there at the OU that can help me learn more about it?”. Two years later, we actually have a system that does exactly that!

Indeed, with support from the Open University’s Open Media Unit, we built an application that can semantically analyse the textual content of online resources and match it against semantically indexed Open University content (OpenLearn units and podcasts at the moment). The result (implemented as a set of REST services, some Javascript and a bookmarklet) is, if I might say so myself, super cool. It’s called:


DiscOU

(and yes, we probably should have put more effort into choosing the name).

The whole thing is pretty much a combination of linked data and information retrieval technologies. The Open University resources are crawled through data.open.ac.uk, analysed using DBpedia Spotlight and indexed using Apache Lucene. A BBC programme page used as a starting point goes through pretty much the same process: the RDF description of the programme is taken from the BBC website, its textual components are analysed, and the results are matched to the indexed resources. Because we use DBpedia Spotlight, the resources are described (and indexed) based on DBpedia entities, which allows us to semantically characterise their overlap, based on the links between common entities. It also makes it possible for users to customise the search process based on their own interests.
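To make the idea concrete, here is a minimal sketch (not the actual DiscOU code, which relies on Lucene) of matching resources through DBpedia entity overlap: each resource becomes a vector over entity URIs, and relatedness is the cosine similarity of those vectors. The entity annotations below are invented, standing in for DBpedia Spotlight output.

```python
from collections import Counter
from math import sqrt

def entity_vector(entities):
    """Term-frequency vector over the DBpedia entity URIs found in a resource."""
    return Counter(entities)

def relatedness(v1, v2):
    """Cosine similarity of two entity vectors."""
    dot = sum(v1[e] * v2[e] for e in set(v1) & set(v2))
    norm = sqrt(sum(c * c for c in v1.values())) * sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

# Invented annotations: a BBC programme about volcanoes vs. two OU resources.
programme = entity_vector(["dbpedia:Volcano", "dbpedia:Lava", "dbpedia:Iceland"])
unit_geology = entity_vector(["dbpedia:Volcano", "dbpedia:Lava", "dbpedia:Plate_tectonics"])
unit_drama = entity_vector(["dbpedia:William_Shakespeare", "dbpedia:Drama"])
```

Ranking all indexed resources by this score against the starting page gives the kind of recommendation list DiscOU produces; the real system additionally weights entities and lets users adjust those weights to their interests.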

What to ask linked data
Fri, 24 Jun 2011, by Mathieu
http://lucero-project.info/lb/2011/06/what-to-ask-linked-data/

Publishing linked data is becoming easier, and we now come across new RDF datasets almost every day. One question that keeps being asked, however, is “what can I do with it?” More or less everybody understands the general advantages of linked data, in terms of data access, integration, mash-ups, etc., but getting to know and use a particular dataset is far from trivial: “What does it say? What can I ask it?”

You can look at the ontology to get an idea of the data model, send a couple of SPARQL queries to ‘explore’ the data, look at example objects, etc. We also provide example SPARQL queries to help people get the point of our datasets. Of course, not everybody is proficient enough in SPARQL, RDFS and OWL to really get it from this sort of clue. Also, datasets might be heterogeneous in the representation of objects, in the distribution of values, or simply very big and broad.

To help people who don’t necessarily know (or care) about SPARQL ‘get into’ a complex dataset, we developed a system (whatoask) that automatically extracts a set of questions that a dataset is good at answering. The technical aspects are a tiny bit sophisticated (it uses formal concept analysis) and are detailed in a paper I will present next week at the K-CAP conference. What is interesting, however, is how such a technique can provide a navigation and querying interface on top of a linked dataset, offering a simple overview of the data and a way to drill down to particular areas of interest. In essence, it can be seen as an FAQ for a dataset, presenting not frequently asked questions, but the questions the dataset is especially good at answering.

What the tool does is create a hierarchy of all the simple questions an RDF dataset can answer, and present to the user a subset that, according to a set of metrics described in the paper, are most likely to be of interest. The questions are displayed in a pseudo natural language, in a format where, for example, “What are the (Person/*) that (knows Tom) and that (KMi hasEmployee)?” can be interpreted as the question “What are the people who know Tom and are employed by KMi?”. Questions can be selected and displayed with their answers, and the question hierarchy can be navigated by selecting questions more specific or more general than the current one.
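The pseudo natural language format itself is mechanical enough to sketch. Assuming (this is an illustration, not the whatoask code) that a question is a class plus a list of constraint pairs rendered verbatim in parentheses, the rendering could look like:

```python
def render_question(cls, constraints):
    """Render a (class, constraints) question in the pseudo natural language
    shown above. Each constraint is a pair of tokens, e.g. (property, value)
    or (subject, property), kept verbatim inside the parentheses."""
    clauses = " and that ".join(f"({a} {b})" for a, b in constraints)
    return f"What are the ({cls}) that {clauses}?"

q = render_question("Person/*", [("knows", "Tom"), ("KMi", "hasEmployee")])
# Reproduces the example question from the text.
```

Generating the hierarchy of such questions (via formal concept analysis) is the hard part described in the paper; the rendering step is deliberately simple so users can read questions without knowing SPARQL.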

To clarify what that means, let’s look at what it does on the data.open.ac.uk OpenLearn dataset. The initial screen shows a list of questions, with the first one (“What are the (Document/*/OpenLearnUnit) that (subject Concept, relatesToCourse Course, relatesToCourse Module)?”, i.e., “What are the OpenLearn units that are related to courses and have a topic?”) selected. More general and more specific questions are also shown, such as “What are the OpenLearn units that have a topic?” (more general) and “What are the OpenLearn units that relate to a course and have for topic ‘Education Research’?” (more specific).

We can select alternative questions, such as the second in the list, “What are the OpenLearn units in English distributed under a Creative Commons licence that talk about Science?”, and obtain a new list of answers (quite a few), as well as more general and more specific questions. We can then specialise the question to “What are the OpenLearn units in English under a CC licence that talk about science and family?”, carry on with a more general question looking at the ‘family’ topic without science, and finally ask “What are the OpenLearn units about family?” (independently of licence and language).

As can be seen from the example, the system is not meant for people who already know what they want to ask, but to provide a level of serendipitous navigation amongst the queries the dataset can answer, with the goal of giving a general overview of what the dataset is about and what it can be used for. The same demo is also available on the set of reading experiences from the RED dataset and on the datasets describing buildings and places at the OU. The interface is not the most straightforward at the moment, but we are thinking about ways in which the functionalities of the system could be integrated in a more compelling manner, as a basic ‘presentation’ layer on top of a linked dataset.

wayOU – mobile location tracking app using linked data
Mon, 23 May 2011, by Mathieu
http://lucero-project.info/lb/2011/05/wayou-mobile-location-tracking-app-using-linked-data/

As can be seen from the few previous posts on this blog, one of our main focuses at the moment, in addition to handling all the data that we still have to process, is developing applications that demonstrate the benefit and the potential of linked data. When we obtained data from our estates department regarding the buildings and spaces on the Open University’s main campus (in Milton Keynes) and in the 13 regional centres, we got quite excited. The data contain details of the buildings and their surroundings (car parks, etc.), with their addresses, floors, spaces, images, etc.

However, these data were not very well connected. We added links from the addresses to the postcode unit descriptions in the Ordnance Survey dataset, giving us a general view of the locations of buildings (and so allowing us to build a very crude map of OU buildings in the UK), but we didn’t have precise locations of buildings. We also couldn’t relate the buildings to events (e.g., tutorials) or people (through their workplace, attendance, etc.).

We therefore decided to build an application that would not only use these data, but also create some of these missing relations, and especially, allow OU users to connect to the data.

The application is called wayOU, for “where are you in the OU?”. It can be used to “check in” at specific locations, indicating the “reason” for attending them, to keep track of the different places where users have been, to declare their current location as their workplace, as well as to connect to their network at the Open University, in terms of the people they share activities with. The video below explains the principle of the application better than I can with text.
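As a purely illustrative sketch of what a check-in could look like as linked data (the property names and URIs below are hypothetical, not the vocabulary wayOU actually uses), each check-in becomes a small set of triples connecting a person, a place, a reason and a time:

```python
def checkin_triples(user_uri, place_uri, reason, timestamp):
    """Return a hypothetical list of triples recording one check-in.
    The checkin: vocabulary is invented for illustration."""
    event = f"<{user_uri}/checkin/{timestamp}>"
    return [
        (event, "rdf:type", "checkin:CheckIn"),
        (event, "checkin:by", f"<{user_uri}>"),
        (event, "checkin:atPlace", f"<{place_uri}>"),
        (event, "checkin:reason", f'"{reason}"'),
        (event, "checkin:on", f'"{timestamp}"'),
    ]

triples = checkin_triples(
    "http://data.open.ac.uk/person/jane",       # hypothetical person URI
    "http://data.open.ac.uk/building/berrill",  # hypothetical building URI
    "tutorial",
    "2011-05-23T09:00:00Z",
)
```

Because places and people are identified by data.open.ac.uk URIs, such check-ins would immediately link into the buildings dataset described above, creating exactly the missing relations between places, events and people.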

The application is now being tested and is available for download (see QR code below, without guarantee that it will actually work) on data.open.ac.uk. Fouad is going to demonstrate it next week at the Extended Semantic Web Conference (see the abstract of the demonstration), and (perhaps more importantly) the sources of this first release are available in our code base.

[QR code]

ROLE Widget Consumes Linked Data
Thu, 14 Apr 2011, by fouad
http://lucero-project.info/lb/2011/04/role-widget-consumes-linked-data/

This is a guest post by Alexander Mikroyannidis, a researcher at the Knowledge Media Institute of The Open University, discussing the use of http://data.open.ac.uk to identify material related to OpenLearn units within a Moodle block.


The winning application of the KMi Linked Data Application Competition has attracted the interest of the ROLE project (Responsive Open Learning Environments – www.role-project.eu). The OpenLearn Linked Data application was originally developed by Fouad Zablith as a showcase of querying data.open.ac.uk for educational resources related to OpenLearn courses. I have now transformed this application into a widget that can be directly embedded into any OpenLearn course as a Moodle block. The widget displays a list of Open University courses, iTunes U podcasts, as well as OpenLearn tags related to the course the user is currently viewing. As data.open.ac.uk is constantly growing by integrating metadata from more repositories, the widget will also be extended with recommendations about additional types of educational resources. You can try out the current release by logging in as guest at: http://projects.kmi.open.ac.uk/role/moodle/course/view.php?id=3.


This widget is part of the widget bundles developed by ROLE to provide self-regulated learning support. ROLE aims at empowering learners for lifelong and personalised learning within a responsive open learning environment. OpenLearn is one of the project’s test-beds, concerning the transition from formal towards informal learning, where the learner is in control of the whole learning process. For more information about the learning technologies developed so far by ROLE, please visit the Showcase Platform (http://www.role-showcase.eu/).

Results of the KMi Linked Data Application Competition
Thu, 24 Mar 2011, by Mathieu
http://lucero-project.info/lb/2011/03/results-of-the-kmi-linked-data-application-competition/

One of the biggest worries we had at the beginning of LUCERO was that we were promising quite a lot: we were not only going to establish the processes to expose public university data as linked data, but also to demonstrate its benefit through applications. Originally, we naively thought that we were going to build two demonstrators, providing obvious and complete illustrations of the ways in which linked data could support students and researchers in better accessing information from the university, and better exploiting it. We quickly discovered that this “killer app” approach wasn’t going to work, as the benefits of linked data lie much more in the many “day-to-day” use cases than in large, “clever” application projects. In other words, as clearly shown in both Liam’s post and Stuart’s post, data.open.ac.uk is quickly becoming an essential resource, a piece of the information infrastructure, that benefits use cases, scenarios and applications of all sorts and scales.

That’s when we thought of organising a linked data application competition in KMi. KMi is full of very smart people, researchers and PhD students with the skills, knowledge and energy to build this sort of app: lightweight web or mobile applications demonstrating one specific aspect and one specific use of the Open University’s linked data. I’m not going to give all the details of the way the competition was organised. We received four incredibly interesting applications (the promise of winning an iPad might have helped). These four applications are now featured on the brand new data.open.ac.uk application page, together with other applications currently being developed.

So, congratulations to our winners! The choice was really difficult (and you might not agree with it), as the applications described below are all great examples of the many things that can be achieved by opening up and linking university data.

The Winner: OpenLearn Linked Data (Fouad Zablith)

OpenLearn Linked Data makes use of data from data.open.ac.uk to suggest courses, podcasts and other OpenLearn units related to a given OpenLearn unit. The application takes the form of a bookmarklet that, when triggered while browsing the webpage of an OpenLearn unit, adds to the page small windows with links to the relevant course in Study at the OU, to podcasts from the OU podcast repositories, and to units from OpenLearn that share a common tag.

The great thing about this application is that it addresses scenarios directly relevant to students, prospective students and users of OpenLearn in general. It very naturally exploits the way linked data removes the boundaries that exist between different systems within the Open University, without having to change or integrate these systems.

Second Place: OU Expert Search (Miriam Fernandez)

The OU Expert Search system (accessible inside the OU network only) allows users to find academics at the Open University who are experts in a given domain, providing a ranked list of experts based in particular on their research publications. It uses information about publications in ORO and computes complex measures to rank the people most likely to be experts in the given domain. It also integrates data from the staff directory of the Open University to provide contact details for the people in the system.

Here as well, the strong point of the application is its apparent simplicity. It is very easy to use and has already been applied, for example, to find Open University experts on volcanoes (see Stuart’s blog post). Expert search is a complex task, and OU Expert Search, through the use of linked data, makes it look rather simple.
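To give a feel for the publication-based approach, here is a deliberately naive sketch (the real OU Expert Search computes far more sophisticated measures over ORO data): authors are scored by how many of their publication titles mention the query term. The publication data is invented for illustration.

```python
from collections import defaultdict

# Invented (author, title) pairs standing in for ORO publication metadata.
publications = [
    ("Alice", "Volcano monitoring with satellite data"),
    ("Alice", "Eruption dynamics of Icelandic volcanoes"),
    ("Bob", "Semantic web services"),
    ("Bob", "Volcano ash cloud modelling"),
]

def rank_experts(pubs, query):
    """Rank authors by the number of their publications matching the query."""
    scores = defaultdict(int)
    for author, title in pubs:
        if query.lower() in title.lower():
            scores[author] += 1
    return sorted(scores, key=scores.get, reverse=True)

rank_experts(publications, "volcano")  # Alice (2 matches) ranks above Bob (1)
```

The point the competition entries make is that, once publication data sits in the linked data store, even a scoring function this simple already yields a usable expert finder; the ranking model can then be refined independently.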

OUExperts (Vlad Tanasescu)

OUExperts is a mobile (Android) application to find Open University experts in a given domain and connect to their social network. Similarly to the OU Expert Search application, it relies on information related to the scientific publications of OU researchers, as available in ORO. It also finds synonyms of the given keywords, and tries to connect to the pages of the listed researchers.

The interesting aspect of OUExperts, apart from being a mobile application, is the clever attempt to connect to social networking websites, so that it is not only possible to find experts, but also to connect to them on Facebook or LinkedIn.

Buddy Study (Matthew Rowe)

Buddy Study suggests potential contacts and Open University courses to students, based on an analysis of the topics in the user’s Facebook page. The application extracts prominent topics from the user’s Facebook page, which are then matched to the interests of other people and to the topics covered by courses at the Open University.

In this case, it is the social aspect of a user’s presence online which is used to create connections into the data from the Open University, creating easily accessible entry points to the data.

Example Queries
Tue, 15 Mar 2011, by Mathieu
http://lucero-project.info/lb/2011/03/example-queries/

Some time ago, we started collecting example SPARQL queries of interest through Twitter, using the hashtag #queryou. This page is here to keep track of the queries collected, and to discuss them. Please feel free to contribute by sending your query on Twitter using the #queryou tag.

Courses available in Nigeria

This query lists all the Open University courses that can currently be registered for from Nigeria.

select distinct ?course where {
?course <http://data.open.ac.uk/saou/ontology#isAvailableIn> <http://sws.geonames.org/2328926/>.
?course a <http://purl.org/vocab/aiiso/schema#Module>}

(see the results)

Things related to Earthquakes

This query finds all video podcasts and OpenLearn units whose descriptions contain the term “earthquake”.

select ?c ?desc where {
?c <http://purl.org/dc/terms/description> ?desc .
{ {?c <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.open.ac.uk/openlearn/ontology/OpenLearnUnit>}
UNION
{?c <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.open.ac.uk/podcast/ontology/VideoPodcast>} }
FILTER regex(str(?desc), "earthquake", "i" )}

(see the results)

Subjects of Podcasts

Subject headings used to describe a specific podcast (@psychemedia).

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?s ?x ?y {
<http://data.open.ac.uk/podcast/9687b84ab18c389aace5b9fecdb42457> ?s ?x .
?x rdfs:label ?y }

(see the results)

Subject headings used to describe all podcasts (@ppetej).

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?tlabel
WHERE {
{?x a <http://data.open.ac.uk/podcast/ontology/VideoPodcast> ;
dcterms:subject ?t .
?t rdfs:label ?tlabel }
UNION
{?x a <http://data.open.ac.uk/podcast/ontology/AudioPodcast> ;
dcterms:subject ?t .
?t rdfs:label ?tlabel } }

(see the results)

Course offers and prices

Prices (ordered), with currency, of OU level 1 courses in Arts and Humanities available in France.

select ?course ?price ?cur
where {
?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> "1"^^<http://www.w3.org/2001/XMLSchema#string>.
?course <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/arts_and_humanities>.
?off <http://purl.org/goodrelations/v1#includes> ?course.
?off <http://purl.org/goodrelations/v1#hasPriceSpecification> ?ps.
?course <http://data.open.ac.uk/saou/ontology#isAvailableIn> <http://sws.geonames.org/3017382/>.
?off <http://purl.org/goodrelations/v1#availableAtOrFrom> <http://sws.geonames.org/3017382/>.
?ps <http://purl.org/goodrelations/v1#hasCurrencyValue> ?price.
?ps <http://purl.org/goodrelations/v1#hasCurrency> ?cur
} order by ?price

(see the results)

Course related podcasts

Video podcasts related to postgraduate courses in computing.

select ?x ?t
where {
?c <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/computing>.
?c <http://data.open.ac.uk/saou/ontology#courseLevel> <http://data.open.ac.uk/saou/ontology#postgraduate>.
?x <http://data.open.ac.uk/podcast/ontology/relatesToCourse> ?c.
?x <http://purl.org/dc/terms/title> ?t.
?x <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://data.open.ac.uk/podcast/ontology/VideoPodcast>}

(see the results)

Course topic and availability

Spanish courses available in Germany.

SELECT DISTINCT ?c WHERE {
?c <http://purl.org/dc/terms/subject> <http://data.open.ac.uk/topic/spanish> .
?c <http://data.open.ac.uk/saou/ontology#isAvailableIn> <http://sws.geonames.org/2921044/> . }

(see the results)

People names

People with more than one family name in ORO.

select distinct ?x ?y ?z where {
?x <http://xmlns.com/foaf/0.1/family_name> ?y.
?x <http://xmlns.com/foaf/0.1/family_name> ?z.
filter(?y != ?z) }

(see the results)

First version of data.open.ac.uk
Mon, 11 Oct 2010, by Mathieu
http://lucero-project.info/lb/2010/10/first-version-of-data-open-ac-uk/

LUCERO is all about making university-wide resources available to everyone in an open, linked data approach. We are building the technical and organisational infrastructure for institutional repositories and research projects to expose their data on the Web as linked data. It is therefore natural for the interface to this data, the SPARQL endpoint and the server resolving URIs in this data, to be hosted under http://data.open.ac.uk. The first version of the components underlying this site, as well as a small part of the data which will ultimately be exposed there, went live last week, with a certain level of excitement from all involved.

What is there? The data

The “launch” of data.open.ac.uk happened relatively shortly after the beginning of the LUCERO project. Indeed, our approach is that the basic data exposure architecture has to be in place first, so that data can be integrated into it incrementally. As a first step, we developed extraction and update mechanisms (see the previous blog post about the LUCERO workflow) for two important repositories at the Open University: ORO, our publication repository, and Podcast, the collection of podcasts produced by the Open University, including the ones distributed through iTunes U.

ORO data concerns scientific publications with at least one member of the Open University as co-author. The source of the data is a repository based on the EPrints open source publication repository system. EPrints already integrates a function to export information as RDF, using the BIBO ontology. We of course used this function, post-processing its output to obtain a representation consistent with the other (future) datasets in data.open.ac.uk, in particular in terms of URI scheme. The ORO data represents at the moment 13,283 articles and 12 patents, in approximately 340,000 triples (see for example the article “Molecular parameters of post impact cooling in the Boltysh impact structure”).

Podcast data is extracted from the collection of RSS feeds obtained from podcast.open.ac.uk, using a variety of ontologies, including the W3C media ontology and FOAF (see for example the podcast “Great-circle distance”). An interesting element of this dataset is that it provides connections to other types of resources at the Open University, including courses (see for example the course MU120, which is referred to in a number of podcasts). Podcasts are also classified into categories, using the same topics used to classify courses at the Open University, as well as the iTunes U categories, which we represent in SKOS (see for example the category “Mathematics”).

While these represent only a small fraction of the data we will ultimately expose through data.open.ac.uk, the new possibilities obtained by openly exposing these datasets in RDF, with a SPARQL endpoint and resolvable URIs, are already very exciting. In a blog post, Tony Hirst has shown some initial examples and encouraged others to share their queries to the Open University’s linked data. Richard Cyganiak has also kindly created a CKAN description of our datasets, for others to find and exploit.

The technical aspects

In a previous blog post, we gave an overview of the technical workflow by which data from the original sources ends up being exposed as linked data. The current platform implements parts of this workflow, including updaters and extractors for the two considered datasets. At the centre of the platform is the triple store. After trying several options, including Sesame, Jena TDB and 4store, we settled on SwiftOWLIM, which is free, scalable and efficient, and includes limited reasoning capabilities that might end up being useful in the future.

The current platform also implements the mechanisms by which URIs in the http://data.open.ac.uk namespace are resolved. Very simply, a URI such as http://data.open.ac.uk/course/a330 can be redirected either to http://data.open.ac.uk/page/course/a330 or to http://data.open.ac.uk/resource/course/a330, depending on the content requested by the client. http://data.open.ac.uk/page/course/a330 shows a browsable webpage linking the considered resource to related ones, while http://data.open.ac.uk/resource/course/a330 provides the RDF representation of this resource.
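The redirect decision is a simple piece of content negotiation, which can be sketched as follows (an illustration of the behaviour described above, not the actual server code; the set of RDF media types checked is an assumption):

```python
def negotiate(uri, accept_header):
    """Decide where a generic data.open.ac.uk URI should redirect, based on
    the client's Accept header: RDF clients get /resource/, browsers /page/."""
    prefix = "http://data.open.ac.uk/"
    assert uri.startswith(prefix)
    path = uri[len(prefix):]  # e.g. "course/a330"
    if "application/rdf+xml" in accept_header or "text/turtle" in accept_header:
        return prefix + "resource/" + path
    return prefix + "page/" + path  # default: human-readable page
```

So a browser asking for text/html lands on the browsable page, while a semantic web client asking for RDF is sent to the machine-readable representation of the same resource.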

A SPARQL endpoint is also available, which allows querying the whole set of data, or individual datasets through their namespaces, http://data.open.ac.uk/context/oro and http://data.open.ac.uk/context/podcast.

What’s next?

Of course, this first version of data.open.ac.uk is only the beginning of the story. We are currently actively looking at ways to represent and extract information about courses and qualifications from the Study at the OU website, as well as information about places on the OU campus and in the regional centres (buildings, car parks, etc.).

More ways to access the data will soon be made available, including faceted search/browsing, and links to external datasets are being investigated. All this is going to be gradually integrated into the platform, while the existing data is constantly updated.

Initial Overview of the LUCERO Workflow
Mon, 16 Aug 2010, by Mathieu
http://lucero-project.info/lb/2010/08/initial-overview-of-the-lucero-workflow/

A large part of the technical development of LUCERO will consist of a set of tools to extract RDF from existing OU repositories, load this RDF into a triple store and expose it through the Web. This might sound simple, but the reality is that achieving it with sources that are constantly changing, and that originally work in isolation, requires a workflow that is at the same time efficient, flexible and reusable.

The diagram below gives an initial overview of what such a workflow will look like for the institutional repositories of the Open University considered in the project. It involves a mix of specific components, whose implementation needs to take into account the particular characteristics of the dataset considered (e.g., an RDF extractor depends on the input data), and generic components, which are reusable independently of the dataset. The approach for the deployment of this workflow is that each component, specific or generic, is realised as a REST service. The materialisation of the workflow for a given dataset is then realised by a scheduling programme, calling the appropriate components/services in the appropriate order.

[Diagram: LUCERO Workflow]

One of the points worth noticing in this diagram is the way updates are handled. A set of (mostly) specific components is in charge of detecting, at regular intervals, what is new, what has been removed and what has been modified in a given dataset. They then generate a list of new items to be extracted into RDF, and a list of obsolete items (either deleted elements of data, or previous versions of updated items). The choice here is to re-create the set of RDF triples corresponding to obsolete items, so that they can be removed from the triple store. This assumes that the RDF extraction process consistently generates the same triples from the same input items over time, but has the advantage of keeping track of updates only in the early stages of the workflow, making it simpler and more flexible.
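The update strategy can be sketched in a few lines (a toy illustration of the idea, with an invented extractor, not the LUCERO code): because extraction is deterministic, an obsolete item can simply be re-extracted and its triples subtracted from the store before the new version is added.

```python
def extract(item):
    """Deterministic toy extractor: same item always yields the same triples."""
    return {(item["uri"], pred, obj) for pred, obj in item["fields"]}

def apply_update(store, new_items, obsolete_items):
    """Subtract triples of obsolete items, then add triples of new ones."""
    for item in obsolete_items:   # deleted items or previous versions
        store -= extract(item)
    for item in new_items:        # newly added or updated items
        store |= extract(item)
    return store

old = {"uri": "ex:course/a330", "fields": [("dc:title", "Art history")]}
new = {"uri": "ex:course/a330", "fields": [("dc:title", "Art and history")]}
store = apply_update(set(), [old], [])       # initial load
store = apply_update(store, [new], [old])    # update: remove old, add new
```

Nothing downstream needs to remember what changed: the diff lists produced early in the workflow are sufficient, which is exactly the simplification the post describes.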

Another crucial element concerns the way the different datasets connect to each other. Indeed, the workflow is intended to run independently for each dataset. A linking phase is planned right after RDF extraction (currently left out of the workflow), but this is essentially meant as a way to connect local datasets to external ones. Here, we realise the connections between entities of different local datasets through the use of an Entity Name System (ENS). The role of the ENS (inspired by what was done more globally in the Okkam project) is to support the RDF extractor components in using well-defined, shared URIs for common entities. It implements the rules for generating OU data URIs from particular characteristics of the considered object (e.g., creating the adequate URI for a course from its course code), independently of the dataset where the object is encountered. In practice, implementing such rules and ensuring their use across datasets will remove the barriers between the considered repositories, creating connections based on common objects such as courses, people, places and topics.
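An ENS rule of this kind is essentially a pure function from an identifying characteristic to a URI. The sketch below illustrates the course-code example; the normalisation shown (trimming and lowercasing) is an assumption for illustration, though the lowercase /course/ pattern matches URIs appearing elsewhere in these posts:

```python
def course_uri(course_code):
    """Mint the shared data.open.ac.uk URI for a course from its code,
    regardless of which dataset the course is encountered in."""
    return "http://data.open.ac.uk/course/" + course_code.strip().lower()

# Both repositories end up pointing at the same resource:
course_uri("A330")    # e.g. as found in ORO metadata
course_uri(" a330 ")  # e.g. as found in a podcast feed
```

Because every extractor calls the same rule, a course mentioned in ORO and in a podcast feed resolves to one URI, which is what creates the cross-dataset links between courses, people, places and topics.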
