The Australian government has supported the development of repository infrastructure for several years now. One product of this support was the ARROW project (Australian Research Repositories Online to the World). The ARROW project sponsored a hybrid commercial/open-source approach to building vendor-supported repository infrastructure with open-source underpinnings.
Acknowledgements & Credits
At USQ, a team was involved in this work: Oliver Lucido, Ron Ward, Linda Octalina, Bronwyn Chandler, Caroline Drury and Duncan Dickinson all assisted in programming and/or project management.
Alison Dellit at the National Library of Australia and Neil Dickson from Monash were on the project team as stakeholder/drivers.
The Fascinator was conceived initially as a way to prove a point in an ongoing dialogue within the ARROW project about repository architecture. The goal was to test the hypothesis that it would be possible to build a useful, fast, flexible web front end for a repository using a single fast indexing system to handle browsing via facets, full-text search, multiple 'portal' views of subsets of a large corpus, and, most importantly, easy-to-administer security that could handle the most common use cases seen in the ARROW community. This contrasted with the approach taken by ARROW's commercial partner, which used several different indices to achieve only some of the same functionality, in an environment that was much more complex to manage and configure.
We will give an overview of the product in both functional and technical terms, in the two modes in which it is used: The Fascinator Server and The Fascinator Desktop.
What does it do?
Functionally, The Fascinator offers:
Technically, The Fascinator is a modular system, written in Java so it is easy to deploy with Fedora and Apache Solr, consisting of:
An indexing system for Fedora which builds on the standard G-Search supplied with the software, and some work done by the Muradora team.
A configurable harvesting application which can ingest data from OAI-PMH, OAI-ORE, and local file systems.
A web portal application which can be used to build flexible front end websites or act as a service to other sites via an HTTP API.
An OAI-PMH (and ATOM feed) system which can create sub-feeds from a repository very easily without complexities like OAI-PMH sets.
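To make the single-index idea concrete, here is a minimal sketch of how full-text search, a 'portal' view and facet counts could all be expressed as one query against Apache Solr. The parameters q, fq, facet and facet.field are standard Solr request parameters; everything else (the class name, the base URL, the field names dc_type, dc_date and repository_name, and the idea of modelling a portal as a filter query) is an assumption made for illustration, not The Fascinator's actual code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: class, URL and field names are hypothetical.
final class FacetedQuerySketch {

    /** Builds one Solr select URL that covers search, portal filtering and facets. */
    static String buildQueryUrl(String solrBase, String searchText,
                                String portalFilter, List<String> facetFields) {
        List<String> params = new ArrayList<>();
        // Full-text search over the index.
        params.add("q=" + encode(searchText));
        // Assumption: a portal is just a filter query over the same index.
        if (portalFilter != null && !portalFilter.isEmpty()) {
            params.add("fq=" + encode(portalFilter));
        }
        // Ask Solr to return value counts for each facet field.
        params.add("facet=true");
        for (String field : facetFields) {
            params.add("facet.field=" + encode(field));
        }
        return solrBase + "/select?" + String.join("&", params);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildQueryUrl(
                "http://localhost:8983/solr",      // hypothetical local Solr
                "climate change",
                "repository_name:USQ",             // hypothetical portal filter
                List.of("dc_type", "dc_date"));    // hypothetical facet fields
        System.out.println(url);
    }
}
```

Because the portal filter, the search text and the facet request travel in a single request to a single index, there is only one place to configure and secure, which is the point the hypothesis above set out to test.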
While The Fascinator Server's goals were modest, it has been met with some enthusiasm by repository managers in Australia and beyond, and is being trialled and/or piloted at a small number of sites across the world.
An example of The Fascinator in action is the incomplete Australian University Repository Census, which harvests Institutional Repositories. See this web page, which explains some of the normalization issues that the service is designed to illustrate to the community, so that solutions may be sought.
The ARROW discovery service harvests research outputs from Australian repositories and makes them available in a searchable web site with faceted browsing. This site gives a reassuring picture that the repository effort in Australia is paying off. For example, if you do a search for climate change, the site shows this range of resource types:
However, this view is only available because of the normalising rules in the NLA harvester, which pulls data from repositories and then normalises it for inclusion in the repository. A similar search on an incomplete dataset without normalisation looks like this:
Journal Article (184)
Book chapter (65)
Conference Paper (35)
PhD Doctorate (20)
journal article (18)
Book Chapter (17)
Book Section (10)
Book Chapters (7)
Note the inconsistency of what repositories publish in their OAI-PMH feeds – there are things labeled as Journal Article, Article and journal article, which may or may not be the same as the NLA's journal article type. And what is a 'b1'? Only those familiar with Australian Government repository requirements for university research are going to understand that one.
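The kind of normalisation at issue can be sketched in a few lines. The example below folds the label variants from the list above into canonical types and merges their counts. The mapping table is hypothetical: it is not the NLA's rule set, and treating 'Book Section' as a book chapter is an assumption that a real service would need to confirm with the contributing repositories. Labels with no rule, such as 'b1', pass through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: the mapping table below is hypothetical.
final class TypeNormaliserSketch {

    // Lower-cased variant label -> canonical label (assumed rules).
    private static final Map<String, String> CANONICAL = Map.of(
            "journal article", "Journal Article",
            "book chapter", "Book Chapter",
            "book chapters", "Book Chapter",
            "book section", "Book Chapter");

    /** Returns the canonical label, or the trimmed input if no rule applies. */
    static String normalise(String label) {
        String trimmed = label.trim();
        String key = trimmed.toLowerCase(Locale.ROOT);
        return CANONICAL.getOrDefault(key, trimmed);
    }

    /** Merges facet counts whose labels normalise to the same type. */
    static Map<String, Integer> mergeCounts(Map<String, Integer> rawCounts) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : rawCounts.entrySet()) {
            merged.merge(normalise(entry.getKey()), entry.getValue(), Integer::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Counts taken from the un-normalised facet list above.
        Map<String, Integer> raw = new LinkedHashMap<>();
        raw.put("Journal Article", 184);
        raw.put("Book chapter", 65);
        raw.put("Conference Paper", 35);
        raw.put("PhD Doctorate", 20);
        raw.put("journal article", 18);
        raw.put("Book Chapter", 17);
        raw.put("Book Section", 10);
        raw.put("Book Chapters", 7);
        System.out.println(mergeCounts(raw));
    }
}
```

Under these assumed rules the eight raw facets collapse to four: Journal Article (202), Book Chapter (99), Conference Paper (35) and PhD Doctorate (20).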
The Fascinator Desktop is a new software project which was inspired by a perceived gap in the available free eResearch software for small-scale research. How does one get 'ordinary' files from a hard disk into a data grid, or even backed up?
The concept was introduced in a blog post:
We’ll start with a local installation of The Fascinator – that puts Fedora 3 and Apache Solr on your desktop. Don’t worry, we have a simple installer. It’s all Java, so it might be painful for the programmers at times but it should install pretty much anywhere.
Then we will add a file-system indexer for The Fascinator – pretty much like what Picasa does, it will index all of your stuff. It will grab whatever metadata it can, including properties from office documents, EXIF metadata and tags from images. We will also treat the file system as a source of metadata so you will be able to explore using metadata facets and file system facets using the same interface. This should be a very straightforward addition to the existing software; it’s just a matter of bolting together some standard software libraries.
Next comes the taxonomy/tagging bit: we need a way to import tag-sets and taxonomies that you might want to apply to your content and then let you tag it. I think it will be important to support both formal metadata and informal tagging. For example, you might want to set up your own tag hierarchy with home/work at the root, and with work broken down into teaching/research and research broken up by project.
The Fascinator desktop finds all your stuff:
Treats directories/folders as metadata in their own right.
Folksy tagging and formal ontology/taxonomy support.
Route data to departmental, institutional or global repositories.
Configurable to deal with ANY data type – automatic conversion.
Configurable to deal with the way you work.
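As an illustration of treating folders as metadata, the sketch below turns a file's location into a set of cumulative folder facets that an indexer could store alongside the file's other metadata. The class name, method name and example paths are hypothetical, and this is only one plausible way to derive such facets, not a description of how The Fascinator Desktop does it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names and paths are hypothetical.
final class PathFacetSketch {

    /**
     * Returns one facet value per enclosing folder, relative to the indexed
     * root, from the outermost folder to the innermost.
     */
    static List<String> folderFacets(Path root, Path file) {
        List<String> facets = new ArrayList<>();
        Path parent = root.relativize(file).getParent();
        if (parent == null) {
            return facets; // file sits directly in the root: no folder facets
        }
        StringBuilder prefix = new StringBuilder();
        for (Path part : parent) {
            if (prefix.length() > 0) {
                prefix.append('/');
            }
            prefix.append(part.toString());
            facets.add(prefix.toString());
        }
        return facets;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/home/alice");
        Path file = Paths.get("/home/alice/work/research/projectA/notes.odt");
        // Prints [work, work/research, work/research/projectA]
        System.out.println(folderFacets(root, file));
    }
}
```

Storing every level of the path, rather than only the immediate parent folder, is what would let a browsing interface drill down from 'work' to 'work/research' in the same way it drills down through any other facet.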
The Fascinator Desktop was inspired by a number of other software packages.
But going beyond these, the software is designed to allow researchers to 'slice and dice' their own data files, and to work with eResearch analysts on customised indexers which can deal with data and relationships between data. For example, for a researcher using a lot of video, the indexer might be configured to make web-deliverable and preservation-quality versions of video and route them to a repository and a backup respectively. If there are time-coded transcripts of video interviews, the indexer will be able to provide full-text search into the video. These indexing rules will take time and effort to develop for each discipline and workgroup, but over time they should become re-usable.
On a simpler level, researchers will be able to tag items, then create 'portal' views based on tags and move data via Atom to other services.
The Fascinator is not yet widely deployed in either of its manifestations, but it is freely available to download under a GPL license. At the Australian Digital Futures Institute we are working on building a sustainable approach to maintaining the software.
Download from The Fascinator website.
Download the demonstration virtual machine with The Fascinator (plus some other stuff).
Join the Google Group.