Libraries are one of the oldest ways of organising information. All of the great ancient empires, from Mesopotamia to Egypt to Greece and Rome, had (and venerated) their libraries. For the vast majority of their history, libraries have organised information that is manifested in physical objects: first books, then journal articles and microfiche, and later still CDs and DVDs. The organising principles on which libraries have been built are therefore rooted in organising physical objects, and a physical object can only be in one place at a time.
Increasingly, libraries are tasked with organising digital resources rather than physical ones. Libraries have traditionally organised resources using hierarchies. This creates silos of knowledge with few ways to link different items, and users have to follow specific paths to find knowledge. The early web shared these issues. Now, however, the founder of the web, Tim Berners-Lee, is working to bring about the next step of the web: the semantic web. The semantic web depends on the concept of Linked Data (Berners-Lee, 2009).
Linked Data is a way of standardising the metadata used to describe resources and data on the web, so that machines can more easily understand what those resources are about, using terms close to real-life language. Resources are tagged with this metadata so that a search can be run across a number of tags. Each tag on a resource comprises a triple: subject-predicate-object, or thing-relationship-thing. For example, a resource about Egypt might have tags that look like: Egypt-has capital city-Cairo, Egypt-has river-Nile, Egypt-in continent-Africa. You could therefore search for rivers in Africa or capital cities in Africa, and still find the resource.
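To make the triple idea concrete, here is a minimal sketch in Python. It is only an illustration: real Linked Data uses URIs and RDF tooling rather than plain strings, and the function names `match` and `objects_in` are invented for this example. It stores the Egypt tags as (subject, predicate, object) tuples and answers a question such as "rivers in Africa" by combining two patterns.

```python
# Each tag is a (subject, predicate, object) triple, as in the Egypt example.
# Plain strings stand in for the URIs that real Linked Data would use.
TRIPLES = [
    ("Egypt", "has capital city", "Cairo"),
    ("Egypt", "has river", "Nile"),
    ("Egypt", "in continent", "Africa"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def objects_in(triples, predicate, continent):
    """Find objects of `predicate` whose subject lies in `continent`.

    Hypothetical helper: it first finds every subject tagged as being in
    the continent, then collects what those subjects point to via `predicate`.
    """
    places = {s for (s, _, _) in match(triples, predicate="in continent", obj=continent)}
    return sorted(o for (s, _, o) in match(triples, predicate=predicate) if s in places)


if __name__ == "__main__":
    print(objects_in(TRIPLES, "has river", "Africa"))
    print(objects_in(TRIPLES, "has capital city", "Africa"))
```

Neither search mentions Egypt, yet both reach the resource through its tags, which is the point of describing resources with relationships rather than filing them in a single place.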
This can break down the silos that occur within hierarchical organisations of knowledge. The BBC are leaders in using Linked Data to connect their content. Dan Ramsden, user experience architect at the BBC, explains that Linked Data is the “natural technology for supporting curiosity”, as there is no single route to a resource decided by an editor. Each item of content can be the start, end or middle of a journey. He explains that the web used to be like travelling down “narrow passageways”, but with Linked Data it is more like a “grand cathedral” (Raimond, Ramsden, Bartlett, & Angeletou, 2017). Now there are calls for libraries to join in with Linked Data and ditch more traditional metadata schemes such as MARC (Alemu, Stevens, Ross, & Chandler, 2012).
Libraries are not new to the metadata game, or indeed to standardising it. Universal principles and approaches such as FRBR and RDA have been developed to maintain interoperability between library catalogues and records. Similar standards have been developed to meet the unique needs of archives and museums (Zeng & Qin, 2016). But these approaches have tended to connect libraries to other libraries, rather than to organisations outside the library world. Linked Data gives libraries the opportunity to make those wider connections and to put their resources front and centre, in the places where users actually look for information (Godby, 2015).
Furthermore, the existing schemas place too much of a burden on users, who must navigate them to find the information they need, rather than using machines to do the work (Alemu, Stevens, Ross, & Chandler, 2012). Alemu et al. (2012) go on to argue that this organising system simply does not scale and maintains the limitations of card catalogues. Like much of the existing web, library metadata helps you find documents rather than data.
Alemu et al. (2012) see RDF (the Resource Description Framework, the data model that underpins Linked Data) as a way of navigating the tension between objectivist approaches to organising information, such as MARC, and the socially constructed metadata created by users within Web 2.0. They see RDF as a way of overcoming the rigid hierarchies of the former and the shallow, inconsistent nature of the latter. They argue that MARC should be abandoned entirely and replaced with RDF. This would require a reconceptualisation of FRBR and RDA, thereby bridging both approaches (Alemu, Stevens, Ross, & Chandler, 2012).
In a digital world dominated by Google, metadata can seem irrelevant, but as Zeng and Qin note, search engine technology only really works for text. Search engines rely on machine-readable text, so an image, for example, can only be found by a search engine if a human has added effective metadata. Metadata increasingly carries the “information burden” for resources on the web (Zeng & Qin, 2016, p. 90). As Godby notes, the unstructured presentation of data that has dominated the web doesn’t work particularly well even for textual data. It is hampered by the ambiguity of natural language, and the “sea of text” overwhelms key information (Godby, 2015, p. 6).
So would the adoption of RDF in libraries work, and would it be beneficial?
Godby outlines three key benefits of structured data on the web:
- What the search is about has already been resolved – the layout is much more easily comprehended and user friendly.
- By centring on an object which has a unique public identity, rather than a list of documents, you can pull together data from different sources so information like the opening times of a restaurant plus reviews can be seen together.
- Entities have lots of links to and from them – taken as a sign of prominence – and these should include links to libraries and museums with related information, giving them prominence too.
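The second benefit, pulling together data about one identified object from different sources, can be sketched in a few lines of Python. This is an illustrative assumption rather than Godby's own method: the `example.org` identifiers, the sample data, and the `merge_by_entity` function are all hypothetical, and plain tuples stand in for RDF statements.

```python
# Hypothetical public identifiers for two restaurants (not real URIs).
RESTAURANT = "https://example.org/id/restaurant/1"
OTHER_RESTAURANT = "https://example.org/id/restaurant/2"

# Two independent sources that happen to describe the same entity.
listing_source = [
    (RESTAURANT, "opening hours", "12:00-22:00"),
]
review_source = [
    (RESTAURANT, "review", "Friendly staff"),
    (OTHER_RESTAURANT, "review", "Too noisy"),
]


def merge_by_entity(*sources):
    """Group statements from every source under the entity they describe."""
    merged = {}
    for source in sources:
        for subject, predicate, obj in source:
            # The shared identifier is what lets separate sources line up.
            merged.setdefault(subject, {}).setdefault(predicate, []).append(obj)
    return merged


if __name__ == "__main__":
    view = merge_by_entity(listing_source, review_source)
    for predicate, values in view[RESTAURANT].items():
        print(predicate, values)
```

Because both sources use the same identifier for the restaurant, its opening hours and reviews end up side by side without either source knowing about the other. A library could, in principle, add its own statements about the same entity in the same way.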
Godby (2015) advocates for Linked Data as a way of balancing our commitment as a profession to the opening up of information with using our expertise to complement the process. Alemu et al. (2012) see another balancing (or rebalancing) in terms of power, where users can contribute to the creation of metadata alongside information professionals.
Strides have been made by libraries in Linked Data, with the redesign of Library of Congress Subject Headings to incorporate Linked Data principles and the inclusion of Linked Data in its records. Earlier, the Dublin Core scheme was developed as a “lightweight version of librarians’ descriptive standards” (Godby, 2015, p. 12), specifically to describe web documents. Estimates say that by 2015, 15% of Linked Data resources came from libraries and publishers (Godby, 2015). So it does seem achievable in practice.
However, there are major resource implications to replacing all of the MARC records out there. As Godby (2015) acknowledges, Library of Congress Subject Headings are authority control files, whilst bibliographic data is much larger and more complex. MARC currently expresses thousands of concepts (Godby, 2015). Creating RDF metadata is therefore incredibly complicated for libraries, requiring new tools and training. Perhaps most significantly, the problem of how to reconcile principles like FRBR, which are quite necessary for bibliographic items, with RDF isn’t an easy one to solve, though attempts are being made (Alemu, Stevens, Ross, & Chandler, 2012; Godby, 2015).
Alemu, G., Stevens, B., Ross, P., & Chandler, J. (2012). Linked Data for libraries: Benefits of a conceptual shift from library-specific record structures to RDF-based data models. New Library World, 113(11/12). doi: 10.1108/03074801211282920
Berners-Lee, T. (Presenter). (2009). The Next Web. [Video]. Retrieved from https://www.ted.com/talks/tim_berners_lee_on_the_next_web?language=en
Godby, C. J. (2015). Library linked data in the cloud: OCLC’s experiments with new models of resource description. San Rafael, California: Morgan & Claypool.
Raimond, Y., Ramsden, D., Bartlett, O., & Angeletou, S. (2017). Linked data and the semantic web. Retrieved October 12 from https://www.bbc.co.uk/academy/en/articles/art20130724121658626
Zeng, M. L., & Qin, J. (2016). Metadata (2nd ed.). London, UK: Facet Publishing.