The Age of Misc

Traditionally, organising has been seen as a human activity and retrieval as a machine activity (Glushko, 2013). Elaine Svenonius, however, recognised that automatic text processing and indexing (part of the information retrieval field) had come to either complement or make up for bibliographic description and cataloguing. She proposed that retrieving information and organising it are interconnected processes rather than separate ones. If you take a systems view, you see that the more thought is put into organising information, the more effective its retrieval becomes (Svenonius, 2000). Therefore, “we readily see that computers now assist people in organizing and that people contribute much of the information used by computers to enable retrieval” (Glushko, 2013, p. 7).


Distinctions between retrieval and organising, and between humans and machines, are rapidly collapsing in the digitally mediated world we live in. These themes are taken up by David Weinberger in his book, Everything is Miscellaneous (2007). His central argument is that physical items have to be organised in a highly prescriptive way, and that this is necessary because a physical item can only be in one place at a time. It has to be physically somewhere, and that place has to be known in order to retrieve it. Digital items can have much more freedom. They don’t need to be ordered and organised on the way into an organising system; they can be ordered on the way out, meeting the exact needs of a user. In essence, everything can be kept in a miscellaneous messy pile because we can use the digital nature of the items to find them easily.

To frame his argument, Weinberger distinguishes three orders of organising:

  1. Primarily concerned with the arrangement of physical things, such as books in a closed stack system where they are ordered by something relatively meaningless like accession number. This could also be the kitchen drawer of utensils, to use one of Glushko’s examples (Glushko, 2013).
  2. The arrangement of information through systems like catalogues and classifications (traditionally using trees and hierarchies). Examples include the Library of Congress Subject Headings and the Dewey Decimal System. Weinberger does not include faceted classification systems like the Colon Classification scheme in the second order of organising. This is because faceted classification schemes attempt to put items in more than one place (before computers made it truly possible). Second order organising depends on professionals and experts predicting what the majority of their users want information for. There are trade-offs to be made. For example, a book on my shelf is called Information Literacy and Social Justice (Gregory & Higgins, 2013). Which topic has primacy? Information literacy or social justice? I would probably file the book under information literacy, on the grounds that someone interested in information literacy is more likely to want it than someone whose primary interest is social justice. But that is an assumption. It also prevents people whose starting point is social justice from accidentally (or serendipitously) finding out how information literacy can affect social justice.
  3. This order of organising is largely through tagging and metadata, where things sit in a web of interconnections rather than in a hierarchical tree. Documents don’t have to sit in one place. They can sit in a massive pile of miscellany until someone requests them, and the data can then be served up in whatever way the user has asked for it. So for example, rather than browsing iTunes by genre, you could browse by mood, by songs inspired by cities or by bands with a female vocalist.
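
To make the contrast between second and third order organising a little more concrete, here is a minimal Python sketch of a tagged “miscellaneous pile”. It is only an illustration: the tags are ones I have made up, and two of the three titles are hypothetical placeholders. The point is that an item with several tags can be reached from more than one starting point, so nobody has to decide which topic has primacy.

```python
from collections import defaultdict

# A hypothetical "miscellaneous pile". Each item carries free-form tags
# instead of sitting at one location in a hierarchy. The tags are invented
# for illustration, and Titles B and C are placeholders.
PILE = [
    {"title": "Information Literacy and Social Justice",
     "tags": {"information literacy", "social justice"}},
    {"title": "Hypothetical Title B",
     "tags": {"information literacy", "teaching"}},
    {"title": "Hypothetical Title C",
     "tags": {"social justice", "history"}},
]


def build_tag_index(items):
    """Map each tag to the set of titles that carry it."""
    index = defaultdict(set)
    for item in items:
        for tag in item["tags"]:
            index[tag].add(item["title"])
    return index


def find(index, *tags):
    """Return the titles that carry every requested tag, sorted by title."""
    if not tags:
        return []
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches))


index = build_tag_index(PILE)

# The same book is reachable from either starting point.
print(find(index, "social justice"))
print(find(index, "information literacy"))
# Ordering "on the way out": combine tags to narrow the pile.
print(find(index, "information literacy", "social justice"))
```

Nothing in the pile is ordered on the way in. The ordering only happens when someone asks a question, which is the behaviour Weinberger describes.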

It’s important to note that Weinberger is not suggesting we rely on search engines like Google to simply mine the pile of miscellany and bring back results by matching the text of a search term to text within a document. This will always be part of the mix, but it isn’t what Weinberger is particularly interested in. Even in an age of search engines, we still need metadata, especially for non-textual items where metadata carries an “information burden” (Zeng & Qin, 2016, p. 90), although image recognition technology is developing. For example, Google Photos can find photographs of my children eating ice cream if I search for ‘ice cream’ in my account. I haven’t created metadata or tagged those photographs with the words ‘ice cream’, and yet Google does a good job of bringing back relevant results. It also brings back photographs of slushies, a piece of chocolate cake and my son as a baby eating a peeled banana, so the technology at the moment is hit and miss! Search technologies increasingly combine text searching with metadata searching, and add AI for image recognition.

 

The BBC embarked on a Linked Data project which sought to help machines understand what documents are about rather than simply read the text they contain. A simple example is understanding whether information about Sofia is about a person called Sofia or the capital of Bulgaria. The BBC used its metadata to hang information on many different branches rather than having it sit in one place. This meant it could break down the silos that occur with hierarchical organisation, so that links can be made between news and television programmes, for example. Users can pull together data in an entirely new way and find it via many different paths (Raimond, Ramsden, Bartlett, & Angeletou, 2017).
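
The Sofia example can be sketched with a handful of subject-predicate-object statements, which is the basic shape of Linked Data. This is a toy illustration in plain Python rather than the BBC’s actual data model: every identifier with an `ex:` prefix is one I have invented, and only `rdf:type` and `rdfs:label` are borrowed from the standard RDF vocabularies.

```python
# Each statement is a (subject, predicate, object) triple.
# All "ex:" identifiers are hypothetical and exist only for this sketch.
TRIPLES = [
    ("ex:sofia_city", "rdf:type", "ex:City"),
    ("ex:sofia_city", "rdfs:label", "Sofia"),
    ("ex:sofia_city", "ex:capitalOf", "ex:bulgaria"),
    ("ex:sofia_person", "rdf:type", "ex:Person"),
    ("ex:sofia_person", "rdfs:label", "Sofia"),
    ("ex:news_article_1", "ex:about", "ex:sofia_city"),
    ("ex:tv_programme_1", "ex:about", "ex:sofia_city"),
    ("ex:news_article_2", "ex:about", "ex:sofia_person"),
]


def entities_labelled(label):
    """Every entity whose human-readable label matches the given text."""
    return sorted(s for s, p, o in TRIPLES if p == "rdfs:label" and o == label)


def types_of(entity):
    """What kind of thing the entity is (city, person, ...)."""
    return sorted(o for s, p, o in TRIPLES if s == entity and p == "rdf:type")


def documents_about(entity):
    """Every document linked to the entity, whatever silo it came from."""
    return sorted(s for s, p, o in TRIPLES if p == "ex:about" and o == entity)


# The text "Sofia" is ambiguous, but the two entities behind it are not.
for entity in entities_labelled("Sofia"):
    print(entity, types_of(entity), documents_about(entity))
```

Because documents point at the entity rather than at the word, a news article and a television programme about the city end up linked to each other, while the article about the person stays separate.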

 
Weinberger’s framing of the new digital disorder as a positive way of more easily connecting users to information is a compelling one, and is reflected in different aspects of the literature. However, Weinberger doesn’t subject third order organising to any kind of critical analysis. His book reads almost like a manifesto for tagging, especially user tagging. There seems to be an assumption both about how people consume information and, indeed, that people consume information rather than interact with it. User tagging clearly has benefits. As Svenonius argues, the “oldest and most enduring source of problems that frustrate the work of bibliographic control” is language and the ambiguity of language (Svenonius, 2000, p. 13). Users may be able to describe items better than professionals can, and since many tags can be applied to an item in the misc pile, the language doesn’t have to be scientifically precise.

 
However, in some instances user tagging introduces an inherent ambiguity. Weinberger acknowledges this, saying “That ambiguity can be a problem if you have to find absolutely every resource available”, but immediately discounts the criticism by saying no one really needs to find everything (Weinberger, 2007, p. 95). Weinberger’s focus is largely on the public and how they use the web to find information, so this may be the case for them. But I support healthcare professionals and researchers carrying out systematic reviews. These groups need to find literally all the information on a topic.

 

That’s partly why we rely on tagging in medical databases, using MeSH headings for example, and also why that tagging is done by expert indexers. It should be noted that the experts are not infallible, but expert indexing gives much better results than relying simply on text searching, because of the ambiguity of natural language inherent in medicine and health. Whilst user generated tagging could never effectively replace the second order systems in operation, it is possible that literature searching in health might be improved by introducing an additional set of user generated tags to work alongside existing methods.
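
As a rough sketch of what “working alongside” might look like, the Python below searches expert headings first and then falls back to user tags. It is an assumption about how such a search could behave, not a description of any real database: the records and the user tags are invented, and the two headings are used only as examples of expert terms.

```python
# Hypothetical records. Each has expert-assigned headings and,
# optionally, free-form tags added by users.
RECORDS = [
    {"id": "rec1", "expert_terms": {"Myocardial Infarction"},
     "user_tags": {"heart attack"}},
    {"id": "rec2", "expert_terms": {"Myocardial Infarction"},
     "user_tags": set()},
    {"id": "rec3", "expert_terms": {"Angina Pectoris"},
     "user_tags": {"heart attack", "chest pain"}},
]


def search(records, term):
    """Return record ids matching the term, expert matches listed first."""
    wanted = term.casefold()
    expert_hits, user_hits = [], []
    for record in records:
        if wanted in {t.casefold() for t in record["expert_terms"]}:
            expert_hits.append(record["id"])
        elif wanted in {t.casefold() for t in record["user_tags"]}:
            user_hits.append(record["id"])
    return expert_hits + user_hits


# The expert heading finds both indexed records, tagged by users or not.
print(search(RECORDS, "myocardial infarction"))
# The everyday phrase only finds what users happened to tag, and it
# also pulls in a record the indexer filed under a different heading.
print(search(RECORDS, "heart attack"))
```

In this toy data the user tag misses `rec2` entirely, which is the systematic review problem, and it also picks up `rec3`, which is the ambiguity problem. That is why I see user tags as an additional route in rather than a replacement for expert indexing.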

 
There are also other issues with user generated tagging that require deep reflection. Digital spaces and interactions often mirror the structural inequalities we see in the “real” world. Both Google’s and Flickr’s AI tagging systems have created controversy through accidentally offensive and insensitive image tagging (Griffin, 2015; Hern, 2018). User generated tagging without moderation could potentially be a whole lot worse.

 
The challenges around the collapsing distinctions between retrieval and organising, and between people and machines, can be overcome by bringing together second and third order organisation. A point Weinberger makes about Amazon is that you can search metadata or you can follow a hierarchical, tree-like structure of categories. You can go one step further and actually get expert and user generated metadata working alongside each other, for a far more effective search. If we can crack the ethics side of user generated tagging too, the outcome could be grand.

 

References:

Glushko, R. J. (2013). The Discipline of Organizing. Cambridge, Massachusetts: MIT Press.

Gregory, L. & Higgins, S. (2013). Information Literacy and Social Justice. Sacramento, CA: Library Juice Press.

Griffin, A. (2015, May 20). Flickr’s auto-tagging feature goes awry, accidentally tags black people as apes. Independent. Retrieved from http://www.independent.co.uk

Hern, A. (2018, January 12). Google’s solution to accidental algorithmic racism: ban gorillas. Guardian. Retrieved from http://www.guardian.com

Raimond, Y., Ramsden, D., Bartlett, O., & Angeletou, S. (2017). Linked data and the semantic web. Retrieved October 12 from https://www.bbc.co.uk/academy/en/articles/art20130724121658626

Svenonius, E. (2000). The Intellectual Foundation of Information Organization. Cambridge, Massachusetts: MIT Press.

Weinberger, D. (2007). Everything is Miscellaneous: The Power of the New Digital Disorder. New York, NY: Holt Paperbacks.

Zeng, M. L. & Qin, J. (2016). Metadata (2nd ed.). London, UK: Facet Publishing.
