Entropy, APIs and the Public Record vs the Right to Privacy

CityLIS Term 1 Week 4. In which we moved on from the history of documents to the relationship between information, the universe and everything; we played with the shift from a static, publishing web model (Web 1.0) to a service-oriented, participatory web model (Web 2.0) by exploring web APIs and mashups; #citylis went to Internet Librarian 2014 (#ili2014) and the European Conference on Information Literacy 2014 (#ecil2014) and supported Open Access Week (#oaweek); we explored the tensions between freedoms of speech and information and data protections and the right to be forgotten; and we thought about ‘asking’ as a research method.

Let’s Get Meta-Philo-Physical

After completing the history of documents, Lyn Robinson turned to philosophy, and to as many sciences as she could throw at us in one afternoon, to explore definitions of information, and the gaps between these definitions, across multiple domains. We covered Liebenau and Backhouse and their semiotic theory of levels in understanding information, Popper’s three worlds, Shannon’s 1948 Mathematical Theory of Communication, Professor Brian Cox on entropy and Sir Paul Nurse on biology as organised systems of information. Not forgetting Luciano Floridi and his philosophy of information.  The book chapter David Bawden and Lyn Robinson wrote on the conceptualisation of information across domains is well worth a read.

“We are faced with two kinds of gaps: the gaps between the concepts of information in different domains; and the gap between those who believe that it is worth trying to bridge such gaps and those who believe that such attempts are, for the most part at least, doomed to fail.”

Robinson and Bawden (2013).  Mind the Gap: Transitions Between Concepts of Information in Varied Domains

After being fairly comfortable with history this was fairly mindblowing – in a good way. We discussed information as difference (which I had to write down in three different ways to get my head around) and also information, entropy and the constant interplay of order and disorder. Is there more information in low order/high entropy systems, as Shannon argues, or is there more information in high order/low entropy systems?

It is counterintuitive to think that as the disorder and uncertainty around the arrangement of documents increases the amount of information increases. In LIS, we instinctively think that as order increases so does information. This may not be true. Findability may increase but this may not be the same as information.
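To make the Shannon side of that question concrete, here is a minimal sketch in Python. It is my own illustration, not something from the lecture: the function name and the three example strings are invented, and the measure only looks at how often each symbol occurs, not at how the symbols are arranged. It is a simplification, but it shows the direction of the argument: the less predictable the next symbol, the more bits each symbol carries.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Return the Shannon entropy of a sequence, in bits per symbol.

    Based purely on symbol frequencies: H = sum(p * log2(1 / p)).
    """
    counts = Counter(symbols)
    total = len(symbols)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical example sequences, chosen only for illustration.
ordered = "AAAAAAAA"      # one symbol only: completely predictable
mixed = "AAAABBBB"        # two equally frequent symbols
disordered = "ABCDEFGH"   # eight equally frequent symbols

for label, seq in [("ordered", ordered), ("mixed", mixed), ("disordered", disordered)]:
    print(f"{label:<10} {shannon_entropy(seq):.1f} bits per symbol")
```

On this measure the perfectly ordered string carries no information at all, while the most varied string carries the most. That is the counterintuitive result, and it is a statement about uncertainty rather than about findability.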

Perhaps one of the compelling things about big data is the insight that comes from mining data that is more disordered than in a traditional database. There is therefore more to be uncovered about the possible arrangements of things within it, which may be why NoSQL techniques applied across a large unstructured corpus can find more information than SQL techniques applied across a database ordered according to a particular scheme. Alternatively, there is no information in big data until order has been found using complex algorithms and approaches (e.g. MapReduce).

Blogging Mashup Mixtape Party

In the digital world I reached back towards my love of mixtapes to explore the present Web 2.0 possibilities for mashups by using open content, licensed for reuse, and web services. This was huge fun and involved creating Spotify playlists (including my mashup mixtape and cityLIS radio), Twitter widgets, watching TED Talks, turning my websites into pictures based on human DNA, playing with WordPress shortcodes and sticking all of them together. I also discovered that someone has hacked together a cassette player and tapes as a controller for Spotify playlists using a Raspberry Pi. Very cool.

In fact, there were numerous music related API and mashup posts across the DITA blogosphere.

To Know and Forget

This week’s information management and policy session was on Information Law and there was a really interesting discussion about the issues arising from the European Court of Justice ruling (ECJ C–131/12) in the case of Google Spain SL and Google Inc. v Agencia Española de Protección de Datos (AEPD) and Mario Costeja González. This ruling allows individuals in Europe to request, under the European Data Protection Directive (95/46/EC), that Google remove links from search results to content about them published on the web.

“The internet has revolutionised our lives by removing technical and institutional barriers to dissemination and reception of information, and has created a platform for various information society services. These benefit consumers, undertakings and society at large. This has given rise to unprecedented circumstances in which a balance has to be struck between various fundamental rights, such as freedom of expression, freedom of information and freedom to conduct a business, on one hand, and protection of personal data and the privacy of individuals, on the other.”

European Court of Justice Opinion ECLI:EU:C:2013:424

Our discussions ranged over the practical issues, the ethics, and the various roles of publishers and of information indexers and mediators such as search engines. The debate in the public sphere is also ongoing as the many parties involved attempt to digest and implement the ruling.

The European Union has produced a Mythbuster and a Factsheet to help with interpretation.

Google publishes a transparency report on its implementation of the ruling and has also assembled an advisory council to guide it. The council holds a series of public meetings across Europe and invites contributions from members of the public.

Luciano Floridi, a member of the Google advisory council, popped up again with an article in The Guardian considering the right to be forgotten as an exercising of power over information that needs to be carefully considered.

Floridi argued that publishers should have more of a say, a sentiment echoed by the BBC and The Guardian, with the BBC saying it will begin to maintain and publish a list of its content for which it has received removal notifications.

Digital Flânerie


Image Credits
Featured Image: time disappears by Travis Miller. Source: Flickr. (CC BY 2.0)

Needs to Knowledge Past and Future

CityLIS Term 1 Week 3. In which we completed the story of documents from the dawn of time to the present day and discovered everything connects; I found out how catalogue cards connect with the pre-history of the web; the Economist wrote about the Future of the Book and played with its form; we learnt about asking questions and finding answers using databases and information retrieval and knowledge management.

CC-BY by John Blyberg. Source: Flickr

Inspired by Library and Information Science Foundations (LISF) and the story of documents Part 3, this catalogue card shows us the use of classification schemes within a cataloguing code using a 20th century format, the index card.  It also carries some additional user-created metadata added to the official typed record.  One added identifier is “the Lemur Book”, referring to the animals that usually distinguish the cover of an O’Reilly book.  We also see something written on it that links to the information retrieval themes covered in Digital Information Technologies and Architecture (DITA) and the contextual siting of search around a seeker and their information context and needs: “What we find changes who we become”.  The image itself was found by practicing information retrieval techniques from the DITA lab session.

Linked History

Yes, in this week’s LISF lecture we completed our history of the story of documents, taking us from the Enlightenment to the present day in the ongoing quest for bibliographic control over the world’s knowledge.  This featured much coverage of the 19th century and the Victorian pioneers who laid down such robust foundations for modern library and information science that they are still the cornerstones of the discipline to this day.  These include intellectual tools such as catalogues and classification schemes, and memory institutions such as the British Library and the public library network.

The Ancestry of the Web

These themes were reinforced in Week one of the FutureLearn MOOC Web Science: How the Web is Changing the World from the University of Southampton.  I watched a lecture (activity 1.10) by Professor Les Carr on the pre-history of the web.  This discussed now familiar territory including Paul Otlet’s Mundaneum and Vannevar Bush’s Memex.  He stressed the importance of the Mundaneum not just as another attempt to collate the world’s knowledge but also for its new intellectual tools (librarians and queries) and technologies (the index card).

“Query became part of the bibliographic record.  Content was interlinked.” – Professor Les Carr

He also spoke about the 1937 idea by H.G. Wells to use microfilm to capture all the world’s knowledge as The World Brain, a permanent encyclopaedia.

“There is no practical obstacle whatever now to the creation of an efficient index to all human knowledge, ideas and achievement” – H.G. Wells

We then passed through the emergence of the internet, a network of networks, inspired by the work of computer scientists such as Vint Cerf, towards the emergence of the web.  Despite this lineage from attempts at bibliographic control and at capturing all knowledge, this wasn’t really the impetus for the web.  The web was intended to solve information management problems at the CERN research lab in Geneva.

The web’s architecture contained three core ideas that realised and embedded interlinking and querying in the digital record:

  • URIs/URLs – the idea that everything has a unique identifier
  • HTTP – a mechanism for allowing clients and servers to communicate via the internet
  • HTML – the ability to encode document structure and links to related documents in a simple markup language
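The way these three ideas fit together can be sketched in a few lines of Python using only the standard library. This is an illustrative sketch of my own rather than anything from the course: the example.org and example.com addresses, the page content and the LinkCollector class are all made up, and nothing is actually sent over the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the targets of <a href="..."> links as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own address.
                    self.links.append(urljoin(self.base_url, value))


# 1. URL: a unique identifier, split here into its component parts.
page_url = "http://example.org/docs/index.html"  # hypothetical address
parts = urlparse(page_url)

# 2. HTTP: the text a client would send to ask the server for that document.
request = f"GET {parts.path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

# 3. HTML: a document whose markup encodes structure and links to others.
html = (
    "<html><body><h1>Index</h1>"
    '<a href="memex.html">Memex</a> '
    '<a href="http://example.com/otlet">Otlet</a>'
    "</body></html>"
)
collector = LinkCollector(page_url)
collector.feed(html)

print(parts.scheme, parts.netloc, parts.path)
print(request.splitlines()[0])
print(collector.links)
```

Each link found in the document is itself a URL that could be requested in turn, which is the interlinking and querying of the index card world carried over into the digital record.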

From Geneva it expanded throughout the scientific research community and was then given to the world.  As Tim Berners-Lee famously said: “This is for Everyone” and everyone took it and used it for new and different purposes extending the web into the information service we have today.

If you are not taking #FLwebsci yet register quickly and catch up before it closes.  It’s a well put together course with great discussions going on as participants share their thoughts and experience.

The Future of the Book

Lyn’s whole epic narrative arc of documents from the ancient world through to the world wide web was also supplemented this week by an essay published in the Economist on the Future of the Book called From Papyrus to Pixels.  The article itself is a fascinating read connecting books past, present and future and discussing the connections between formats, technologies, authors, readers and publishing business models to trace things that endure, things that may change and things that may fade and revive.  For all that has changed the essence of the book as a route to pleasure and for encouraging connections between people and knowledge persists across millennia.

“Books will evolve online and off, and the definition of what counts as one will expand; the sense of the book as a fundamental channel of culture, flowing from past to future, will endure.” – The Economist.  Future of the Book Essay.  From Papyrus to Pixels.

Interestingly the essay is also provided in three formats:  an audio version, an ink-stained, coffee-ringed skeuomorphic virtual book and a web page.  When I first encountered this presentation, my first thought was to call the last of these a ‘traditional’ web page.  I clearly thought that using the web to deliver audio or digital reconstructions of a retro physical paper format was more cutting edge.  The web succeeds most when it takes what was best about old formats and technologies (codices, radio) and brings them forward to the web, creating richer, ever more intricate and converged documents.  I still find turning pages (even fake ones) more immersive, and a two page layout in soothing black and white more engaging, than scrolling through a long single column of text with brightly coloured images, headings and marginalia.  How technically and conceptually clever of them to prompt such debate even before a word has been read.

Finding and Knowing

Over in our cityLIS digital world we covered databases, information retrieval and the precision of search engines.  I had never paid such close attention to the practice of searching before.  Perhaps I have become a lazy searcher carelessly tossing free text searches into the most obvious search box and uncritically accepting what comes.  Thanks to this week’s lab I paid close attention to different types of information need, to different search methods for information retrieval, the precision and recall of different search engines and came up with some varying conclusions.  This also came up in our research methods class where we were introduced to Cyril Cleverdon who was the first person to suggest formal testing of information retrieval systems and developed the measures of precision and recall as part of his investigation into the comparative efficiency of indexing systems.
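Precision and recall are easy to compute once you know which documents were retrieved and which were actually relevant. The sketch below is my own minimal illustration: the function name and the document identifiers are invented, and it uses the usual definitions of precision as the share of retrieved documents that are relevant and recall as the share of relevant documents that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents in the collection are relevant,
# and the search engine returns five results, two of them relevant.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = ["d1", "d2", "d7", "d8", "d9"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the five results are relevant (precision 0.40) and two of the four relevant documents were found (recall 0.50). Returning more results tends to raise recall at the cost of precision, which is the trade-off to watch when comparing search engines.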

Cleverdon is an entity in Google’s Knowledge Graph and bridging the gap between information needs and knowledge was another theme of the week.  This connected into our Information Management and Policy lecture on Knowledge Management that was given by guest Lecturer Noeleen Schenk from Metataxis.  In this session we covered some of the models, benefits, drivers, tools and challenges involved in managing knowledge within organisations.

Setting Out

I have been writing about various thoughts, ideas, work and research in various places over the years.  My interests span many disciplines from history to sociology to software engineering to information science.  I describe my journey as:

I am a historian who became a social scientist who became an information technologist who became a business analyst who found information at the heart of all these things so went back to my first love – archive and information science.

I’ve now decided to start a single blog to write about all these things whether it be academic study or reflective practice stemming from professional experience.  It will be a professional blog about my life’s work; a space to write about my continuing intellectual voyages of discovery from this point.  Previously I was worried about keeping my interests separate and targeted at very different audiences.  Now, in true interdisciplinary fashion, I’m more concerned about keeping them together and exploring bigger pictures.

Sections

The blog is organised into three top level sections covering my three main areas of interest.  These are (in order of current priorities):

  • Iddilica: The Art, Science and Ethics of Information Gathering
  • Culturion: Culture, History and Sociology
  • Addylica: Analysis, Programming and Design

Themes and topics I’m particularly interested in at the moment are wide ranging and include:

Themes

  • Self and Society in the Age of Digital Reproduction
  • Surveillance Society; Expose Culture: What do we Mean by Privacy in the Internet Age?
  • Information and the Practice of History:
    • The Right to Know and the Right to Forget (ECJ C-131/12)
  • Freedom of Information and Freedom of Speech
  • The Ideal of the Commons
    • Data Commons
  • Research Data Management
  • Data Science for the Social Sciences
  • Digital Curation and Preservation
  • Data and Metadata
  • A Social History of Innovation
  • Cartographers of the Digital Age
  • Quantified Self
  • The Evolution of the Internet/Web
    • Web 1.0 Searching *The Internet of Documents*
    • Web 2.0 Social *The Internet of People*
    • Web 2.5 Spatial *The Internet of Places*
    • Web 3.0 Sensing *The Internet of Things*

Topics

  • Me, Myself and Everyone: Identity Curation in the Networked Society
  • The Sensing Web: The Emerging Significance of the Internet of Things
  • The Sensing Web: The Curation and Preservation Challenge of Big Data
  • Realising the Memex: Linked Data, Associative Indexing and Digital Information Management
  • From Liked to Linked: Assessing the Emergence of Web 3.0
  • One Web (Connected Knowledge for People and Machines): The Implications for Catalogues and Cataloguing
  • Web 3.0: Mapping the Shift from Document Thinking to Data Thinking and the Significance for Libraries and Information Centres
  • Wayfinding the Digital Commons: Link Curation and Connected Knowledge
  • Authenticity and Continuity in the Age of Digital Reproduction
  • The Shelfie and the Patron Record: Protecting and Sharing Identity via Reading Patterns
  • The Privacy Paradox: Surveillance Society; Expose Culture and the Disciplinary Power of Identity Construction
  • Architecture and Usability of Academic Discovery Systems
  • Open Access: Authentication and Authorisation Barriers Accessing Resources
  • Shift to Full Lifecycle Research Data Management (Data + Pub)
  • Bibliographic Data as Linked Data
  • Understanding the Implications of ECJ C-131/12 for Archiving, Cataloguing and Information Seeking
  • Navigating Library Ecosystems