
IFLA Section of Geography and Map Libraries



Title

Digital Map Librarianship: Metadata

Abstract

This paper, "The creation and integration of metadata in spatial data collections," written by Jan Smits for the 63rd IFLA Conference, describes the developments in metadata, their different definitions in various fields, and their possible function and use in spatial data collections. It shows how metadata of different quality levels can be integrated to serve users on different levels. Furthermore it gives a brief sketch of integrated spatial data infrastructures.

Introduction

For centuries libraries have deconstructed information in the form of books, periodicals, cartographic materials, etc., and reconstructed it into bibliographic and access data: cataloguing. During this period most librarians have understood what was meant by a bibliographic description.

Since the 1990s, with the Internet coming online for large groups of the global community, everybody suddenly talks about metadata as if it were a phenomenon we knew nothing about before. And everybody talks so loudly that you can hardly understand the words through the noise. Metadata seems to have become a hype as users become aware that the anarchy on the Internet is not too bad for occasional surfers, but that for those who seek information professionally it is very frustrating that there is hardly any systematization or structuring of the information available.

Therefore initiatives are being taken by professional bodies, public as well as private, to create structures by which the available information can be deconstructed and reconstructed in a sensible and recognizable way, so that the information available can be more easily sought, evaluated and selected.

At the same time, especially in the field of spatial information, large databases have become available which can be used as building blocks for intricate analytical processes. Initiatives arise, mainly from the public sector, to create clearinghouses for the transfer of these datasets or parts thereof. To be able to evaluate the fitness for use and quality of these data before they are made available, it is necessary to be able to view a record which incorporates all the information needed for these decision-making processes.

The last development important here is the creation of Intranets. In comparison with the Internet, Intranets are secure data-viewing and transfer networks for a specific group of users who have to abide by certain ground rules. Especially where data have (or can have) a high added economic value, Intranets are the only way to secure economic and legal interests, as users are usually contractual partners in the process or have to abide by the rules of their employment contract.

What is metadata?

In essence metadata is data about data. That, however, makes almost all data except primary sources metadata, which is a bit too much. To cut a long story short we shall restrict ourselves to the documentary field. Here metadata is in the first instance data which helps us to locate and select sources of information. It is identical to bibliographic descriptions together with access data. In libraries this was for a long time the only use for metadata, except where analytical bibliographies are concerned, which might also support evaluation processes.

In this electronic age, location and selection are only a few of the processes needed to decide which source best serves the purpose for which information is needed. In order to fulfil this function, certain metadata must also make it possible to evaluate and/or analyse a source before a decision is made.

Metadata, however, is hard to define. The term is sometimes so ambiguous that the Task Force on Archiving of Digital Information avoids it altogether (1). Fortunately I can make use of other bodies who have done this work for us. I have selected two definitions from the two mainstream documentary fields with which we occupy ourselves: the first from the library field, the second from the digital spatial community.

The first definition comes from the glossary of Biblink(2) D1.1 Metadata formats and reads as follows:

[Metadata is] information about a publication as opposed to the content of the publication; [it] includes not only [a] bibliographic description but also other relevant information such as its subject, price, conditions of use, etc.
The second is the working definition as adopted by the ICA Commission on Standards for the Transfer of Spatial Data at their summer meeting of 1996 in Den Haag, The Netherlands, and reads:
Metadata are data that describe the content, data definition and structural representation, extent (both geographic and temporal), spatial reference, quality, availability, status and administration of a geographic dataset.
It seems that between these definitions there is an apparent controversy with which we have been confronted before in our everyday practice. Or maybe it is a lasting controversy between keepers of DLOs (Document-Like Objects) and keepers of images, though I hope not. Those who work with DLOs mainly focus on location and selection, while those who work with images focus more on evaluation and analysis. In the analogue world: books and periodicals versus cartographic materials.
Another contrast is that libraries do not see a main role for producers, while the spatial data community clearly reckons with the fact that producers will create the bulk of database metadata. The spatial data community also clearly aims at the transfer of the underlying spatial data.

But the soup might not be eaten as hot as it is served, or in proper English: things are sure to simmer down. The latter seems to seep through in the Biblink study, as it creates a continuum from metadata for locational purposes to metadata for analytical purposes.

Types of metadata

This is illustrated in the diagram, which shows a typology of metadata for cartographic and spatial data which I modified after one of the diagrams used in the Biblink study. The colour intensity shows to what extent map curators will probably produce and/or use metadata. It is easiest to follow the diagram through the quality levels to explain the differences.

Band One
These are simple records which are mainly used by robot search engines like NetFirst, AltaVista and Infoseek for full-text Internet indexing and tend to be associated with directory service protocols. They are mainly used by the unaware netsurfers who start out to explore the available information in a random way.

Band Two
These kinds of metadata are at the moment probably the most prominent on the research agendas of traditional metadata creators. At the instigation of OCLC, workshops have been held since 1995 to try to find a modus by which these metadata can be formulated.
The first workshop, held in Dublin, Ohio (U.S.A.), found consensus on a set of elements since then called the Dublin Core (3). It is intended to be sufficiently rich to support useful fielded retrieval but simple enough not to require specialist expertise or extensive manual effort to create. The participants agreed on:

  • A concrete syntax for the Dublin Core expressed as a Document Type Definition (DTD) in Standard Generalized Markup Language (SGML).
  • A mapping of this syntax to existing HyperText Markup Language (HTML) tags to enable a consistent means for embedding author-generated description metadata in web documents.

We can use the Dublin Core elements to create metadata for the maps we put on the net ourselves. Most HTML editors can create templates which make the creation of these metadata easy. Dublin Core records can be used in the same way as traditional CIP records, which at a later stage can be enriched to become full MARC records.
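The embedding convention described above can be pictured in a small sketch. The element names follow the Dublin Core convention for HTML meta tags; the record values and the helper function are invented for illustration:

```python
# Build Dublin Core <meta> tags for embedding in the <head> of an HTML page.
# The element names follow the Dublin Core; the sample values are invented.

def dublin_core_meta_tags(record):
    """Render a dictionary of Dublin Core elements as HTML <meta> tags."""
    lines = []
    for element, value in record.items():
        lines.append('<meta name="DC.%s" content="%s">' % (element, value))
    return "\n".join(lines)

record = {
    "title": "Topographic map of the province of Utrecht",
    "creator": "Topografische Dienst",
    "date": "1997",
    "type": "cartographic material",
    "format": "image/gif",
}

print(dublin_core_meta_tags(record))
```

A template of this kind is exactly what an HTML editor would fill in for each map put on the net.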

One of the remaining problems is to find a transfer syntax which makes it easy to embed the Dublin Core elements. This problem is treated in the paper A Proposed Convention for Embedding Metadata in HTML, which might result in a 'Dublin Core DTD [Document Type Definition]'.

The Dublin Core is not the only system available in Band Two. Other resource description models are RFC 1807, IAFA [Internet Anonymous FTP Archive] and SOIF [Summary of Object Interchange Formats], to name but a few, but for us the Dublin Core is most probably the nearest to our everyday practices, also because work is in progress to map the Dublin Core Metadata Elements to USMARC (4). Other MARCs will probably follow soon.
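Such a mapping amounts to a crosswalk table from element names to MARC tags. The sketch below is illustrative only and does not reproduce the official concordance; the tag assignments are common USMARC fields chosen for the example:

```python
# A rough sketch of a Dublin Core to USMARC crosswalk; the tag assignments
# are illustrative and do not reproduce the official mapping.

DC_TO_USMARC = {
    "title":      "245",  # title statement
    "creator":    "100",  # main entry, personal name
    "subject":    "650",  # topical subject heading
    "publisher":  "260",  # publication, distribution, etc.
    "date":       "260",  # date portion of the imprint
    "identifier": "856",  # electronic location and access
}

def to_usmarc(dc_record):
    """Convert a Dublin Core record (dict) into sorted (tag, value) pairs;
    elements without a mapping are silently dropped."""
    fields = []
    for element, value in dc_record.items():
        tag = DC_TO_USMARC.get(element)
        if tag is not None:
            fields.append((tag, value))
    return sorted(fields)

fields = to_usmarc({"title": "Bodemkaart van Nederland", "date": "1995"})
```

In this way a Band Two record can be promoted mechanically into the skeleton of a Band Three record.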

Band Three
However electronic publications and spatial databases develop, our main access vehicle will most probably remain the ISBD description and the MARC formats, also seen from the point of view that we would like to offer our users an access continuum from manuscript and printed records to electronic databases, so that they do not have to search several differently formatted databases.

During the 9th Conference of the Groupe des Cartothécaires de LIBER in 1994 in Zürich, Switzerland, I already showed that it is possible with ISBD and MARC to incorporate descriptions of dynamic electronic maps and databases in our current catalogues, making some proposals to adapt to both current and future practices. One of the preconditions for MARCs, however, should be that they can be easily extended, especially where coded data are concerned (Unimarc tags 100-199). All MARCs are in the process of adapting their formats to be able to incorporate data, verbal or encoded, which make retrieval of electronic documents and data easier. They are also extending the formats to be better able to evaluate the bibliographic data.

As an example I have included here an ISBD description and a Unimarc description of a dynamic spatial database.

Other formats in this range are TEI [Text Encoding Initiative] independent headers and EDI [Electronic Data Interchange] messages.

Band Four
Digital spatial databases have been created from the late 1970s onwards and have nowadays reached the stage where there is country-wide coverage (at municipal, provincial and state level), and through GIS they can easily be integrated with other databases. Usually they are a continuation of existing analogue processes, except that they are mostly in vector format and are built up of layers of information which can be manipulated independently of each other or in concert. To extend operability, many producers are digitizing their existing analogue data, usually in raster format.
Seeing the benefit of promotion and the need for a higher return on operating costs, producers started to think of ways to provide access to these data. As the economic stakes are higher than before, they sought to create a description system which incorporates not only data usually associated with ISBD descriptions but also data which could help their users to evaluate and analyse the fitness for use and quality of the digital spatial data offered.

National and international bodies were already in the process of creating transfer standards which sometimes also included portions concerned with metadata. As a result Pergamon offers on behalf of the ICA Commission on Standards for the Transfer of Spatial Data a publication(5) which describes and evaluates these standards.

The first to handle the problem of metadata standards integrally was the Federal Geographic Data Committee (FGDC, U.S.A.), which produced the Content Standards for Digital Geospatial Metadata (June 8 draft).

The objectives of the standards are to provide a common set of terminology and definitions for the documentation of digital geospatial data. The standards establish the names of data elements and compound groups (groups of data elements) to be used for these purposes, the definition of these compound elements and data elements, and information about the values that are to be provided for the data elements.

The major uses of these metadata are:

  • to maintain an organization's internal investment in geospatial data,
  • to provide information about an organization's data holdings to data catalogues, clearinghouses, and brokerages,
  • to provide information needed to process and interpret data to be received through a transfer from an external source.
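A record under such a content standard is in essence a deeply structured document. The sketch below loosely follows the standard's major sections; the field names and values are invented for illustration, not taken from the standard itself:

```python
# A minimal sketch of a Band Four metadata record, loosely following the
# major sections of the FGDC content standard; field names and values
# are invented for illustration.

metadata = {
    "identification": {
        "title": "Digital soil map of the Netherlands 1:50 000",
        "abstract": "Vector dataset of soil units.",
        "bounding_coordinates": {"west": 3.3, "east": 7.2,
                                 "south": 50.7, "north": 53.6},
    },
    "data_quality": {
        "positional_accuracy_m": 25,
        "completeness": "All map sheets surveyed before 1995.",
    },
    "spatial_reference": {
        "projection": "Rijksdriehoeksmeting (Dutch national grid)",
    },
    "distribution": {
        "format": "vector",
        "fees": "on request",
    },
}

def within_extent(md, lon, lat):
    """Check whether a point falls inside the dataset's declared extent:
    one of the evaluation questions Band Four metadata should answer."""
    box = md["identification"]["bounding_coordinates"]
    return box["west"] <= lon <= box["east"] and box["south"] <= lat <= box["north"]
```

A user can thus evaluate fitness for use (does the dataset cover my area, at what accuracy?) before ordering any data.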

Soon after, other metadata standards (6) became available or went into development.

The ICA Commission on Standards for the Transfer of Spatial Data has in its third cycle 1995-1999 as terms of reference, among others, to develop and publish:

  • characteristics relating to standards for specifying spatial data transfer metadata
  • descriptions of national and international metadata standards in terms of those characteristics

When we survey the latest document of the ICA Commission on the categories of characteristics for metadata, version 4.0, on which the relevant standards will be evaluated, we come to the following categories:

  • 1. Administration of standard
  • 2. Use and implementation of the standard
  • 3. Linkage and coordination
  • 4. Identification of a dataset
  • 5. Status of dataset
  • 6. Data content
  • 7. Data quality
  • 8. Spatial data organization
  • 9. Spatial reference
  • 10. Availability and distribution of the dataset
  • 11. Authorization and verification

We may assume that all or most of these categories of information are present in the relevant standards. It is difficult to relate these categories to specific categories in specific standards. The resulting descriptions differ from descriptions in Band Three especially in categories 6, 7 and 8, apart from the fact that they usually carry far more detailed information (7).

However, these are descriptive standards comparable with the ISBD, but on a different level. What most of them currently lack is a format for processing and retrieval. As the FGDC standards specifically state: The standards do not provide instructions or techniques for its implementation and accordingly does not concern itself with the construction of databases for holding metadata.

Given the fact that the formats needed will be far more sophisticated, we cannot expect that present MARC formats will be used to process these metadata. To prevent costly conversions to other formats, it would be advisable that formats for Band Four descriptions (8) be compatible with the MARC formats we use. At the same time, revision of present MARC formats to incorporate electronic documents should take into account that it should be possible to extract data from Band Four formats.
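The extraction envisaged here, from a rich Band Four record down to the brief fields a MARC-based catalogue needs, might be sketched as follows. The keys are invented and stand in for any producer's format, not for a particular standard:

```python
# Sketch of extracting a brief Band Three description from a rich Band Four
# record; the keys are invented and stand in for any producer's format.

def extract_band_three(band_four):
    """Pull the few locational/selection fields a catalogue record needs
    out of a full spatial-metadata record, leaving the evaluative detail
    (quality, lineage, etc.) behind."""
    ident = band_four.get("identification", {})
    dist = band_four.get("distribution", {})
    return {
        "title": ident.get("title"),
        "date": ident.get("publication_date"),
        "publisher": dist.get("distributor"),
        "extent": ident.get("bounding_coordinates"),
    }

brief = extract_band_three({
    "identification": {"title": "Digital elevation model 25m",
                       "publication_date": "1996",
                       "bounding_coordinates": {"west": 3.3, "east": 7.2}},
    "distribution": {"distributor": "Topografische Dienst"},
    "data_quality": {"positional_accuracy_m": 5},
})
```

Compatibility between the formats would make such an extraction a routine, lossless operation rather than a costly conversion.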

Integrated use of metadata

Band Two should be used for images we put on the Internet ourselves. The metadata may derive from existing ISBD(CM) descriptions. When the images are downloaded for archiving in the local electronic collection Band Two metadata should be enriched to create ISBD(CM) descriptions.

Band Three should be used for all images, whether analogue or electronic, which are part of the local collection. The metadata may be enriched Band Two data, prime ISBD(CM/CF) data or data extracted from Band Four descriptions.
In the case of a Cartographic Information Center, Band Three data may also pertain to items which are not actually held by the Center, as it functions as a one-stop shop for more than one collection.

Band Four should be used for databases, raster or vector or a combination thereof, and statistical data, where the metadata are mainly created to facilitate the transfer of data.

When an institution uses Band Two, Three and Four metadata, care should be taken that all processes and formats are geared to one another as much as possible. When different formats are used, hyperlinks should be made possible between descriptions in the different bands for those users who want more or less detailed information. UKOLN (the UK Office for Library and Information Networking) maintains a site which lists literature concerning the interoperability of different metadata formats under the title "Mapping between metadata formats."

When gearing or hyperlinking does not happen, different catalogues have to be maintained, each with its own processes, which will most probably diverge more and more from each other over time as technologies keep developing. Should it be possible to use one format, different kinds of metadata could be marked with a quality-level code.
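Marking records with a quality-level code could be as simple as one extra field on every record in a single catalogue, after which users (or linking software) can ask for more or less detail. A minimal sketch, with invented records:

```python
# Sketch of a single catalogue whose records carry a band (quality-level)
# code, so one database can serve users wanting more or less detail.
# The records themselves are invented for illustration.

catalogue = [
    {"band": 2, "title": "City plan of Copenhagen"},
    {"band": 3, "title": "City plan of Copenhagen",
     "isbd": "City plan of Copenhagen. - Scale 1:15 000. - ..."},
    {"band": 4, "title": "Danish cadastral database",
     "spatial_reference": "UTM zone 32"},
]

def records_at_band(cat, band):
    """Select only the records described at a given quality level."""
    return [r for r in cat if r["band"] == band]
```

One format with a quality-level code thus avoids maintaining diverging catalogues side by side.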

While libraries occupy themselves intensively with developments in Bands Two and Three, this is unfortunately not the case with developments in Band Four. Up till now the only involvement I have observed is that of the Library of Congress and the National Archives and Records Administration with the FGDC, and of the IFLA Geography & Map Libraries Section with the work of the ICA Commission.

Clearinghouses and geographic infrastructures

Creating metadata for digital spatial datasets is mainly done within an economic framework and mainly involves public and private producers with cost-effectiveness in mind. Should this be the main drive, then the possibility exists that when the digital datasets no longer have economic value they might be discarded, or at best not very well looked after anymore. At the moment it is exceedingly doubtful that producing agencies will create infrastructures that will keep the datasets fit for use for an unspecified period of time. Such a scenario should touch a sore spot of the library and archive community. No single institution, however, will probably be able to set up an infrastructure which can collect and migrate all datasets produced within the geographical area it is responsible for. As always with this kind of grand problem, the magic word is co-operation.

The same drive which led to the realization of metadata is working to create infrastructures for viewing these metadata and for accessing or transferring the datasets themselves, in this sequence. National and international bodies formulate policies which will result in national and international clearinghouses. In this virtual age those clearinghouses must not be looked at as single material entities, but rather as digital, distributed entities using the same format and protocol. The clearinghouses will most probably have an Internet interface for viewing metadata and an Intranet interface for transferring data.

Two initiatives are introduced here.

The National Spatial Data Infrastructure (NSDI)
From the document Data Policies and the National Spatial Data Infrastructure by Nancy Tosta, past staff director of the FGDC, I quote the following:

The National Spatial Data Infrastructure (NSDI) is conceived to be an umbrella of policies, practices, standards, organizations, and data that contribute to improved availability and use of high quality geospatial data (9) and technologies. Although the effort to develop the NSDI is being led by the Federal Geographic Data Committee (FGDC), guided by existing Federal policies related to data dissemination, liability, and privacy, the NSDI is envisioned to encompass all data producers, managers, and users in the United States, regardless of organizational affiliation. Various programs begun under the NSDI include development of a national geospatial data clearinghouse(10), creation of a national framework digital geospatial data set, coordination of various themes of geospatial data such as soils, geology, and wetlands, and development and promulgation of standards for geospatial data collection and management. All of these efforts depend on various partnerships, agreements, and policies to share in the production, management, and use of geospatial data.

Extensive information concerning the development of the FGDC Clearinghouse can be found under Geospatial data clearinghouse activity.

Another initiative in this context is the National Geoscience Data Repository System, which the American Geological Institute (AGI) is working to establish. A successful feasibility study in 1994 led to further action. Maybe when the FGDC Clearinghouse also evolves into a data-transfer system, it can build on the processes developed by AGI.

European Geographic Information Infrastructure (EGII)
From the EGII Policy Document Towards a[n] European Geographic Information Infrastructure (EGII) of Directorate XIII of the European Commission I quote the following:

The key issue is to provide a broad, readily available, high quality platform of base data within a uniform infrastructure across Europe so that every market niche is open to every entrepreneur, so that existing data can be combined to provide valuable information and so that new data can be added effectively and immediately...
The EGII would be a stable, European-wide set of agreed rules, standards and procedures for creating, collecting, exchanging and using GI. The EGII would also ensure that European-wide base datasets are readily available and that metadata services exist so that such data can be easily located by potential users.

These are but a few of the initiatives which are being translated into national or global actions, as shown, for example, by the discussion paper of the Open GIS Consortium Building the GSDI by Lance McKee for the September 1996 'Emerging Global Spatial Data Infrastructure Conference'.

As far as I can see, the library and archives community is not very active in helping to materialize these infrastructures and clearinghouses, not to say inactive.

Digital spatial information and archiving

I'll try to sketch a scenario for libraries/archives which I think represents the whole flow of activities, from prime production to archiving of the datasets.

1. Production of datasets
The production of base and statistical digital spatial datasets is coordinated, at least for the public sector, by government policies. Private-sector datasets will be produced, among other things, with these datasets and will also become available to other users. At the same time all kinds of institutions, including libraries, will try to provide digitized datasets of older analogue materials, as is currently happening in the project Digimap: National Online Access to Ordnance Survey Digital Map Data in the United Kingdom, or in the Alexandria Project in the Davidson Library at the University of California, Santa Barbara, within the framework of building distributed digital libraries.

2. Production of metadata
Band Four metadata are produced by the producers and delivered to a national clearinghouse for geographical information. Libraries should try to cooperate in creating metadata standards. Libraries will extract Band Three metadata and use them in their catalogues or national library networks. Those libraries which hold (part of) these datasets will have a GIS environment to enable users to work with them.

3. Clearinghouses
A clearinghouse is the central node in a geographic information infrastructure. In the first instance the clearinghouse contains only metadata. Compatible formats have to be developed to hold and search the metadata. At a later stage the actual data themselves are transferred through technology held by the clearinghouse. The clearinghouse can overlap with national library networks, in which case Band Four and Band Three metadata can be closely associated.

4. Distributed use
Academic institutions and libraries and national/deposit libraries try, through coordinating bodies, to make contracts with producers for the use of datasets and software in research. This will create distributed licences. An example of such a contract is the British CHEST model (11). When possible the datasets will run from a server at the national/deposit library in order to speed up transfer and migration processes and to harmonize procedures.

5. Archiving
During a dataset's economic life, national/deposit libraries will work with all relevant datasets to learn how to cope with them, migrate them and keep them available for general use. When its economic life has ended, when the dataset is superseded, or when archive and record law demands that the dataset be transferred to the National Archive or Records Office, the National Library and the National Archive in concert will try to keep the dataset available for general use. They will also be responsible for migrating the dataset.

When we take care that spatial data are delivered or converted to nationally and/or internationally recognized transfer standards, migration may become easier, as consecutive versions of these transfer standards will most probably incorporate former versions or provide for conversion. Metadata records may help us to evaluate whether transfers or conversions have been successful. The chapter "The challenge of archiving digital information" of the document Preserving Digital Information by the Task Force on Archiving of Digital Information gives some insights into this problem and tries to describe some ways of solving it.
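Evaluating whether a transfer or conversion has been successful amounts to comparing what the metadata record declares with what the received dataset actually contains. A minimal sketch, in which the declared fields (checksum, feature count) are invented for illustration:

```python
import hashlib

# Sketch of checking a received dataset against its metadata record; the
# declared fields (md5 checksum, feature count) are invented examples of
# the verification data a transfer standard could carry.

def verify_transfer(metadata, dataset_bytes, feature_count):
    """Compare declared values in the metadata with the received data;
    return a list of discrepancies (empty means the transfer looks sound)."""
    problems = []
    digest = hashlib.md5(dataset_bytes).hexdigest()
    if digest != metadata["checksum_md5"]:
        problems.append("checksum mismatch")
    if feature_count != metadata["feature_count"]:
        problems.append("feature count mismatch")
    return problems

data = b"polygon layer, 3 features"
record = {"checksum_md5": hashlib.md5(data).hexdigest(), "feature_count": 3}
```

Each migration to a newer transfer standard could run such a check and log the result back into the metadata record.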

This agenda can tentatively be observed in policy documents like those of the NSDI and EGII, except for stages 4 and 5. Libraries should not wait until clearinghouses materialize but must try to influence policies in such a way that they can secure the use of superseded and discarded datasets in the future. As a lot of this spatial information is in the public domain, closer cooperation between libraries and archives is called for. This means upgrading our knowledge of the production and use of spatial datasets. Learning how digital spatial metadata function is a good beginning for this shift in library policies.

Suggested Citation

Smits, Jan, "The creation and integration of metadata in spatial data collections." Digital Map Librarianship: a working syllabus, 63rd IFLA Conference, Copenhagen, Denmark. (18 Aug. 1997) <http://magic.lib.uconn.edu/ifla/meta-smits.htm>

Jan Smits
Koninklijke Bibliotheek
The Hague
jan.smits@konbib.nl