IFLA Section of Geography and Map Libraries
Title
Digital Map Librarianship: Metadata
Abstract
This paper, "The creation and integration of metadata in spatial data collections," written by Jan Smits for the 63rd IFLA Conference, describes developments in metadata, their different definitions in various fields, and their possible function and use in spatial data collections. It shows how metadata of different quality levels can be integrated to serve users at different levels. Furthermore, it gives a brief sketch of integrated spatial data infrastructures.
Introduction |
For centuries libraries have deconstructed the information in the form
of books, periodicals, cartographic materials, etc., and reconstructed
it into bibliographic and access data: cataloging. During this period
most librarians have understood what was meant by a bibliographic description.
Since the 1990s, with the Internet coming online for large groups of the global community, everybody suddenly talks about metadata as if it were a phenomenon we knew nothing about before. And everybody talks so loudly that you can hardly make out the words through the noise. Metadata seems to have become a hype as users become aware that, while the anarchy on the Internet is not too bad for occasional surfers, it is very frustrating for those who seek information professionally that there is hardly any systematization or structuring of the information available. Therefore professional bodies, public as well as private, are taking initiatives to create structures by which the available information can be deconstructed and reconstructed in a sensible and recognizable way, so that it can be more easily sought, evaluated and selected.
At the same time, especially in the field of spatial information, large databases became available which can be used as building blocks for intricate analytical processes. Initiatives are arising, mainly from the public sector, to create clearinghouses for the transfer of these datasets or parts thereof. To be able to evaluate the fitness for use and the quality of these data before they are made available, it is necessary to be able to view a record which incorporates all the information needed for these decision-making processes.
The last development of importance here is the creation of Intranets. In contrast to the Internet, Intranets are secure data viewing and transfer networks for a specific group of users who have to abide by certain ground rules. Especially where data (can) have a high added economic value, Intranets are the only way to safeguard economic and legal interests, as users are usually contractual partners in the process or have to abide by the rules of their employment contract.
What is metadata
In essence metadata is data about data. This, however, makes almost
all data except primary sources metadata, and that is a bit too much. To
cut a long story short, we shall restrict ourselves to the documentary field.
Here metadata is, in the first instance, data which helps us to locate and select
sources of information. In this sense it is equivalent to bibliographic descriptions
together with access data. In libraries this was for a long time the only
use for metadata, except where analytical bibliographies are concerned, which
might also support evaluation processes.
In this electronic age, location and selection are only two of the processes needed to decide which source best serves the purpose for which information is needed. To fulfil this function, certain metadata must also make it possible to evaluate and/or analyse a source before a decision is made. Metadata, however, is hard to define. It is sometimes considered such an ambiguous term that the Task Force on Archiving of Digital Information avoids the term altogether (1). Fortunately, other bodies have done this work for us. I have selected two definitions from the two mainstream documentary fields with which we occupy ourselves: the first is the library field, the second the digital spatial community.
The first definition comes from the glossary of Biblink (2) D1.1 Metadata formats and reads as follows: "[Metadata is] information about a publication as opposed to the content of the publication; [it] includes not only [a] bibliographic description but also other relevant information such as its subject, price, conditions of use, etc."
The second is the working definition adopted by the ICA Commission on Standards for the Transfer of Spatial Data at its summer meeting of 1996 in Den Haag, The Netherlands, and reads: "Metadata are data that describe the content, data definition and structural representation, extent (both geographic and temporal), spatial reference, quality, availability, status and administration of a geographic dataset."
There appears to be a controversy between these definitions, one with which we have been confronted before in our everyday practice. Or it might be a lasting controversy between keepers of DLOs (Document-Like Objects) and keepers of images, though I hope not. Those who work with DLOs mainly focus on location and selection, while those who work with images focus more on evaluation and analysis. In the analogue world, this is the contrast between books and periodicals on the one hand and cartographic materials on the other.
Another contrast is that libraries do not see a main role for producers, while the spatial data community clearly reckons with the fact that producers will create the bulk of database metadata. The spatial data community also clearly aims at the transfer of the underlying spatial data. But the soup may not be eaten as hot as it is served, or in proper English: things are sure to simmer down. This already seems to be seeping through in the Biblink study, which creates a continuum from metadata for locational purposes to metadata for analytical purposes.
Types of metadata
This is illustrated in the diagram, which shows a typology
of metadata for cartographic and spatial data that I modified after
one of the diagrams used in the Biblink study. The colour intensity shows
to what extent map curators are likely to produce and/or use metadata. The
differences are easiest to explain by following the diagram through its quality levels.
Band One
Band Two
We can use the Dublin Core Elements to create metadata for the maps we put on the net ourselves. Most HTML editors can create templates, which makes creating these metadata easy. Dublin Core records can be used in the same way as traditional CIP records, which at a later stage can be enriched to become full MARC records. One of the remaining problems is to find a transfer syntax which makes it easy to embed the Dublin Core elements. This problem is treated in the paper A Proposed Convention for Embedding Metadata in HTML, which might result in a 'Dublin Core DTD [Document Type Definition]'. The Dublin Core is not the only system available in Band Two. Other resource description models are RFC 1807, IAFA [Internet Anonymous FTP Archive] and SOIF [Summary Object Interchange Format], to name but a few, but for us the Dublin Core is most probably the nearest to our everyday practice, also because work is in progress to map the Dublin Core Metadata Elements to USMARC (4). Other MARC formats will probably follow soon.
Band Three
During the 9th Conference of the Groupe des Cartothécaires de LIBER in 1994 in Zürich, Switzerland, I already showed that it is possible to incorporate descriptions of dynamic electronic maps and databases into our current catalogues with ISBD and MARC, and made some proposals for adapting to both current and future practice. One of the preconditions for MARC formats, however, should be that they can be easily extended, especially where coded data are concerned (UNIMARC tags 100-199). All MARC formats are in the process of adapting so that they can incorporate data, verbal or encoded, which make the retrieval of electronic documents and data easier. They are also extending the formats so that the bibliographic data can be better evaluated. As an example, I have included here an ISBD description and a UNIMARC description of a dynamic spatial database.
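The Band Two workflow described above (embed Dublin Core elements in the HTML page of a map, then later enrich the record toward MARC) can be sketched in a few lines. This is a minimal illustration, not a definitive implementation. It assumes the `DC.element` META-tag naming pattern that was later formalized for embedding Dublin Core in HTML, and the crosswalk table is only an illustrative approximation of a Dublin Core to USMARC mapping that should be checked against the official one. The sample record, the example.org URL and the function names are hypothetical.

```python
from html import escape

# Hypothetical unqualified Dublin Core record for a map put on the net.
record = {
    "title": "Topographic map of Zeeland",
    "creator": "Example Mapping Agency",
    "date": "1997",
    "format": "image/gif",
    "identifier": "http://www.example.org/maps/zeeland.gif",
}

def to_html_meta(dc: dict[str, str]) -> str:
    """Embed each Dublin Core element as an HTML META tag named 'DC.<element>'."""
    return "\n".join(
        f'<meta name="DC.{element}" content="{escape(value)}">'
        for element, value in dc.items()
    )

# Illustrative (assumed, not authoritative) crosswalk from Dublin Core
# elements to a USMARC tag and subfield code.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "format": ("856", "q"),
    "identifier": ("856", "u"),
    "language": ("546", "a"),
    "rights": ("540", "a"),
}

def to_marc_fields(dc: dict[str, str]) -> dict[str, list[str]]:
    """Group mapped elements by MARC tag as '$<code> <value>' subfield strings."""
    fields: dict[str, list[str]] = {}
    for element, value in dc.items():
        if element in DC_TO_MARC:
            tag, code = DC_TO_MARC[element]
            fields.setdefault(tag, []).append(f"${code} {value}")
    return fields

print(to_html_meta(record))
for tag, subfields in sorted(to_marc_fields(record).items()):
    print(tag, " ".join(subfields))
```

The MARC output here is only a skeleton: a cataloguer would still add indicators, the remaining ISBD(CM) areas and coded data when enriching it into a full record.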
Other formats in this range are TEI [Text Encoding Initiative] independent headers and EDI [Electronic Data Interchange] messages.
Band Four
National and international bodies were already in the process of creating transfer standards, which sometimes also included portions concerned with metadata. As a result, Pergamon offers, on behalf of the ICA Commission on Standards for the Transfer of Spatial Data, a publication (5) which describes and evaluates these standards. The first to handle the problem of metadata standards comprehensively was the Federal Geographic Data Committee (FGDC, U.S.A.), which produced the Content Standard for Digital Geospatial Metadata (June 8 draft). The objectives of the standard are to provide a common set of terminology and definitions for the documentation of digital geospatial data. The standard establishes the names of data elements and compound elements (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements. The major uses of these metadata are:
Soon afterwards, other metadata standards (6) became available or went into development. In its third cycle (1995-1999), the ICA Commission on Standards for the Transfer of Spatial Data has, among other terms of reference, to develop and publish:
When we survey the latest ICA Commission document on the categories of characteristics for metadata (version 4.0), against which the relevant standards will be evaluated, we arrive at the following categories:
We may assume that all or most of these categories of information are present in the relevant standards, although it is difficult to relate them to specific categories in specific standards. Descriptions produced with these standards differ from Band Three descriptions especially in categories 6, 7 and 8, besides usually carrying far more detailed information (7). However, these are descriptive standards comparable with the ISBD, albeit at a different level. What most of them currently lack is a format for processing and retrieval. As the FGDC standard specifically states: The standards do not provide instructions or techniques for its implementation and accordingly does not concern itself with the construction of databases for holding metadata. Given that the formats needed will be far more sophisticated, we cannot expect present MARC formats to be used to process these metadata. To prevent costly conversions to other formats, it would be advisable for the formats for Band Four descriptions (8) to be compatible with the MARC formats we use. At the same time, any revision of present MARC formats to incorporate electronic documents should make sure that data can be extracted from Band Four formats.
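Extracting data from a Band Four description into a MARC field can be illustrated with the bounding coordinates of a dataset, which also shows what the FGDC standard means by a compound element with value domains. This is a sketch under stated assumptions: it assumes the short element names (`westbc`, `eastbc`, `northbc`, `southbc`) of the later XML encoding of the FGDC standard, assumed value domains of [-180, 180] for longitude and [-90, 90] for latitude, and the target is the coordinate subfields $d/$e/$f/$g of MARC 21 field 034 in hemisphere-degrees-minutes-seconds (hdddmmss) form. UNIMARC has an analogous coded cartographic field in its 1XX block. The sample record and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical FGDC-style record containing only the bounding-coordinate elements.
SAMPLE = """<metadata><idinfo><spdom><bounding>
<westbc>-75.25</westbc><eastbc>-73.5</eastbc>
<northbc>41.5</northbc><southbc>39.75</southbc>
</bounding></spdom></idinfo></metadata>"""

def to_hdddmmss(value: float, positive: str, negative: str) -> str:
    """Convert decimal degrees to hemisphere + ddd + mm + ss, e.g. -75.25 -> W0751500."""
    hemisphere = positive if value >= 0 else negative
    total_seconds = round(abs(value) * 3600)  # round once, so seconds never reach 60
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hemisphere}{degrees:03d}{minutes:02d}{seconds:02d}"

def read_bounding(xml_text: str) -> dict[str, float]:
    """Read the four bounding coordinates and check them against assumed value domains."""
    bounding = ET.fromstring(xml_text).find("idinfo/spdom/bounding")
    if bounding is None:
        raise ValueError("record has no bounding coordinates")
    values = {}
    for tag in ("westbc", "eastbc", "northbc", "southbc"):
        text = bounding.findtext(tag)
        if text is None:
            raise ValueError(f"missing element {tag}")
        values[tag] = float(text)
    if not all(-180.0 <= values[t] <= 180.0 for t in ("westbc", "eastbc")):
        raise ValueError("longitude outside [-180, 180]")
    if not all(-90.0 <= values[t] <= 90.0 for t in ("northbc", "southbc")):
        raise ValueError("latitude outside [-90, 90]")
    if values["northbc"] < values["southbc"]:
        raise ValueError("north bounding coordinate lies south of the south one")
    return values

def marc034_coordinates(xml_text: str) -> str:
    """Render the bounding box as MARC 034 coordinate subfields $d $e $f $g."""
    b = read_bounding(xml_text)
    return (f"$d {to_hdddmmss(b['westbc'], 'E', 'W')} "
            f"$e {to_hdddmmss(b['eastbc'], 'E', 'W')} "
            f"$f {to_hdddmmss(b['northbc'], 'N', 'S')} "
            f"$g {to_hdddmmss(b['southbc'], 'N', 'S')}")

print(marc034_coordinates(SAMPLE))
# $d W0751500 $e W0733000 $f N0413000 $g N0394500
```

Validating before converting keeps a malformed Band Four record from silently producing a wrong coded field in the catalogue.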
Integrated use of metadata
Band Two should be used for images we put on the Internet
ourselves. The metadata may be derived from existing ISBD(CM) descriptions.
When the images are downloaded for archiving in the local electronic collection,
their Band Two metadata should be enriched to create ISBD(CM) descriptions.
Band Three should be used for all images, whether analogue
or electronic, which are part of the local collection. The metadata may
be enriched Band Two data, primary ISBD(CM/CF) data or data extracted
from Band Four descriptions. Band Four should be used for raster, vector or combined databases and for statistical data, where the metadata are mainly created to facilitate the transfer of data. When an institution uses Band Two, Three and Four metadata, care should be taken that all processes and formats are geared to one another as much as possible. When different formats are used, hyperlinks should be made possible between descriptions in the different bands for users who want more or less detailed information. UKOLN (The UK Office for Library and Information Networking) maintains a site which lists literature on the interoperability of different metadata formats under the title "Mapping between metadata formats." If formats are neither geared to one another nor hyperlinked, separate catalogues have to be maintained, each with its own processes, which will most probably diverge more and more over time as technologies keep developing. If a single format could be used, the different kinds of metadata could be marked with a quality-level code. While libraries occupy themselves intensively with developments in Bands Two and Three, this is unfortunately not the case with developments in Band Four. Up till now the only involvement I have observed is that of the Library of Congress and the National Archives and Records Administration with the FGDC, and of the IFLA Geography & Map Libraries Section with the work of the ICA Commission.
Clearinghouses and geographic infrastructures |
Creating metadata for digital spatial datasets is mainly done within
an economic framework and mainly involves public and private producers
with cost-effectiveness in mind. Should this be the main drive, then there is
a possibility that digital datasets which no longer have an economic
value will be discarded or, at best, no longer be well looked
after. At the moment it is exceedingly doubtful that producing
agencies will create infrastructures that will keep the datasets fit for
use for an unspecified period of time. Such a scenario should touch on
a sore spot of the library and archive community. No single institution,
however, will probably be able to set up an infrastructure which can collect
and migrate all datasets produced within the geographical area it is responsible
for. As always with problems of this magnitude, the magic word is co-operation.
The same drive that has led to the realization of metadata is now at work creating infrastructures, first for viewing these metadata and then for accessing or transferring the datasets themselves. National and international bodies are formulating policies which will result in national and international clearinghouses. In this virtual age those clearinghouses should not be seen as single physical entities, but rather as digital, distributed entities whose parts use the same format and protocol. The clearinghouses will most probably have an Internet interface for viewing metadata and an Intranet interface for transferring data. Two initiatives are introduced here.
The National Spatial Data Infrastructure (NSDI)
The National Spatial Data Infrastructure (NSDI) is conceived to be an umbrella of policies, practices, standards, organizations, and data that contribute to improved availability and use of high quality geospatial data (9) and technologies. Although the effort to develop the NSDI is being led by the Federal Geographic Data Committee (FGDC), guided by existing Federal policies related to data dissemination, liability, and privacy, the NSDI is envisioned to encompass all data producers, managers, and users in the United States, regardless of organizational affiliation. Various programs begun under the NSDI include development of a national geospatial data clearinghouse (10), creation of a national framework digital geospatial data set, coordination of various themes of geospatial data such as soils, geology, and wetlands, and development and promulgation of standards for geospatial data collection and management. All of these efforts depend on various partnerships, agreements, and policies to share in the production, management, and use of geospatial data. Extensive information concerning the development of the FGDC Clearinghouse can be found under Geospatial data clearinghouse activity.
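A clearinghouse seen as a distributed entity whose nodes share one format and protocol can be sketched as a single query fanned out to all nodes, with the answers merged. This is a toy, in-process illustration under stated assumptions: real nodes would be remote servers speaking a common network search protocol (such as Z39.50), whereas here the `Node` class, the bounding-box query and the sample records are hypothetical, and boxes crossing the 180th meridian are ignored.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (west, east, north, south), decimal degrees

@dataclass
class MetadataRecord:
    title: str
    bbox: BBox

def overlaps(a: BBox, b: BBox) -> bool:
    """True if two boxes intersect (boxes crossing the 180th meridian are not handled)."""
    aw, ae, an, as_ = a
    bw, be, bn, bs = b
    return aw <= be and bw <= ae and as_ <= bn and bs <= an

class Node:
    """One clearinghouse node; every node answers the same query interface."""
    def __init__(self, name: str, records: list[MetadataRecord]):
        self.name = name
        self.records = records

    def search(self, bbox: BBox) -> list[tuple[str, str]]:
        return [(self.name, r.title) for r in self.records if overlaps(r.bbox, bbox)]

def clearinghouse_search(nodes: list[Node], bbox: BBox) -> list[tuple[str, str]]:
    """Send one query to all nodes in parallel and merge the answers."""
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda node: node.search(bbox), nodes)
        return sorted(hit for hits in results for hit in hits)

nodes = [
    Node("node-nl", [MetadataRecord("Zeeland land use", (3.3, 4.3, 51.8, 51.2))]),
    Node("node-us", [MetadataRecord("Delaware soils", (-75.8, -75.0, 39.9, 38.4))]),
]
print(clearinghouse_search(nodes, (3.0, 5.0, 52.0, 51.0)))
# [('node-nl', 'Zeeland land use')]
```

Because every node exposes the same `search` interface, a user sees one clearinghouse even though the metadata stay with their producers.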
Another initiative in this context is the National Geoscience Data Repository System, which the American Geological Institute (AGI) is working to establish. A successful feasibility study in 1994 led to further action. Perhaps, when the FGDC Clearinghouse also evolves into a data-transfer system, it can build on the processes developed by AGI.
European Geographic Information Infrastructure (EGII)
The key issue is to provide a broad, readily available, high quality platform of base data within a uniform infrastructure across Europe so that every market niche is open to every entrepreneur, so that existing data can be combined to provide valuable information and so that new data can be added effectively and immediately...
These are but a few of the initiatives which are being translated into national or global actions, as shown, for example, by the Open GIS Consortium discussion paper Building the GSDI by Lance McKee for the September 1996 'Emerging Global Spatial Data Infrastructure Conference'. As far as I can see, the library and archives community is not very active in helping to materialize these infrastructures and clearinghouses, not to say inactive.
Digital spatial information and archiving
I will try to sketch a scenario for libraries and archives which, I think,
represents the whole flow of activities, from primary production to the archiving
of the datasets:
1. Production of datasets
2. Production of metadata
3. Clearinghouses
4. Distributed use
5. Archiving
If we ensure that spatial data are delivered in, or converted to, nationally and/or internationally recognized transfer standards, migration may become easier, as consecutive versions of these transfer standards will most probably incorporate former versions or provide for conversion. Metadata records may help us to evaluate whether transfers or conversions have been successful. The chapter The challenge of archiving digital information of the document Preserving digital information by the Task Force on Archiving of Digital Information gives some insight into this problem and tries to describe some ways of solving it.
This five-stage agenda can be tentatively observed in policy documents such as those
of the NSDI and EGII, except for stages 4 and 5. Libraries should not wait
until clearinghouses materialize but must try to influence policies
in such a way that they can secure the use of superseded and discarded datasets
in future. As much of this spatial information is in the public domain,
closer cooperation between libraries and archives is called for. This means
upgrading our knowledge of the production and use of spatial datasets.
Learning how digital spatial metadata function is a good start to
this shift in library policies.
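The remark that metadata records may help us evaluate whether transfers or conversions have been successful can be made concrete: recompute from the migrated dataset the properties the metadata record already states, and compare. This is a minimal sketch, assuming the record stores a feature count and bounding coordinates and that the dataset can be read as a list of (longitude, latitude) points; the `Summary` class, the function names and the sample data are hypothetical, and a real check would cover many more metadata elements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Summary:
    """Parts of a metadata record that can be recomputed from the data themselves."""
    feature_count: int
    bbox: tuple[float, float, float, float]  # (west, east, north, south)

def summarize(points: list[tuple[float, float]]) -> Summary:
    """Recompute the summary from (longitude, latitude) points."""
    if not points:
        raise ValueError("dataset is empty")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return Summary(len(points), (min(lons), max(lons), max(lats), min(lats)))

def verify_migration(recorded: Summary, migrated: list[tuple[float, float]],
                     tolerance: float = 1e-6) -> list[str]:
    """Compare the stored metadata with the migrated dataset; return any discrepancies."""
    actual = summarize(migrated)
    problems = []
    if actual.feature_count != recorded.feature_count:
        problems.append(f"feature count {actual.feature_count} != {recorded.feature_count}")
    for name, want, got in zip(("west", "east", "north", "south"), recorded.bbox, actual.bbox):
        if abs(want - got) > tolerance:
            problems.append(f"{name} bound {got} != {want}")
    return problems

recorded = Summary(3, (3.5, 4.0, 51.7, 51.3))
migrated = [(3.5, 51.3), (3.8, 51.7), (4.0, 51.5)]
print(verify_migration(recorded, migrated))      # []
print(verify_migration(recorded, migrated[:2]))  # a point was lost during conversion
```

An empty list does not prove the conversion is perfect; it only shows that the checked metadata elements still agree with the data.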
Suggested Citation
Smits, Jan, "The creation and integration of metadata in spatial data collections." Digital Map Librarianship: a working syllabus, 63rd IFLA Conference, Copenhagen, Denmark. (18 Aug. 1997) <http://magic.lib.uconn.edu/ifla/meta-smits.htm>
Jan Smits, Koninklijke Bibliotheek, The Hague, jan.smits@konbib.nl