The terms 'Digital Libraries' and 'Databases' are often used interchangeably, but they are not identical. Digital Libraries (DLs) usually conform to a metadata standard such as the Dublin Core. The term 'Digital Library' is nevertheless sometimes applied to any database that has a browser interface to the data underneath, regardless of the nature of the storage mechanism.
The essential difference between the two is that DLs have metadata, and many follow the Open Archives Initiative (OAI) protocol, which enables metadata harvesting in a standard way. This means a user can search multiple sources without concern for the specific design of the data.
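As a rough illustration of what harvesting in a standard way can look like, here is a minimal Python sketch that builds an OAI ListRecords request for Dublin Core metadata and pulls a few fields out of a response. The base URL, the record identifier, and the sample response are made up for the example, and the helper names are not part of any library; a real harvester would also handle paging and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url):
    """Build a ListRecords request asking for Dublin Core (oai_dc) metadata."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query

def parse_records(xml_text):
    """Extract the identifier, titles and subjects of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [e.text for e in rec.iterfind(".//dc:title", NS)],
            "subjects": [e.text for e in rec.iterfind(".//dc:subject", NS)],
        })
    return records

# Hypothetical response, trimmed to the parts this sketch reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:sky-001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Infrared sky survey image</dc:title>
          <dc:subject>astronomy</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("http://example.org/oai"))
    print(parse_records(SAMPLE_RESPONSE))
```

Because every conforming library answers the same request with the same record layout, the same two functions would work against any of them; only the base URL changes.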
Databases (DBs) are organized stores of data that are designed to standard models, such as relational, hierarchical, or object, or built on supercomputer file stores like SDSC's High Performance Storage System (HPSS). Most DBs today follow the relational model, but new needs will breed new designs.
The interesting aspect of these two environments is that they are gradually converging to make widespread access easy, but from different directions with different techniques. While DLs are using XML metadata, scientific studies involving supercomputers have invented a technique named 'Federation'.
Federation solves two problems - providing a common access method to different databases of similar data and providing transparent access to very large distributed databases that would not be practical to concentrate in one location. Federation creates a multi-layer interface between the Internet and each DB, with common high level services and custom low level access to each different DB. Thus all of the code can be shared except the detailed access code for each DB.
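The layering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the design of any actual federated system: the two adapter classes, their in-memory "stores", and the field names are all invented for the example.

```python
from abc import ABC, abstractmethod

class Adapter(ABC):
    """Low-level layer: the only code that must be written per database."""

    @abstractmethod
    def fetch(self, keyword):
        """Return matching items as dicts with 'name' and 'kind' keys."""

class RelationalAdapter(Adapter):
    """Hypothetical store whose rows are (name, kind) tuples."""

    def __init__(self, rows):
        self.rows = rows

    def fetch(self, keyword):
        return [{"name": n, "kind": k} for n, k in self.rows if keyword in k]

class FileStoreAdapter(Adapter):
    """Hypothetical archive that maps a file path to a type tag."""

    def __init__(self, files):
        self.files = files

    def fetch(self, keyword):
        return [{"name": path, "kind": tag}
                for path, tag in self.files.items() if keyword in tag]

class Federation:
    """High-level layer: one search service shared by every member database."""

    def __init__(self, adapters):
        self.adapters = adapters  # source name -> Adapter

    def search(self, keyword):
        results = []
        for source, adapter in self.adapters.items():
            for item in adapter.fetch(keyword):
                results.append({"source": source, **item})
        return results

if __name__ == "__main__":
    federation = Federation({
        "catalog": RelationalAdapter([("M31", "galaxy"), ("Vega", "star")]),
        "archive": FileStoreAdapter({"/img/m51.fits": "galaxy"}),
    })
    for hit in federation.search("galaxy"):
        print(hit)
```

Adding another database to the federation means writing one more adapter; the search service and everything above it stay unchanged.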
DLs attack the same problem by making the code in all of the layers compatible and identifying the differences with XML metadata and schemas. Digital Libraries carry their common code deeper and gain more flexibility through the use of XML. Techniques for DLs and DBs were developed independently, but both are headed in the same direction. I expect to see a merger of techniques in the future - Federated Libraries.
The biggest DBs contain scientific information, from physics accelerator experiments, which have accumulated petabytes (10**15 bytes = 1 PB) of data, to astronomical data sets like 2MASS, the Two Micron All Sky Survey, at a mere ten terabytes (10**12 bytes = 1 TB). When all astronomical resources are digitized, they may well exceed the physics DBs. Another huge DB is based on Earth Science, which includes details about our 25,000 mile circumference globe, the oceans, the atmosphere, and ghu help us, the weather.
A lot of Internet users probably don't know that the first browsers were created by physicists in order to share data among widely separated sites. From the linear accelerator at Stanford University, California, to the cyclotron at Brookhaven, Long Island, to the world's largest accelerator at CERN in Switzerland, physicists were creating huge files of data and photographs which were not easily movable to other locations because of cost.
One of the original intentions for the browser was to make this data available easily and transparently over the Internet. I doubt any of the originators suspected just how fast their idea would be hijacked for other uses, or how sites would spread to a broad range of subjects open to the general public.
Most people don't take advantage of the opportunity to look at what the scientists have made available, much of which doesn't require an advanced degree to appreciate. They have missed much of interest and beauty in science. Here are a few samples.
The largest source of amazing and beautiful pictures comes from the astronomy community. NASA has a number of sites that access these pictures through large DBs. One of these is the Infrared Processing and Analysis Center (IPAC). The Outreach page leads to more than a dozen tutorials, activities and large image libraries.
Another NASA site, Space Science, has images from Hubble, Chandra X-ray telescope, the solar system planets, the photo gallery at the National Space Science Data Center (NSSDC), and the NSSDC list of other image sites. The NSSDC list has ten image sites, seven other resources and a lengthy list of still more image catalogs and resources. If you can't find it here, it may not exist.
A big astronomy site is the 2MASS database. This is a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California Institute of Technology. Currently at 4 TB with expected growth to 10 TB, this is an all sky survey at three infrared wavelengths. When complete, it will catalog millions of sky segments and billions of star images.
The key to the potential of educational DLs is the ability to search the collections of multiple online libraries seamlessly. Operation across disparate sources of information requires some consistent form of indexing. XML schemas are being used to organize metadata because of their power in describing almost any form of content independently of the underlying databases.
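To make the idea of consistent indexing concrete, here is a small Python sketch in which two collections with different local field names are mapped onto shared Dublin Core element names before being searched together. The collection names, field names, and records are all hypothetical, and plain dictionaries stand in for the XML schemas a real system would use.

```python
# Hypothetical crosswalks: local field name -> Dublin Core element name.
CROSSWALKS = {
    "earth_lib": {"name": "title", "author": "creator", "topic": "subject"},
    "sky_lib": {"caption": "title", "observer": "creator", "keywords": "subject"},
}

def to_dublin_core(collection, record):
    """Rename a record's local fields to the shared Dublin Core elements."""
    mapping = CROSSWALKS[collection]
    return {mapping[field]: value
            for field, value in record.items() if field in mapping}

def search(collections, subject):
    """Search every collection by subject through the common metadata form."""
    hits = []
    for name, records in collections.items():
        for record in records:
            dc = to_dublin_core(name, record)
            if subject.lower() in dc.get("subject", "").lower():
                hits.append((name, dc.get("title", "")))
    return hits

# Made-up sample collections, each using its own field names.
COLLECTIONS = {
    "earth_lib": [
        {"name": "Ocean currents map", "author": "A. Writer", "topic": "Oceans"},
        {"name": "Cloud atlas", "author": "B. Writer", "topic": "Weather"},
    ],
    "sky_lib": [
        {"caption": "Tides and the Moon", "observer": "C. Writer",
         "keywords": "oceans, astronomy"},
    ],
}

if __name__ == "__main__":
    print(search(COLLECTIONS, "oceans"))
```

The search function never sees a local field name; each new collection only needs its own crosswalk entry.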
This frontier in DLs is being pushed by SDSC's Data-Intensive Computing Environments (DICE) group, which is collaborating, as part of the National SMETE Digital Library (NSDL) project, with UC Santa Barbara's Alexandria Digital Earth ProtoType Project (ADEPT) led by Terrence Smith, the Digital Library for Earth System Education (DLESE), and the NASA-funded JOINed Digital Library for Science Education.
The SDSC collaboration is faced with a number of difficult problems to solve.
Part of the answer comes from Moore, a leader in the recent workshop.
"The key is that there is a need to characterize knowledge independently of the collections that hold the educational materials, so that this knowledge infrastructure will work across many digital libraries and collections from different communities."
Future integration of cross-discipline DLs will enable researchers to go from metadata back to raw data for new analyses covering wider ranges of data than earlier work. This will lead to new discoveries from existing data as the range of data sources expands. Exactly what will be discovered is not yet known. As one scientist put it, "If we knew what we were doing, we wouldn't call it research."
As DLs and Federated DBs converge, and XML schemas define the organization of knowledge, our ability to explore and discover new answers to old problems will make multiple jumps.
Already we have the beginning of metadata harvesting through OAI, using Dublin Core and extensions to fit the subject matter. If it hasn't already happened, some new discoveries will come out of this early version of DLs in no more than a year.
Similarly, Federated DBs have only recently been coupled with distributed supercomputing to attack larger scale problems. As new supercomputers reach through the 10 teraflop range (1 TF = 10**12 floating point ops/sec) to the 30 and 100 TF range in the next three years, scientists in all areas will dig out new discoveries in genetics, physics, earth science, medicine and more.
The discovery effect of more powerful computers will be multiplied by distributed hookups over faster Internet II and multiplied again by wider data access, and multiplied again by the new educational and discovery possibilities inherent in Digital Libraries.
While this won't happen overnight, nor even in Internet time, by 2003 we should see the first effects of this convergence. By 2005 it will impact our world, and by 2010, our current internet will remind us of the Morse telegraph and the Pony Express.