Digital Libraries (Dlibs) are organized collections of information in digital form, usually meant for direct viewing. The definition of a library includes an equivalent of the classic library card index, in which items can be found by title, subject or author. Digital Libraries are more complex entities that have exploded from research efforts into popular use because of recent developments in standards.
Digital Libraries are no longer experimental, one-off projects. There is a rapidly growing selection of software for Dlibs that is either open source or free for non-commercial use. A Dlib requires a catalog that is easily accessible from the Internet. The usefulness and popularity of Dlibs stem from a group of related standards that make the key element, the catalog index, accessible through a standard protocol. Like the TCP/IP stack, this whole stack of standards enables a new class of knowledge applications.
Digital Libraries are a rapidly growing resource on the Internet. This explosion of resources is detailed at TeraText, which lists 126 Dlib sources and almost two million document records. [http://www.teratext.com.au:8123/public/collDetails;collection=OAI] All of these records have been collected (harvested) by TeraText using the OAI-PMH standard to access the many Dlibs.
The Open Archives Initiative (OAI) [http://www.openarchives.org/] is the core Dlib standards effort that has enabled the rapid growth of Dlibs. OAI has in turn developed the OAI Protocol for Metadata Harvesting (OAI-PMH) [http://www.openarchives.org/OAI/openarchivesprotocol.html]. OAI-PMH provides a single format for access to Dlib index records, which makes it possible to aggregate DCMI index records from many OAI libraries into a combined or federated search service.
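To make harvesting a little more concrete, here is a minimal Python sketch of how a harvester might form an OAI-PMH request. The ListRecords verb and the metadataPrefix, from and set arguments come from the OAI-PMH specification; the base URL http://dlib.example.org/oai and the helper name build_list_records_url are invented for illustration, and the sketch only builds the URL without contacting any server.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # verb and metadataPrefix are required for an initial ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'from' and 'set' are optional arguments for selective harvesting.
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Assumed example repository; not a real Dlib address.
EXAMPLE_BASE_URL = "http://dlib.example.org/oai"
print(build_list_records_url(EXAMPLE_BASE_URL, from_date="2002-01-01"))
```

A real harvester would fetch this URL, read the XML response, and keep requesting further pages for as long as the response carries a resumption token.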
The Dublin Core Metadata Initiative (DCMI) [http://dublincore.org/] defines the minimum index requirements for documents. It defines fifteen core elements, but this does not limit what additional index fields may be added. Many organizations have defined specialized DCMI fields that are standard for their subject area.
Dlibs subscribe to at least the DCMI base requirements for metadata about the items in the library. The base elements are: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.
DCMI records are created in a specific XML structure designed for indexing. The underlying standard, eXtensible Markup Language (XML) [http://www.w3.org/XML/], has already changed the way many web sites create pages and has become a standard writing and publishing format.
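As an illustration of what such a record looks like and how software can read it, the following Python sketch parses one Dublin Core record in the oai_dc XML form used by OAI-PMH. The two namespace URIs are the standard ones; the record itself (title, author, identifier) is entirely invented, and parse_dc_record is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record, for illustration only.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Invented Paper on Quantum Dots</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:subject>Physics</dc:subject>
  <dc:subject>Nanotechnology</dc:subject>
  <dc:date>2002-05-17</dc:date>
  <dc:identifier>http://dlib.example.org/papers/42</dc:identifier>
</oai_dc:dc>
"""


def parse_dc_record(xml_text):
    """Return a dict mapping each Dublin Core element name to its values."""
    root = ET.fromstring(xml_text)
    prefix = "{" + DC_NS + "}"
    fields = {}
    for child in root:
        # Keep only Dublin Core elements; an element may repeat (e.g. subject).
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


record = parse_dc_record(SAMPLE_RECORD)
print(record["title"], record["subject"])
```

Values are kept as lists because Dublin Core allows an element to appear more than once in a record, as Subject does here.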
The broad availability of information in Dlibs depends on standards that give each library index a common access protocol, OAI-PMH. OAI-PMH enables easy access to the index records describing each document in each Dlib. This access makes sites with federated indexes possible, where users can run a single search across multiple library indexes. A single index site makes finding the document needle in many library haystacks much easier.
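The federated index idea can be sketched in a few lines of Python. This is only a toy under stated assumptions: the two libraries, their records, and the function names are invented, and the index covers title words only, whereas a real service would index the harvested Dublin Core records in full.

```python
from collections import defaultdict

# Invented index records, as if already harvested from two Dlibs.
LIBRARIES = {
    "Library A": [
        {"identifier": "a-1", "title": "Quantum Computing Basics"},
        {"identifier": "a-2", "title": "History of Printing"},
    ],
    "Library B": [
        {"identifier": "b-1", "title": "Advances in Quantum Optics"},
    ],
}


def build_federated_index(libraries):
    """Map each lowercase title word to a set of (library, identifier) pairs."""
    index = defaultdict(set)
    for library_name, records in libraries.items():
        for record in records:
            for word in record["title"].lower().split():
                index[word].add((library_name, record["identifier"]))
    return index


def search(index, query):
    """Return the records whose titles contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_federated_index(LIBRARIES)
print(sorted(search(index, "quantum")))
```

One query for "quantum" returns matches from both libraries, which is the whole point of a federated index: the user never needs to know which Dlib holds the document.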
In addition to the powerful indexing and searching functions, Dlibs have other major advantages that will make them dominant in the near future. First is the ease and speed of access from any location with Internet access. Second is the speed of update as new items become available. Third, library items are never out on loan or lost. Fourth, expensive real estate and staff requirements are greatly reduced.
What is particularly beneficial for the public is the combination of local libraries with Internet access to regional collections indexed online, making access to a large collection of printed books and Dlibs easy. Now all a small town needs for a library is a room with a bookshelf, a couple of computers with Internet access, and one staff person to help people use the systems. The long-term benefit for literacy should be significant.
The Internet literature on Dlibs is expanding rapidly. Collected at the end of this column are a number of links, organized by category, with a brief description of what the site covers. The links will provide an introduction to Dlib essentials with several sample sites. Beyond that, follow your interests to explore the Dlib universe.
In addition to being Internet accessible, Dlibs include information that has been the province of university libraries until now. Collections of scientific papers and papers of all sorts - technical, artistic, exploratory, historic and news - open a new door for public access and investigation. You can check out some Dlibs listed in References under Digital Libraries.
Companies may index their own proprietary data with OAI, making abstracts available for free and full documents available for payment. This mixed mode of access makes all relevant data findable, yet allows companies that do research to profit from access to it, while the public profits from the wider availability of that information.
It is likely that some side effects of this meta search capability will be the reduction of duplicated developments because of easier discovery, and the rediscovery of lost gems of information hidden in the mountain of unsearchable data.
New employment opportunities will arise where people with an in-depth knowledge of a general area, say astronomy, can become specialists in information discovery for astronomers who need specific information quickly. This will increase the availability of data for scientists, engineers, politicians and others who do not have the time or interest to do data discovery themselves.
Dlibs are based on open standards and open source software. This does not prevent clever people from extending and customizing search software for a profit, nor individuals from using the free software to build their own meta index.
Take as an example my interest in physics. By downloading index collection software and collecting the physics index from all public Dlibs, I can search and run correlations locally on specific subjects to gauge development of more advanced computer technologies. I've been doing this with paper, then on personal computers since the 1980s, and on the web since the early 1990s. The mess that has resulted makes a lot of this information slow to find, easy to lose and difficult to store.
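As a rough sketch of the kind of local analysis this allows, the Python below counts how many harvested records mention a given subject in each year. The records are invented, the field layout (lists of values keyed by Dublin Core element name) is an assumption, and records_per_year is a hypothetical helper rather than part of any Dlib software.

```python
from collections import Counter

# Invented records standing in for a locally collected physics index.
LOCAL_INDEX = [
    {"title": ["Paper One"], "subject": ["Physics", "Semiconductors"], "date": ["2000-03-01"]},
    {"title": ["Paper Two"], "subject": ["Semiconductors"], "date": ["2001-07-15"]},
    {"title": ["Paper Three"], "subject": ["Semiconductors", "Optics"], "date": ["2001-11-02"]},
    {"title": ["Paper Four"], "subject": ["Astronomy"], "date": ["2001-01-20"]},
]


def records_per_year(records, subject):
    """Count records tagged with the subject, grouped by publication year."""
    wanted = subject.lower()
    counts = Counter()
    for record in records:
        subjects = [s.lower() for s in record.get("subject", [])]
        dates = record.get("date", [])
        if wanted in subjects and dates:
            # Assumes dates start with a four-digit year, e.g. 2001-07-15.
            counts[dates[0][:4]] += 1
    return dict(sorted(counts.items()))


print(records_per_year(LOCAL_INDEX, "semiconductors"))
```

A rising count from year to year is a crude signal of growing activity in a subject; anything more refined would need cleaner subject terms than most collections provide.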
Finally, the personal meta index makes organization possible in a unified way. It was possible to organize data before now, but any design would have been too specialized, and probably not very extensible. The real killer for any such project was the sure knowledge that any such effort would cover such a small portion of the available information as to be almost useless. See Dlib Software in the References.
With Dlibs and OAI, anyone can set up a system to collect and organize information they are interested in. I won't claim this will be easy at first, but it is possible. Students won't have to physically go to the library to find lots of basic information. Indeed, their main problem will be to make sense of the volume of data. This is where educational guidance will really help, teaching them how to generalize, then focus on the specifics.
Similarly, companies can and should set up information specialists who can provide rapid responses to technical and business information needs. This will enable executives to guide their company with more complete and up to date knowledge of the environment they work in. It should significantly increase competitiveness of the companies that blend this new IT skill into their executive ranks.
The rapid growth of Dlibs is one of the early results of developing architectures and software on open standards. Without these standards, we would have a few expensive islands of Dlibs which could only be accessed through special systems designed for each one.
I think it is appropriate to call the structure of Open Standards and Open Software around Digital Libraries an Open Software Environment (OSE). It is the OSE that encourages collaboration on challenges too big for any single corporation, and too broad for any simple standard.
In short, the explosion of Digital Libraries and the services that we will benefit from depends on open standards. Dlibs are just a taste of what will be possible as open standards gain traction and software development turns away from proprietary standards. See Dlib Core Standards in the reference links.
In the broader view, Dlibs represent the first flowering of the huge potential of Open Software Environments when given time for development. It isn't just one standard. Dlibs depend on a stack of standards starting with XML, which has become firmly established only in the last two years. RDF, DCMI and OAI standards are stacked on top of XML and each other, and only this combination of standards makes a Dlib standard flexible and powerful enough for global use.
Microsoft has suppressed competition for more than a decade. The IT industry has reacted like a biological organism by developing an evolutionary branch - the Open Software Environment. This new branch grew slowly for several years and only five years ago came into general public view.
Now we see the first major flowering of the OSE - Digital Libraries. This is no longer a twig, but a sturdy young tree that will someday eclipse the dominant giant.
Dlib Core Standards:
http://www.openarchives.org/ (Open Archives Initiative, OAI Community Center)
http://www.openarchives.org/OAI/openarchivesprotocol.html (OAI-PMH: OAI Protocol for Metadata Harvesting Document)
http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-general OAI-general@oaisrv.nsdl.cornell.edu (OAI mailing list)
http://dublincore.org/ (DCMI: Dublin Core Metadata Initiative)
http://dublincore.org/documents/1999/07/02/dces/ (The specifications for the DCMI core elements)
Dlib Software:
http://kepler.cs.odu.edu (Kepler - A Dlib for individuals)
http://www.greenstone.org/english/home.html (Digital Library Software)
http://www.eprints.org/ (Self Archiving and Open Archives, ePrints home)
http://sourceforge.net/projects/oaiarc/ (Arc, OAI-compliant software from the Digital Library Group at Old Dominion University, available through SourceForge)
http://elib.cs.berkeley.edu/src/ (Source code developed at UC Berkeley for the Digital Library Project and other related projects)
Digital Libraries:
http://www.loc.gov (Library of Congress Dlib - 7.5 million items)
http://comm.nsdlib.org/ (The National Science Digital Library)
http://dlp.cs.berkeley.edu/ (The Digital Library Project of UC Berkeley)
http://www.arxiv.org/ (ePrint Archive for Math, Physics & Computer Science)
http://www.escholarship.cdlib.org/eprints.html (Scholar-led initiatives in Scholarly Communications)
http://www.dli2.nsf.gov (Digital Libraries Project Phase 2)
http://opcit.eprints.org/ (The Open Citation Project)
http://www.iei.pi.cnr.it/cyclades/ (Open Collaborative Virtual Archive Environment)
http://www.smete.org/nsdl (The National SMETE Dlib Projects Database)
http://www.si.umich.edu/UMDL/ (University of Michigan Dlib Project)
http://www.oaforum.org/workshops/ ("Open Access to Hidden Resources" - the 2nd Open Archives Forum Workshop, Lisbon, 6-7 December, 2002)
http://www.octavo.com/ (Octavo, Dlib publisher of historic books including the Gutenberg Bible)
http://www.dlib.org/ (D-Lib Magazine)
All content on this site is Copyright 2001 and 2002 by Bill Nicholls