Introduction
Beginning with a conceptual overview, we have all of the Internet illustrated as "Information Space." Until recently most attention was paid to network clients and the client/server relationship. As a result we have many applications for browsing the Internet. For the purpose of this presentation, however, I would like to ask you to consider some general challenges. Then we will explore each of the four elements independently and finally bring them all together in a seamless environment. The general challenges are these: (1) We need a simple means to electronically publish information from any possible source. (2) We must be able to recall and retrieve that information by any means we have available. (3) We should be able to make use of the information with any network client or appliance we have access to.
Publishing
Trying to keep pace with advances in information technologies can be frustrating at best. We have seen how information providers have struggled to keep collections of data and information in formats that can be used in as many venues as possible. Since the early 1990s a number of different file types have been used for disseminating information over the network, including PostScript, Gopher, ASCII text, compressed files, MIME, PDF, and now the ubiquitous HTML. There are many utilities that convert from one file type to the next. Some of these permit batch jobs that will convert entire directories or systems of data; many require manual handling.
As the Internet increases in popularity, we are finding that many new users of the network have fewer technical skills than were traditionally associated with information technologies. The challenge of enabling these persons to publish information is being addressed by a project here in Colorado. Imagine a scenario where clerical personnel are responsible for creating documents with a word processor. These documents, such as meeting agendas, are published on a regular basis to the local community's world wide web pages. Each week a new agenda is published and the old agenda is moved to another location to be archived.
This Integrated System for Distributed Information Services is based on an emerging standard known as the Distributed Information Processing Protocol (DIPP). In its simplest form, DIPP takes input in the form of files that have been submitted by email, FTP, or other transfer methods. Once the submitted file has been authenticated, agents work to transform the information into a predetermined output. Additional activity can be initiated to ensure that older files are archived or that proper file permissions have been set. Essentially, DIPP automates the procedures necessary to transform a word processing file into a set of web pages with links to related materials.
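The steps just described (submit, authenticate, transform, archive, set permissions) can be pictured as a simple pipeline. What follows is a minimal, hypothetical sketch in Python, not an implementation of DIPP itself: the directory names, the file-extension "authentication," and the trivial text-to-HTML transform are all illustrative stand-ins for the real agents.

```python
# Hypothetical sketch of the publishing pipeline described above.
# Directory names, the authentication check, and the transform are
# illustrative only; they are not part of DIPP.
import shutil
from pathlib import Path

SUBMIT_DIR = Path("submitted")     # files arriving by email, FTP, etc.
PUBLISH_DIR = Path("public_html")  # the community web pages
ARCHIVE_DIR = Path("archive")      # where old agendas are kept

for d in (SUBMIT_DIR, PUBLISH_DIR, ARCHIVE_DIR):
    d.mkdir(exist_ok=True)

# A sample submission, standing in for a clerical worker's document.
(SUBMIT_DIR / "agenda.txt").write_text("Weekly meeting agenda")

def authenticate(path: Path) -> bool:
    # Placeholder: a real system would verify the submitter's
    # identity, not merely check the file extension.
    return path.suffix in {".txt", ".doc"}

def transform(path: Path) -> str:
    # Stand-in for the agent that converts a word-processing file
    # to HTML; here we simply wrap plain text in minimal markup.
    return f"<html><body><pre>{path.read_text()}</pre></body></html>"

def publish(path: Path) -> None:
    target = PUBLISH_DIR / (path.stem + ".html")
    if target.exists():  # archive last week's agenda before replacing it
        shutil.move(str(target), str(ARCHIVE_DIR / target.name))
    target.write_text(transform(path))
    target.chmod(0o644)  # ensure proper file permissions

for submitted in sorted(SUBMIT_DIR.iterdir()):
    if authenticate(submitted):
        publish(submitted)
```

The point of the sketch is the shape of the automation, not the details: each stage (authenticate, transform, archive, permissions) is a separate step that agents can carry out without human intervention.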
Although the current emphasis of DIPP is publishing HTML documents, we are already beginning to explore other uses of this system. One possibility is to accept input in almost any format and store it in one particular format. When a query is submitted by a particular client, the DIPP engine can then transform the stored content into whatever specific form is required by the remote user. You can find more information about the Integrated System for Distributed Information Services and the DIPP engine at: http://www.colorado.edu/DIPP.
Retrieval
As I mentioned earlier, the current emphasis for navigating information on the Internet is any one of a number of clients. Most popular at the moment are two world wide web browsers: Netscape and Microsoft's Internet Explorer. But not everyone has access to a graphical browser along with a high-speed data communications line. Therefore, I would like to suggest that we will see an increase in bridging technologies, which will permit data and information from Information Space to be retrieved in different ways. In the United States the telephone is one of the most pervasive business communication tools, followed by the telefacsimile. It would be most interesting and useful if a person were able to use a telephone or a fax machine to retrieve information from the Internet. In fact, that is already happening in small, controlled instances. This means that persons would not be prevented by financial or technical barriers from having access to information that is currently available only to those with Internet browsers. Granted, color, audio, and now video are rapidly becoming necessary elements for most web pages; however, there is basic content that can still be provided via fax or posted hard copy. The retrieval method is placed between Publishing and the Network Clients on my chart because, in fact, the software that clients use to query and browse the information space will also serve the non-graphical community. Also, the transformational processes of the DIPP engine can just as easily produce plain text or send jobs to printers as produce HTML and graphics.
Network Client
I have already mentioned the network clients somewhat. What I would like to explore briefly here are two elements that will have an important impact upon the future of networked information discovery and retrieval. First, as a major part of the commercialization of the information space, we are seeing many "solutions" in search engines. Second, when we realize that progress is a continuum, we ought to consider some strategies that will enable us to provide and use information resources both now and five years from now. There is not a lot that I can say about the commercialization of the Internet and Information Space. In the United States, newspapers report about the new "Gold Rush" in cyberspace. Many new companies are being started to provide services for the millions of users rushing to sign up for the Internet. One of the fastest growing areas of service is the search engine providers. Only a year and a half ago there were relatively few; now we have 10 or 15 major search engine providers competing for attention, dollars, and data. This will certainly be an area that we need to pay attention to in the coming months. Currently many of the systems are designed to "find" resources such as web pages, but I am not sure that they are going to be sufficient for finding information from different types of collections. The present attitude is that all we need is full-text search and natural language queries. This is all right for much of the web's HTML materials, but the world is made up of more than just HTML. Careful consideration needs to be focused on what types of information exist as well as how that information is used. I will return to this with the issue of Discovery. The second issue surrounding the network client is rapid obsolescence. Technical advances seem to come every minute of every day. There should be a way to have some assurance of maintaining a common set of functions from one generation to the next.
One model that I set forth to address this is what I refer to as the Black Box Model. As a child I had a toy: a plastic Black Box with a lever sticking out of the top. When you pushed the lever to one side, the box would begin to make a noise, bounce up and down, and then a small green hand would push the lever back to turn itself off. The Black Box was a small, mysterious, self-contained unit. In a similar manner we now have electronic Black Boxes attached to appliances, automobiles, and other complex devices. We do not know exactly what happens in these units, but we do know that without them the equipment would not work. I suggest that there ought to be a Black Box for networked information discovery and retrieval. For the past two and a half years I have believed that the Z39.50 standard for search and retrieval could be used to provide such a platform. While at the Center for Networked Information Discovery and Retrieval (CNIDR) we demonstrated how, using a Z39.50 client and server, we could access a legacy database operated by the US Patent and Trademark Office. This data collection was hosted on an old mainframe that permitted only 8 concurrent users over 9.6-kilobaud leased lines. In other words, it was old technology. At CNIDR's request, 1500 patents, associated images, and related data were transferred manually from the mainframe to a UNIX environment. Using the same Z39.50 client and server technology, one can now access the US PTO demonstration via the Internet. When we first announced this, we had hundreds of searches each day. Meanwhile, a person with the proper authority could use the client to access the original collection, if he or she so desired. This means that not only can we provide access to collections that are old, but we are not necessarily limited to a single, specialized client.
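The essence of the Black Box idea is that the client sees one uniform search interface, while the back end may be a legacy mainframe, a new UNIX host, or anything else. The following is an illustrative Python sketch of that idea only; it does not implement the actual Z39.50 protocol, and the class names and sample records are invented for the example.

```python
# Illustrative sketch of the "Black Box" model: one search interface,
# many back ends. This models the role Z39.50 plays in the text; it
# does NOT implement the real Z39.50 wire protocol.
from abc import ABC, abstractmethod

class SearchSource(ABC):
    """The Black Box: every collection answers the same search call."""
    @abstractmethod
    def search(self, query: str) -> list[str]:
        ...

class LegacyMainframe(SearchSource):
    # Stands in for the old mainframe-hosted collection.
    def __init__(self, records):
        self.records = records
    def search(self, query):
        return [r for r in self.records if query.lower() in r.lower()]

class UnixCollection(SearchSource):
    # Stands in for the migrated UNIX-hosted demonstration data.
    def __init__(self, records):
        self.records = records
    def search(self, query):
        return [r for r in self.records if query.lower() in r.lower()]

def query_all(sources, query):
    # The client neither knows nor cares which back end answers.
    hits = []
    for source in sources:
        hits.extend(source.search(query))
    return hits

# Invented sample records, for illustration only.
sources = [
    LegacyMainframe(["Patent A: widget fastener"]),
    UnixCollection(["Patent B: widget polisher"]),
]
print(query_all(sources, "widget"))
```

Because both collections sit behind the same interface, the legacy system can keep serving until the new one is ready, which is exactly the bridging role described in the text.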
There are now Z39.50 clients that talk directly to world wide web browsers, that can do searches via email, and that can, of course, be found in library automation software. So now there are multiple clients accessing multiple sources. Reflecting on the US PTO experience, there is one other lesson to be learned. While moving data and information forward to be used on new equipment or by new software, the Black Box model also provides a bridge to the past, to the legacy systems. This bridge can be sustained until the new information source is ready to be used.
Discovery
Lastly I come to Discovery, because it is one of the most difficult issues that we are attempting to solve. I use the term here as applied to searching the global information space, the Internet. At present it seems that one almost needs to know the answer before asking the question when looking for material on the Internet. Personally I have found myself moving away from using those services that have collected the millions of pieces of information. Lately I have had more "luck," or satisfaction, with services that claim to be more focused. Even then I most often find that it takes visiting two or three services before finding the "right answer." Some people with whom I have discussed this issue say that what is needed is not just new search engines or search software, but also a better way to track the user's patterns, so that the search parameters more closely match a larger set of requirements from the user's perspective. Some services on the Internet are beginning to use the "Magic Cookie" technology of the world wide web to track how a user navigates through the information space provided by that service. Other companies have you fill out a survey of interests in order to begin compiling a "profile". Oddly enough, a common response to these attempts to provide better service is the complaint of "invasion of privacy".
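The "Magic Cookie" mechanism itself is simple: the server hands the browser a small identifier, and the browser returns it on later visits, letting the service recognize the same user. A minimal sketch using Python's standard http.cookies module (the cookie name and value here are made up for illustration):

```python
# Minimal illustration of the "Magic Cookie" mechanism: a server
# issues an identifier, and the browser sends it back on later
# requests. The cookie name and value are illustrative only.
from http.cookies import SimpleCookie

# Server side: issue an identifier on the visitor's first request.
outgoing = SimpleCookie()
outgoing["visitor_id"] = "abc123"
header = outgoing.output(header="Set-Cookie:")
print(header)  # the header line sent in the HTTP response

# Client side: the browser returns the value on subsequent requests,
# letting the service associate a navigation path with one visitor.
incoming = SimpleCookie()
incoming.load("visitor_id=abc123")
print(incoming["visitor_id"].value)
```

This is the whole of the tracking primitive; everything beyond it, such as the interest "profiles" mentioned above, is built by the service on the server side.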
At any rate, it is going to take a while before we are able to plumb the depths of the global information space with facility or elegance. Until that time I am sure that there will be constant employment for persons who have expertise in information retrieval. I should mention here that many libraries are concerned that the Internet will make their services obsolete. Given what I have just said, I know that we will need librarians and their talents for many years to come.
Concluding Remarks
So, we are moving rapidly in ways that will permit us to publish information in various formats with the ease of sending email. We are also seeing rapid growth in the development and use of sophisticated graphical software that permits us to browse information space. However, we need to keep in mind that these advances also widen certain gaps that prevent, or at least make more difficult, access to information for various communities. In addition, we need to remember that as we keep adding content, dumping data into the information space, we should be efficient in the way we use that information. It is the usual paradox: how can we move forward to meet the needs of those advancing rapidly while keeping in touch with those who do not have the resources to keep up? These are issues that go beyond technical or financial detail. For some communities (education, civic, small businesses) there are cultural artifacts and histories that have to change before information technologies will be used daily. I do see a time when anything can be published, and when it can be accessed by any person using any client after asking any question. I believe the word is ubiquitous. Mark Weiser coined the term "Ubiquitous Computing". To close with a quote from the net: "Ubiquitous computing names the third wave in computing, just now beginning. First were mainframes, each shared by lots of people.
Now we are in the personal computing era, person and machine staring uneasily at each other across the desktop. Next comes ubiquitous computing, or the age of calm technology, when technology recedes into the background of our lives. Alan Kay of Apple calls this 'Third Paradigm' computing." (http://sandbox.xerox.com/hypertext/weiser/UbiHome.html)
copyright 1996 George Brett ghb@boulder.lib.co.us
Abstract
The author will explore new facets of networked information discovery and retrieval (NIDR). He will present work that is currently being done at the Boulder Public Library and the University of Colorado that is relevant to advancing the use of networked information.