AITEC Contract Research Projects in FY1997 : Proposal

(13) WEB-KLIC: A Concurrent Logic-based Unified Framework for Internet Programming

Principal Investigator :	Dr. Gopal Gupta, Associate Professor,
	Laboratory for Logic, Databases, and Advanced Programming, Department of Computer Science, New Mexico State University

[Background of the research]

The widespread use of the Internet and the World-Wide Web (WWW) has prompted researchers to develop new tools for manipulating WWW-related information. In our opinion, the tools currently being developed are still neither powerful nor user-friendly enough. They are not user-friendly enough because even a simple program that interacts through the WWW requires (at least) three different programming languages: HTML to develop the Web pages (i.e., the interface to the clients), a traditional language (e.g., C++) to develop the application which serves the client's requests, and a scripting language (e.g., Perl, JavaScript) to develop the CGI script (i.e., the script used to transfer data between the Internet and the application).

In addition, no programming language really supports a programming environment suitable for dealing uniformly with the entire range of Internet-related applications; in particular:

- Web pages read from or written to the Web are structured documents, while most languages (the only exception is the PiLLoW [2] library) treat them as simple strings of characters;
- no language (not even Web-dedicated libraries like PiLLoW) supports a structural representation that goes beyond a single Web page, thus ignoring the fact that different pages on the Web may be interconnected through hyperlinks;
- current Web-enabled programming languages view Web pages as inherently static and passive structures; they assume that the content of the Web pages is fixed, and provide very limited possibilities for embedding computations in them (CGI [4], SSI [8], and Java applets [1] are a few attempts in this direction, but all of them are very limited in scope);
- current Web-enabled languages either have limited support for expressing concurrency, making them unsuitable for programming servers (servers require concurrency to properly handle multiple client connections), or support concurrency only at a very low level (e.g., Java), requiring detailed knowledge of low-level primitives on the part of the programmer.

[Purpose of the research]

We believe that a (concurrent) logic language is an ideal candidate for developing Internet WWW-applications, as the problems presented above are easily solved in it. In this project we propose to extend the KLIC system to support the effective development of efficient tools for dealing with the WWW.

KL1, and thus KLIC, is particularly suitable for this purpose, as it is a concurrent logic language and possesses very good symbolic processing capabilities. Its concurrent nature allows it to easily deal with multi-threaded computations (e.g., a server handling multiple client requests or multiple agents searching the Web).

The aim of this project is to extend KLIC with the necessary language features to support the development of WWW-applications at any level. Our extensions will enable KLIC to manipulate WWW documents in a structured way, provide a complete embedding of the most common interfaces (HTTP, CGI, etc.), permit the development of both active and dynamic Web-pages, and integrate concurrency (at a very high level of abstraction) in Web-based applications.

These extensions will transform KLIC into the first unified framework for the development of Internet applications. The flexibility of KLIC, its high portability to all types of platforms, and the nature of its implementation (compilation to C) are key components that make this project feasible.

[Contents of the research]

In this project we propose to extend the KLIC system with a collection of features aimed at supporting the development of WWW-related applications at all levels.

The design of the WEB-KLIC system will be organized into four different layers, complemented by an external interface:

Message Layer: this will involve the generalization of the socket-based communication provided by the KLIC system. This layer will be used to support all the communications required at the upper layers, and will be accessible to the user through a simple interface based on message structures and built-ins to establish communication. This layer will also include an interface to various Unix system calls (e.g., get_host, ping).
HTTP Layer: on top of the message layer we will provide a complete implementation of the HTTP communication protocol---HTTP is the protocol employed to support communication in the WWW. This layer is accessible to the user through a collection of built-ins that allow him/her both (i) to retrieve HTML documents given their URLs and (ii) to transmit them. The built-ins will allow access to most of the features of HTTP (time-out, cache control, etc.).
Object Layer: the object layer will offer higher-level structuring to support the manipulation of HTML documents as data, by providing a structured view of them. Each HTML page will be represented using a tree structure (HTML-tree), where each node is labeled with the tag describing the nature of the information encoded in the corresponding subtree (e.g., enumeration, link, etc.). The connection between separate pages through hyperlinks is encoded in a graph structure called HTML-graph. The nodes of an HTML-graph are HTML-trees or other HTML-graphs, which allows a hierarchical structuring of HTML documents (e.g., for security control). These structures are supported by an equational theory (encoded in the unification algorithm) as well as by special predicates, such as one used to express the ``subtree'' relationship. Special quantifiers (``in-each'', ``in-some'') will be available for quickly applying computations to a given HTML-graph.
Common Gateway Interface (CGI) scripts are easily realizable, as they are expressed by simply indicating in the HTML-tree the name of the procedure which implements the script. The procedure is assumed to be part of the program itself. All the lower level issues (e.g., obtaining the input of the CGI script from HTML forms) are hidden from the user. A procedure implementing a CGI script will have two arguments: the first one will be instantiated during the call with a list containing the name-value pairs produced by the HTML form, while the second argument will be instantiated during the execution of the script to an HTML-tree, representing the output page to be returned to the connected client.
Note that this layer is powerful enough to cover most common cases (e.g., connecting existing applications to the Web via HTML forms, etc.). Combining the last two layers will also allow the easy development of Softbots/Agents.
The availability of concurrency (and the possibility of turning concurrency into actual parallelism via KL1 pragmas) is a further advantage of WEB-KLIC over existing proposals (e.g., the complex scheme based on or-parallelism proposed in [7]).
Active Layer: in the object layer links can only connect an HTML-tree to another HTML-tree. The active layer relaxes this restriction and allows links in HTML-trees to refer to WEB-KLIC procedures; following a link will imply executing the corresponding procedure. This leads to a more dynamic view of browsing, making it possible, for example, to pass arguments between visited pages and to let the pages adapt to the current status of the browsing session---adaptive browsing [3].
We envision two alternatives for the active layer, which differ in where an active page is executed; both will be made available in WEB-KLIC.
Executing the active page on the server's side will solve various open problems in today's WWW applications; in particular, this scheme creates a ``computational backbone'' for a browsing session, which makes it possible to maintain a state across the whole session---a feature that is of extreme importance and that can only be achieved in an awkward way with currently available technology.
Executing the active page on the client's side (i.e., the client downloads a page which contains executable code) achieves similar benefits, while reducing the load on the servers and allowing ``intelligent'' browsing schemes.
Viewing the active pages as logical theories suggests the possibility of introducing modal operators to express higher-order computations on the active Web. This allows us to encode highly complex tasks easily and succinctly---e.g., if the active pages represent ``alternative'' databases, a single query is sufficient to verify whether a certain goal holds in each accessible database.
External Interface: some additional interfaces will be provided in order to allow the connection of WEB-KLIC to existing applications. The most relevant interface we intend to realize is the Common Client Interface (CCI) [5], which will allow WEB-KLIC to communicate with some existing browsers (e.g., NCSA Mosaic).
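
The structures of the object layer can be illustrated with a minimal sketch. It is written in Python rather than KL1, purely for illustration, and every name in it (HtmlTree, is_subtree, links, in_each, in_some) is an assumption of this sketch rather than part of the proposed WEB-KLIC interface. It also simplifies the proposal in two ways: the ``subtree'' relationship is checked by plain structural equality instead of unification under an equational theory, and an HTML-graph is a flat mapping from URLs to pages, without nested graphs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Union


@dataclass
class HtmlTree:
    """One node of a hypothetical HTML-tree: a tag, its attributes, its children."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List[Union["HtmlTree", str]] = field(default_factory=list)


def subtrees(tree: HtmlTree) -> Iterator[HtmlTree]:
    """Yield the tree itself and every element nested inside it (pre-order)."""
    yield tree
    for child in tree.children:
        if isinstance(child, HtmlTree):
            yield from subtrees(child)


def is_subtree(pattern: HtmlTree, tree: HtmlTree) -> bool:
    """Simplified ``subtree'' relationship: pattern occurs somewhere in tree."""
    return any(node == pattern for node in subtrees(tree))


def links(tree: HtmlTree) -> List[str]:
    """Collect the targets of all hyperlinks (<a href=...>) in a page."""
    return [n.attrs["href"] for n in subtrees(tree)
            if n.tag == "a" and "href" in n.attrs]


# Simplified HTML-graph: each URL maps to its page; edges are the hyperlinks.
HtmlGraph = Dict[str, HtmlTree]


def in_each(graph: HtmlGraph, goal: Callable[[HtmlTree], bool]) -> bool:
    """``in-each'' quantifier: the goal holds for every page of the graph."""
    return all(goal(page) for page in graph.values())


def in_some(graph: HtmlGraph, goal: Callable[[HtmlTree], bool]) -> bool:
    """``in-some'' quantifier: the goal holds for at least one page."""
    return any(goal(page) for page in graph.values())


# Two small interconnected pages (made-up example data).
home = HtmlTree("html", children=[
    HtmlTree("h1", children=["Home"]),
    HtmlTree("a", {"href": "/papers"}, ["Papers"]),
])
papers = HtmlTree("html", children=[
    HtmlTree("h1", children=["Papers"]),
    HtmlTree("ul", children=[HtmlTree("li", children=["WEB-KLIC"])]),
])
site: HtmlGraph = {"/": home, "/papers": papers}


def has_tag(tag: str) -> Callable[[HtmlTree], bool]:
    """Build a goal that checks whether a page contains the given tag."""
    return lambda page: any(node.tag == tag for node in subtrees(page))
```

With these definitions, in_each(site, has_tag("h1")) checks that every page has a heading, while in_some(site, has_tag("ul")) checks that at least one page contains an enumeration. In the proposed system such queries would presumably be expressed as goals over logic terms, so the sketch only conveys the shape of the data, not the KL1 syntax.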

[Project Organization/Research Method]

(1) Project Organization

	Name	Affiliation
Principal Investigator	Gopal Gupta	Dept. Computer Science NMSU
Cooperating Researcher	Enrico Pontelli	Dept. Computer Science NMSU

(2) Research Method

The development of the WEB-KLIC project will proceed through successive stages of refinement. Initially, some time will be devoted to exploring the current KLIC implementation, in order to gain familiarity with the code. Subsequently, the various layers will be implemented. As each layer relies on the functionalities provided by the previous layers, the implementation will follow the order given in the previous section. Each layer will have an external interface and will be accessible to the programmer, which will allow each layer to be tested separately. Each layer will be tested by using it to develop practical tools for Internet activities, such as search agents, extensions to existing browsers (e.g., collaborative browsing, adaptive browsing), and client-side scripting.

[Resulting Software]

(1) The name of the software

The resulting product will be named WEB-KLIC.

(2) Functions and features of the software

Most of the features have been described in detail in the previous sections. To summarize briefly, WEB-KLIC is an extension of the KLIC system which supports:

- a user-accessible interface to the HTTP communication protocol;
- high-level structuring and manipulation of HTML pages and interconnected HTML pages;
- easy access to CGI scripting;
- active HTML pages;
- a CCI-compliant interface.

(3) Structure of the software

As is clear from the previous discussion, the software will be realized as an extension of the existing KLIC system. The software will be organized into four layers, each relying on the functionalities provided by the previous ones. The first layer will deal with access to TCP/IP communication and other system-level features, the second layer will provide HTTP support, the third layer will provide support for manipulating HTML documents as data, and the fourth layer will provide the model for active pages.
The software we intend to develop is an extension of the KLIC system. The current KLIC source code is available through ICOT Free Software (IFS), and the fact that it is in large part developed in KL1 itself will allow easy modification/extension. WEB-KLIC will add some additional features to the language, in terms of new built-ins and pre-defined operators.
The following detailed functionalities will be added to KLIC:
- new predefined functional symbols used to encode the structural representation of Web pages (HTML-trees) and of interconnected collections of Web pages (HTML-graphs);
- various built-ins to allow efficient and effective manipulation of the structural representations mentioned in the previous point (such as a built-in used to verify the presence of a pattern in an HTML structure);
- integration of the HTTP protocol in the language implementation---allowing direct access to the Web through simple interfaces (e.g., get_url/3 to obtain a document given its URL);
- integration of CGI scripting in the language---this will make it possible to connect a program-generated HTML document directly to a given KLIC procedure, avoiding the use of external scripts and hiding all the communication details;
- support for a generalized view of HTML documents; in particular, this will make it possible to replace HTML documents with WEB-KLIC modules, turning pages into active entities and replacing the browsing process with the process of executing these modules;
- extension of the language with modal operators to support an effective manipulation of HTML documents, both in their static and active form.
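
The CGI integration listed above can be illustrated with a minimal sketch of the intended division of labour: the script procedure only sees a list of name-value pairs and produces an HTML-tree, while decoding the form data and emitting the page are handled around it. The sketch is in Python, not KL1, and all names in it (render, greet, run_cgi) as well as the tuple encoding of HTML-trees are assumptions made for illustration. Where the proposal describes a procedure whose second argument is instantiated to the output HTML-tree, the sketch simply returns the tree.

```python
from html import escape
from urllib.parse import parse_qsl

# Hypothetical HTML-tree encoding for this sketch: a node is (tag, [children]),
# and a child is either another node or a plain string.


def render(tree):
    """Serialize an HTML-tree to text, escaping the string leaves."""
    if isinstance(tree, str):
        return escape(tree)
    tag, children = tree
    inner = "".join(render(child) for child in children)
    return f"<{tag}>{inner}</{tag}>"


def greet(pairs):
    """Hypothetical script procedure: name-value pairs in, HTML-tree out."""
    fields = dict(pairs)
    name = fields.get("name", "stranger")
    return ("html", [("body", [("p", ["Hello, " + name + "!"])])])


def run_cgi(procedure, query_string):
    """Hide the low-level details: decode the form data, call the script
    procedure, and serialize the HTML-tree it produces."""
    pairs = parse_qsl(query_string)  # e.g. "a=1&b=2" -> [("a", "1"), ("b", "2")]
    return render(procedure(pairs))


page = run_cgi(greet, "name=Ada&lang=KL1")
```

Because the procedure never touches the query string or the output stream, the same greet could be attached to any form that supplies a name field. Escaping in render is a design choice of this sketch, made so that form input cannot inject markup into the returned page.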

(4) Related IFS programs (if any)

The software project is directly related to the KLIC project.

(5) Required program language/OS/software packages

The extension of KLIC is expected to be realized partly in KL1 (using the existing KLIC compiler) and partly in C. We will focus our development on Solaris platforms, and we expect the software to be directly portable to any platform running that operating system. The project will use standard Unix support for socket-based communication, so we expect the resulting product to be easily portable to other platforms as well (IRIX, Dynix, Ultrix, Linux).

(6) Expected size of the software

We expect the additions to require about 1500 lines of KL1 code and about 2000 lines of C code.

(7) Expected way of use of the software

How this software will be used, who will be expected users: The resulting system, WEB-KLIC, is expected to become a unified and accessible language for developing Internet tools. The expected users are programmers interested either in developing specific applications for the Internet (e.g., agents, intelligent browsers, etc.) or in connecting existing applications to the Internet---in order to allow remote access to the application through the WWW.
Advantage of the software from users' point of view:
The major advantage of WEB-KLIC is its ability to embed in a single framework (which is an extension of KL1) all the features required to support a wide variety of WWW-related applications. This will allow non-expert users to easily develop Internet tools, without having to learn a variety of different languages (HTML, Perl, etc.) and without needing detailed knowledge of the various communication protocols and interfaces (HTTP, CGI, etc.).
Furthermore, the advanced features of WEB-KLIC (active documents, modal operators) will allow the development of WWW-applications that are considerably more sophisticated and powerful than those available today.

(8) Documents which will be added to the software

The software will be accompanied by: (i) a user manual that will explain the capabilities of WEB-KLIC; (ii) a tutorial that will explain (via real-life examples) how to effectively use the WEB-KLIC system for developing Internet applications; and (iii) a technical manual describing the details of the implementation of the Web-related extensions to KLIC.
