Parallel Inference System

Abstract

The parallel inference system forms the core of the FGCS prototype system. It efficiently solves, in parallel, symbolic and knowledge processing problems written in the kernel language KL1.

System Components

• The kernel language KL1 is a general-purpose concurrent logic programming language with the descriptive power and functions needed to describe and solve symbolic and knowledge processing problems in parallel.
• PIMOS provides a KL1 programming environment in addition to conventional OS functions.
• The PIM hardware architecture is designed as a well-balanced parallel machine for executing KL1 efficiently.
• The KL1 language processor is implemented with low parallelization and decentralization overhead.
• At the application layer, adequate speedup is obtained through dynamic and static load distribution and speculative work.

Parallel Logic Programming Language "KL1"

ABSTRACT

KL1 is the "Kernel Language", based on Guarded Horn Clauses. It set the design principles for the whole parallel inference system, from the parallel inference machine hardware to the application software.

KEY FEATURES

• Allows easy description of fine-grain parallel processing by dividing the whole task into small subtasks.
• Used throughout the system, from the operating system to application software.
• Clearly separates the meaning of a program from how it is processed, which makes load distribution much easier.
• The same language is used on all PIM models, providing high software portability.

Parallel Inference Machine Operating System, PIMOS

ABSTRACT

PIMOS is the common operating system for the parallel inference systems PIM and Multi-PSI. It provides an efficient and comfortable software development environment for these systems.

KEY FEATURES

• Written entirely in KL1, a concurrent logic language, which makes it highly portable.
• Removes management bottlenecks with its distributed hierarchical management scheme.
• Provides powerful software development tools, including debugging tools and load distribution visualization.
• Tested in parallel software R&D for more than four years. All the systems demonstrated on PIM or Multi-PSI were developed on, and run on, PIMOS.

Parallel Inference Machine, PIM

What is PIM?

• General-purpose Machine
  PIM is an MIMD machine that efficiently executes the general-purpose high-level language KL1.
• Parallel Machine with about 1000 Processors
  A scalable architecture built from high-performance processors. Adequate speedup is obtained with about 1000 processors.
• Inference Machine
  Dedicated instructions and hardware for efficient implementation of the concurrent logic language KL1 (e.g., a dereference instruction and a tag architecture).
• Performance
  A processing element of PIM/p or PIM/m yields 300 - 600 KRPS (append). A full-size PIM/p system (512 PEs) therefore achieves approximately 250 MRPS (append), which corresponds to about 3.6 GIPS.
• Five Modules
  To examine various PIM architectures, five modules are being developed: PIM/p, PIM/m, PIM/c, PIM/i and PIM/k.

PIM Architecture

KL1 Language Processor

Abstract

The KL1 language processor is the software that efficiently implements a common, identical KL1 interface on PIM modules with different architectures.

Functions and Features

• KL1 programs are compiled into an intermediate language, similar to the WAM for Prolog. This approach is easy to develop and highly portable.
• The specification of the abstract machine for the intermediate language is transformed into machine instructions and microprograms according to the hardware architecture.
• Transforming the abstract machine specification into a C program allows easy simulation and debugging on conventional machines.

Framework for Executing a KL1 Program

• A KL1 program is compiled into the intermediate language KL1-B.
• KL1-B code is executed on an abstract machine.
• The abstract machine is described as a runtime system on virtual hardware.
• The virtual hardware assumes shared-memory multiprocessors connected by loosely coupled networks.

PIM/p

• Two-level hierarchical structure: a six-dimensional hypercube network connects clusters, each of which contains eight processors sharing a memory unit.
• KL1-oriented snoop caches that provide low-latency communication and synchronization.
• Instruction set enhanced by macro calls.

PIM/m

• Inherits Multi-PSI's architecture and firmware.
• A single-layer network and nodes each consisting of one CPU make for a simple architecture and high scalability.
• Capable of examining various parallel processing techniques, such as task division and mapping.

PIM/c

• Contains eight processing elements, which employ horizontal microprogramming control and are tightly coupled.
• Dedicated hardware transmits global-scope variables (e.g., load information) between clusters with low latency.
• Each cluster has high-speed KL1-oriented snoop caches and registers with a broadcast facility.

PIM/i

• Write-update snoop caches and LIW (long instruction word) are introduced for efficient execution of KL1 programs.
• The CIF (cluster interface) works as an I/O processor.
• Effective system status monitoring by video RAM.

PIM/k

• Examines the scalability attained by multi-layer caches.
• Experiments on multi-layer caches and load-balancing management appropriate to KL1 execution.
• Easy implementation of a KL1 language processor on a UMA (uniform memory access) architecture.

Multi-PSI

• A prototype of PIM developed in the intermediate stage of the FGCS project.
• Research on the KL1 parallel execution method, research on the parallel operating system, and R&D on application programs have been carried out on Multi-PSI.
• The CPU of the PSI sequential inference machine was employed as the processing element, so only a dedicated network controller had to be developed from scratch.
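
The machine size and performance figures quoted for PIM/p can be cross-checked with a little arithmetic. The Python sketch below is illustrative only: it assumes the six-dimensional hypercube is fully populated (2^6 clusters), and all function and constant names are hypothetical. The bit-flip neighbour rule is the textbook hypercube labelling, not a documented PIM/p addressing scheme.

```python
# Figures quoted in the text for a full-size PIM/p system.
HYPERCUBE_DIMENSIONS = 6   # six-dimensional hypercube connecting clusters
PES_PER_CLUSTER = 8        # eight processors share one memory unit
SYSTEM_MRPS = 250          # million reductions per second (append)
SYSTEM_GIPS = 3.6          # equivalent giga-instructions per second


def cluster_count(dimensions: int) -> int:
    """Number of nodes in a fully populated hypercube (assumption)."""
    return 2 ** dimensions


def hypercube_neighbours(cluster_id: int, dimensions: int) -> list[int]:
    """Clusters one hop away under the standard labelling:
    flip each bit of the cluster id in turn (assumed, not from the source)."""
    return [cluster_id ^ (1 << bit) for bit in range(dimensions)]


clusters = cluster_count(HYPERCUBE_DIMENSIONS)   # 64 clusters
total_pes = clusters * PES_PER_CLUSTER           # 512 processing elements

# Per-PE rate implied by the system-level figure, in KRPS.
krps_per_pe = SYSTEM_MRPS * 1000 / total_pes

# Machine instructions per reduction implied by the GIPS figure.
instructions_per_reduction = SYSTEM_GIPS * 1000 / SYSTEM_MRPS

if __name__ == "__main__":
    print("clusters:", clusters)
    print("processing elements:", total_pes)
    print("implied KRPS per PE:", round(krps_per_pe, 1))
    print("instructions per reduction:", round(instructions_per_reduction, 1))
    print("neighbours of cluster 0:", hypercube_neighbours(0, HYPERCUBE_DIMENSIONS))
```

Under these assumptions, 64 clusters of eight processors give the 512 PEs quoted for a full-size PIM/p, the implied per-PE rate falls inside the stated 300 - 600 KRPS range, and 3.6 GIPS over 250 MRPS works out to roughly 14 instructions per reduction.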