Parallel Inference System 

Abstract 

The parallel inference system forms the core of the FGCS prototype system. It
efficiently solves, in parallel, symbolic and knowledge processing problems
written in the kernel language KL1.

System Components 

• The kernel language KL1 is a general-purpose concurrent logic programming
  language with the descriptive power and functions needed to easily describe
  and solve symbolic and knowledge processing problems in parallel.
• PIMOS provides a KL1 programming environment in addition to conventional
  OS functions.
• The PIM hardware architecture is designed, from a parallel-machine point of
  view, to be well balanced for efficient execution of KL1.
• The KL1 language processor is implemented with low parallelization and
  decentralization overhead.
• At the application layer, adequate speedup is obtained through dynamic and
  static load distribution and speculative work.

Parallel Logic Programming Language "KL1" 

ABSTRACT 

KL1 is the kernel language of the system. It is based on Guarded Horn Clauses
and set the design principles for the whole parallel inference system, from
the parallel inference machine hardware to the application software.

KEY FEATURES 

• Allows easy description of fine-grain parallel processing by dividing the
  whole task into small subtasks.
• Used throughout the system, from the operating system to application
  software.
• Clearly separates program meaning from program processing, making load
  distribution much easier.
• The same language is used on all models of PIM, providing high software
  portability.

Parallel Inference Machine Operating System, PIMOS 

ABSTRACT 

PIMOS is the common operating system for the parallel inference systems, PIM 
and Multi-PSI, and provides an efficient and comfortable software development 
environment for the systems. 

KEY FEATURES 

• Written entirely in KL1, a concurrent logic language, which gives it high
  portability.
• Removes management bottlenecks with its distributed hierarchical
  management scheme.
• Provides powerful software development tools, including debugging tools and
  load distribution visualization.
• Tested in parallel software R&D for more than four years.

All the systems demonstrated on PIM or Multi-PSI have been developed and are 
running on PIMOS. 

Parallel Inference Machine, PIM 

What is PIM? 

• General-purpose Machine
  PIM is an MIMD machine that efficiently executes KL1, a general-purpose
  high-level language.
• Parallel Machine with about 1000 Processors
  Scalable architecture consisting of high-performance processors.
  Adequate speedup is obtained with about 1000 processors.
• Inference Machine
  Dedicated instructions and hardware for efficient implementation of the
  concurrent logic language KL1 (e.g., dereference instruction and tag
  architecture).
• Performance
  A processing element of PIM/p or PIM/m yields 300 to 600 KRPS (append).
  Thus, a full-size PIM/p system (512 PEs) achieves approximately 250 MRPS
  (append), which corresponds to about 3.6 GIPS.
• Five Modules
  To examine various PIM architectures, five modules are being developed:
  PIM/p, PIM/m, PIM/c, PIM/i and PIM/k.

[Figure: PIM Architecture]
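
The performance figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not a measurement: it assumes ideal linear scaling across processing elements (which the text does not claim), and all variable and function names are illustrative.

```python
# Figures quoted in the Performance item above.
PES_FULL_PIM_P = 512          # processing elements in a full-size PIM/p
KRPS_PER_PE = (300, 600)      # per-PE rate range on append, in KRPS
QUOTED_SYSTEM_MRPS = 250      # quoted full-system rate, in MRPS
QUOTED_SYSTEM_GIPS = 3.6      # quoted equivalent instruction rate, in GIPS


def aggregate_mrps(pes, krps_per_pe):
    """Aggregate rate in MRPS, ASSUMING perfectly linear scaling."""
    return pes * krps_per_pe / 1000.0


# Ideal aggregate range if every PE ran at the low or high per-PE rate.
low_mrps = aggregate_mrps(PES_FULL_PIM_P, KRPS_PER_PE[0])
high_mrps = aggregate_mrps(PES_FULL_PIM_P, KRPS_PER_PE[1])

# Per-PE rate implied by the quoted 250 MRPS system figure.
implied_krps_per_pe = QUOTED_SYSTEM_MRPS * 1000 / PES_FULL_PIM_P

# Machine instructions per reduction implied by 3.6 GIPS at 250 MRPS.
instructions_per_reduction = QUOTED_SYSTEM_GIPS * 1000 / QUOTED_SYSTEM_MRPS

print(f"ideal range: {low_mrps:.1f} to {high_mrps:.1f} MRPS")
print(f"implied per-PE rate: {implied_krps_per_pe:.0f} KRPS")
print(f"instructions per reduction: {instructions_per_reduction:.1f}")
```

Under these assumptions the quoted 250 MRPS lies inside the ideal range of 153.6 to 307.2 MRPS, corresponds to roughly 488 KRPS per PE, and the 3.6 GIPS figure implies about 14.4 instructions per reduction.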

KL1 Language Processor 

Abstract 

The KL1 language processor is software that efficiently implements a common
KL1 interface on PIM modules with different architectures.

Functions and Features 

• KL1 programs are compiled into an intermediate language, similar to the WAM
  (Warren Abstract Machine) approach for Prolog. This method is easy to
  develop and highly portable.
• The specification of the abstract machine for the intermediate language is
  transformed into machine instructions and microprograms according to the
  hardware architecture.
• Transforming the abstract machine specification into a C program allows
  easy simulation and debugging on conventional machines.

Framework for Executing a KL1 Program 

• A KL1 program is compiled into an intermediate language, KL1-B.
• KL1-B code is executed on an abstract machine.
• The abstract machine is described as a runtime system on virtual hardware.
• The virtual hardware assumes shared-memory multiprocessors connected by
  loosely coupled networks.

PIM/p 

• Two-level hierarchical structure: a six-dimensional hypercube network
  connects clusters, each of which contains eight processors sharing a
  memory unit.
• KL1-oriented snoop caches that provide low-latency communication and
  synchronization
• Instruction set enhanced with macro calls
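
The two-level structure described above can be illustrated with a short sketch. It assumes the conventional hypercube labeling, in which clusters are numbered with 6-bit IDs and two clusters are linked when their IDs differ in exactly one bit; the source does not state PIM/p's actual numbering scheme, and the function names are illustrative.

```python
DIMENSIONS = 6         # six-dimensional hypercube (from the text)
PES_PER_CLUSTER = 8    # processors sharing one memory unit (from the text)


def cluster_count(dimensions=DIMENSIONS):
    """Number of nodes in a hypercube of the given dimension."""
    return 1 << dimensions


def neighbors(cluster_id, dimensions=DIMENSIONS):
    """Directly connected clusters: flip one bit of the ID per dimension."""
    return [cluster_id ^ (1 << d) for d in range(dimensions)]


def hop_distance(a, b):
    """Minimum number of network hops: count of differing ID bits."""
    return bin(a ^ b).count("1")


total_pes = cluster_count() * PES_PER_CLUSTER

print(cluster_count(), "clusters,", total_pes, "processors")
print("neighbors of cluster 0:", neighbors(0))
print("worst-case hops:", hop_distance(0, cluster_count() - 1))
```

With 64 clusters of eight processors each, the total comes to 512 processors, and under the assumed labeling no two clusters are more than six hops apart.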

PIM/m 

• Inherits Multi-PSI's architecture and firmware
• A single-layer network and nodes that each consist of one CPU make for a
  simple architecture and high scalability.
• Capable of examining various parallel processing techniques, such as task
  division and mapping

PIM/c 

• Contains eight tightly coupled processing elements that employ horizontal
  microprogramming control
• Dedicated hardware transmits global-scope variables (e.g., load
  information) between clusters with low latency.
• A cluster has high-speed KL1-oriented snoop caches and registers with a
  broadcast facility

PIM/i 

• Write-update snoop caches and LIW (long instruction word) are introduced
  for efficient execution of KL1 programs.
• CIF (cluster interface) works as an I/O processor.
• Effective system status monitoring using video RAM
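
The general idea behind a write-update snoop cache can be sketched as follows. This is a deliberately simplified model, not PIM/i's actual protocol: it assumes a single shared bus, write-through to main memory, and one value per address, and the class names are hypothetical.

```python
class Bus:
    """Shared bus that every cache snoops; also holds main memory."""

    def __init__(self):
        self.caches = []
        self.memory = {}


class Cache:
    """One processor's cache in a simplified write-update scheme."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}              # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        # On a miss, fetch from main memory (unwritten addresses read as 0).
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.memory[addr] = value    # write-through, an assumption
        # Write-update: the new value is broadcast, and every other cache
        # holding a copy refreshes it instead of invalidating it.
        for other in self.bus.caches:
            if other is not self and addr in other.lines:
                other.lines[addr] = value


bus = Bus()
producer, consumer, bystander = Cache(bus), Cache(bus), Cache(bus)
consumer.read(0x10)          # consumer now holds a copy of address 0x10
producer.write(0x10, 42)     # the consumer's copy is updated in place
print(consumer.lines[0x10], 0x10 in bystander.lines)
```

In this model the consumer's next read of the address hits in its own cache with the new value, whereas an invalidation-based scheme would force a miss. A cache that never held the address is left untouched.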

PIM/k 

• Examines the scalability attained by multi-layer caches
• Experiments on multi-layer caches and load balancing management
  appropriate to KL1 execution
• Easy implementation of a KL1 language processor on a UMA (uniform memory
  access) architecture

Multi-PSI 

• A prototype of PIM developed in the intermediate stage of the FGCS project.
• Research on the KL1 parallel execution method, research on the parallel
  operating system, and R&D on application programs have been done on
  Multi-PSI.
• The CPU of the PSI sequential inference machine was employed as the
  processing element, so only a dedicated network controller had to be
  developed from scratch.