Evaluation
- Suitable Data Model for Existing Public Databases
The hierarchical structure of feature data, for example, is represented
naturally in the nested relational model. Stable data, such as protein
names and taxonomy, are stored in separate relations from variable data,
such as feature descriptions, which biologists often add or correct.
This separation is expected to improve processing efficiency.
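
As an illustration of the separation described above, the following is a minimal sketch in Python, not the Kappa-P schema itself. The class names, field names, and sample values (`ProteinEntry`, `FeatureSet`, `Feature`, entry "P1") are all hypothetical and chosen only to show a nested attribute (a list of features inside one tuple) and the split between stable and variable relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Feature:
    """One tuple of the nested 'features' attribute (hypothetical layout)."""
    kind: str          # e.g. "binding site"
    start: int         # 1-based residue position, inclusive
    end: int           # 1-based residue position, inclusive
    description: str   # free text that biologists may add or correct


@dataclass
class ProteinEntry:
    """Stable relation: data that rarely changes once registered."""
    entry_id: str
    name: str
    taxonomy: str


@dataclass
class FeatureSet:
    """Variable relation: one tuple per entry, with a nested list of features."""
    entry_id: str
    features: List[Feature] = field(default_factory=list)


# Two separate relations keyed by entry_id (sample values are invented).
proteins: Dict[str, ProteinEntry] = {
    "P1": ProteinEntry("P1", "example protein", "example taxon"),
}
feature_sets: Dict[str, FeatureSet] = {
    "P1": FeatureSet("P1", [Feature("binding site", 10, 14, "draft text")]),
}


def correct_description(entry_id: str, index: int, text: str) -> None:
    """Update a feature description; the stable relation is not touched."""
    feature_sets[entry_id].features[index].description = text


def unnest(entry_id: str) -> List[Tuple[str, str, int, int, str]]:
    """Flatten the nested features of one entry into flat tuples."""
    fs = feature_sets[entry_id]
    return [(fs.entry_id, f.kind, f.start, f.end, f.description)
            for f in fs.features]
```

In this sketch a correction to a feature description rewrites only the variable relation, which is the kind of saving the separation is meant to provide.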
- Integrated Environment for Protein Databases
Because the existing public protein databases are maintained by different
institutes, using several of them together is difficult. For example,
feature descriptions of functions are stored in PIR, structural features
in PDB, and relations between amino acid patterns and features in
ProSite. The GUI, which is implemented on the X Window System,
communicates with Kappa-P via RPC and provides an integrated environment
for feature descriptions by showing their positions graphically. It
displays functional features from PIR and structural features from PDB,
both of which are stored in Kappa-P nested relations.
- Speed-up of Exhaustive Search (Motif Search)
The most common use of protein sequence databases is to predict the
function of a protein of unknown function from the homology between
its amino acid sequence and those of proteins of known function. Motif
search is another homology-based search, which looks for amino acid
patterns within a sequence database. Both require exhaustive searching,
and parallel processing is expected to speed up the search.
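
To make the exhaustive nature of motif search concrete, the following is a minimal sketch that scans every sequence for an amino acid pattern. It assumes the pattern is written as a Python regular expression rather than in any database-specific pattern syntax; the function name `find_motif` and the sample sequences are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def find_motif(pattern: str,
               sequences: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Scan every sequence and return (entry_id, start, matched_text) hits.

    `start` is a 1-based residue position. The pattern is wrapped in a
    lookahead so that overlapping occurrences are also reported.
    """
    regex = re.compile(f"(?=({pattern}))")
    hits = []
    for entry_id, seq in sequences.items():
        for m in regex.finditer(seq):
            hits.append((entry_id, m.start() + 1, m.group(1)))
    return hits


# Invented sample data: three short sequences.
sequences = {
    "S1": "MKCAACGGH",
    "S2": "GGHCLLCAA",
    "S3": "MMMMMM",
}

# Hypothetical motif: C, then two residues that are each A or L, then C.
hits = find_motif("C[AL][AL]C", sequences)
```

Every sequence is visited once and scanned independently of the others, so the work can be divided among processing elements by partitioning the sequence set.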
Kappa-P can pass user-defined programs from the interface process to
each local DBMS and execute them there. The results are collected by
the interface process and returned to the user. This mechanism is
expected to reduce communication costs considerably compared with one
in which all the data are first collected by the interface process and
then redistributed to each PE together with the user's program.
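
The difference between the two mechanisms can be sketched as follows. This is a single-process simulation under stated assumptions, not Kappa-P's interface: each partition stands in for the data held by one local DBMS, the cost model simply counts characters moved between a partition and the interface process, and the names `ship_program`, `collect_then_redistribute`, and `has_two_cysteines` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Partition = Dict[str, str]                 # entry_id -> sequence at one local DBMS
Program = Callable[[Partition], List[str]]  # user program returning entry IDs


def has_two_cysteines(part: Partition) -> List[str]:
    """Hypothetical user program: IDs of sequences with at least two C residues."""
    return [eid for eid, seq in part.items() if seq.count("C") >= 2]


def ship_program(partitions: List[Partition],
                 program: Program) -> Tuple[List[str], int]:
    """Run the program where the data lives; only results are moved."""
    results, moved = [], 0
    for part in partitions:
        local = program(part)
        moved += sum(len(r) for r in local)   # result characters only
        results.extend(local)
    return results, moved


def collect_then_redistribute(partitions: List[Partition],
                              program: Program) -> Tuple[List[str], int]:
    """Gather all data centrally, hand it back out, then run the program."""
    moved = 0
    gathered: Partition = {}
    for part in partitions:
        gathered.update(part)
        moved += sum(len(s) for s in part.values())   # data to interface process
    n = len(partitions)
    redistributed: List[Partition] = [dict() for _ in range(n)]
    for i, (eid, seq) in enumerate(gathered.items()):
        redistributed[i % n][eid] = seq               # round-robin assignment
        moved += len(seq)                             # data back out to a PE
    results = []
    for part in redistributed:
        local = program(part)
        moved += sum(len(r) for r in local)           # results to interface process
        results.extend(local)
    return results, moved


# Invented sample data spread over two partitions.
partitions = [{"S1": "MKCAACGGH"}, {"S2": "GGHCLLCAA", "S3": "MMMMMM"}]
shipped, shipped_cost = ship_program(partitions, has_two_cysteines)
central, central_cost = collect_then_redistribute(partitions, has_two_cysteines)
```

Both functions return the same entries, but in the second every sequence crosses the network twice before any result is produced, so its cost grows with the size of the database rather than with the size of the answer.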