Object Oriented Models and Data Integration

In the second step of our workflow, molecular objects are collected together with their properties and relationships. We use the object-oriented approach to specify these models and record the data sources from which the objects stem. The first question to answer is which objects participate in our system. Classes of objects that appear in several data sources are semantically integrated and represented by a single class in the model. Classes that should be included in the model but do not appear in any data source can be added as well.

Specifying models implies that software is needed to implement databases capable of storing the objects of the model persistently. Adding classes to the conceptual schema, or removing them from it, causes the corresponding classes and objects in the storage layer to be modified. Once an object class has been added to the database, the user can both store objects of this class persistently and retrieve selected objects by means of database queries.

The integrative aspects of database modelling consist of (1) integrating classes from different data sources into the model and (2) importing objects from the content stored in these data sources. Figure 11.8 shows an example of a conceptual schema containing classes from different domains, drawn with our tool iUDB [ISB], which enables the modeller to edit specifications interactively and implements databases automatically based on the modelled specifications. The conceptual model includes classes at the metabolic level (reactions, metabolites, enzymes, ...) as well as classes from the gene regulatory level (genes, polypeptides, regulatory interactions, ...). In this drawing, attributes are linked directly to their class; e.g. the class pathway has an attribute name, which means that a name is stored as a value for each pathway. Attributes that are additionally linked to other classes define directed relationships between objects of these classes. In the example shown in Figure 11.8, pathways are related to enzymes through the attribute ec. These directed relationships are also called references.

Fig. 11.8 Conceptual modelling of static properties and relationships
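The distinction between plain attributes and references can be sketched in a few lines of Python. This is only an illustration, not the representation used by iUDB: the class layout, the list-valued reference, and the example values are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Enzyme:
    ec: str  # plain attribute: the EC classification stored as a value


@dataclass
class Pathway:
    name: str  # plain attribute: a name is stored for each pathway
    # Reference: the attribute ec links a pathway to Enzyme objects.
    # A list is assumed here; the source does not state the cardinality.
    ec: List[Enzyme] = field(default_factory=list)


# Illustrative objects (values are examples, not taken from the source).
enzyme = Enzyme(ec="1.1.1.1")
pathway = Pathway(name="example pathway", ec=[enzyme])

# The relationship is directed: it can be followed from pathway to enzyme,
# but the Enzyme object carries no link back to the pathway.
referenced_ec_numbers = [e.ec for e in pathway.ec]
```

Following the reference from the pathway yields its enzymes, whereas finding the pathways of a given enzyme would require a query over all pathways.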

When processes are modelled with the object-oriented approach, the objects participating in processes should be modelled as separate classes. Even when supported by modelling tools, identifying object classes remains conceptual work: it requires knowledge of the biochemical background and cannot be automated. Nevertheless, the integration of objects can be automated by means of data integration. One important aspect here is the adequate specification of object keys, i.e. attributes that uniquely identify objects, so that it is impossible to store two objects that have the same values for all key attributes.
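The uniqueness guarantee provided by object keys can be illustrated with a small in-memory store. The following is a minimal sketch under stated assumptions: `ObjectStore` and `KeyViolation` are hypothetical names, objects are represented as plain dictionaries, and the enzyme class keyed by EC number, organism and tissue, together with its values, is purely illustrative.

```python
from typing import Dict, Tuple


class KeyViolation(Exception):
    """Raised when an object with the same key values is already stored."""


class ObjectStore:
    """Hypothetical store that enforces uniqueness over the key attributes."""

    def __init__(self, key_attributes: Tuple[str, ...]):
        self.key_attributes = key_attributes
        self._objects: Dict[tuple, dict] = {}

    def key_of(self, obj: dict) -> tuple:
        # The key is the tuple of values of all key attributes.
        return tuple(obj[a] for a in self.key_attributes)

    def insert(self, obj: dict) -> None:
        key = self.key_of(obj)
        if key in self._objects:
            raise KeyViolation(f"duplicate key {key}")
        self._objects[key] = dict(obj)

    def __len__(self) -> int:
        return len(self._objects)


enzymes = ObjectStore(key_attributes=("ec", "organism", "tissue"))
enzymes.insert({"ec": "1.1.1.1", "organism": "Homo sapiens", "tissue": "liver"})
# Same EC number, different tissue: a distinct object under this key.
enzymes.insert({"ec": "1.1.1.1", "organism": "Homo sapiens", "tissue": "kidney"})

try:
    # All key attributes match an existing object, so this must be rejected.
    enzymes.insert({"ec": "1.1.1.1", "organism": "Homo sapiens", "tissue": "liver"})
    duplicate_rejected = False
except KeyViolation:
    duplicate_rejected = True
```

Choosing a larger key admits more distinct objects: with the EC number alone as key, the second insert above would already have been rejected.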

Figure 11.9 shows how the specification of object keys directly influences the object space. The example shows an object class Enzyme that is uniquely identified not only by the attribute EC (EC classification) but also by Organism and Tissue. In this three-dimensional object space, enzymes having the EC number "" occupy the plane defined by Tissue and Organism at the point where ec_number =

When data from relational data sources are integrated into a model defined by a conceptual schema, object keys become important as integrity constraints for assigning external data to local objects. In general, a set of assertions that assign external data to a class is called a mapping. In the approach described here, a separate mapping to an object class has to be modelled for each data source. As a result, there are no dependencies between the mappings of different data sources.

The process of object integration is briefly explained in the following. As mentioned before, the integration of objects requires mappings to be modelled between classes and data sources (Figure 11.10). These mappings contain assertions between the attributes specified for the class and the columns of the related tables. In most cases, different attributes will be mapped to different tables. In Figure 11.10, three columns of the tables X, Y and Z that are assigned to the attributes a, b and c have been marked. After the modeller has defined all assertions, an integration mechanism applies them in order to select values from the assigned columns. To organize the results in an object-oriented format, relational join operations are computed to combine the columns. Finally, the result has to be inserted into the model: a union operation stores all values within the associated objects, and object keys are used to locate the objects in the model.

Fig. 11.9 Definition of object spaces by object keys.
Fig. 11.10 Integration of objects from relational tables
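The three steps of the integration (select, join, union) can be sketched as follows. This is not the integration mechanism of the tool itself but a minimal illustration under stated assumptions: the tables are small in-memory lists, the key column `id`, the column names `col_a`, `col_b`, `col_c`, the helper functions, and the choice of an outer join are all hypothetical.

```python
# Hypothetical relational tables X, Y and Z; every row carries the key column "id".
table_x = [{"id": 1, "col_a": "a1"}, {"id": 2, "col_a": "a2"}]
table_y = [{"id": 1, "col_b": "b1"}, {"id": 2, "col_b": "b2"}]
table_z = [{"id": 1, "col_c": "c1"}]

# Mapping for one data source: each assertion assigns a class attribute
# to a column of a table.
mapping = {
    "a": (table_x, "col_a"),
    "b": (table_y, "col_b"),
    "c": (table_z, "col_c"),
}

KEY = "id"


def select_and_join(mapping, key):
    """Select every mapped column and join the values on the key column.

    Rows missing from one table are kept (outer join); this is an
    assumption of the sketch, not something the source prescribes.
    """
    joined = {}
    for attribute, (table, column) in mapping.items():
        for row in table:
            joined.setdefault(row[key], {key: row[key]})[attribute] = row[column]
    return list(joined.values())


def union_into_model(model, rows, key):
    """Store the joined values in the objects located by their key.

    Existing objects are extended; unknown keys create new objects.
    """
    for row in rows:
        model.setdefault(row[key], {}).update(row)


# The model already holds one object with key 1.
model = {1: {"id": 1, "name": "existing object"}}
union_into_model(model, select_and_join(mapping, KEY), KEY)
```

After the run, the existing object with key 1 has gained the attributes a, b and c while keeping its name, and a new object with key 2 has been created that carries only a and b.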
