Data Models

Data models have been used for many years to design information systems. This type of software typically uses a relational database management system (RDBMS) based on Structured Query Language (SQL). Logical data models are used to organize and understand information structure. Physical data models incorporate data types and other design details needed to generate SQL code.

In addition to creating logical and physical data models, MacA&D and WinA&D can customize diagram presentations, model non-visual database elements and generate SQL for Oracle, DB2, SQL Server, Sybase, Informix and InterBase. This paper discusses SQL, data models, code generation and namespaces for partitioning SQL schemas. These concepts are illustrated with WinA&D screen shots and supported by all editions of WinA&D or MacA&D OSX.

SQL Concepts

The relational model is used by most commercial databases. Most database systems use a vendor specific form of SQL to define the database schema. SQL has undergone several standardization efforts including SQL-99 (also referred to as SQL3). Most RDBMS products support a subset of SQL-99 plus vendor specific extensions.

SQL began life as a procedural-oriented language, but in recent years object-oriented concepts have been added by RDBMS vendors and included in the SQL-99 standard. To support the object-relational model, new capabilities like user defined (abstract) data types, methods for user defined types and object views have been added to RDBMS products. This ongoing effort is less standardized and not as widely utilized by developers.

SQL statements can be categorized as data definition language (DDL), data control language (DCL) and data manipulation language (DML). DDL statements define or delete database objects such as CREATE TABLE and DROP TABLE. DCL statements control access to data and the database using statements such as GRANT and REVOKE. DML statements select, insert, delete, update and query actual data. When SQL is generated from a data model, the primary emphasis is on the DDL statements that define the database schema and to a lesser extent the DCL statements that handle privileges. DML statements are only relevant in the sense that they are used within triggers, procedures and functions stored in the dictionary.

Data Model in WinA&D With Generated SQL

Data Modeling Concepts

The data modeling process can be described as iterations of the following steps.

  • Create a data model as one or more Entity-Relation Diagrams (ERDs) and associated dictionary.
  • Set presentation defaults and save your work into a project folder.
  • Using the tool palette, create entities and relationship connections and fill in property dialogs.
  • Merge the diagrams into the dictionary to generate associated entries.
  • Use a dialog to fill in detailed information about entities, attributes and other SQL elements.
  • Use menu commands to generate foreign keys, verify and balance the model.
  • Generate SQL code.

An ERD represents tables as entities and columns as attributes of those entities. Often the words entity and table or attribute and column are used interchangeability. To be more precise, entity and attribute usually refer to the logical data model, while table and column refer to the physical SQL code. A data model may contain one or more diagrams within one or more documents that store detailed information in the global data dictionary.

Entities are added to a model with the Entity tool and named in a property dialog. Entities represent SQL tables or views. The diagram is merged into the dictionary to add an entry for each entity.

Entity details entered into a dialog are stored in the dictionary. Each entity and each of its attributes has a dictionary entry. Attributes have a dictionary entry name of the form EntityX'AttributeY. In the Dictionary window, the entries for each entity and its attributes are grouped together alphabetically.

Details Dialog for Entity

In addition to attributes, a table can have associated triggers, alternate keys, indexes and other properties. SQL tables (entities) can have associated triggers that execute code when an event occurs. Triggers have a dictionary entry name of the form EntityX-TriggerY.

Logical Data Model

Connection lines between entities represent foreign key relationships. Connection lines can be named and have a dictionary entry with information that further details the relationship.

A menu command can be used to add referential attributes to the referencing entities based on diagram connections. The command puts relationship data into the dictionary entry for each connection. A dialog can be presented to further define each relationship with referential actions.

The SQL Elements dialog is used to define SQL concepts such as domains, constraints, assertions, etc. that do not have graphic representations. Some elements such as domains and constraints are referenced from many columns and tables. If the element name is changed, all references change automatically.

An alphabetical list of each type of item is displayed. When the item is selected the dialog shows the item's Name, Description and Code fields. The Description field is free format text used by the developer and optionally included as comments in emitted code. The Code field contains an SQL code fragment.

SQL Elements Dialog

Design namespaces are used to partition large projects. If your project uses namespaces, a namespace selector is visible in the bottom right of the dialog. It can be a powerful aid when working with complex models. The items in each list can be viewed for a selected namespace or all namespaces. If a specific namespace is selected, then the name of any new item added is automatically prefixed with that namespace.

An information system can be represented as a logical or physical data model. A logical model typically uses longer, more expressive names for entities and attributes, but hides details like data types and foreign keys. In a physical model, names of tables and columns may be more concise and are often predetermined by the existing database schema. Other differences in the physical model depend on designer preferences. A relationship line can use a single arrowhead that points to the referenced entity and shows the primary key in the relationship name. The tool can automatically generate physical names from logical names based on user-determined criteria.

In addition to its logical or physical presentation, data models can hide or show all attributes, selected attributes or only show primary and foreign key attributes. Another option is to show only the entity name and its description. The Diagram Presentation command changes the presentation of all entities on the current diagram. Entities on other diagram levels are not affected. The Presentation command changes the presentation of a selected entity.

The tool has the capability to represent logical and physical models either as a single diagram or as two separate diagrams usually within the same document. When using a single diagram, menu commands allow the designer to easily toggle between logical and physical views. When using a two-diagram approach, a command automatically generates a physical model from a logical model.

The strength and weakness of each approach is briefly described here.

  • The single diagram approach is convenient since only one diagram is created to represent both logical and physical models. Changes or additions to entities and relationships need only occur on one diagram. A designer uses the Physical command to toggle between logical and physical views. Since the physical model shows attribute data types or domains on the diagram, entities require more space and are thus loosely packed when viewed in the logical model.

  • In the two-diagram approach, the designer simply selects the logical or physical model from the Contents view since each diagram exists separately. Each diagram can use different physical or presentation options and represent connections and their names in different ways. Using this approach the logical model is created first, then a command is used to create a physical model in a separate diagram.

    In the two-diagram approach, each diagram can be edited independently. This approach requires extra effort to add or rename entities or relationships since these modifications must be made to both diagrams. However, most of a designer's time is spent defining attributes and related details. Since that information is stored in the global data dictionary, changes are automatically reflected in both diagrams. Alternatively, the physical model need not be maintained since it can be regenerated automatically after editing changes are made to the logical model.

The command to generate a physical model performs a series of steps that can also be done manually by the designer. First it creates an empty diagram or replaces an existing diagram. Objects from the logical model are copied to the physical model. The Physical command is selected and the Diagram Presentation is set to Normal. If a many-to-many relationship exists in the logical model, a new entity is created that formalize the relationship and the connections are changed to one-to-many relationships as described below. Foreign keys are generated. Connection names are changed to reflect the matching of primary key to foreign key values.

A logical model may contain a many-to-many relationship between two entities that can be simplified for implementation by introducing a table in the physical model that formalizes the relationship. Consider a company that stores many parts in several warehouses.

Logical Model Shows Many-To-Many Relationship

In the physical model, a new table can be created that simplifies the implementation. It references the primary keys in the original entities. The new Quantity attribute (column) stores information related to the two entities (tables).

Physical Model Adds New Entity

Code Generation Concepts

An SQL script file can be generated for an entire data model, selected entities and/or selected SQL element types. Namespaces are used to group sections of a design and can be used to selectively generate SQL for specific schemas in the data model. The generated SQL script supports SQL-99, Oracle, DB2, SQL Server, Sybase, Informix or InterBase formats.

The Customize Code dialog is used to control whether SQL constructs are generated and specify the formatting and syntax used. For example, entities produce CREATE TABLE statements that may include primary key, foreign key and alternate key constraints. Alternatively, constraints can be expressed separately with ALTER TABLE statements.

Dialog to Customize SQL Code Generation

Some RDBMS products support indexes to speed access to specific rows in a table. Indexes can be generated for primary keys, alternate keys or generated from indexes defined in the data model. Some of the options is this dialog are only applicable to specific SQL dialects.

Some databases support several named object collections called schemas. SQL code can be generated with or without schema prefixes using the Schema Syntax selection. The Customize Code dialog has a list of Include and Drop checkboxes to selectively include the CREATE and DROP statements of SQL element types like tables, views, domains and constraints. Trigger styles vary between RDBMS products. The selected SQL dialect determines the format of the generate code.

The Generate Code dialog has options to generate code for selected objects, all objects, physical models or objects within a namespace. If your project uses separate logical and physical diagrams, select the Physical Models option to generate code only for entity objects on physical models.

Dialog to Generate SQL Code

Schemas and Namespaces

An SQL Schema identifies a group of related objects such as Tables, Views, Constraints, Assertions, etc. that are owned by a user or authorization identifier. Object names within a schema are unique. In some RDBMS products a database may contain several schemas while in others, the entire database is one schema.

Namespaces can be used to model multiple schemas within a project. If a project uses three schemas named Tom, Dick and Harry, their associated modeling objects and dictionary entries are mapped to the namespaces Tom, Dick and Harry, respectively. Namespaces are an optional design feature so if your database is simple and consists of one schema, they can be ignored completely.

During the modeling process, namespaces can easily be applied to selections in a diagram or dictionary. Commands for reporting information, listing dictionary entries, importing data, etc. can use namespaces as a powerful organizational tool for dealing with lots of information.

Most SQL statements contain element names that can be prefixed with a schema as illustrated below. Notice that in the first table, the name Customers is prefixed with the schema name Tom. For some database products the schema name is also the owner name. The second table belongs to schema Harry and creates a different table with the same name.

    CREATE TABLE Tom.Customers
    (CustomerID NUMBER, FirstName VARCHAR(30), LastName VARCHAR(30));

    CREATE TABLE Harry.Customers
    (CustID NUMBER, FName VARCHAR(30), LName VARCHAR(30));

In the modeling environment, these tables belong to namespaces Tom and Harry, respectively. For example, if you looked in the dictionary you would find entries related to the Tom:Customers table of the form:

    Tom:Customers = entity entry
    Tom:Customers'CustomerID = entity attribute entry
    Tom:Customers'FirstName = entity attribute entry
    Tom:Customers'LastName = entity attribute entry

When generating SQL code from a data model the developer can selectively generate code for a specific namespace or all namespaces. The code may optionally include or exclude the qualifying schema name.

Summary

MacA&D and WinA&D support logical data models, physical data models, various diagram presentations and SQL code generation. It comes with printed and PDF manuals, contextual help and step-by-step tutorials to get you started.