DATABASE MANAGEMENT SYSTEMS

Data Hierarchy:

Databases control ‘data redundancy’ by ensuring that the same data cannot be entered twice. If you recall from Hans if you had a customer table and a sales order table, you don’t have to re-enter the customer’s details when you make an order. You use the primary key (customer id) to get the details from the database. If you had to key in the details and store it in the order table it would mean that the exact same data would be stored twice in the database. If this were the case a database would be unmanageable.

This brings up the point of ‘data inconsistency’. The fact that the user would be inputting the information into the sales order table without knowing exactly what is in the customer table could lead to errors in customer id, name, address etc. Also if the data in the customer table was updated then you would have to search the database (or order table) to find that customer id and update it there as well. In a relational database the user would only have to enter the customer id in the order table and as such would only be able to enter a customer id that exists in the customer table. If that particular customers record was updated then you don’t have to worry about changing any other records as it is only in the customer table.

Advantages of Databases:

Disadvantages of Databases

DATABASE MANAGEMENT SYSTEM (DBMS)

This is a program that makes it possible to:

It provides a logical access to the data stored in the database, i.e. sql query; specifies what tables to draw the data from and under what conditions e.g. where balance > 500 etc. User/programmers do not have to worry about the physical aspects of the database.

There are two main functions of the DBMS:

When creating a database you specify the links between data items or what links tables together; a primary key in a table and a foreign key in another table – the foreign key is the primary key of the first table. However different types of relationships can be specified i.e. different logical views. These views correspond to three main types of DBMS’s:

HIERARCHICAL DATABASES

However there are undesirable side effects of this type of database:

NETWORK DATABASES

RELATIONAL DATABASES

RELATIONSHIPS

CREATING THE DATA STRUCTURE

 

NORMALISATION

FUNCTIONS OF DATABASE MANAGEMENT SYSTEMS

SUPPORT FOR LOGICAL TRANSACTIONS

SHARED UPDATES

RECOVERY

SECURITY

Data integrity is where the user is forced to input data in a specific way i.e. customer id’s can only be numeric. DBMS’s must support the isolation of data structure from the programs. This is known as data independence. Users or application programs are not affected by changes to the database structure i.e. new records, records deleted. Logical and physical Data Independence is usually achieved through subschema or View type mechanisms. Views enable the database to function properly as if the user was using a select query the tables would be locked and so it would affect other users and application programs. A database Schema is a description of the overall logical structure of a database, expressed/programmed in Data Definition Language (DDL). It can be broken down into sub-schemas; logical description of a user’s view or programs view of the data used. DDL can be very sophisticated on a mainframe or trivial on a PC (queries/views).

As organisations are rarely single sites they use telecommunication. Flows of data do transcend the boundaries of the organisations; information systems also do. Databases can be used to support the distribution of information resources. Organisational data sources are varied and so all applications must be integrated to save time (i.e. exchange data). Databases can be used to enable this integration, one example is MFG/PRO. Portability/compatibility is paramount (e.g.ODBC drivers).

DATABASE UTILITIES

DISTRIBUTED DATABASES

This is the next logical step in geographically dispersed organisations where the goal is to provide location transparency. The starting point is a set of decentralised databases located in different places, developed for the specific information needs of each site. The aim of distributed databases is to integrate these decentralised databases into one. Distributed databases have additional roles to play in that they must determine the location of data to be retrieved. They must also translate the request for this data retrieval into the language used by the local DBMS. Heterogeneous distributed DBMS’s means different DBMS’s running at each site. A master DBMS controls the interactions amongst the parts. However it’s not very practical today due to the need for compatibility. More often, each DBMS follows the same data architecture.

ADVANTAGES

CONTROL IN NORMAL DATABASES

CONTROL IN DISTRIBUTED DATABASES

OPTIONS FOR DISTRIBUTED DATABASES

PROBLEMS WITH GLOBAL TRANSACTIONS

DBMS’s can be radically different i.e. relational versus network and only some state of the art commercial products have translating capabilities. One alternative solution is to put some essential data and the directory of the data locations on a central server. Real distributed database systems solve these problems for the users with the help of the Networking Operating Systems (NOS). To ensure the integrity of the integrity of the data in update operations there is a Commit Protocol. This is a well-defined procedure based on the exchange of messages i.e. ‘ok’ or ‘not ok’. Each global transaction can be completed or aborted.

Two phase commit:

TIMESTAMPING

This is an alternative to locking as with locking there is the potential to get deadlocks. Timestamping ensures that transactions are processed in serial order so locking is not needed. All updated records carry the timestamp of the transactions that modified them. If a new transaction attempts to update a record with an earlier timestamp, that’s ok but if the new transaction has a later stamp, the update is denied, the transaction is re-stamped and is re-started. This could happen over a wide network, where transactions are not immediately modified.

EFFECT OF DESIGN ON SPEED

For example if you wanted to design a fast query that needs data located at two sites. The queries must be broken into components isolated at different sites to minimise communication time and traffic. Determine which site has the potential to yield fewer selected records and then move the preliminary results to the other site where the rest of the work can be performed.