DATABASE MANAGEMENT SYSTEMS

Data Hierarchy:

Fields represent a single data item such as name, address, balance etc.
A record is made up of a related set of fields describing one instance of an entity e.g. your bank account details
File is a set of related records so for the above it would be the set of everyone’s bank account details
A database is a collection of related files

Databases control ‘data redundancy’ by ensuring that the same data cannot be entered twice. If you recall from Hans if you had a customer table and a sales order table, you don’t have to re-enter the customer’s details when you make an order. You use the primary key (customer id) to get the details from the database. If you had to key in the details and store it in the order table it would mean that the exact same data would be stored twice in the database. If this were the case a database would be unmanageable.

This brings up the point of ‘data inconsistency’. The fact that the user would be inputting the information into the sales order table without knowing exactly what is in the customer table could lead to errors in customer id, name, address etc. Also if the data in the customer table was updated then you would have to search the database (or order table) to find that customer id and update it there as well. In a relational database the user would only have to enter the customer id in the order table and as such would only be able to enter a customer id that exists in the customer table. If that particular customers record was updated then you don’t have to worry about changing any other records as it is only in the customer table.

Advantages of Databases:

Data are independent from applications – Stored centrally so everyone can access it. The data is retrieved from the database as required. Think of a database as a filing cabinet
Data are accessible to any new program – you only have to tell the program where the data is
Data are not duplicated in different locations – example above of data redundancy
Programmers do not have to write extensive descriptions of the files – Therefore easy enough to maintain
These save enough money and time to offset the extra costs of setting and maintaining Databases

Disadvantages of Databases

Data are more easily accessible so more easily abused
Databases require expensive hardware and software
Specialised personnel is often required to start with large databases
People may object to ‘their’ data being widely available in a database – information is power

DATABASE MANAGEMENT SYSTEM (DBMS)

This is a program that makes it possible to:

Create
Use and
Maintain a database

It provides a logical access to the data stored in the database, i.e. sql query; specifies what tables to draw the data from and under what conditions e.g. where balance > 500 etc. User/programmers do not have to worry about the physical aspects of the database.

There are two main functions of the DBMS:

Query language – this is for people who are not programmers, not very difficult to learn. The sql is the greatest advantage of databases.
Data manipulation language – this is for programmers who want to modify the links between data elements within the DB. Whilst we could simply go into Access and change the properties of tables or the database relationships using a GUI, database administrators use DML which is basically database code.

When creating a database you specify the links between data items or what links tables together; a primary key in a table and a foreign key in another table – the foreign key is the primary key of the first table. However different types of relationships can be specified i.e. different logical views. These views correspond to three main types of DBMS’s:

Hierarchical Databases
Network Databases
Relational Databases – introduces in the 80’s and are the most frequent databases

HIERARCHICAL DATABASES

Data items are related as ‘Parent’ and ‘Child’ in a tree-like structure
‘Parent’ means that data item is higher in the tree than the ‘child’ and connected to it.
One ‘parent’ can have more than one ‘child’ but one ‘child’ can only have one ‘parent’
The most common platform is IBM’s Information Management System (IMS).
This type of database offers very fast retrieval of data

However there are undesirable side effects of this type of database:

Insertion of a record – the dependent record cannot be added without a parent
Deletion of a record – if you delete the parent you delete every child associated with it as well
It is impossible to have two parents

NETWORK DATABASES

Same as the parent and child schema in Hierchical database, but children can have more than one parent
It is also possible to link items upwards to other items parents
Practically, it means that the DBMS can deliver the information based on any data item
The flexibility of data retrieval is the main advantage of Network Databases over Hierarchical databases

RELATIONAL DATABASES

Data items stored in tables
Specific fields from each table related to other fields in other tables (join) i.e. the primary key field (customer id) in the customer table linked to the foreign key field (customer id) in the order table.
Infinite number of possible viewpoints on the data (queries) i.e. select * from customer, orders, supplier, etc where orders.orderid = ??
Most flexible of all databases but slower for complex searches because of the fact that it has to take data from so many tables under certain conditions. It has to check each field to see if it matches those conditions before it will retrieve the data. For a large database that has a lot of data you can see why this delay in retrieval is so.
Many different types of relational databases – Oracle, SQL Server, Sybase, etc.

RELATIONSHIPS

Attempt at modeling the business elements (entities) and their relationships (links)
Specifies dependencies between the data items
Coded in an Entity Relationship Diagram (ERD)
Different types of relationships

One to one
One to many
Many to many

CREATING THE DATA STRUCTURE

There are few rules for this procedure and a lot of it is down to experience
It can get quite complex and you have to try to design it so that the database can operate at maximum speed.
Tables must be normalised i.e. redundancy is limited to the strict minimum by an algorithm
Normalisation is not always the best method though

NORMALISATION

This is the process of simplifying the relationships amongst data items as much as possible
Reasons for normalisation

To simplify retrieval
To simplify maintenance i.e. updates, deletion, etc.
To reduce the need to restructure the data for each new application

FUNCTIONS OF DATABASE MANAGEMENT SYSTEMS

Data storage retrieval and update facilities
A user-accessible catalogue or data dictionary
Support for shared update
Backup and recovery services
Security services
Integrity services
Services to promote the data independence
Telecommunications
Utilities

SUPPORT FOR LOGICAL TRANSACTIONS

A logical transaction involves the reading, writing or updating of a record.
If the transaction is interrupted before it’s completed then the ‘up-to-date’ data is sacrificed for consistent data i.e. it’s not saved.
If the transaction is not interrupted then the transaction is said to be committed i.e. written to disk or successful
A DBMS provides mechanisms that either Commit or Rollback transactions

SHARED UPDATES

Occur if two or more users are updating the database at the same time
For example: double update

Customer balance is $418
Correct

Pat records sale of $100
Reads, updates, writes – Balance = $518
Joe records payment of $100
Reads, updates, writes – Balance = $418

Incorrect

Pat reads and updates
Joe reads and updates
Pat writes – commit $518
Joe writes - commit $318
Value = $318 ?

Shared update solutions - Avoidance

Prohibit shared updates
Allow access for retrieval only
Record updates in transaction file and update database periodically using a batch program
The problem with this is that the data in the database is temporarily out of date
Customer may not be allowed credit because his balance had not been credited with last payment

Shared update solutions – Locking

Lock the table or record while its being updated
Three types of lock

Exclusive Lock – This is extreme and inflexible as other users can neither read or update the locked table/record
Read Only Lock – Other users can read but not update the locked table/record
Lock Time-Out – If a table/record is locked a person could be waiting a long time before it is unlocked. As such some DBMS’s detect lengthy locks and unlock them, undoing any updates made to any records during the transaction

Lock granularity refers to the level of the lock i.e. what’s locked - field, record, page/block, table
Deadlock – Users can have a lock on more than one record at a time. This poses problems when two users require each others locked records

RECOVERY

Backups or saves – normal backup of the database files
Journaling / Audit trail / Audit file

Keep a log or journal of the activity which updates the database
Recovery involves copying the backup of the database and running a special program to update the backup version of the database with the transaction in the log. If you make a backup of the database every night at twelve o’clock and the database crashes at two o’clock the next day you can use the backup and add the entries in the transaction file (audit file) from twelve o’clock to the time the system crashed. The database is now back up to date.

SECURITY

Restriction of access to authorised users only

Passwords
Encryption
Views – basically pre-saved sql queries. The user is viewing a copy of the sql query
Authorisation levels

Read only
Edit
Delete
Create

Data integrity is where the user is forced to input data in a specific way i.e. customer id’s can only be numeric. DBMS’s must support the isolation of data structure from the programs. This is known as data independence. Users or application programs are not affected by changes to the database structure i.e. new records, records deleted. Logical and physical Data Independence is usually achieved through subschema or View type mechanisms. Views enable the database to function properly as if the user was using a select query the tables would be locked and so it would affect other users and application programs. A database Schema is a description of the overall logical structure of a database, expressed/programmed in Data Definition Language (DDL). It can be broken down into sub-schemas; logical description of a user’s view or programs view of the data used. DDL can be very sophisticated on a mainframe or trivial on a PC (queries/views).

As organisations are rarely single sites they use telecommunication. Flows of data do transcend the boundaries of the organisations; information systems also do. Databases can be used to support the distribution of information resources. Organisational data sources are varied and so all applications must be integrated to save time (i.e. exchange data). Databases can be used to enable this integration, one example is MFG/PRO. Portability/compatibility is paramount (e.g.ODBC drivers).

DATABASE UTILITIES

Compact data files
Index/re-index files
Repair database (crash)
Import/export data from and to other sources
Enforce standards (integrity of standards)
Associated data dictionary
Access to remote computers (login, emulation)

DISTRIBUTED DATABASES

This is the next logical step in geographically dispersed organisations where the goal is to provide location transparency. The starting point is a set of decentralised databases located in different places, developed for the specific information needs of each site. The aim of distributed databases is to integrate these decentralised databases into one. Distributed databases have additional roles to play in that they must determine the location of data to be retrieved. They must also translate the request for this data retrieval into the language used by the local DBMS. Heterogeneous distributed DBMS’s means different DBMS’s running at each site. A master DBMS controls the interactions amongst the parts. However it’s not very practical today due to the need for compatibility. More often, each DBMS follows the same data architecture.

ADVANTAGES

Increased reliability of systems and availability of data
Local control preserved
Modular growth possible at each site and at new site
Optimised communication costs
Faster response times

CONTROL IN NORMAL DATABASES

Transaction control: ability of the DBMS to ensure the successful completion of transactions

Commit transactions
Rollbacks

Concurrency control: ability of DBMS to arbitrate between concurrent uses of data

Simultaneous access
Simultaneous update
Deletion

CONTROL IN DISTRIBUTED DATABASES

Different portions of the overall database reside at different locations
These portions are controlled by different processors running sometimes different DMBS’s
Common schema means queries can involve any portion of the database residing at any location

OPTIONS FOR DISTRIBUTED DATABASES

There is the issue of the actual physical design
Performance of the database depends on good design
There are a number of options:

Data replication

Store a separate copy of the full tables in each location
If a copy is stored in every location then there is full replication
Advantages – reliable and fast response
Disadvantages – storage requirements increase and issue of complexity and cost of updating

Horizontal Partitioning

Some of the rows of the tables are stored in the location; others are stored at other locations
E.g. customer banking out of a particular branch
Advantages – efficient, local optimisation and secure
Disadvantages – inconsistent speed access and backup vulnerability due to dispersement of tables

Vertical partitioning

Some columns are projected into base relationship at different sites
All relations share a common domain so the full table can be reconstructed
Advantages – tailor made support for functional areas and also same advantages as horizontal partitioning
Disadvantages – some queries might be very slow and users must understand some design issues

Combination of the three methods

Most of the time, companies will use different methods
Each method is efficient in certain situations with some other security requirements

PROBLEMS WITH GLOBAL TRANSACTIONS

DBMS’s can be radically different i.e. relational versus network and only some state of the art commercial products have translating capabilities. One alternative solution is to put some essential data and the directory of the data locations on a central server. Real distributed database systems solve these problems for the users with the help of the Networking Operating Systems (NOS). To ensure the integrity of the integrity of the data in update operations there is a Commit Protocol. This is a well-defined procedure based on the exchange of messages i.e. ‘ok’ or ‘not ok’. Each global transaction can be completed or aborted.

Two phase commit:

Site originating the transaction sends request to all sites involved in the update
All sites attempt to process their part of the transaction without committing the data (temporary files)
They notify the first site whether it’s OK or not
The first site collects all the OK’s and sends the order to commit the data

TIMESTAMPING

This is an alternative to locking as with locking there is the potential to get deadlocks. Timestamping ensures that transactions are processed in serial order so locking is not needed. All updated records carry the timestamp of the transactions that modified them. If a new transaction attempts to update a record with an earlier timestamp, that’s ok but if the new transaction has a later stamp, the update is denied, the transaction is re-stamped and is re-started. This could happen over a wide network, where transactions are not immediately modified.

EFFECT OF DESIGN ON SPEED

For example if you wanted to design a fast query that needs data located at two sites. The queries must be broken into components isolated at different sites to minimise communication time and traffic. Determine which site has the potential to yield fewer selected records and then move the preliminary results to the other site where the rest of the work can be performed.