DATABASE SYSTEMS
Concurrent and serialisability
Locking techniques including… 2 phase locking,
Locking granuality,
Deadlock,
Detection and Prevention
Rollback and restart criteria
Timestamp ordering policy and implementation
1st Lecture : Monday / 24 / Sep / 2001
What is a Database?
A Database system may simply be described as a computerized record-keeping system. The DB may be regarded as an electronic filing cabinet, whereby the user can perform various operations, such as:
Introductory Example:
The ‘CELLAR’ Table
Bin |
Wine |
Producer |
Year |
QTY |
Ready |
2 |
Char |
Mouchau |
92 |
3 |
99 |
4 |
Char |
P. Mason |
92 |
2 |
98 |
6 |
Reisling |
P. Mason |
89 |
4 |
98 |
7 |
Cab. Sauv. |
Glen Ellen |
93 |
2 |
99 |
8 |
Cab. Sauv. |
Wolf Bass |
87 |
1 |
99 |
22 |
Merlot |
Glen Ellen |
95 |
6 |
99 |
24 |
Merlot |
Red Nun |
95 |
7 |
96 |
30 |
Rioja |
Olde Seville |
98 |
4 |
99 |
33 |
Rioja |
Stone Street |
89 |
2 |
94 |
Info Retrieved… SQL (type of software)
SELECT BIN, WINE, QTY
FROM CELLAR
WHERE READY < 99
Bin |
Wine |
QTY |
4 |
Char |
2 |
6 |
Reisling |
4 |
24 |
Merlot |
7 |
33 |
Rioja |
2 |
INSERT
INTO CELLAR (BIN, WINE, PRODUCER, YEAR, QTY, READY)
VALUES (28, ‘Pinot Noir’, ‘S.E. Exports’, 93, 10, 02)
UPDATE CELLAR
SET READY = 99
WHERE BIN = 24
DELETE
FROM CELLAR
WHERE BIN = 6
Bin |
Wine |
Producer |
Year |
QTY |
Ready |
2 |
Char |
Mouchau |
92 |
3 |
99 |
4 |
Char |
P. Mason |
92 |
2 |
98 |
7 |
Cab. Sauv. |
Glen Ellen |
93 |
2 |
99 |
8 |
Cab. Sauv. |
Wolf Bass |
87 |
1 |
99 |
22 |
Merlot |
Glen Ellen |
95 |
6 |
99 |
24 |
Merlot |
Red Nun |
95 |
7 |
96 |
28 |
Pinot Noir |
S.E. Exports |
93 |
10 |
02 |
30 |
Rioja |
Olde Seville |
98 |
4 |
99 |
33 |
Rioja |
Stone Street |
89 |
2 |
94 |
Database Management System
(DBMS)
Advantages of using a DBMS
Fact
Possible advantages
Database definition: “Collection of persistent data that is used by the application system
of some enterprise”
Data
Data: data consists of a series of facts or statements that may have been collected, stored, processed and/or manipulated but have not been ordered or placed into context. When data is organized, it becomes information. Information can be used to draw generalized conclusions or knowledge
Many persons are involved in the design, use and maintenance of a large DB with a few hundred users.
· Casual
· Naïve or Parametric
· Sophisticated
· Stand-alone
The DBA is responsible for:
The DBA is accountable for problems such as a breach of security or poor system response time
End-users are the people whose jobs require access to the Db for querying, updating and generating reports; the DB primarily exists for their use
Categories of End-users:
§ Sophisticated End-users: thoroughly familiarize themselves with the facilities of the DBMS so as to implement their applications to meet their complex requirements. Examples of such users include Engineers, Scientists and Business Analysts.
§ Stand-alone users: maintain personal DB’s by using ready-made program packages that provide easy-to-use menu or graphics-based interfaces. Example: The user of a tax package that stores a variety of personal financial data for tax purposes
§ System Analysts and Application Programmers(Software Engineers):
· System Analysts determine the requirements of end-users, especially naïve and parametric end-users, and develop specifications for canned transactions that meet these requirements
· Application Programmers implement these specifications as programs; then they test, debug, document and maintain these canned transactions
A fundamental characteristic of a DB is that: it provides some level of data abstraction by hiding details of data storage that are not needed by most DB users
A data model is a collection of concepts that can be used to describe the structure of a DB and provides to achieve this abstraction
The data types, relationships and constraints that should hold the data
Most data models include a set of basic operations for specifying updates and retrievals on the DB
2. Low-level or physical data models (use by computer specialists)
3. Representational(or implementation) data models (used by end-users and computer specialists)
Many DM’s exist, and are categorized according to the types of concepts they use to describe the DB structure
è High-level or conceptual DM’s provide concepts that are close to the ways many
users perceive data
è Low-level or physical DM’s provide concepts that describe the details of how data is stored in the computer
Concepts that are provided by low-level DM’s are generally meant for: computer specialists and end-users.
Between high-level and low-level DM’s, there is
è Representational(or implementation) DM’s which provide concepts that may be
understood by end-users but that are not too far removed from the way data is organized within the computer.
An entity represents a real-world object or concept, such as an employee or a project that is described in the DB.
An attribute represents some property of interest that further describes an entity, such as an employees name or salary
A relationship among 2 or more entities represents an interaction among the entities, for example, a works on relationship between an employee and a project
Employee Employee name
entity works on
Project
Ex
of a relationship
Schemas
The description of a DB is called a DB Schema
The DB Schema is specified during the DB’s design and is not expected to change frequently
Schema changes are usually needed as the requirements of the DB applications change
New DB systems include operations for allowing Schema changes
The Schema change process is more difficult than simple DB updates
Most data models have certain conventions for displaying the Schemas as diagrams. A display schema is called a schema diagram
Student
Course
Each object in the schema is called a schema construct.
The actual data in a DB may change quite frequently.
For ex: In a Student DB every time a new student is added or a new grade for
a student is entered
The data in a DB at a particular moment in
time is called a DB state or snapshot
It is also called the current state of occurrences or instances in the DB.
Everytime a record is inserted or deleted the DB changes from one state into another. It is important to note the distinction between a DB schema and a DB state.
When a new DB is defined, the DB schema is specified only to the DBMS. At this point, the corresponding DB state is the empty state with no data.
The initial state of the DB is when the DB is first populated or loaded with the initial data
At any point in time the DB has a current state
The DBMS is partly responsible for ensuring that every state of the DB is a valid state
Valid state = a state that satisfies the structure specified in the schema
DBMS Architecture and Data Independence
3 important characteristics of the DB approach
A 3-schema architecture exists to help achieve and utilize the characteristics.
The goal of the 3-schema architecture is to separate the user applications and the physical DB.
Schemas can be defined at 3 levels
A high-level DM or an implementation DM can be used at this level.
Each external schema describes the part of the DB that a particular user group is interested in and hides the rest of the DB from that user group.
A high-level DM or an implementation DM can be used at this level
(ILLUSTRATION
of the 3-SCHEMA)
External View External View
End-users
![]() |
![]() |
![]() |
|||
Describes part of the DB that a particular user group is interested
Conceptual
Model
Describes structure
of the whole DB
Logical data
independence
can change
Internal
Schema
Describes the physical storage structure of
the DB
Physical data
independence
can change
Schema = Description
Data independence (DI)
DI is defined as the capacity to change the schema at 1 level of the Db system without having to change the schema at the next higher level
There are 2 types of DI
LDI is the capacity to change the conceptual schema without having to change external schemas or application programs
The conceptual schema may be changed to expand the DB(by adding a record type or data item) or to reduce the DB(by removing a record type or data item)
Logical DI = Change the conceptual schema
PDI is the capacity to change the conceptual (or external) schemas.
Changes to the internal schema may be needed because some physical files have to be reorganized. For ex. – by creating additional access structures to improve the performance of retrieval or update.
Physical DI = Change the internal schema
Once the design of the DB is completed, the next step is to specify the Conceptual and Internal schemas for the DB.
A language called the Data Definition Language(DDL) is used by the DBA and by DB designers to define both schemas.
The DBMS will have a DDL compiler whose function is to process DDL statements in order to identify descriptions of the schema and to store the schema description in the DBMS catalog.
1) Menu based interfaces for browsing
2) Forms based interfaces
3) Graphical user interfaces
4) Natural language interfaces
5) Interfaces for parametric users
6) Interfaces for the DBA
1. Menu based interfaces for browsing
These interfaces present the user with lists of options, called menus that lead the user through the formulation of a request
2. Forms based interfaces
A FBI displays a form to each user. Users can fill out all of the form entries to insert new data, or they fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries.
3. Graphical user interfaces
A GUI typically displays a schema to the user in a diagrammatic form
4. Natural language interfaces
These interfaces accept requests written in English and “attempt” to understand them
5. Interfaces for parametric users
Parametric users(bank tellers) often have a small set of operations that they must perform repeatedly. Goal of the interface is to have abbreviated commands; minimizing the number of key-strokes
6. Interfaces for the DBA
DB systems that contain privileged commands that can be used by the DBA’s staff only.
Ex. of commands
· Creating accounts
· Setting system parameters
· Granting account authorisation
· Changing a schema
A DM represents the organization itself
A DM should consist of 3 parts:
1. A STRUCTURAL part – consisting of a set of rules according to which DB’s can be constructed
2. A MANIPULATIVE part – defining the types of questions that are allowed on the data. I.e. updating, retrieving data from the DB
3. A set of INTEGRITY RULES – which ensures that the data is accurate
The purpose of a DM is to represent data and to make the data understandable
3 categories of DM exist:
1. Object-based
Data Models:
Use concepts such as entities, attributes and relationships
Types of Object-based DM’s:
a) Entity – Relationship
b) Semantic
c) Functional
d) Object-orientated
2. Record-based
DM’s:
The DB consists of a number of fixed-format records possibly as differing types. Each record type defines a fixed number of fields, and typically of a fixed length.
There are 3 types of record-based DM’s:
a) Relational DM Links to traditional file
b) Network DM processing concepts
c) Hierarchical DM more evident
A) Relational DM: Data and Relationships are represented as tables, each of which
have a number of columns with a unique name
BRANCH
E.g.:
Branch No |
Street |
City |
Post Code |
|
22 Deer Rd |
London |
SW14EH |
STAFF
Staff No |
F Name |
L Name |
Position |
Sex |
D.o.b. |
Branch No |
SL21 |
John |
White |
Manager |
M |
01/Oct/1945 |
B005 |
BRANCH DB and STAFF DB are related through Branch No:
BRANCH and STAFF details:
So employee John White is a manager with a salary of £30,000, who works at Branch No B005, which from the first table is at 22 Deer Pond in London
NOTE: There is a relationship between STAFF and BRANCH. A BRANCH office has STAFF
B) Network DM: in the network model, data is represented as collections of records,
and relationships are represented by sets
SL41 |
Julie |
Lee |
… |
Assistant |
9000 |
B005 |
|
|
SL21 |
John |
White |
… |
Manager |
30000 |
Branch No is the connection in all this data
They both work in same branch
C) Hierarchical DM: this is a restricted type of Network model.
Again, data is represented as collections of records, and relationships are represented by sets
1. Data storage, retrieval and update
2. A uses accessible catalogue
3. Transaction support
4. Concurrency control services
5. Recovery services
6. Authorisation services
7. Support for data communication
8. Integrity services
9. Services to promote data independence
10. Utility services
1 A DBMS must furnish users with the ability to share, retrieve and update data in the DB
2 A DBMS must furnish a catalogue in which descriptions of data items are shared and which is accessible to users
A system catalogue should hold data about the users and applications. It is (the catalogue)
Expected to be accessible to users as well as to the DBMS
A system catalogue, or data dictionary, is a repository of information describing the data in the DB: It is, the data about the data or the meta-data. The amount of information and the way the information is used vary with the DBMS
Typically, the System Catalogue stores:
· Names, types and sizes of data items
· Names of relationships
· Integrity constraints on the data
· Names of authorized users who have access to the data
· External, conceptual and internal schemas
· Usage statistics such as the frequency of transactions and counts on the number of accesses made to objects in the DB
· Information about the data can be collected and stored centrally. This helps to maintain control over the data as a resource.
· The meaning of data can be defined which will help other users understand the purpose(s) of the data.
· Communication is simplified, since exact meanings are stored. The System Catalogue may also identify the user or users who own or access the data
· Changes in the DB can be recorded
· The impact of a change can be determined before it is implemented since the System Catalogue records each data item, all its relationships and all its users
3 Transaction support
A DBMS must furnish a mechanism, which will ensure that either all the updates corresponding to a given transaction are made or that none of them is made.
A transaction is a series of actions, carried out by a single user or application program, which accesses or changes the contents of the DB.
Ex: To ADD, UPDATE, DELETE
4 Concurrency control services
A DBMS must furnish a mechanism to ensure that the DB is updated correctly when multiple users are updating the DB concurrently
Concurrent access is relatively easy if all users are only reading data, as there is no way in which they can interfere with one another.
However, when 2 or more users are accessing the DB simultaneously and at least one of them is editing data, there may be interference that can lead to inconsistency in the data
5 Recovery services
A DBMS must furnish a mechanism for recovering the DB in the event that the DB is damaged in any way
6 Authorization
services
A DBMS must furnish a mechanism to ensure that only authorized personnel can access the DB
7 Support
for data communication
A DBMS must be capable of integrity with communication software
8 Integrity
services
A DBMS must furnish a means to ensure that both the data in the DB and changes to the data follows certain rules:
DB integrity refers to the correctness and consistency of stored data
Integrity is usually expressed in terms of correctness
e.g. we may want to specify a constraint that no number of staff can
manage more than 100 properties at a time
A relation is a table with columns and rows
An attribute is a named column of a relation. In the relational model, relations are used to hold information about the objects to be represented in the DB.
A relation is represented as a 2-D table, in which the rows of the table correspond to attributes. Attributes may appear in any order
|
|
|
|
||||
Relation |
|
|
Cardinality |
|
Foreign key
Primary key
|
Branch No |
A column contains values of a single attribute:
E.g. The Branch No columns contain only numbers of existing branch offices
This is a row of a relation. The elements of a relation are the rows or tuples in the table. Tuples can appear in any order.
The degree of a relation is the number of attributes it contains.
E.g. in the branch relation, it has degree 4. this means that each row of the table is a 4-tuple containing 4 values.
A relation with only 1 attribute would be called a unary relation.
A relation with 3 attributes is called ternary, and after that the term n - ary is used.
The cardinality of a relation is the number of tuples it contains. The number of tuples is called the cardinality of the relationship and this changes as tuples are added and deleted.
A collection of normalized relations with distinct relation names
A relation may be referred to as a file, the tuples as records and the attributes as fields
Normal terms Alternative 1 Alternative 2
Relation Table File
Tuple Row Record
Attribute Column Field
A relation has the following properties:
(Domain = the set of allowable values for 1 or many attributes)
There are no duplicate tuples within a relation therefore we need to be able to identify one or many attributes (called relational keys) that uniquely identifies each tuple in a relation
An attribute, or set of attributes, that uniquely identifies a tuple within a relation
The candidate key that is selected to identify each row uniquely. This means that a relation always has a primary key
An attribute, or set of attributes, within
1 relation that matches the candidate key of some (possibly the same) relation
SQL – Structured Query Language
Ideally a DB language should allow the user to:
· Create the Db and relation structures
· Perform basic management tasks, such as the insertion, editing, and the deletion of data from the relations
· Perform both simple and complex queries
A DB language must perform the tasks with minimal user effort, and its command structure and syntax must be relatively easy to learn.
Finally, the language must be portable, that is, it must conform to some recognized standard so that we can use the same command structure and syntax when we move from one DBMS to another.
SQL is intended to satisfy these requirements
SQL has 2 major components:
è DDL (Data Definition Language): for defining the DB structure and controlling access to the data
è DML (Data Manipulation Language): for retrieving and updating data
SQL is:
CREATE TABLE, INSERT, DELETE, etc…
E.g.
CREATE TABLE staff(staffno VARCHAR(5), Lname VARCHAR(15), salary DECIMAL(7,2));
INSERT INTO staff VALUES (‘SG16’, ‘Brown’, 8300);
SELECT staffno, Lname, salary
FROM staff
WHERE salary < 10000;
An SQL statement consists of RESERVED words and USER-DDEFINED words.
Reserved words are a fixed part of the SQL language and have a fixed meaning. They must be spelt EXACTLY as required and cannot be split across lines.
User-defined words are created by user – according to certain syntax rules! They represent the names of various DB objects such as tables, columns, etc…
Most components of an SQL statement are
not case sensitive.
The one important exception to this rule is that literal character data must be types exactly as it appeared in the db.
E.g. if we store a persons name as ‘John’ and then search for it using the string ‘JOHN’, the row will not be found.
SQL is the first and, so far, the only standard Db language to gain wide acceptance.
The ISO SQL standard does not use terms of relations, attributes and tubles. Instead it uses the terms tables, columns and rows.
SQL allows the table produced as the result of the select statement to contain duplicate rows. It imposes an ordering on the columns, and it allows the user to order the rows of a table.
SQL statements
e.g. inserting data using literals into a table
INSERT INTO PropertyForRent (PropertyNo, Street, City, PostCode,
Type, Rooms, Rent, OwnerNo, StaffNo, BranchNo)
values (‘PA14’, ’16 Holland’, ‘Aberdeen’, ‘AB75SU’, ‘House’, 6,
650.00, ‘CO46, ‘5A9’, ‘B0007’);