Chapter 5
Design Decisions and Project Development
Introduction
This chapter discusses the decisions that were made during the course of the
project about how the system would operate. Some of these decisions were
constrained by time, complexity, or other external factors; others were made to
ensure that the system was easy to use or met its requirements. Each decision is
listed with the alternative solutions that were considered and the reasoning
behind the final choice.
The chapter also shows how the system could have been developed differently if
the constraints had not existed. It is divided into three sections:
5.1 Design Decisions
5.2 Constraints Upon Design Decisions
5.3 Development Method Employed
5.1. Design Decisions
During development a number of decisions had to be made about how to implement
certain aspects of the design.
The main decisions were:
- What programming language to use for the project
- How to create the user interface
- How to retrieve the web pages
- What to do with graphics
- How to handle tables of data
- How to present the formatted information to the user
- How to store the database of pages
- What to do with Shockwave Flash presentations and other active content
- How to most efficiently get the pages from their respective locations
- What links (if any) to follow on the pages, and how deep to follow links
- How to handle tables etc. on sub pages
- How to store users’ subject preferences
- How much customisation to allow for individual page downloads
- Whether or not to store profiles for multiple users
The alternative solutions to each of these problems are listed below with their
advantages and disadvantages, followed by the solution that was chosen and the
reason for choosing it:
- What programming language to use for the project
- Visual Basic – This is the easiest option and would reduce the time needed to
create the user interface, but it is not as flexible as some other languages.
- Perl – A good way of developing a server-based solution, but the project was
not intended to be server based.
- C/C++ – This is a very powerful language, and is well suited to an application
such as this.
C/C++ was chosen because it would be more challenging than Visual Basic, and it
is what a professional organisation would use if it were to develop the system.
- How to create the user interface
- Using a text-based menu system – This is the simplest solution, but the
resulting application may not be usable with a screen reader. It also does not
look very professional to a sighted user.
- Using Visual Basic linked to C/C++ code compiled into DLLs – This simplifies
construction of the interface, but does not allow quite as much control over
individual aspects of the interface.
- Using MFC – A good way of developing the interface quickly, but the author has
no experience of it, and there would be insufficient time to learn it.
- Using a Windows application programmed in C – A good compromise, because
experience gained in lectures can be used to aid development.
A Windows application programmed in C was chosen, because this allows a high
level of control over the operation of the interface and allows a very
professional-looking application to be produced.
- How to retrieve the web pages
- Windows Socket calls – Easy to use and powerful enough for the project’s
needs.
- COM OLE Object – This is a good method, but the author has no experience in
using COM objects, and there would be insufficient time to learn about them.
COM OLE objects would have been a good method, but too little documentation was
found to make this feasible, and the timescale was too short to spend a lot of
time researching it. Windows Sockets were chosen instead.
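As a rough illustration of what retrieving a page over a raw socket involves,
the sketch below builds the request text that would be sent once a socket had
been connected to the web server. The function name build_get_request and the
use of HTTP/1.0 are assumptions made for this example and are not taken from
the project's code. The socket calls themselves are left out so that the
example stays portable.

```c
#include <stdio.h>

/* Hypothetical helper: formats a minimal HTTP/1.0 GET request into buf.
   Returns the length of the request, or -1 if buf is too small. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "\r\n",
                     path, host);

    if (n < 0 || (size_t)n >= size)
        return -1;          /* request did not fit in the buffer */
    return n;
}
```

The resulting text would then be written to the connected socket and the
response read back until the server closes the connection.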
- What to do with graphics
- Ignore them – This is the easiest option.
- Try to interpret them – Unfeasible.
- Use the ‘alt’ attribute to provide a textual description – This is the best
solution as long as alt attributes are used, but on many pages they are not.
Using the alt attribute wherever it is present was chosen as the best option,
because it attempts to convey some of the meaning of the image to the user.
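The following is a minimal sketch of how the alternative text might be pulled
out of a single image tag. The function extract_alt is hypothetical and is
deliberately simplified: it assumes that the text "alt" does not also appear
inside another attribute's value, which a full parser would have to handle.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copies the value of the alt attribute of one
   <img ...> tag into out. Returns 1 if an alt attribute was found and
   0 otherwise. */
int extract_alt(const char *tag, char *out, size_t size)
{
    const char *p;
    size_t n = 0;

    if (size == 0)
        return 0;
    for (p = tag; *p != '\0'; p++) {
        /* Look for "alt" preceded by whitespace, ignoring letter case. */
        if (p > tag && isspace((unsigned char)p[-1]) &&
            tolower((unsigned char)p[0]) == 'a' &&
            tolower((unsigned char)p[1]) == 'l' &&
            tolower((unsigned char)p[2]) == 't') {
            const char *v = p + 3;
            char quote = '\0';

            while (isspace((unsigned char)*v))
                v++;
            if (*v != '=')
                continue;           /* e.g. "alternate", keep searching */
            v++;
            while (isspace((unsigned char)*v))
                v++;
            if (*v == '"' || *v == '\'')
                quote = *v++;
            /* Copy up to the closing quote, or up to whitespace or '>'
               when the value is not quoted. */
            while (*v != '\0' && n + 1 < size) {
                if (quote ? *v == quote
                          : (isspace((unsigned char)*v) || *v == '>'))
                    break;
                out[n++] = *v++;
            }
            out[n] = '\0';
            return 1;
        }
    }
    out[0] = '\0';
    return 0;
}
```

When the function returns 0 the image has no textual description, and the
image would simply be left out of the output.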
- How to handle tables of data
- Ignore them – This is the easiest option, but it is impractical because too
much information would be lost.
- Automate the extraction of data using some rules – The rules could be
complicated and are unlikely to work well for all cases.
- Allow the person who maintains the page database to specify how the data
should be extracted – Probably the best, but time consuming, and very
susceptible to changes in page layout.
- Automate extraction of data, but based on some settings customised to the page
by the person who maintains the page database, with extra rules to cater for
changes to the page layout – This is a good compromise.
The “partial-automation” option was chosen as a compromise between ease of
setting up and accuracy of information. For some tables, a slightly altered
version of the guidelines for “Table rendering by non-visual user agents” in
section 11.4 of the W3C HTML 4.0 specification [HTML] could be used.
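One way of rendering a table without its visual layout is to pair each data
cell with the heading of its column, so that the meaning of the cell is not
lost when the grid disappears. The sketch below shows this idea for a single
row. The function linearise_row and its output format are assumptions made
for illustration and are not the project's actual extraction rules.

```c
#include <stdio.h>

/* Hypothetical helper: writes one table row as "Header: value" pairs,
   one per line, into out. Returns 0 on success, or -1 if out is too
   small to hold the whole row. */
int linearise_row(char *out, size_t size,
                  const char *const headers[], const char *const cells[],
                  size_t columns)
{
    size_t used = 0;
    size_t i;

    if (size == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < columns; i++) {
        int n = snprintf(out + used, size - used, "%s: %s\n",
                         headers[i], cells[i]);

        if (n < 0 || (size_t)n >= size - used)
            return -1;      /* row did not fit in the buffer */
        used += (size_t)n;
    }
    return 0;
}
```

A row holding "United" and "2" under the headings "Team" and "Score" would be
written as the two lines "Team: United" and "Score: 2".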
- How to present the formatted information to the user
- Using text controls in a Windows application – This would mean that no other
software is needed, but it is overly complicated and may not be easily usable
with a screen reader.
- Using HTML pages – This makes it easier to build hierarchical structures that
the user can browse interactively, but it does require a web browser to be
installed.
- Using one or more text files – The simplest solution, but with limited
capacity for hierarchical documents, since the user cannot easily jump to the
next section.
- Using one or more Word documents – It could be difficult to set up the link
between the application and Word, and this suffers from the same problems as
text files.
Presenting the information using HTML pages was chosen because they allow easy
creation of hierarchical documents and the use of talking web browsers.
- How to store the database of pages
- Some form of SQL database, possibly accessed through an ODBC link – This is
time consuming, meaning slow download times, and it would also be difficult to
implement.
- Comma delimited text file – This is simple, small, fast to download, easy to
program, and easy to maintain. It has the disadvantage that it may be more
difficult to understand than a database when setting up and maintaining pages.
The comma delimited text file was chosen to store sites and categories, because
it can hold the same information as a database without the added complications
during implementation.
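To show why a comma delimited file is easy to program against, the sketch
below splits one line of such a file into its fields. The function
split_csv_line and the record layout used in the note that follows it are
assumptions made for illustration; the real file layout is not given in this
chapter. The sketch also assumes that fields never contain commas or quotes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits one comma delimited line in place and
   stores pointers to the fields in fields[]. Returns the number of
   fields found (at most max_fields). */
size_t split_csv_line(char *line, char *fields[], size_t max_fields)
{
    size_t count = 0;
    char *p = line;

    /* Drop a trailing newline such as the one left by fgets(). */
    line[strcspn(line, "\r\n")] = '\0';

    while (count < max_fields) {
        char *comma = strchr(p, ',');

        fields[count++] = p;
        if (comma == NULL)
            break;              /* last field on the line */
        *comma = '\0';          /* terminate this field */
        p = comma + 1;          /* next field starts after the comma */
    }
    return count;
}
```

With an assumed layout of ID, category, name and address, a line such as
"12,News,Example News,http://www.example.com/news/" would yield four fields.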
- What to do with Shockwave Flash presentations and other active content
- Ignore it – This is the only realistic option.
- How to most efficiently get the pages from their respective locations
- Download pages sequentially – This is easier to program, but may not use the
full bandwidth of the modem, meaning slower download times.
- Download pages concurrently – This would be quicker for larger bandwidth
connections, but is more complex to implement.
Downloading the pages concurrently was chosen as the best solution, because it
minimises the amount of time spent online, and therefore the cost to the user.
- What links (if any) to follow on the pages, and how deep to follow links
- Do not follow any links – This is the simplest solution, but it risks missing
important information.
- Arbitrary for all pages – This is inflexible, because pages are likely to need
treating individually.
- Dependent on page, set at time of setting up page database – This is a simple
solution, but it limits the user’s control over download times for specific
pages.
- Set by user for each page – This is the most complex solution. It would allow
complete control, so the user could choose to download extra information on a
subject they are especially interested in, but it would add an extra level of
complexity to the system.
Setting up link following policies for individual pages at the time of setting
up the page database was chosen. This provides enough flexibility for most
people. Link following is limited to a depth of one, because this greatly
simplifies the code and reduces download time without a great loss of content.
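The depth limit can be pictured with the sketch below, which collects the
links found on a page and applies a depth-one rule when deciding whether to
follow them. The names collect_links, should_follow and MAX_URL are
hypothetical, and the link matching is simplified to quoted href values only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_URL 256

/* Hypothetical helper: collects the quoted href values found in html
   into links[], up to max_links. Returns the number stored. Unquoted
   attribute values are ignored in this simplified version. */
size_t collect_links(const char *html, char links[][MAX_URL], size_t max_links)
{
    size_t count = 0;
    const char *p = html;

    while (*p != '\0' && count < max_links) {
        if (tolower((unsigned char)p[0]) == 'h' &&
            tolower((unsigned char)p[1]) == 'r' &&
            tolower((unsigned char)p[2]) == 'e' &&
            tolower((unsigned char)p[3]) == 'f' &&
            p[4] == '=' && p[5] == '"') {
            const char *start = p + 6;
            const char *end = strchr(start, '"');
            size_t len;

            if (end == NULL)
                break;                  /* unterminated value: stop */
            len = (size_t)(end - start);
            if (len < MAX_URL) {        /* skip URLs that do not fit */
                memcpy(links[count], start, len);
                links[count][len] = '\0';
                count++;
            }
            p = end + 1;
        } else {
            p++;
        }
    }
    return count;
}

/* Depth-one rule: links on the main page (depth 0) are followed, but
   links found on the sub pages (depth 1) are not. */
int should_follow(int depth)
{
    return depth < 1;
}
```

Because nothing is followed from the sub pages, the downloader needs no
recursion and no record of pages already visited, which is where most of the
simplification comes from.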
- How to handle tables etc. on sub pages
- Use the same settings as for the main page – This method does not work in
practice because sub pages are rarely laid out the same as the main page.
- Have a separate setting to apply to all the sub pages – A better solution, but
it is unlikely that all the sub pages will have the same layout.
- Have different settings for different groups of sub pages, using some sort of
filter to decide which settings to use – A complex solution, but the only way to
ensure a solution that works in most cases.
The more complex solution was chosen, using settings applied to sub pages
grouped using filters of some sort. Choosing a simpler system would limit the
pages that could be used with the system too much to be practical.
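The form of the filters is not specified here, so the sketch below assumes the
simplest possible kind: each filter is a piece of text, and a sub page belongs
to the first group whose text appears in its address. The structure
subpage_filter and the function select_settings are hypothetical names used
only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: sub pages whose address contains `pattern` are
   processed with the table settings identified by `settings_id`. */
struct subpage_filter {
    const char *pattern;
    int settings_id;
};

/* Returns the settings_id of the first filter whose pattern occurs in
   url, or default_id if no filter matches. */
int select_settings(const char *url, const struct subpage_filter *filters,
                    size_t count, int default_id)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strstr(url, filters[i].pattern) != NULL)
            return filters[i].settings_id;
    }
    return default_id;
}
```

Checking the filters in order means that a more specific filter can be placed
ahead of a more general one when a site needs both.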
- How to store users’ subject preferences
- Using registry keys – This is the preferred method for Windows 9x
applications, but it makes it more difficult to copy preferences to another
computer.
- Using a standard Windows INI file, storing ID numbers of pages chosen for
download – This is a good all-round solution, which adheres to standards set by
other Windows applications.
- Using a text file stored in the application directory – This is the simplest
solution, but it does not adhere to Windows standards.
- Using some form of SQL database – This is unnecessarily complex.
- Maintaining a remote profile on the UMIST server – This would allow the user
to move to different computers. It would be complex, and would create possible
security problems when transmitting data to the server.
The INI file method was chosen, with an INI file stored in the application
directory to store the preferences. It is a simple solution that meets the needs
of the system.
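As an illustration of storing page ID numbers in an INI file, the sketch below
assumes that the chosen pages are held as one comma separated value, for
example a line reading "Pages=3,7,12". The key name and the function
parse_page_ids are assumptions made for this example. On Windows the value
itself could be read with GetPrivateProfileString; that call is left out here
to keep the sketch portable.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: parses a value such as "3,7,12" into ids[].
   Returns the number of IDs stored; parsing stops at the first item
   that is not a number. */
size_t parse_page_ids(const char *value, long ids[], size_t max_ids)
{
    size_t count = 0;
    const char *p = value;

    while (count < max_ids) {
        char *end;
        long id = strtol(p, &end, 10);

        if (end == p)
            break;              /* no digits found */
        ids[count++] = id;
        if (*end != ',')
            break;              /* end of the list */
        p = end + 1;
    }
    return count;
}
```

Each ID returned would then be looked up in the page database to find the
pages that the user has asked to download.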
- How much customisation to allow for individual page downloads
- None – This is the simplest solution.
- Allow user to specify links to follow – This would allow greater control over
what sections of sites were downloaded, but would add a whole extra level of
complexity.
- Allow user to specify a whole range of customisations for individual pages –
This would allow the most control, and allow the user to limit the amount of
time spent downloading information, but suffers from the same problems as the
previous option.
It was decided not to allow the user any control over individual downloaded
pages, because the extra complexity is not justified by the benefits it would
bring.
- Whether or not to store profiles for multiple users
- Do not store multiple profiles – This is the simplest solution.
- Store multiple profiles in the INI file – This would allow many different
users to set up their own individual page preferences, and would be useful where
a computer is being shared between more than one user. The extra complexity in
terms of implementation would be quite substantial.
It was decided not to allow multiple profiles because in most cases this feature
would not be necessary, and it would add a great deal of complexity to the
system.
5.2. Constraints Upon Design Decisions
Some of the design decisions listed above were constrained by external factors,
such as time and the author’s level of expertise in the area.
This section describes which decisions would have been taken differently in a
perfect world where development was not limited by these constraints.
Ideally the system could have used an Internet Explorer COM object to implement
the connection to the Internet and the downloading of pages. This would have
provided a more flexible base from which to build the rest of the application.
It would have cut down the amount of code that relied on specific versions of
various Windows components, thus making the code more future-proof. It would
have made the application easier to adapt to new versions of the HTML language,
by using the built-in functions of the Internet Explorer object to interpret the
new features appropriately. It could also have made use of the built-in
functions of the Internet Explorer model to convert the web pages to plain text,
although these do not do as good a job as the parser outlined in this report.
It may also have been a good idea to use MFC to create the Windows interface, as
this simplifies the construction of Windows applications and makes links with
COM objects easier to implement. It is also supported by a number of standard
template libraries that aid in the development of systems like this one.
Unfortunately, insufficient information was found about how to use COM objects
in C++, so this approach could not be pursued. There was not enough time to
learn how to program MFC applications, so this was also impractical.
The project would have benefited from a much more reliable way of making a
dial-up connection than the one used. Unfortunately, there was not sufficient
time to allow coding of a more complex system. Getting access to a machine
running Windows NT that had a modem also proved problematic. This precluded
implementing any sort of automated dial-up networking for this operating system.
5.3. Development Method Employed
The method used when developing this project followed a spiral development
model. It used a system of incremental prototypes, releasing a new version for
testing every time a major new function was added. This made it possible to test
that each new function worked fully before adding any more functions. This
simplified the debugging process, as any new bugs were usually limited to the
new functions or to problems with the integration of the new functions with the
old ones.
Each time a new version was released for testing it was accompanied by a
document containing “release notes” that listed the new features that had been
added and the known problems with the system. A copy of the most recent set of
release notes is included in Appendix D. These release notes allowed anyone who
tested the project to concentrate on the new features to ensure that they worked
properly.