Designing an OO System
There are potentially many activities involved in the full lifecycle of an OO
System. Some people will start with an Analysis Model, which describes 'the
problem'. From this, they will create a Design Model, which describes 'the
solution', and then go on to develop from that design, with changes made as the
implementation and the local change control system require. Finally, there will
be the Implementation Model of the system as developed - which hopefully is
pretty close to the Design Model. All these models will probably use UML and
have some number of backing documents.
In this process the Models are usually more or less inviolate: they may change,
but only with great effort and reluctance.
Another approach is to skip the Analysis Model and go straight to the Design
Model. This model provides a start for development but is viewed very much as a
perishable entity: it is valid for about one hour after it's finished and must
then be viewed (if kept at all) with suspicion. This is much closer to my view
of Models: they communicate something which is valid now but, as we all know,
better ideas come along later, constraints we hadn't thought of appear, or
the user changes his mind about something.
One thing I do like to do, though, is to create a fairly high-level model at the
start. This will show me the major areas that I'm going to have to implement at
the very least. I am likely to go through several iterations to get to a model
I am happy with. As I get more experienced, or know more about the problem
domain, this initial model looks more and more like the final implementation -
as long as the user doesn't stick his oar in!
The purpose of this exercise is to take some requirements and produce this
initial model.
There are a number of ways to get from the requirements to an initial model. The
most common one is pretty naïve, in a way, but does provide a way to start.
This approach involves going through the requirements and making a list of all
the nouns. After removing duplicates and making plurals singular, each noun
remaining is a candidate Class. This is called Textual Analysis:
Nouns and Noun Phrases become classes and attributes
Verbs and Verb Phrases become methods and associations
Possessive Phrases indicate that nouns should be attributes not classes
Next, look through this list and remove items that are unnecessary (redundant or
irrelevant), or incorrect (too vague, or they represent things outside the
scope of the model, or they actually represent actions).
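The mechanical part of this pass (deduplicate, singularise, strike out the rejects) can be sketched in a few lines of Python. This is only an illustration: the noun list is picked by hand from the requirements, and the `singular` helper, the `candidates` function and the wording of the rejection reasons are my own assumptions, not part of any method.

```python
# Nouns picked out of the requirements by hand (an assumption: in practice
# you would underline them on paper or run a part-of-speech tagger).
RAW_NOUNS = [
    "system", "animals", "names", "owner", "results", "genetic tests",
    "animals", "results", "alleles", "animal", "technician", "database",
]

# Hypothetical rejection list; the reasons are kept so the decision can be
# revisited when more is known.
REJECTED = {
    "system": "too vague",
    "database": "too nebulous for now",
    "technician": "an actor, not (yet) a class",
}


def singular(noun):
    """Deliberately crude singulariser: strips one trailing 's'."""
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def candidates(nouns, rejected):
    """Deduplicate and singularise, keeping first-seen order, then filter."""
    kept = []
    for noun in nouns:
        word = singular(noun.lower())
        if word not in kept and word not in rejected:
            kept.append(word)
    return kept


CANDIDATES = candidates(RAW_NOUNS, REJECTED)
print(CANDIDATES)
```

What comes out is still only a list of candidates. Deciding which of them are really attributes of something else is a judgement call that no script will make for you.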
The requirements with nouns highlighted:
The system will record animals, their names and
owner's names, and the results of genetic tests made on animals.
These results come in the form of a 5-character string describing
the expression of an allele for each allele tested; there
will be 5 or 6 alleles per animal.
A technician will be able to call up the results for one animal
and ask the system to match its test results, via an algorithm
to be supplied, against other results in the database and display
the matching animals and their test results. The technician
will then select from this list animals which are, by his
interpretation of the tests, related to the original animal. This relationship
will be stored for later enquiries.
Our initial pass gives us:
animal (and matching animal and original animal)
Owner (owner's name)
result (and test result)
Database should be eliminated. We may end up with a database class
encapsulating some form of storage, but right now it's too nebulous to be a
class.
List is either a GUI object - thus irrelevant at the moment - or a collection
of animals, and so can be excluded either way.
System is too nebulous.
Technician is an Actor and so may appear on use case diagrams, but does not, so
far, merit being a class.
Algorithm is pretty nebulous: it's going to have to be expanded on but it will
definitely be there; it stays.
5-character string is the same as expression and so we'll drop it as expression
is a more descriptive word.
Name and Owner are clearly attributes of an animal.
This gives us the following list of candidate classes, with class names given:
It's important not to spend too long on this activity, otherwise you'll get
bogged down in analysis-paralysis. Once you've read and assimilated the
requirements document I would guess half a day to a day to get to this point is
enough on all but huge projects. We aren't trying to get every class, just the
most important ones from which the others will appear as we get into more
detailed analysis. By the time you've assimilated the results you possibly have
an idea of the broad shape of the system anyway; I usually skip this step as
I've effectively done it in my head, and go straight on to the next.
This is where you write out the relationships between the classes you've
discovered. I start by showing the connections between the classes and their
cardinality: for instance, one Animal will have 0 or many GeneticTests (the
requirements at this stage don't say anything about this, so I'll assume what
seems likely to me and clear it up later).
One important thing to note is that this diagram does not include the
RelatedAnimal class but does have a relationship from Animal to itself. This is
the 'Open Diamond' relationship, indicating that one animal may have a
relationship to many Animals. I have labelled this 'isRelatedTo'. This
self-relationship does away with the RelatedAnimal class for the moment: all we
know about the relationship is that it exists; we don't have any more
information about it to store (e.g. what the relationship actually is such as
'mother' and so on) and so it's redundant.
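In code, a self-relationship like this needs nothing more than a collection of Animals held by an Animal. The sketch below is a minimal illustration, assuming the relationship is symmetric (the requirements don't say) and using method names (`relate_to`, `relatives`) that are my own invention.

```python
from typing import List


class Animal:
    def __init__(self, name: str, owner_name: str) -> None:
        self.name = name
        self.owner_name = owner_name
        # The isRelatedTo association: no RelatedAnimal class needed while
        # all we know about a relationship is that it exists.
        self._related: List["Animal"] = []

    def relate_to(self, other: "Animal") -> None:
        """Record isRelatedTo in both directions (assumed symmetric)."""
        if other is self:
            raise ValueError("an animal cannot be related to itself")
        if other not in self._related:
            self._related.append(other)
            other._related.append(self)

    def relatives(self) -> List["Animal"]:
        # Return a copy so callers cannot edit the association directly.
        return list(self._related)
```

If the relationship later needs data of its own, such as 'mother', that is the point at which a separate class for it earns its place.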
There are two questions which this diagram shows up:
TestResult and AlleleExpression have a 1:1 relationship. Are they actually the
same thing? I suspect that they are, and without further information would
treat them as one class.
RelativeMatcher is linked to every class, given that TestResult and
AlleleExpression are the same. (I haven't bothered to put cardinalities in.)
This is worrying: it's starting to look like a
God-object. When we get the details of the algorithm we may be able to
split this up a bit. It's something to keep an eye on.
This diagram would also show any generalisation (inheritance) relationships I
can find. Since I can't find any, I show none.
The cardinalities I have put in are guesses. In this simple example I am fairly
sure of them, but in a larger system I might not be. They can be missed
entirely, but I find them useful. In any case, don't spend a lot of time on
them at this stage.
Another thing to look out for is many-to-many relationships: these can often cause
problems which can be solved with the addition of an 'Association' class. For
example, a contractor can work for many companies, and a company can have many
contractors. Introduce an Association class called Contract which has one
Company and one Contractor. Do this now if you are sure of the cardinalities.
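As a rough sketch of the contractor example, assuming nothing beyond a name on each side: the many-to-many link disappears, and each Contract ties exactly one Company to exactly one Contractor. The helper functions `companies_for` and `contractors_for` are hypothetical names added for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Company:
    name: str


@dataclass(frozen=True)
class Contractor:
    name: str


@dataclass(frozen=True)
class Contract:
    """Association class: exactly one Company and one Contractor."""
    company: Company
    contractor: Contractor


def companies_for(contractor: Contractor, contracts: List[Contract]) -> List[Company]:
    return [c.company for c in contracts if c.contractor == contractor]


def contractors_for(company: Company, contracts: List[Contract]) -> List[Contractor]:
    return [c.contractor for c in contracts if c.company == company]


acme, globex = Company("Acme"), Company("Globex")
ann, bob = Contractor("Ann"), Contractor("Bob")
contracts = [Contract(acme, ann), Contract(globex, ann), Contract(acme, bob)]
```

The Contract class is also the natural home for anything that belongs to the pairing rather than to either side, such as dates or rates, should the requirements ever mention them.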
My Top 10 mistakes
This is similar, though not identical, to a list in
Use Case Driven Object Modeling with UML by Doug Rosenberg and Kendall
Scott. The differences exist because the list is, after all, my mistakes.
10: Don't spend too much time assigning cardinalities to relationships, believe
that those you have assigned are correct, or insist that there must be a guess
for every association
9: Don't do noun and verb analysis so exhaustive you pass out on the way
8: Don't optimise your design for reusability before ensuring you've met the
requirements.
7: Don't debate whether to use aggregation or composition (open or closed
diamonds) for each part-of association
6: Don't use hard-to-understand names for classes. For example,
AlleleExpression is preferred to FiveOrSixCharacterString. Obvious, but very
important. Always use intention-revealing names, even if they're long!
5: Don't go directly to implementation constructs such as namespaces, or public
or internal modifiers.
4: Don't create a one-one relationship between database tables and classes if
one source of information is a legacy database.
3: Don't go for premature-patternisation, building cool constructs which may
have little or no relationship to the eventual solution. For instance, I think
that RelativeMatcher may end up as a Mediator with a Strategy. It's fine for me to think that (geeky daydreams - yes!), but not for me to start
planning it in any way (how many of your dreams do you act on?).
2: Don't get attached to your solution yet.
1: Don't truly believe a pixel of it! It's a snapshot of your thinking based
on not enough information. It's a place to start from which you can find
what really needs to be done (which may include starting from somewhere else entirely,
if you can't get there from here).
Looking at the final diagram below I've not done too badly against these rules. I've got
too many cardinalities, true, but I don't trust them so it's OK really. Honest!
The more important rules I think I've done OK on.
Although I've used a diagramming tool to produce the diagram, I would probably
not do so yet in practice. A sheet of paper is much easier to throw away, and
it's likely that I will throw away several versions of it before I'm happy.
Here's the paper version I did:
Second-Cut: Methods and Attributes
One thing I'll mention here, and won't again, is that I always assign an OID to
each object. This may be a GUID or an int assigned by an SQL Identity column
(I'm using GUID more and more these days). Just assume that every class
has a read-only OID property.
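For what it's worth, here is one way the read-only OID could look, sketched in Python with a GUID. The `Entity` base class and its name are assumptions made for the example; the text above only says that every class has such a property.

```python
import uuid


class Entity:
    """Hypothetical base class giving every object a read-only OID."""

    def __init__(self, oid=None):
        # A fresh GUID unless one is passed in (e.g. when reloading an
        # existing object from wherever it is stored).
        self._oid = oid if oid is not None else uuid.uuid4()

    @property
    def oid(self):
        # No setter is defined, so assigning to .oid raises AttributeError.
        return self._oid


class Animal(Entity):
    def __init__(self, name, oid=None):
        super().__init__(oid)
        self.name = name
```

An int from an identity column would look the same from the outside, except that the value would not exist until the object had been stored.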
Similarly, I'm not assuming anything about a storage mechanism as yet. In fact,
I'm ignoring it. Obviously there must be some way to get data into and out
of storage but unless the specification explicitly states that something must
be stored or loaded and it has 'interesting' properties I'm totally ignoring
that problem. It's an implementation detail and I'm not worried about it for
this iteration of the design.
In the following, TestResult has become AlleleExpression.
The requirements with verbs and possessive nouns highlighted:
The system will record animals, their names and owner's names,
and the results of genetic tests made on animals. These results come
in the form of a 5-character string describing the expression of an
allele for each allele tested; there will be 5 or 6 alleles per animal.
A technician will be able to call up the results for one animal and ask
the system to match its test results, via an algorithm to be supplied,
against other results in the database and display the matching animals
and their test results. The technician will then select from this list
animals which are, by his interpretation of the tests, related to
the original animal. This relationship will be stored for later
enquiries.
Our initial pass gives us
|be able to call up
|to be supplied
At this stage, I'm labelling attributes, properties, and methods as
methods or properties almost arbitrarily. What they actually are
is an implementation artefact. What we're doing is creating the high-level
interface for each class, not a program spec.
|This means we have to store the Animal in some way, and tests etc
|Animal.Save(), GeneticTest.Save(), AlleleExpression.Save()
|Animal to GeneticTest
|Association and method
|Association GeneticTest to AlleleExpression and AlleleExpression.Description
|Allele to GeneticTest or AlleleExpression
|be able to call up
|A GUI Requirement
|Ignore for now
|A GUI requirement
|Ignore for now
|Possibly a method on Animal
|to be supplied
|Irrelevant; just says we don't know details of RelativeMatcher
|But not to schedule - this goes into Risk Analysis
|A GUI Requirement
|Ignore for now
|A GUI Requirement
|Ignore for now
|Out of scope
|This is an action the technician performs, not our system.
|Association of Animal to Animal
|Covered by Animal.Save()
|Could be left out
The associations we have discovered also imply attributes and methods. Here we
must pin down cardinality and navigability. Usually this will involve
discussions with the client. These discussions are fed back into the first cut
diagram as more data is discovered, and classes and associations are clarified.
The model I have so far is:
Things to note about this diagram:
Possibly the most important thing is that it's a simpler design than the first
cut! This may be due to some assumptions I have made, but is certainly affected
by the fact that I have a clearer idea of the design. The second cut isn't
always clearer than the first, but I quite enjoy it when it is.
I am assuming Collection Classes exist, but have not included them explicitly.
They are present as return types for methods.
Animal.Relatives() replaces the self-association, although both are real. If
the diagram wasn't showing the methods I would put the self-relation in. I am
50/50 on including it anyway. At the time of printing the diagram I left it
out. Right now I think it should be in. The criterion for making the decision
is: does it improve communication or is it unnecessary (in this context)?
TestResult has been merged into AlleleExpression
TestResult still lives as a method on GeneticTest
RelativeMatcher is still sketchy. It could be folded up as a method of
Animal, but I assume that it's complicated enough that in the interests of
clarity and maintainability it's better off in a separate class. This may go
against YAGNI but, in this case, I think I am going to need it (and YAGNI
applies to implementation anyway).
Right now, it's not worth any more thought; after all, 'to be supplied' could mean
that the user's going to supply a COM object or something to do it for me!
I have assumed that all associations are navigable both ways: a GeneticTest can
give its AlleleExpressions, and each AlleleExpression can give the GeneticTest
it's part of. The real navigability will come out of the implementation:
will the GUI ever need to navigate that way? I don't really know as yet.
In busier diagrams I would leave most of these navigation methods
out as being unnecessary detail. The ones I would probably leave are
Animal.Test and GeneticTest.TestResults - the rest are probably clutter.
I have decided that on the requirements as they stand there is only one
GeneticTest per Animal, and have firmed up the cardinalities. This means that
GeneticTest could be part of Animal, but a) I'm not sure of this and b) it
seems to me to be sufficiently dissimilar to an Animal as a concept, so I've
left it separate. (Some people insist that all objects/classes must have a
counterpart in the real world, that modelling must match real-world
things. I think this is profoundly wrong for many reasons. However, I still find
myself making this sort of decision, which appears to be based on what I think
of the real world. Go figure.)
I have assumed that an Allele has a name so that the technician entering the
test results will have some way of identifying which AlleleExpression is which.
Depending on the matching algorithm this may not be necessary. If it's not
necessary then we probably don't need the Allele class.
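Pulling these notes together, the second cut might be sketched as the Python skeleton below. Treat it as a snapshot under assumptions, not a program spec: storage (the Save methods) is left out, member names follow Python conventions, not the diagram's, the isRelatedTo link is assumed symmetric, the 5-or-6 alleles rule is not enforced, and the `MatchAlgorithm` signature is a guess at what the algorithm "to be supplied" might look like.

```python
from typing import Callable, List, Optional


class Allele:
    def __init__(self, name: str) -> None:
        self.name = name  # lets the technician tell the expressions apart


class AlleleExpression:
    def __init__(self, allele: Allele, description: str) -> None:
        if len(description) != 5:
            raise ValueError("expression must be a 5-character string")
        self.allele = allele
        self.description = description
        self.genetic_test: Optional["GeneticTest"] = None  # set by add_result


class GeneticTest:
    def __init__(self, animal: "Animal") -> None:
        self.animal = animal
        self._results: List[AlleleExpression] = []
        animal.test = self  # one GeneticTest per Animal

    def add_result(self, expression: AlleleExpression) -> None:
        expression.genetic_test = self  # keep the link navigable both ways
        self._results.append(expression)

    def test_results(self) -> List[AlleleExpression]:
        return list(self._results)


class Animal:
    def __init__(self, name: str, owner_name: str) -> None:
        self.name = name
        self.owner_name = owner_name
        self.test: Optional[GeneticTest] = None
        self._relatives: List["Animal"] = []

    def add_relative(self, other: "Animal") -> None:
        # Assumed symmetric: recording one side records the other.
        if other is not self and other not in self._relatives:
            self._relatives.append(other)
            other._relatives.append(self)

    def relatives(self) -> List["Animal"]:
        return list(self._relatives)


# Assumption: the supplied algorithm compares two tests and answers yes or no.
MatchAlgorithm = Callable[[GeneticTest, GeneticTest], bool]


class RelativeMatcher:
    def __init__(self, algorithm: MatchAlgorithm) -> None:
        self._algorithm = algorithm

    def matches(self, original: Animal, others: List[Animal]) -> List[Animal]:
        """Return the animals whose test matches the original's test."""
        if original.test is None:
            return []
        return [
            other for other in others
            if other is not original
            and other.test is not None
            and self._algorithm(original.test, other.test)
        ]
```

Passing the algorithm in as a plain callable is only a placeholder for "to be supplied", not a commitment to any pattern. The technician still makes the final choice, so the matcher only proposes animals, and `add_relative` is called for the ones actually selected.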
As things stand, I have a design without any implementation details, but one
that I can actually use to start implementing. To finish it off, I would list
the forms or Web Pages I think I need - there are three:
Animal Entry, including GeneticTest and Results
Animal Relationships - match and allow the user to select.
Enquiry - to show an Animal's relationships.
These may be cut up in different ways - they could all be in one screen, but for
me it would be too busy and complicated. Anyway, this is what I'm going to
propose to the user.
Now it's time to have some discussions with the user to confirm my assumptions
and suggestions - or otherwise.