In the previous installment, I covered some of the inconvenient aspects of legacy FORTRAN composition and output, explained the origin of what we now consider misfeatures, and presented a case for changing Castlequest to play more comfortably on modern systems.
Apologies
Before I get into today's topic, I have to apologize for the delay in writing. The Castlequest story has been stagnant, but not for lack of interest or motivation. Originally I hoped to discuss what we might mean by legacy code and show qualitative views that may help us estimate the "legacyness" of a project.
This rapidly devolved into an extended tool-building effort which has gone on far longer than I had intended. As tool construction dragged on, it has been difficult to show even interim results because showing a single indirect representation of the code or project seems to overemphasize the significance of the results or the importance of a single tool.
It's ironic that this article has taken so long to produce because the whole intent of using qualitative graphical views is to make initial assessment faster and easier.
Let's Talk About Code Shape
One method of assessing code "legacyness" I have been exploring is visualizing code "shape". You can often tell the vintage of old Fortran source code just by looking at the arrangement of text on the page, without reading the code itself.
As an example, FORTRAN 66 is very "flat", monolithic, and linear because the language lacks the block constructs introduced in FORTRAN 77. "Structured" programming didn't arrive until the late 60s or early 70s. Also, F66 dates to the era of punchcards and batch computing on relatively limited hardware. The cost of whitespace, structure, routine calls, etc. was much higher in that era. The shape of F66 code reflects limitations of the language, hardware, and development paradigms available at the time.
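As a quick illustration (a made-up fragment, not Castlequest code), here is a small routine in a typical F66 shape: flat, unindented, upper case, with a labeled terminator carrying the loop structure:

```fortran
C     SUM THE POSITIVE ELEMENTS OF X -- TYPICAL FLAT F66 SHAPE
      SUBROUTINE POSSUM(X, N, S)
      DIMENSION X(N)
      S = 0.0
      DO 10 I = 1, N
      IF (X(I) .LE. 0.0) GO TO 10
      S = S + X(I)
   10 CONTINUE
      RETURN
      END
```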
In contrast, FORTRAN 77 inherited much of ALGOL's block structure (IF-THEN-ELSE-ENDIF; most compilers of the era also offered DO-ENDDO as an extension, though it wasn't standardized until Fortran 90). It was released well into the era of interactive computing where developers had immediate access to a command shell, editor, and compiler. The cost of whitespace, routine calls, structure, etc. had dropped and resources for large programs were more widely available. The problems with monolithic and unstructured code were well known by this point. Developers knew decomposition was necessary to build and maintain large systems. The shape of F77 code again reflects the state of the language, hardware, and development paradigms. F77 source code is much more likely to be indented, decomposed into subroutines and functions, and generally much more readable than F66.
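Here is the same made-up fragment in a typical F77 shape: the logic is carried by an indented block IF, and the only label left is the loop terminator:

```fortran
C     SAME FRAGMENT, F77 STYLE: BLOCK IF AND CONSISTENT INDENTATION
      SUBROUTINE POSSUM(X, N, S)
      REAL X(N)
      S = 0.0
      DO 10 I = 1, N
         IF (X(I) .GT. 0.0) THEN
            S = S + X(I)
         END IF
   10 CONTINUE
      RETURN
      END
```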
Fortran 90 introduced free-form source format, doing away with the fixed columnar format dating back to the language's introduction in the 1950s. The form of comments and line continuations changed substantially, and new block constructs, modules, and user-defined (derived) data types were added to the language; full object-oriented features such as classes arrived later, in Fortran 2003. The six-character limit on variable and routine names was lifted and lower-case names were supported as standard. These advances mean that modern Fortran can have a much less distinct shape than its predecessors; from a distance, modern Fortran can be indistinguishable from C or C++.
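And once more in free-form Fortran 90 (again, an illustrative fragment): no column rules, no labels, longer lower-case names. Squint and it could be C:

```fortran
! Same fragment in free-form Fortran 90.
subroutine positive_sum(x, s)
   real, intent(in)  :: x(:)   ! assumed-shape dummy array
   real, intent(out) :: s
   integer :: i

   s = 0.0
   do i = 1, size(x)
      if (x(i) > 0.0) s = s + x(i)
   end do
end subroutine positive_sum
```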
For better or worse, Fortran has valued backward compatibility from its earliest standards, so it's possible to write F77 code in F66 style, F90 code in F77 style, or to ignore modern constructs entirely and write F66-style code in F90's free-form format. Because of this backward compatibility, the presence or absence of specific statements or constructs may not be enough to assess a code's "legacyness". Code shape may be a better indicator of "legacyness" than file extension.
While it's an imperfect measure, the shape of the source can tell us much about the condition of a Fortran codebase. Compare that to Python, where the shape of modern and legacy source code is virtually identical, making "legacyness" difficult to detect without detailed analysis, either by automatic parsing or manual inspection.
The Problem With Parsing
Even though parsing can be automated, it is still computationally intensive compared to generating a bitmap image from a text file. Legacy code parsers also need to cope with a variety of proprietary and obsolete constructs, whereas an image generator just needs the ability to read text and produce a bitmap. This makes a code shape visualizer much easier to write than a special-purpose legacy code parser or a utility to automate compilation and sift through errors and diagnostics. Special-purpose parsing is valuable but, as I have found out, it's also complex and time-consuming. Identifying and visualizing characteristics of legacy code is not exactly straightforward, and it's not a simple matter to repurpose a development tool for custom static analysis.
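To make the "easier to write" claim concrete, here is a minimal sketch of the idea (not the tool I actually wrote, and with made-up file names and colors), written, fittingly, in free-form Fortran: it reads a fixed-form source file and writes a plain-text PPM image with one pixel per character cell, classified crudely by column position.

```fortran
! A minimal sketch of a code shape renderer (not the actual tool):
! each character cell of a fixed-form file becomes one pixel in a
! plain-text PPM image.  File names and colors are placeholders.
program code_shape
   implicit none
   integer, parameter :: width = 80
   character(len=width) :: line
   character(len=width), allocatable :: lines(:)
   integer :: ios, n, i, j, r, g, b

   ! Slurp the source file, padding or truncating each line to 80 columns.
   allocate(lines(0))
   open(unit=10, file='ADSCOR.F', status='old', action='read')
   do
      read(10, '(A)', iostat=ios) line
      if (ios /= 0) exit
      lines = [lines, line]          ! automatic reallocation (Fortran 2003)
   end do
   close(10)
   n = size(lines)

   ! One pixel per character cell, colored by a crude classification.
   open(unit=20, file='shape.ppm', status='replace', action='write')
   write(20, '(A)') 'P3'
   write(20, '(I0,1X,I0)') width, n
   write(20, '(A)') '255'
   do i = 1, n
      do j = 1, width
         if (lines(i)(1:1) == 'C' .or. lines(i)(1:1) == 'c' .or.            &
             lines(i)(1:1) == '*') then
            r = 200; g = 0;   b = 0      ! comment line: red
         else if (lines(i)(j:j) == ' ') then
            r = 235; g = 235; b = 235    ! whitespace: very light gray
         else if (j <= 5) then
            r = 144; g = 238; b = 144    ! statement label: light green
         else if (j == 6) then
            r = 100; g = 149; b = 237    ! continuation: cornflower blue
         else
            r = 0;   g = 0;   b = 0      ! code: black
         end if
         write(20, '(3(I4))') r, g, b
      end do
   end do
   close(20)
end program code_shape
```

Even this naive, column-based classifier reproduces the broad strokes; a real tool also has to cope with tabs, inline comments, and free-form source, but none of that comes close to the complexity of a parser.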
Another problem with repurposing a compiler for legacy code detection is that the source code may be so obtuse or non-standard that it does not compile with modern tools. A code may be of interest specifically because it does not easily compile and the owner wants to estimate the effort needed to return the code to a buildable, maintainable state.
The Benefits of Shape Analysis
Intentionally reducing the fidelity of the source code view keeps the eye focused on the code shape without the risk of staring too deeply into details. There's a limit to what code shape will tell you about a project but that's a good thing.
A quick assessment should be quick. It's too easy to get bogged down looking at specific code constructs and thinking through how they might be improved or eliminated instead of doing a quick high-level inspection.
Fire!
It's time for another tortured analogy to justify rapid low fidelity assessment.
Imagine there's a fire reported at an apartment complex. Firefighters arrive and the first thing they do is survey the fireground. They look very generally at what's on fire (a dumpster, a single unit, one building, multiple buildings, etc.), how intense or widespread the fire appears, and sort out where residents and employees are likely to be. They do this quickly.
Once they estimate the totality of the fire and the characteristics of the fireground, they'll formulate a plan of attack and get to work. Eventually they will check unit by unit, floor by floor, and building by building to evacuate people. Initially however they need a rough idea of what they're up against and will spend the time to gather enough information to make a plan.
Refactoring software does not have the same urgency and risk as firefighting. On the other hand, firefighters don't have to negotiate with the apartment complex owner, mayor, and city council over how many firefighters, trucks, hoses, gallons of water, and time they will be allowed to use to put out the fire.
In our case, gathering enough information to make a plan takes effort, effort translates to time which translates to money, and clients and managers want to see a plan and cost estimate before they risk any money.
Firefighters trade personal safety for freedom of action and occasionally a sweet mustache. They are very judicious about what risks they'll accept but they also have some very direct priorities: preserve life (theirs, the public's) and property in that order. If they need to break down a door or cut a hole in the roof, they just do it. No all-day meetings with managers and stakeholders. Door, meet halligan...
In both cases, you need enough information to make a plan and you need to gather that information quickly. That plan will be created from incomplete information but that's okay. It only needs to be good enough to address big-picture risks and estimate the resources required to finish the job. You won't know what you're actually up against until you're in the middle of it. You always risk facing a situation substantially worse than your initial assessment. With practice and experience you'll learn to "read" the fireground faster and more accurately, make better plans, and reduce avoidable risks.
Nice analogy, but what does this have to do with Castlequest? Not much. Let's look at some pretty pictures.
The Shape of ADSCOR
This is ADSCOR.F, the subroutine that totals the player's score and returns it via the argument II:
C----------------------------------------------------
      SUBROUTINE ADSCOR(II)
      INTEGER SAVAR(400)
      LOGICAL DEBUG
      COMMON DEBUG, ISEED
      COMMON /BLOCK2/ SAVAR
      II = 0
        IF (SAVAR( 4) .EQ. 72) II=II+9
        IF (SAVAR( 7) .EQ. 72) II=II+10
        IF (SAVAR(12) .EQ. 72) II=II+10
        IF (SAVAR(17) .EQ. 72) II=II+10
        IF (SAVAR(18) .EQ. 81) II=II+1
        IF (SAVAR(19) .EQ. 72) II=II+10
        IF (SAVAR(23) .EQ. 72) II=II+10
        IF (SAVAR(24) .EQ. 72) II=II+10
        IF (SAVAR(28) .EQ. 72) II=II+10
        IF (SAVAR(29) .EQ. 72) II=II+10
        IF (SAVAR(30) .EQ. 72) II=II+10
C Score is dependent on the number of moves.
      ITEMP = SAVAR(90) - 250
      IF (ITEMP .GE. 0) II = II - (ITEMP/5)
      IF (DEBUG) WRITE(6,8001) II
 8001 FORMAT('0 RESULT OF "ADSCOR" IS:', I3)
      RETURN
      END
Try to ignore the literal text and just focus on its shape. It's difficult. We are familiar with text, we expect it to have meaning, and we are naturally predisposed to look for patterns. We can't really help it. If we want to look at the shape, we need to obscure the text:
A note on color: black is source code, red is a comment, light green is a line label (number), spaces are very light gray, and the page background is a very light cream color to very slightly differentiate space characters from empty space (a lack of characters).
What stands out?
- There are very few comments and no blank lines
- The first few columns contain either blanks, a label, or a comment
- Every comment starts in the first column
- Every line of code is indented by a (mostly) consistent number of spaces with only one further level of indenting
- There is unique content at the top and bottom of the file but there are a number of very similar lines in the middle
- This block of similar lines is the only section of the code which is indented
- Only one line near the end of the file contains a statement label
We can immediately tell this source code is fixed-format, probably F77 or F66. It's rather short, probably a single routine. There's some indenting but nothing that would indicate more than one level of nesting.
In Fortran, line labels specify a GOTO target, the terminating statement of a loop, or the reference number of a non-executable FORMAT statement. If this is F66, it probably contains at most one loop; multiple loops may share a single labeled statement as their termination. What we do know is that the label is not on a simple CONTINUE statement, the typical loop terminator, because the labeled line is far too long for a bare CONTINUE. It has the right shape for a FORMAT statement, but we can't infer much more than that.
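For reference, here are the two shapes in question (illustrative lines, not taken from this file); a bare labeled CONTINUE is conspicuously short next to a typical labeled FORMAT:

```fortran
   10 CONTINUE
 9001 FORMAT(' A LABELED FORMAT STATEMENT IS USUALLY MUCH LONGER', I3)
```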
The uniform section in the middle of the file is possibly a table of data or a block of very similar I/O statements. Looking closer, it's probably not I/O: each line starts with a two-character word, much too short to contain READ or WRITE. It's long enough to contain a variable name or the keywords IF or DO. We can rule out DO since there's only one labeled statement to act as a loop terminator and we can't identify a corresponding number of lines that could contain END DO or ENDDO statements. I believe the only statement which can begin with a variable name is an assignment such as A = B + C, so this block is most likely a stretch of conditional statements or assignments.
This seems like a simple linear routine with no obvious signs of convoluted GOTO logic or deeply nested loops. That really restricts the type of painful constructs that might appear, and the routine is also short, which limits how many of them there can be.
Elementary?
There's something a little too just so about this analysis. It feels like a Sherlock Holmes tale with too many "obvious" inferences and details. We looked at the source code first, so how many of those brilliant inferences are just post hoc rationalizations?
Further, is this sort of detailed inference any faster or better than just looking at the source code? Did the shape tell us anything we couldn't discover just by skimming the source text or using simple line count and pattern matching utilities? Skepticism is definitely warranted. Keeping this criticism in mind, we can move on to a more complex example.
Differential Analysis
Let's look at the shape of several longer routines. DES.F, HELP.F, INIT.F, and MOVE.F are all about the same length (120-130 lines), about five times the length of ADSCOR.F. We'll reduce the magnification so the images fit better on the screen. This has the added benefit of making it more difficult to cheat by counting pixels.
A new color has appeared: cornflower blue indicates continuation characters.
Instead of analyzing each image individually, identifying and explaining interesting features, let's take a more open-ended approach. Compare the set of images and think through the following questions:
- How many routines are there in each file?
- Which routines are most likely to have complex logic, for example loops, conditionals, and GOTOs?
- Which produce the most diverse output? Hint: look at how the FORMAT statement is represented in ADSCOR.F, both as source code and as a bitmap.
- Which routines contain the most static data?
- What is the balance between data and code in each routine?
- Can you tell the difference between comments that provide documentation versus those that disable source code?
- Can you see groups of similar statements? For example, can you identify large blocks of assignment or FORMAT statements?
- Do the routines look as if they were written by the same person, many different people, or a team using a consistent code style?
- Which routines seem most interesting? If you could look at only one routine in detail, which would you choose?
Now review the list of questions and ask how the answers color your view of how "legacy" a project is and what sort of refactorings may be needed for each routine. This is clearly procedural code; if you were converting it to object-oriented code, which routines suggest attributes and which suggest methods?
Finally, consider how long it took to answer these questions for each file. The Castlequest source code is distributed among 16 files; how long would it take to assess the whole codebase by viewing bitmaps versus skimming source files in an IDE or text editor? By comparison, the NASTRAN-95 project consists of almost 1900 separate source files distributed among 10 directories - would the graphical or the textual approach be more effective on a project of that size? Would either approach be effective at that scale?
Aside: Copypasta
Compare the bitmaps of HELP.F and INIT.F: did you notice the large block of code that appears to be duplicated in both files?
Your Own Half-Acre of Misery
Part of my motivation for revitalizing Castlequest was to discover new and useful questions to ask about legacy code and find creative ways of getting answers. You probably aren't refactoring legacy FORTRAN or complex text adventure games, so what good is this in general?
I believe there's value in stepping back from your code and asking these questions. It took a few hours over a day or two to write a tool to create very simple colorized bitmaps from Fortran source code. What do you think you'd learn about your own projects with this sort of analysis? Does your code or language have an identifiable shape? For as well as you know your language, what details can you identify just by looking at the shape of your source code?
Tomb of Horrors
Let's look at one last bitmap: MAIN.F, the heart and brains of Castlequest. The project contains roughly 3000 lines of source code and comments in total, and two-thirds of that is concentrated in MAIN.F. The shape of the code mostly speaks for itself:
Due to the color scheme and image scale it's difficult to see, but there are a whole lot of statement labels scattered throughout this routine. There are many loops and at least 650 GOTO statements in this single routine.
Every one of those GOTO statements was refactored away. Eliminating the last 10 required a great deal of effort and resulted in some architectural compromises that I'm still not comfortable with. Rather than being an exercise in diminishing returns, it highlighted where deeper object-oriented analysis and reorganization is needed. But that's a story for another day.
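For a flavor of what that refactoring looks like (an illustrative sketch with made-up helper routines GETCMD and DOCMD, not Castlequest's actual code), a label-and-GOTO loop like the first fragment below can usually be restated with the structured constructs available since Fortran 90:

```fortran
C     Before: control flow carried by labels and GO TOs.
  100 CONTINUE
      CALL GETCMD(CMD)
      IF (CMD .EQ. 'QUIT') GO TO 200
      CALL DOCMD(CMD)
      GO TO 100
  200 CONTINUE

C     After: the same loop as an unbounded DO with EXIT; no labels needed.
      DO
         CALL GETCMD(CMD)
         IF (CMD .EQ. 'QUIT') EXIT
         CALL DOCMD(CMD)
      END DO
```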
In the next installment in the Castlequest series I'll map the flow of control in MAIN.F. There's much we can learn by qualitatively visualizing our code's logical complexity.
References and Credits
My limited understanding of practical firefighting comes from "Collapse of Burning Buildings: A Guide to Fireground Safety", 2nd edition, by Deputy Chief Vincent Dunn, FDNY (ret.). It's a fascinating book with interesting criticisms of fire science as well as modern building construction techniques. For as often as software developers use the analogy of firefighting, the reality is that a fire chief who regularly orders crews into hopeless conflagrations with inadequate planning and support and no regard for their well-being is unlikely to keep their job. In the software industry, that level of pathological behavior often gets one promoted.
Halligan bar photo by Marcel Rogge - Own work, CC BY-SA 3.0