My physics analysis code runs in three modes: for 1996 data, for 1997 data, and for both years' data. The third mode also runs over different values of the cuts, to build a set of results that can be used to estimate systematic errors in the analysis. For example, changing the value of the cut on calorimeter energy can provide an indication of how well it is modeled.
To run all these checks, a number of "steering card" files were created, each one listing the cuts to be made for a single batch run. The analysis then runs over the entire set of '96/'97 data for each batch in turn and builds a list of result ntuples.
Now, while running '96 and '97 individually was successful, when running both the process crashed on the second iteration with the following:
ANALYSE: File contains 14614 Events
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0xb5aa8e63
#1 0xb5aa7f8e
#2 0xb7fbabbf
#3 0xb6f7fa9f
#4 0x4a0c95
#5 0x4b13c7
#6 0x4ba4bf
#7 0x4ad0c4
#8 0x4ad5ba
#9 0xb57d0f20
#10 0x494ec0
Segmentation fault (core dumped)
The dreaded Fortran segmentation fault!
In years gone by, I would have taken the approach of writing loads of write(*,*) statements around the "ANALYSE: File contains..." log and eventually homed in on the offending code, but surely I can do better now!
It turns out I can. Recompiling with the debug flags -fcheck=all -g and rerunning gives:
../data/mc96/mc96_1.rz
ANALYSE: File contains 14614 Events
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0xb5a8de27 in ???
#1 0xb5a8cf8e in ???
#2 0xb7f9fbbf in ???
#3 0xb6f64a9f in ???
#4 0x44634d in isr_cuts96_
    at ./src/isr_cuts96.fpp:408
#5 0x457df8 in anevent96_
    at ./src/anevent96.fpp:147
#6 0x46282a in do_all_
    at ./src/do_all.fpp:131
#7 0x453a0a in MAIN__
    at ./src/analysis.fpp:44
#8 0x453f2a in main
    at ./src/analysis.fpp:66
Segmentation fault (core dumped)
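This is much more useful: the backtrace now names the routine and the source line for each frame.
(-fcheck=all adds runtime checks for array bounds, amongst other checks; -g adds the debugging information that lets the backtrace show routine names and line numbers, and that the gdb debugger also uses.)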
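To see what the bounds check does in isolation, here is a minimal sketch. It is a made-up toy program, not part of my analysis code: the program name, the array and the deliberately wrong loop limit are all hypothetical.

```fortran
! Hypothetical demo, not part of the analysis code.
! Build with:  gfortran -g -fcheck=all bounds_demo.f90 -o bounds_demo
program bounds_demo
  implicit none
  real    :: energies(5)
  integer :: i, n

  energies = 0.0
  n = 6                        ! deliberately one past the end of the array
  do i = 1, n
     energies(i) = real(i)     ! i = 6 writes outside the array
  end do
  print *, 'sum = ', sum(energies)
end program bounds_demo
```

Built with -fcheck=all, this should stop at the bad assignment with a runtime error that names the array and the source line. Built without it, the out-of-bounds write may go unnoticed or corrupt something else entirely, which is what makes this kind of bug so hard to find with print statements.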
The line in isr_cuts96.fpp is:
call hf1(510,temp,1.)
This is filling a histogram with data. The fact it's failing on the second iteration of the batch suggests the histogram isn't being correctly cleared down after the first run.
The module I have to do the clearing down is called terminate.fpp.
A review of the HBOOK User Guide pointed me to this call:
CALL HDELET (ID)
Action: Deletes a histogram and releases the corresponding storage space.
Input parameter description:
ID    Identifier of a histogram. ID=0 deletes all existing histograms.
Adding the following call to terminate.fpp fixed the problem, and I am now able to run multiple batches in the same process:
call hdelet(0)
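For context, the book / fill / delete cycle per batch looks roughly like the sketch below. This is not my actual code: the program name, the loop counts, the histogram title and binning, and the size of the /PAWC/ common block are all made up for illustration. Only the HBOOK calls themselves (HLIMIT, HBOOK1, HF1, HDELET) and the histogram ID 510 are real, and it needs to be linked against CERNLIB's packlib to run.

```fortran
! Sketch only: needs CERNLIB (packlib) to link.
! Loop counts, title, binning and NWPAWC are hypothetical values.
program batch_sketch
  implicit none
  integer, parameter :: nwpawc = 100000
  real    :: h(nwpawc)
  common /pawc/ h              ! HBOOK working storage
  integer :: ibatch, iev
  real    :: temp

  call hlimit(nwpawc)          ! initialise HBOOK once per process

  do ibatch = 1, 2             ! one pass per steering card
     ! book the histogram afresh for this batch
     call hbook1(510, 'Calorimeter energy', 100, 0., 50., 0.)

     do iev = 1, 1000          ! stand-in for the event loop
        temp = 0.05 * real(iev)
        call hf1(510, temp, 1.)
     end do

     ! ... write out the results for this batch here ...

     call hdelet(0)            ! ID = 0: delete all histograms, free storage
  end do
end program batch_sketch
```

The point is simply that every batch starts from an empty HBOOK memory, so nothing booked in the first iteration is still lying around when the second one begins.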
Why this ran OK back in 2000 is probably down to a combination of the older CernLib library and Fortran compiler.