First, the bottom line. For the same algorithm, the fortran version took 66 seconds to run, while the python version took 33,900 seconds (9 1/2 hours) to run. Fortran is a factor of 500 faster. Ooo! Buuuurn!

Here’s some context. Fortran is a grand old language built for math and science applications. It has been updated through the decades, adding dynamic memory management, tools for parallel computing, and conveniences for the programmer. Python has stolen the hearts of many astronomers and other scientists for its graphical capability and price (free) and general ease of use. The most glaring difference between the two is that python is a scripting language whereas fortran must be compiled. So, to run a python program, you simply run it. For fortran, you have to compile it, then run it.

The wisdom I hear in the halls is that python can be slow compared to C++ or fortran. This statement is often followed by a disclaimer that if you have a pile of python code, you can compile it into a new package, and it gets fast. And, python fans are quick to point out, programming in python is super easy. All three of these statements are tip-of-the-iceberg claims, about which much more could be said.

The science problem I am tackling is to find the age structure of galaxies by examining their integrated starlight. Stars have predictable evolution, so the sum of that starlight should contain fossil relics of the galaxies’ histories. So I was testing a simple version that. Ultimately, I will analyze actual galaxy spectra, but for now I am using a model galaxy spectrum. I throw trial solutions (composed of simpler units of a single age galaxy) at the problem, compare them, reject the ones that don’t match and proceed until I narrow down to the optimum answer.

I tried a “particle swarm optimizer” but it seldom converged to the right answer.

So I tried a brute force approach. The programs I am comparing search a 3-dimensional space: two ages (of theoretical “galaxies”) and the mass ratio between them. Age 1, Age 2, and ratio – three things. Ages can run from 11 million years to 17782 million (18 billion) years. These limits are just built into my model ingredients – they have no profound meaning except to illustrate that the “search space” is pretty big. The brute force optimizer chops up the search space into a number of voxels, finds the best-fitting trial solution from amongst those, then modestly narrows the search space. Then, it loops. Chop-compare-narrow, over and over, until suitable accuracy is reached.

And one more wrinkle: I need to assess the error associated with the two-age solutions, so I put the preceding algorithm into a Monte Carlo loop. That’s fancy science (not geography) talk for a scheme where I apply noise to the target “galaxy” measurements, then seek another solution. That solution will be a tad different. Then I re-randomize and solve again. At the end, I have 50 slightly whacked solutions, but the cloud of solutions gives me the expected error of my solutions, which is extremely useful. That “50” number is arbitrary. I wanted more, but 50 solutions burned 9.5 hours of my life, so I stopped there.

As for the code, “apply noise,” above, implies I need a random number generator. I also need an “interpolator” since the models are tabulated at certain discrete ages, but age values can be any real number. In the case of python, convenient functions have been written:

```
# For my python code, I imported numpy and some other packages
import numpy as np
from scipy.stats import norm # random numbers
from scipy.stats import uniform # random numbers
from scipy.interpolate import interp1d
# For my fortran code, I used a couple of subroutines from "randgen.f" by Richard Chandler
# (for random numbers) and a wee little subroutine called "locate" from Numerical Recipes
# (Press et al., probably first edition) that searches an ordered list.
```

Fortran is much older than python, and therefore less hip, but it has excellent subroutine libraries, even if they are harder to find, and you have to actually paste the code into files rather than typing an import statement. In any event, as I was waiting for the python code to finish its 50 iterations, I rewrote the code in fortran. Writing a code twice is something nobody really would do willingly, but in this case, I did it. And the speed result (one minute versus 9.5 hours) is astonishing. Fortran is 514 times faster in this case.

Well, I haven’t shared my code (I would, if you are interested, just shoot me an email via the contact page) but the number of random number calls are small, so it can’t be that. So it’s a combination of the various convenience routines I used in python. I used numpy.sum() and plenty of other numpy shortcuts, such as array math:

```
# Suppose you had arrays X and Y that are two-dimensional, 100 by 100.
# In python, with numpy, you multiply them like this:
Z = X * Y
# And the result is Z, another 100 by 100 array, where each element in it
# is the multiplication of the corresponding elements in X and Y
C In fortran, that shortcut is not available by default, so you have to,
C firstly, reserve memory space for everything
integer nx,ny
parameter (nx=100,ny=100)
double precision X(nx,ny),Y(nx,ny),Z(nx,ny)
C and then construct something like this
do i = 1, 100
do j = 1,100
Z(i,j) = X(i,j)*Y(i,j)
end do
end do
C so at first glance, fortran code seems to take more typing
```

I also used interpolate1d that is part of the scipy package. As regards the number of lines of code, you should expect fortran to be a bit more wordy, and it is (but not by a factor of 500!).

Language | # lines I wrote | #lines total | Execution time |

python | 185 | unknown | 33900 sec |

fortran | 275 | 550 | 66 sec |

The total number of lines of code would include subroutines that I did not write. Maybe somebody can tell me how many lines of code are in the scipy.interpolate1d package and, gulp, numpy, which has to be huge.

This particular calculation was not graphics-heavy, but I generate graphs constantly. Python is absolutely fabulous for that (matplotlib). A lot of problems really don’t stress the CPU, so who cares if it takes 1 millisecond or 500 milliseconds? In either case, there’s insufficient time to take a sip of coffee before your calculation is done.

However, I did learn a lesson. I was thinking of rewriting my main population synthesis code in python. This no longer seems like a good idea. Maybe when Moore’s law ramps up another factor of 500.

Perhaps you could try accelerating the Python code with Numba (A JIT compiler using LLVM). They argue they have similar times as provided by Fortran. See the overview PDF from a 2018 presentation at CERN (https://indico.cern.ch/event/709711/). I am preparing some content for my students and suddenly google pointed me to your website, so I thought it would be nice to give you some feedback. BTW, provide the code would be awesome (from the reproducibility POV).

LikeLike