My advice on computational speed

spacedout · Post by **spacedout** » 26 Dec 2023, 18:40

If one is interested in reducing execution time by a factor of 100 (as I observed from the output of CheckTimer), then one should avoid routines that reference the SIF, for example

ListInitElementKeyword
ListGetElementRealVec
ListGetElementReal
GetReal

and instead, only use pure Fortran code when assigning the values of all coefficients related to the assembly of the equation Ax = b to be solved.

Season's greetings to all

raback · Post by **raback** » 27 Dec 2023, 13:35

Hi spacedout,

There is always a balance between generality and speed. Indeed, if you want extreme speed then hard-coding everything is fastest.

Of the above, the "Init" routine should only be called once at the start of the assembly. The ListGetElementReal routines are not really that slow when you have constants, or things depending on global variables etc. However, if you have a functional dependency then they may be slow. There are four evaluation options: MATC is by far the slowest, followed by LUA, xy-table, and a user-defined function in Fortran.

If you recommend hard-coding, are you comparing to MATC, LUA, xy-table, or UDF? You should compare hard-coded Fortran to UDF Fortran for a fair game.

There are several tests called "KeywordHandleTimer*" that illuminate the issue. The idea with the handles was to allow cheat codes for easy evaluation (up to ten times faster than ListGetReal) and also to allow transparent dependence on variables located on nodes, integration points and elements.

-Peter

spacedout · Post by **spacedout** » 28 Dec 2023, 20:27

I should have been more specific about what my CheckTimer results were. The bulk assembly of the equation Ax = b took 0.08 s of CPU time when I calculated its coefficients from some fairly complicated plasma physics equations, with the corresponding Fortran code inserted inside a variant of ModelPDEevol. On the other hand, if I instead read constants from the SIF for that solver to compute the coefficients of the same Ax = b equation, for example

Material 1
......

Diffusion Coefficient 11 = Real 1.0

Convection Velocity 11 = Real 1.0

Convection Coefficient 1 = Real 1.0

Reaction Coefficient 11 = Real 1.0

Time Derivative Coefficient 1 = Real 1.0

....
End

.......

Body Force 1
....

Field Source 1 = Real 1.0
...

End

then CheckTimer yields 0.9 s.

Therefore it took more than 10 times as long for computations that are just setting coefficients to a constant!

And it worsens to 5 s when you replace

Mycoefficient = Real 1.0

by

Mycoefficient = Variable MyVar
Real MATC "tx(0)"

kevinarden · Post by **kevinarden** » 28 Dec 2023, 21:08

Most computer performance bottlenecks are I/O to storage devices. If you have to read or write data through I/O, performance will take a major hit. It is always better to work in RAM if possible. Just the act of having to read a SIF file is slow compared to other operations in RAM. If you need that kind of performance and still need to read/write a file, you could consider a RAM disk.

raback · Post by **raback** » 29 Dec 2023, 00:42

Hi spacedout,

Are you sure you initialize the handles only once? From my measurements, fetching constants using the handles adds very little time, since there is a very early exit after one or two logical checks. Compared to the time used for fetching the basis functions for the elements and generating the bilinear forms, it is probably <1% of the time.
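To illustrate the "initialize once, early exit for constants" pattern, here is a minimal Python sketch. It is not Elmer's implementation: `KeywordHandle`, `get_by_name` and the `material` dictionary are hypothetical names made up for this illustration, and the dictionary only stands in for a SIF section.

```python
# Hypothetical sketch: none of these names are Elmer APIs.
class KeywordHandle:
    """Resolve a keyword once, then serve values without further name lookups."""

    def __init__(self, section, name):
        entry = section[name]                   # string-keyed lookup, done once
        self.is_constant = not callable(entry)
        self.entry = entry

    def value(self, x):
        if self.is_constant:                    # early exit for constants
            return self.entry
        return self.entry(x)                    # functional dependency


def get_by_name(section, name, x):
    """Naive alternative: repeat the name lookup on every call."""
    entry = section[name]
    return entry(x) if callable(entry) else entry


material = {
    "Diffusion Coefficient": 1.0,               # constant
    "Reaction Coefficient": lambda x: 2.0 * x,  # depends on a field value
}

# Handles are created once, before the "assembly loop".
diff_h = KeywordHandle(material, "Diffusion Coefficient")
react_h = KeywordHandle(material, "Reaction Coefficient")

points = [0.1 * i for i in range(5)]
from_handles = [(diff_h.value(x), react_h.value(x)) for x in points]
from_names = [
    (get_by_name(material, "Diffusion Coefficient", x),
     get_by_name(material, "Reaction Coefficient", x))
    for x in points
]
```

Both paths return the same values; the handle version just moves the lookup and the constant/function decision out of the loop.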

It is also worth noting that the evaluations are done separately for each integration point. You would save a lot of time by taking the evaluation of the coefficients outside the integration loop, but that is not the same thing.
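A small Python sketch of that trade-off, assuming a 1D mesh on [0, 1] and a 2-point Gauss rule (the function `integrate` and its arguments are hypothetical, chosen only for this example). Evaluating the coefficient once per element halves the number of calls here, but the result only matches when the coefficient is constant within each element.

```python
import math

GAUSS_XI = (-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0))  # 2-point rule on [-1, 1]


def integrate(coeff, n_elem, hoist):
    """Integrate coeff(x) over [0, 1]; return (integral, number of coeff calls)."""
    h = 1.0 / n_elem
    total, calls = 0.0, 0
    for e in range(n_elem):
        mid = (e + 0.5) * h
        if hoist:
            c_elem = coeff(mid)                # one evaluation per element
            calls += 1
        for xi in GAUSS_XI:
            if hoist:
                c = c_elem
            else:
                c = coeff(mid + 0.5 * h * xi)  # one evaluation per integration point
                calls += 1
            total += 0.5 * h * c               # Gauss weights are 1; 0.5*h maps to the element
    return total, calls


per_point = integrate(lambda x: x * x, n_elem=4, hoist=False)
hoisted = integrate(lambda x: x * x, n_elem=4, hoist=True)
const_per_point = integrate(lambda x: 1.0, n_elem=4, hoist=False)
const_hoisted = integrate(lambda x: 1.0, n_elem=4, hoist=True)

print("per point:", per_point)
print("hoisted:  ", hoisted)
```

For the constant coefficient the two variants agree; for x*x they do not, which is the "not the same" part.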

But on MATC I totally agree. Best to avoid the on-the-fly evaluations for anything except quick trials. There is no real reason to use it now that we also have LUA.

Maybe you can share the example so we can see if there is some other explanation.

-Peter

spacedout · Post by **spacedout** » 08 Jan 2024, 00:37

Hello

In the following very simple test, the handles are initialized only once. Just compile SoSimpleTest and run the two SIFs. You will observe from the output of CheckTimer that when the material coefficient reads a field variable using Real MATC "tx(0)", the time is about 40 times higher than when it reads a constant.

Sorry for the delay, but I was sick.

raback · Post by **raback** » 08 Jan 2024, 13:44

Hi spacedout,

I took your test (with an ad hoc mesh) and compared some results for the assembly loop.

* Hardcoded constant value: 0.365 s

Dependence on local variable:
* MATC: 5.95 s
* LUA: 0.825 s
* "Equals": 0.419 s
* F90: 0.422 s

Dependence on global variable:
* MATC: 0.375 s
* LUA: 0.366 s
* "Equals": 0.365 s
* F90: 0.419 s

So it is no surprise that MATC (and to a lesser degree LUA) has significant overhead when called for every point. However, the overhead largely vanishes when the dependence is on a global variable and the code knows not to evaluate the value each time. F90 stays the same, since there we cannot assume anything: the code could also do local stuff that we are not aware of.
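The effect of knowing that a dependency is global can be sketched in Python as a one-entry cache. This is only an illustration of the idea, not how Elmer does it internally; `GlobalCoefficient` and `expensive` are hypothetical names, and `expensive` stands in for an interpreted expression.

```python
class GlobalCoefficient:
    """Hypothetical: a coefficient that depends only on a global variable (here: time)."""

    def __init__(self, func):
        self.func = func
        self.calls = 0               # number of real (expensive) evaluations
        self._key = None
        self._value = None

    def value(self, t):
        if self._key is None or t != self._key:
            self._value = self.func(t)   # re-evaluate only when the global value changes
            self._key = t
            self.calls += 1
        return self._value


def expensive(t):
    """Stand-in for an interpreted expression evaluated on the fly."""
    return 1.0 + t * t


coeff = GlobalCoefficient(expensive)
n_points = 1000                      # integration points visited per time step
uncached_calls = 0
for t in (0.0, 0.1, 0.2):
    for _ in range(n_points):
        cached = coeff.value(t)
        direct = expensive(t)        # what a per-point evaluation would do
        uncached_calls += 1
        assert cached == direct

print(coeff.calls, "evaluations with the cache,", uncached_calls, "without")
```

A user-defined function cannot be cached this way without extra knowledge, because it may depend on local data as well.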

So I would say that using the information that the variable is global is essential here. If you hard-code the value you assume exactly that, so it is fair game to compare against dependencies on a global variable.

"Equals" (a special case of table) and F90 do a pretty good job here, fetching the value in each node before evaluation with only about 15% overhead.

So based on this I would not buy the argument that one should compute the parameters in the code. However, avoiding MATC like the plague in non-global types of use may be a good idea.
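For reference, the relative overheads implied by the timings listed above can be computed with a few lines of Python. The numbers are copied from the list in this post; only the helper name `overhead_percent` is made up.

```python
HARDCODED = 0.365  # s, hard-coded constant value

# Assembly-loop timings in seconds, as listed in the post.
local_dep = {"MATC": 5.95, "LUA": 0.825, "Equals": 0.419, "F90": 0.422}
global_dep = {"MATC": 0.375, "LUA": 0.366, "Equals": 0.365, "F90": 0.419}


def overhead_percent(t):
    """Extra time relative to the hard-coded case, in percent."""
    return 100.0 * (t - HARDCODED) / HARDCODED


for label, timings in (("local", local_dep), ("global", global_dep)):
    for method, t in timings.items():
        print(f"{label:6s} {method:6s} {t:6.3f} s  {overhead_percent(t):+8.1f} %")
```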

-Peter

Code:

! Hardcoded: 0.3645 
!  Time Derivative Coefficient 1 = Real $ne     ! 0.375 
!  Time Derivative Coefficient 1 = Variable Ne ! 5.95 / 0.366 
!    Real MATC "tx(0)"  
!  Time Derivative Coefficient 1 = Variable Ne   ! 0.825 / 0.378
!    Real LUA "tx[0]"
!  Time Derivative Coefficient 1 = Equals Potential   ! 0.4191 / 0.365
  Time Derivative Coefficient 1 = Variable "Ne"  ! 0.422 / 0.419
    Real Procedure "Test" "TestFun"

spacedout · Post by **spacedout** » 09 Jan 2024, 20:16

Maybe I have not recovered from the cold yet. Not thinking straight: I forgot to supply the mesh. Here it is, attached together with the previously reported code.
Frankly, I am not sure what you mean by "Hardcoded constant value", "Equals" and F90.

In the end, I don't think I will use GetElementNodesVec. The only reason I considered this block approach is to help with the nonlinear instabilities I have experienced with 3 ion continuity equations over the last 5 years (whether I use OpenFOAM's FVM or Elmer's FEM). The nonlinear source terms are complicated enough that I just plug the last iteration's field variable values into them. I am only aware of one technique to deal with convergence problems that can arise in nonlinear iterations, and that is relaxation. Unfortunately, I was not successful. But I guess I should start a new thread for this topic.
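For readers unfamiliar with the terms, lagging a nonlinear term and under-relaxing the update can be sketched on a toy scalar problem in Python. The equation u = 2 - u**3 (root u = 1) and the function `picard` are assumptions chosen purely for illustration and have nothing to do with the actual ion continuity equations.

```python
def picard(relax, u0=0.5, tol=1e-10, max_iter=200):
    """Solve u = 2 - u**3 (root u = 1) by lagging the cubic term.

    Returns (u, iterations, converged).
    """
    u = u0
    for it in range(1, max_iter + 1):
        u_lagged = 2.0 - u ** 3                       # nonlinear term from the previous iterate
        u_new = (1.0 - relax) * u + relax * u_lagged  # relaxed update
        if abs(u_new - u) < tol:
            return u_new, it, True
        if abs(u_new) > 1e6:                          # clearly diverging, stop early
            return u_new, it, False
        u = u_new
    return u, max_iter, False


plain = picard(relax=1.0)    # no relaxation
relaxed = picard(relax=0.2)  # under-relaxation

print("plain:  ", plain)
print("relaxed:", relaxed)
```

On this toy problem the unrelaxed iteration diverges while the relaxed one converges; for strongly coupled systems relaxation alone may not be enough.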

raback · Post by **raback** » 10 Jan 2024, 13:30

Hi

Some form of linearization of the nonlinear terms may be needed. Many codes have Newton's method implemented, but that requires some knowledge of the functional form, or at least of which term should be linearized, even if using numerical differentiation. If you have several equations, then a monolithic approach will only benefit convergence when you linearize the cross terms that result in off-diagonal terms in the monolithic matrix.
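The point about cross terms can be illustrated with a toy two-equation system in Python. The system, the starting guess and the function `solve` are assumptions made up for this sketch. With the off-diagonal Jacobian entries included the iteration is Newton's method; with them set to zero the coupling is effectively lagged, even though both unknowns are updated together.

```python
def solve(linearize_cross_terms, u=0.8, v=1.2, tol=1e-10, max_iter=50):
    """Newton-type iteration for the toy system
         F1 = u + 2 v^2 - 3 = 0
         F2 = v + 2 u^2 - 3 = 0      (one root at u = v = 1)
    Returns (u, v, iterations, converged).
    """
    for it in range(max_iter):
        f1 = u + 2.0 * v * v - 3.0
        f2 = v + 2.0 * u * u - 3.0
        res = max(abs(f1), abs(f2))
        if res < tol:
            return u, v, it, True
        if res > 1e6:                  # clearly diverging, stop early
            return u, v, it, False
        # Jacobian [[a, b], [c, d]]; b and c are the cross (off-diagonal) terms.
        a, d = 1.0, 1.0
        b = 4.0 * v if linearize_cross_terms else 0.0
        c = 4.0 * u if linearize_cross_terms else 0.0
        det = a * d - b * c
        # Solve J * (du, dv) = (-f1, -f2) with Cramer's rule.
        du = (-f1 * d + b * f2) / det
        dv = (-a * f2 + c * f1) / det
        u, v = u + du, v + dv
    return u, v, max_iter, False


full = solve(linearize_cross_terms=True)
lagged = solve(linearize_cross_terms=False)

print("cross terms linearized:", full)
print("cross terms lagged:    ", lagged)
```

In this example the iteration without the off-diagonal terms diverges from the same starting guess, while the full linearization converges in a handful of steps.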

-Peter

Elmer Discussion Forum