Linear Solver - Different behaviours on different OS

kfourteau

Linear Solver - Different behaviours on different OS

Post by kfourteau »

Hello everyone,

I am currently experiencing a strange behaviour of Elmer, and I really do not understand its origin.

In a nutshell: despite using the same Elmer version (9.0), the same sif file, and the same mesh, I get very different results on different computers.

My workflow is simply to solve the strain-imposed compression of a piece of porous material, in steady-state and using an isotropic non-linear viscoplastic rheology for the constitutive material. The porous media meshes are produced with the CGAL library and converted to the Elmer format with a personal program.

I'm running the simulations on two different HPC systems: one based on CentOS 7, on which I did not do the installation myself, and another based on Ubuntu 20, where I compiled Elmer manually (linked against MMG, Hypre, and MUMPS). Both machines use Elmer 9.0. After the installation on the Ubuntu machine, 85% of the ctest tests pass.

The linear system is not solved properly on the Ubuntu machine. I've put a Dropbox link below to an archive with a test case. When I run the simulation with the mesh mesh_100 (the sif file is Compression_100.sif), the preconditioner already yields quite low residuals (1e-6), which is below my standard convergence criterion (1e-4). However, when I look at the returned solution, it is clearly faulty (it just corresponds to the initial solution provided in the sif). When I run the same case on the CentOS HPC, everything looks fine.

From there I've tested a few things on the Ubuntu machine:
- I changed the initial condition from imposing the average strain everywhere to zero everywhere. The residual after the preconditioner remains below the convergence criterion, and Elmer simply outputs the initial condition.
- I lowered the convergence criterion to 1e-10. The residual after the preconditioner is then above the criterion, and in this case the linear solver diverges.
- If I use a smaller mesh (mesh_50 in the attached archive), everything works fine (the residuals are not so low just after the preconditioner, and the linear solver converges afterwards). Here the CentOS and Ubuntu machines behave similarly.
- If I use a large mesh with a simple geometry produced with GMSH (not with CGAL) and then convert it with ElmerGrid, everything works fine.
- I've tested on another Ubuntu 20 machine and I get the same behaviour.
- The problem occurs whether I run the case sequentially or in parallel.
- If I increase the "Critical Shear Rate" of the material, things start to look normal on Ubuntu.
- I also realised that simulations on the Ubuntu machine are prone to producing the message "WARNING:: RealBiCGStab(l): kappal^2 is non-positive, iteration halted" during the linear solving stage. It is something I seldom encounter on CentOS.
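
The first two observations above (initial guess returned when the residual is already under the tolerance, iterations only starting once the tolerance is tightened) can be reproduced in a toy setting. The sketch below is not Elmer's BiCGStab(l) implementation; it is a plain Jacobi iteration on a hypothetical, badly scaled 2x2 system, and it only illustrates that a small relative residual does not guarantee a small error when the matrix entries span many orders of magnitude.

```python
import numpy as np

def solve_with_residual_check(A, b, x0, tol, max_iter=200):
    """Jacobi iteration that tests convergence before every update.

    Illustrative only: not the solver used by Elmer.
    Returns (solution, iterations performed, last relative residual).
    """
    x = x0.copy()
    b_norm = np.linalg.norm(b)
    for it in range(max_iter + 1):
        r = b - A @ x
        rel_res = np.linalg.norm(r) / b_norm
        if rel_res < tol:
            return x, it, rel_res      # may exit with it == 0
        x = x + r / np.diag(A)         # Jacobi update
    return x, max_iter, rel_res

# Hypothetical badly scaled system; the exact solution is (1, 1).
A = np.array([[1.0, 0.0], [0.0, 1e-8]])
b = np.array([1.0, 1e-8])
x0 = np.array([1.0, 0.0])              # wrong in the second component

x_loose, it_loose, res_loose = solve_with_residual_check(A, b, x0, tol=1e-4)
x_tight, it_tight, res_tight = solve_with_residual_check(A, b, x0, tol=1e-10)
print(it_loose, x_loose)   # no iterations: the initial guess is accepted
print(it_tight, x_tight)   # iterations are actually performed
```

With the loose tolerance the initial guess is accepted as is, even though its second component is far from the exact value; only the tighter tolerance forces the solver to iterate.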

I am quite lost, and have no idea why the two machines behave differently. Apparently it could be related to the mesh (as it only occurs with sufficiently large CGAL meshes), to the libraries/OS (as the behaviour is different on Ubuntu and CentOS), and/or to the way the effective viscosity is computed in the material law.

Does anyone have any idea about the origin of this problem? Let me know if you require more information (specific versions of the libraries, etc.).

Thanks a lot!
Kévin

DROPBOX LINK: https://www.dropbox.com/s/pnd6on1bcdl2s ... n.zip?dl=0
kevinarden

Re: Linear Solver - Different behaviours on different OS

Post by kevinarden »

CentOS is based on Fedora/Red Hat, while Ubuntu is Debian-based. There is no readily available binary for Fedora/Red Hat, so the code has to be compiled. Ubuntu has a binary release and a nightly update. Therefore, even though they are both Elmer 9, the CentOS one is likely static, whereas the Ubuntu version may be updated every day, and there are nearly daily code changes. This means the two installed codes may not be the same.
kfourteau

Re: Linear Solver - Different behaviours on different OS

Post by kfourteau »

Thanks for the response (and sorry for the delay on my side, I wanted to do a few more tests before posting).

Both Elmer installations, on Ubuntu and on CentOS, were compiled manually (no packaged binaries involved). I tried installing several versions of Elmer on the Ubuntu machine (sources from Elmer 8.4, from the elmerice branch on GitHub, etc.) and the behavior is still the same: if the "Critical Shear Rate" parameter in Glen's law is too low, the computed residuals do not make sense on Ubuntu, while they seem fine on CentOS. If I increase this parameter, Ubuntu and CentOS behave similarly.

Reading the code, I would expect that a very low Critical Shear Rate simply implies that the linear behavior is never enforced when computing the effective viscosity of the material. I would thus have said that setting a very low value for the Critical Shear Rate should not have any impact on the simulations. But clearly it has one on my Ubuntu simulations.
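
As a rough illustration of that expectation, here is a minimal sketch of a regularized Glen-type viscosity. It assumes the common textbook form eta = 0.5 * A^(-1/n) * e^((1-n)/n), with the effective strain rate e clipped from below at the critical shear rate; the fluidity A = 1 and exponent n = 3 are placeholder values, and the actual Elmer implementation may differ in detail.

```python
def glen_viscosity(strain_rate, critical_shear_rate, fluidity=1.0, n=3.0):
    """Effective viscosity of a Glen-type power law (assumed textbook form).

    Below the critical shear rate the strain rate is clipped, so the
    viscosity stays constant there (linear, Newtonian-like behaviour).
    """
    e = max(strain_rate, critical_shear_rate)
    return 0.5 * fluidity ** (-1.0 / n) * e ** ((1.0 - n) / n)

# Where the material actually shears, the threshold plays no role...
eta_shear_a = glen_viscosity(1e-3, critical_shear_rate=1e-10)
eta_shear_b = glen_viscosity(1e-3, critical_shear_rate=1e-30)

# ...but where the strain rate vanishes, the threshold sets the viscosity cap.
eta_cap_a = glen_viscosity(0.0, critical_shear_rate=1e-10)
eta_cap_b = glen_viscosity(0.0, critical_shear_rate=1e-30)

print(eta_shear_a == eta_shear_b)   # same viscosity in the sheared zone
print(eta_cap_b / eta_cap_a)        # cap grows by many orders of magnitude
```

In this sketch the threshold indeed leaves the sheared regions untouched, but it controls how large the viscosity can become in nearly undeformed regions, and therefore the viscosity contrast that ends up in the assembled linear system.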

For now, I increased this Critical Shear Rate parameter to get rid of the very strange computed residuals. But I'm still wondering what is concretely happening and if it means there's something wrong in some of the installs.

Kevin
raback
Site Admin
Location: Espoo, Finland

Re: Linear Solver - Different behaviours on different OS

Post by raback »

Hi Kevin,

This is indeed strange behavior.

When this happens I usually try to isolate the problem:
* What is the smallest case size at which the problem occurs?
* What is the first timestep at which the problem occurs?

Then I raise the "Max Output Level" to at least 20 or so; 32 is the maximum. This includes some additional debugging info. Run exactly the same problem on the two platforms (or two versions, etc.) and direct the output to a file.

Then take some advanced diff tool - my favourite is "meld" - and see where the two cases start to diverge. If you share two such files, I can also try to make my best guess at what is happening.
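
If a graphical diff tool is not available on the cluster, the same comparison can be scripted. The following is a minimal sketch; the log file names in the usage comment are hypothetical, and a real comparison would probably need to ignore lines that legitimately differ between machines (timings, host names, paths).

```python
from itertools import zip_longest

def first_divergence(lines_a, lines_b):
    """Return (index, line_a, line_b) for the first differing line, or None.

    A missing line (one log shorter than the other) is reported as None.
    """
    for i, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a != b:
            return i, a, b
    return None

def read_lines(path):
    """Read a log file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh]

# Hypothetical usage with two solver logs:
# print(first_divergence(read_lines("centos.log"), read_lines("ubuntu.log")))
```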

-Peter
tzwinger
Site Admin

Re: Linear Solver - Different behaviours on different OS

Post by tzwinger »

Hi Kevin,
I just read this thread. Personally, I would not be that surprised to get different results on different installations (and, I guess, different processors?) from Glen's flow law regularized with an extremely low value of the critical shear rate. I do not know the exact setup of your case, but I could imagine that differences in hardware, compilers, pre-compiled libraries, etc. can lead to accumulated deviations. Simply put, you have a very strong singularity with that kind of flow law when approaching zero shear. But be aware that if you set the critical shear rate too high, you are basically solving for a Newtonian fluid. What one could do is run the strain-rate solver simultaneously, evaluate the order of magnitude of your strain-rate tensor invariant, and compare it to the threshold value in those areas where you get deviating results.
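
The comparison suggested above could be sketched as follows once the strain-rate tensors have been exported. This assumes the tensors are available as a NumPy array of shape (N, 3, 3) and that the relevant invariant is the effective strain rate sqrt(0.5 * e_ij * e_ij); both the data layout and the sample values are assumptions made for illustration.

```python
import numpy as np

def effective_strain_rate(eps):
    """Effective strain rate sqrt(0.5 * e_ij * e_ij) for each (3, 3) tensor."""
    eps = np.asarray(eps, dtype=float)
    return np.sqrt(0.5 * np.einsum("nij,nij->n", eps, eps))

def fraction_below(eps, critical_shear_rate):
    """Fraction of points whose effective strain rate is under the threshold."""
    return float(np.mean(effective_strain_rate(eps) < critical_shear_rate))

# Hypothetical data: one point in simple shear, one point that does not deform.
shear = np.zeros((3, 3))
shear[0, 1] = shear[1, 0] = 1e-3
eps = np.stack([shear, np.zeros((3, 3))])

print(effective_strain_rate(eps))
print(fraction_below(eps, critical_shear_rate=1e-10))
```

Points flagged by `fraction_below` are the ones where the regularization, rather than the power law itself, sets the viscosity.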

Regards,
Thomas
kfourteau

Re: Linear Solver - Different behaviours on different OS

Post by kfourteau »

Hi everyone,

Sorry for the slow response; I did not find the time until recently to test the issue a bit more.

The strain-rate solver tells me that I indeed have a large variability in terms of local strain rates in my structure (basically going down to the Critical Shear Rate). This suggests that when the parameter is too low, the strain rate might approach zero in some locations and create some singularities. I'm still not sure why different hardware/compilers would handle them differently, but I trust you that it could be some expected behavior of computers.
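
One generic mechanism that makes platforms differ is that floating-point addition is not associative, so a compiler or library that reorders a sum can change the last bits of a result. The sketch below only illustrates that mechanism together with a placeholder Glen-type viscosity (fluidity 1, exponent 3); it does not claim that this is what happens inside Elmer for this case.

```python
def glen_viscosity(strain_rate, critical_shear_rate, n=3.0):
    """Placeholder Glen-type viscosity with a clipped strain rate."""
    return 0.5 * max(strain_rate, critical_shear_rate) ** ((1.0 - n) / n)

# The same sum evaluated in two orders, as two builds might do.
# The strain rate should be exactly zero; only rounding noise remains.
rate_order_1 = abs(((0.1 + 0.2) + 0.3) - 0.6)
rate_order_2 = abs((0.1 + (0.2 + 0.3)) - 0.6)
print(rate_order_1, rate_order_2)

# A moderate threshold hides the rounding noise...
same = glen_viscosity(rate_order_1, 1e-10) == glen_viscosity(rate_order_2, 1e-10)

# ...while an extremely low threshold turns it into a large viscosity change.
ratio = glen_viscosity(rate_order_2, 1e-30) / glen_viscosity(rate_order_1, 1e-30)
print(same, ratio)
```

When the threshold sits well above rounding noise, both evaluation orders give the same viscosity; when it sits far below, the viscosity at a nearly undeformed point depends on the last bits of the strain rate.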

By setting a reasonable Critical Shear Rate (still allowing about 10 orders of magnitude between my largest strain rate and the Critical Shear Rate), the issue seems to be resolved. I've checked that when I increase the Critical Shear Rate by an order of magnitude, my final results stay the same (less than 1% deviation in my diagnostic variable, which is the volume-averaged stress tensor over my structure).
So my guess is that the solution was simply to find a Critical Shear Rate that properly regularizes Glen's law in the few spots where the strain rate drops, but still keeps a non-linear behavior in the zones where the shearing occurs.
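
The sensitivity check described above can be written down in a few lines. This is a sketch with made-up numbers: the stress tensors and strain rates below are hypothetical placeholders, not results from the thread, and the Frobenius norm is just one possible way to measure the deviation between two averaged tensors.

```python
import numpy as np

def relative_deviation(reference, perturbed):
    """Relative difference between two tensors, measured in the Frobenius norm."""
    reference = np.asarray(reference, dtype=float)
    perturbed = np.asarray(perturbed, dtype=float)
    return float(np.linalg.norm(perturbed - reference) / np.linalg.norm(reference))

def orders_of_magnitude(max_strain_rate, critical_shear_rate):
    """Decades separating the largest strain rate from the threshold."""
    return float(np.log10(max_strain_rate / critical_shear_rate))

# Hypothetical volume-averaged stress tensors from two runs whose
# Critical Shear Rate differs by a factor of ten.
stress_run_1 = np.diag([-100.0, -40.0, -40.0])
stress_run_2 = 1.005 * stress_run_1

deviation = relative_deviation(stress_run_1, stress_run_2)
print(deviation < 0.01)                   # within the 1% criterion
print(orders_of_magnitude(1e-2, 1e-12))   # separation from the threshold
```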

Since the simulations run smoothly and give physically acceptable results, I'll keep this set-up.
Thanks again for your help and your ideas!

Kevin