USER FORUM
performance optimization (by Alex B)
Folks,
Performance is pretty terrible when doing translate-extrude-rotate with a large number of repetitions, to the point that SolveSpace locks up. Are there any ways to improve it?
I'm attaching a simple example. Try changing the number of revolutions to say 40 to see the effect :)
(no subject) (by Andrew)
What you can try is breaking your problem into multiple drawings and using link/assemble to produce the final result. While this speeds things up somewhat, the program can still slow down. It does mean, however, that you can work on the cylinder object without the slowdown from the holes, and only suffer it when updating the hole array and the final assembly. The attached zip demonstrates this, with final.slvs being the final drawing. Note also the use of a construction drawing to control the diameter and height of the holed cylinder.
Hope this helps
(no subject) (by Alex B)
Good suggestion, thank you. Another thing I observed is very high memory usage, on the order of 8-9 GB. That seems extreme for something like this.
(no subject) (by Paul)
Another thing is to consider making the holes shorter. I used an outer sketch distance of 27.5 and a depth of 3.0 to cut the holes. It didn't help in this case and still takes about 1 minute to complete the change to 40 rotations.
I didn't see any significant memory usage. What version of SolveSpace are you running?
(no subject) (by Andrew)
As far as I know, link/assemble brings in an already solved model. Hence the approach: create a column of 'holes', link it into another drawing, and create the circular array of 30 columns there. Linking that in to cut the actual holes saves a huge amount of solver work when there are a lot of entities in a model.
(no subject) (by Alex B)
I've built the latest code from the master branch for this.
(no subject) (by Paul)
There have been some modest improvements since this was posted. I've been using that holed_cylinder as a performance test, changing the repeats on the rotate group from 4 to 40. It used to take about 3 minutes to complete with a normal build and 58 seconds with an OpenMP build. Now it's about 44 seconds normal and 25 with OpenMP (I have 4 cores/8 threads).
The real problem here is algorithmic complexity: many things in SolveSpace have timing that scales as the square of the number of items (surfaces, edge segments, triangles, etc.), so it will slow down as complexity increases. Some of these have been improved, but not all.
Running things on multiple cores gives a modest speed increase of 2-4x but that doesn't scale with the geometry. If you're building from source try adding -DENABLE_OPENMP=yes to your cmake command.
There are good solutions to the scaling problems but nobody has time to implement them right now.
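To put rough numbers on the quadratic scaling described above, here is a minimal sketch of a pure all-pairs cost model. It is not taken from the SolveSpace source; the function names and the item counts are hypothetical, and real timings will include other terms. It only illustrates why a fixed multi-core speedup does not keep up with growing geometry.

```python
def pairwise_checks(n_items: int) -> int:
    """Number of unordered pairs among n_items (an all-pairs comparison)."""
    return n_items * (n_items - 1) // 2


def relative_time(n_items: int, baseline_items: int, speedup: float = 1.0) -> float:
    """Estimated run time relative to the baseline under a pure O(n^2) model.

    `speedup` is a constant factor, e.g. from running on several cores.
    """
    return pairwise_checks(n_items) / pairwise_checks(baseline_items) / speedup


if __name__ == "__main__":
    base = 1000  # hypothetical item count, not measured from any model
    for n in (1000, 2000, 4000):
        serial = relative_time(n, base)
        parallel = relative_time(n, base, speedup=4.0)
        print(f"{n:5d} items: {serial:6.2f}x serial, {parallel:6.2f}x with a 4x speedup")
```

Under this model, doubling the item count roughly quadruples the work, so a 4x speedup is used up by a model that is only about twice as complex.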
(no subject) (by Alex B)
Thanks Paul, I will try adding that flag.
CPU vs SolveSpace (by Peneloppe PPE)
I'm running an i5 2500k (dual core, 12 years old) and I'm running out of tricks to assemble complex stuff ;) Would a new CPU make a significant improvement? It never seems to reach 100% CPU usage on my machine though, it floats at ~75% when "thinking" hard ;)
(no subject) (by Paul)
@Peneloppe PPE
Yes, CPU will help. If you're seeing 75% on both cores, that's because it's running some portions of the code on a single core and some on both. The result is an average utilization of 75% for both. A newer CPU should be at least twice as fast after 12 years, and you should also be able to throw more cores at the parallel parts of the code. Oddly that will show a lower percent utilization as the parallel portions speed up more than the single threaded parts. Overall it will be faster.
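The utilization argument can be checked with a small model. This is only a sketch with made-up numbers, not a profile of SolveSpace: it assumes the work splits into a single-threaded part and a part that scales perfectly across cores, and the function names are hypothetical.

```python
def wall_time(serial_time: float, parallel_work: float, cores: int) -> float:
    """Elapsed time: the serial part plus the parallel work divided across cores."""
    return serial_time + parallel_work / cores


def average_utilization(serial_time: float, parallel_work: float, cores: int) -> float:
    """Average fraction of all cores kept busy over the whole run.

    One core is busy during the serial part; all cores are busy during
    the parallel part (perfect scaling is assumed).
    """
    parallel_time = parallel_work / cores
    busy_core_seconds = serial_time * 1 + parallel_time * cores
    return busy_core_seconds / (cores * wall_time(serial_time, parallel_work, cores))


if __name__ == "__main__":
    # Illustrative split: 1 s of serial work, 2 s of parallelizable work.
    for cores in (2, 4, 8):
        t = wall_time(1.0, 2.0, cores)
        u = average_utilization(1.0, 2.0, cores)
        print(f"{cores} cores: {t:.2f} s elapsed, {u:.0%} average utilization")
```

With this split, two cores average 75% utilization, which matches the situation described. Adding cores shortens the run while the reported utilization falls, because the single-threaded part takes up a larger share of the elapsed time.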
Unfortunately the scaling problems will just be pushed back a bit. If you double the number of repeats, or parts in an assembly, it will probably end up with the same performance you have now but with a more complex model.
Did you raise the chord tolerance? The default is now 0.1 percent. You may see some performance increase by raising that to 0.2 or even the old 0.5 percent.
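For a feel of why a looser chord tolerance reduces the amount of geometry, here is a sketch using the standard chord-height (sagitta) formula for a circle. It is not SolveSpace's actual tessellation code; the radius and the reference length that the percentage is applied to are assumptions chosen for illustration.

```python
import math


def segments_for_circle(radius: float, chord_tol: float) -> int:
    """Smallest number of equal chords whose deviation from the circle
    (the sagitta, radius * (1 - cos(pi / n))) stays within chord_tol."""
    if chord_tol >= radius:
        return 3  # tolerance is so loose that a triangle suffices
    half_angle = math.acos(1.0 - chord_tol / radius)
    return max(3, math.ceil(math.pi / half_angle))


if __name__ == "__main__":
    radius = 10.0            # hypothetical hole radius
    reference_length = 100.0  # hypothetical length the percentage applies to
    for percent in (0.1, 0.2, 0.5):
        tol = reference_length * percent / 100.0
        print(f"{percent}% tolerance: {segments_for_circle(radius, tol)} segments per circle")
```

The segment count falls roughly with the square root of the tolerance, so going from 0.1 to 0.5 percent cuts the segments per circle by a bit more than half. Where timing scales with the square of the item count, that reduction compounds.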
Good news (by Eric)
It should be at least twice as fast with the current Edge version.
OpenMP, LTO, 4 cores, 8 threads, Intel handicap: "Generate::DIRTY took 3269 ms"
Force triangle mesh: "Generate::DIRTY took 5651 ms"