
Thread: Final Gather performing very differently

  1. #1
    Join Date
    Jan 2013
    Posts
    50

    Final Gather performing very differently

    We came across a very weird issue when rendering a rather complex scene on different machines in our cluster running RedHat 6, Maya 2016, and mental ray 3.13.

    On machine 1 we rendered a sample frame; the scene took about 7 minutes, roughly 1 minute of which was final gather and the rest actual tracing. Since we were rendering with 16 threads, the process showed about 1600% CPU usage (checked with top from the command line).

    On machine 2, which is much newer, we rendered the same frame from the exact same scene and noticed that final gather alone took 5 minutes. Checking with top, I could see that during final gathering the CPU usage wouldn't go higher than 600%, which means only about 6 cores were in use. The weird thing is that during raytracing it went back up to 1600%.
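    For reference, the check is nothing fancier than pointing top at the render process; a rough sketch, where the "mayabatch" pattern is only a placeholder for however your render process actually shows up:

    Code:
    # watch the overall CPU% of one render process (process pattern is a placeholder)
    top -p $(pgrep -f mayabatch | head -n 1)
    # per-thread view, to see how many threads are actually busy
    top -H -p $(pgrep -f mayabatch | head -n 1)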

    Since the machines have more than 16 cores, we always run multiple render jobs with at least 16 cores each. On machine 1 this costs only a small slowdown, since inter-process communication takes some time, so performance drops a little to something like 1400% per job when running 4 jobs on that 64-core machine. Doing the same on machine 2, which happens to have 128 cores, the performance of each of the now 8 jobs drops below 100%! That is worse than a single-core job. Remember, we're talking about the final gather stage only; during tracing, machines 1 and 2 behave the same and use all cores at full power.
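    To make the setup concrete, here is roughly what 4 such jobs on the 64-core machine amount to; the scene path and frame numbers are placeholders, and -rt is, if I recall the flag correctly, the thread count option for the mental ray command-line renderer:

    Code:
    # sketch only: 4 concurrent 16-thread mental ray renders (placeholder scene path and frames)
    for i in 1 2 3 4; do
        Render -r mr -rt 16 -s $i -e $i /path/to/scene.mb &
    done
    wait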

    Has anyone else run into this weird phenomenon?

    Any help is highly appreciated, since this is a real show stopper and most of our machines are affected; only the 2 machines that are about 3 years old are doing fine.

    Matthias

  2. #2
    Join Date
    Dec 2004
    Location
    Marina Del Rey, California
    Posts
    4,143


    Are you examining memory usage?

    For FG each process has to see everything used for FG in the scene. It is not as memory efficient as shooting regular rays for each tile/bucket/job within a single render process.
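    Even something as simple as this, checked per render process, would tell you (the PID is a placeholder):

    Code:
    # resident and virtual memory of one render process (12345 is a placeholder PID)
    ps -o pid,rss,vsz,comm -p 12345
    # peak usage so far
    grep -E 'VmPeak|VmHWM' /proc/12345/status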
    Barton Gawboy

  3. #3
    Join Date
    Jan 2013
    Posts
    50


    Quote Originally Posted by bart View Post
    Are you examining memory usage?

    For FG each process has to see everything used for FG in the scene. It is not as memory efficient as shooting regular rays for each tile/bucket/job within a single render process.
    Hi Bart, thanks for the reply. I don't think it's a memory issue. The older but faster machines have 256 GB of RAM, while the newer but slower ones have 1 TB installed. We run at most (# of cores / 16) jobs per machine, which means even the older ones have 64 GB of RAM available for each job.

    Looking at the logs, I can see that the scene itself takes less than a GB during FG. Here's a snippet:

    Code:
    JOB  0.7    812 MB progr:    37.7%    computing final gather points on ezvlvc011.7
    JOB  0.18   815 MB progr:    37.8%    computing final gather points on ezvlvc011.18
    JOB  0.12   816 MB progr:    37.9%    computing final gather points on ezvlvc011.12

  4. #4
    Join Date
    Dec 2004
    Location
    Marina Del Rey, California
    Posts
    4,143


    You hint that this may be related to machine age?

    Do the newer machines have more than 32 physical cores?
    Barton Gawboy

  5. #5
    Join Date
    Jan 2013
    Posts
    50


    Quote Originally Posted by bart View Post
    You hint that this may be related to machine age?

    Do the newer machines have more than 32 physical cores?
    The newer machines are 4-way Xeon systems with 18 cores per CPU. We enabled Hyper-Threading to get a little more performance, which gives us a total of 144 cores per machine. The ones from 2015 show the same behavior; they have 4x16 physical cores, giving 128 cores per machine with Hyper-Threading.

  6. #6
    Join Date
    Dec 2004
    Location
    Marina Del Rey, California
    Posts
    4,143


    The mr 3.13 in Maya 2016 does not support more than 64 threads, i.e. 32 physical cores with Hyper-Threading. The 1.0.1 release we made for mr 3.14 does support more. This might be what you are running up against for a given mental ray process.

    If you have 18 cores and therefore 36 threads per CPU, only one CPU will be used, as adding the next one would bump you over 64 threads. The machine with 16 cores per CPU, where hyperthreading makes for 32 threads per CPU, should be able to use two CPUs.
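    An easy way to see how cores and hyper-threads are laid out on a given node, assuming lscpu is present on your RedHat 6 image:

    Code:
    # sockets, cores per socket, threads per core and total logical CPUs
    lscpu | grep -E 'Socket|Core|Thread|^CPU\(s\)'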
    Barton Gawboy

  7. #7
    Join Date
    Dec 2004
    Location
    Marina Del Rey, California
    Posts
    4,143


    Is it possible you could try mental ray 3.14 for Maya 2016, using version 1.0.1?
    Barton Gawboy

  8. #8
    Join Date
    Jan 2013
    Posts
    50


    Quote Originally Posted by bart View Post
    The mr 3.13 in Maya 2016 does not support more than 64 threads, i.e. 32 physical cores with Hyper-Threading. The 1.0.1 release we made for mr 3.14 does support more. This might be what you are running up against for a given mental ray process.

    If you have 18 cores and therefore 36 threads per CPU, only one CPU will be used, as adding the next one would bump you over 64 threads. The machine with 16 cores per CPU, where hyperthreading makes for 32 threads per CPU, should be able to use two CPUs.
    Good to know, but unfortunately that does not apply to the current case, where we are usually running 16 threads per job.

  9. #9
    Join Date
    Jan 2013
    Posts
    50


    Quote Originally Posted by bart View Post
    Is it possible you could try mental ray 3.14 for Maya 2016, using version 1.0.1?
    This is unfortunately not an option since the cluster nodes that show this behaviour are headless and we do not own any mental ray licenses for the newer versions.

  10. #10
    Join Date
    Dec 2004
    Location
    Marina Del Rey, California
    Posts
    4,143


    OK, since these are RedHat 6, I wonder if process-related memory control could be sneaking into this, or could point to a solution.
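    For instance, and this is only a rough checklist rather than a known fix, you could look at whether the render processes are confined by a cgroup or resource limit, and how memory is spread across the NUMA nodes (12345 is a placeholder PID; numactl may need to be installed):

    Code:
    # is the render process placed in a cgroup?
    cat /proc/12345/cgroup
    # resource limits in effect for the render user's shell
    ulimit -a
    # NUMA layout and free memory per node on the 4-socket machines
    numactl --hardware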

    Also, in the render report with progress-level verbosity during FG, do you notice any differences? Each report line represents a job/thread completion; it notes the memory usage, and the thread id is at the end. Here is an example from the middle of FG on a super simple render I just did (which explains the low memory):
    Code:
    JOB  0.6  16:09:35    51 MB progr:    38.7%    computing final gather points on [my_machine].6
    JOB  0.4  16:09:35    52 MB progr:    38.7%    computing final gather points on [my_machine].4
    JOB  0.9  16:09:35    52 MB progr:    38.8%    computing final gather points on [my_machine].9
    Barton Gawboy
