| GEMS Box -- A Turnkey System |
Purchasing a parallel EM software package is just a click away for a company or an institute. However, using a parallel package to solve problems on a high-performance cluster is a complex system problem. For instance, the user has to take care of the hardware (computational nodes as well as login and I/O nodes), the network, the operating system and MPI, the cluster management software, and so on. All of these require a professional team for system design and maintenance.
To solve this type of problem, we offer a total system solution, which includes the GEMS parallel EM software package, hardware (third party), network (third party), operating system, cluster management software, compilers (third party), MPICH1 and MPICH2. Everything is already built into one box (a 9U rack for Gigabit Ethernet or a 13U rack for fiber network), and users can start generating their own results as soon as they open the GEMS Box. In addition, we offer system customization for different customer requirements such as the number of computational nodes, processor type, memory, network devices, operating system, cluster management software, and compilers.
Besides running the GEMS software package, the GEMS Box system has been tested with the standard Linpack benchmark, and its actual performance reaches up to 82 GFlops. |
GEMS Box 
 |
|
| |
|
|
Parallel Efficiency |
| |
|
|
|
| |
|
|
|
| |
| GEMS Box Performance |
| Problem size (cells) | Simulation time (1,000 time steps) |
| 100x100x100 (1 million) | 13 sec. |
| 200x200x200 (8 million) | 59 sec. |
| 300x300x300 (27 million) | 2 min. 54 sec. |
| 400x400x400 (64 million) | 6 min. 25 sec. |
| 500x500x500 (125 million) | 13 min. 10 sec. |
| 600x600x600 (216 million) | 20 min. 30 sec. |
| 700x700x700 (343 million) | 32 min. 40 sec. |
| 800x800x800 (512 million) | 48 min. 30 sec. |
| 900x900x900 (729 million) | 65 min. 30 sec. |
| 970x970x970 (913 million) | 87 min. 10 sec. |
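The timings in the GEMS Box Performance table can be converted into a throughput figure, which makes the rows easier to compare across problem sizes. The following is a minimal Python sketch under stated assumptions: the metric (cell updates per second, i.e. cells times time steps divided by wall-clock seconds) and the function name `cell_updates_per_second` are illustrative and are not reported in the source; only the timings are taken from the table.

```python
# Timings copied from the GEMS Box Performance table (1,000 time steps each).
# Each entry: (cells per edge of the cubic grid, wall-clock seconds).
TIMINGS = [
    (100, 13),
    (200, 59),
    (300, 2 * 60 + 54),
    (400, 6 * 60 + 25),
    (500, 13 * 60 + 10),
    (600, 20 * 60 + 30),
    (700, 32 * 60 + 40),
    (800, 48 * 60 + 30),
    (900, 65 * 60 + 30),
    (970, 87 * 60 + 10),
]
TIME_STEPS = 1000


def cell_updates_per_second(edge, seconds, steps=TIME_STEPS):
    """Illustrative metric (an assumption): cells * steps / wall-clock time."""
    return edge ** 3 * steps / seconds


if __name__ == "__main__":
    for edge, seconds in TIMINGS:
        rate = cell_updates_per_second(edge, seconds)
        print(f"{edge}^3 cells: {rate / 1e6:7.1f} million cell updates/s")
```

For the smallest case this gives roughly 77 million cell updates per second (10^9 cell updates in 13 seconds).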
|
| |
|
| |
System Performance |
| |
|
|
Peak performance: 82 GFlops |
| |
|
|
Parallel efficiency: 82% |
| |
|
|
|
| |
System Components |
| |
|
|
16 or 32 computational cores (4 nodes, 8 dual core CPUs or 8 quad core CPUs) |
| |
|
|
1 login node (1 dual core processor) |
| |
|
|
1 I/O node (shared with login node) |
| |
|
|
64 GB memory (expandable to 128 GB) |
| |
|
|
KVM and 15'' LCD |
| |
|
|
|
| |
Network System |
| |
|
|
Gigabit Ethernet (management) |
| |
|
|
Fiber network system |
| |
|
|
|
| |
Storage System |
| |
|
|
Hardware RAID 5 |
| |
|
|
1.5TB storage space |
| |
|
|
|
| |
System Dimension |
| |
|
|
Portable 9U or 13U rack (No special cooling system required) |
| |
|
|
|
| |
EM Simulator |
| |
|
|
8 CPUs (16 or 32 cores) |
| |
|
|
|
| |
Cluster Tools |
| |
|
|
Cluster management software |
| |
|
|
Web based interface (Job management and result visualization) |
| |
|
|
|
| |
Operating System and Compilers |
| |
|
|
Linux |
| |
|
|
C/C++ and Fortran compilers |
| |
|
|
MPICH1 and MPICH2 |
| |
|
|
|
| |
Service |
| |
|
|
Total system solution (sale, software leasing and hardware rental) |
| |
|
|
System consulting |
| |
|
|
|
| |
Maximum Problem Size |
| |
|
|
One billion cells (unknowns) |
| Basic terminology in parallel processing: |
| |
| • |
Parallel scalability: S = T1 / Tn
If there are n processors in a cluster, the parallel scalability of the cluster is defined as S = T1 / Tn, where T1 and Tn are the simulation times for the same problem using one processor and n processors, respectively. |
| |
|
| • |
Parallel efficiency: E = S / n
If there are n processors in a cluster, the parallel efficiency of the cluster is defined as E = S / n, where S is the parallel scalability and n is the number of processors in the cluster.
Example 1: The GEMS HPCC system has 96 processors and 192 GB of memory, and can simulate problems of up to 8 billion unknowns. The peak performance of each processor in the GEMS HPCC system is 7.6 GFlops, compared to 3.8 GFlops for an Intel Pentium-4 3.0 GHz. The GEMS parallel scalability on this cluster is 78 (parallel efficiency = 82 percent); that is, the GEMS HPCC system is equivalent to 78 independent processors working on one problem. In other words, the GEMS HPCC system is equivalent to 156 Intel Pentium-4 3.0 GHz processors.
Example 2: The GEMS Box system has 16 computational processors and 32 GB of memory (expandable to 64 GB), and can simulate problems of up to 1 billion unknowns. The peak performance of each processor in the GEMS Box is 6.9 GFlops, compared to 3.8 GFlops for an Intel Pentium-4 3.0 GHz. The GEMS parallel scalability on this cluster is 13; that is, the GEMS Box is equivalent to 13 independent processors working on one problem. In other words, the GEMS Box is equivalent to 23 Intel Pentium-4 3.0 GHz processors. |
| |
|
| • |
Peak performance
The peak performance of a cluster is defined as P = m x f, where m is the number of multipliers and f is the processor frequency. |
| |
|
| • |
Actual performance
The actual performance of a cluster is measured with a standard code, for instance Linpack, with all of the memory used for the calculation. It depends not only on how the application code is developed but also on processor and memory performance and on the network system. At present, the major bottleneck in parallel processing is memory and network speed, not processor speed. Therefore, a GPU, an IBM CELL processor or an 80-core Intel processor can be employed to speed up the simulation of small problems, but they would not help at all with large problems because of the memory speed. |
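The definitions above (S = T1 / Tn, E = S / n, P = m x f) can be checked against the two examples with a few lines of Python. This is a minimal sketch, not part of the GEMS software: all function names are illustrative, and `pentium4_equivalent` encodes an assumed reading of the examples (scalability scaled by the ratio of per-processor peak rates), which the source does not state as a formula but which reproduces its figure of 156 for Example 1.

```python
def parallel_scalability(t1, tn):
    """S = T1 / Tn: one-processor time divided by n-processor time."""
    return t1 / tn


def parallel_efficiency(s, n):
    """E = S / n: scalability per processor."""
    return s / n


def peak_performance(m, f):
    """P = m x f: number of multipliers times processor frequency."""
    return m * f


def pentium4_equivalent(s, peak_gflops, p4_peak_gflops=3.8):
    """Assumed reading of the examples: scale S by the ratio of peak rates."""
    return s * peak_gflops / p4_peak_gflops


if __name__ == "__main__":
    # Example 1 (GEMS HPCC): n = 96, S = 78, 7.6 GFlops peak per processor.
    print(parallel_efficiency(78, 96))
    print(pentium4_equivalent(78, 7.6))
    # Example 2 (GEMS Box): n = 16, S = 13, 6.9 GFlops peak per processor.
    print(parallel_efficiency(13, 16))
    print(pentium4_equivalent(13, 6.9))
    # Hypothetical values, not from the source: 2 multipliers at 3.0 GHz.
    print(peak_performance(2, 3.0))
```

With the rounded scalability values quoted in the examples, E evaluates to 78 / 96 = 13 / 16 = 0.8125, slightly below the quoted 82 percent, presumably because the published scalability figures are rounded. The Pentium-4 equivalents come out at 156 for Example 1 and about 23.6 for Example 2, which the text quotes as 23.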