Interdisciplinary collaboration achieves the first 10-million-core parallel first-principles simulation on China's homegrown Sunway TaihuLight supercomputer


Recently, a USTC team targeting first-principles simulation of large solid-state systems containing tens of thousands of atoms and molecules achieved ultra-large-scale parallel computing at the ten-million-core scale on the domestic Sunway TaihuLight supercomputer. The work is based on DGDFT, a low-scaling software package with plane-wave accuracy. The results were published online in Science Bulletin under the title "High performance computing of DGDFT for tens of thousands of atoms using millions of cores on Sunway TaihuLight", and the magazine China Science commented on the work. Science Bulletin is a comprehensive academic journal of natural science, supervised by the Chinese Academy of Sciences and co-sponsored by the Chinese Academy of Sciences and the National Natural Science Foundation of China. It mainly publishes innovative, high-level, and significant new results in basic and applied research across the natural sciences.

The paper's page in Science Bulletin


This work was carried out jointly by the Hong An research group of SCST, the Hefei National Laboratory for Physical Sciences at the Microscale, and the Jinlong Yang research group of the School of Chemistry and Materials Sciences. It was completed in close cooperation with researchers from the Wuxi Supercomputing Center and the Institute of Software, Chinese Academy of Sciences.


The Sunway TaihuLight supercomputer is the first system in China and in the world with a theoretical peak floating-point performance exceeding 100 petaflops (10^17 operations per second). Compared with commercial multi-core processors of the same era, Sunway many-core processors are well suited to regular, parallel, compute-intensive tasks: they offer larger-scale, multi-level parallel computing units and a distinctive on-chip memory architecture. Designing parallel algorithms and optimizing performance on this architecture pose many challenges, so there is an urgent need for algorithm design and optimization methods driven by major application problems.


The DGDFT (Discontinuous Galerkin Density Functional Theory) method solves the Kohn-Sham (KS) equations using adaptive local basis (ALB) functions that are generated on the fly during the self-consistent field (SCF) iteration, achieving accuracy comparable to that of a plane-wave basis set. The ultra-large-scale DFT simulations on Sunway TaihuLight show that DGDFT scales in parallel to 8,519,680 computing cores (131,072 core groups) on the machine. It can be used to study the electronic structure of a two-dimensional metallic graphene system containing tens of thousands of carbon atoms (11,520 carbon atoms).
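The two figures quoted for the run, 131,072 core groups and 8,519,680 cores, can be checked against each other with a short calculation. The sketch below assumes, as is commonly documented for the Sunway SW26010 processor but not stated in this article, that each core group consists of one management core plus 64 compute cores; the names `CORES_PER_CORE_GROUP` and `total_cores` are illustrative only.

```python
# Assumption (not stated in the article): each Sunway SW26010 core group
# has 1 management core (MPE) and 64 compute cores (CPEs).
CORES_PER_CORE_GROUP = 1 + 64


def total_cores(core_groups: int) -> int:
    """Return the number of cores contained in the given number of core groups."""
    return core_groups * CORES_PER_CORE_GROUP


core_groups = 131_072   # core groups reported for the largest run
atoms = 11_520          # carbon atoms in the graphene system

cores = total_cores(core_groups)
print(f"total cores: {cores:,}")
print(f"core groups per atom: {core_groups / atoms:.1f}")
print(f"cores per atom: {cores / atoms:.1f}")
```

Under that assumption the product matches the reported core count, and it implies that the largest run used several hundred cores per carbon atom.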

DGDFT’s ALB basis set, block tridiagonal Hamiltonian matrix, flow chart, Sunway master-slave parallel acceleration


With the rapid development of supercomputers and high-performance computing technology, first-principles simulation based on Kohn-Sham density functional theory (KS-DFT) is becoming increasingly important in condensed matter physics, materials science, chemistry, and biology. As domestic supercomputers advance, matching theoretical algorithms and ultra-large-scale parallel software must be developed to make full use of their computing power and to study larger physical and chemical problems.

In this work, led by Professor Hong An of our college, the supercomputing application team and the software porting and performance optimization team worked closely with the basic algorithm library development team and the hardware technical support team of the National Supercomputing Center. Together they combined the university's low-scaling theoretical algorithms in computational chemistry with the strengths of domestic high-performance parallel software and hardware, making full use of the computing power of the domestic Sunway TaihuLight supercomputer. They developed a parallel computing method with low scaling, low communication, low memory footprint, and low memory access, and achieved ultra-large-scale, high-performance parallel computing with plane-wave accuracy at the ten-million-core scale. At the same time, the size of the simulated system (tens of thousands of atoms) is hundreds of times larger than what international simulation software of the same plane-wave accuracy can handle. This result shows that, with state-of-the-art computational methods and a world-leading high-performance computing platform, high-precision first-principles materials simulation of large systems over long time scales has become a reality.


The advanced computer systems and high-performance computing team led by Professor Hong An of our college has long worked on high-performance computing chips and systems, and has accumulated extensive experience in developing and optimizing supercomputing application software. In 2019, the team took the lead in organizing a number of major exascale (E-level) application collaboration projects on campus. Through close cooperation with the university's strong scientific and engineering computing teams, these projects have made important research progress on the Sunway supercomputer. They offer an early demonstration of the distinctive advantage that deep collaboration across USTC's many first-class disciplines brings to developing domestic scientific and engineering computing software and to solving major scientific and engineering problems.