Luca Ghiringhelli
(Fritz Haber Institute of the Max Planck Society)
Big Data of Materials Science – Critical Role of the Descriptor
Statistical learning of materials properties or functions has so far started with a largely silent, unchallenged step: the introduction of a descriptor. However, when the scientific connection between the descriptor and the actuating mechanisms is unclear, the causality of the learned descriptor-property relation is uncertain.
As a result, trustworthy prediction of promising new materials, identification of anomalies, and scientific advancement all become doubtful.
For many, maybe most, material functions, the “cause → property/function” relation is complex and indirect. Let us label the “cause” by a multi-dimensional descriptor d, which is initially unknown. The property/function is a number P (e.g. the thermoelectric figure of merit of a material). Obviously, the nuclear numbers and stoichiometry uniquely identify the many-body Hamiltonian and its results. However, in order to establish a d → P mapping, the question is: what is the (microscopic) mechanism behind the desired quantity? In other words, what is the best descriptor d? At the same time, we require that d not involve an expensive computation. It should relate to simple material properties, or even to properties of the constituent atoms, e.g. energy levels and wave functions. Only then can the P(d) relation serve the purposes mentioned above.
We analyze this issue and define requirements for a suitable descriptor. For a classic example, the energy difference between zincblende/wurtzite and rocksalt semiconductors, we demonstrate how a meaningful descriptor can be found systematically.