MedeA HT-Descriptors: High-throughput Descriptor Generation and Exploitation


MedeA®[1] HT-Descriptors is a tool within the MedeA Environment for defining, computing, exploiting, and organizing materials descriptors. For example, HT-Descriptors can identify and characterize layered compounds in a structural database and compute coordination geometries of selected atom types. It also provides tools to combine these experimental properties with computed data using mathematical expressions, thus generating sophisticated materials descriptors. These expressions are stored in catalogs, which can be managed through the HT-Descriptors interface and re-used on new structural datasets.

Key Benefits

  • Efficient and practical approach to solving materials problems involving properties that cannot be computed directly
  • Powerful tool to harvest descriptors from experimental crystal structure databases containing hundreds of thousands of compounds
  • Use of MedeA HT-Launchpad to generate systematic and coherent sets of fundamental materials property data
  • Ability to combine pre-calculated generic descriptors of entire structural databases (ICSD, Pearson, COD) with computed data
  • Infrastructure to catalog and manage protocols for the calculation of descriptors

Role of MedeA HT-Descriptors as a link between MedeA HT-Launchpad and the exploitation of correlations.

Many important materials properties are difficult to compute directly. Catalytic activity, stress-corrosion cracking, and lubrication are illustrative examples. Descriptors offer a path to link these phenomena to properties that can be obtained directly from a structural analysis, or which can be readily computed, such as the binding energy in compounds, the electronic density of states, elastic moduli, and the viscosity of fluids. A critical part of such an approach is the ability to combine a wide range of basic descriptors, such as the presence of layers, the width of channels, the coordination geometry of cations, or electronegativity differences (“ionicity”), with computed properties, such as the bulk modulus, vacancy formation energy, and the Debye temperature. MedeA HT-Descriptors facilitates the definition and exploitation of such descriptors in a convenient and sophisticated way.

‘Descriptors constructed from basic materials properties are like keys opening very heavy doors.’

An Illustrative Example

MedeA HT-Descriptors uses structure lists to associate descriptors with each structure. The following example illustrates this concept. The goal is to find layered compounds that contain sulfur and are likely to have a large band gap. To this end, two descriptors are defined: a topological descriptor identifying layered compounds containing sulfur, and a second descriptor related to the ionicity of the compound. The latter is expressed as the difference between the highest and lowest electronegativity of all atoms in each compound. Using the interface of the MedeA HT-Descriptors module, the descriptors are defined and stored in a catalog. Next, the descriptors are applied in a search of the ICSD and Pearson databases using MedeA InfoMaticA, resulting in 540 unique layered compounds containing sulfur. As expected, this list contains familiar compounds such as molybdenum sulfide. The results from this query are stored in a structure list for further investigation. Then, the electronegativity difference is computed for each of the 540 compounds, and the compounds are sorted from highest to lowest value. In this particular example, the system with the highest ionicity is sodium dithionite, as illustrated in the adjacent figure.


Structure list showing the electronegativity range as a descriptor of layered compounds containing sulfur. In this example, Na-dithionite has the highest electronegativity range (ionicity). A subsequent calculation using MedeA VASP confirms that this compound is an insulator with a large band gap.
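The ranking step of this example can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the MedeA interface: the names `PAULING`, `structure_list`, `electronegativity_range`, and `rank_by_ionicity` are hypothetical, the use of the Pauling scale is an assumption (the source does not state which scale is used), and the four-entry structure list is a toy stand-in for the 540 compounds returned by the database search.

```python
# Pauling electronegativities for the few elements used in this toy example.
# Assumption: the Pauling scale; the source does not name the scale.
PAULING = {"Na": 0.93, "Ti": 1.54, "Sn": 1.96, "Mo": 2.16, "S": 2.58, "O": 3.44}

# Hypothetical structure list: formula -> set of elements it contains.
# It stands in for the result of the layered, sulfur-containing search.
structure_list = {
    "MoS2": {"Mo", "S"},
    "SnS2": {"Sn", "S"},
    "TiS2": {"Ti", "S"},
    "Na2S2O4": {"Na", "S", "O"},  # sodium dithionite
}


def electronegativity_range(elements):
    """Difference between the highest and lowest electronegativity."""
    values = [PAULING[element] for element in elements]
    return max(values) - min(values)


def rank_by_ionicity(structures):
    """Return (formula, range) pairs sorted from highest to lowest range."""
    rows = [
        (formula, electronegativity_range(elements))
        for formula, elements in structures.items()
        if "S" in elements  # keep only sulfur-containing entries
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_ionicity(structure_list)
for formula, value in ranked:
    print(f"{formula:10s} {value:.2f}")
```

With these toy inputs, sodium dithionite comes out on top because it combines sodium and oxygen, the least and most electronegative elements in the table.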


MedeA HT-Descriptors includes the following basic descriptors that can be used as building blocks to define new descriptors, using standard mathematical operators:

  • Topological descriptors: cages, channels, layers including their dimensions, e.g. smallest channel diameter
  • Coordination of atoms, including coordination number, type of coordination, e.g. octahedral, tetrahedral, square planar; deviation from ideal geometry; type of atoms in nearest neighbor coordination shell, distance to nearest neighbors, dispersion of distances; confidence in the coordination type defined by the separation between first and second nearest neighbor shells
  • Choice between covalent and ionic radii for the determination of topological properties and coordination geometry
  • Atomic properties, including atomic number, atomic mass, electronegativity, and valence
  • Use of computed properties as arguments in defining descriptors
  • Recursive definition: a new descriptor can contain other descriptors as arguments
  • Tools for creating and managing catalogs of descriptors to be applied to new structure lists
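To make the coordination-related quantities in the list above more concrete, the following sketch shows one simple way to derive a coordination number, the dispersion of distances, and a shell-separation measure from a list of neighbor distances. It is an assumption-laden illustration: the source does not describe the algorithm MedeA uses, the function name `analyze_coordination` is hypothetical, and splitting the shells at the largest gap between sorted distances is just one plausible heuristic.

```python
from statistics import mean, pstdev


def analyze_coordination(distances):
    """Split neighbor distances into a first shell and the rest.

    Heuristic (an assumption, not the MedeA method): the first shell ends
    at the largest gap between consecutive sorted distances.
    """
    ordered = sorted(distances)
    if len(ordered) < 2:
        raise ValueError("need at least two neighbor distances")

    # Gap between each distance and the next larger one.
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    split = max(range(len(gaps)), key=gaps.__getitem__)
    first_shell = ordered[: split + 1]

    return {
        "coordination_number": len(first_shell),
        "mean_distance": mean(first_shell),
        "dispersion": pstdev(first_shell),  # spread of first-shell distances
        "shell_separation": gaps[split],    # larger gap -> clearer assignment
    }


# Toy input: six neighbors near 2.0 (octahedron-like), next shell near 3.4.
# Distances are illustrative values in arbitrary length units.
site = [2.00, 2.02, 1.98, 2.01, 1.99, 2.00, 3.40, 3.45, 3.50]
result = analyze_coordination(site)
print(result["coordination_number"], round(result["shell_separation"], 2))
```

A large `shell_separation` relative to `dispersion` suggests that the coordination number is well defined; when the two are comparable, the assignment is ambiguous and deserves less confidence.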

When applied to the structural databases of InfoMaticA, MedeA HT-Descriptors relies on pre-computed properties such as topological features (cages, channels, layers), because analyzing the hundreds of thousands of compounds contained in these databases requires significant computational effort. Databases containing these properties are therefore delivered with the MedeA releases.

When operating on structure lists rather than full structural databases, MedeA HT-Descriptors computes these properties on the fly, thus giving the user great flexibility to include, for example, properties computed with MedeA VASP and MedeA LAMMPS in the expressions for descriptors. Structure lists are usually applied in the context of specific projects where the focus is on certain classes of compounds, rather than on all known structures stored in databases like ICSD and Pearson.
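The idea of building descriptors from other descriptors and from computed properties, then storing them in a catalog for re-use, can be sketched as follows. This is a minimal illustration in plain Python and not the MedeA implementation: the `Descriptor` class, the `prop` and `constant` helpers, the property keys, and the numeric values in `record` are all hypothetical placeholders chosen for the example.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict

Record = Dict[str, float]  # one structure's properties, keyed by name


@dataclass(frozen=True)
class Descriptor:
    """A named expression that maps a structure record to a number."""
    name: str
    evaluate: Callable[[Record], float]

    def _combine(self, other, op, symbol):
        # Plain numbers are wrapped so they can appear in expressions.
        rhs = other if isinstance(other, Descriptor) else constant(other)
        return Descriptor(
            f"({self.name} {symbol} {rhs.name})",
            lambda rec: op(self.evaluate(rec), rhs.evaluate(rec)),
        )

    def __add__(self, other):
        return self._combine(other, operator.add, "+")

    def __sub__(self, other):
        return self._combine(other, operator.sub, "-")

    def __mul__(self, other):
        return self._combine(other, operator.mul, "*")

    def __truediv__(self, other):
        return self._combine(other, operator.truediv, "/")


def prop(key):
    """Basic descriptor: look up a stored or computed property."""
    return Descriptor(key, lambda rec: rec[key])


def constant(value):
    """Descriptor that ignores the record and returns a fixed number."""
    return Descriptor(str(value), lambda rec: float(value))


# Descriptors defined from basic properties, then re-used recursively.
ionicity = prop("en_max") - prop("en_min")
stiffness_per_channel = prop("bulk_modulus_GPa") / prop("min_channel_diameter_A")
combined = ionicity * stiffness_per_channel  # built from two other descriptors

# A catalog is just a named collection that can be applied to new records.
catalog = {"ionicity": ionicity, "combined": combined}

# Made-up placeholder values for one structure, not computed results.
record = {"en_max": 3.44, "en_min": 0.93,
          "bulk_modulus_GPa": 50.0, "min_channel_diameter_A": 2.5}

values = {name: d.evaluate(record) for name, d in catalog.items()}
print(combined.name, round(values["combined"], 2))
```

Because each descriptor is only evaluated when applied to a record, the same catalog can be re-used on any structure list whose records carry the required property keys, regardless of whether a value came from a structural analysis or from a calculation.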

In summary, MedeA HT-Descriptors is a unique and extremely powerful tool for tackling complex materials problems by combining experimental structural data with computed properties obtained from the wealth of methods available in the MedeA environment.

Required Modules

  • MedeA Environment
  • MedeA HT-Launchpad