Xdmf2 Model and Format Archive

The need for a standardized method to exchange scientific data between High Performance Computing (HPC) codes and tools led to the development of the eXtensible Data Model and Format (XDMF). Uses for XDMF range from a standard format that lets HPC codes take advantage of widely used visualization programs like ParaView and EnSight, to a mechanism for performing coupled calculations using multiple, previously stand-alone codes.

XDMF categorizes data by two main attributes: size and function. Data can be Light (typically less than about a thousand values) or Heavy (megabytes, terabytes, etc.). In addition to raw values, data can refer to Format (the rank and dimensions of an array) or Model (how that data is to be used, e.g. XYZ coordinates vs. vector components).

XDMF uses XML to store Light data and to describe the data Model. Either HDF5 or binary files can be used to store Heavy data. The data Format is stored redundantly in both XML and HDF5. This allows tools to parse XML to determine the resources that will be required to access the Heavy data. For the binary Heavy data option, the XML must list the name of the file where the binary data is stored.

A C++ API is provided to read and write XDMF data, and it has been wrapped so that it is available from popular languages like Python, Tcl, and Java. The API is not required in order to produce or consume XDMF data: several HPC codes that already produced HDF5 data use native text output to produce the XML necessary for valid XDMF.

XML

The eXtensible Markup Language (XML) format is widely used for many purposes and is well documented at many sites. There are numerous open source parsers available for XML. The XDMF API takes advantage of the libxml2 parser to provide the necessary functionality. Without going into too much detail, XDMF views XML as a "personalized HTML" with some special rules. It is case sensitive and is made of three major components : elements, entities, and processing instructions. In XDMF we're primarily concerned with the elements. These elements follow the basic form :

<ElementTag
  AttributeName="AttributeValue"
  AttributeName="AttributeValue"
  ... >
  CData
</ElementTag>


Each element begins with a <tag> and ends with a </tag>. Optionally there can be several "Name=Value" pairs which convey additional information. Between the <tag> and the </tag> there can be other <tag></tag> pairs and/or character data (CData). CData is typically where the values are stored; like the actual text in an HTML document. The XML parser in the XDMF API parses the XML file and builds a tree structure in memory to describe its contents. This tree can be queried, modified, and then "serialized" back into XML.
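
The parse, query, modify, and serialize cycle described above can be sketched with Python's standard xml.etree.ElementTree module. This is only a stand-in for illustration: the XDMF API itself uses libxml2, and the element content and the name "Renamed" below are made up for the example.

```python
import xml.etree.ElementTree as ET

# A small element with two Name="Value" pairs and some character data.
xml_text = """
<DataItem Name="Example" Dimensions="3">
  1.0 2.0 3.0
</DataItem>
"""

# Parse the text into an in-memory tree.
root = ET.fromstring(xml_text)

# Query the tree: tag, one attribute, and the character data.
tag = root.tag
dims = root.get("Dimensions")
values = [float(v) for v in root.text.split()]

# Modify the tree, then serialize it back into XML text.
root.set("Name", "Renamed")
serialized = ET.tostring(root, encoding="unicode")

print(tag, dims, values)
print(serialized)
```

Attribute values always come back as strings, so numeric attributes such as Dimensions have to be converted by the caller.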

Comments in XML start with "<!--" and end with "-->", for example <!--This is a Comment -->.

XML is said to be "well formed" if it is syntactically correct. That means all of the quotes match, all elements have end elements, etc. XML is said to be "valid" if it conforms to the Schema or DTD defined at the head of the document. For example, the schema might specify that element type A can contain element B but not element C. Verifying that the provided XML is well formed and/or valid are functions typically performed by the XML parser. Additionally XDMF takes advantage of two major extensions to XML :

XInclude

As opposed to entity references in XML (see below), XInclude allows for the inclusion of files that are not well formed XML. This means that with XInclude the included file could be well formed XML or perhaps a flat text file of values. The syntax looks like this :


<Xdmf Version="2.0" xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="Example3.xmf"/>
</Xdmf>


The xmlns:xi attribute establishes a namespace xi. Then anywhere within the Xdmf element, xi:include will pull in the referenced URL.
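
As a rough illustration of what the inclusion does, the sketch below expands an xi:include with Python's standard xml.etree.ElementInclude module. The content of Example3.xmf is not given in this document, so the in-memory FILES table and the grid name "Included Grid" are assumptions made for the example; a real reader would load the file from disk or from a URL.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the file system: the real Example3.xmf is not
# shown in this document, so its content here is invented.
FILES = {
    "Example3.xmf": '<Domain><Grid Name="Included Grid"/></Domain>',
}

def loader(href, parse, encoding=None):
    """Return the parsed element (parse="xml") or the raw text (parse="text")."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]

root = ET.fromstring(
    '<Xdmf Version="2.0" xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="Example3.xmf"/>'
    '</Xdmf>'
)

# Replace every xi:include element in place with the content it points to.
ElementInclude.include(root, loader=loader)

child_tags = [child.tag for child in root]
grid_names = [grid.get("Name") for grid in root.iter("Grid")]
print(child_tags, grid_names)
```

After expansion the tree looks as if the included Domain had been written directly inside the Xdmf element.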

XPath

XPath allows elements in the XML document and the API to reference specific elements in a document. For example :

The first Grid in the first Domain

/Xdmf/Domain/Grid

The tenth Grid (XPath indices are one based)

/Xdmf/Domain/Grid[10]

The first grid with an attribute Name which has a value of "Copper Plate"

/Xdmf/Domain/Grid[@Name="Copper Plate"]
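
These lookups can be tried out with the limited XPath subset in Python's standard xml.etree.ElementTree. This is a sketch rather than the XDMF API: ElementTree evaluates paths relative to the root element, so the leading /Xdmf/ is dropped, and the two-grid document (including the name "Steel Plate") is invented for the example.

```python
import xml.etree.ElementTree as ET

# A made-up document with two grids in one domain.
root = ET.fromstring("""
<Xdmf Version="2.0">
  <Domain>
    <Grid Name="Steel Plate"/>
    <Grid Name="Copper Plate"/>
  </Domain>
</Xdmf>
""")

# find() returns the first match, i.e. the first Grid in the first Domain.
first = root.find("Domain/Grid")

# Positions are one based, so [2] is the second Grid.
second = root.find("Domain/Grid[2]")

# Select by attribute value.
copper = root.find('Domain/Grid[@Name="Copper Plate"]')

print(first.get("Name"), second.get("Name"), copper.get("Name"))
```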

All valid XDMF must appear between the <Xdmf> and the </Xdmf> tags. So a minimal (empty) XDMF XML file would be :

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="2.0">
</Xdmf>

While there exists an Xdmf DTD and a Schema they are only necessary for validating parsers. For performance reasons, validation is typically disabled.
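
A non-validating parse still checks that the document is well formed. The sketch below uses Python's standard xml.etree.ElementTree, which never validates against a DTD or Schema, to show the difference between a well formed and a malformed document. The helper name is_well_formed is made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML; no DTD or Schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<Xdmf Version="2.0"><Domain/></Xdmf>'
bad = '<Xdmf Version="2.0"><Domain></Xdmf>'    # Domain is never closed

print(is_well_formed(good), is_well_formed(bad))
```

Checking that, say, a Grid only appears inside a Domain is a validity question and would need a validating parser together with the DTD or Schema.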

Entities

In addition to XInclude and XPath, which allow for references to data outside the actual XDMF, XML's basic substitution mechanism of entities can be used to render the XDMF document more readable. For instance, once an entity alias has been defined in the header via

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" [
<!ENTITY cellDimsZYX "45 30 120">
]>

the text in double quotes is substituted for the entity reference &cellDimsZYX; (note the trailing semicolon) wherever the parser encounters it in the rest of the document.
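
The substitution can be observed with any XML parser that expands internal entities. The sketch below uses Python's standard xml.etree.ElementTree. The Grid and Topology around the entity reference are invented for the example, and the SYSTEM "Xdmf.dtd" identifier is left out so that the example does not depend on a DTD file being present.

```python
import xml.etree.ElementTree as ET

# The entity is declared once in the header and referenced in an attribute.
doc = """<?xml version="1.0" ?>
<!DOCTYPE Xdmf [
<!ENTITY cellDimsZYX "45 30 120">
]>
<Xdmf Version="2.0">
  <Domain>
    <Grid Name="Block">
      <Topology TopologyType="3DCoRectMesh" Dimensions="&cellDimsZYX;"/>
    </Grid>
  </Domain>
</Xdmf>
"""

root = ET.fromstring(doc)

# By the time the tree is built, the reference has been replaced by its text.
dims = root.find("Domain/Grid/Topology").get("Dimensions")
print(dims)
```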

XDMF Elements

The organization of XDMF begins with the Xdmf element. So that parsers can distinguish from previous versions of XDMF, there exists a Version attribute (currently at 2.0). Any element in XDMF can have a Name attribute or have a Reference attribute. The Name attribute becomes important for grids while the Reference attribute is used to take advantage of the XPath facility (more detail on this later). Xdmf elements contain one or more Domain elements (computational domain). There is seldom motivation to have more than one Domain.

A Domain can have one or more Grid elements. Each Grid contains a Topology, Geometry, and zero or more Attribute elements. Topology specifies the connectivity of the grid while Geometry specifies the location of the grid nodes. Attribute elements are used to specify values such as scalars and vectors that are located at the node, edge, face, cell center, or grid center.

To specify actual values for connectivity, geometry, or attributes, XDMF defines a DataItem element. A DataItem can provide the actual values or provide the physical storage (which is typically an HDF5 file).

XdmfItem

Three types of DataItem are described here :

  1. Uniform - this is the default. A single array of values.
  2. Subset - A variation of what was originally Hyperslab in Xdmf2. The Attributes of the tag specify a selection over the child array.
  3. Function - calculates an expression.

Uniform

The simplest type is Uniform, which specifies a single array. As with all XDMF elements, there are reasonable defaults wherever possible. So the simplest DataItem would be :

<DataItem Dimensions="3">
    1.0 2.0 3.0
</DataItem>

The default Format is XML and the default NumberType is a 32-bit floating point value. So the fully qualified DataItem for the same data would be :

<DataItem
  Format="XML"
  NumberType="Float" Precision="4"
  Rank="1" Dimensions="3">
  1.0 2.0 3.0
</DataItem>
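
A reader has to apply these defaults itself when attributes are missing. The following sketch, using Python's standard library, fills in the defaults named above and checks that the number of values agrees with Dimensions. The helper names describe and read_light_values are made up for the example, and Rank is derived from Dimensions rather than read from the XML.

```python
import xml.etree.ElementTree as ET
from math import prod

# Defaults stated in the text: XML format, 32-bit (4 byte) float.
DEFAULTS = {"Format": "XML", "NumberType": "Float", "Precision": "4"}

def describe(item):
    """Return the DataItem attributes with defaults filled in."""
    info = {key: item.get(key, default) for key, default in DEFAULTS.items()}
    info["Dimensions"] = [int(d) for d in item.get("Dimensions").split()]
    info["Rank"] = len(info["Dimensions"])
    return info

def read_light_values(item):
    """Read values stored directly in the XML and check their count."""
    info = describe(item)
    convert = float if info["NumberType"] == "Float" else int
    values = [convert(v) for v in item.text.split()]
    if len(values) != prod(info["Dimensions"]):
        raise ValueError("value count does not match Dimensions")
    return values

item = ET.fromstring('<DataItem Dimensions="3"> 1.0 2.0 3.0 </DataItem>')
info = describe(item)
values = read_light_values(item)
print(info, values)
```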

Since it is only practical to store a small number of data values in the XML, production codes typically write their data to HDF5 and specify the location in XML. HDF5 is a hierarchical, self describing data format. So an application can open an HDF5 file without any prior knowledge of the data and determine the dimensions and number type of all the arrays stored in the file. XDMF requires that this information also be stored redundantly in the XML so that applications need not have access to the actual heavy data in order to determine storage requirements.

For example, suppose an application stored a three dimensional array of pressure values at each iteration into an HDF5 file. The XML might be :

<DataItem
  Format="HDF"
  NumberType="Float" Precision="8"
  Dimensions="64 128 256">
  OutputData.h5:/Results/Iteration 100/Part 2/Pressure
</DataItem>
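
Because the dimensions and precision are repeated in the XML, the storage requirement can be worked out without opening the HDF5 file. A minimal sketch with Python's standard library follows. It assumes Precision is the number of bytes per value, and it splits the character data at the first colon into a file name and a dataset path, which would need more care for file names that themselves contain a colon.

```python
import xml.etree.ElementTree as ET
from math import prod

item = ET.fromstring("""
<DataItem Format="HDF" NumberType="Float" Precision="8" Dimensions="64 128 256">
  OutputData.h5:/Results/Iteration 100/Part 2/Pressure
</DataItem>
""")

# Number of values and bytes, from the light data alone.
dims = [int(d) for d in item.get("Dimensions").split()]
n_values = prod(dims)
n_bytes = n_values * int(item.get("Precision"))

# Location of the heavy data: file name, then the path inside the file.
file_name, dataset_path = item.text.strip().split(":", 1)

print(n_values, n_bytes, file_name, dataset_path)
```

For this array the result is 2,097,152 values, or 16 MiB at 8 bytes per value.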

Alternatively, an application may store its data in binary files. In this case the XML might be :

<DataItem ItemType="Uniform"
  Format="Binary"
  Dimensions="64 128 256">
  PressureFile.bin
</DataItem>

Dimensions are specified with the slowest varying dimension first (i.e. KJI order). The HDF filename can be fully qualified; if it is not, the file is assumed to be located in the current directory or in the same directory as the XML file.
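
To make the KJI ordering concrete, the sketch below computes the position of value (k, j, i) in the flat array. It assumes the values are stored contiguously with the last dimension varying fastest (C order). The function name flat_index is made up for the example.

```python
def flat_index(k, j, i, dims):
    """Offset of element (k, j, i) in a flat array with Dimensions="NK NJ NI"."""
    nk, nj, ni = dims
    if not (0 <= k < nk and 0 <= j < nj and 0 <= i < ni):
        raise IndexError("index outside Dimensions")
    # I varies fastest, then J, then K.
    return (k * nj + j) * ni + i

dims = (64, 128, 256)              # Dimensions="64 128 256"

step_i = flat_index(0, 0, 1, dims)
step_j = flat_index(0, 1, 0, dims)
step_k = flat_index(1, 0, 0, dims)
last = flat_index(63, 127, 255, dims)
print(step_i, step_j, step_k, last)
```

One step in I moves one value, one step in J moves 256 values, and one step in K moves 128 * 256 = 32768 values.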

Subset

A Subset specifies a subset of some other DataItem. The slab is specified by giving the start, stride, and count of the values in each of the target DataItem dimensions.

<Subset ConstructedType="DataItem" 
  DataType="UInt"
  Dimensions="8" 
  Format="XML"
  Precision="4"
  SubsetDimensions="2 2 2"
  SubsetStarts="0 0 0"
  SubsetStrides="2 2 2">
  <DataItem DataType="Int"
    Dimensions="1"
    Format="XML"
    Precision="4">
      0
  </DataItem>
  <DataItem DataType="UInt"
    Dimensions="3 3 3"
    Format="XML"
    Precision="4">
      10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
  </DataItem>
</Subset>

The first DataItem child is what the subset is initially set to before its associated heavy data is read in.
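
The selection in this example can be reproduced with a few lines of plain Python. The sketch assumes the usual start, stride, and count semantics per dimension, with the 3 x 3 x 3 source stored in KJI order. The function name subset is made up for the example.

```python
from itertools import product

def subset(values, dims, starts, strides, counts):
    """Pick counts[d] entries per dimension, beginning at starts[d] and
    stepping by strides[d], from a flat KJI ordered array."""
    nk, nj, ni = dims
    picked = []
    # product() varies its last argument fastest, matching KJI order.
    for ck, cj, ci in product(*(range(c) for c in counts)):
        k = starts[0] + ck * strides[0]
        j = starts[1] + cj * strides[1]
        i = starts[2] + ci * strides[2]
        picked.append(values[(k * nj + j) * ni + i])
    return picked

source = list(range(10, 37))       # the 27 values 10 .. 36 from the example
result = subset(source, dims=(3, 3, 3), starts=(0, 0, 0),
                strides=(2, 2, 2), counts=(2, 2, 2))
print(result)
```

Under these assumptions the eight selected values are the corners of the 3 x 3 x 3 block: 10 12 16 18 28 30 34 36.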

Function

The Function ItemType specifies some operation on the children DataItem elements. The children are assigned variables based on the VariableNames attribute. The first child is what the function is initialized to before the operation is read in, so it is not assigned a variable. For example, the following DataItem would add the two children DataItem elements together in a value by value operation resulting in the values 5.1, 7.2 and 9.3 :

<Function
  ConstructedType="DataItem"
  DataType="Float"
  Dimensions="3"
  Expression="A+B"
  Format="XML"
  Precision="4"
  VariableNames="|A|B">
  <DataItem Dimensions="3"
    DataType="Float"
    Format="XML"
    Precision="4">
    1.0 2.0 3.0
  </DataItem>
  <DataItem Dimensions="3"
    DataType="Float"
    Format="XML"
    Precision="4">
    4.1 5.2 6.3
  </DataItem>
</Function>

The ConstructedType attribute allows a Function to produce Geometries and Topologies in addition to standard Arrays. It inherits the attributes of the array subtype that it would produce.

The function description can be arbitrarily complex and contain SIN, COS, TAN, ACOS, ASIN, ATAN, LOG, EXP, ABS, and SQRT. It allows for the old concatenate JOIN in addition to the following operators: -,+,/,*,|(concatenate),#(interlace)
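
The three kinds of operator can be pictured with plain Python lists. This is only a sketch of the intended results, not the XDMF expression evaluator. In particular it assumes that interlace takes one value from each array in turn, and the function names are made up for the example.

```python
def add(a, b):
    """Value by value sum, as in Expression="A+B"."""
    return [x + y for x, y in zip(a, b)]

def concatenate(*arrays):
    """One array after the other, as in Expression="A|B"."""
    return [v for array in arrays for v in array]

def interlace(*arrays):
    """One value from each array in turn, as in Expression="A#B#C"."""
    return [v for group in zip(*arrays) for v in group]

summed = add([1.0, 2.0, 3.0], [4.1, 5.2, 6.3])
joined = concatenate([1, 2, 3], [4, 5, 6])
xyz = interlace([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(summed, joined, xyz)
```

With three scalar arrays holding X, Y, and Z components, interlacing yields one (X, Y, Z) triple per point, which is why it is useful for describing vectors.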

Add the value 10 to every element

<Function
  ConstructedType="DataItem"
  DataType="Float"
  Dimensions="3"
  Expression="10+A"
  Format="XML"
  Precision="4"
  VariableNames="|A">
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data0
  </DataItem>
</Function>

Multiply two arrays (element by element) and take the absolute value

<Function
  ConstructedType="DataItem"
  DataType="Float"
  Dimensions="3"
  Expression="ABS(A*B)"
  Format="XML"
  Precision="4"
  VariableNames="|A|B">
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data0
  </DataItem>
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data1
  </DataItem>
</Function>

Concatenate two arrays

<Function
  ConstructedType="DataItem"
  DataType="Float"
  Dimensions="3"
  Expression="A|B"
  Format="XML"
  Precision="4"
  VariableNames="|A|B">
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data0
  </DataItem>
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data1
  </DataItem>
</Function>

Interlace 3 arrays (Useful for describing vectors from scalar data)

<Function
  ConstructedType="DataItem"
  DataType="Float"
  Dimensions="3"
  Expression="A#B#C"
  Format="XML"
  Precision="4"
  VariableNames="|A|B|C">
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data0
  </DataItem>
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data1
  </DataItem>
  <DataItem Dimensions="3"
    DataType="Float"
    Format="HDF"
    Precision="4">
    data.h5:Data2
  </DataItem>
</Function>

The old Xdmf2 style of functions is still supported in most cases.

Grid

The DataItem element is used to define the data format portion of XDMF. It is sufficient to specify fairly complex data structures in a portable manner. The data model portion of XDMF begins with the Grid element. A Grid is a container for information related to 2D and 3D points, structured or unstructured connectivity, and assigned values.

A Grid can also consist of a collection of child grids arranged either temporally or spatially. This is known as a Grid Collection and is specified by setting the GridType attribute to "Collection" :

   <Grid
     CollectionType="Spatial"
     GridType="Collection"
     Name="Collection">
     <Grid Name="Grid0">
       <Geometry Origin="" Type="ORIGIN_DXDYDZ">
         <DataItem
           DataType="Float"
           Dimensions="3"
           Format="XML"
           Precision="8">
           1 1 0
         </DataItem>
         <DataItem
           DataType="Float"
           Dimensions="3"
           Format="XML"
           Precision="8">
           1 1 1
         </DataItem>
       </Geometry>
       <Topology Dimensions="10 10 10" Type="3DCoRectMesh"/>
     </Grid>
     <Grid Name="Grid1">
       <Geometry Origin="" Type="ORIGIN_DXDYDZ">
         <DataItem
           DataType="Float"
           Dimensions="3"
           Format="XML"
           Precision="8">
           1 1 1
         </DataItem>
         <DataItem
           DataType="Float"
           Dimensions="3"
           Format="XML"
           Precision="8">
           1 1 1
         </DataItem>
       </Geometry>
       <Topology Dimensions="10 10 10" Type="3DCoRectMesh"/>
     </Grid>
   </Grid>

Topology

The Topology element describes the general organization of the data. This is the part of the computational grid that is invariant with rotation, translation, and scale. For structured grids, the connectivity is implicit. For unstructured grids, if the connectivity differs from the standard, an Order may be specified. Currently, the following Topology cell types are defined :

Linear

  • Polyvertex - a group of unconnected points
  • Polyline - a group of line segments
  • Polygon
  • Triangle
  • Quadrilateral
  • Tetrahedron
  • Pyramid
  • Wedge
  • Hexahedron

Quadratic

  • Edge_3 - Quadratic line with 3 nodes
  • Tri_6
  • Quad_8
  • Tet_10
  • Pyramid_13
  • Wedge_15
  • Hex_20

Arbitrary

  • Mixed - a mixture of unstructured cells

Structured

  • 2DSMesh - Curvilinear
  • 2DRectMesh - Axes are perpendicular
  • 2DCoRectMesh - Axes are perpendicular and spacing is constant
  • 3DSMesh
  • 3DRectMesh
  • 3DCoRectMesh

There is a NodesPerElement attribute for the cell types where it is not implicit. For example, to define a group of Octagons, set TopologyType="Polygon" and NodesPerElement="8". For structured grid topologies, the connectivity is implicit. For unstructured topologies the Topology element must contain a DataItem that defines the connectivity :

<Topology TopologyType="Quadrilateral">
  <DataItem Format="XML" DataType="Int" Dimensions="2 4">
    0 1 2 3
    1 6 7 2
  </DataItem>
</Topology>

The connectivity defines the indexes into the XYZ geometry that define the cell. In this example, the two quads share an edge defined by the line from node 1 to node 2.

Mixed topologies must define the cell type of every element. If that cell type does not have an implicit number of nodes, that must also be specified. In this example, we define a topology of three cells consisting of a Tet (cell type 6), a Polygon (cell type 3), and a Hex (cell type 9) :

<Topology TopologyType="Mixed">
  <DataItem Format="XML" DataType="Int" Dimensions="20">
    6        0 1 2 7
    3   4   4 5 6 7
    9        8 9 10 11 12 13 14 15
  </DataItem>
</Topology> 

Notice that the Polygon must define the number of nodes (4) before its connectivity. The cell type numbers are defined in the API documentation.
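
Reading such a stream back is a matter of walking it cell by cell. The sketch below only knows the three cell type codes quoted in this example (6, 3, and 9); the full table lives in the API documentation and is not reproduced here. The function name decode_mixed is made up for the example.

```python
# Node counts for the fixed size cell types used in the example.
FIXED_NODE_COUNTS = {6: 4, 9: 8}   # 6 = Tet, 9 = Hex
POLYGON = 3                        # node count follows the type code

def decode_mixed(stream):
    """Split a Mixed connectivity stream into (cell_type, nodes) pairs."""
    cells = []
    pos = 0
    while pos < len(stream):
        cell_type = stream[pos]
        pos += 1
        if cell_type == POLYGON:
            count = stream[pos]    # explicit number of nodes
            pos += 1
        elif cell_type in FIXED_NODE_COUNTS:
            count = FIXED_NODE_COUNTS[cell_type]
        else:
            raise ValueError("cell type %d not handled in this sketch" % cell_type)
        cells.append((cell_type, stream[pos:pos + count]))
        pos += count
    return cells

stream = [int(v) for v in
          "6 0 1 2 7  3 4 4 5 6 7  9 8 9 10 11 12 13 14 15".split()]
cells = decode_mixed(stream)
print(len(stream), cells)
```

The stream holds 20 integers (5 for the Tet, 6 for the Polygon, 9 for the Hex), which is where Dimensions="20" comes from.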

Geometry

The Geometry element describes the XYZ values of the mesh. The important attribute here is the organization of the points. The default is XYZ: an X, Y, and Z for each point, starting at parametric index 0. Possible organizations are :

  • XYZ - Interlaced locations
  • XY - Z is set to 0.0
  • X_Y_Z - X, Y, and Z are separate arrays
  • VXVYVZ - Three arrays, one for each axis
  • ORIGIN_DXDYDZ - Six Values : Ox,Oy,Oz + Dx,Dy,Dz
  • ORIGIN_DXDY - Four Values : Ox,Oy + Dx,Dy


The following Geometry element defines 8 points :

<Geometry GeometryType="XYZ">
  <DataItem Format="XML" Dimensions="2 4 3">
    0.0    0.0    0.0
    1.0    0.0    0.0
    1.0    1.0    0.0
    0.0    1.0    0.0 
    0.0    0.0    2.0
    1.0    0.0    2.0
    1.0    1.0    2.0
    0.0    1.0    2.0
  </DataItem>
</Geometry>

Together with the Grid and Topology element we now have enough to define a full XDMF XML file that defines two quadrilaterals that share an edge (notice not all points are used):

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="2.0" xmlns:xi="http://www.w3.org/2001/XInclude">
<Domain>
<Grid Name="Two Quads">
<Topology TopologyType="Quadrilateral" NumberOfElements="2" >
<DataItem Format="XML" 
DataType="Int"
Dimensions="2 4">
0 1 2 3
1 6 7 2
</DataItem>
</Topology>
<Geometry GeometryType="XYZ">
<DataItem Format="XML" Dimensions="2 4 3">
0.0    0.0    0.0
1.0    0.0    0.0
1.0    1.0    0.0
0.0    1.0    0.0
0.0    0.0    2.0
1.0    0.0    2.0
1.0    1.0    2.0
0.0    1.0    2.0
</DataItem>
</Geometry>
</Grid>
</Domain>
</Xdmf>
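
To see how Topology and Geometry fit together, the sketch below reads an inline copy of this grid with Python's standard xml.etree.ElementTree and looks up the corner coordinates of each quadrilateral. The XML and DOCTYPE headers are left out of the copy for brevity, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

doc = """<Xdmf Version="2.0">
<Domain>
<Grid Name="Two Quads">
<Topology TopologyType="Quadrilateral" NumberOfElements="2">
<DataItem Format="XML" DataType="Int" Dimensions="2 4">
0 1 2 3
1 6 7 2
</DataItem>
</Topology>
<Geometry GeometryType="XYZ">
<DataItem Format="XML" Dimensions="2 4 3">
0.0 0.0 0.0
1.0 0.0 0.0
1.0 1.0 0.0
0.0 1.0 0.0
0.0 0.0 2.0
1.0 0.0 2.0
1.0 1.0 2.0
0.0 1.0 2.0
</DataItem>
</Geometry>
</Grid>
</Domain>
</Xdmf>"""

grid = ET.fromstring(doc).find("Domain/Grid")
conn = [int(v) for v in grid.find("Topology/DataItem").text.split()]
xyz = [float(v) for v in grid.find("Geometry/DataItem").text.split()]

# XYZ geometry is interlaced: three values per point.
points = [tuple(xyz[n:n + 3]) for n in range(0, len(xyz), 3)]
# Quadrilaterals have four nodes each.
quads = [conn[n:n + 4] for n in range(0, len(conn), 4)]
corners = [[points[node] for node in quad] for quad in quads]
unused = sorted(set(range(len(points))) - set(conn))

print(corners)
print("unused points:", unused)
```

Points 4 and 5 are never referenced by the connectivity, which is the "not all points are used" remark above.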

It is valid for DataItem elements to be direct children of the Xdmf or Domain elements. This could be useful if several Grids share the same Geometry but have separate Topology :

<?xml version="1.0" ?>
<!DOCTYPE Xdmf SYSTEM "Xdmf.dtd" []>
<Xdmf Version="2.0" xmlns:xi="http://www.w3.org/2001/XInclude">
<Domain>
<DataItem Name="Point Data" Format="XML" Dimensions="2 4 3">
0.0    0.0    0.0
1.0    0.0    0.0
1.0    1.0    0.0
0.0    1.0    0.0
0.0    0.0    2.0
1.0    0.0    2.0
1.0    1.0    2.0
0.0    1.0    2.0
</DataItem>
<Grid Name="Two Quads">
<Topology Type="Quadrilateral" NumberOfElements="2" >
<DataItem Format="XML" 
DataType="Int"
Dimensions="2 4">
0 1 2 3
1 6 7 2
</DataItem>
</Topology>
<Geometry Type="XYZ">
<DataItem Reference="XML">
/Xdmf/Domain/DataItem[@Name="Point Data"]
</DataItem>
</Geometry>
</Grid>
</Domain>
</Xdmf>
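
A reader that does not use the XDMF API has to follow such references itself. The sketch below shows one way to do that with Python's standard xml.etree.ElementTree. It assumes the reference is an absolute path that starts at the root element, and strips that prefix because ElementTree evaluates paths relative to the root. The function name resolve_reference and the shortened point data are made up for the example.

```python
import xml.etree.ElementTree as ET

def resolve_reference(root, item):
    """Return the DataItem that `item` refers to, or `item` itself."""
    if item.get("Reference") is None:
        return item
    path = item.text.strip()
    prefix = "/" + root.tag + "/"
    if not path.startswith(prefix):
        raise ValueError("only absolute paths under the root are handled")
    target = root.find(path[len(prefix):])
    if target is None:
        raise LookupError("no element matches " + path)
    return target

root = ET.fromstring("""
<Xdmf Version="2.0">
<Domain>
<DataItem Name="Point Data" Format="XML" Dimensions="2 3">
0.0 0.0 0.0
1.0 0.0 0.0
</DataItem>
<Grid Name="Two Quads">
<Geometry Type="XYZ">
<DataItem Reference="XML">
/Xdmf/Domain/DataItem[@Name="Point Data"]
</DataItem>
</Geometry>
</Grid>
</Domain>
</Xdmf>
""")

ref_item = root.find("Domain/Grid/Geometry/DataItem")
target = resolve_reference(root, ref_item)
print(target.get("Name"), target.get("Dimensions"))
```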

Attribute

The Attribute element defines values associated with the mesh. Currently the supported types of values are :

  • Scalar
  • Vector
  • Tensor - 9 values expected
  • Tensor6 - a symmetrical tensor
  • Matrix - an arbitrary NxM matrix

These values can be centered on :

  • Node
  • Edge
  • Face
  • Cell
  • Grid

A Grid centered Attribute might be something like "Material Type" where the value is constant everywhere in the grid. Edge and Face centered values are defined, but do not map well to many visualization systems. Typically Attributes are assigned on the Node :

<Attribute Name="Node Values" Center="Node">
<DataItem Format="XML" Dimensions="6 4">
100 200 300 400
500 600 600 700
800 900 1000 1100
1200 1300 1400 1500
1600 1700 1800 1900
2000 2100 2200 2300
</DataItem>
</Attribute>

Or assigned to the cell centers :

<Attribute Name="Cell Values" Center="Cell">
<DataItem Format="XML" Dimensions="3">
3000 2000 1000
</DataItem>
</Attribute>

Time

The Time element is a child of the Grid element and specifies the temporal information for the grid. It contains a single floating point value.

So in the simplest form, specifying a single time :

<Time Value="0.1" />

Information

Code or system specific information often needs to be stored with the data but does not map to the current data model. The Information element is intended for such application specific information, which other readers can ignore. A good example might be the bounds of a grid for use in visualization. Information elements have a Name and a Value attribute. If the Value attribute is absent, the value is taken from the CDATA of the element :

<Information Name="XBounds" Value="0.0 10.0"/>
<Information Name="Bounds"> 0.0 10.0 100.0 110.0 200.0 210.0 </Information>
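
Both spellings can be read with the same small helper. The sketch below uses Python's standard xml.etree.ElementTree. The enclosing Grid element and the function name information_value are made up for the example.

```python
import xml.etree.ElementTree as ET

def information_value(elem):
    """Prefer the Value attribute; fall back to the element's character data."""
    value = elem.get("Value")
    if value is None:
        value = (elem.text or "").strip()
    return value

grid = ET.fromstring("""
<Grid Name="Example">
  <Information Name="XBounds" Value="0.0 10.0"/>
  <Information Name="Bounds"> 0.0 10.0 100.0 110.0 200.0 210.0 </Information>
</Grid>
""")

info = {e.get("Name"): information_value(e) for e in grid.iter("Information")}
print(info)
```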

Several items can be addressed using the Information element like time, units, descriptions, etc. without polluting the XDMF schema. If some of these get used extensively they may be promoted to XDMF elements in the future.

XML Element (Xdmf ClassName) and Default XML Attributes

  • Attribute (XdmfAttribute)
 Name            (no default)
 AttributeType   Scalar | Vector | Tensor | Tensor6 | Matrix | GlobalID
 Center          Node | Cell | Grid | Face | Edge
 
  • DataItem (XdmfDataItem)
 Name            (no default)
 ItemType        Uniform | Collection | tree | HyperSlab | coordinates | Function
 Dimensions      (no default) in KJI Order
 NumberType      Float | Int | UInt | Char | UChar
 Precision       1 | 2 (Int or UInt only) | 4 | 8
 Format          XML | HDF | Binary
 Endian          Native | Big | Little (applicable only to Binary format)
 Compression     Raw | Zlib | BZip2 (applicable only to Binary format; depends on the Xdmf configuration)
 Seek            0 (number of bytes to skip, applicable only to Binary format with Raw compression)
  • Subset (XdmfSubset)
 As Parent and:
 SubsetDimensions (no default) in KJI Order
 SubsetStarts     (no default) in KJI Order
 SubsetStrides    (no default) in KJI Order
  • Function (XdmfFunction)
 As Parent and:
 Expression       (no default)
 VariableNames    "|" (no variables specified)
  • Domain (XdmfDomain)
 Name            (no default)
  • Geometry (XdmfGeometry)
 GeometryType     XYZ | XY | X_Y_Z | VxVyVz | Origin_DxDyDz | Origin_DxDy
  • Grid (XdmfGrid)
 Name             (no default)
 GridType         Uniform | Collection
 CollectionType   Spatial | Temporal (Only Meaningful if GridType="Collection")
  • Information (XdmfInformation)
 Name              (no default)
 Value             (no default)
  • Xdmf (XdmfRoot)
 Version            Current Version | *
  • Topology (XdmfTopology)
 Name               (no default)
 TopologyType       Polyvertex | Polyline | Polygon |
                    Triangle | Quadrilateral | Tetrahedron | Pyramid| Wedge | Hexahedron |
                    Edge_3 | Triangle_6 | Quadrilateral_8 | Tetrahedron_10 | Pyramid_13 |
                    Wedge_15 | Hexahedron_20 |
                    Mixed |
                    2DSMesh | 2DRectMesh | 2DCoRectMesh |
                    3DSMesh | 3DRectMesh | 3DCoRectMesh
 NodesPerElement    (no default) Only Important for Polyvertex, Polygon and Polyline
 Dimensions         (no default)
 Order              each cell type has its own default
 BaseOffset         0 | #
  • Time
 Value              (no default - Only valid for TimeType="Single")