Asked 4 months ago by PlutonianWayfarer559

How can I filter and update a compound HDF5 table using h5py?

I have an existing HDF5 file that contains multiple tables. For one table, I need to drop some rows entirely and modify values in the remaining rows.

I attempted the following code:

PYTHON
import h5py
import numpy as np

with h5py.File("my_file.h5", "r+") as f:
    # Get array
    table = f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"]
    arr = np.array(table)

    # Modify array
    arr = arr[arr[:, 1] == 2]
    arr[:, 1] = 1

    # Write array back
    table[...] = arr

This code produces the following error:

PLAINTEXT
Traceback (most recent call last):

  File "C:\_Work\test.py", line 10, in <module>
    arr[arr[:, 1] == 2]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

It appears that the numpy array arr is a structured (1-dimensional) array instead of a two-dimensional array. I'm unsure how to create or work with a two-dimensional array from an HDF5 compound table, or if this is the best approach.

Could anyone provide guidance on how to correctly filter and update this HDF5 table?

Edit

Output from h5dump on my dataset is as follows:

PLAINTEXT
HDF5 "C:\_Work\my_file.h5" {
DATASET "/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX" {
   DATATYPE  H5T_COMPOUND {
      H5T_STD_I64LE "EID";
      H5T_STD_I64LE "PLY";
      H5T_IEEE_F64LE "X1R";
      H5T_IEEE_F64LE "Y1R";
      H5T_IEEE_F64LE "T1R";
      H5T_IEEE_F64LE "L1R";
      H5T_IEEE_F64LE "L2R";
      H5T_IEEE_F64LE "X1I";
      H5T_IEEE_F64LE "Y1I";
      H5T_IEEE_F64LE "T1I";
      H5T_IEEE_F64LE "L1I";
      H5T_IEEE_F64LE "L2I";
      H5T_STD_I64LE "DOMAIN_ID";
   }
   DATASPACE  SIMPLE { ( 990 ) / ( H5S_UNLIMITED ) }
   ATTRIBUTE "version" {
      DATATYPE  H5T_STD_I64LE
      DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
   }
}
}

Answer by StackOverflow TychoAI Agent • 4 months ago

When reading a compound HDF5 dataset using h5py, you don’t actually get a two-dimensional NumPy array; instead, you obtain a one-dimensional array of records (i.e. a structured array). That’s why trying to index it with two indices (arr[:, 1]) gives an error.

Below are some points and a sample solution to modify (filter and update) your table:

Understanding Structured Arrays
With compound datasets, each element is a record with named fields. For example, your dataset has fields like "EID", "PLY", "X1R", … The array shape is (990,), and you can access individual "columns" by using the field names, e.g. arr["PLY"].

Filtering Rows
Instead of filtering with arr[:, 1] (which assumes a 2D array), filter by the specific field. For example, if you want to select only the rows where the "PLY" field equals 2, do:
arr = arr[arr["PLY"] == 2]

Updating Values
Similarly, to update the "PLY" field (or any other field) in the remaining rows, use the field name:
arr["PLY"] = 1
(Adjust the field name according to your needs.)

Writing Back Changes
Note that if you reduce the number of rows (drop rows), the shape of your modified array will not match the original dataset shape. HDF5 datasets have fixed shapes unless defined as resizable. You can either:
- Write the modified data into a new dataset.
- Resize the existing dataset if it was created with an unlimited dimension.

Below is an example that shows these steps. In this example, we assume that the "PLY" field is the one you want to test and update. Resizing in place only works if the dataset was created with a resizable dimension; the h5dump output in the question shows a maximum size of H5S_UNLIMITED, so that is the case here.

PYTHON
import h5py
import numpy as np

with h5py.File("my_file.h5", "r+") as f:
    # Access the dataset (a compound dataset with shape (990,))
    dataset = f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"]
    
    # Read data into a structured NumPy array
    data = np.array(dataset)
    
    # Filter rows where the 'PLY' field is equal to 2
    filtered = data[data["PLY"] == 2]
    
    # Modify the 'PLY' field in the filtered records to 1
    filtered["PLY"] = 1
    
    # Option 1: Write to a new dataset if you don't want to modify in place
    if "/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX_modified" in f:
        del f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX_modified"]
    f.create_dataset("/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX_modified",
                     data=filtered)
    
    # Option 2: or, if you want to overwrite the existing dataset and it is resizable:
    # Resize dataset to the new shape (this requires that the dataset has an unlimited dimension)
    # dataset.resize(filtered.shape)
    # dataset[...] = filtered

Key Points:

- Structured Array Access: Use field names (e.g., data["PLY"]) for filtering or modifying data.
- Dataset Resizing: HDF5 datasets have fixed sizes unless defined as resizable. Overwriting with a different shape may require creating a new dataset or resizing the existing one.
- Safety: When writing modifications back to an HDF5 file, ensure that you handle potential issues like shape mismatches.

This should help you modify your HDF5 table without encountering the “too many indices” error.
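As a rough illustration of the resize-in-place option, the sketch below builds a small stand-in table in an in-memory file, checks whether its first axis is unlimited, and only then shrinks and overwrites it. The file name "demo.h5", the dataset name "stress", and the three-field dtype are made up for the example; they are not the real Nastran layout.

```python
import h5py
import numpy as np

# Hypothetical miniature of the table: three fields, four rows.
dtype = np.dtype([("EID", "i8"), ("PLY", "i8"), ("X1R", "f8")])
rows = np.array(
    [(101, 1, 0.5), (101, 2, 1.5), (102, 1, 2.5), (102, 2, 3.5)], dtype=dtype
)

# In-memory file, so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(None,) makes the first axis unlimited (H5S_UNLIMITED in h5dump).
    ds = f.create_dataset("stress", data=rows, maxshape=(None,))

    # An unlimited axis is reported as None in maxshape.
    resizable = ds.maxshape[0] is None

    data = ds[()]                      # read everything into a structured array
    kept = data[data["PLY"] == 2]      # boolean indexing returns a copy
    kept["PLY"] = 1

    if resizable:
        ds.resize(kept.shape)          # shrink first so the shapes match
        ds[...] = kept                 # then overwrite the remaining rows

    result = ds[()]

print(result)
```

If maxshape reports a fixed size instead of None, the new-dataset route shown earlier is the fallback.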


Answer by CosmicScholar421 • 6 months ago

I am familiar with MSC.Nastran's HDF5 output. First some background. HDF5 datasets can store homogeneous data (for example, all floats or ints) or heterogeneous data (required when you have mixed data types). Nastran creates heterogeneous datasets. Heterogeneous data is stored in rows with field names that define the "columns" (kind of like a spreadsheet).

Some of the confusion of dataset vs table comes from PyTables' nomenclature. It has different objects for homogeneous and heterogeneous data. The table object is used for heterogeneous data.

h5py behaves very much like numpy. It uses a dataset object for both dataset types. Dataset behavior is similar to numpy arrays. (For example, when you are reading data, you don't need to create an array -- you can simply reference the dataset object.) You determine the datatype by inspecting the dataset's dtype attribute. Output of this attribute is a list of tuples with the field name and datatype (and an array dimension for vector/tensor data).

Code below shows how to get the datatype for your data:

PYTHON
import h5py

with h5py.File("my_file.h5", "r+") as h5f:
    # create a dataset object  
    stress_ds = h5f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"]  
    print(stress_ds.dtype)

Based on your h5dump output, I expect you will get something like this (I'm 99% sure this dataset doesn't have any array data):

PYTHON
[('EID', 'i8'), ('PLY', 'i8'),
 ('X1R', 'f8'), ('Y1R', 'f8'), ('T1R', 'f8'), ('L1R', 'f8'), ('L2R', 'f8'),
 ('X1I', 'f8'), ('Y1I', 'f8'), ('T1I', 'f8'), ('L1I', 'f8'), ('L2I', 'f8'),
 ('DOMAIN_ID', 'i8')]

Based on the DATASPACE line in your h5dump output, you have 990 rows of data (total of elements and plies). It's like a 2-d array, but you reference rows with integer indices and columns with field names.
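The row-by-index, column-by-field-name access can be tried on a small structured array without touching any HDF5 file. This is only a sketch: the three-field dtype and the values are invented stand-ins for the real table. It also shows one way to get a genuine 2-D array, using NumPy's structured_to_unstructured helper, in case that is really what is wanted.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical miniature of the table: three fields, four rows.
dtype = np.dtype([("EID", "i8"), ("PLY", "i8"), ("X1R", "f8")])
arr = np.array(
    [(101, 1, 0.5), (101, 2, 1.5), (102, 1, 2.5), (102, 2, 3.5)], dtype=dtype
)

print(arr.shape)       # (4,): one dimension, one record per row
print(arr[1])          # the whole second row
print(arr["PLY"])      # the whole PLY "column"
print(arr["X1R"][1])   # a single value: row 1, field X1R

# Converting to a plain 2-D array promotes the mixed int/float fields
# to one common type (float64 here), so the integer IDs become floats.
arr2d = rfn.structured_to_unstructured(arr)
print(arr2d.shape)     # (4, 3)
```

Because the conversion loses the field names and the integer types, staying with the structured array is usually the simpler choice for this kind of edit.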

That covers the basics. Now on to extracting and manipulating the data. First, use this line to extract the entire dataset to an array. Notice the [()] at the end. It tells h5py to extract the entire array. You can also use numpy slice notation to extract subsets of the data. More on that later.

PYTHON
stress_arr = h5f["/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX"][()]  
# or if you created the stress_ds object, use  
stress_arr = stress_ds[()]  
# checking the array dtype should give the same result as the dataset:  
print(stress_arr.dtype)

So, that gives you the array of data to modify. Unfortunately, based on your code, I don't understand how you want to modify it.

Expanding on the slice nomenclature, you can use this notation to access any row or column (or combination). Several examples shown below:

PYTHON
stress_arr_0 = stress_ds[0]  # gets all data for the 1st row  
stress_arr_eids = stress_ds['EID']  # gets all element ids (only)  
stress_arr_0_eid = stress_ds[0]['EID']  # gets eid for 1st row (only)

Writing data back to the array is done the same way. This would set the element id in the first row to 1000.

PYTHON
stress_arr[0]['EID'] = 1000

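Writing data back to the dataset also uses integer row indices and field names, but the assignment has to go through the dataset object itself to reach the file, for example by assigning a modified array back to a slice of the dataset (stress_ds[0:1] = rows). However, be careful here -- do you really want to modify your Nastran output? Seems dangerous to me.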
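One pitfall is worth a small demonstration: indexing a dataset returns a copy, so editing that copy does not touch the file. The sketch below contrasts this with a read-modify-write on a slice. It uses an in-memory file with made-up names ("demo.h5", "stress") and an invented three-field dtype, so treat it as an illustration rather than a recipe for the real file.

```python
import h5py
import numpy as np

# Hypothetical miniature of the table: three fields, two rows.
dtype = np.dtype([("EID", "i8"), ("PLY", "i8"), ("X1R", "f8")])
rows = np.array([(101, 1, 0.5), (101, 2, 1.5)], dtype=dtype)

# In-memory file, so nothing is written to disk.
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("stress", data=rows)

    # ds[0] reads row 0 into a temporary NumPy record; this edits only that copy.
    ds[0]["EID"] = 1000
    unchanged = int(ds[0]["EID"])

    # Read-modify-write: pull the rows out, edit the array, assign it back.
    block = ds[0:1]
    block["EID"] = 1000
    ds[0:1] = block
    updated = int(ds[0]["EID"])

print(unchanged, updated)
```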


Answer by EclipseSatellite243 • 6 months ago

This answer is specifically focused on OP's request in comments to "throw away all rows where the value for PLY is not 2. Then in the remaining rows change the value for PLY from 2 to 1".

The procedure is relatively straightforward...if you know the tricks. Steps are summarized here, with matching comments in the code:

1. Create the stress dataset object (but don't extract it to an array).
2. Rename/move the original output dataset to a saved name (not required, but good practice).
3. Create a new stress array by extracting the rows where PLY==2. This is the most sophisticated step: np.nonzero() returns the row indices that match the condition stress_ds['PLY']==2, and those indices are then used to slice values from the dataset.
4. Modify all rows in the new array from PLY ID 2 to 1.
5. Save the new array to a dataset with the original name.

Code below:

PYTHON
import h5py
import numpy as np

with h5py.File('quad4_comp_cplx_test.h5', 'r+') as h5f:
    # Create stress dataset object
    stress_ds = h5f['/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX']
    ## stress array below not reqd
    ## stress_arr = stress_ds[()]  
    print(stress_ds.shape)   

    # Rename/move original output dataset to saved name 
    h5f.move('/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX',
             '/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX_save')

    # Slice a stress array from dataset using indices where PLY==2   
    # modified reference from stress_arr to stress_ds
    ## mod_stress_arr = stress_arr[np.nonzero(stress_arr['PLY']==2)] 
    mod_stress_arr = stress_ds[np.nonzero(stress_ds['PLY']==2)]
    print(mod_stress_arr.shape) 

    # Modify PLY ID from 2 to 1 for all rows
    mod_stress_arr[:]['PLY'] = 1
        
    # Finally, save the ply stress array to a dataset with the original name
    h5f.create_dataset('/NASTRAN/RESULT/ELEMENTAL/STRESS/QUAD4_COMP_CPLX', 
                                    data=mod_stress_arr)
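To see what the np.nonzero() step does on its own, here is a NumPy-only sketch on an invented four-row array (the dtype and values are assumptions for illustration). It suggests that, for an in-memory array, indexing with the nonzero indices and indexing with the boolean mask directly select the same rows, and that the selection is a copy, so the source array is left alone.

```python
import numpy as np

# Hypothetical miniature of the table: three fields, four rows.
dtype = np.dtype([("EID", "i8"), ("PLY", "i8"), ("X1R", "f8")])
arr = np.array(
    [(101, 1, 0.5), (101, 2, 1.5), (102, 1, 2.5), (102, 2, 3.5)], dtype=dtype
)

mask = arr["PLY"] == 2      # one boolean per row
idx = np.nonzero(mask)      # 1-tuple holding the matching row indices

by_index = arr[idx]         # select rows via integer indices
by_mask = arr[mask]         # select rows via the boolean mask
same = by_index.tolist() == by_mask.tolist()

# Both selections are copies, so this does not change arr itself.
by_mask["PLY"] = 1
print(by_mask["EID"], by_mask["PLY"])
```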
