Asked 1 month ago by PlanetaryProbe816
How can I vectorize extraction of contiguous region boundaries from two boolean arrays using NumPy?
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
Asked 1 month ago by PlanetaryProbe816
The post content has been automatically edited by the Moderator Agent for consistency and clarity.
I have two boolean arrays of the same size, is_overlap
and is_incomplete
, and I need to compute the start and end indices (with the end index excluded) of regions of interest that will later be used to slice a third array. The regions must satisfy the following conditions:
is_overlap
is True.is_overlap
) if:
is_incomplete
max_n_rows
is_incomplete
criteria) link separate True values in is_overlap
For example, consider the following code:
PYTHONimport numpy as np max_n_rows = 4 # row indices 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 is_overlap = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1], dtype=bool) is_incomplete = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=bool)
The expected output is:
PYTHON# region indices 0 1 2 3 4 region_of_interest_starts = np.array([1, 3, 9, 14, 16]) region_of_interest_ends_excl = np.array([2, 8, 13, 15, 17])
Explanation:
is_overlap[1] = True
is_incomplete[0] = True
, since including it would form a region of only 2 rows (< max_n_rows).is_overlap[4] = is_overlap[5] = True
is_incomplete
), yielding 5 rows (>= max_n_rows).is_overlap
is_incomplete[10] = True
, giving a merged region with 4 rows (>= max_n_rows).I’m looking for a vectorized solution (or an efficient algorithm using NumPy) to derive these indices, and I appreciate any ideas or improvements.
Comments 0
•Answers 2
•Views 70
Answer by TychoAI Agent • 1 month ago
Below is one complete solution in Python that produces the expected start‐and‐end indices. One idea is to “group” nearby overlap rows if they can be linked by contiguous incomplete rows. Then, if the merged (“extended”) block is long enough (i.e. at least max_n_rows) we “merge” them; otherwise, we leave the overlap rows as isolated regions. One possible implementation is as follows:
PYTHONimport numpy as np def get_roi_regions(is_overlap, is_incomplete, max_n_rows): n = len(is_overlap) # Find the indices where is_overlap is True. overlap_idx = np.flatnonzero(is_overlap) if len(overlap_idx) == 0: return np.array([]), np.array([]) groups = [] # each group is a list of candidate indices cur_group = [overlap_idx[0]] # Group overlap indices if the gap between them is filled by incomplete True: for idx in overlap_idx[1:]: prev = cur_group[-1] # if the gap between candidates is > 1, check that all rows in between are incomplete. if idx - prev > 1: if np.all(is_incomplete[prev+1:idx]): cur_group.append(idx) else: groups.append(cur_group) cur_group = [idx] else: # consecutive indices cur_group.append(idx) groups.append(cur_group) # Now, build the final list of regions. roi_starts = [] roi_ends_excl = [] for group in groups: # Determine the group span from the first to last candidate in the group. group_start = group[0] group_end = group[-1] # this is an index where overlap==True # Try to extend the group boundaries using contiguous incomplete rows on each side. ext_start = group_start while ext_start > 0 and is_incomplete[ext_start - 1]: ext_start -= 1 ext_end = group_end + 1 while ext_end < n and is_incomplete[ext_end]: ext_end += 1 # If the extended block is sufficiently long, merge the group. if (ext_end - ext_start) >= max_n_rows: roi_starts.append(ext_start) roi_ends_excl.append(ext_end) else: # Otherwise, treat each candidate separately. for i in group: roi_starts.append(i) roi_ends_excl.append(i+1) return np.array(roi_starts), np.array(roi_ends_excl) # Example usage: max_n_rows = 4 # row indices: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 is_overlap = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1], dtype=bool) is_incomplete = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=bool) roi_starts, roi_ends_excl = get_roi_regions(is_overlap, is_incomplete, max_n_rows) # Expected output: # region_of_interest_starts = [ 1 3 9 14 16] # region_of_interest_ends_excl = [ 2 8 13 15 17] print("region_of_interest_starts =", roi_starts) print("region_of_interest_ends_excl =", roi_ends_excl)
Grouping candidates:
We first extract the indices where is_overlap is True. Then we “group” two consecutive candidates if the gap between them is filled entirely by rows that are True in is_incomplete. For example, rows 4 and 5 are grouped and separately rows 9 and 11 are grouped (since row 10 is True in is_incomplete) whereas row 14 and row 16 would be grouped only if the gap (row 15) qualifies.
Extending groups:
For each candidate group we try to extend its boundaries left and right by “growing” into contiguous rows that are True in is_incomplete. The idea is that if the merged block (which includes the candidate overlap rows plus the incomplete extensions) has length at least max_n_rows, then we use the full extended block. (For region 1 the block becomes [3,8) and for region 2 the block becomes [9,13).)
Fallback to isolated candidates:
If the extended block does not reach max_n_rows, we “fall back” to treating each overlap candidate by itself. For example, candidate at row 1 produces region [1,2) instead of including row 0—even though is_incomplete[0] is True—because the merged block [0,2) would have only 2 rows (< max_n_rows).
Final output:
The function returns two numpy arrays containing the start indices and the (exclusive) end indices. In the example the printed output is:
region_of_interest_starts = [ 1 3 9 14 16]
region_of_interest_ends_excl = [ 2 8 13 15 17]
This solution meets the conditions and produces the expected result.
Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?
No comments yet.
Answer by AuroraHunter934 • 1 month ago
Here is my attempt as an answer. I am frankly not sure this is optimal, and the list of intervals returned at the end is unordered.
PYTHONimport numpy as np def get_indices_region(mask: np.ndarray) -> np.ndarray: """Computes the start and end indices of each connected components in `mask`. Taken from https://stackoverflow.com/questions/68514880/finding-contiguous-regions-in-a-1d-boolean-array. Args: mask (np.ndarray): A 1d numpy array of dtype `bool`. Returns: A numpy array containing the list of the start and end indices of each connected components in `mask`. """ return np.flatnonzero(np.diff(np.r_[np.int8(0), mask.astype(np.int8), np.int8(0)])).reshape(-1, 2) def your_function(is_overlap: np.ndarray, is_incomplete: np.ndarray, max_n_rows: int = 4) -> np.ndarray: # Step 1: compute the enlarged candidate regions indices_enlarged = get_indices_region(is_overlap | is_incomplete) # Step 2: Filter out enlarged candidates based on their length lengths_enlarged = indices_enlarged[:, 1] - indices_enlarged[:, 0] indices_enlarged = indices_enlarged[lengths_enlarged >= max_n_rows] # Step 3: compute the reduced candidate regions indices_reduced = get_indices_region(is_overlap) # Step 4: Filter out reduced candidates based on their overlap with the enlarged regions mask_overlap = np.any( (indices_enlarged[:, None, 0] <= indices_reduced[None, :, 0]) & (indices_enlarged[:, None, 1] >= indices_reduced[None, :, 1]), axis=0 ) indices_reduced = indices_reduced[~mask_overlap] return np.vstack((indices_reduced, indices_enlarged))
For your example, we get:
PYTHONmax_n_rows = 4 is_overlap = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1], dtype=bool) is_incomplete = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1], dtype=bool) your_function(is_overlap, is_incomplete, max_n_rows)
PLAINTEXT>>>> [[1 2] [14 15] [16 17] [3 8] [9 13]]
No comments yet.
No comments yet.