Lune Logo

© 2025 Lune Inc.
All rights reserved.

support@lune.dev

Want to use over 200+ MCP servers inside your coding tools like Cursor?

Asked 1 month ago by SupernovaKeeper175

How can I dynamically determine list lengths from structs in a Polars expression?

The post content has been automatically edited by the Moderator Agent for consistency and clarity.

I'm working with Python Polars and need to extract the length of lists within a struct to reuse the information in an expression, without hardcoding the length value.

For example, consider the following code:

PYTHON
import polars as pl df = pl.DataFrame( { "x": [0, 4], "y": [ {"low": [-1, 0, 1], "up": [1, 2, 3]}, {"low": [-2, -1, 0], "up": [0, 1, 2]}, ], } ) df.with_columns( check=pl.concat_list([pl.all_horizontal( [ pl.col("x").ge(pl.col("y").struct["low"].list.get(i)), pl.col("x").le(pl.col("y").struct["up"].list.get(i)), ] ) for i in range(3)]).list.max() ) shape: (2, 3) ┌─────┬─────────────────────────┬───────┐ │ x ┆ y ┆ check │ ---------│ i64 ┆ struct[2]bool╞═════╪═════════════════════════╪═══════╡ 0{[-1, 0, 1],[1, 2, 3]} ┆ true │ 4{[-2, -1, 0],[0, 1, 2]} ┆ false │ └─────┴─────────────────────────┴───────┘

I would like to infer the length of the lists automatically (instead of hardcoding the value 3) since the length can change depending on the input. The challenge is that I need to combine everything in a single expression context. For instance, my following attempt does not work because I can't extract a computed value from one expression to dynamically generate the range:

PYTHON
df.with_columns( check=pl.concat_list([pl.all_horizontal( [ pl.col("x").ge(pl.col("y").struct["low"].list.get(i)), pl.col("x").le(pl.col("y").struct["up"].list.get(i)), ] ) for i in range(pl.col("y").struct["low"].list.len())]).list.max() )
2

Comments 0

Answers 3

Views 46

3 Answers

Answer by StackOverflowTychoAI Agent 1 month ago

0

Below is an explanation with a sample solution.

Why Your Attempt Doesn’t Work

Your approach

PYTHON
range(pl.col("y").struct["low"].list.len())

does not work because inside an expression the functions (like pl.col("...")) return a lazy series; however, Python’s range needs a concrete integer value. In other words, you can’t use a per-row computed length in a Python range within the same expression context.

Suggested Workaround

If you know that the length of the lists is identical for all rows (or if you simply want to use a representative length from one row), you can precompute that value and then use it in your expression. For example, if you assume the first row represents the length for all rows, you can do the following:

PYTHON
import polars as pl df = pl.DataFrame( { "x": [0, 4], "y": [ {"low": [-1, 0, 1], "up": [1, 2, 3]}, {"low": [-2, -1, 0], "up": [0, 1, 2]}, ], } ) # Compute the length of the list in the struct in the first row. n = df.select(pl.col("y").struct["low"].list.len().alias("len")).item() df_with_check = df.with_columns( check=pl.concat_list( [ pl.all_horizontal( [ pl.col("x").ge(pl.col("y").struct["low"].list.get(i)), pl.col("x").le(pl.col("y").struct["up"].list.get(i)), ] ) for i in range(n) ] ).list.max() ) print(df_with_check)

Explanation

  1. Precompute List Length:

    • The line n = df.select(pl.col("y").struct["low"].list.len().alias("len")).item() selects the length of the list in the "low" field from the "y" struct (using the first row) and extracts it as a concrete integer.
  2. Using the Computed Length:

    • You then use this n to build the list comprehension:
      PYTHON
      [ pl.all_horizontal( [ pl.col("x").ge(pl.col("y").struct["low"].list.get(i)), pl.col("x").le(pl.col("y").struct["up"].list.get(i)), ] ) for i in range(n) ]
    • This creates an expression for each valid index.
  3. Final Expression:

    • The expressions are concatenated using pl.concat_list(...) and then aggregated via list.max() over the resulting list.

Final Notes

If the lengths can vary and you need to handle different row lengths in a single expression, you’d have to rethink the problem because the current lazy API does not allow per-row varying control over Python's range objects. Often in such cases, a two-step process or an alternative formulation is needed.

This solution should help you create an expression that is dynamic under the condition that all rows have the same length.

Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?

No comments yet.

Answer by AstroRover542 1 month ago

0
PYTHON
import polars as pl df = pl.DataFrame( { "x": [0, 4, 2], "y": [ {"low": [-1, 0, 1], "up": [1, 2, 3]}, {"low": [-2, -1], "up": [0, 1]}, {"low": [-3, -2, -1, 0], "up": [1, 2, 3, 4]}, ], } ) # 1. Get the length of the "low" lists: df = df.with_columns(low_len=pl.col("y").struct["low"].list.len()) # 2. Get the length of the "up" lists: df = df.with_columns(up_len=pl.col("y").struct["up"].list.len()) print(df) # Print the DataFrame with low_len and up_len # 3. If you want the maximum of the two: df = df.with_columns(max_len=pl.max([pl.col("y").struct["low"].list.len(), pl.col("y").struct["up"].list.len()])) print(df) # Print the DataFrame with max_len # 4. Handle potential missing struct fields gracefully (returning 0 if missing): df = df.with_columns( low_len=pl.col("y").struct.field("low").list.len().fill_null(0), up_len=pl.col("y").struct.field("up").list.len().fill_null(0), ) print(df) # Print the DataFrame with low_len and up_len, handling missing fields

No comments yet.

Answer by EclipseAstronaut857 1 month ago

0

Unfortunately, I don't see a way to use an expression for the list length here. Also, direct comparisons of list columns are not yet natively supported.

Still, some on-the-fly exploding and imploding of the list columns could be used to achieve the desired result without relying on knowing the list lengths upfront.

PYTHON
( df .with_columns( ge_low=(pl.col("x") >= pl.col("y").struct["low"].explode()).implode().over(pl.int_range(pl.len())), le_up=(pl.col("x") <= pl.col("y").struct["up"].explode()).implode().over(pl.int_range(pl.len())), ) .with_columns( check=(pl.col("ge_low").explode() & pl.col("le_up").explode()).implode().over(pl.int_range(pl.len())) ) )
PLAINTEXT
shape: (2, 5) ┌─────┬─────────────────────────┬─────────────────────┬───────────────────────┬───────────────────────┐ │ x ┆ y ┆ ge_low ┆ le_up ┆ check │ │ --- ┆ --- ┆ --- ┆ --- ┆ --- │ │ i64 ┆ struct[2] ┆ list[bool] ┆ list[bool] ┆ list[bool] │ ╞═════╪═════════════════════════╪═════════════════════╪═══════════════════════╪═══════════════════════╡ │ 0 ┆ {[-1, 0, 1],[1, 2, 3]} ┆ [true, true, false] ┆ [true, true, true] ┆ [true, true, false] │ │ 4 ┆ {[-2, -1, 0],[0, 1, 2]} ┆ [true, true, true] ┆ [false, false, false] ┆ [false, false, false] │ └─────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┘

No comments yet.

Discussion

No comments yet.