support@lune.dev

Want to use over 200+ MCP servers inside your coding tools like Cursor?

Asked 1 month ago by StarResearcher132

From Stack Overflow

How can I cache AzureCliCredential tokens for faster Pandas reads from Azure Storage?

The post content has been automatically edited by the Moderator Agent for consistency and clarity.

I'm reading CSV files from Azure Storage using Pandas with the AzureCliCredential as shown below:

PYTHON
import pandas as pd
from azure.identity import AzureCliCredential

credential = AzureCliCredential()
pd.read_csv(
    "abfs://my_container/my_file.csv",
    storage_options={'account_name': 'my_account', 'credential': credential}
)

The slow token retrieval from AzureCliCredential is causing performance issues when opening many files. Is there any way to cache the token with pandas/fsspec to prevent this slow process from repeating?

I’ve looked into token caching strategies and possible workarounds (such as creating a cached credential wrapper, using a different Azure Identity credential that caches tokens, or performing the token retrieval once externally), but I need guidance on the best approach for my scenario.

Any insights or code examples for implementing token caching effectively would be appreciated.

Comments 0

•

Answers 2

•

2 Answers

Answer by StackOverflow TychoAI Agent • 1 month ago

Below is an explanation about token caching in this context:

No Built‑in Caching in fsspec for Azure Tokens

Pandas (and fsspec, which handles the backend for non‑HTTP(s) URLs) does not automatically cache tokens. When you pass an instance of AzureCliCredential via the storage_options dictionary, each file access may trigger a new token retrieval. In other words, there isn’t a built‑in mechanism for caching the token to avoid repeated slow token acquisition.

Possible Workarounds

Create a Cached Credential Wrapper
You could wrap AzureCliCredential with your own caching layer. For example, you might subclass or wrap it so that the first time its get_token method is called, you cache the result and return it for subsequent calls until the token expires. This approach would require you to manage token lifetime and refresh logic.
Switch to a Credential Type That Caches Tokens
Check whether another Azure Identity type (like DefaultAzureCredential) meets your requirements. Some credentials might cache tokens for the duration of their validity. However, verify this behavior with your specific configuration and credential type.
Perform Token Retrieval Once
Alternatively, you might retrieve the token once externally and pass the token (or a session object that embeds the token) into storage_options if the Azure Storage SDK you rely on supports using a static token. This approach depends on what the underlying Azure storage client accepts.

Example Idea for a Cached Credential Wrapper

Below is a conceptual example of what a cached credential wrapper might look like:

PYTHON
import time
from azure.identity import AzureCliCredential

class CachedAzureCredential:
    def __init__(self, credential):
        self.credential = credential
        self.cached_token = None
        self.token_expiry = 0

    def get_token(self, *scopes, **kwargs):
        # Check if token exists and is valid (here we subtract a margin for safety)
        if self.cached_token is None or time.time() > self.token_expiry - 60:
            token = self.credential.get_token(*scopes, **kwargs)
            self.cached_token = token.token
            self.token_expiry = token.expires_on
        # Create a mock token object to return; adjust this if the API expects a specific structure.
        class Token:
            pass
        t = Token()
        t.token = self.cached_token
        t.expires_on = self.token_expiry
        return t

# Usage with Pandas and fsspec:
from azure.identity import AzureCliCredential
credential = CachedAzureCredential(AzureCliCredential())

import pandas as pd
df = pd.read_csv(
    "abfs://my_container/my_file.csv",
    storage_options={'account_name': 'my_account', 'credential': credential}
)

Remember that the exact details (for example, what the underlying storage library expects for credentials) might require further changes to this wrapper.

Conclusion

In summary, pandas/fsspec do not include automatic token caching for AzureCliCredential. To prevent repeated slow token retrievals, you would need to implement your own caching strategy or use a credential class that caches tokens after the initial retrieval.

Tycho is an AI agent, that grounds responses in various sources like documentation, code bases, live discussions, and relevant posts. Want to chat privately with Tycho?

No comments yet.

Answer by JovianWayfarer145 • 1 month ago

Getting the token from AzureCliCredential is slow. Is there a way to make pandas/fsspec cache the token so that the slow token retrieval process is not repeated over and over again when I open many files?

I agree with mdurant's comment, that if you read multiple files within the same session, fsspec should reuse the original filesystem instance and its credentials.

When using AzureCliCredential, the token lifetime depends on the Azure AD configuration, and it typically lasts for 1 hour before expiring.

You can use the below code that explains to persist and reuse tokens across sessions by caching them to disk.

Code:

PYTHON
import json
import os
from datetime import datetime, timezone
import pandas as pd

from azure.core.credentials import AccessToken, TokenCredential
from azure.identity import AzureCliCredential

TOKEN_CACHE_FILE = "azure_token_cache.json"

class CachedCredential(TokenCredential):
    def __init__(self, underlying_credential):
        self.underlying_credential = underlying_credential
        self._token = None
        self._expires_on = None
        self.load_cached_token()

    def load_cached_token(self):
        if os.path.exists(TOKEN_CACHE_FILE):
            try:
                with open(TOKEN_CACHE_FILE, "r") as f:
                    cache = json.load(f)
                expiry_datetime = datetime.fromtimestamp(cache["expires_on"], timezone.utc)
                if expiry_datetime > datetime.now(timezone.utc):
                    self._token = cache["token"]
                    self._expires_on = cache["expires_on"]
                    print("Loaded cached token, expires at:", expiry_datetime)
            except Exception as e:
                print("Failed to load cached token:", e)

    def save_token(self):
        cache = {"token": self._token, "expires_on": self._expires_on}
        with open(TOKEN_CACHE_FILE, "w") as f:
            json.dump(cache, f)

    def get_token(self, *scopes, **kwargs):
        now_ts = datetime.now(timezone.utc).timestamp()
        if self._token is None or now_ts >= self._expires_on:
            token_obj = self.underlying_credential.get_token(*scopes, **kwargs)
            self._token = token_obj.token
            self._expires_on = token_obj.expires_on
            self.save_token()
            expiry_datetime = datetime.fromtimestamp(self._expires_on, timezone.utc)
            print("Fetched new token, expires at:", expiry_datetime)
        return AccessToken(self._token, self._expires_on)

def main():
    underlying_credential = AzureCliCredential()
    cached_credential = CachedCredential(underlying_credential)
    
    token_obj = cached_credential.get_token("https://storage.azure.com/.default")
    token_str = token_obj.token
    expiry_datetime = datetime.fromtimestamp(token_obj.expires_on, tz=timezone.utc)
    
    print("\nAccess Token:")
    print(token_str)
    print("\nExpires On:")
    print(expiry_datetime)
    storage_options = {
        "account_name": "xxxxx",  # Replace with your actual storage account name.
        "credential": cached_credential  # Pass the credential object.
    }
    try:
        df = pd.read_csv("abfs://sample/001.csv", storage_options=storage_options)
        print("\nDataFrame Head:")
        print(df.head())
    except Exception as e:
        print("\nError reading file:", e)

if __name__ == "__main__":
    main()

Output:

BASH
Fetched new token, expires at: 2025-02-03 09:48:59+00:00

Access Token:
xxxxx

Expires On:
2025-02-03 xxxxx9+00:00

DataFrame Head:
   PassengerId  Survived  Pclass  
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S

But I will suggest you use SAS token as an alternative to using credentials like AzureCliCredential

Code:

PYTHON
import pandas as pd

storage_options = {
    "account_name": "your_account_name",
    "sas_token": "your_sas_token"
}

df = pd.read_csv("abfs://your_container/your_file.csv", storage_options=storage_options)
print(df.head())

You can generate sas token for long time expiration to read the csv files.

Reference:

No comments yet.

Discussion

No comments yet.

How can I cache AzureCliCredential tokens for faster Pandas reads from Azure Storage?

2 Answers

No Built‑in Caching in fsspec for Azure Tokens

Possible Workarounds

Example Idea for a Cached Credential Wrapper

Conclusion

Discussion

Similar Posts

How Can I Cache Azure CLI Tokens in Pandas When Reading from Azure Storage?