Hugging Face Cache Directory Configuration

Problem Statement

When working with Hugging Face Transformers and related libraries, models, datasets, and other resources are automatically downloaded and cached on your system. The default cache location is typically in your home directory (~/.cache/huggingface/ on Linux/macOS or C:\Users\username\.cache\huggingface\ on Windows). This can become problematic when:

  • Your home directory has limited disk space
  • You want to organize cache files across multiple storage devices
  • You need to manage different types of cache separately
  • You're working in containerized environments like Docker

The most comprehensive approach is to use environment variables, specifically HF_HOME, which controls the cache location for all Hugging Face libraries (Transformers, Datasets, Hub, etc.).

Bash/Linux/macOS

bash
# Set for current session
export HF_HOME=/path/to/your/cache/directory

# Make permanent by adding to ~/.bashrc, ~/.zshrc, or ~/.profile
echo 'export HF_HOME=/path/to/your/cache/directory' >> ~/.bashrc

Windows

cmd
:: Command Prompt (current session)
set HF_HOME=E:\huggingface_cache

powershell
# PowerShell (current session)
$env:HF_HOME = "E:\huggingface_cache"

# Make permanent (PowerShell)
[Environment]::SetEnvironmentVariable("HF_HOME", "E:\huggingface_cache", "User")

Python Script

python
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'

# Import transformers AFTER setting the environment variable
from transformers import AutoModel, AutoTokenizer
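
As a quick sanity check (the model name below is only an example), download a small file and print where it landed; the path should sit under the directory you set:

python
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'  # before any HF import

from huggingface_hub import hf_hub_download

# Downloads config.json (or reuses the cached copy) and returns its local
# path, which should be under <HF_HOME>/hub
path = hf_hub_download("bert-base-uncased", "config.json")
print(path)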

Alternative Approaches

Specific Library Cache Control

For finer-grained control over different cache types:

bash
export HF_HOME=/my_drive/hf/misc
export HF_DATASETS_CACHE=/my_drive/hf/datasets
export TRANSFORMERS_CACHE=/my_drive/hf/models
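
The same split can be applied from Python, as long as the variables are set before any Hugging Face import (the paths here are placeholders):

python
import os

# Must run before importing transformers or datasets
os.environ['HF_HOME'] = '/my_drive/hf/misc'
os.environ['HF_DATASETS_CACHE'] = '/my_drive/hf/datasets'
os.environ['TRANSFORMERS_CACHE'] = '/my_drive/hf/models'  # deprecated, see below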

Version Compatibility

TRANSFORMERS_CACHE is deprecated in newer versions (v4.36.0+) and will be removed in v5. Use HF_HOME for future-proof configuration.

Per-Model Cache Directory

Specify cache location when loading specific models:

python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "bert-base-uncased", 
    cache_dir="/specific/cache/path"
)

tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",
    cache_dir="/specific/cache/path"
)
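
The datasets library exposes the same per-call override; "imdb" is just an example dataset:

python
from datasets import load_dataset

# cache_dir overrides HF_DATASETS_CACHE for this call only
dataset = load_dataset("imdb", cache_dir="/specific/cache/path")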

Symbolic Links

If environment variables don't work in your setup, relocate the cache and point a symbolic link at the new location:

bash
# Move existing cache (if it exists)
mv ~/.cache/huggingface /desired/cache/path

# Create symbolic links
ln -s /desired/cache/path ~/.cache/huggingface

# Or link specific subdirectories
ln -s /desired/cache/hub ~/.cache/huggingface/hub
ln -s /desired/cache/modules ~/.cache/huggingface/modules

Docker Configuration

For containerized environments:

bash
# Create host directory
mkdir ~/my_hf_cache

# Mount as volume with environment variable
docker run -v ~/my_hf_cache:/app/cache \
           -e HF_HOME="/app/cache" \
           <image_name>

Environment Variable Priority

Hugging Face libraries check environment variables in this order of priority:

  1. Library-specific variables (HF_HUB_CACHE, HF_DATASETS_CACHE, and the deprecated TRANSFORMERS_CACHE)
  2. HF_HOME
  3. XDG_CACHE_HOME + /huggingface
  4. Default system cache location (~/.cache/huggingface)
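
You can observe this precedence directly: huggingface_hub computes its cache paths at import time, so a library-specific variable wins over HF_HOME (paths below are placeholders):

python
import os

# Placeholder paths: the library-specific variable should take priority
os.environ['HF_HOME'] = '/my_drive/hf'
os.environ['HF_HUB_CACHE'] = '/fast_ssd/hub_cache'

from huggingface_hub import constants

print(constants.HF_HUB_CACHE)  # /fast_ssd/hub_cache, not /my_drive/hf/hub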

Best Practices

Organization

Separate different cache types for easier management:

  • Models and other Hub downloads: HF_HUB_CACHE (defaults to <HF_HOME>/hub), or the deprecated TRANSFORMERS_CACHE
  • Datasets: HF_DATASETS_CACHE (defaults to <HF_HOME>/datasets)
  • Everything else (token, cached modules): under HF_HOME

Token Storage

Your Hugging Face Hub access token is stored at <HF_HOME>/token by default. If you need to preserve tokens when clearing cache, set:

bash
export HF_TOKEN_PATH=$HOME/.huggingface_token
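
Recent huggingface_hub versions expose the resolved token location in their constants module, which gives a quick way to confirm the override took effect:

python
from huggingface_hub import constants

# Reflects HF_TOKEN_PATH if set, otherwise <HF_HOME>/token
print(constants.HF_TOKEN_PATH)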

Permanent Changes

For persistent configuration, add environment variables to your shell startup files (~/.bashrc, ~/.zshrc, etc.) rather than setting them temporarily in each session.

Verification

To confirm your cache configuration is working:

python
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'  # before any HF import

from huggingface_hub import constants

# Check where files will be cached
print(f"HF_HOME: {constants.HF_HOME}")
print(f"Hub cache directory: {constants.HF_HUB_CACHE}")

Troubleshooting

If changes don't take effect:

  1. Ensure environment variables are set before importing Hugging Face libraries
  2. Restart your Python interpreter or terminal session after making permanent changes
  3. Check for conflicting environment variables (see the snippet after this list)
  4. Verify directory permissions allow read/write access
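
For point 3, a quick way to dump every cache-related variable that is currently set (the list of names here is an assumption about which ones matter in a typical setup):

python
import os

# Print each cache-related variable, or None if unset
for name in ("HF_HOME", "HF_HUB_CACHE", "TRANSFORMERS_CACHE",
             "HF_DATASETS_CACHE", "XDG_CACHE_HOME", "HF_TOKEN_PATH"):
    print(f"{name} = {os.environ.get(name)}")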

By properly configuring your Hugging Face cache directory, you can efficiently manage disk space while maintaining optimal performance across all Hugging Face libraries.