Hugging Face Cache Directory Configuration

Problem Statement

When working with Hugging Face Transformers and related libraries, models, datasets, and other resources are automatically downloaded and cached on your system. The default cache location is typically in your home directory (~/.cache/huggingface/ on Linux/macOS or C:\Users\username\.cache\huggingface\ on Windows). This can become problematic when:

  • Your home directory has limited disk space
  • You want to organize cache files across multiple storage devices
  • You need to manage different types of cache separately
  • You're working in containerized environments like Docker

The most comprehensive approach is to use environment variables, specifically HF_HOME, which controls the cache location for all Hugging Face libraries (Transformers, Datasets, Hub, etc.).

Bash/Linux/macOS

bash
# Set for current session
export HF_HOME=/path/to/your/cache/directory

# Make permanent by adding to ~/.bashrc, ~/.zshrc, or ~/.profile
echo 'export HF_HOME=/path/to/your/cache/directory' >> ~/.bashrc

Windows

cmd
:: Command Prompt (current session)
set HF_HOME=E:\huggingface_cache

powershell
# PowerShell (current session)
$env:HF_HOME = "E:\huggingface_cache"

# Make permanent (PowerShell)
[Environment]::SetEnvironmentVariable("HF_HOME", "E:\huggingface_cache", "User")

Python Script

python
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'

# Import transformers AFTER setting the environment variable
from transformers import AutoModel, AutoTokenizer
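
As a quick sanity check (the model name below is only an example), download a small file and print where it landed; the path should sit under the directory you set:

python
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'  # before any HF import

from huggingface_hub import hf_hub_download

# Downloads config.json (or reuses the cached copy) and returns its local
# path, which should be under <HF_HOME>/hub
path = hf_hub_download("bert-base-uncased", "config.json")
print(path)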

Alternative Approaches

Specific Library Cache Control

For finer-grained control over different cache types:

bash
export HF_HOME=/my_drive/hf/misc
export HF_DATASETS_CACHE=/my_drive/hf/datasets
export TRANSFORMERS_CACHE=/my_drive/hf/models
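
The same split can be applied from Python, as long as the variables are set before any Hugging Face import (the paths here are placeholders):

python
import os

# Must run before importing transformers or datasets
os.environ['HF_HOME'] = '/my_drive/hf/misc'
os.environ['HF_DATASETS_CACHE'] = '/my_drive/hf/datasets'
os.environ['TRANSFORMERS_CACHE'] = '/my_drive/hf/models'  # deprecated, see below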

Version Compatibility

TRANSFORMERS_CACHE is deprecated in newer versions (v4.36.0+) and will be removed in v5. Use HF_HOME for future-proof configuration.

Per-Model Cache Directory

Specify cache location when loading specific models:

python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "bert-base-uncased", 
    cache_dir="/specific/cache/path"
)

tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",
    cache_dir="/specific/cache/path"
)
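
The datasets library exposes the same per-call override; "imdb" is just an example dataset:

python
from datasets import load_dataset

# cache_dir overrides HF_DATASETS_CACHE for this call only
dataset = load_dataset("imdb", cache_dir="/specific/cache/path")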

Symbolic Links

If environment variables don't work in your setup, relocate the cache and point a symbolic link at the new location:

bash
# Move existing cache (if it exists)
mv ~/.cache/huggingface /desired/cache/path

# Create symbolic links
ln -s /desired/cache/path ~/.cache/huggingface

# Or link specific subdirectories
ln -s /desired/cache/hub ~/.cache/huggingface/hub
ln -s /desired/cache/modules ~/.cache/huggingface/modules

Docker Configuration

For containerized environments:

bash
# Create host directory
mkdir ~/my_hf_cache

# Mount as volume with environment variable
docker run -v ~/my_hf_cache:/app/cache \
           -e HF_HOME="/app/cache" \
           <image_name>

Environment Variable Priority

Hugging Face libraries check environment variables in this order of priority:

  1. Library-specific variables (HF_HUB_CACHE, HF_DATASETS_CACHE, and the deprecated TRANSFORMERS_CACHE)
  2. HF_HOME
  3. XDG_CACHE_HOME + /huggingface
  4. Default system cache location (~/.cache/huggingface)
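
You can observe this precedence directly: huggingface_hub computes its cache paths at import time, so a library-specific variable wins over HF_HOME (paths below are placeholders):

python
import os

# Placeholder paths: the library-specific variable should take priority
os.environ['HF_HOME'] = '/my_drive/hf'
os.environ['HF_HUB_CACHE'] = '/fast_ssd/hub_cache'

from huggingface_hub import constants

print(constants.HF_HUB_CACHE)  # /fast_ssd/hub_cache, not /my_drive/hf/hub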

Best Practices

Organization

Separate different cache types for easier management:

  • Models and other Hub downloads: HF_HUB_CACHE (defaults to <HF_HOME>/hub), or the deprecated TRANSFORMERS_CACHE
  • Datasets: HF_DATASETS_CACHE (defaults to <HF_HOME>/datasets)
  • Everything else (token, cached modules): under HF_HOME

Token Storage

Your Hugging Face Hub access token is stored at <HF_HOME>/token by default. If you need to preserve tokens when clearing cache, set:

bash
export HF_TOKEN_PATH=$HOME/.huggingface_token
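
Recent huggingface_hub versions expose the resolved token location in their constants module, which gives a quick way to confirm the override took effect:

python
from huggingface_hub import constants

# Reflects HF_TOKEN_PATH if set, otherwise <HF_HOME>/token
print(constants.HF_TOKEN_PATH)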

Permanent Changes

For persistent configuration, add environment variables to your shell startup files (~/.bashrc, ~/.zshrc, etc.) rather than setting them temporarily in each session.

Verification

To confirm your cache configuration is working:

python
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'  # before any HF import

from huggingface_hub import constants

# Check where files will be cached
print(f"HF_HOME: {constants.HF_HOME}")
print(f"Hub cache directory: {constants.HF_HUB_CACHE}")

Troubleshooting

If changes don't take effect:

  1. Ensure environment variables are set before importing Hugging Face libraries
  2. Restart your Python interpreter or terminal session after making permanent changes
  3. Check for conflicting environment variables (see the snippet after this list)
  4. Verify directory permissions allow read/write access
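
For point 3, a quick way to dump every cache-related variable that is currently set (the list of names here is an assumption about which ones matter in a typical setup):

python
import os

# Print each cache-related variable, or None if unset
for name in ("HF_HOME", "HF_HUB_CACHE", "TRANSFORMERS_CACHE",
             "HF_DATASETS_CACHE", "XDG_CACHE_HOME", "HF_TOKEN_PATH"):
    print(f"{name} = {os.environ.get(name)}")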

By properly configuring your Hugging Face cache directory, you can efficiently manage disk space while maintaining optimal performance across all Hugging Face libraries.