Hugging Face Cache Directory Configuration
Problem Statement
When working with Hugging Face Transformers and related libraries, models, datasets, and other resources are automatically downloaded and cached on your system. The default cache location is in your home directory (~/.cache/huggingface/ on Linux/macOS or C:\Users\username\.cache\huggingface\ on Windows). This can become problematic when:
- Your home directory has limited disk space
- You want to organize cache files across multiple storage devices
- You need to manage different types of cache separately
- You're working in containerized environments like Docker
Recommended Solution: Environment Variables
The most comprehensive approach is to use environment variables, specifically HF_HOME, which controls the cache location for all Hugging Face libraries (Transformers, Datasets, Hub, etc.).
Bash/Linux/macOS
# Set for current session
export HF_HOME=/path/to/your/cache/directory
# Make permanent by adding to ~/.bashrc, ~/.zshrc, or ~/.profile
echo 'export HF_HOME=/path/to/your/cache/directory' >> ~/.bashrc
Windows
# Command Prompt
set HF_HOME=E:\huggingface_cache
# PowerShell
$env:HF_HOME = "E:\huggingface_cache"
# Make permanent (PowerShell)
[Environment]::SetEnvironmentVariable("HF_HOME", "E:\huggingface_cache", "User")
Python Script
import os
os.environ['HF_HOME'] = '/path/to/your/cache/directory'
# Import transformers AFTER setting the environment variable
from transformers import AutoModel, AutoTokenizer
Alternative Approaches
Specific Library Cache Control
For finer-grained control over different cache types:
export HF_HOME=/my_drive/hf/misc
export HF_DATASETS_CACHE=/my_drive/hf/datasets
export TRANSFORMERS_CACHE=/my_drive/hf/models
Version Compatibility
TRANSFORMERS_CACHE is deprecated as of Transformers v4.36.0 and will be removed in v5. Use HF_HOME for future-proof configuration.
Per-Model Cache Directory
Specify cache location when loading specific models:
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained(
"bert-base-uncased",
cache_dir="/specific/cache/path"
)
tokenizer = AutoTokenizer.from_pretrained(
"bert-base-uncased",
cache_dir="/specific/cache/path"
)
Symbolic Links (Fallback Solution)
If environment variables don't work in your setup:
# Move existing cache (if it exists)
mv ~/.cache/huggingface /desired/cache/path
# Create symbolic links
ln -s /desired/cache/path ~/.cache/huggingface
# Or link specific subdirectories
ln -s /desired/cache/hub ~/.cache/huggingface/hub
ln -s /desired/cache/modules ~/.cache/huggingface/modules
Docker Configuration
For containerized environments:
# Create host directory
mkdir ~/my_hf_cache
# Mount as volume with environment variable
docker run -v ~/my_hf_cache:/app/cache \
-e HF_HOME="/app/cache" \
<image_name>
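For image builds, the same variable can be baked into the image with ENV so every container uses the relocated cache by default. A minimal sketch (base image, paths, and packages are illustrative):

```dockerfile
FROM python:3.11-slim
# Cache all Hugging Face downloads under a dedicated, mountable path
ENV HF_HOME=/app/cache
RUN pip install transformers
# At run time, mount a host directory over /app/cache to persist the cache
# across container restarts (e.g. docker run -v ~/my_hf_cache:/app/cache ...)
```

Setting ENV in the Dockerfile means the variable is present before any Python process starts, which avoids the import-ordering pitfall described above.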
Environment Variable Priority
Hugging Face libraries check environment variables in this order of priority:
- Library-specific variables (TRANSFORMERS_CACHE, HF_DATASETS_CACHE)
- HF_HOME
- XDG_CACHE_HOME + /huggingface
- Default system cache location (~/.cache/huggingface)
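The lookup order can be sketched as a small helper. Note that resolve_model_cache below is an illustrative function written for this article, not part of any Hugging Face library; it also reflects the fact that model files land under a hub/ subdirectory of HF_HOME:

```python
import os

def resolve_model_cache(env):
    """Illustrative sketch of the documented lookup order; this is NOT the
    real library code, just a model of its behavior."""
    if "TRANSFORMERS_CACHE" in env:                 # 1. library-specific variable
        return env["TRANSFORMERS_CACHE"]
    if "HF_HOME" in env:                            # 2. shared HF_HOME (models under hub/)
        return os.path.join(env["HF_HOME"], "hub")
    if "XDG_CACHE_HOME" in env:                     # 3. XDG base directory
        return os.path.join(env["XDG_CACHE_HOME"], "huggingface", "hub")
    return os.path.expanduser("~/.cache/huggingface/hub")  # 4. default

print(resolve_model_cache({"HF_HOME": "/data/hf"}))       # /data/hf/hub
print(resolve_model_cache({"TRANSFORMERS_CACHE": "/m"}))  # /m
```

This also explains why a stale TRANSFORMERS_CACHE left in a shell profile can silently override a newly set HF_HOME.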
Best Practices
Organization
Separate different cache types for easier management:
- Models: HF_HOME/hub or TRANSFORMERS_CACHE
- Datasets: HF_DATASETS_CACHE
- Miscellaneous: HF_HOME for other Hub resources
Token Storage
Your Hugging Face Hub access token is stored at <HF_HOME>/token by default. If you need to preserve the token when clearing the cache, set:
export HF_TOKEN_PATH=$HOME/.huggingface_token
Permanent Changes
For persistent configuration, add environment variables to your shell startup files (~/.bashrc, ~/.zshrc, etc.) rather than setting them temporarily in each session.
Verification
To confirm your cache configuration is working:
# Note: transformers.cached_path has been removed in recent releases;
# huggingface_hub exposes the resolved cache paths instead
from huggingface_hub import constants
print(f"HF_HOME: {constants.HF_HOME}")
print(f"Hub cache directory: {constants.HF_HUB_CACHE}")
Troubleshooting
If changes don't take effect:
- Ensure environment variables are set before importing Hugging Face libraries
- Restart your Python interpreter or terminal session after making permanent changes
- Check for conflicting environment variables
- Verify directory permissions allow read/write access
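A quick way to spot conflicting variables is to print every cache-related setting in one place. The list below is based on the variables discussed in this article, plus HF_HUB_CACHE, a hub-specific variable not covered above:

```python
import os

# Variables that can influence where Hugging Face libraries cache files
CACHE_VARS = (
    "HF_HOME",
    "HF_HUB_CACHE",
    "TRANSFORMERS_CACHE",   # deprecated, but may still override HF_HOME if set
    "HF_DATASETS_CACHE",
    "XDG_CACHE_HOME",
)

for var in CACHE_VARS:
    print(f"{var} = {os.environ.get(var, '<unset>')}")
```

If a variable you never set shows up here, check your shell startup files and any container or CI configuration for a leftover export.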
By properly configuring your Hugging Face cache directory, you can efficiently manage disk space while maintaining optimal performance across all Hugging Face libraries.