Hub documentation
Bucket Integrations
Bucket Integrations
Storage Buckets can be read and written from many Python data libraries using hf://buckets/ paths, backed by the huggingface_hub filesystem interface.
For the underlying access mechanisms — mounts, volume mounts, and fsspec — see Access Patterns.
pandas
import pandas as pd
df = pd.read_parquet("hf://buckets/username/my-bucket/data.parquet")
df.to_parquet("hf://buckets/username/my-bucket/output.parquet")Dask
import dask.dataframe as dd
df = dd.read_parquet("hf://buckets/username/my-bucket/data.parquet")PyArrow
import pyarrow.parquet as pq
table = pq.read_table("hf://buckets/username/my-bucket/data.parquet")PySpark
With pyspark_huggingface installed:
df = (
spark.read.format("huggingface")
.option("data_files", '["data.parquet"]')
.load("buckets/username/my-bucket")
)See PySpark on the Hub for more.
🤗 Datasets
from datasets import load_dataset
ds = load_dataset("buckets/username/my-bucket", data_files=["data.parquet"])Filesystem operations
For direct file operations, huggingface_hub exposes a pre-instantiated filesystem object, hffs:
from huggingface_hub import hffs
with hffs.open("buckets/username/my-bucket/hello.txt", "w") as f:
f.write("Hello world!")
hffs.cp("buckets/username/my-bucket/hello.txt", "buckets/username/my-bucket/hello2.txt")
hffs.rm("buckets/username/my-bucket/hello2.txt")
files = hffs.ls("buckets/username/my-bucket")
text_files = hffs.glob("buckets/username/my-bucket/*.txt")Other languages
OpenDAL provides a similar filesystem interface for Rust, Java, Go, JavaScript, and more.
Coming soon
Support for more libraries is on the way — including Polars, DuckDB (native hf:// URL support), Daft, and webdataset.