Our database is composed of 17 tables (see the UML diagram for the main components). The following table shows the number of records in each database table.
(All data on this page is from June 2024.)
table name | number of records |
---|---|
author | 570,492 |
commits | 7,803,628 |
commit_parents | 10,951 |
conflicting_files_discussion | 121,744 |
dataset | 149,828 |
datasets_in_space | 3,381 |
discussion | 273,191 |
discussion_event | 518,924 |
files_in_commit | 21,055,405 |
model | 681,682 |
models_in_space | 183,120 |
modified_file | 21,259,405 |
repo_file | 63,039,567 |
repository | 1,088,879 |
space | 257,342 |
tag | 66,921 |
tags_in_repo | 5,111,538 |
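As a quick consistency check on the figures above, the model, dataset, and space counts can be compared against the repository count. The sketch below assumes that every repository row is exactly one of those three kinds, which this page does not state explicitly; the function name is hypothetical.

```python
# Record counts copied from the table above (June 2024).
RECORD_COUNTS = {
    "repository": 1_088_879,
    "model": 681_682,
    "dataset": 149_828,
    "space": 257_342,
}


def subtype_gap(counts: dict[str, int]) -> int:
    """Return repository rows not accounted for by model + dataset + space.

    Assumption: each repository is a model, a dataset, or a space.
    """
    subtype_total = counts["model"] + counts["dataset"] + counts["space"]
    return counts["repository"] - subtype_total


if __name__ == "__main__":
    print(subtype_gap(RECORD_COUNTS))  # prints 27
```

Under that assumption, the three subtype tables account for all but 27 of the repository records.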
From the UML diagram, we defined the corresponding database schema. In the database, the Repository inheritance is mapped using the concrete table inheritance method, resulting in three tables: model, dataset and repository. Space does not have its own class because it does not contain specific information. Attribute and table names are taken from the attribute names used in the Hugging Face Hub library.
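The mapping described above can be illustrated with a minimal sketch. It is one possible reading of the text, not the actual HFC schema: the column names (`id`, `type`, `likes`, `pipeline_tag`, `description`) and the sample rows are hypothetical, and Python's built-in `sqlite3` is used instead of MariaDB only so that the example runs without a server.

```python
import sqlite3

# Hypothetical columns: shared attributes live in `repository`, while
# `model` and `dataset` hold only their type-specific attributes.
SCHEMA = """
CREATE TABLE repository (
    id    TEXT PRIMARY KEY,
    type  TEXT NOT NULL CHECK (type IN ('model', 'dataset', 'space')),
    likes INTEGER
);
CREATE TABLE model (
    model_id     TEXT PRIMARY KEY REFERENCES repository(id),
    pipeline_tag TEXT
);
CREATE TABLE dataset (
    dataset_id  TEXT PRIMARY KEY REFERENCES repository(id),
    description TEXT
);
"""


def demo() -> list[tuple]:
    """Create the sketch schema in memory and join repositories to models."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # A model needs a row in both `repository` and `model`.
    conn.execute("INSERT INTO repository VALUES ('org/some-model', 'model', 3)")
    conn.execute("INSERT INTO model VALUES ('org/some-model', 'text-generation')")
    # A space needs only the shared `repository` row in this sketch.
    conn.execute("INSERT INTO repository VALUES ('org/some-space', 'space', 1)")
    rows = conn.execute(
        """
        SELECT r.id, r.type, m.pipeline_tag
        FROM repository AS r
        LEFT JOIN model AS m ON m.model_id = r.id
        ORDER BY r.id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In this sketch, the space row has no subtype-specific columns, so the join returns `None` for `pipeline_tag`.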
We deployed our database on a MariaDB server and offer it as a compressed dump file. Note: Due to the high volume of data introduced in v1.1, we only populated the modified_file table for models. The HFC extractor can also be used to enrich the latest HFC dump. After the June 2024 release, we will no longer populate the modified_file table due to size restrictions.
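To restore a dump locally, something along the following lines may work. This is a sketch under several assumptions that this page does not confirm: that the dump is a plain SQL file, optionally gzip-compressed; that the `mariadb` command-line client is installed (older installations ship it as `mysql`); and that the target database already exists. The function names, the `root` default user, and any file or database names passed in are hypothetical.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def build_import_command(database: str, user: str = "root") -> list[str]:
    """Build the client invocation; `--password` makes the client prompt."""
    return ["mariadb", f"--user={user}", "--password", database]


def import_dump(dump_path: Path, database: str, user: str = "root") -> int:
    """Stream a (possibly gzip-compressed) SQL dump into the MariaDB client.

    Assumption: a `.gz` suffix means gzip; any other file is read as plain SQL.
    Returns the exit code of the client process.
    """
    opener = gzip.open if dump_path.suffix == ".gz" else open
    with opener(dump_path, "rb") as sql_stream:
        proc = subprocess.Popen(
            build_import_command(database, user),
            stdin=subprocess.PIPE,
        )
        # Copy in chunks so a multi-gigabyte dump is never held in memory.
        shutil.copyfileobj(sql_stream, proc.stdin)
        proc.stdin.close()
        return proc.wait()
```

Streaming the file keeps memory use flat, which matters given that several tables hold tens of millions of records.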
Date | Download link |
---|---|
October 2024 | |
June 2024a | |
October 2023 | |
September 2023 | |
August 2023 | |
July 2023 | |
June 2023 | |
May 2023 | |
April 2023 | |
March 2023 | |
November 2022 | |
= v1.0
= v1.1

a This release contains the populated modified_file table.