Script docs
Script to retrieve Hugging Face Hub (HFH) and Git data, and to create and populate a MariaDB database.
- databaseImport.check_database_schema(cursor, database)[source]
Auxiliary function that checks whether all the required tables are in the database.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
database (str) – MariaDB database name.
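A check of this kind can be sketched by querying `information_schema`. This is a minimal sketch, not the script's actual implementation: the name `find_missing_tables`, the decision to return the missing names, and the `REQUIRED_TABLES` set (collected from the table names mentioned on this page) are all assumptions.

```python
# Hypothetical list: gathered from the tables named on this page.
# The real schema (see the ER diagram) may contain more tables.
REQUIRED_TABLES = {
    "repository", "model", "dataset", "space", "author",
    "commits", "modified_file", "files_in_commit",
    "discussion", "discussion_event", "conflicting_files_discussion",
    "repo_file", "tag", "tags_in_repo",
    "models_in_space", "datasets_in_space",
}


def find_missing_tables(cursor, database, required=REQUIRED_TABLES):
    """Return the sorted names of required tables absent from `database`."""
    cursor.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s",
        (database,),
    )
    existing = {row[0] for row in cursor.fetchall()}
    return sorted(set(required) - existing)
```

An empty result would mean that every table in the assumed list is present.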
- databaseImport.create_connection_mysql()[source]
Configures a connection to a MySQL/MariaDB database. The database configuration must be provided as a JSON file called hfc.config located in the path.
- Returns:
Tuple containing the MySQL connector to the database specified in the configuration file and the database name.
- Return type:
(connection.MySQLConnection, str)
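The configuration step might look like the sketch below. The key names in the JSON file (`host`, `port`, `user`, `password`, `database`) and the function names are hypothetical, since this page does not list them; only `mysql.connector.connect` and its keyword arguments come from the MySQL Connector/Python library.

```python
import json


def load_db_config(path="hfc.config"):
    """Read the JSON configuration file and return it as a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def create_connection_sketch(path="hfc.config"):
    """Return (connection, database_name), mirroring the documented tuple."""
    # Imported lazily so that load_db_config works without the driver.
    import mysql.connector

    config = load_db_config(path)
    # Hypothetical key names: adapt them to the real hfc.config layout.
    conn = mysql.connector.connect(
        host=config["host"],
        port=config.get("port", 3306),
        user=config["user"],
        password=config["password"],
        database=config["database"],
    )
    return conn, config["database"]
```

Under these assumptions, `conn, database = create_connection_sketch()` followed by `cursor = conn.cursor()` would give the objects that the other functions on this page expect.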
- databaseImport.create_schema_mysql(cursor)[source]
Creates the database schema, following this ER diagram.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
- databaseImport.onerror(func, path, exc_info)[source]
Error handler for shutil.rmtree. Intended for Windows usage (e.g., "Access Denied" errors).
If the error is due to an access error (read-only file), it attempts to add write permission and then retries.
If the error is for another reason, it re-raises the error.
Usage: shutil.rmtree(path, onerror=onerror)
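The behaviour described above matches a widely used pattern, sketched here under the hypothetical name `onerror_sketch`. The script's own handler may differ in detail.

```python
import os
import shutil
import stat


def onerror_sketch(func, path, exc_info):
    """Retry a failed removal after granting write permission."""
    if not os.access(path, os.W_OK):
        # Read-only target: add the owner write bit, then retry the call
        # (os.remove, os.rmdir, ...) that originally failed.
        os.chmod(path, os.stat(path).st_mode | stat.S_IWUSR)
        func(path)
    else:
        # Any other cause: re-raise the original exception.
        raise exc_info[1]
```

It would be passed as `shutil.rmtree(path, onerror=onerror_sketch)`. Since Python 3.12 the `onerror` parameter is deprecated in favour of `onexc`, whose handler receives the exception instance instead of the `exc_info` tuple.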
- databaseImport.populate_commits(cursor, conn, repo_name, repo_type)[source]
Imports commit information using PyDriller. It inserts commit information into the commits, author, modified_file and files_in_commit tables.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
conn (connection.MySQLConnection) – The MySQL connector to the database specified in the configuration file. Used to commit changes to fulfill FK restrictions.
repo_name (str) – The full name (i.e., “owner/repo_name”) of the repository.
repo_type (str) – The type of the repository (i.e., model, dataset or space)
- databaseImport.populate_datasets(cursor, conn, api, limit_index, limit_date)[source]
Imports dataset information. It retrieves the whole set of datasets from HFH or, optionally, the first limit_index elements. It inserts dataset information into the repository, dataset and author tables, and calls the rest of the populate methods to fill the remaining tables (not including populate_models and populate_spaces).
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
conn (connection.MySQLConnection) – A MySQL connector to the database specified in the configuration file. Used to commit changes to fulfill FK restrictions.
api (huggingface_hub.HfApi) – The huggingface_hub API object pointer.
limit_index (int) – Upper bound of the slice taken from the set of datasets (only the first limit_index elements are retrieved).
limit_date (datetime) – Date from which it starts to update the database (e.g., update the database values of the datasets that have been modified in the last month).
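How limit_index and limit_date could interact is sketched below for any of the three repository types. The helper name `select_repos`, the `last_modified` attribute, and the order of operations (slice first, then filter by date) are assumptions made for illustration, not the script's confirmed behaviour.

```python
from datetime import datetime, timezone
from itertools import islice
from types import SimpleNamespace


def select_repos(repos, limit_index=None, limit_date=None):
    """Keep the first `limit_index` repos, then those modified since `limit_date`."""
    # islice with stop=None keeps the whole iterable.
    selected = islice(repos, limit_index)
    if limit_date is None:
        return list(selected)
    return [r for r in selected if r.last_modified >= limit_date]


# Stand-in objects; the real items would come from the huggingface_hub API.
repos = [
    SimpleNamespace(id="a/one", last_modified=datetime(2023, 1, 10, tzinfo=timezone.utc)),
    SimpleNamespace(id="b/two", last_modified=datetime(2023, 3, 5, tzinfo=timezone.utc)),
    SimpleNamespace(id="c/three", last_modified=datetime(2023, 3, 20, tzinfo=timezone.utc)),
]
recent = select_repos(
    repos, limit_index=2, limit_date=datetime(2023, 3, 1, tzinfo=timezone.utc)
)
```

With these stand-in values, only `b/two` survives: `c/three` is cut by the slice and `a/one` by the date.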
- databaseImport.populate_discussions(cursor, conn, api, repo_name, repo_type)[source]
Imports discussion information. It inserts discussion information into the discussion, author, conflicting_files_discussion and discussion_event tables.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
conn (connection.MySQLConnection) – A MySQL connector to the database specified in the configuration file. Used to commit changes to fulfill FK restrictions.
api (huggingface_hub.HfApi) – The huggingface_hub API object pointer.
repo_name (str) – The full name (i.e., “owner/repo_name”) of the repository.
repo_type (str) – The type of the repository (i.e., model, dataset or space)
- databaseImport.populate_files(cursor, api, repo_name, repo_type)[source]
Imports file information. It inserts file information into the repo_file table.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
api (huggingface_hub.HfApi) – The huggingface_hub API object pointer.
repo_name (str) – The full name (i.e., “owner/repo_name”) of the repository.
repo_type (str) – The type of the repository (i.e., model, dataset or space)
- databaseImport.populate_models(cursor, conn, api, limit_index, limit_date)[source]
Imports model information. It retrieves the whole set of models from HFH or, optionally, the first limit_index elements. It inserts model information into the repository, model and author tables, and calls the rest of the populate methods to fill the remaining tables (not including populate_datasets and populate_spaces).
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
conn (connection.MySQLConnection) – A MySQL connector to the database specified in the configuration file. Used to commit changes to fulfill FK restrictions.
api (huggingface_hub.HfApi) – The huggingface_hub API object pointer.
limit_index (int) – Upper bound of the slice taken from the set of models (only the first limit_index elements are retrieved).
limit_date (datetime) – Date from which it starts to update the database (e.g., update the database values of the models that have been modified in the last month).
- databaseImport.populate_space_dependencies(cursor, models, datasets, repo_name)[source]
Imports space dependency information. It inserts the models and datasets used by a space into the models_in_space and datasets_in_space tables.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
models (list[str]) – A list of model ids.
datasets (list[str]) – A list of dataset ids.
repo_name (str) – The name of the space repository.
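A minimal sketch of such an insertion follows. The column names (`space_id`, `model_id`, `dataset_id`) are hypothetical, since the real ones are defined by the ER diagram, and the use of `INSERT IGNORE` to skip duplicate rows is likewise an assumption.

```python
def populate_space_dependencies_sketch(cursor, models, datasets, repo_name):
    """Link a space to the models and datasets it uses (hypothetical columns)."""
    if models:
        cursor.executemany(
            "INSERT IGNORE INTO models_in_space (space_id, model_id) "
            "VALUES (%s, %s)",
            [(repo_name, model_id) for model_id in models],
        )
    if datasets:
        cursor.executemany(
            "INSERT IGNORE INTO datasets_in_space (space_id, dataset_id) "
            "VALUES (%s, %s)",
            [(repo_name, dataset_id) for dataset_id in datasets],
        )
```

Passing the values as parameters, rather than formatting them into the SQL string, leaves quoting to the connector.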
- databaseImport.populate_spaces(cursor, conn, api, limit_index, limit_date)[source]
Imports space information. It retrieves the whole set of spaces from HFH or, optionally, the first limit_index elements. It inserts space information into the repository, space and author tables, and calls the rest of the populate methods to fill the remaining tables (not including populate_models and populate_datasets).
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
conn (connection.MySQLConnection) – A MySQL connector to the database specified in the configuration file. Used to commit changes to fulfill FK restrictions.
api (huggingface_hub.HfApi) – The huggingface_hub API object pointer.
limit_index (int) – Upper bound of the slice taken from the set of spaces (only the first limit_index elements are retrieved).
limit_date (datetime) – Date from which it starts to update the database (e.g., update the database values of the spaces that have been modified in the last month).
- databaseImport.populate_tags(cursor, conn, tags, repo_name, repo_type)[source]
Imports tag information. It inserts tag information into the tag and tags_in_repo tables.
- Parameters:
cursor (cursor.MySQLCursor) – The MySQL connection cursor to execute operations such as SQL statements.
conn (connection.MySQLConnection) – The MySQL connector to the database specified in the configuration file. Used to commit changes to fulfill FK restrictions.
tags (list[str]) – A list of tag names.
repo_name (str) – The full name (i.e., “owner/repo_name”) of the repository.
repo_type (str) – The type of the repository (i.e., model, dataset or space)
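The order of operations implied by the conn parameter (parent rows first, commit, then the rows that reference them) can be illustrated with the sketch below. To keep it runnable without a database server it uses Python's built-in `sqlite3` as a stand-in; against MariaDB the statements would use `INSERT IGNORE` and `%s` placeholders instead. The two-table schema, its column names, and the omission of repo_type are hypothetical simplifications.

```python
import sqlite3


def populate_tags_sketch(cursor, conn, tags, repo_name):
    """Insert tags, commit, then link them to the repository."""
    for name in tags:
        cursor.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
    conn.commit()  # parent rows are stored before the referencing rows
    for name in tags:
        cursor.execute(
            "INSERT OR IGNORE INTO tags_in_repo (tag_name, repo_id) VALUES (?, ?)",
            (name, repo_name),
        )
    conn.commit()


# In-memory stand-in schema (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE tag (name TEXT PRIMARY KEY);
    CREATE TABLE tags_in_repo (
        tag_name TEXT NOT NULL REFERENCES tag(name),
        repo_id  TEXT NOT NULL,
        PRIMARY KEY (tag_name, repo_id)
    );
    """
)
cursor = conn.cursor()
populate_tags_sketch(cursor, conn, ["nlp", "vision", "nlp"], "owner/repo_name")
```

The repeated "nlp" entry is ignored in both tables, so running the import again for the same repository would not raise duplicate-key errors.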