History

3.0.0 (2020-07-15)

pandas==1.* is now required.
For google_pandas_load.loader_quick_setup.LoaderQuickSetup, the parameter dataset_id is replaced by the parameter dataset_name. The reason for this choice is explained in the Notes section below.

For google_pandas_load.loader.Loader.load(), when the parameter destination is set to ‘bq’ and the parameter source is set to ‘gs’ or ‘local’, the bq_schema parameter is no longer required. If it is omitted, the schema is inferred from the CSV data using google.cloud.bigquery.job.LoadJobConfig.autodetect.
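A minimal sketch of this fallback, written against the load section of a BigQuery job configuration as described in the BigQuery REST API (sourceFormat, schema, autodetect). The function name build_load_config and the list-of-dicts schema format are illustrative assumptions, not part of google_pandas_load. With the Python client, the equivalent switch would be setting autodetect on a LoadJobConfig instead of assigning a schema.

```python
def build_load_config(bq_schema=None):
    """Return the 'load' part of a BigQuery job configuration for a CSV file.

    Hypothetical helper: field names follow the BigQuery REST API.
    bq_schema is either None or a list of {"name": ..., "type": ...} dicts.
    """
    config = {"sourceFormat": "CSV"}
    if bq_schema is None:
        # No schema supplied: let BigQuery infer it from the CSV data.
        config["autodetect"] = True
    else:
        # Explicit schema supplied: use it as-is, no inference.
        config["schema"] = {"fields": list(bq_schema)}
    return config


print(build_load_config())
print(build_load_config([{"name": "x", "type": "INTEGER"}]))
```

Keeping the two branches mutually exclusive mirrors the changelog entry: an explicit bq_schema always wins, and autodetection is only a fallback.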

The method google.cloud.bigquery.job.QueryJob.result() is used again to wait for a Google job to complete. The timeout bug described in the previous “bugfixes” entry appears to have been caused by a Docker configuration problem.
At the end of a “query_to_bq” step, the log message was “Ended source to bq”. It has been corrected to “Ended query to bq”.

The parameters delete_in_bq, delete_in_gs and delete_in_local of google_pandas_load.loader.Loader.load() no longer exist. They were used to choose whether data should be deleted once it had been loaded to the next location. The new behavior is as follows:
- The data is kept in the source.
- The data is deleted in transitional locations after being transferred.
This behavior is safer, simpler to understand and fits the most common use case.
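The keep-source, delete-transitional rule can be sketched as a small planning function. For illustration only, it assumes that data moves along query → bq → gs → local → dataframe when downloading and along dataframe → local → gs → bq when uploading; the function name plan_actions and the action labels it produces are hypothetical and do not reflect the library's internals.

```python
# Assumed transfer chains (illustrative, not taken from the library).
DOWNLOAD_CHAIN = ["query", "bq", "gs", "local", "dataframe"]
UPLOAD_CHAIN = ["dataframe", "local", "gs", "bq"]


def plan_actions(source, destination):
    """Return the ordered transfer and cleanup actions for one load.

    The source is never cleaned up. Every location strictly between the
    source and the destination is transitional and is deleted right after
    its data has been transferred to the next location.
    """
    for chain in (DOWNLOAD_CHAIN, UPLOAD_CHAIN):
        if source in chain and destination in chain:
            i, j = chain.index(source), chain.index(destination)
            if i < j:
                path = chain[i : j + 1]
                transitional = set(path[1:-1])
                actions = []
                for src, dst in zip(path, path[1:]):
                    actions.append(f"{src}_to_{dst}")
                    if src in transitional:
                        # Hypothetical label: cleanup of a transitional copy.
                        actions.append(f"delete_in_{src}")
                return actions
    raise ValueError(f"unsupported load: {source!r} -> {destination!r}")


print(plan_actions("query", "dataframe"))
# ['query_to_bq', 'bq_to_gs', 'delete_in_bq', 'gs_to_local',
#  'delete_in_gs', 'local_to_dataframe', 'delete_in_local']
```

Deleting each transitional copy immediately after the next transfer keeps intermediate storage short-lived, while the source data is always left untouched.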
The destination parameter of google_pandas_load.loader.Loader.load() can no longer be set to ‘query’, since this option turned out to be useless. It is now restricted to ‘bq’, ‘gs’, ‘local’ or ‘dataframe’.
The gs_dir_path_in_bucket parameter of google_pandas_load.loader.Loader has been renamed to gs_dir_path.
google_pandas_load.loader.Loader now has the following getter functions: bq_client, dataset_ref, bucket, gs_dir_path and local_dir_path. Each returns the class argument of the same name.
google_pandas_load.loader_quick_setup.LoaderQuickSetup has three new getter functions: project_id, dataset_id and bucket_name. Each returns the class argument of the same name.

The method google.cloud.bigquery.job.QueryJob.result() was used to wait for a Google job to complete. It turned out that this could lead to a timeout when the job took too long to run, so it is no longer used. Instead, the job is reloaded every second until it is completed.
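A minimal sketch of the polling loop this entry describes, assuming a job object that, like google.cloud.bigquery job objects, exposes reload(), a state attribute that becomes 'DONE', and an error_result attribute. The function name wait_for_job and the poll_interval parameter are hypothetical; this is not the library's actual code.

```python
import time


def wait_for_job(job, poll_interval=1.0):
    """Block until a BigQuery-style job reports state 'DONE'.

    Hypothetical helper: relies only on job.reload(), job.state and
    job.error_result, which google.cloud.bigquery job objects provide.
    """
    while True:
        # Refresh the job's metadata from the server.
        job.reload()
        if job.state == "DONE":
            # A finished job may still have failed; surface the error.
            if job.error_result:
                raise RuntimeError(f"job failed: {job.error_result}")
            return job
        # Not finished yet: wait before polling again.
        time.sleep(poll_interval)
```

With a real client, wait_for_job(client.query(sql)) would block until the query finishes. Unlike a single blocking call with a timeout, the loop only makes short requests, although a production version would probably still enforce an overall deadline.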