Bulk download from GCS

Post by quinten »

Google hosts a copy of the Landsat and Sentinel-2 archives on its Cloud Storage service (GCS), and you can access the data with gsutil. New scenes generally take a week or longer to appear on GCS, so for near-real-time data it is better to use EarthExplorer or Copernicus.

The Landsat bucket is gcp-public-data-landsat and the Sentinel-2 bucket is gcp-public-data-sentinel-2.

For example, you can list the scenes in the Landsat 8 Collection 1 "combined OLI and TIRS" (LC08) data for a given path/row (here 199/024):

gsutil ls gs://gcp-public-data-landsat/LC08/01/199/024


Or list all available Sentinel-2 tiles:

gsutil ls gs://gcp-public-data-sentinel-2/tiles
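
The bucket layout used in these listings can be wrapped in two small helper functions. This is only a sketch: the function names landsat_url and s2_tile_url are made up for illustration, the Landsat helper assumes the LC08/01 prefix shown above with zero-padded path and row, and the Sentinel-2 helper assumes a 5-character MGRS tile ID (zone, latitude band, grid square) laid out as tiles/31/U/ET.

```shell
# Bucket names as given in the post above.
LANDSAT_BUCKET=gs://gcp-public-data-landsat
S2_BUCKET=gs://gcp-public-data-sentinel-2

# landsat_url PATH ROW
# Prints the prefix for Landsat 8 Collection 1 (LC08/01) scenes.
# PATH and ROW are expected as zero-padded 3-digit strings, e.g. 199 024.
landsat_url() {
    printf '%s/LC08/01/%s/%s\n' "$LANDSAT_BUCKET" "$1" "$2"
}

# s2_tile_url TILE
# Prints the prefix for a 5-character MGRS tile ID such as 31UET,
# split into UTM zone (31), latitude band (U) and grid square (ET).
s2_tile_url() {
    zone=$(printf '%s' "$1" | cut -c1-2)
    band=$(printf '%s' "$1" | cut -c3)
    square=$(printf '%s' "$1" | cut -c4-5)
    printf '%s/tiles/%s/%s/%s\n' "$S2_BUCKET" "$zone" "$band" "$square"
}
```

With these defined, gsutil ls "$(landsat_url 199 024)" and gsutil ls "$(s2_tile_url 31UET)" list a single path/row or tile instead of the whole bucket.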

Or download the full tile index to the current working directory.
Landsat:
gsutil cp gs://gcp-public-data-landsat/index.csv.gz ./
Sentinel-2:
gsutil cp gs://gcp-public-data-sentinel-2/index.csv.gz ./
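
Once an index file is downloaded, it can be searched locally instead of listing the bucket. The sketch below prints one column for every row in which another column matches a given value, looking both columns up by name in the header line. The function name index_urls is hypothetical, and the column names in the usage line (MGRS_TILE, BASE_URL) are assumptions about the Sentinel-2 index header; check them first with gzip -dc index.csv.gz | head -1. It also assumes that no field contains a quoted comma.

```shell
# index_urls INDEX_GZ MATCH_COLUMN VALUE OUTPUT_COLUMN
# Prints OUTPUT_COLUMN for every row where MATCH_COLUMN equals VALUE.
# Returns a non-zero status if either column name is missing from the header.
index_urls() {
    gzip -dc "$1" | awk -F, -v col="$2" -v val="$3" -v out="$4" '
        NR == 1 {
            # Find the positions of the two columns in the header line.
            for (i = 1; i <= NF; i++) {
                if ($i == col) c = i
                if ($i == out) o = i
            }
            if (!c || !o) exit 1
            next
        }
        $c == val { print $o }'
}
```

Assuming those column names are right, index_urls index.csv.gz MGRS_TILE 31UET BASE_URL would print one gs:// URL per product for that tile, which can then be passed to gsutil.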

Or get a tile to the current working dir:
gsutil cp -r gs://gcp-public-data-landsat/LC08/01/199/024/LC08_L1TP_199024_20180903_20180912_01_T1/ ./
Note the -r flag for recursive download, since the tile is stored as an unzipped directory.

You can also bulk download all tiles for a path/row, e.g.:
gsutil cp -r gs://gcp-public-data-landsat/LC08/01/199/024 ./199/024

Post by oneLaker »

Hi Quinten,

I compared the data from Google Cloud Storage with the data from USGS. It seems that the GCS files were compressed and stored as tiled images, so SeaDAS l2gen cannot recognize them. I do believe that Acolite will support them very well.

Regards,
Zhigang

Post by ags-tolson »

If you want to see what programmatic support for Sentinel-2 in Google Cloud Storage might look like in Python, here's how we built it into our software.

We wrote a class that adds gs support to our existing architecture, in which each data source is a sort of 'driver':

https://github.com/Applied-GeoSolutions ... re.py#L104

Here's a sample of how it's used for searching for data in Google Cloud Storage (i.e. querying) and downloading it (i.e. fetching):

https://github.com/Applied-GeoSolutions ... l2.py#L406

https://github.com/Applied-GeoSolutions ... l2.py#L471

I'm not sure we knew you could do acolite runs with google-storage-downloaded data, so that's good to know.

Post by quinten »

oneLaker wrote (Sun Jan 20, 2019 4:12 pm):
"I compared the data from Google Cloud Storage with the data from USGS. It seems that the GCS files were compressed and stored as tiled images, so SeaDAS l2gen cannot recognize them. I do believe that Acolite will support them very well."

Hi Zhigang

I found that to process the GCS scenes with l2gen, you need to convert all the GeoTIFF files in the bundle to strip format, e.g. with geotifcp -s.
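
A loop over the bundle could look like the sketch below. It writes the converted GeoTIFFs to a separate directory and copies everything else (metadata files and so on) unchanged. The function name to_strips and the CONVERT_CMD variable are hypothetical, and the .TIF extension is an assumption about how the bundle files are named; the conversion itself is just the geotifcp -s call mentioned above.

```shell
# to_strips SRC_DIR DST_DIR
# Rewrites every *.TIF in SRC_DIR into DST_DIR in strip layout and copies
# the remaining files as they are. CONVERT_CMD defaults to "geotifcp -s";
# it is a variable only so that another converter can be swapped in.
to_strips() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        case "$name" in
            *.TIF|*.tif)
                # Left unquoted on purpose so "geotifcp -s" splits into
                # the command and its flag.
                ${CONVERT_CMD:-geotifcp -s} "$f" "$dst/$name" || return 1
                ;;
            *)
                cp "$f" "$dst/$name" || return 1
                ;;
        esac
    done
}
```

For example, to_strips LC08_L1TP_199024_20180903_20180912_01_T1 LC08_L1TP_199024_20180903_20180912_01_T1_strip leaves the downloaded bundle untouched and puts the converted copy next to it.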

Quinten

Post by oneLaker »

Thanks so much, Quinten. Sorry for replying so late. SeaDAS works very well with the converted files.

Regards,
Zhigang

Post by IsabelBrand »

Hi Quinten,

I was trying to follow your suggestions to bulk download data from GCS, but I couldn't apply them to Sentinel-2.

I tried to get a list of tiles:
gsutil ls gs://gcp-public-data-sentinel-2/tiles/31/U/ET
and to bulk sync a tile to the current dir:
gsutil cp -r gs://gcp-public-data-sentinel-2/tiles/31/U/ET/S2A_MSIL1C_20150706T105016_N0204_R051_T31UET_20150706T105351.SAFE ./
but I get an error message:
ServiceException: 401 Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.

Any ideas what might be causing this error? How can I bulk download tiles to a dir?

Happy to hear from you,

Isabel

Post by quinten »

Hi Isabel

Did you set up/authenticate your Google account in gcloud/gsutil? See the credentials part on this page: https://cloud.google.com/storage/docs/gsutil_install
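
One way to check this before retrying the download is to ask gcloud which account is active. The sketch below is an assumption about a typical setup where gcloud is installed alongside gsutil and shares its credentials; the function name gcs_account is made up for illustration.

```shell
# gcs_account
# Prints the active gcloud account, or explains what is missing.
gcs_account() {
    if ! command -v gcloud >/dev/null 2>&1; then
        echo "gcloud not found on PATH" >&2
        return 127
    fi
    # Ask gcloud for the currently active credentialed account, if any.
    account=$(gcloud auth list --filter=status:ACTIVE --format='value(account)')
    if [ -z "$account" ]; then
        echo "no active account; run: gcloud auth login" >&2
        return 1
    fi
    printf '%s\n' "$account"
}
```

If this reports no active account, running gcloud auth login and then repeating the gsutil command is the first thing to try.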

Quinten

Post by IsabelBrand »

Hi Quinten,

I was wondering how one can resume a download from GCS after a lost internet connection.

I'm downloading Sentinel data from GCS using gsutil:
gsutil cp -r gs://gcp-public-data-sentinel-2/tiles/31/U/ET/ ./
and I realized that after my internet connection is lost, the download restarts from the beginning.

Any suggestions on this?

Thanks in advance,

Isabel

Post by quinten »

Hi Isabel

Indeed, cp (re)starts the transfer every time. It is probably better to use rsync rather than cp, since rsync skips files that are already complete locally, perhaps with the top-level -m flag to run transfers in parallel for bulk downloading.

Did you try:

gsutil -m rsync -r gs://gcp-public-data-sentinel-2/tiles/31/U/ET/S2A_MSIL1C_20150706T105016_N0204_R051_T31UET_20150706T105351.SAFE ./
Or for the whole tileset:

gsutil -m rsync -r gs://gcp-public-data-sentinel-2/tiles/31/U/ET/ ./
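
For unattended downloads on an unreliable connection, the rsync call can be wrapped in a retry loop, as in the sketch below. The function name sync_with_retry, the RETRY_DELAY variable and the default of 5 attempts are all hypothetical choices. Note that rsync mirrors the contents of the source prefix into the destination directory, so the sketch uses a dedicated local directory and creates it first, on the assumption that gsutil expects it to exist.

```shell
# sync_with_retry SRC_URL DST_DIR [MAX_TRIES]
# Repeats "gsutil -m rsync -r" until it succeeds or MAX_TRIES is reached.
sync_with_retry() {
    src=$1
    dst=$2
    max=${3:-5}
    delay=${RETRY_DELAY:-30}   # seconds to wait between attempts
    mkdir -p "$dst" || return 1
    try=1
    while [ "$try" -le "$max" ]; do
        # rsync skips files that are already complete in DST_DIR,
        # so each attempt only fetches what is still missing.
        if gsutil -m rsync -r "$src" "$dst"; then
            return 0
        fi
        echo "attempt $try of $max failed" >&2
        try=$((try + 1))
        if [ "$try" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, sync_with_retry gs://gcp-public-data-sentinel-2/tiles/31/U/ET/ ./31UET 10 would keep retrying the whole tileset up to ten times.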
Quinten