download_ee_image_tiles_parallel error #2146

Closed
lwq-star opened this issue Oct 2, 2024 · 3 comments · Fixed by #2158
Labels
bug Something isn't working

Comments


lwq-star commented Oct 2, 2024

Environment Information

[Screenshot: environment information]

Description

When I download the image in parallel, I get this error: BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

When I use geemap.download_ee_image instead, the image size is 65 GB.

What I Did

Here is my code:

region = geemap.shp_to_ee(r"C:\Users\A1827\Desktop\东部沙地\沙地分块合并.shp")
# fishnet = geemap.fishnet(roi, rows=4, cols=4)
geemap.download_ee_image_tiles_parallel(msavi, region, out_dir=r"C:\Users\A1827\Desktop\东部沙地\image", prefix="msavi", 
                                        scale=10, crs='EPSG:4326')

This is the error:

---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\externals\loky\process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "d:\Application\miniforge3\envs\gee\Lib\multiprocessing\queues.py", line 122, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'DateRange' on <module 'ee' from 'd:\\Application\\miniforge3\\envs\\gee\\Lib\\site-packages\\ee\\__init__.py'>
"""

The above exception was the direct cause of the following exception:

BrokenProcessPool                         Traceback (most recent call last)
Cell In[10], line 3
      1 region = geemap.shp_to_ee(r"C:\Users\A1827\Desktop\东部沙地\沙地分块合并.shp")
      2 # fishnet = geemap.fishnet(roi, rows=4, cols=4)
----> 3 geemap.download_ee_image_tiles_parallel(msavi, region, out_dir=r"C:\Users\A1827\Desktop\东部沙地\image", prefix="msavi",
      4                                         scale=10, crs='EPSG:4326')

File d:\Application\miniforge3\envs\gee\Lib\site-packages\geemap\common.py:13134, in download_ee_image_tiles_parallel(image, features, out_dir, prefix, crs, crs_transform, scale, resampling, dtype, overwrite, num_threads, max_tile_size, max_tile_dim, shape, scale_offset, unmask_value, column, job_args, **kwargs)
  13114     download_ee_image(
  13115         image,
  13116         filename,
   (...)
  13130         **kwargs,
  13131     )
  13133 with joblib.Parallel(**job_args) as parallel:
> 13134     parallel(joblib.delayed(download_data)(index) for index in range(count))
  13136 end = time.time()
  13137 print(f"Finished in {end - start} seconds.")

File d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\parallel.py:2007, in Parallel.__call__(self, iterable)
   2001 # The first item from the output is blank, but it makes the interpreter
   2002 # progress until it enters the Try/Except block of the generator and
   2003 # reaches the first `yield` statement. This starts the asynchronous
   2004 # dispatch of the tasks to the workers.
   2005 next(output)
-> 2007 return output if self.return_generator else list(output)

File d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\parallel.py:1650, in Parallel._get_outputs(self, iterator, pre_dispatch)
   1647     yield
   1649     with self._backend.retrieval_context():
-> 1650         yield from self._retrieve()
   1652 except GeneratorExit:
   1653     # The generator has been garbage collected before being fully
   1654     # consumed. This aborts the remaining tasks if possible and warn
   1655     # the user if necessary.
   1656     self._exception = True

File d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\parallel.py:1754, in Parallel._retrieve(self)
   1747 while self._wait_retrieval():
   1748
   1749     # If the callback thread of a worker has signaled that its task
   1750     # triggered an exception, or if the retrieval loop has raised an
   1751     # exception (e.g. `GeneratorExit`), exit the loop and surface the
   1752     # worker traceback.
   1753     if self._aborting:
-> 1754         self._raise_error_fast()
   1755         break
   1757     # If the next job is not ready for retrieval yet, we just wait for
   1758     # async callbacks to progress.

File d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\parallel.py:1789, in Parallel._raise_error_fast(self)
   1785 # If this error job exists, immediately raise the error by
   1786 # calling get_result. This job might not exists if abort has been
   1787 # called directly or if the generator is gc'ed.
   1788 if error_job is not None:
-> 1789     error_job.get_result(self.timeout)

File d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\parallel.py:745, in BatchCompletionCallBack.get_result(self, timeout)
    739 backend = self.parallel._backend
    741 if backend.supports_retrieve_callback:
    742     # We assume that the result has already been retrieved by the
    743     # callback thread, and is stored internally. It's just waiting to
    744     # be returned.
--> 745     return self._return_or_raise()
    747 # For other backends, the main thread needs to run the retrieval step.
    748 try:

File d:\Application\miniforge3\envs\gee\Lib\site-packages\joblib\parallel.py:763, in BatchCompletionCallBack._return_or_raise(self)
    761 try:
    762     if self.status == TASK_ERROR:
--> 763         raise self._result
    764     return self._result
    765 finally:

BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

lwq-star added the bug (Something isn't working) label Oct 2, 2024

giswqs (Member) commented Oct 3, 2024

This is probably an issue specific to Windows. Try it on Google Colab to see if you encounter the same issue.
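
One plausible reading of the first traceback, not confirmed anywhere in this thread, is that the worker process could not resolve `ee.DateRange` while unpickling the task arguments. The sketch below reproduces that class of failure using only the standard library. `fake_ee` and `initialize` are hypothetical stand-ins and are not part of the Earth Engine API.

```python
import pickle
import sys
import types

# Hypothetical stand-in module; "fake_ee" is NOT a real package.
fake_ee = types.ModuleType("fake_ee")
sys.modules["fake_ee"] = fake_ee


def initialize():
    """Attach a class to the module at runtime (assumed stand-in for lazy setup)."""
    cls = type("DateRange", (), {})
    cls.__module__ = "fake_ee"  # pickle will look the class up as fake_ee.DateRange
    fake_ee.DateRange = cls


initialize()
# Pickling works here because fake_ee.DateRange can be found by name.
payload = pickle.dumps(fake_ee.DateRange())

# Simulate a fresh worker process in which initialize() has never run.
del fake_ee.DateRange

try:
    pickle.loads(payload)
    outcome = "ok"
except AttributeError as exc:
    # The message names the attribute that could not be found on the module.
    outcome = f"AttributeError: {exc}"

print(outcome)
```

If this reading is right, the arguments themselves are fine and the failure comes from the state of the worker process, which would be consistent with the error changing on a different platform.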

lwq-star (Author) commented

Running the code in Google Colab produces a different error:

_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
  File "/usr/local/lib/python3.10/dist-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/joblib/parallel.py", line 598, in __call__
    return [func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/joblib/parallel.py", line 598, in <listcomp>
    return [func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/geemap/common.py", line 12671, in download_data
    ee_initialize(opt_url="https://earthengine-highvolume.googleapis.com/")
  File "/usr/local/lib/python3.10/dist-packages/geemap/coreutils.py", line 155, in ee_initialize
    ee.Initialize(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ee/_utils.py", line 38, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ee/__init__.py", line 181, in Initialize
    raise EEException(NO_PROJECT_EXCEPTION) from None
ee.ee_exception.EEException: ee.Initialize: no project found. Call with project= or see http://goo.gle/ee-auth.
"""

The above exception was the direct cause of the following exception:

EEException                               Traceback (most recent call last)
in <cell line: 4>()
      2 fishnet = geemap.fishnet(roi, rows=10, cols=10)
      3 out_dir = r"/content/drive/MyDrive/东部沙地"
----> 4 geemap.download_ee_image_tiles_parallel(msavi, fishnet, out_dir=out_dir, prefix="msavi", scale=10, crs='EPSG:4326')

6 frames
/usr/local/lib/python3.10/dist-packages/joblib/parallel.py in _return_or_raise(self)
    761 try:
    762     if self.status == TASK_ERROR:
--> 763         raise self._result
    764     return self._result
    765 finally:

EEException: ee.Initialize: no project found. Call with project= or see http://goo.gle/ee-auth.

Here is my test notebook:
https://colab.research.google.com/drive/1RZEto7tEZyGGMaNJtEHpOCrtDASnW5Ji?usp=sharing
Feel free to modify it. Also, how can I download large images faster?

giswqs (Member) commented Oct 17, 2024

This issue has been resolved in #2158. Run geemap.update_package() and restart the kernel. Specify your cloud project ID when using the function.

geemap.download_ee_image_tiles_parallel(ee_init=True, project_id="YOUR-PROJECT-ID")
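
For reference, a minimal sketch of how the fixed call might be combined with the arguments from the Colab cell above. The Colab traceback shows each worker calling ee.Initialize on its own, which is presumably why the project ID has to be passed to the function rather than set only in the notebook. The wrapper name `download_tiles` is hypothetical. `ee_init` and `project_id` are taken from the comment above, and the remaining keywords come from the signature shown in the first traceback.

```python
def download_tiles(image, features, out_dir, project_id, **kwargs):
    """Hypothetical wrapper: forward the project ID to geemap's parallel tile downloader."""
    # Imported lazily so this helper can be defined without geemap installed.
    import geemap

    return geemap.download_ee_image_tiles_parallel(
        image,
        features,
        out_dir=out_dir,
        ee_init=True,           # option named in the fix (#2158)
        project_id=project_id,  # your Google Cloud project ID
        **kwargs,
    )


# Example call, assuming msavi, fishnet and out_dir are defined as in the Colab cell:
# download_tiles(msavi, fishnet, out_dir, "YOUR-PROJECT-ID",
#                prefix="msavi", scale=10, crs="EPSG:4326")
```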
