2 changes: 1 addition & 1 deletion tox.ini
@@ -36,7 +36,7 @@ deps =
# Pandas functionality we use requires lxml, but it's not yet available as optional extras for 1.5.2
oldestdeps: lxml
# Oldest lsdb is not compatible with the versions above, we skip lsdb notebooks for oldestdeps job
-# oldestdeps: lsdb==0.6.6
+# oldestdeps: lsdb==0.8.1

# Ugly workaround for the custom install_command to ensure the arguments are properly passed into pip
!oldestdeps: pip
3 changes: 2 additions & 1 deletion tutorial_requirements.txt
@@ -12,7 +12,8 @@ pyarrow>=10.0.1
hpgeom
pandas[xml]>=1.5.2
dask[distributed]
-lsdb>=0.6.6,<0.8
+# lsdb<0.8 returns wrong results for some catalogs, e.g. ZTF DR24 Lightcurves
+lsdb>=0.8.1
psutil
ray
s3fs
18 changes: 13 additions & 5 deletions tutorials/techniques-and-tools/irsa-hats-with-lsdb.md
@@ -56,7 +56,7 @@ We will use lsdb to leverage HATS partitioning for performing fast spatial queri

```{code-cell} ipython3
# Uncomment the next line to install dependencies if needed.
-# !pip install s3fs "lsdb>=0.6.6,<0.8" pyarrow pandas numpy astropy dask matplotlib
+# !pip install s3fs "lsdb>=0.8.1" pyarrow pandas numpy astropy dask matplotlib
```

```{code-cell} ipython3
@@ -392,9 +392,10 @@ Since ZTF objects are defined per band, setting `n_neighbors=1` means this is on
```{code-cell} ipython3
euclid_x_ztf = euclid_cone.crossmatch(
    ztf_cone,
-    suffixes=("_euclid", "_ztf"),  # to distinguish columns from the two catalogs
    n_neighbors=1,  # default is 1 too, can be tweaked
-    radius_arcsec=1  # default is 1 arcsec too, can be tweaked
+    radius_arcsec=1,  # default is 1 arcsec too, can be tweaked
+    suffixes=("_euclid", "_ztf"),  # to distinguish columns from the two catalogs
+    suffix_method="all_columns",
)
euclid_x_ztf
```
@@ -562,10 +563,17 @@ ztf_lcs
```

As earlier, this creates a lazy catalog object with the partition(s) that contains our IDs.
-We can load the light curves data into a DataFrame by using the `compute()` method:
+We can load the light curves data into a DataFrame by using the `compute()` method.
+Note: You may see a memory warning from lsdb which is expected due to the large size of Lightcurve data.

```{code-cell} ipython3
-ztf_lcs_df = ztf_lcs.compute()  # ID search runs out of memory if we try to parallelize it with Dask client
+with Client(n_workers=get_nworkers(ztf_lcs),
+            threads_per_worker=1,
+            memory_limit=None  # to prevent it from running out of memory
+            ) as client:
+    print(f"This may take more than a few minutes to complete. You can monitor progress in Dask dashboard at {client.dashboard_link}")
+    ztf_lcs_df = ztf_lcs.compute()
Contributor
This is causing CircleCI to run out of memory and fail. I'll try running it when I'm back at a computer.

Member
RAM usage seems to peak a few times in the latest run, but I have never seen this dask traceback before, so it's a bit different right now.


ztf_lcs_df
```
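
The `tutorial_requirements.txt` change above raises the lsdb floor to 0.8.1 because older releases can return wrong results for some catalogs. A notebook could also check the installed version at runtime so that a stale environment is flagged before any query runs. The following is only a minimal sketch using the standard library: the helper names `parse_release`, `meets_minimum`, and `check_lsdb` are hypothetical and not part of this PR or of lsdb, and the parser deliberately looks only at the leading numeric release segment, so a pre-release such as `0.8.1rc1` is treated as `0.8.1`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Floor taken from tutorial_requirements.txt in this PR.
MIN_LSDB = (0, 8, 1)


def parse_release(version_string):
    """Return the leading numeric release segment as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version_string, minimum=MIN_LSDB):
    """Compare release tuples, padding with zeros so '0.8' is read as 0.8.0."""
    release = parse_release(version_string)
    width = max(len(release), len(minimum))
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


def check_lsdb():
    """Return a short status message about the installed lsdb, if any."""
    try:
        installed = version("lsdb")
    except PackageNotFoundError:
        return "lsdb is not installed"
    if not meets_minimum(installed):
        wanted = ".".join(str(part) for part in MIN_LSDB)
        return f"lsdb {installed} is older than {wanted}; please upgrade"
    return f"lsdb {installed} satisfies the minimum"


print(check_lsdb())
```

A full PEP 440 comparison (for example via the third-party `packaging` library) would handle pre-release and post-release tags properly; the tuple comparison here is only meant to catch an environment that is clearly too old.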
