Feat transpose multiindex#90
Conversation
Documentation build overview
81 files changed ·
|
75ede5a to
e2fd7e8
Compare
|
Thanks for the PRs! I don't think I'll be able to review them this week but hopefully the next one. I am also happy to have users contributing as it has been a while since I used it regularly so I am maintaining and making releases but it is a bit frozen. |
|
No problem, I am currently integrating this package into my project and had to make a fork with several adjustments to make it work for my use case, but I'd prefer to be in sync with upsteam :) There is no hurry for me, just thought it would be nice to contribute. |
OriolAbril
left a comment
There was a problem hiding this comment.
Sorry for the extreme slowness. I added a comment on dim->coord pairing behaviour and suggested an extension to the test but all the other fixes should definitely go in without extra changes.
| if dims is None: | ||
| dims = _attempt_default_dims("matrix_transpose", da.dims) | ||
| dim1, dim2 = dims | ||
| return da.swap_dims({dim1: dim2, dim2: dim1}).transpose(..., *dims) |
There was a problem hiding this comment.
I do think that the function should work even when the dimensions to transpose have a multiindex, but I am not sure about the handling of coords. I would be very interested in having your feedback and usecases. I had only considered the possibility of using this function for square matrices where there is basically a repeated dimension so I had dim+dim2 or xyz+xyz_bis with same length and same coordinate values.
The function does already work on non multiindex but different length dimensions, and had I realized before, my intuition would have been to remove the coordinate values from the output in such cases. This is what sort does.
What do/would you use this changing of dim->coords pairings for?
For example, if I add coordiate values to one of the example datasets:
matrices = tutorial.generate_matrices_dataarray()
matrices = matrices.assign_coords({name: [f"{name}{i}" for i in range(matrices.sizes[name])] for name in matrices.dims})<xarray.DataArray (batch: 10, experiment: 3, dim: 4, dim2: 4)> Size: 4kB
0.4345 2.96 0.2627 0.4519 0.5507 0.1642 ... 0.2144 0.9432 0.5874 0.5385 2.987
Coordinates:
* batch (batch) <U6 240B 'batch0' 'batch1' ... 'batch8' 'batch9'
* experiment (experiment) <U11 132B 'experiment0' 'experiment1' 'experiment2'
* dim (dim) <U4 64B 'dim0' 'dim1' 'dim2' 'dim3'
* dim2 (dim2) <U5 80B 'dim20' 'dim21' 'dim22' 'dim23'
and after matrix_transpose (no multiindex, I don't think it is relevant in this particular case):
matrix_transpose(matrices, dims=["batch", "experiment"])<xarray.DataArray (dim: 4, dim2: 4, batch: 3, experiment: 10)> Size: 4kB
0.4345 0.4027 3.489 0.3505 0.2236 0.744 ... 0.1844 3.124 0.4926 0.255 2.987
Coordinates:
* dim (dim) <U4 64B 'dim0' 'dim1' 'dim2' 'dim3'
* dim2 (dim2) <U5 80B 'dim20' 'dim21' 'dim22' 'dim23'
* batch (batch) <U11 132B 'experiment0' 'experiment1' 'experiment2'
* experiment (experiment) <U6 240B 'batch0' 'batch1' ... 'batch8' 'batch9'
Having the batch dimension now hold the experiment coordinates seems strange, independently of them having a multiindex. I was also curious about the constraint of both multiindexes having the same levels, I guess this is so the levels can also be renamed, any thoughts on swapping only the top multiindex name and not the levels?
|
|
||
| def test_transpose_multiindex(self, matrices): | ||
| stacked = matrices.stack(batch_experiment=("batch", "experiment"), dim_dim2=("dim", "dim2")) | ||
| matrix_transpose(stacked, dims=("batch_experiment", "dim_dim2")) |
There was a problem hiding this comment.
| matrix_transpose(stacked, dims=("batch_experiment", "dim_dim2")) | |
| out = matrix_transpose(stacked, dims=("batch_experiment", "dim_dim2")) | |
| assert out.sizes["batch_experiment"] == stacked.sizes["dim_dim2"] | |
| assert stacked.sizes["batch_experiment"] == out.sizes["dim_dim2"] |
I would add some assertions in addition to checking this works.
Thank you for the amazing package! I was happy to see the linalg operations magically working xarray style with minimal effort, even with stacked dimensions! Unfortunately that was not the case for
matrix_transpose, so here is a fix that makes it not error out when the dimensions being swapped both use aMultiIndex.Additionally, the method was the only one, where
dimswere acutally not optional, so I fixed this.Also fixed a small annoyance where
pinvandmatmulwere not members of the accessor.