Skip to content

Add a by attribute to data_tabulate() objects when by is present#690

Open
elinw wants to merge 11 commits into
easystats:mainfrom
elinw:by
Open

Add a by attribute to data_tabulate() objects when by is present#690
elinw wants to merge 11 commits into
easystats:mainfrom
elinw:by

Conversation

@elinw

@elinw elinw commented Jun 12, 2026

Copy link
Copy Markdown

I wasn't sure if you would want an additional test; all the current tests are still passing.
The attributes aren't listing in the documentation so I didn't change anything there.

Closes #688

@etiennebacher etiennebacher left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes please add at least one test for this.

Comment thread R/data_tabulate.R Outdated
class(out) <- c("datawizard_tables", "list")
} else {
class(out) <- c("datawizard_crosstabs", "list")
attr(out, "by") <- gsub('\\"', "", by_name, fixed = TRUE)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the gsub() necessary?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in this case (but not the other) the name returns e.g. "/"Island"/". I'm not sure why or if there is a way to get the name without the escapes. I tried some ideas but there was nothing obvious to me.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meaning when a quoted variable name is used for by.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, should this also remove ' if one passes 'Island' in by?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't think \\ is necessary if fixed = TRUE is used

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And actually I stupidly followed the advice from the tester and added fixed = TRUE but that was wrong, this is not fixed it just looks like fixed. So the next push will put that back.

@elinw elinw Jun 15, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's not a bad instinct but it needs the escaping to work for whatever reason. But in the update I'm about to send you will see a change.
I just tried it again and it makes the tests fail to use fixed=TRUE

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but I don't see a case where using the backslash changes something:

x <- insight::safe_deparse("foo")
x
#> [1] "\"foo\""

gsub('"', "", x)
#> [1] "foo"
gsub('"', "", x, fixed = TRUE)
#> [1] "foo"
gsub('\\"', "", x)
#> [1] "foo"

Do you have an example where this affects the output?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes fixed.

Comment thread R/data_tabulate.R Outdated
class(out) <- c("datawizard_tables", "list")
} else {
class(out) <- c("datawizard_crosstabs", "list")
attr(out, "by") <- gsub('\\"', "", by_name, fixed = TRUE)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Comment thread NEWS.md Outdated
@etiennebacher etiennebacher changed the title Add a by attribute to data_tabulate objects when a by is present Add a by attribute to 1data_tabulate() objects when by` is present Jun 14, 2026
@etiennebacher etiennebacher changed the title Add a by attribute to 1data_tabulate() objects when by` is present Add a by attribute to data_tabulate() objects when by is present Jun 14, 2026
@elinw

elinw commented Jun 15, 2026

Copy link
Copy Markdown
Author

Also the linter wanted me to make some changes and then not make them (when I made them it told me to change them back) unless I misunderstood.

@elinw elinw requested a review from etiennebacher June 15, 2026 19:30
Comment thread R/data_tabulate.R
if (is.null(by)) {
class(out) <- c("datawizard_tables", "list")
} else {
out <- lapply(out, structure, by = by_name)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this attribute needs to be added to each element in out. Below we add the attribute collapse to out only, why can't we do that for by since it has the same value for all elements in out anyway?

@elinw elinw Jun 15, 2026

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is what took me a long time to figure out. "by" is really a property of individual tables not of the list. This should be consistent whether you get the table through the data.frame method or through the default method or if select has one or multiple columns. Also it should be the same for grouped data frames.

It took me a while to see this, which was why I would get the tests to work on default and then they would fail or data.frame or with multiple columns.
As a practical matter, if you pull out one table from the list you should get the "by" along with it.

Comment thread tests/testthat/test-data_tabulate.R Outdated

test_that("data_tabulate data.frame by", {
data(efc, package = "datawizard")
x <- data_tabulate.data.frame(efc, "c172code", by = "e16sex")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
x <- data_tabulate.data.frame(efc, "c172code", by = "e16sex")
x <- data_tabulate(efc, "c172code", by = "e16sex")

data_tabulate() is an S3 generic so it should be dispatched automatically to data_tabulate.data.frame() when the input is a data.frame.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add by attribute to cross_tab and cross_tabs objects

2 participants