2 changes: 1 addition & 1 deletion man/assign.Rd
@@ -92,6 +92,7 @@ Since \code{[.data.table} incurs overhead to check the existence and type of arg
\code{DT[a > 4, b := c]} is different from \code{DT[a > 4][, b := c]}. The first expression updates (or adds) column \code{b} with the value \code{c} on those rows where \code{a > 4} evaluates to \code{TRUE}. \code{DT} is updated \emph{by reference}, so no assignment is needed. Note that this does not apply when \code{i} is missing, i.e. \code{DT[]}.

The second expression, on the other hand, updates a \emph{new} \code{data.table} that is returned by the subset operation. Since the subsetted data.table is ephemeral (it is not assigned to a symbol), the result is lost unless it is assigned, for example: \code{ans <- DT[a > 4][, b := c]}.
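Note that \samp{:=} modifications are cumulative: each call changes the same object in place. When a \code{data.table} is reused, for example in a loop or in a test that evaluates the same expression several times, take a \code{\link{copy}} first so that each run starts from the same state.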
}
\value{
\code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
@@ -180,4 +181,3 @@ system.time(for (i in 1:1000) set(DT, i, 1L, i))

}
\keyword{ data }

2 changes: 1 addition & 1 deletion man/test.Rd
@@ -25,7 +25,7 @@ test(num, x, y = TRUE,
\item{env}{ A named list of environment variables to set for the duration of the test, much like \code{options}. A list entry set to \code{NULL} will unset (i.e., \code{\link{Sys.unsetenv}}) the corresponding variable. }
\item{context}{ String, default \code{NULL}. Used to provide context where this is useful, e.g. in a test run in a loop where we can't just search for the test number. }
\item{requires_utf8}{ \code{FALSE} (default), \code{TRUE}, or a character string. When set, the test is skipped if UTF-8 characters cannot be represented in the native encoding. Use \code{TRUE} for default UTF-8 test characters or provide a custom string of test characters. }
\item{optimize}{ A vector of different optimization levels to test. The code in \code{x} will be run once for each optimization level, with \code{options(datatable.optimize=optimize)} set accordingly. All optimization levels must pass the test for the overall test to pass. If no \code{y} is supplied, the results from the different levels are compared to each other for equality. If a \code{y} is supplied, the results from each level are compared to \code{y}. }
\item{optimize}{ A vector of different optimization levels to test. The code in \code{x} will be run once for each optimization level, with \code{options(datatable.optimize=optimize)} set accordingly. All optimization levels must pass the test for the overall test to pass. If no \code{y} is supplied, the results from the different levels are compared to each other for equality. If a \code{y} is supplied, the results from each level are compared to \code{y}. Because \code{x} is evaluated once per level, side effects such as \code{:=} accumulate across levels; use \code{\link{copy}} if each level must start from the same state. }
}
\note{
\code{NA_real_} and \code{NaN} are treated as equal; use \code{identical} if the distinction is needed. See examples below.
16 changes: 16 additions & 0 deletions vignettes/datatable-reference-semantics.Rmd
@@ -388,6 +388,22 @@ To select a single column as a vector, remember:
- `DT[, mycol]` is safer as it always returns a new, independent copy.
- `DT$mycol` is fast but may return a reference. Use `copy(DT$mycol)` to guarantee independence.

### d) Side effects and testing

Because `:=` modifies by reference, changes are cumulative. If the same *data.table* is reused, for example in a loop or when using `test()` with multiple `optimize` levels, each run starts from the table as modified by the previous run. Use `copy()` so that every run starts from the same data.

```{r}
DT = data.table(a = 1L)
# Subsequent runs are cumulative
DT[, a := a + 1L][]
DT[, a := a + 1L][]

# Use copy() to isolate runs
test_expr = function(x) copy(x)[, a := a + 1L][]
test_expr(DT)
test_expr(DT)
```
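
As a further sketch of the same point, the loop below contrasts a helper that applies `:=` directly to its argument with one that works on a `copy()`. The helper names `bump_in_place` and `bump_on_copy` are hypothetical and used only for illustration; they are not part of *data.table*.

```r
library(data.table)

DT = data.table(a = 1L)

# Hypothetical helper: applies `:=` to its argument, so the caller's table changes.
bump_in_place = function(x) x[, a := a + 1L][]

# Hypothetical helper: applies `:=` to a copy, so the caller's table is untouched.
bump_on_copy = function(x) copy(x)[, a := a + 1L][]

# Three isolated runs: each one starts from a = 1L.
for (i in 1:3) bump_on_copy(DT)
after_copy_runs = DT$a      # 1L, DT was never modified

# Three cumulative runs: each one starts where the previous run ended.
for (i in 1:3) bump_in_place(DT)
after_inplace_runs = DT$a   # 4L, DT was incremented three times
```

Passing a *data.table* to a function does not copy it, which is why the in-place helper changes `DT` in the calling environment.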

## Summary

#### The `:=` operator