diff --git a/man/assign.Rd b/man/assign.Rd
index 91c988cd8..ed5ed8be1 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -92,6 +92,7 @@ Since \code{[.data.table} incurs overhead to check the existence and type of arg
 \code{DT[a > 4, b := c]} is different from \code{DT[a > 4][, b := c]}. The first expression updates (or adds) column \code{b} with the value \code{c} on those rows where \code{a > 4} evaluates to \code{TRUE}. \code{X} is updated \emph{by reference}, therefore no assignment needed. Note that this does not apply when \code{i} is missing, i.e. \code{DT[]}.
 
 The second expression on the other hand updates a \emph{new} \code{data.table} that's returned by the subset operation. Since the subsetted data.table is ephemeral (it is not assigned to a symbol), the result would be lost; unless the result is assigned, for example, as follows: \code{ans <- DT[a > 4][, b := c]}.
+Note that modifications made with \samp{:=} are cumulative: each call operates on the table as left by the previous call. When reusing a \code{data.table} in a loop or in a test that runs at several optimization levels, take a \code{\link{copy}} first so that each run starts from the same state.
 }
 \value{
 \code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
@@ -180,4 +181,3 @@ system.time(for (i in 1:1000) set(DT, i, 1L, i))
 
 }
 \keyword{ data }
-
diff --git a/man/test.Rd b/man/test.Rd
index 651ef1d35..fea7309be 100644
--- a/man/test.Rd
+++ b/man/test.Rd
@@ -25,7 +25,7 @@ test(num, x, y = TRUE,
 \item{env}{ A named list of environment variables to set for the duration of the test, much like \code{options}. A list entry set to \code{NULL} will unset (i.e., \code{\link{Sys.unsetenv}}) the corresponding variable. }
 \item{context}{ String, default \code{NULL}. Used to provide context where this is useful, e.g. in a test run in a loop where we can't just search for the test number. }
 \item{requires_utf8}{ \code{FALSE} (default), \code{TRUE}, or a character string. When set, the test is skipped if UTF-8 characters cannot be represented in the native encoding. Use \code{TRUE} for default UTF-8 test characters or provide a custom string of test characters. }
-\item{optimize}{ A vector of different optimization levels to test. The code in \code{x} will be run once for each optimization level, with \code{options(datatable.optimize=optimize)} set accordingly. All optimization levels must pass the test for the overall test to pass. If no \code{y} is supplied, the results from the different levels are compared to each other for equality. If a \code{y} is supplied, the results from each level are compared to \code{y}. }
+\item{optimize}{ A vector of different optimization levels to test. The code in \code{x} will be run once for each optimization level, with \code{options(datatable.optimize=optimize)} set accordingly. All optimization levels must pass the test for the overall test to pass. If no \code{y} is supplied, the results from the different levels are compared to each other for equality. If a \code{y} is supplied, the results from each level are compared to \code{y}. Because \code{x} is evaluated once per level, side effects such as assignment by \code{:=} accumulate across levels; use \code{\link{copy}} inside \code{x} if each level must start from a fresh state. }
 }
 \note{
 \code{NA_real_} and \code{NaN} are treated as equal, use \code{identical} if distinction is needed. See examples below.
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index 5353440e7..c6d6b3c0e 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -388,6 +388,22 @@ To select a single column as a vector, remember:
 - `DT[, mycol]` is safer as it always returns a new, independent copy.
 - `DT$mycol` is fast but may return a reference. Use `copy(DT$mycol)` to guarantee independence.
 
+### d) Side effects and testing
+
+Because `:=` modifies by reference, changes are cumulative. If the same *data.table* is reused (for example, in a loop or when using `test()` with multiple `optimize` levels), each run starts from the table as modified by the previous run. Use `copy()` to ensure each run starts with the same data.
+
+```{r}
+DT = data.table(a = 1L)
+# Subsequent runs are cumulative
+DT[, a := a + 1L][]  # a is now 2
+DT[, a := a + 1L][]  # a is now 3
+
+# Use copy() to isolate runs: DT itself is left unchanged
+test_expr = function(x) copy(x)[, a := a + 1L][]
+test_expr(DT)  # a is 4 in the returned copy
+test_expr(DT)  # a is 4 again, since DT still holds 3
+```
+
 ## Summary
 
 #### The `:=` operator