any_near0().
NA_real to FBM type integer on new Macs.
big_randomSVD() and big_crossprodSelf() (#52).
backingfile to big_crossprodSelf() and big_cor() (#170).
big_univLogReg() (#137).
ind.col in big_prodMat() (#154).
FBM.dir (that defaults to tempdir() as before). This can be used to change the default directory used to create FBMs when calling either FBM(), FBM.code256(), as_FBM(), big_copy(), or big_transpose(). Note that, if not using the temporary directory anymore, you must clean up the files you do not want to keep.
ARMA_64BIT_WORD.
$add_columns().
as_scaling_fun() to create your own fun.scaling parameters.
pcor() (with a warning).
pcor() now returns NAs (instead of 0s) for singular systems.
big_prodVec(), big_cprodVec(), big_colstats() and big_univLinReg() have been recoded.
pcor() for singular systems, e.g. when x has all the same values.
summary() and plot() for old (< v1.3) big_sp_list models.
pcor() to compute partial correlations.
Add two options in big_spLinReg() and big_spLogReg(): power_scale for using a different scaling for LASSO and power_adaptive for using adaptive LASSO (where larger marginal effects are penalized less). See documentation for details.
big_(c)prodVec() and big_(c)prodMat() (re)gain an ncores parameter. Note that for big_(c)prodMat(), it might be beneficial to use the BLAS parallelism (with bigparallelr::set_blas_ncores()) instead of this parameter, especially when the matrix A is fairly large.
big_colstats() can now be run in parallel (added parameter ncores).
Functions big_(c)prodMat() and big_(t)crossprodSelf() now use much less memory, and may be faster.
Add covar_from_df() to convert a data frame with factors/characters to a numeric matrix using one-hot encoding.
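As a rough illustration of the covar_from_df() entry above, here is a minimal sketch. The data frame and its column names are made up for the example, and the exact names and number of the resulting columns depend on the package's encoding, so none are asserted here.

```r
library(bigstatsr)

# Hypothetical covariates: one numeric column, one factor, one character column
df <- data.frame(
  age  = c(31, 45, 28, 52),
  sex  = factor(c("F", "M", "F", "M")),
  site = c("A", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Convert to a purely numeric matrix; factor/character columns are expanded
# into indicator columns by the package
covar <- covar_from_df(df)

dim(covar)       # one row per row of `df`
colnames(covar)  # inspect how the categorical columns were encoded
```

The resulting matrix is the kind of input that can then be passed as covariates (e.g. covar.train) to the regression functions.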
Add a new column $all_conv to output of summary() for big_spLinReg() and big_spLogReg() to check whether all models have stopped because of “no more improvement”. Also add a new parameter sort to summary().
Now warn (enabled by default) if some models may not have reached a minimum when using big_spLinReg() and big_spLogReg().
In .self$nrow * .self$ncol : NAs produced by integer overflow.
Make two different memory-mappings: one that is read-only (using $address) and one where it is possible to write (using $address_rw). This makes it possible to use file permissions to prevent modifying data.
Also add a new field $is_read_only to be used to prevent modifying data (at least with <-) even when you have write permissions to it. Functions creating an FBM now gain a parameter is_read_only.
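The read-only field described above can be sketched as follows. This is a minimal example assuming the is_read_only parameter of FBM() behaves as the entry states; the write attempt is wrapped in tryCatch() because the exact error message is not specified here.

```r
library(bigstatsr)

# Create a small FBM flagged as read-only (backing file lives in tempdir())
X <- FBM(5, 5, is_read_only = TRUE)

X$is_read_only  # the new field reports the flag

# Attempting to modify the data with `<-` is expected to fail;
# capture the condition instead of stopping the script
res <- tryCatch(
  { X[1, 1] <- 1; "write succeeded" },
  error = function(e) e
)
inherits(res, "error")
```

This flag guards against accidental writes through `<-`; protecting the backing file itself is still a matter of file permissions.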
Make vector accessors (e.g. X[1:10]) faster.
Move some code to new packages {bigassertr} and {bigparallelr}.
big_randomSVD() gains arguments related to matrix-vector multiplication.
assert_noNA() is faster.
big_increment().
In plot.big_SVD(),
Can now plot many PCA scores (more than two) at once.
Use coord_fixed() when plotting PCA scores because it is good practice.
Use log-scale in scree plot to better see small differences in singular values.
Reexport cowplot::plot_grid() to merge multiple ggplots.
AUCBoot() is now 6-7 times faster.
center and scale to products.
big_univLogReg() for variables with no variation. IRLS was not converging, so glm() was used instead. The problem is that glm() drops dimensions causing singularities so that Z-score of the first covariate (or intercept) was used instead of a missing value.
Use mio instead of boost for memory-mapping.
Add a parameter base.row to predict.big_sp_list() and automatically detect if needed (as well as for covar.row).
Possibility to subset a big_sp_list without losing attributes, so that one can access one model (corresponding to one alpha) even if it is not the ‘best’.
Add parameters pf.X and pf.covar in big_sp***Reg() to provide different penalization for each variable (possibly no penalization at all).
Add %*%, crossprod and tcrossprod operations for ‘double’ FBMs.
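The matrix operations listed above can be tried on a small example. This is only a sketch: the matrix sizes and values are arbitrary, and the one-argument forms crossprod(X) and tcrossprod(X) are assumed to be among the supported operations.

```r
library(bigstatsr)

# A small FBM of type 'double' (the default type), filled column-wise
X <- FBM(4, 3, init = as.double(1:12))
A <- matrix(1, nrow = 3, ncol = 2)   # ordinary R matrix

XA  <- X %*% A        # FBM times a standard matrix
XtX <- crossprod(X)   # t(X) %*% X
XXt <- tcrossprod(X)  # X %*% t(X)

# Compare with the same operations on the in-memory copy X[]
all.equal(XA, X[] %*% A)
all.equal(XtX, crossprod(X[]))
all.equal(XXt, tcrossprod(X[]))
```

For large FBMs, the dedicated functions such as big_prodMat() and big_crossprodSelf() remain available and expose more options (row/column subsetting, block size).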
Now also returns the number of non-zero variables ($nb_active) and the number of candidate variables ($nb_candidate) for each step of the regularization paths of big_spLinReg() and big_spLogReg().
warn and return.all of big_spLinReg() and big_spLogReg() are deprecated; now always return the maximum information. Now provide two methods (summary and plot) to get a quick assessment of the fitted models.
Check of missing values for input vectors (indices and targets) and matrices (covariables).
AUC() is now stricter: it accepts only 0s and 1s for target.
$bm() and $bm.desc() have been added in order to get an FBM as a filebacked.big.matrix. This enables using {bigmemory} functions.
float added.
big_write added.
big_read now has a filter argument to filter rows, and argument nrow has been removed because it is now determined when reading the first block of data.
Removed the save argument from FBM (and others); now, you must use FBM(...)$save() instead of FBM(..., save = TRUE).
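The replacement for the removed save argument can be sketched as below. The dimensions and values are arbitrary; the .rds path is derived with sub_bk() on the assumption that the .rds file is written next to the backing file with the same base name.

```r
library(bigstatsr)

# Create an FBM and immediately save its description to an .rds file,
# chaining $save() as the entry above prescribes
X <- FBM(10, 3, init = as.double(1:30))$save()

# Path of the .rds file: backing file path with its extension swapped
rds <- sub_bk(X$backingfile, ".rds")
file.exists(rds)

# Re-attach the same data later (e.g. in another session) from the .rds file
X2 <- big_attach(rds)
identical(X2[], X[])
```

Because both objects point to the same backing file, nothing is copied when re-attaching.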
You can now fill an FBM using a data frame. Note that factors will be used as integers.
Package {bigreadr} has been developed and is now used by big_read.
options(bigstatsr.downcast.warning = FALSE), or you can use without_downcast_warning() to disable this warning for one call.
big_read so that it is faster (corresponding vignette updated).
possibility to add a “base predictor” for big_spLinReg and big_spLogReg.
don’t store the whole regularization path (as a sparse matrix) in big_spLinReg and big_spLogReg anymore because it caused major slowdowns.
directly average the K predictions in predict.big_sp_best_list.
only use the “PSOCK” type of cluster because “FORK” can leave zombies behind. You can change this with options(bigstatsr.cluster.type = "PSOCK").
Fix a bug in big_spLinReg related to the computation of summaries.
Now provides function plus to be used as the combine argument in big_apply and big_parallelize instead of '+'.
options(bigstatsr.cluster.type = "PSOCK"). Uses “PSOCK” in 0.4.0.
big_spLinReg and big_spLogReg. One will be chosen by grid-search.
big_prodMat when using a dimension of 1 or 0.
big_crossprod, big_tcrossprod, big_SVD and big_randomSVD (before, there was no default at all)
Integrate Cross-Model Selection and Averaging (CMSA) directly in big_spLinReg and big_spLogReg, a procedure that automatically chooses the value of the \(\lambda\) hyper-parameter.
Speed up big_spLinReg and big_spLogReg (issue #12)
big.matrix format of package bigmemory