readRawHeader() for TabularTextFile would produce an obscure “Error in if (!isEmpty) { : argument is of length zero” if the file is empty. Now it detects when the file is empty and gives a more informative error message.
dsApplyInPairs() is defunct. Use future.apply::future_mapply() instead.
extract() from the R.rsp package.
dsApply(), which has been defunct since version 2.13.0 (April 2019). Use future.apply::future_lapply() instead.
dsApplyInPairs() is deprecated. Use future.apply::future_mapply() instead.
dsApply() is defunct. Use future.apply::future_lapply() instead.
Removed defunct and hidden argument colClassPatterns of readDataFrame() for TabularTextFile. Use argument colClasses instead.
Removed defunct and hidden argument files of extractMatrix() for GenericTabularFileSet. Use extractMatrix(ds[files], ...) instead.
loadToEnv() for RDataFile was not declared an S3 method.
Package requires R (>= 3.2.0), released April 2015.
Package no longer requires Bioconductor.
dsApply() is now deprecated. Instead, use future.apply::future_lapply().
dsApply() with .parallel = "BiocParallel::BatchJobs" and "BatchJobs" is now defunct. Instead, use future.apply::future_lapply() with one of the many backends that implement the Future API.
Now getChecksum() for ChecksumFile defaults to not creating a checksum file (which is the default for other types of files), but instead always returns the checksum of the file by calculating it in memory only. This prevents, for instance, the equals() test on two different checksum files from generating another set of checksum files for themselves.
Now findByName() for GenericDataFileSet reports on the non-existing root paths in error messages.
GenericDataFile and GenericDataFileSet no longer report on memory (RAM) usage of objects.
dsApply(..., .parallel = "future") now uses future_lapply() of the future package internally. dsApply() will soon be deprecated (see below).
Argument colClassPatterns of readDataFrame() for TabularTextFile is now defunct. Use colClasses instead.
Argument files of extractMatrix() for GenericTabularFileSet is defunct.
dsApply() with either .parallel = "BiocParallel::BatchJobs" or "BatchJobs" is deprecated. Instead, use future::future_lapply() with whichever future::plan() is preferred.
Defunct argument aliased of getDefaultFullName() for GenericDataFile and defunct argument alias of GenericDataFileSet() have been removed.
Now file sizes are reported using IEC binary prefixes, i.e. bytes, KiB, MiB, GiB, TiB, …, YiB.
Added hasChecksumFile() for GenericDataFile.
hasBeenModified() for GenericDataFile gained argument update.
na.omit() for GenericDataFileSet; the default one in the stats package works equally well.
Arguments$getTags() failed to drop missing values.
equals(df, other) for GenericDataFile would give an error if other was not a GenericDataFile.
dropTags() would drop name if a tag had the same name.
getOneFile() on a GenericDataFileSet with a single missing file would give an error; now it gives a file with an NA pathname.
Preparing to make the default pathname for GenericDataFile() become NA_character_. It is currently NULL, but the goal is to enforce that length(pathname) is one.
extractMatrix(ds, files, ...) for GenericTabularFileSet is deprecated. Use extractMatrix(ds[files], ...) instead.
dsApply(..., .parallel = "future"), which utilizes the future package.
Added support for sortBy(..., by = "mixedroman") for GenericDataFileSet.
Now commentChar = "" and commentChar = FALSE also disable searching for comment characters (just as commentChar = NULL) for TabularTextFile.
readDataFrame() for TabularTextFile with column-names translators set could give an error “Number of read data columns does not match the number of column headers: …”. This was due to an update in utils::read.table() as of R v3.2.1 svn rev 68831.
lapply(), dsApply() returns a list with names corresponding to the full names of the data set.
getFullNames(..., onRemapping) to GenericDataFileSet to warn/err on full-name translations that generate inconsistent fullname-to-index maps before and after.
linkTo(..., skip = TRUE) would give error “No permission to modify existing file: …” also in the case when the proper link already exists and there is no need to create a new one.
Now getReadArguments() for TabularTextFile lets duplicated named colClasses entries override earlier ones, e.g. colClasses=c("*" = NA, "*" = "NULL", a = "integer") is effectively the same as colClasses=c("*" = "NULL", a = "integer"). Added package test.
nchar(..., type = "chars") is used internally for all file and directory names (including tags).
as.character() for GenericDataFile with a missing (NA) pathname on recent R-devel (>= 2015-04-23) related to an update in how nchar() handles missing values.
Now [[ for GenericDataFileSet returns a GenericDataFile not only if a numeric index is given but also if a character string is given.
Now argument idx for getFile() for GenericDataFileSet can also be a character string, in which case the file returned is identified using indexOf(..., pattern = idx, by = "exact", onMissing = "error").
Added RDataFile and RDataFileSet classes for *.RData files.
requireNamespace() instead of require() internally.
as.character() for ChecksumFile gave an error when the checksum file was missing.
Added support for sortBy(..., by = "filesize") and sortBy(..., decreasing = TRUE) for GenericDataFileSet.
Added rep() for GenericDataFileSet.
NOTES:
readDataFrame() would ignore argument colClasses iff it had no names. Added package system test for this case.
commentChar = NULL for TabularTextFile:s failed.
readChecksums() for ChecksumFileSet.
byPath() for GenericDataFileSet would output verbose messages enumerating the files loaded to stdout instead of stderr.
dsApply() for GenericDataFileSet would coerce argument verbose to logical before applying the function.
sep for readDataFrame() would only work for , and \t; now it works for any separator.
Now indexOf() first searches by exact names, then as before, i.e. by regular expression and fixed pattern matching. Added package system tests that contain particularly complicated cases for this. This was triggered by a rare but real use case causing an error in aroma.affymetrix. Thanks Benilton Carvalho for reporting on this.
Added argument by to indexOf() for GenericDataFileSet|List.
Added SuggestsNote field to DESCRIPTION with a list of packages that are recommended for the most common use cases.
Bumped package dependencies.
ds[[idx]] instead of getFile(ds, idx) where possible.
dsApply(..., .parallel = "none") would lower the verbose threshold before applying the function, resulting in less verbose output in the non-parallel case.
GenericDataFile would fail with linkTo() on Windows systems without the necessary privileges. Made the test less conservative. Also, added an Rd section on the privileges required on Windows for linkTo() to work. Thanks to Brian Ripley for reporting on this.
NOTES:
readColumns() for TabularTextFile also handles header-less files.
copyTo() for GenericDataFileSet no longer passes ... to byPath() when constructing the return data set.
renameTo() passes ... to R.utils::renameFile(), making it possible to also overwrite existing files.
Added is.na() for GenericDataFile and GenericDataFileSet, and na.omit() for the latter, which already supports anyNA().
Added linkTo() for GenericDataFile, which creates a symbolic link at a given destination pathname analogously to how copyTo() creates a file copy at a given destination pathname.
copyTo() for GenericDataFile passes ... to R.utils::copyFile().
copyTo() and renameTo() for GenericDataFile had verbose output enabled by default.
digest2() is now defunct.
Added duplicated(), anyDuplicated(), and unique() for GenericDataSet, which all compare GenericDataFile:s using the equals() method.
Now c() for GenericDataFileSet also works to append GenericDataFile:s. Added package system test for common use cases of c().
Added nbrOfColumns() for GenericTabularFile, which, if the number of columns cannot be inferred from the column names, falls back to reading the first row of data and using that as the number of columns.
Now nbrOfColumns() for ColumnNamesInterface returns NA if column names cannot be inferred and hence cannot be counted.
Now readDataFrame(..., header = FALSE) works as expected for tabular text files without headers.
Now getReadArguments() for TabularTextFile returns a colClasses vector of the correct length also in the case when there are no column names.
loadRDS() available for plain files and RdsFile:s.
RdsFile and RdsFileSet objects for handling *.rds file sets.
GenericSummary.
ChecksumFile and ChecksumFileSet.
extract() for GenericDataFileSet also handles the case when the data set to be extracted is empty, e.g. extract(GenericDataFileSet(), NA_integer_). Also, added support for argument onMissing = "dropall", which drops all files if one or more missing files were requested. Added package system tests for these cases.
GenericDataFileSet$byPath(..., recursive = TRUE) would be very slow setting up the individual files, especially for large data sets. Now it’s only slow for the first file.
Added "[["(x, i) for GenericDataFileSet, which gets a GenericDataFile by index i in [1,length(x)]. When i is non-numeric, the next "[["(x, i) method in the class hierarchy is used, e.g. the one for Object:s.
Added gzip()/gunzip() for GenericDataFileSet.
Added anyNA() to GenericDataFileSet to test whether any of the pathnames are NA, or not.
getChecksum() on GenericDataFile:s and GenericDataFileSet:s.
append() to become a generic function now calls base::append() in the default, instead of copying the latter. All this will eventually be removed, when proper support for c, [, [[, etc. has been added everywhere.
getChecksum() from R.cache instead of creating its own. This solves the problem of the default getChecksum() of R.cache not being found.
readDataFrame() for TabularTextFile subsets by row before reparsing numerical columns that were quoted.
autoload():s used internally.
Deprecated digest2() and deprecated -> defunct -> dropped.
Now GenericDataFileSet() gives an error informing that argument alias is defunct.
Now no generic functions are created for defunct methods.
R.filesets Package object is also available when the package is only loaded (but not attached).
cat() from R.utils.
SPEEDUP: Package no longer uses R.utils::whichVector(), which used to be 10x faster, but since R 2.11.0 which() is 3x faster again.
Package no longer utilizes import(), only importFrom():s.
WORKAROUND: For now, package attaches the R.oo package. This is needed due to what appears to be a bug in how R.oo finalizes Object:s assuming R.oo is/can be attached. Until that is resolved, we make sure R.oo is attached.
Forgot to import R.methodsS3::appendVarArgs().
[() and c() for GenericDataFileSet.
private = FALSE to byPath() of GenericDataFileSet.
isGzipped() ignores the case of the filename extension when testing whether the file is gzipped or not.
rm() calls with NULL assignments.
digest2(), which soon will be deprecated.
\usage{} lines are at most 90 characters long.
In addition to a fixed integer, argument skip for readDataFrame() (default and for TabularTextFile) may also specify a regular expression matching the first row of the data section.
Now argument skip to TabularTextFile and readDataFrame() for that class causes the parser to skip that many lines including commented lines, whereas before it did not count commented lines.
Added a default readDataFrame() for reading data from one or more tabular text files via the TabularTextFile/TabularTextFileSet classes.
colClassPatterns of readDataFrame() for TabularTextFile has been renamed to colClasses.
startupMessage() of R.oo.
indexOf() for GenericDataFileSet throws an exception if the user tries to pass an argument names.
Added head() and tail() for GenericTabularFile.
Added subsetting via [() to GenericTabularFile.
nbrOfRows() for TabularTextFile forgot to exclude comment rows in the file header.
readColumns() for GenericTabularFile would not preserve the order of the requested columns.
Added getOneFile() for GenericDataFileSet, which returns the first GenericDataFile with a non-missing pathname.
Added argument absolute = FALSE to getPathname() for GenericDataFile.
GenericDataFile stores the absolute pathname of the file, even if a relative pathname is given. This makes sure that the file is found also when the working directory is changed.
equals() for GenericDataFileSet would only compare the first GenericDataFile in each set.
isGzipped() to GenericDataFile.
writeColumnsToFiles() to GenericTabularFile. Used to be available only for TabularTextFile.
getDefaultColumnNames() for TabularTextFile did not use columnNames if it was set when creating the TabularTextFile object.
Now getReadArguments() for TabularTextFile drops arguments that are NULL, because they could cause errors downstream, e.g. readDataFrame() calling read.table(..., colClasses = NULL) => rep_len(NULL, x) => “Error in rep_len(colClasses, cols) : cannot replicate NULL to a non-zero length”.
as.list() for GenericDataSet to return a named list of GenericDataFile:s (previously it had no names). The names are the (translated) full names of the GenericDataFile:s.
lapply() and sapply() for GenericDataSet, because the corresponding functions in the base package utilize as.list().
Now GenericDataFile() retrieves the file time stamps such that hasBeenModified() returns a correct value also when first called, and not only TRUE just in case. This has the effect that getChecksum() will detect cached results already at the second call, as long as the file has not been modified. Previously it took two calls to getChecksum() for it to be properly cached.
Now declaring more internal and temporary Object fields as “cached”, which means they will be cleared if clearCache() or gc() is called on the corresponding object.
Added further verbose output to TabularTextFileSet.
DOCUMENTATION: Minor corrections to help pages.
NOTES:
TabularTextFile to ignore header comment arguments when inferring column names and classes.
clearCache() for GenericDataFileSet relies on ditto of Object to clear all cached fields (= with field modifier "cached").
{get,set}Label() for GenericDataFile and {get,set}Alias() for GenericData{File,FileSet}. Related arguments such as alias to GenericDataFileSet and aliased to getDefaultFullName() for GenericDataFile are also deprecated.
seq_along(x) instead of seq(along = x) everywhere. Similarly, seq(ds) where ds is a GenericDataFileSet is now replaced by seq_along(ds). Likewise, seq_len(x) replaces seq(length = x), and length(ds) replaces nbrOfFiles(ds).
Now TabularTextFile() tries to infer whether the data section contains column names or not. This is done by comparing to the optional columnNames header argument. If that is not available, it will (as before) assume there are column names.
Now readDataFrame() acknowledges header comment arguments columnNames and columnClasses if specified in the file.
Now getDefaultColumnNames() for TabularTextFile falls back to header comment argument columnNames, if there are no column names in the actual data table.
Now readRawHeader() for TabularTextFile also parses and returns header comment arguments.
ColumnNamesInterface, which GenericTabularFile now implements. Classes inheriting from GenericTabularFile should rename any getColumnNames() method to getDefaultColumnNames().
whichVector() with which(), because the latter is now the fastest again.
setColumnNames() for GenericTabularFile, which utilizes setColumnNamesTranslator().
{get,set}ColumnNameTranslator() in favor of {get,set}ColumnNamesTranslator(); note the plural form.
readDataFrame() for TabularTextFile no longer returns attribute fileHeader, unless argument debug is TRUE.
validate() to GenericDataFileSet, which iteratively calls validate() on all the GenericDataFile:s in the set. The default is to return NA, indicating that no validation was done.
Arguments$getReadablePath() instead of filePath(..., expandLinks = "any").
Arguments$getFilename() below.
... to NextMethod(), cf. R-devel thread “Do not pass ‘...’ to NextMethod() - it’ll do it for you; missing documentation, a bug or just me?” on Oct 16, 2012.
Arguments$getFilename() from this package to R.utils v1.17.0.
fromFiles() for GenericDataFileSet is now defunct in favor of byName(), which has been recommended since January
Now readDataFrame() for TabularTextFile defaults to reading strings as characters rather than as factors. To read strings as factors, just pass argument stringsAsFactors = TRUE.
Added readDataFrame() for TabularTextFileSet.
ROBUSTNESS: Now getHeader() for TabularTextFile checks if the file has been modified before returning cached results.
trim() being overridden by ditto from the IRanges package, iff loaded.
extractMatrix() for GenericTabularFile adds column names just as ditto for GenericTabularFileSet does.
.Internal() calls.
GenericDataFile and GenericDataFileSet handle so-called “empty” files, which are files with NULL pathnames.
getCommentChar() to TabularTextFile and argument commentChar to its constructor. This allows custom comment characters other than just "#" to be used.
GenericDataFileSet$byName(..., subdirs) would throw Error in strsplit(subdirs, split = "/\\") iff subdirs != NULL.
Improved the handling of the newly introduced depth parameter, e.g. by making it optional/backward compatible.
GenericDataFileSet, such that one can correctly infer fullname and subdirs from the path.
named to getTags() for FullNameInterface. If TRUE, tags of format "<name>=<value>" will be parsed and returned as a named "<value>", e.g. "foo,n=23,bar,n=42" is returned as c("foo", "n"="23", "bar", "n"="42").
readDataFrame(..., colClasses = ..., trimQuotes = TRUE) of TabularTextFile will read numeric columns that are quoted. This is done by first reading them as quoted character strings, dropping the quotes, and then rereading them as numeric values.
.fileClass to appendFiles() for GenericDataFileSet.
ROBUSTNESS: Now appendFiles() for GenericDataFileSet asserts that all files to be appended are instances of the file class of this set as given by the static getFileClass().
ROBUSTNESS: Added argument .assertSameClass to appendFiles() for GenericDataFileSet, which if TRUE asserts that the files to be appended inherit from the same class as the existing files. Before, this test was mandatory.
getChecksum() to GenericDataFileSet, which calculates the checksum of the object returned by the protected getChecksumData(). Use with care, because what objects should be the basis of the checksum is not clear, e.g. should it be only the file system checksum, or should things such as translated fullnames be included as well?
equals() for GenericDataFile would consider two files not to be equal only if their checksums were equal, and vice versa. Also, when creating the message string explaining why they differ, an error would have been thrown.
hpaste() internally wherever applicable.
appendFullNameTranslatorBy<what>() for <character> and <function> assert that the translator correctly returns exactly one string. This has the effect that setFullName() and friends are also tested.
Added = to the list of safe characters for Arguments$getFilename().
Added fullname(), name(), tags(), and dropTags().
findByName() for GenericDataFileSet would throw “<simpleError in paths[sapply(rootPaths, FUN = isDirectory)]: invalid subscript type ‘list’>” in case no matching root path directories existed.
Added dropRootPathTags().
GENERALIZATION: Added support to findByName() for GenericDataFileSet such that root paths also can be specified by simple regular expression (still via argument paths). Currently it is only the last subdirectory that can be expanded, e.g. foo/bar/data(,.*)/.
GENERALIZATION: Now byName() for GenericDataFileSet will try all possible data set directories located when trying to set up a data set. Before, it only tried the first one located. This new approach is equally fast for the first data set directory as before. The advantage is that it adds further flexibility, e.g. the first directory may not be what we want but the second, which can be further tested by byPath() and downstream methods such as the constructor.
ROBUSTNESS: Now writeColumnsToFiles() for TabularTextFile writes files atomically, which should minimize the risk of generating incomplete files.
getTags() for Arguments from aroma.core package.
fromFiles() of GenericDataFileSet has been deprecated, if still called by someone.
GENERALIZATION: Now append() for GenericDataFileSet tries to also append non-GenericDataFileSet objects by passing them down to appendFiles(), assuming they are GenericDataFile:s.
GENERALIZATION: Now appendFiles() for GenericDataFileSet also accepts a single item. Thus, there is no longer a need to wrap up single items in a list.
ROBUSTNESS: Now GenericDataFileSet$byName() asserts that arguments name and tags contain only valid characters. This will for instance prevent passing paths or pathnames by mistake.
Now appendFullNameTranslator(..., df) for FullNameInterface takes either pattern or fixed translations in a data.frame.
Added sortBy() to GenericDataFileSet, which sorts files in either lexicographic or mixedsort order.
DOCUMENTATION: Added more Rd help pages.
DOCUMENTATION: Removed any duplicated \usage{} statements from the Rd documentation.
indexOf() for GenericDataFileSet/List would return NA if the search pattern/string contained parentheses. The reason is that parentheses have a special meaning in regular expressions. Now indexOf() first searches by regular expression patterns, then by fixed strings. Thanks Johan Staaf at Lund University and Larry(?) for reporting on this issue.
Now GenericDataFileSet$findByName(..., mustExist = FALSE) no longer throws an exception even if there is no existing root path.
Added argument firstOnly = TRUE to findByName() for GenericDataFileSet.
Added appendFullNameTranslatorBy...() methods to the FullNameInterface class for data frames, TabularTextFile:s, and TabularTextFileSet:s.
"NA" to the default na.strings returned by getReadArguments() for TabularTextFile.
NOTES:
.onUnknownArgs to GenericDataFile() and GenericDataFileSet(). As before, the default is to throw an exception if there are unknown arguments. However, in certain cases it is useful to allow (and ignore) “stray” arguments.
indexOf() of GenericDataFileSet and GenericDataFileSetList did not handle names with regular expression symbols + and *. Thanks to Randy Gobbel for the initial error report.
GenericDataFile and GenericDataFileSet.
fromFiles() of GenericDataSet. Use byPath() instead.
files is logical, then extract() of GenericDataFileSet and GenericDataFileSetList now asserts that the length of files matches the number of available files.
exData/.
readColumns(..., column=<string>) on a TabularTextFile would give “Error … object ‘columnNames’ not found”.
default = "\\.([^.]+)$" to getExtensionPattern() of GenericDataFile. Before, the default value was hard coded inside this function.
setExtensionPattern(..., pattern = NULL) of GenericDataFile works.
Added protected as.data.frame() to GenericDataFileSetList.
Now GenericDataFile(NA, mustExist = FALSE) is a valid object. Made all methods aware of such missing files.
Now extract(ds, c(1, 2, NA, 4), onMissing = "NA") returns a valid GenericDataFileSet where missing files are returned as missing GenericDataFile:s.
Added na.rm = TRUE to all getTags() so that it returns NULL in case the file is missing.
copyTo() of GenericDataFileSet quietly ignores missing files.
Added Rd help for indexOf() of GenericDataFileSet.
ROBUSTNESS: Using new Arguments$getInstanceOf() where possible.
Now all index arguments are validated correctly using the new max argument of Arguments$getIndices(). Before, the case where max == 0 was not handled correctly.
Changed the default to parent = 0 for getDefaultFullName() of GenericDataFileSet to be consistent with the documentation.
Now GenericDataFile(pathname) throws an error if pathname is referring to a directory.
getPath() and getDefaultFullName() of GenericDataFileSet would return a logical instead of character value.
indexOf(ds, names) of GenericDataFileSet would return a logical instead of an integer vector of NA:s if none of the names existed.
translateFullName() of FullNameInterface and translateColumnNames() of GenericTabularFile throw an exception if some fullnames were translated into NA. They also assert that no names were dropped or added in the process.
After doing append() to a GenericDataFileSet, the total file size reported would remain the same.
Appending empty data sets using append() of GenericDataFileSet would give error ‘Error in this$files[[1]] : subscript out of bounds’.
Added {get,set}ExtensionPattern() to FullNameInterface.
Added getExtension() to GenericDataFile.
appendFullNameTranslatorBylist(), which makes it possible to set up a sequence of fullname translators fnt1, fnt2, fnt3 by calling setFullNameTranslator(..., list(fnt1, fnt2, fnt3)).
Added support for having a sequence of fullname translator functions. These can be added using appendFullNameTranslator().
Added an example() to FullNameInterface.
[() to TabularTextFile.
Added the FullNameInterface, which is the interface class that defines what fullnames, names, tags, etc. are.
Now setFullName*s*Translator() for GenericDataFileSet dispatches on the by argument. If that is not possible, it calls setFullNameTranslator() for each file in the set (as before).
GenericDataFile and GenericDataFileSet implement the FullNameInterface, which means less redundant code.
fromFiles() to byPath(). For backward compatibility, the former calls the latter.
findByName() of GenericDataFileSet follows Windows Shortcut links also for subdirectories.
Analogously to the method for a GenericDataFile, the setFullNameTranslator() method for GenericDataFileSet now assumes that the fullname translator function also accepts argument set.
Added argument .fileSetClass to GenericDataFileSet().
GenericDataFile should accept any number of arguments. The first argument will always be (an unnamed) argument containing the name (or names) to be translated. If the translator is for a GenericDataFile, an additional argument file will also be passed. This allows the translator function to, for instance, read the file header and infer the name that way.
Extracted several classes and methods from the aroma.core package.
Created package.