Extracting a Data Portion

The {portion} R package offers convenient tools to extract data portions from common R objects:

works for vector, matrix, data.frame, and list objects
lets you select the relative portion size
extracts the first, last, random, similar, or dissimilar data
portions row-wise or column-wise
provides selected indices as an attribute
preserves object attributes

Installation

You can install the released version from CRAN with:

install.packages("portion")

Examples

Portion a vector by selecting similar or dissimilar values:

set.seed(1)
x <- c(1, 1, 2, 2)
portion(x, proportion = 0.5, how = "similar")
#> [1] 1 1
#> attr(,"indices")
#> [1] 1 2
portion(x, proportion = 0.5, how = "dissimilar")
#> [1] 1 2
#> attr(,"indices")
#> [1] 1 3

Portion a matrix row-wise or column-wise:

x <- matrix(LETTERS[1:24], nrow = 4)
portion(x, proportion = 0.5, how = "first")
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] "A"  "E"  "I"  "M"  "Q"  "U" 
#> [2,] "B"  "F"  "J"  "N"  "R"  "V" 
#> attr(,"indices")
#> [1] 1 2
portion(x, proportion = 0.5, how = "first", byrow = FALSE)
#>      [,1] [,2] [,3]
#> [1,] "A"  "E"  "I" 
#> [2,] "B"  "F"  "J" 
#> [3,] "C"  "G"  "K" 
#> [4,] "D"  "H"  "L" 
#> attr(,"indices")
#> [1] 1 2 3
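
To make the row-wise and column-wise selection concrete, the two results above can be reproduced with plain base R indexing. This is only a sketch: it assumes that the number of selected rows or columns is the proportion times the dimension, rounded up, which is consistent with the outputs shown in this README but is not taken from the package source. The names n_rows and n_cols are illustrative.

```r
# Base R sketch of how = "first"; the rounding rule is an assumption.
x <- matrix(LETTERS[1:24], nrow = 4)
proportion <- 0.5

# Row-wise: keep the first half of the rows, all columns.
n_rows <- ceiling(nrow(x) * proportion)
first_rows <- x[seq_len(n_rows), , drop = FALSE]

# Column-wise: keep the first half of the columns, all rows.
n_cols <- ceiling(ncol(x) * proportion)
first_cols <- x[, seq_len(n_cols), drop = FALSE]

dim(first_rows)
#> [1] 2 6
dim(first_cols)
#> [1] 4 3
```

Using drop = FALSE keeps the result a matrix even when only one row or column is selected.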

Portion a data.frame at random. The selected row or column indices are stored in the "indices" attribute, which print() does not display for data frames:

set.seed(1)
x <- as.data.frame(diag(8))
portion(x, proportion = 0.3, how = "random")
#>   V1 V2 V3 V4 V5 V6 V7 V8
#> 1  1  0  0  0  0  0  0  0
#> 4  0  0  0  1  0  0  0  0
#> 8  0  0  0  0  0  0  0  1
portion(x, proportion = 0.3, how = "random", byrow = FALSE)
#>   V2 V3 V5
#> 1  0  0  0
#> 2  1  0  0
#> 3  0  1  0
#> 4  0  0  0
#> 5  0  0  1
#> 6  0  0  0
#> 7  0  0  0
#> 8  0  0  0
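
Because printing a data frame does not show custom attributes, it can help to read the indices explicitly. The sketch below uses base R's attr() to retrieve them and then builds the complementary rows, for example to obtain a hold-out set. It assumes that, for row-wise portioning, the attribute holds integer row positions, as the description above suggests. The names selected, idx, and remaining are illustrative.

```r
library(portion)

set.seed(1)
x <- as.data.frame(diag(8))
selected <- portion(x, proportion = 0.3, how = "random")

# Read the stored indices (assumed to be row positions here).
idx <- attr(selected, "indices")

# Rows that were not selected.
remaining <- x[-idx, , drop = FALSE]

# Under the assumption above, the two parts together cover all rows.
nrow(selected) + nrow(remaining) == nrow(x)
```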

Selecting similar or dissimilar data frame rows relies on clustering, so all non-ignored columns must be numeric or logical. Use ignore to exclude identifiers, labels, or grouping columns from the clustering data while keeping them in the returned object.

x <- data.frame(value = c(1, 1, 5, 5), group = c("a", "a", "b", "b"))
portion(x, proportion = 0.5, how = "similar", ignore = 2)
#>   value group
#> 1     1     a
#> 2     1     a
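
For wider data frames, the columns to ignore can be found programmatically rather than by hand. The following sketch uses base R to collect the positions of all columns that are neither numeric nor logical and passes them as ignore. It assumes that ignore accepts an integer vector of column positions, as the call with ignore = 2 above suggests. The names is_usable and non_numeric are illustrative.

```r
library(portion)

x <- data.frame(value = c(1, 1, 5, 5), group = c("a", "a", "b", "b"))

# Flag columns that can enter the clustering (numeric or logical).
is_usable <- vapply(
  x,
  function(col) is.numeric(col) || is.logical(col),
  logical(1)
)

# Positions of all remaining columns.
non_numeric <- unname(which(!is_usable))
non_numeric
#> [1] 2

# Pass the positions as ignore (assumed to accept an integer vector).
portion(x, proportion = 0.5, how = "similar", ignore = non_numeric)
```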

Portion each element of a list:

x <- list(1:5, diag(3), data.frame(1:3, 2:4))
portion(x, proportion = 0.5, how = "last")
#> [[1]]
#> [1] 3 4 5
#> attr(,"indices")
#> [1] 3 4 5
#> 
#> [[2]]
#>      [,1] [,2] [,3]
#> [1,]    0    1    0
#> [2,]    0    0    1
#> attr(,"indices")
#> [1] 2 3
#> 
#> [[3]]
#>   X1.3 X2.4
#> 2    2    3
#> 3    3    4