Detects top discriminators that contribute to group separation based on the fixation index (Fst).

top.discriminator(
  cluster.obj = NULL,
  group1,
  group2,
  bim.file,
  use.node.number = FALSE,
  num.top = 100,
  percentile = 0.9,
  use.percentile = FALSE,
  use.path = FALSE,
  result.path = NULL
)

Arguments

cluster.obj

The object which is returned from ipcaps. This parameter is used when use.path is FALSE.

group1

To specify the first group number to be compared. (also see use.node.number)

group2

To specify the second group number to be compared. (also see use.node.number)

bim.file

Option: In case that SNP information is not provided to ipcaps, an absolute path of SNP information file is required. It needs to be in PLINK format (bim). See more details at: http://zzz.bwh.harvard.edu/plink/data.shtml.

use.node.number

To specify whether a group number or a node number is be used. If TRUE, a node nubmer is used instead. Default = FALSE.

num.top

A number of top Fst SNPs to be returned. This parameter is used when percentile is FALSE. Default = 100.

percentile

A percentile for top Fst SNPs to be returned. This parameter is used when percentile is TRUE. Default = 0.999.

use.percentile

A logical value to indicate whether percentile is used instead of num.top. This parameter is used when percentile is TRUE. Default = FALSE.

use.path

A logical value to indicate whether result.path is used instead of cluster.obj. Importantly, result.path needs to be set. This parameter only work with the IPCAPS's result from version 1.1.7 onward. Default = FALSE.

result.path

A path to an result directory of IPCAPS. This parameter is used when use.path is TRUE. This parameter only work with the IPCAPS's result from version 1.1.7 onward.

Value

The returned value is a data.frame of SNP information sorting by Fst in descending order, which contains 7 columns, chr, SNP, centimorgans, position, allele1, allele2, and Fst. The column 1-6 are SNP information from the bim file. The column Fst contains estimated Fst between group1 and group2.

Examples

# Importantly, bed file, bim file, and fam file are required # Use the example files embedded in the package BED.file <- system.file("extdata","ipcaps_example.bed",package="IPCAPS") LABEL.file <- system.file("extdata","ipcaps_example_individuals.txt.gz",package="IPCAPS") my.cluster <- ipcaps(bed=BED.file,label.file=LABEL.file,lab.col=2,out=tempdir())
#> Running ... IPCAPS #> output: /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk #> label file: /private/var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T/RtmpYTlCbr/temp_libpath1351142c69ef5/IPCAPS/extdata/ipcaps_example_individuals.txt.gz #> label column: 2 #> threshold: 0.18 #> minimum Fst: 8e-04 #> maximum thread: 1 #> method: mix #> bed: /private/var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T/RtmpYTlCbr/temp_libpath1351142c69ef5/IPCAPS/extdata/ipcaps_example.bed #> minimum in group: 20 #> data type: snp #> missing: NA #> In preprocessing step #> Note: the directory '/var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk' is existed. #> The result files will be saved to this directory: /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8 #> Input data: 103 individuals, 2000 markers #> Start calculating #> Node 1: Start the process #> Node 1: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 1: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 1: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 1: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node1.RData #> Node 1: Check for splitting #> Node 1: Reducing matrix #> Node 1: EigenFit = 0.652486925995028, Threshold = 0.18, no. significant PCs = 3 #> Node 1: Start clustering #> Node 1: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node1.RData #> Node 1: done for clustering #> Time difference of 0.1103411 secs #> Node 1: Checking Fst of group 1 and group 2 #> Node 1: Checking Fst of group 1 and group 3 #> Node 1: Checking Fst of group 1 and group 4 #> Node 1: Checking Fst of group 2 and group 3 #> Node 1: Checking Fst of group 2 and group 4 #> Node 1: Checking Fst of group 3 and group 4 #> Node 1 times took 0.34123706817627 secs #> Node 1: 103 individuals were splitted into: 4 groups #> Node 1: Return status 0 #> Node 1: Split to sub-nodes #> Node 1: Saving /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node2.RData #> Node 1: Saving /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node3.RData #> Node 1: Saving /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node4.RData #> Node 1: Saving /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node5.RData #> Node 1: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 1: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 1: Done! #> Time difference of 0.484498 secs #> Node 2: Start the process #> Node 2: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 2: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 2: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 2: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node2.RData #> Node 2: Check for splitting #> Node 2: A number of node is lower than the minimum number (20), therefore split was not performed #> Node 2: Return status 1 #> Node 2: No split was performed, Status=1 #> Node 2: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/leafnode.RData #> Node 2: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 2: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 2: Done! #> Time difference of 0.03994012 secs #> Node 3: Start the process #> Node 3: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 3: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 3: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 3: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node3.RData #> Node 3: Check for splitting #> Node 3: A number of node is lower than the minimum number (20), therefore split was not performed #> Node 3: Return status 1 #> Node 3: No split was performed, Status=1 #> Node 3: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/leafnode.RData #> Node 3: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 3: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 3: Done! #> Time difference of 0.03689599 secs #> Node 4: Start the process #> Node 4: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 4: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 4: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 4: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node4.RData #> Node 4: Check for splitting #> Node 4: A number of node is lower than the minimum number (20), therefore split was not performed #> Node 4: Return status 1 #> Node 4: No split was performed, Status=1 #> Node 4: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/leafnode.RData #> Node 4: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 4: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 4: Done! #> Time difference of 0.03421998 secs #> Node 5: Start the process #> Node 5: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 5: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 5: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 5: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node5.RData #> Node 5: Check for splitting #> Node 5: Reducing matrix #> Node 5: EigenFit = 0.343228072167861, Threshold = 0.18, no. significant PCs = 3 #> Node 5: Start clustering #> Node 5: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node5.RData #> Node 5: done for clustering #> Time difference of 0.09402514 secs #> Node 5: Checking Fst of group 1 and group 2 #> Node 5 times took 0.126171827316284 secs #> Node 5: 100 individuals were splitted into: 2 groups #> Node 5: Return status 0 #> Node 5: Split to sub-nodes #> Node 5: Saving /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node6.RData #> Node 5: Saving /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node7.RData #> Node 5: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 5: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 5: Done! #> Time difference of 0.2726109 secs #> Node 6: Start the process #> Node 6: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 6: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 6: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 6: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node6.RData #> Node 6: Check for splitting #> Node 6: Reducing matrix #> Node 6: EigenFit = 0.0367738545252401, Threshold = 0.18, no. significant PCs = #> Node 6: No split was performed because EigenFit is lower than threshold #> Node 6: Return status 1 #> Node 6: No split was performed, Status=1 #> Node 6: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/leafnode.RData #> Node 6: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 6: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 6: Done! #> Time difference of 0.1290462 secs #> Node 7: Start the process #> Node 7: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 7: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/condition.RData #> Node 7: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/rawdata.RData #> Node 7: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/node7.RData #> Node 7: Check for splitting #> Node 7: Reducing matrix #> Node 7: EigenFit = 0.0221170820186458, Threshold = 0.18, no. significant PCs = #> Node 7: No split was performed because EigenFit is lower than threshold #> Node 7: Return status 1 #> Node 7: No split was performed, Status=1 #> Node 7: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/leafnode.RData #> Node 7: Loading /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 7: Updating /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/RData/tree.RData #> Node 7: Done! #> Time difference of 0.1021471 secs #> In post process step #> Exporting node 2 as group 1 #> Exporting node 3 as group 2 #> Exporting node 4 as group 3 #> Exporting node 6 as group 4 #> Exporting node 7 as group 5 #> Note: save as /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8/groups.txt
#> The result files were saved at: /var/folders/sp/hhmj9xvx53z4g4dktf5f503r0000gp/T//Rtmpvv9uwk/cluster_output8 #> Total runtime is 2.11641001701355 sec
table(my.cluster$cluster$label,my.cluster$cluster$group)
#> #> 1 2 3 4 5 #> outlier3 1 1 1 0 0 #> pop1 0 0 0 0 50 #> pop2 0 0 0 50 0
# 1 2 3 4 5 6 # outlier4 5 4 1 0 0 0 # pop1 0 0 0 0 250 0 # pop2 0 0 0 0 0 250 # pop3 0 0 0 250 0 0 #Identify top discriminators between groups, for example, group 4 and group 5 top.snp <-top.discriminator(my.cluster,4,5) #or, specify the bim file #top.snp <-top.discriminator(my.cluster,4,5,bim.file="ipcaps_example.bim") head(top.snp)
#> chr SNP centimorgans position allele1 allele2 Fst #> V1628 1 marker1628 0 16280000 A T 0.2285229 #> V1843 1 marker1843 0 18430000 A T 0.2203918 #> V111 1 marker111 0 1110000 A T 0.2129322 #> V1817 1 marker1817 0 18170000 A T 0.1669118 #> V1350 1 marker1350 0 13500000 A T 0.1665826 #> V1974 1 marker1974 0 19740000 A T 0.1562058
# chr SNP centimorgans position allele1 allele2 Fst #V5452 1 marker5452 0 54520000 A T 0.11337260 #V2348 1 marker2348 0 23480000 A T 0.11194490 #V8244 1 marker8244 0 82440000 A T 0.09556580 #V5972 1 marker5972 0 59720000 A T 0.08747794 #V3561 1 marker3561 0 35610000 A T 0.08725860 #V8419 1 marker8419 0 84190000 A T 0.08293494 #Alternatively, specify the previous result directory of IPCAPS and identify #top discriminators between groups, for example, group 4 and group 5 previous.res.path <- my.cluster$output.dir top.snp <-top.discriminator(result.path = previous.res.path, use.path = TRUE, group1 = 4, group2 = 5) head(top.snp)
#> chr SNP centimorgans position allele1 allele2 Fst #> V1628 1 marker1628 0 16280000 A T 0.2285229 #> V1843 1 marker1843 0 18430000 A T 0.2203918 #> V111 1 marker111 0 1110000 A T 0.2129322 #> V1817 1 marker1817 0 18170000 A T 0.1669118 #> V1350 1 marker1350 0 13500000 A T 0.1665826 #> V1974 1 marker1974 0 19740000 A T 0.1562058
#Identify top discriminators between groups, for example, group 4 and group 5 top.snp <-top.discriminator(my.cluster,4,5) #or, specify the bim file #top.snp <-top.discriminator(my.cluster,4,5,bim.file="ipcaps_example.bim") dim(top.snp)
#> [1] 100 7
head(top.snp)
#> chr SNP centimorgans position allele1 allele2 Fst #> V1628 1 marker1628 0 16280000 A T 0.2285229 #> V1843 1 marker1843 0 18430000 A T 0.2203918 #> V111 1 marker111 0 1110000 A T 0.2129322 #> V1817 1 marker1817 0 18170000 A T 0.1669118 #> V1350 1 marker1350 0 13500000 A T 0.1665826 #> V1974 1 marker1974 0 19740000 A T 0.1562058
#Here, it is possible to select the top Fst SNPs based on a percentile. top.snp <-top.discriminator(my.cluster,4,5, percentile = 0.9, use.percentile = TRUE) dim(top.snp)
#> [1] 200 7
head(top.snp)
#> chr SNP centimorgans position allele1 allele2 Fst #> V1628 1 marker1628 0 16280000 A T 0.2285229 #> V1843 1 marker1843 0 18430000 A T 0.2203918 #> V111 1 marker111 0 1110000 A T 0.2129322 #> V1817 1 marker1817 0 18170000 A T 0.1669118 #> V1350 1 marker1350 0 13500000 A T 0.1665826 #> V1974 1 marker1974 0 19740000 A T 0.1562058
#Identify top discriminators between groups, for example, node 7 and node 8 top.snp2 <-top.discriminator(my.cluster,7,8,use.node.number=TRUE) head(top.snp2)
#> chr SNP centimorgans position allele1 allele2 Fst #> V1 1 marker1 0 10000 A T NaN #> V2 1 marker2 0 20000 A T NaN #> V3 1 marker3 0 30000 A T NaN #> V4 1 marker4 0 40000 A T NaN #> V5 1 marker5 0 50000 A T NaN #> V6 1 marker6 0 60000 A T NaN
# chr SNP centimorgans position allele1 allele2 Fst #V5452 1 marker5452 0 54520000 A T 0.11337260 #V2348 1 marker2348 0 23480000 A T 0.11194490