RIdeogram is an R package for drawing SVG (Scalable Vector Graphics) graphics to visualize and map genome-wide data on idiograms.
If you use this package in a published paper, please cite this paper:
Hao Z, Lv D, Ge Y, Shi J, Weijers D, Yu G, Chen J. 2020. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms. PeerJ Comput. Sci. 6:e251 http://doi.org/10.7717/peerj-cs.251
This is a simple package with only three functions: ideogram, convertSVG and GFFex.
First, install the package (for example, from CRAN with install.packages("RIdeogram")) and load it with library(RIdeogram).
Then, load the example data that ship with the RIdeogram package.
data(human_karyotype, package="RIdeogram")
data(gene_density, package="RIdeogram")
data(Random_RNAs_500, package="RIdeogram")
You can use the function “head()” to see the data format.
head(human_karyotype)
#> Chr Start End CE_start CE_end
#> 1 1 0 248956422 122026459 124932724
#> 2 2 0 242193529 92188145 94090557
#> 3 3 0 198295559 90772458 93655574
#> 4 4 0 190214555 49712061 51743951
#> 5 5 0 181538259 46485900 50059807
#> 6 6 0 170805979 58553888 59829934
Specifically, the ‘karyotype’ file contains the karyotype information and has five columns (or three, see below). The first column is the chromosome ID, the second and third columns are the start and end positions of each chromosome, and the fourth and fifth columns are the start and end positions of its centromere.
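Before plotting your own karyotype, it can help to check it against the column layout described above. The helper below, check_karyotype, is a hypothetical sketch and not part of RIdeogram; it assumes only the positional layout (Chr, Start, End, and optionally CE_start, CE_end) and reuses the first two rows shown by head(human_karyotype).

```r
# Hypothetical helper (not part of RIdeogram): sanity-check a karyotype table
check_karyotype <- function(k) {
  # Three columns (no centromere) or five columns (with centromere)
  if (!ncol(k) %in% c(3, 5)) stop("karyotype must have 3 or 5 columns")
  # Each chromosome must end after it starts
  if (any(k[[3]] <= k[[2]])) stop("End must be greater than Start")
  if (ncol(k) == 5) {
    # The centromere must lie inside its chromosome
    inside <- k[[4]] >= k[[2]] & k[[5]] <= k[[3]] & k[[4]] < k[[5]]
    if (!all(inside)) stop("centromere lies outside its chromosome")
  }
  invisible(TRUE)
}

# First two rows of human_karyotype, as printed above
toy <- data.frame(Chr = c(1, 2), Start = c(0, 0),
                  End = c(248956422, 242193529),
                  CE_start = c(122026459, 92188145),
                  CE_end = c(124932724, 94090557))
check_karyotype(toy)         # five-column form
check_karyotype(toy[, 1:3])  # three-column form, no centromere
```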
head(gene_density)
#> Chr Start End Value
#> 1 1 1 1000000 65
#> 2 1 1000001 2000000 76
#> 3 1 2000001 3000000 35
#> 4 1 3000001 4000000 30
#> 5 1 4000001 5000000 10
#> 6 1 5000001 6000000 10
The heatmap data (here, ‘gene_density’) has four columns. The first column is the chromosome ID, the second and third columns are the start and end positions of windows along that chromosome, and the fourth column is a characteristic value for each window, such as the number of genes.
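A common problem with hand-made heatmap data is a chromosome ID that does not match the karyotype. The sketch below uses a hypothetical helper, bad_windows (not part of RIdeogram), that returns the row numbers of windows whose chromosome is missing from the karyotype or whose end lies before its start. The third toy row (chromosome 3, value 12) is invented for illustration.

```r
# Hypothetical helper (not part of RIdeogram): flag heatmap rows whose
# chromosome ID is absent from the karyotype or whose window is reversed
bad_windows <- function(heat, karyotype) {
  unknown  <- !heat$Chr %in% karyotype$Chr
  reversed <- heat$End < heat$Start
  which(unknown | reversed)
}

karyo <- data.frame(Chr = c(1, 2), Start = c(0, 0), End = c(248956422, 242193529))
heat  <- data.frame(Chr   = c(1, 1, 3),
                    Start = c(1, 1000001, 1),
                    End   = c(1000000, 2000000, 1000000),
                    Value = c(65, 76, 12))
bad_windows(heat, karyo)  # row 3 uses chromosome 3, which is not in karyo
```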
head(Random_RNAs_500)
#> Type Shape Chr Start End color
#> 1 tRNA circle 6 69204486 69204568 6a3d9a
#> 2 rRNA box 3 68882967 68883091 33a02c
#> 3 rRNA box 5 55777469 55777587 33a02c
#> 4 rRNA box 21 25202207 25202315 33a02c
#> 5 miRNA triangle 1 86357632 86357687 ff7f00
#> 6 miRNA triangle 11 74399237 74399333 ff7f00
The label data (here, ‘Random_RNAs_500’) has six columns. The first column is the label type; the second is the label shape, with three available options: box, triangle and circle; the third is the chromosome ID; the fourth and fifth are the start and end positions of each label on the chromosome; and the sixth is the label color.
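The shape and color columns are easy to get wrong when preparing your own label file. Below is a minimal sketch of a check, using a hypothetical helper check_labels (not part of RIdeogram). It assumes colors are written as six hex digits without a leading "#", as they appear in the example data; the "star" shape in the third toy row is deliberately invalid.

```r
# Hypothetical helper (not part of RIdeogram): return the rows of a
# marker-label table with an unsupported shape or a malformed color
check_labels <- function(lab) {
  shapes_ok <- lab$Shape %in% c("box", "triangle", "circle")
  # Assumption: colors are six hex digits without "#", as in Random_RNAs_500
  colors_ok <- grepl("^[0-9A-Fa-f]{6}$", lab$color)
  which(!(shapes_ok & colors_ok))
}

lab <- data.frame(Type  = c("tRNA", "rRNA", "miRNA"),
                  Shape = c("circle", "box", "star"),
                  Chr   = c(6, 3, 1),
                  Start = c(69204486, 68882967, 86357632),
                  End   = c(69204568, 68883091, 86357687),
                  color = c("6a3d9a", "33a02c", "ff7f00"))
check_labels(lab)  # row 3 has an unsupported shape
```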
Alternatively, you can load your own data with the read.table function, for example:
human_karyotype <- read.table("karyotype.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
gene_density <- read.table("data_1.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
Random_RNAs_500 <- read.table("data_2.txt", sep = "\t", header = TRUE, stringsAsFactors = FALSE)
The “karyotype.txt” file contains the karyotype information, the “data_1.txt” file contains the heatmap data, and the “data_2.txt” file contains the track label data.
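To check that a hand-written input file has a header row and tab separators, you can write a small table and read it back with the same read.table call used above. This sketch writes to a temporary file; the two rows are taken from head(human_karyotype).

```r
# Write a toy tab-delimited karyotype file with a header row
path <- tempfile(fileext = ".txt")
toy <- data.frame(Chr = c("1", "2"), Start = c(0, 0), End = c(248956422, 242193529))
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back exactly as in the examples above
karyotype <- read.table(path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
str(karyotype)
```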
In addition, we provide a simple function, GFFex, to extract heatmap information (such as gene density) from a GFF file. First, download the GFF file of a species genome, for example the human genome annotation file from GENCODE (ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.annotation.gff3.gz). Then, prepare a karyotype file in the same format as described above. Note that the chromosome IDs in the first column of the karyotype file must match those in the GFF file (in this case, chr1, chr2, …). Next, run GFFex on these two files.
You can use the argument “feature” (default value is “gene”) to select the feature you want to extract from the GFF file and the argument “window” (default value is “1000000”) to set the window size.
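To illustrate what "feature" and "window" control, here is a base-R sketch that counts features of one type in fixed-size windows and returns the Chr/Start/End/Value heatmap format. It is not the GFFex implementation. It assumes the GFF has already been read into a data frame with the GFF3 column names seqid, type and start; counting each feature by its start coordinate and truncating the last window at the chromosome end are choices made for this sketch. The toy GFF rows and chromosome lengths are hypothetical.

```r
# Illustrative sketch, not GFFex itself: count features of one type per window
count_features <- function(gff, karyotype, feature = "gene", window = 1000000) {
  hits <- gff[gff$type == feature, ]
  out <- list()
  for (i in seq_len(nrow(karyotype))) {
    chr    <- karyotype$Chr[i]
    n_win  <- ceiling(karyotype$End[i] / window)
    starts <- (seq_len(n_win) - 1) * window + 1
    # Assumption: the last window is truncated at the chromosome end
    ends   <- pmin(seq_len(n_win) * window, karyotype$End[i])
    # Assign each feature to the window containing its start coordinate
    idx    <- ceiling(hits$start[hits$seqid == chr] / window)
    counts <- tabulate(idx, nbins = n_win)
    out[[i]] <- data.frame(Chr = chr, Start = starts, End = ends, Value = counts)
  }
  do.call(rbind, out)
}

# Hypothetical toy inputs; chromosome IDs match between the two tables
gff <- data.frame(seqid = c("chr1", "chr1", "chr1", "chr2"),
                  type  = c("gene", "exon", "gene", "gene"),
                  start = c(11869, 12010, 1500000, 10000),
                  end   = c(14409, 12057, 1510000, 12000))
karyo <- data.frame(Chr = c("chr1", "chr2"), Start = c(0, 0), End = c(2500000, 1200000))
count_features(gff, karyo)
```

The exon row is ignored because only rows whose type equals feature are counted.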
Now you can visualize this information with the ideogram function.
Basic usage
ideogram(karyotype, overlaid = NULL, label = NULL, label_type = NULL, synteny = NULL, colorset1, colorset2, width, Lx, Ly, output = "chromosome.svg")
convertSVG(svg, device, width, height, dpi)
Now, let’s begin.
First, we draw an idiogram with no mapped data.
ideogram(karyotype = human_karyotype)
convertSVG("chromosome.svg", device = "png")
Then, you will find an SVG file and a PNG file in your working directory.
Next, we can map genome-wide data onto the chromosome idiogram. In this case, we visualize gene density across the human genome.
ideogram(karyotype = human_karyotype, overlaid = gene_density)
convertSVG("chromosome.svg", device = "png")
Alternatively, we can map some genome-wide data with track labels next to the chromosome idiograms.
ideogram(karyotype = human_karyotype, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")
We can also map the overlaid heatmap and track labels on the chromosome idiograms at the same time.
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")
If you want to change the colors of the heatmap, modify the argument ‘colorset1’ (the default is colorset1 = c("#4575b4", "#ffffbf", "#d73027")). You can use either color names as listed by colors() or hexadecimal strings of the form “#rrggbb” or “#rrggbbaa”.
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker", colorset1 = c("#fc8d59", "#ffffbf", "#91bfdb"))
convertSVG("chromosome.svg", device = "png")
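Before passing a custom colorset1, you can check that each entry is a color R can parse. The helper is_valid_color below is hypothetical (not part of RIdeogram); it relies on grDevices::col2rgb, which accepts names from colors() as well as "#rrggbb" and "#rrggbbaa" strings and raises an error otherwise.

```r
# Hypothetical helper: TRUE for each entry that R can interpret as a color
is_valid_color <- function(x) {
  vapply(x, function(col) {
    tryCatch({ grDevices::col2rgb(col); TRUE }, error = function(e) FALSE)
  }, logical(1), USE.NAMES = FALSE)
}

is_valid_color(c("#fc8d59", "#ffffbf", "#91bfdb"))          # hex strings
is_valid_color(c("steelblue", "#d7302780", "notacolour"))   # name, #rrggbbaa, invalid
```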
If you do not know the centromere positions for your species, you do not need to change the code; in this case, the ‘karyotype’ file has only three columns.
To simulate this case, we deleted the last two columns of the ‘human_karyotype’ file.
human_karyotype <- human_karyotype[,1:3]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")
If your species has only ten chromosomes, you may need to modify the argument ‘width’ (default value is “170”).
To simulate this case, we keep only the first ten rows (chromosomes) of ‘human_karyotype’.
Before
human_karyotype <- human_karyotype[1:10,]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker")
convertSVG("chromosome.svg", device = "png")
After
human_karyotype <- human_karyotype[1:10,]
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker", width = 100)
convertSVG("chromosome.svg", device = "png")
If you want to move the legend, modify the arguments ‘Lx’ and ‘Ly’ (default values are “160” and “35”, respectively).
‘Lx’ means the distance between upper-left point of the Legend and the left margin; ‘Ly’ means the distance between upper-left point of the Legend and the upper margin.
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = Random_RNAs_500, label_type = "marker", width = 100, Lx = 80, Ly = 25)
convertSVG("chromosome.svg", device = "png")
We also provide other types of label, like “heatmap”, “line” and “polygon”. For heatmap label, you can use the following scripts to map and visualize these data on idiograms.
data(human_karyotype, package="RIdeogram") #reload the full karyotype data
data(LTR_density, package="RIdeogram") #load the LTR density data for the heatmap label
ideogram(karyotype = human_karyotype, overlaid = gene_density, label = LTR_density, label_type = "heatmap", colorset1 = c("#f7f7f7", "#e34a33"), colorset2 = c("#f7f7f7", "#2c7fb8")) #use the arguments 'colorset1' and 'colorset2' to set the colors for the gene and LTR heatmaps, respectively.
convertSVG("chromosome.svg", device = "png")
For one-line label,
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE, package="RIdeogram") #load the Pi data for one-line label
head(Pi_for_CE) #these data have the same format as the heatmap data, plus an additional "Color" column giving the line color.
#> Chr Start End Value Color
#> 1 1 1 2000000 0.00273566 fc8d62
#> 2 1 1000001 3000000 0.00239580 fc8d62
#> 3 1 2000001 4000000 0.00319407 fc8d62
#> 4 1 3000001 5000000 0.00286900 fc8d62
#> 5 1 4000001 6000000 0.00186596 fc8d62
#> 6 1 5000001 7000000 0.00186182 fc8d62
ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE, label_type = "line", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
For two-line label,
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE_and_CW, package="RIdeogram") #load the Pi data for two-line label
head(Pi_for_CE_and_CW) #these data have the same format as the one-line label data, plus two additional columns for the second feature you want to show. When you prepare your own data, please keep exactly the same column names.
#> Chr Start End Value_1 Color_1 Value_2 Color_2
#> 1 1 1 2000000 0.00273566 fc8d62 0.00385702 8da0cb
#> 2 1 1000001 3000000 0.00239580 fc8d62 0.00331109 8da0cb
#> 3 1 2000001 4000000 0.00319407 fc8d62 0.00374530 8da0cb
#> 4 1 3000001 5000000 0.00286900 fc8d62 0.00339141 8da0cb
#> 5 1 4000001 6000000 0.00186596 fc8d62 0.00305246 8da0cb
#> 6 1 5000001 7000000 0.00186182 fc8d62 0.00323655 8da0cb
ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE_and_CW, label_type = "line", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
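If you already have two tables in the one-line label format, one way to build the two-line format is to join them on the window coordinates. In this sketch, base::merge with suffixes = c("_1", "_2") yields exactly the column names Value_1, Color_1, Value_2 and Color_2 required above; the two toy windows reuse values printed by head(Pi_for_CE_and_CW).

```r
# Two tables in the one-line label format (Chr, Start, End, Value, Color)
pi_a <- data.frame(Chr = 1, Start = c(1, 1000001), End = c(2000000, 3000000),
                   Value = c(0.00273566, 0.00239580), Color = "fc8d62")
pi_b <- data.frame(Chr = 1, Start = c(1, 1000001), End = c(2000000, 3000000),
                   Value = c(0.00385702, 0.00331109), Color = "8da0cb")

# Join on the window coordinates; shared column names get the _1 / _2 suffixes
two_line <- merge(pi_a, pi_b, by = c("Chr", "Start", "End"), suffixes = c("_1", "_2"))
two_line
```

Windows present in only one table are dropped by this inner join, so both tables should cover the same windows.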
For one-polygon label,
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE, package="RIdeogram") #load the Pi data for one-polygon label
ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE, label_type = "polygon", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
For two-polygon label,
data(liriodendron_karyotype, package="RIdeogram") #load the karyotype data
data(Fst_between_CE_and_CW, package="RIdeogram") #load the Fst data for overlaid heatmap
data(Pi_for_CE_and_CW, package="RIdeogram") #load the Pi data for two-polygon label
ideogram(karyotype = liriodendron_karyotype, overlaid = Fst_between_CE_and_CW, label = Pi_for_CE_and_CW, label_type = "polygon", colorset1 = c("#e5f5f9", "#99d8c9", "#2ca25f"))
convertSVG("chromosome.svg", device = "png")
Compared with the two-line label plot, all x coordinates of the second polygon label are shifted to the right by 0.2 times the chromosome width for better visualization.
In addition, you can use the argument “device” of convertSVG (default value is “png”) to set the output file format, such as “tiff”, “pdf” or “jpg”, and the argument “dpi” (default value is “300”) to set the resolution of the output image.
There are also four shortcut functions that convert the SVG image to each of these formats without setting the argument “device”:
svg2tiff("chromosome.svg")
svg2pdf("chromosome.svg")
svg2jpg("chromosome.svg")
svg2png("chromosome.svg")
For genome synteny analysis, we can use the ideogram function to visualize synteny between two or three genomes.
For dual genome comparison, load the example data first,
data(karyotype_dual_comparison, package="RIdeogram")
head(karyotype_dual_comparison)
#> Chr Start End fill species size color
#> 1 I 1 23037639 969696 Grape 12 252525
#> 2 II 1 18779884 969696 Grape 12 252525
#> 3 III 1 19341862 969696 Grape 12 252525
#> 4 IV 1 23867706 969696 Grape 12 252525
#> 5 V 1 25021643 969696 Grape 12 252525
#> 6 VI 1 21508407 0ab276 Grape 12 252525
table(karyotype_dual_comparison$species)
#>
#> Grape Populus
#> 19 19
data(synteny_dual_comparison, package="RIdeogram")
head(synteny_dual_comparison)
#> Species_1 Start_1 End_1 Species_2 Start_2 End_2 fill
#> 1 1 12226377 12267836 2 5900307 5827251 cccccc
#> 2 15 5635667 5667377 17 4459512 4393226 cccccc
#> 3 9 7916366 7945659 3 8618518 8486865 cccccc
#> 4 2 8214553 8242202 18 5964233 6027199 cccccc
#> 5 13 2330522 2356593 14 6224069 6138821 cccccc
#> 6 11 10861038 10886821 10 8099058 8011502 cccccc
If you want to import your own data, use the read.table function as described above. Note that the karyotype format for genome synteny visualization is slightly different: the first three columns are the same, the fourth is the fill color of the idiograms, the fifth is the species name, and the last two are the size and color of the species name. This karyotype file contains information for two genomes (species A: Grape and species B: Populus), with species A sorted first. In the dual genome synteny file, the first three columns give the position of each synteny block in species A (Grape), the next three give the corresponding position in species B (Populus), and the last column is the color of the Bézier curve linking the two blocks. Where possible, sort the rows with colored lines to the end of the file.
Then, run the following code:
ideogram(karyotype = karyotype_dual_comparison, synteny = synteny_dual_comparison)
convertSVG("chromosome.svg", device = "png")
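The advice to sort colored lines to the end can be followed with a single order() call. This sketch assumes, as in the example data, that background links use the fill "cccccc"; the three rows are the first rows of synteny_dual_comparison, with the second one recolored "e41a1c" as a hypothetical highlight. order() on a logical key places FALSE (grey) before TRUE (colored) and keeps ties in their original order.

```r
# First rows of synteny_dual_comparison; row 2 recolored as a hypothetical highlight
syn <- data.frame(Species_1 = c(1, 15, 9),
                  Start_1 = c(12226377, 5635667, 7916366),
                  End_1   = c(12267836, 5667377, 7945659),
                  Species_2 = c(2, 17, 3),
                  Start_2 = c(5900307, 4459512, 8618518),
                  End_2   = c(5827251, 4393226, 8486865),
                  fill = c("cccccc", "e41a1c", "cccccc"))

# Grey rows first, colored rows last; ties keep their original order
syn_sorted <- syn[order(syn$fill != "cccccc"), ]
syn_sorted
```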
For ternary genome comparison, load the example data first,
data(karyotype_ternary_comparison, package="RIdeogram")
head(karyotype_ternary_comparison)
#> Chr Start End fill species size color
#> 1 NA 1 15980527 fcb06b Amborella 10 fcb06b
#> 2 NA 1 11522362 fcb06b Amborella 10 fcb06b
#> 3 NA 1 11085951 fcb06b Amborella 10 fcb06b
#> 4 NA 1 10537363 fcb06b Amborella 10 fcb06b
#> 5 NA 1 9585472 fcb06b Amborella 10 fcb06b
#> 6 NA 1 9414115 fcb06b Amborella 10 fcb06b
table(karyotype_ternary_comparison$species)
#>
#> Amborella Grape Liriodendron
#> 100 19 19
data(synteny_ternary_comparison, package="RIdeogram")
head(synteny_ternary_comparison)
#> Species_1 Start_2 End_2 Species_2 Start_1 End_1 fill type
#> 1 1 4761181 2609697 1 342802 981451 cccccc 1
#> 2 6 6344197 8074393 1 15387184 16716190 cccccc 1
#> 3 10 6457890 9052487 1 11224953 14959548 cccccc 1
#> 4 13 6318795 1295413 1 20564870 21386271 cccccc 1
#> 5 16 1398101 2884119 1 21108654 22221088 cccccc 1
#> 6 16 1482529 2093625 1 21864494 22364888 cccccc 1
tail(synteny_ternary_comparison, n = 20)
#> Species_1 Start_2 End_2 Species_2 Start_1 End_1 fill type
#> 571 16 19278042 20828694 2 95267449 93334736 cccccc 3
#> 572 12 20546006 22461088 2 22647943 18365764 cccccc 3
#> 573 4 22259262 23453956 2 15068249 17839485 cccccc 3
#> 574 14 22377895 23821929 2 97299880 96033346 cccccc 3
#> 575 6 1538773 2808373 1 91285578 95681546 cccccc 3
#> 576 11 3381792 4954528 1 67689752 75286468 cccccc 3
#> 577 9 4814481 6975840 1 69506847 76015710 cccccc 3
#> 578 10 7091825 9742616 1 19333526 24516133 cccccc 3
#> 579 13 22063957 23402389 1 95843870 92195256 cccccc 3
#> 580 7 679765 1881756 6 7365421 7531534 e41a1c 1
#> 581 7 679765 2752867 13 501561 766473 e41a1c 1
#> 582 7 679765 3012501 8 7406703 8222490 e41a1c 1
#> 583 7 2049369 2942034 14 29350547 34369929 e41a1c 2
#> 584 7 2075095 1538540 10 28985737 30815217 e41a1c 2
#> 585 13 531939 834472 14 28866243 35278211 e41a1c 3
#> 586 8 7427221 8894821 14 28632063 34805893 e41a1c 3
#> 587 6 7567597 7690342 14 32050301 34913801 e41a1c 3
#> 588 13 501561 876423 10 30496700 27874100 e41a1c 3
#> 589 6 7171014 7815454 10 31408837 27660041 e41a1c 3
#> 590 8 5773528 9346871 10 31408837 26585934 e41a1c 3
The karyotype file for ternary genome synteny visualization has the same format as the one for dual genome synteny visualization, but contains the karyotype of one more species and is sorted in the order species A (Amborella), B (Grape) and C (Liriodendron). The synteny file, however, differs from the dual one. Because it contains three comparisons, i.e., species A vs B, A vs C and B vs C, it has one additional column in which “1” stands for A vs B, “2” for A vs C and “3” for B vs C. Again, where possible, sort the rows with colored lines to the end of the file.
Then, run the following code:
ideogram(karyotype = karyotype_ternary_comparison, synteny = synteny_ternary_comparison)
convertSVG("chromosome.svg", device = "png")
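Two things are easy to break when preparing ternary input: the species must appear as contiguous blocks in the order A, B, C, and every synteny row needs a comparison code of 1, 2 or 3. The helpers below are hypothetical sketches (not part of RIdeogram). The species vector is rebuilt from the counts printed by table(karyotype_ternary_comparison$species).

```r
# Hypothetical helper: TRUE if each species forms one contiguous block, in order
species_blocks_ok <- function(species, expected) {
  identical(rle(as.character(species))$values, expected)
}
# Hypothetical helper: every synteny row must carry comparison code 1, 2 or 3
types_ok <- function(synteny) all(synteny$type %in% 1:3)

expected <- c("Amborella", "Grape", "Liriodendron")
sp <- rep(expected, times = c(100, 19, 19))
species_blocks_ok(sp, expected)                               # correct order
species_blocks_ok(sp[c(1:100, 120:138, 101:119)], expected)   # Grape and Liriodendron swapped
types_ok(data.frame(type = c(1, 1, 2, 3)))
```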
In addition, if you want to use a gradient color for the Bézier curves you want to highlight (the red lines in the plot above), simply replace the red color “e41a1c” with “gradient” in the seventh column (as in the example data “synteny_ternary_comparison_graident”). Here, we first load the example data and visualize the ternary genome synteny with the ideogram function. Because R graphics devices do not support the SVG gradient fill element, we use the rsvg_pdf function from the rsvg package to convert the SVG file directly into a PDF file. You may therefore need to install the rsvg package to show the gradient fill; alternatively, you can open the SVG file in Inkscape and save it as a PDF file.
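The color replacement described above is a one-line subassignment on the fill column. This sketch uses two rows copied from synteny_ternary_comparison (one grey, one red) to show the change in base R.

```r
# Two rows from synteny_ternary_comparison: one grey link, one red highlighted link
syn <- data.frame(Species_1 = c(1, 7), Start_2 = c(4761181, 679765), End_2 = c(2609697, 1881756),
                  Species_2 = c(1, 6), Start_1 = c(342802, 7365421), End_1 = c(981451, 7531534),
                  fill = c("cccccc", "e41a1c"), type = c(1, 1))

# Replace the red highlight color with the keyword "gradient" in the fill (seventh) column
syn$fill[syn$fill == "e41a1c"] <- "gradient"
syn$fill
```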