A toolkit for analysing large-scale plant small RNA datasets

BIOINFORMATICS APPLICATIONS NOTE
Vol. 24 no. 19 2008, pages 2252–2253
doi:10.1093/bioinformatics/btn428

Genome analysis

Simon Moxon¹,†, Frank Schwach¹,†, Tamas Dalmay², Dan MacLean³, David J. Studholme³ and Vincent Moulton¹,∗
¹School of Computing Sciences, ²School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ and ³The Sainsbury Laboratory, Colney Lane, Norwich, NR4 7UH, UK

Received on May 27, 2008; revised on July 14, 2008; accepted on August 10, 2008
Advance Access publication August 19, 2008
Associate Editor: Ivo Hofacker

∗To whom correspondence should be addressed.
†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

© The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

ABSTRACT
Summary: Recent developments in high-throughput sequencing technologies have generated considerable demand for tools to analyse large datasets of small RNA sequences. Here, we describe a suite of web-based tools for processing plant small RNA datasets. Our tools can be used to identify microRNAs and their targets, compare expression levels in sRNA loci, and find putative trans-acting siRNA loci.
Availability: The tools are freely available for use at http://srna-tools.cmp.uea.ac.uk
Contact: vincent.moulton@cmp.uea.ac.uk

1 INTRODUCTION
Several classes of small (20–30 nt) non-coding RNAs (sRNAs) can be distinguished by biogenesis and function in post-transcriptional gene regulation and epigenetic control in plants, animals and fungi (for reviews see: Brodersen and Voinnet, 2006; Lippman and Martienssen, 2004). MicroRNAs (miRNAs) and trans-acting siRNAs (ta-siRNAs) are two important classes of sRNAs that both induce post-transcriptional silencing of target genes. Computationally, miRNAs can be identified by their characteristic fold-back precursors, while ta-siRNAs are found by a 'phased' alignment pattern at their genomic regions of origin (Axtell et al., 2006).

Novel high-throughput sequencing technologies greatly facilitate small RNA detection and analysis (Hafner et al., 2007). However, the lack of supporting data analysis tools presents a major bottleneck. Here, we present an easy-to-use web-based toolkit that is specifically geared towards the analysis of large-scale plant sRNA datasets. Plant-specific tools are necessary due to important differences in the biogenesis and mode of action between plant and animal sRNAs (Millar and Waterhouse, 2005).

2 DESCRIPTION OF THE TOOLS

2.1 miRCat: miRNA detection
miRCat identifies mature miRNAs and their precursors. Users upload a FASTA file of sRNA sequences, which are mapped to a plant genome using PatMaN (Prüfer et al., 2008) and grouped into loci. To enrich for miRNA candidates, the software applies a number of empirical and published criteria for bona fide miRNA loci (Jones-Rhoades et al., 2006; details are listed on the tool's website). In brief, the program searches for a two-peak alignment pattern of sRNAs on one strand of the locus and assesses the secondary structures of a series of putative precursor transcripts using the RNAfold (Hofacker et al., 1994) and randfold (Bonnet et al., 2004) programs. As a result, miRCat produces three files: (i) a comma-separated text (csv) file with the details for predicted miRNA candidates, (ii) the RNAfold output for candidate precursors and (iii) a FASTA file of predicted mature miRNA sequences.

miRCat has been tested on several high-throughput plant sRNA datasets and shows a high level of sensitivity and specificity. When tested on a publicly available Arabidopsis leaf sRNA dataset (GEO accession GSM118373; Rajagopalan et al., 2006) containing 186 899 sRNA sequences, miRCat predicted 89 miRNA loci using default parameters. Eighty-three of these predictions were known miRNA sequences and 6 novel miRNA loci were predicted (Fig. 1a). There were 91 known miRNA loci with an sRNA abundance of five or more (the default threshold for miRCat) in the dataset. This corresponds to a sensitivity of 91.2% (83 of 91) and, even if all novel predictions were false positives, a specificity of 99.93% (8362 loci tested).

As a web-based tool, miRCat complements related software developed for local installation and command-line use, such as a recently published program for discovering miRNAs in animal datasets (Friedländer et al., 2008).

2.2 SiLoCo: sRNA locus expression comparison
High-throughput sequencing can be used to compare sRNA expression profiles under varying conditions or between mutants and wild-type to gain insights into the biogenesis and function of sRNAs. Plant sRNA populations are highly complex, with many genomic loci producing highly diverse sRNA populations. In such cases, individual sequences may not be found more than once even in very large datasets, thus making it necessary to group sRNAs by their locus of origin in the genome and to compare expression levels at the locus level rather than at the level of individual sequences. Such an approach also needs to take into account the degree of repetitiveness of sRNA matches to the genome.

SiLoCo identifies sRNA loci on plant genomes from two sRNA datasets, which can be uploaded by the user and/or selected from publicly available datasets. SiLoCo maps sRNA sequences to the genome using PatMaN (Prüfer et al., 2008)
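
The sensitivity and specificity quoted for miRCat in Section 2.1 can be rechecked from the reported counts. The following is a minimal sketch for illustration only and is not part of the toolkit. It assumes that the 8362 tested loci include the 91 known miRNA loci, so that the remaining loci are treated as negatives, and that all 6 novel predictions count as false positives, which is the worst case described in the text. The function and variable names are hypothetical.

```python
def sensitivity(true_positives: int, known_positives: int) -> float:
    """Fraction of known miRNA loci that were recovered."""
    return true_positives / known_positives


def specificity(false_positives: int, negatives: int) -> float:
    """Fraction of non-miRNA loci that were correctly not predicted."""
    return (negatives - false_positives) / negatives


# Counts reported in the text for the Arabidopsis leaf dataset (GSM118373).
predicted_known = 83    # predictions matching known miRNAs
predicted_novel = 6     # novel predictions, treated here as false positives
known_in_dataset = 91   # known miRNA loci with an sRNA abundance of five or more
loci_tested = 8362      # total number of loci tested

# Assumption: the tested loci include the 91 known miRNA loci.
negatives = loci_tested - known_in_dataset

sens = sensitivity(predicted_known, known_in_dataset)
spec = specificity(predicted_novel, negatives)

print(f"sensitivity = {sens:.1%}")
print(f"specificity = {spec:.2%}")
```

Under these assumptions the two values round to the 91.2% and 99.93% given in the text. Treating any of the novel predictions as genuine miRNAs would only raise the specificity.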
