
Introduction to New Technologies: Advantages and Disadvantages

Published: 2019-09-22 01:03:58. Source: 文档文库


Update on Gene Expression Analysis, Proteomics, and Network Discovery

Gene Expression Analysis, Proteomics, and Network Discovery¹

Sacha Baginsky, Lars Hennig, Philip Zimmermann, and Wilhelm Gruissem*

Department of Biology and Zurich-Basel Plant Science Center, ETH Zurich, Universitätstrasse 2, 8129 Zurich, Switzerland

Technological advances in biological experimentation are now enabling researchers to investigate living systems on an unprecedented scale by studying genomes, proteomes, or molecular networks in their entirety. Genomics technologies have led to a paradigm shift in biological experimentation because they measure (profile) most or even all components of one class (e.g. transcripts, proteins, etc.) in a highly parallel way. Whether gene expression analysis using microarrays, proteome and metabolome analysis using mass spectrometry, or large-scale screens for genetic interactions, high-throughput profiling technologies provide a rich source of quantitative biological information that allows researchers to move beyond a reductionist approach by both integrating and understanding interactions between multiple components in cells and organisms (Fig. 1; for a recent update of bioinformatics tools, see Pitzschke and Hirt, 2010). Currently, most genomics experiments involve profiling transcripts, proteins, or metabolites. Increasing efforts to complement molecular data with phenotypic information will further advance our understanding of the quantitative relationships between molecules in directing systems behavior and function. In the following Update we will briefly review recent advances in the field and highlight advantages and limitations of current approaches to develop models of genetic and molecular networks that aim to describe emergent properties of plant systems.

GENOMICS TECHNOLOGIES: THE POWER OF GENOME-SCALE QUANTITATIVE DATA RESOLUTION

PROFILING TRANSCRIPTOMES

Transcript profiling offers the largest coverage and a wide dynamic range of gene expression information and can often be performed genome wide. Microarrays are currently most popular for transcript profiling and can be readily afforded by many laboratories. Various commercial and academic microarray platforms exist that vary in genome coverage, availability, specificity, and sensitivity (Table I). Microarrays manufactured by Affymetrix are probably most commonly used in plant biology (Redman et al., 2004; Rehrauer et al., 2010), but commercial arrays from Agilent or arrays from the academic Complete Arabidopsis Transcriptome MicroArray (CATMA) consortium are often used as well (for review, see Busch and Lohmann, 2007). Serial analysis of gene expression (SAGE) and massively parallel signature sequencing (MPSS) are well-established alternatives to microarrays. Both techniques can be superior to microarrays because they do not depend on prior probe selection. More recently, direct sequencing of transcripts by high-throughput sequencing technologies (RNA-Seq) has become an additional alternative to microarrays and is superseding SAGE and MPSS (Busch and Lohmann, 2007). Like SAGE and MPSS, RNA-Seq does not depend on genome annotation for prior probe selection and avoids biases introduced during hybridization of microarrays. On the other hand, RNA-Seq poses novel algorithmic and logistic challenges, and current wet-lab RNA-Seq strategies require lengthy library preparation procedures. Therefore, RNA-Seq is the method of choice in projects using nonmodel organisms and for transcript discovery and genome annotation. Because of their robust sample processing and analysis pipelines, microarrays are often still the preferable choice for projects that involve large numbers of samples for profiling transcripts in model organisms with well-annotated genomes. Tools such as Genevestigator (Hruz et al., 2008) and MapMan (Usadel et al., 2009) allow researchers to organize large gene expression datasets and analyze them for relational networks within a single experiment or across many experiments (contextual meta-analysis).

PROFILING EPIGENOMES AND TRANSCRIPTION FACTOR BINDING

Much control of gene expression occurs at the level of transcription, and information on genome-wide chromatin profiles (epigenomes) and transcription factor binding to promoters is needed to decipher the inherent logic of transcriptional regulation. Chromatin immunoprecipitation (ChIP) coupled to microarray analysis (ChIP-chip) or high-throughput sequencing (ChIP-Seq) can generate such data. In plants, DNA methylation, repressive and activating chromatin marks, as well as histone variants have been mapped onto the genome (for review, see Zhang, 2008), but because such marks are expected to differ between cell types and developmental stages, more targeted epigenome profiling is needed in the future. Targeted analysis of DNA methylation during seed development, for instance, revealed unexpected genome-wide demethylation (Gehring et al., 2009; Hsieh et al., 2009). ChIP-chip was also used for global mapping of binding sites of transcription factors such as TGA2 and SEPALLATA3 and to refine definitions of binding motifs that were previously determined by in vitro experiments (Thibaud-Nissen et al., 2006; Kaufmann et al., 2009). It was found that SEPALLATA3 is a key component in the regulatory transcriptional network underlying the formation of floral organs. In a comparative experiment, ChIP-chip and ChIP-Seq gave very similar results (Kaufmann et al., 2009). This is encouraging because bias introduced by the profiling technology does not seem to severely confound studies on global protein-binding profiles. Currently, work is under way in several laboratories to establish a compendium of transcription factor binding sites in Arabidopsis (Arabidopsis thaliana). Thus, more genome-wide data sets are within reach that could provide causal explanations for transcriptional profiles.

¹ This work was supported by the European Union (EU Framework Program 6, AGRON-OMICS; grant no. LSHG–CT–2006–037704), the Swiss National Science Foundation, CTI (Swiss Innovation Promotion Agency), ETH Zurich, and the Functional Genomics Center Zurich for our profiling experiments.

* Corresponding author; e-mail wgruissem@ethz.ch.

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Wilhelm Gruissem (wgruissem@ethz.ch).

www.plantphysiol.org/cgi/doi/10.1104/pp.109.150433

PROFILING PROTEOMES

Gene expression is a highly regulated, multistep process, and it is impossible to predict the exact protein concentration or activity from the measurement of mRNA levels. Proteomics has therefore become a key tool in systems biology because it provides quantitative and structural information about proteins, which are the major functional determinants of cells. Phenotypic alterations associated with genetic perturbations often result from changes in protein accumulation or stability, or changes in protein posttranslational modifications, which can disrupt protein-protein interactions and network connectivity (Gstaiger and Aebersold, 2009). Quantitative protein information complements data from transcriptional profiling and metabolomics. It represents a key link between different levels of gene expression regulation and provides insights into their causal relationships. Unlike transcriptional profiling, however, comprehensive proteome analysis remains challenging, and information about proteome complexity and dynamics is far from complete (Cox and Mann, 2007). Moreover, the rate of metabolite synthesis is often controlled by regulatory posttranslational modifications of enzymes and not only by their abundance. Information about quantitative relationships between RNA and protein accumulation, posttranslational protein modifications, and metabolite levels is therefore required to fully understand regulatory circuits that control systems behavior and function.

Protein quantification can be absolute or relative (Table I). While relative protein quantification mostly depends on stable isotopes, absolute quantification of comprehensive protein sets is much more difficult. Recent improvements in statistical data evaluation and increasing accuracy of mass spectrometry instruments allow quantifying large numbers of proteins in shotgun-type experiments on the basis of spectral counting (Lu et al., 2007). This method is reliable and comparable to most other quantification methods, including two-dimensional PAGE-based protein staining; however, the protein dataset must be very large. More accurate information about the exact in vivo concentration of individual proteins requires specialized targeted approaches.
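The idea behind spectral counting can be illustrated with a small sketch. The code below computes a length-normalized spectral abundance (often called NSAF), which is a simpler relative of the APEX approach of Lu et al. (2007) and is used here only as an assumption for illustration; APEX additionally corrects for the predicted detectability of peptides. The protein names, counts, and lengths are hypothetical.

```python
def normalized_spectral_abundance(spectral_counts, protein_lengths):
    """Return length-normalized relative abundances that sum to 1."""
    # Longer proteins yield more peptides, so divide each count by length first.
    per_length = {
        protein: spectral_counts[protein] / protein_lengths[protein]
        for protein in spectral_counts
    }
    total = sum(per_length.values())
    if total == 0:
        raise ValueError("no spectra were counted")
    # Normalize so that the values are comparable between runs.
    return {protein: value / total for protein, value in per_length.items()}


# Hypothetical spectral counts and protein lengths (in amino acids).
counts = {"protA": 120, "protB": 30, "protC": 30}
lengths = {"protA": 600, "protB": 150, "protC": 300}
abundance = normalized_spectral_abundance(counts, lengths)
```

In this toy dataset protA and protB receive the same relative abundance although protA has four times as many spectra, because protA is also four times as long. With only a handful of spectra per protein such estimates become noisy, which is consistent with the requirement for very large datasets noted above.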

Figure 1. Relationships between supracellular components (biological systems), intracellular components, and the function and behavior of these components are revealed by the interaction of individual components. Systems biological approaches aim at modeling these interactions to find primary relationships and to distinguish causality and effect. The understanding of how these interactions are regulated allows making predictions on function, behavior, and survival.

Table I. Advantages and disadvantages of various technologies for the measurement of transcript and protein abundance

A systematic performance assessment for the different protein quantification techniques was recently conducted (Turck et al., 2007) and a detailed description of the different quantification techniques along with examples for application in the plant field is available (Baginsky, 2009).

Transcripts

MPSS
Advantages: Sequences do not need to be known in advance
Disadvantages: Relatively expensive, laborious

Microarrays
Advantages: Genome wide, relatively cheap, streamlined handling, oligos
Disadvantages: Sequences must be known in advance; limited sensitivity due to hybridization

Quantitative reverse transcription-PCR
Advantages: High precision and high sensitivity; increasingly multiplexed
Disadvantages: Not genome wide; data normalization sensitive to method/choice of reference genes

High-throughput sequencing
Advantages: Sequences do not need to be known in advance; possibility to sequence very short sequences
Disadvantages: Expensive at the moment, few solutions for downstream analysis; direct read out

Proteins

Relative quantification via iTRAQ
Advantages: Established labeling protocol with stable isotopes, good reproducibility, relevant regulation factor can be determined from the data, multiplexing to up to eight samples, produces good quality tandem mass spectrometry spectra
Disadvantages: Cost and effort, the analysis software is still not optimal, fluctuations between different softwares possible

Relative quantification via stable isotope labeling with amino acids in cell culture
Advantages: Established protocol for the labeling of cell culture proteins, reliable quantification possible
Disadvantages: Restricted to cell culture

Relative quantification via extracted ion chromatograms
Advantages: Comes at no additional costs, software tools for alignment and normalization are available (e.g. SuperHirn; Mueller et al., 2007)
Disadvantages: Only applicable to very similar samples and very similar liquid chromatography-mass spectrometry runs, done within a small time window, baseline normalization is sometimes a problem

Absolute quantification via AQUA peptides
Advantages: Highly sensitive absolute quantification on the basis of isotopically labeled PTPs, targeted analyses possible via specific scan methods (e.g. SRM)
Disadvantages: Finding suitable PTPs and characteristic parent to daughter ion transitions not straightforward, selectivity of the PTP transitions not always unambiguous

Absolute quantification via QconCAT
Advantages: Excellent for the quantification of protein complex stoichiometry, lower cost compared to AQUA, PTPs are synthesized in a biological system
Disadvantages: Unsuitable for the quantification of posttranslational modifications, optimization necessary, exact quantification of the standard is vital, incompatible with sample fractionation

Absolute quantification via protein standard absolute quantification
Advantages: Excellent for the quantification of individual, low abundance proteins, compatible with fractionation
Disadvantages: Restricted to few proteins, up scaling difficult, quantifications of posttranslational modifications not possible

Absolute quantification via normalized spectral counting (APEX; Lu et al., 2007)
Advantages: No additional costs, produces reliable results with large-scale datasets
Disadvantages: Quantification of individual proteins must be validated by additional tools, unreliable for small datasets

Current methods for absolute protein quantification include isotope dilution strategies using isotopically labeled peptides as internal standards (for a comprehensive review, see Brun et al., 2009). Signature peptides for internal standardization are characteristic for a protein of interest, and are often referred to as proteotypic peptides (PTPs). In AQUA, PTPs are added to analytical protein samples in known concentrations. The protein samples are subsequently scanned for PTPs of interest. Using the extracted ion chromatograms, the native peptide can then be quantified relative to the added PTP (Kuster et al., 2005). A modification of this strategy accounts for quantification errors derived from incomplete tryptic digest of the analytical sample. In QconCAT (for quantification concatamer), a synthetic protein with concatenated, isotopically labeled PTPs is expressed as recombinant protein in a biological system, added to the sample prior to trypsin treatment and carried through the digestion procedure, such that losses from incomplete tryptic digestion will also affect the quantity of the PTPs. Both the AQUA and the QconCAT strategies are incompatible with upstream fractionation techniques, which is a potential problem in biomarker quantification. A way around this constraint is offered by the protein standard absolute quantification strategy, which uses isotopically labeled protein standards that are added to the sample prior to fractionation. Several prediction tools exist that help to define the most suitable PTPs for the detection and quantification of specific proteins. However, only experimental data provide the necessary reliability for PTP selection because in practice PTP prediction often deviates from experimental observations. Therefore, efforts are under way to catalogue PTPs for model organism proteomes. Proteome maps for Arabidopsis generated PTPs for 4,105 proteins, many of which may be optimal for the detection of proteins in different organs (Baerenfaller et al., 2008).
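The arithmetic behind isotope dilution with a spiked, labeled PTP can be sketched as follows. This is a minimal illustration that assumes the labeled and native peptides ionize equally well and that the peak areas of their extracted ion chromatograms have already been integrated; the function name and all numbers are hypothetical.

```python
def quantify_by_isotope_dilution(native_area, standard_area, standard_fmol):
    """Estimate the native peptide amount (fmol) from two XIC peak areas."""
    if standard_area <= 0:
        raise ValueError("labeled standard was not detected")
    # The native/labeled area ratio scales the known amount of spiked standard.
    return native_area / standard_area * standard_fmol


# Hypothetical peak areas for a native peptide and 50 fmol of labeled PTP.
native_fmol = quantify_by_isotope_dilution(
    native_area=3.0e6, standard_area=1.5e6, standard_fmol=50.0
)
```

Here the native peptide gives twice the signal of the 50 fmol standard and is therefore estimated at 100 fmol. The sketch does not model losses during digestion or fractionation, which are the error sources that QconCAT and protein standards are intended to address.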

Similar quantitative approaches are also used for metabolites, because in addition to RNA and protein levels, understanding the function and behavior of metabolic networks requires global information about metabolite concentrations and fluxes as well. In recent years, much progress has been made in metabolic profiling, and the interested reader is referred to recent reviews (e.g. Issaq et al., 2009, and refs. therein).

TRANSCRIPTS AND MORE TRANSCRIPTS: WHAT CAN WE LEARN FROM GENE EXPRESSION ANALYSIS?

During the analysis of large gene expression datasets the researcher is often confronted with several questions. How do we interpret a mathematical relationship between genes or between genes and conditions? For example, does a high correlation between two genes mean that they are coregulated, or could one of them be the positive regulator of the other? Or can we assume that they are involved in the same pathway or biological process? Although it is not possible to answer these questions conclusively from gene expression data alone, a number of parallel approaches can be useful to distinguish between different scenarios. For example, Gene Ontology enrichment analysis can provide confidence that a given gene cluster is enriched in genes that are known to have a common function, cellular location, or biological process. Similarly, conserved cis-regulatory elements in the promoters of genes from the same cluster indicate that they are likely coregulated. Although these methods do not establish proof of the nature of the relationship between genes, they allow formulating hypotheses that can be tested in the laboratory. In summary, although gene expression analysis by itself is rather descriptive (i.e. describing how genes respond to various test conditions or tissues), it is a valuable validation tool and an excellent starting point to study novel cellular processes and to formulate novel hypotheses.
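Gene Ontology enrichment of a cluster is commonly assessed with a one-sided hypergeometric test. The following sketch shows the calculation with the Python standard library only; the function name and the gene counts are hypothetical, and real analyses would additionally correct for the many GO terms tested.

```python
from math import comb


def enrichment_p_value(genome_size, annotated_in_genome,
                       cluster_size, annotated_in_cluster):
    """P(X >= annotated_in_cluster) under the hypergeometric distribution."""
    upper = min(cluster_size, annotated_in_genome)
    all_clusters = comb(genome_size, cluster_size)
    # Sum the probabilities of drawing k or more annotated genes by chance.
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, cluster_size - k)
        for k in range(annotated_in_cluster, upper + 1)
    )
    return tail / all_clusters


# Hypothetical toy case: 20 genes, 5 carry the GO term, and a cluster of
# 4 genes contains 3 of them.
p = enrichment_p_value(genome_size=20, annotated_in_genome=5,
                       cluster_size=4, annotated_in_cluster=3)
```

A small p-value indicates that the term is overrepresented in the cluster relative to the genome, which supports but does not prove a shared function.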

Table II. Overview of some of the most popular plant gene expression microarray platforms and the number of available experiments in ArrayExpress

The Arabidopsis ATH1 array is the most frequently used microarray, followed by the CATMA 25k and 23k arrays. In all, approximately 750 Arabidopsis microarray experiments have been published so far. Rice (Oryza sativa) and barley (Hordeum vulgare) are the second and third plant species in terms of microarray experiments published. Soybean (Glycine max) also has a high number of arrays, but this is due to a single very large experiment containing 2,521 arrays. IPK, Leibniz Institute of Plant Genetics and Crop Plant Research; TIGR, The Institute for Genomic Research.

Species                        Provider    Format  Name                                  Experiments  Arrays
Arabidopsis                    Affymetrix  8K      AG                                    41           352
Arabidopsis                    Affymetrix  22K     ATH1                                  554          8,895
Arabidopsis                    Agilent     22K     Arabidopsis 2                         34           253
Arabidopsis                    Agilent     44K     Arabidopsis 3                         7            60
Arabidopsis                    CATMA       25K     CATMA2_URGV to CATMA2.3_URGV          83           851
Arabidopsis                    CATMA       23K     CATMA Arabidopsis 23K array           50           1,290
Arabidopsis                    TIGR        26K     TIGR Arabidopsis whole genome         6            264
Rice                           Affymetrix  57K     GeneChip Rice Genome Array            29           418
Rice                           Agilent     21K     Agilent Rice Oligo Microarray         22           164
Barley                         Affymetrix  22K     GeneChip Barley Genome Array          35           1,165
Barley                         IPK         6K+4K   IPK barley PGRC1_A and B              7            324
Medicago                       Affymetrix  61K     GeneChip Medicago Genome Array        19           218
Maize                          Affymetrix  17K     GeneChip Maize Genome Array           22           370
Soybean                        Affymetrix  61K     GeneChip Soybean Genome Array         22           3,236
Tomato (Solanum lycopersicum)  Affymetrix  10K     GeneChip Tomato Genome Array          6            127
Grape (Vitis vinifera)         Affymetrix  16K     GeneChip Vitis vinifera Genome Array  6            239
Wheat (Triticum aestivum)      Affymetrix  61K     GeneChip Wheat Genome Array           25           811
Total                                                                                    968          19,037

A major challenge of genome-scale transcription analysis is the very large number of predictors (genes) compared to a generally small number of measurements (microarrays). Without appropriate statistical measures that correct for multiple testing, such as control of the false discovery rate, almost any approach will yield significant genes, including many false positives. The creation of large databases in recent years has brought an additional layer of complexity and precautions to take (see Table II). For example, large databases such as Genevestigator (Hruz et al., 2008) not only profile a large number of genes, but also allow contextual meta-analysis of several hundred conditions, each of which is covered by only a small number of replicates (usually 3–5). While some genes will respond to a small number of conditions and therefore their expression is easier to contextualize and interpret, other genes will respond to dozens or hundreds of conditions. It is often very difficult to distinguish primary effects from secondary effects, because the intensity of the effect does not necessarily relate to the direct involvement of the corresponding condition in regulating a specific target gene. Breaking down these effects into local patterns (e.g. by using a biclustering algorithm; Prelic et al., 2006) helps in finding conditions that are more directly linked to the gene of interest.
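One widely used way to control the false discovery rate when thousands of genes are tested is the Benjamini-Hochberg procedure. The text above does not name a specific method, so the sketch below is an assumption chosen for illustration; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in q_values]
```

In this toy example three genes have raw p-values below 0.05, but only the first remains below 0.05 after adjustment. With genome-wide data the difference between raw and adjusted values is typically much larger.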

APPROACHING THE TARGET: FROM ORGANS TO TISSUES AND CELLS

Most transcript and protein profiling experiments analyze mixtures of tissues containing different cell types and organelles. This approach reveals certain global patterns, but quantitative analyses and modeling are limited with such complex data. Therefore, methods for organ-specific (or, better, cell-type-specific) transcript and protein profiling as well as for organelle-specific proteomics are needed. Four types of approaches are now commonly used to sample RNA and/or proteins from selected cell types: (1) micropipetting, (2) laser capture microdissection (LCM), (3) protoplasting and sorting, and (4) polysome immunopurification (for review, see Zanetti et al., 2005; Hennig, 2007; Nelson et al., 2008).

Micropipetting using microcapillaries directly extracts the contents from selected cells. It has been successfully applied to various leaf cell types and to phloem, but extraction is more difficult from internal cells. LCM involves sectioning of frozen or embedded tissue, and subsequent dissection of the region of interest using laser excision. Applications of LCM include studies of vascular tissue, epidermis, and pericycle in maize (Zea mays) and seed development in Arabidopsis. Micropipetting and LCM are usually very labor intensive and difficult for isolation of small cells such as in meristems. Because of the limited amount of material that can be captured, they work well for transcript profiling, which can use amplification steps, but provide only a very small coverage of the proteome. As an alternative, protoplasting and cell sorting offers rapid and accurate isolation of RNA from small cells. Specific tissues or cell types that are labeled by expression of GFP are isolated by protoplasting and sorted through a fluorescence-activated cell sorter. Millions of cells can be processed within 1 to 2 h, but care has to be taken to exclude changes in gene expression profiles caused by sample processing. This technique was successfully applied to measure genome-wide expression profiles in more than 15 root regions, establishing a compendium of digital in situ data (Birnbaum et al., 2003; Cartwright et al., 2009). It will be interesting to test whether this approach can also be used for protein profiling. Polysome immunopurification is based on the tissue-specific expression of the FLAG-tagged ribosomal protein L18 in transgenic plants (Zanetti et al., 2005). In contrast to micropipetting, LCM, and sorting of protoplasts, which all can be used to isolate total cellular RNA, polysome immunopurification can be used to isolate transcripts that are associated with ribosomes (translatome). Discrepancies between total RNA levels and representation in the translatome can reveal regulation at the level of translation (Mustroph et al., 2009). In the future, translatome datasets, which bridge transcriptomics and proteomics, can help to interpret unusual transcript-to-protein ratios (see below).

Alternatively, it is possible to identify cell-type-specific transcripts and proteins by comparing wild-type plants with mutants that lack specific cells or tissue types. In Arabidopsis, for instance, a series of homeotic mutants that lack various floral organs was used to identify several hundreds of floral organ-specific genes (Wellmer et al., 2004). If no appropriate mutants exist, specific cell types can be genetically ablated by expression of a cell-autonomous toxin, such as diphtheria toxin subunit A or RNase, under the control of cell-type-specific promoters. Again, these approaches have been proven to work for transcript profiling (Tung et al., 2005), but it remains to be tested whether they could be useful for protein profiling.

DECREASING COMPLEXITY BY ORGANIZING ORGAN AND SUBCELLULAR PROTEOMES

Systematic analysis of accurate protein localization is essential to understand cellular networks in the context of compartmentalization, which is a fundamental design principle of eukaryotic cells. Organelle proteomics has therefore become a very active research field. Until recently, the protein inventory of cell organelles was based on proteins from isolated organelles, such as mitochondria, chloroplasts, and peroxisomes (Lilley and Dupree, 2007; Baginsky, 2009). This approach has limitations because true low-abundant organelle proteins often cannot be distinguished from contaminating proteins. Two approaches have been used to deal with this problem. First, a recently reported isolation procedure for mitochondria used the electrostatic characteristics of the mitochondrial surface to separate mitochondria from other organelles in an electric field. This procedure results in mitochondria preparations with higher purity, but the yield is low (Eubel et al., 2007). Second, information about the quantitative distribution of proteins along density gradients has been used to determine if a protein was enriched by the organellar isolation procedure. In practice, the abundance distribution profile of unknown proteins is compared to known organelle marker proteins. This strategy is referred to as protein correlation profiling (Foster et al., 2006) or LOPIT (Dunkley et al., 2006).
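The comparison of abundance profiles with organelle markers can be sketched as a simple correlation across gradient fractions. This is a deliberately reduced illustration: the published protein correlation profiling and LOPIT workflows use more elaborate distance measures and multivariate statistics, and the marker profiles and the unknown protein below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def assign_organelle(profile, marker_profiles):
    """Return the best-correlated marker organelle and its correlation."""
    scores = {
        organelle: pearson(profile, marker)
        for organelle, marker in marker_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical relative abundances across five density gradient fractions.
markers = {
    "mitochondrion": [0.05, 0.10, 0.50, 0.30, 0.05],
    "chloroplast": [0.40, 0.40, 0.10, 0.05, 0.05],
}
unknown = [0.04, 0.12, 0.48, 0.31, 0.05]
organelle, score = assign_organelle(unknown, markers)
```

The unknown protein peaks in the same fraction as the mitochondrial marker and is therefore assigned to mitochondria. A contaminant would be expected to follow the profile of its true compartment rather than that of the organelle being purified.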


Both procedures, however, are of limited use for the analysis of proteome dynamics in response to a stimulus because the long time that is needed to isolate and purify organelles affects their proteome properties. This is especially critical for transient posttranslational protein modifications. Thus, proteome dynamics is best analyzed at the cell or tissue level, followed by sorting of proteins into their respective organelle a posteriori. This strategy is now possible because substantial information about the protein complement of different cell organelles has accumulated (a comprehensive collection of proteome databases is, for example, available in Lu and Last, 2009). The SUBA database is most suitable for this purpose, because it is frequently updated and well maintained. SUBA generates lists of organelle proteins using reliability criteria, for example evidence from several different proteomics studies, targeting prediction, or GFP-localization assays, or a combination of this information (Heazlewood et al., 2007). For the chloroplast, two proteome reference tables have been established (Yu et al., 2008; Reiland et al., 2009). The overlap between these two proteome reference tables has generated a list of 1,156 proteins that can be considered high-confidence chloroplast proteins. Although the number of organelle proteins is constantly increasing, it is not clear when an organelle proteome can be considered complete. Organelle proteomes are dynamic, and functional organelle proteomes differ significantly during development, in different cell types or tissues, and in different conditions. This problem can be addressed by considering organelles as cellular subnetworks and applying flux balance modeling to assess network consistency.
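Flux balance models rest on a steady-state mass balance: for every internal metabolite, production and consumption must cancel. The sketch below only checks this constraint for a given flux distribution; a full flux balance analysis would additionally optimize an objective such as biomass production by linear programming, which is not shown. The toy network and flux values are hypothetical.

```python
def net_production(stoichiometry, fluxes):
    """Return the net production rate of each metabolite (S times v)."""
    return {
        metabolite: sum(coeff * fluxes[reaction] for reaction, coeff in row.items())
        for metabolite, row in stoichiometry.items()
    }


def unbalanced_metabolites(stoichiometry, fluxes, tolerance=1e-9):
    """List metabolites that accumulate or are depleted at these fluxes."""
    rates = net_production(stoichiometry, fluxes)
    return [m for m, rate in rates.items() if abs(rate) > tolerance]


# Hypothetical toy network: uptake -> A, A -> B (convert), B -> export.
stoichiometry = {
    "A": {"uptake": 1, "convert": -1},
    "B": {"convert": 1, "export": -1},
}
balanced = unbalanced_metabolites(
    stoichiometry, {"uptake": 2.0, "convert": 2.0, "export": 2.0}
)
leaky = unbalanced_metabolites(
    stoichiometry, {"uptake": 2.0, "convert": 2.0, "export": 1.5}
)
```

The first flux distribution is consistent, whereas in the second metabolite B accumulates. In an organelle model, metabolites that cannot be balanced by any flux distribution point to missing reactions or transporters in the reconstructed network.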
Initial modeling approaches with mitochondria and chloroplasts focused on a limited number of reactions, such as those of the Calvin cycle, amino acid biosynthesis, or the tricarboxylic acid cycle. Also, mitochondrial network reconstructions based on proteomics data are available and the existing models allow prediction of metabolite accumulation for a limited number of metabolites (Vo and Palsson, 2007). A recent flux-balance model of the primary metabolism in Chlamydomonas reinhardtii localized reactions into chloroplasts, mitochondria, and the cytosol and systematically assessed the contribution of different organelles to biomass production (Boyle and Morgan, 2009). The above examples illustrate the excellent suitability of metabolic network reconstruction to identify gaps in existing knowledge.

THE CHALLENGE OF DATA INTEGRATION: GENOME-SCALE ANALYSIS OF RNA-PROTEIN CORRELATIONS

Quantitative information about protein accumulation at genome scale offers entirely new insights into network function and the behavior of organs, tissues, and cells. Because gene expression is regulated at different levels, a comparison between transcript and protein accumulation can provide information about the rate of protein translation and the degree of posttranscriptional regulation. We have recently analyzed the correlation between protein and transcript abundance in representative samples from different plant organs and found mostly positive correlations in the range from 0.5 to 0.68 (Baerenfaller et al., 2008). The lowest correlation was observed for seeds, which accumulate stable storage proteins whose abundance is largely uncoupled from transcription. The highest correlation was obtained in leaves, suggesting that the most abundant photosynthetic proteins are predominantly regulated at the transcriptional level. It is clear that such a genome-scale analysis only offers a global view of regulatory events and does not allow a systematic assessment of individual enzyme regulation. A more refined comparison of protein and transcript levels showed that the correlation between transcript and protein abundance can vary significantly between different pathways (Kleffmann et al., 2004) and most likely also between different enzymes in the same pathway.

Figure 2 shows an example of a correlation analysis of a representative leaf transcriptome and proteome for a selection of 345 genes/proteins from primary and secondary metabolism pathways. Although the data were collected from various sources and summarized (see also Baerenfaller et al., 2008), the protein-to-transcript ratio was similar for most proteins, indicating that this analysis is robust. The majority of proteins in most metabolic pathways were found within the typical protein-to-transcript range. Only a small fraction of proteins deviated significantly from these ratios, both up or down and to a similar extent. This preliminary genome-scale correlation analysis revealed combined and consistent effects for particular pathways that are worth considering in further experimentation, for example by taking into account the effects of circadian rhythm, light, and nutrient status. The generation of protein and transcript data from the exact same samples is urgently needed to precisely address these questions and to understand how and under which conditions the protein-to-transcript ratio varies.

Figure 2. Correlation analysis of transcript and protein abundance in Arabidopsis leaves based on 345 genes from various primary and secondary metabolism pathways. Transcript abundance was calculated as a representative expression vector derived from multiple Affymetrix ATH1 array measurements from leaf samples (data from Genevestigator, Hruz et al., 2008). The proteome data was obtained from distinct leaf samples. Approximately 20% of these genes/proteins had ratios of protein to transcript abundance deviating strongly from 1.

Plant Physiol. Vol. 152, 2010

FROM GENES AND PROTEINS TO NETWORK DISCOVERY

Network discovery is a generic term describing the effort of elucidating the nature of relationships between molecules and associated properties emerging from a biological network. Multiple types of networks have been described with respect to the types of molecules involved and the dimension of the molecular network (genome scale or small scale). While genome-/proteome-/metabolome-scale analyses aim at identifying novel properties of the global network, smaller scale networks usually incorporate additional data types that cannot be obtained on a global scale and use models that allow a more precise prediction of network behavior. A more recent development is the integration of various networks into an evolutionary ecology of networks, in which networks are considered as strategies that interact and possibly compete for resources (Weitz et al., 2007). Metabolic network reconstructions have received increasing attention in the last few years and several genome-scale models are now available for microorganisms and human tissues. However, none of the existing models can currently provide a full view of all reactions in a cell organelle.
For example, even the most advanced genome-scale models for Saccharomyces cerevisiae only contain approximately 1,200 of the predicted approximately 2,200 metabolic genes and only approximately 70% of the modeled reactions are functional (Feist et al., 2009). The situation is worse for higher organisms, and for plants no genome-scale model exists to date although efforts to build knowledge databases are under way (Tsesmetzis et al., 2008). Progress in metabolic network reconstruction critically depends on the functional annotation of the genes that encode proteins with unknown functions, which is still the case for about 30% of the Arabidopsis genome. Accordingly, current plant metabolic network reconstruction focuses mostly on specific pathways such as fatty acid synthesis or Asp metabolism (Chen et al., 2009; Curien et al., 2009). Systematic efforts to improve genome annotation are under way, and the community is well connected via The Arabidopsis Information Resource (www.arabidopsis.org), which supports and facilitates the exchange of material and information. One example of such a functional characterization pipeline is the Chloroplast 2010 project that was launched at Michigan State University (Lu et al., 2008). Here, homozygous knockout lines for all known chloroplast proteins are generated and phenotypically characterized, also at the metabolite level. In brief, a dramatic improvement in the functional annotation of genes is key to networks with improved quality and consistency. Similar efforts are under way to construct plant transcriptional regulatory networks, for example those that control flower and root development (Grieneisen et al., 2007), photomorphogenesis (Jiao et al., 2007; Nemhauser, 2008), or the circadian clock (Zeilinger et al., 2006). On this smaller scale, graphical models, in particular Bayesian networks, are increasingly used in reverse engineering of genetic regulatory networks.
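Graphical Gaussian models, one family of such graphical models, connect two genes only if their expression remains correlated after the influence of other genes has been removed, that is, if their partial correlation is nonzero. The sketch below shows the standard first-order partial correlation for three genes; it illustrates the principle only and is not the specific modeling procedure used in the studies cited in this section. The correlation values are hypothetical.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y after removing the linear effect of Z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical case: genes X and Y are both driven by regulator Z, so their
# pairwise correlation of 0.64 is fully explained by Z.
indirect = partial_correlation(r_xy=0.64, r_xz=0.8, r_yz=0.8)

# Hypothetical case: X and Y remain correlated although Z is only weakly
# related to either of them.
direct = partial_correlation(r_xy=0.64, r_xz=0.1, r_yz=0.1)
```

In the first case the partial correlation vanishes and no edge would be drawn between X and Y, whereas in the second case the edge is retained. This distinction between direct and indirect association is what a plain correlation network cannot provide.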
Graphical models, such as sparse graphical Gaussian modeling, are powerful for a small number of genes and have been used to model the isoprene biosynthesis pathway network from temporal transcriptome data to discover new genes associated with this network (Wille et al., 2004). A similar graphical Gaussian modeling approach was applied to a larger set of genes in Arabidopsis and led to the discovery of novel components in various networks (Ma et al., 2007). In the future, analysis of transcriptional regulatory networks will also need to incorporate epigenome and transcription factor binding data from ChIP-chip and ChIP-Seq experiments.

Regulatory network construction is also increasingly being used in plant breeding. For example, Keurentjes et al. (2007) showed, using recombinant inbred lines of Arabidopsis, that variation in the expression of many genes could be explained by expression quantitative trait loci. By combining expression quantitative trait loci mapping and regulator candidate gene selection, gene regulatory networks for flowering time could be built that were in agreement with published data. The combination of omics data with quantitative genetics data is expected to facilitate the understanding of complex regulatory networks governing important phenotypic traits such as yield, pathogen resistance, and nutrient acquisition and utilization. A variety of models exist for mapping complex traits and linking phenotypic outputs to changes in genomic regions (Hammer et al., 2006).

ACKNOWLEDGMENTS

We apologize to all colleagues whose work could not be cited due to space constraints.

Received November 4, 2009; accepted December 6, 2009; published December 11, 2009.

LITERATURE CITED

Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M, Yalovsky S, Zimmermann P, Grossniklaus U, Gruissem W, Baginsky S (2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941
Baginsky S (2009) Plant proteomics: concepts, applications, and novel strategies for data interpretation. Mass Spectrom Rev 28: 93–120
Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, Galbraith DW, Benfey PN (2003) A gene expression map of the Arabidopsis root. Science 302: 1956–1960
Boyle NR, Morgan JA (2009) Flux balance analysis of primary metabolism in Chlamydomonas reinhardtii. BMC Syst Biol 3: 4
Brun V, Masselon C, Garin J, Dupuis A (2009) Isotope dilution strategies for absolute quantitative proteomics. J Proteomics 72: 740–749
Busch W, Lohmann JU (2007) Profiling a plant: expression analysis in Arabidopsis. Curr Opin Plant Biol 10: 136–141
Cartwright DA, Brady SM, Orlando DA, Sturmfels B, Benfey PN (2009) Reconstructing spatiotemporal gene expression data from partial observations. Bioinformatics 25: 2581–2587
Chen M, Mooney BP, Hajduch M, Joshi T, Zhou M, Xu D, Thelen JJ (2009) System analysis of an Arabidopsis mutant altered in de novo fatty acid synthesis reveals diverse changes in seed composition and metabolism. Plant Physiol 150: 27–41
Cox J, Mann M (2007) Is proteomics the new genomics? Cell 130: 395–398
Curien G, Bastien O, Robert-Genthon M, Cornish-Bowden A, Cárdenas ML, Dumas R (2009) Understanding the regulation of aspartate metabolism using a model based on measured kinetic parameters. Mol Syst Biol 5: 271
Dunkley TP, Hester S, Shadforth IP, Runions J, Weimar T, Hanton SL, Griffin JL, Bessant C, Brandizzi F, Hawes C, et al (2006) Mapping the Arabidopsis organelle proteome. Proc Natl Acad Sci USA 103: 6518–6523
Eubel H, Lee CP, Kuo J, Meyer EH, Taylor NL, Millar AH (2007) Free-flow electrophoresis for purification of plant mitochondria by surface charge. Plant J 52: 583–594
Feist AM, Herrgard MJ, Thiele I, Reed JL, Palsson BO (2009) Reconstruction of biochemical networks in microorganisms. Nat Rev Microbiol 7: 129–143
Foster LJ, de Hoog CL, Zhang Y, Zhang Y, Xie X, Mootha VK, Mann M (2006) A mammalian organelle map by protein correlation profiling. Cell 125: 187–199
Gehring M, Bubb KL, Henikoff S (2009) Extensive demethylation of repetitive elements during seed development underlies gene imprinting. Science 324: 1447–1451
Grieneisen VA, Xu J, Maree AFM, Hogeweg P, Scheres B (2007) Auxin transport is sufficient to generate a maximum and gradient guiding root growth. Nature 449: 1008–1013
Gstaiger M, Aebersold R (2009) Applying mass spectrometry-based proteomics to genetics, genomics and network biology. Nat Rev Genet 10: 617–627
Hammer G, Cooper M, Tardieu F, Welch S, Walsh B, van Eeuwijk F, Chapman S, Podlich D (2006) Models for navigating biological complexity in breeding improved crop plants. Trends Plant Sci 11: 587–593
Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH (2007) SUBA: the Arabidopsis Subcellular Database. Nucleic Acids Res 35: D213–D218
Hennig L (2007) Patterns of beauty—omics meets plant development. Trends Plant Sci 12: 287–293
Hruz T, Laule O, Szabo G, Wessendrop F, Bleuler S, Oertle L, Widmayer P, Gruissem W, Zimmermann P (2008) Genevestigator V3: a reference expression database for the meta-analysis of transcriptomes. Adv Bioinformatics 2008: 420747
Hsieh TF, Ibarra CA, Silva P, Zemach A, Eshed-Williams L, Fischer RL, Zilberman D (2009) Genome-wide demethylation of Arabidopsis endosperm. Science 324: 1451–1454
Issaq HJ, Van QN, Waybright TJ, Muschik GM, Veenstra TD (2009) Analytical and statistical approaches to metabolomics research. J Sep Sci 32: 2183–2199
Jiao Y, Lau OS, Deng XW (2007) Light-regulated transcriptional networks in higher plants. Nat Rev Genet 8: 217–230
Kaufmann K, Muino JM, Jauregui R, Airoldi CA, Smaczniak C, Krajewski P, Angenent GC (2009) Target genes of the MADS transcription factor SEPALLATA3: integration of developmental and hormonal pathways in the Arabidopsis flower. PLoS Biol 7: e1000090
Keurentjes JJ, Fu J, Terpstra IR, Garcia JM, van den Ackerveken G, Snoek LB, Peeters AJ, Vreugdenhil D, Koornneef M, Jansen RC (2007) Regulatory network construction in Arabidopsis by using genomewide gene expression quantitative trait loci. Proc Natl Acad Sci USA 104: 1708–1713
Kleffmann T, Russenberger D, von Zychlinski A, Christopher W, Sjolander K, Gruissem W, Baginsky S (2004) The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions. Curr Biol 14: 354–362
Kuster B, Schirle M, Mallick P, Aebersold R (2005) Scoring proteomes with proteotypic peptide probes. Nat Rev Mol Cell Biol 6: 577–583
Lilley KS, Dupree P (2007) Plant organelle proteomics. Curr Opin Plant Biol 10: 594–599
Lu P, Vogel C, Wang R, Yao X, Marcotte EM (2007) Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 25: 117–124
Lu Y, Last RL (2009) Web-based Arabidopsis functional and structural genomics resources. In The Arabidopsis Book. The American Society of Plant Biologists, Rockville, MD, doi/10.1199/tab.0118, http://www.aspb.org/publications/arabidopsis/
Lu Y, Savage LJ, Ajjawi I, Imre KM, Yoder DW, Benning C, Dellapenna D, Ohlrogge JB, Osteryoung KW, Weber AP, et al (2008) New connections across pathways and cellular processes: industrialized mutant screening reveals novel associations between diverse phenotypes in Arabidopsis. Plant Physiol 146: 1482–1500
Ma S, Gong Q, Bohnert HJ (2007) An Arabidopsis gene network based on the graphical Gaussian model. Genome Res 17: 1614–1625
Mueller LN, Rinner O, Schmidt A, Letarte S, Bodenmiller B, Brusniak MY, Vitek O, Aebersold R, Muller M (2007) SuperHirn—a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 7: 3470–3480
Mustroph A, Zanetti ME, Jang CJ, Holtan HE, Repetti PP, Galbraith DW, Girke T, Bailey-Serres J (2009) Profiling translatomes of discrete cell populations resolves altered cellular priorities during hypoxia in Arabidopsis. Proc Natl Acad Sci USA 106: 18843–18848
Nelson T, Gandotra N, Tausta SL (2008) Plant cell types: reporting and sampling with new technologies. Curr Opin Plant Biol 11: 567–573
Nemhauser JL (2008) Dawning of a new era: photomorphogenesis as an integrated molecular network. Curr Opin Plant Biol 11: 4–8
Pitzschke A, Hirt H (2010) Bioinformatic and systems biology tools to generate testable models of signaling pathways and their targets. Plant Physiol 152: 460–469
Prelic A, Bleuler S, Zimmermann P, Wille A, Bühlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E (2006) A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics 22: 1122–1129
Redman JC, Haas BJ, Tanimoto G, Town CD (2004) Development and evaluation of an Arabidopsis whole genome Affymetrix probe array. Plant J 38: 545–561
Rehrauer H, Aquino C, Gruissem W, Henz S, Hilson P, Laubinger S, Naouar N, Patrignani A, Rombauts S, Shu H, et al (2010) AGRONOMICS1: a new resource for Arabidopsis transcriptome profiling. Plant Physiol 152: 487–499
Reiland S, Messerli G, Baerenfaller K, Gerrits B, Endler A, Grossmann J, Gruissem W, Baginsky S (2009) Large-scale Arabidopsis phosphoproteome profiling reveals novel chloroplast kinase substrates and phosphorylation networks. Plant Physiol 150: 889–903
Thibaud-Nissen F, Wu H, Richmond T, Redman JC, Johnson C, Green R, Arias J, Town CD (2006) Development of Arabidopsis whole-genome microarrays and their application to the discovery of binding sites for the TGA2 transcription factor in salicylic acid-treated plants. Plant J 47: 152–162
Tung CW, Dwyer KG, Nasrallah ME, Nasrallah JB (2005) Genome-wide identification of genes expressed in Arabidopsis pistils specifically along the path of pollen tube growth. Plant Physiol 138: 977–989
Turck CW, Falick AM, Kowalak JA, Lane WS, Lilley KS, Phinney BS, Weintraub ST, Witkowska HE, Yates NA (2007) The Association of Biomolecular Resource Facilities Proteomics Research Group 2006 study: relative protein quantitation. Mol Cell Proteomics 6: 1291–1298
Tsesmetzis N, Couchman M, Higgins J, Smith A, Doonan JH, Seifert GJ, Schmidt EE, Vastrik I, Birney E, Wu G, et al (2008) Arabidopsis reactome: a foundation knowledgebase for plant systems biology. Plant Cell 20: 1426–1436
Usadel B, Poree F, Nagel A, Lohse M, Czedik-Eysenberg A, Stitt M (2009) A guide to using MapMan to visualize and compare omics data in plants: a case study in the crop species, maize. Plant Cell Environ 32: 1211–1229
Vo TD, Palsson BO (2007) Building the power house: recent advances in mitochondrial studies through proteomics and systems biology. Am J Physiol Cell Physiol 292: C164–C177
Weitz JS, Benfey PN, Wingreen NS (2007) Evolution, interactions, and biological networks. PLoS Biol 5: e11
Wellmer F, Riechmann JL, Alves-Ferreira M, Meyerowitz EM (2004) Genome-wide analysis of spatial gene expression in Arabidopsis flowers. Plant Cell 16: 1314–1326
Wille A, Zimmermann P, Vranová E, Bleuler S, Fürholz A, Hennig L, Laule O, Prelíc A, von Rohr P, Thiele L, et al (2004) Sparse graphical Gaussian modeling for genetic regulatory network inference. Genome Biology 5: R92.1–R92.13
Yu QB, Li G, Wang G, Sun JC, Wang PC, Wang C, Mi HL, Ma WM, Cui J, Cui YL, et al (2008) Construction of a chloroplast protein interaction network and functional mining of photosynthetic proteins in Arabidopsis thaliana. Cell Res 18: 1007–1019
Zanetti ME, Chang IF, Gong F, Galbraith DW, Bailey-Serres J (2005) Immunopurification of polyribosomal complexes of Arabidopsis for global analysis of gene expression. Plant Physiol 138: 624–635
Zeilinger MN, Farré EM, Taylor SR, Kay SA, Doyle FJ (2006) A novel computational model of the circadian clock in Arabidopsis that incorporates PRR7 and PRR9. Mol Syst Biol 2: 58
Zhang X (2008) The epigenetic landscape of plants. Science 320: 489–492

Plant Physiol. Vol. 152, 2010
