
Design and Management of Data Warehouses
Report on the DMDW'99 Workshop
http://sunsite.informatik.rwth-aachen.de/DMDW99/

Stella Gatziu, Manfred Jeusfeld, Martin Staudt, Yannis Vassiliou

Stella Gatziu: University of Zurich, Department of Computer Science, Winterthurerstrasse 190, 8057 Zurich, Switzerland; gatziu@ifi.unizh.ch
Manfred Jeusfeld: Infolab, Tilburg University, Postbus 90153, 5000 LE Tilburg, The Netherlands; jeusfeld@kub.nl
Martin Staudt: Swiss Life, Information Systems Research, P.O. Box, 8022 Zurich, Switzerland; martin.staudt@swisslife.ch
Yannis Vassiliou: National Technical University of Athens, Computer Science Division, Zographou 15773, Athens, Greece; yv@cs.ntua.gr

1 Introduction

The idea of building data warehouses as central data collections made available for decision support applications in a company is widely accepted. The concrete design and management of a data warehouse from a technical as well as from an organizational point of view, however, turns out to be far from trivial and requires sophisticated and time-consuming efforts.

The DMDW workshop was held at the CAiSE'99 conference in Heidelberg on June 14-15, 1999. Its intention was to bring together practitioners and researchers to discuss the design and management of data warehouses. The various presentations gave a broad view of the data warehouse lifecycle, covering aspects relevant at design time, at build time and at run time. Overall, DMDW'99 was recognized as a success. The 30+ participants enjoyed the high-quality program (acceptance rate of 50 percent) and had vivid discussions.

In this report, we review the presentations given at the DMDW workshop and present some open problems which we believe should be addressed by future research and whose solution could contribute to making data warehouse research more relevant to practice.

2 Supporting data warehouse design

One of the most important tasks when designing a warehouse is to minimize the cost of answering queries, because the warehouse is very large, queries are often ad hoc and complex, and decision support applications require short response times. On the one hand, considering the warehouse itself as a set of views defined over the remote source data, its basic configuration, i.e. the available data and their interrelationship, clearly contributes to this. Furthermore, the processing costs of queries can be reduced by materializing view data for frequently asked queries. However, materializing all possible views may exceed the available storage space and imposes a high cost for keeping the views up to date. The determination of the optimal collection of views in both cases is called the view selection problem. Three presentations at DMDW were devoted to this problem.

As a kind of preparatory task, Michael Akinde and Michael Böhlen presented the construction of view graph structures to be used by view selection algorithms. The contribution of this work is the construction of such graphs in the presence of aggregation and grouping. The view graphs express how source data is materialized into views. The algorithm basically creates the search space of possible configurations of materialized views. Ad hoc rules are used to limit the size of the generated view graphs. One open problem is to combine a pruning strategy for the algorithm with specialized view selection algorithms.

A concrete approach to selecting views under quality requirements was proposed by Dimitri Theodoratos and Mokrane Bouzeghoub. They refine the view selection problem in order to include source availability constraints (how frequently can a data source be accessed for view maintenance) and currency constraints (how old can data elements in the data warehouse be). They show that the currency and source availability constraints can be restricted to simple views (on base relations). If the constraint satisfaction problem has no solution, the algorithm can identify the source relations which cause the constraint violation. If a solution exists, the algorithm computes the minimal update frequencies needed to achieve the desired data currency.

Randomized algorithms were found to be useful for choosing optimal query evaluation plans during query optimization. This idea can also be adopted for view selection, as shown by Minsoo Lee and Joachim Hammer, who employ a genetic algorithm to address the search space problem. The bits in a genome encode whether a view is materialized or not. A comparison with exhaustive search shows that the solutions delivered by the genetic algorithm are within 10% of the optimal solution. The algorithm is especially suited for data warehouses with a large number of views and with frequent changes to query definitions. Here, exhaustive search is intractable whereas the genetic algorithm delivers results within seconds.

While view selection is a technical problem usually solved in a relational context, the conceptual modelling of a data warehouse employs higher-level formalisms. Enrico Franconi and Ulrike Sattler observe that traditional conceptual data models (like ER diagrams) lack constructs for expressing aggregation. Since this feature is essential for data warehouses, they propose an extension to ER diagrams which allows aggregation over different dimensions to be modelled. Multiple hierarchies of dimensions can be represented in parallel. The authors use the MD semantics of Cabibbo and Torlone to associate an interpretation with their models. Besides that, the models can be mapped to description logic expressions, which enables reasoning on the conceptual level of warehouse design, such as the detection of inconsistencies.

3 Practical aspects of warehouse design

Practical data warehouse design has to include the information demands existing in the data warehouse application context and also project management aspects. The usage of advanced commercial tools supporting certain aspects of data warehouse design is becoming increasingly important.

An issue arising before the actual conceptual modelling phase for a data warehouse takes place is information analysis. Han Schouten investigates the analysis and design of data warehouses in a more general and abstract way, without considering the warehouse as a set of views. He points out that data warehouse analysis concerns the analysis of the user needs for determining the required warehouse data and in particular the required derivations and the aggregation level. Data warehouse design consists in grouping derivable facts into data warehouse relations. Functional as well as so-called weak functional dependencies are considered for producing a suitable warehouse design. Existential graphs are proposed as a representation framework which offers a richer set of constructs to model cardinalities and constraints on entity attributes. It is argued that an expressive framework is needed to cover semantic properties like different versions of the subtype-supertype relationship.

While concentration in many software segments is still ongoing, the data warehouse tool arena is characterized by a broad dispersion in both horizontal and vertical direction (many vendors of many special-purpose tools). Microsoft has included data warehouse specific amendments in their general OIM framework and is becoming a player in the ETL tool market. Jens Otto Sørensen and Karl Alnor addressed data warehouse design from the tool angle. Their question was: Can a data warehouse be effectively designed using the Microsoft SQL Server (tm) and its DTS component? In their case study, they selected a relational schema for books and articles. A star schema was designed together with the data flow graphs specifying how data sources are fed into the data warehouse (the well-known data pumps). The data transfer is implemented either by SQL queries or by user-definable programs. The authors conclude that the available tools were sufficient and easy to use for designing the data warehouse and source integration. The case study was, however, restricted to the rather clean publications database delivered with the product.

Data warehousing consists of various processes to be executed at design time, at build time and at run time. The auditing within data warehouse projects was addressed by José Rodero, José Toval, and Mario Piattini, with the main emphasis on embedding the data warehouse into a company's organization. They identify the main activities: data source identification, data source integration, data storage, and analytical processing. For each of these activities, control objectives, metrics, and recommendations to cope with problems are presented. The framework follows the COBIT standard for information systems deployment and shares its business orientation.

4 Loading the data warehouse

The loading processes running on a data warehouse rely on complex specifications of parallel and interacting streams of operations on the data extracted from the sources.

Mokrane Bouzeghoub, Françoise Fabret and Maja Matulovic-Broqué propose to view data warehouse refreshment as a workflow application instead of a view materialization problem. Transferring data from sources to the data warehouse has to consider parameters like the availability of the sources. Moreover, data cleaning, integration, and customization are distinct processes which have to be carefully coordinated for the refreshment. The authors propose an event-driven workflow model to describe the interaction of the parallel processes. Different workflow reference models (called scenarios) are adapted to the specific requirements of a data warehouse project.

Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati focus on a declarative representation of the dependency between sources and warehouse allowing the generation of mediators for extraction and loading, rather than on the overall loading process. They propose a conceptual representation of data sources and their interrelationship by so-called correspondences. The representation has an interpretation in description logic (allowing reasoning about equivalence of schema concepts) and a Datalog interpretation (allowing queries to be represented in terms of the concepts). Inter-schema constraints are ingredients for the reasoning method. Data conversion functions, e.g. for converting units, are represented as adornments to the Datalog representations. The concepts of the data sources are defined in terms of the enterprise model, i.e. the data sources are considered as views on an (imaginary) enterprise database. Whenever a new view is introduced in the data warehouse, its specification is rewritten using the correspondences. The result is a mediator program which refers to the data sources and applies conversion, matching and reconciliation routines on and among them.
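To make the idea of a generated mediator more tangible, the following Python sketch shows how declarative correspondences might drive conversion, matching and reconciliation over two sources. It is only an illustration under stated assumptions: the presented work uses description logic and Datalog, not Python, and every name here (the `Correspondence` class, `build_mediator`, the two sample sources, their fields and values, and the averaging policy for reconciliation) is hypothetical and not taken from the workshop paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical source relations. Each source stores product weights under its
# own key format and unit; all names and values are invented for illustration.
SOURCE_A = [{"sku": "p-001", "weight_lb": 2.0}, {"sku": "p-002", "weight_lb": 10.0}]
SOURCE_B = [{"product_id": "P001", "weight_kg": 0.9}, {"product_id": "P003", "weight_kg": 3.0}]

LB_TO_KG = 0.45359237  # one pound in kilograms


@dataclass
class Correspondence:
    """Declarative link from one source to the assumed enterprise concept
    Product(id, weight_kg)."""
    rows: List[dict]
    key_field: str
    value_field: str
    convert_key: Callable[[str], str]        # matching: normalise source keys
    convert_value: Callable[[float], float]  # conversion: e.g. unit conversion


CORRESPONDENCES = [
    Correspondence(SOURCE_A, "sku", "weight_lb",
                   lambda k: k.replace("-", "").upper(),
                   lambda v: v * LB_TO_KG),
    Correspondence(SOURCE_B, "product_id", "weight_kg",
                   lambda k: k.upper(),
                   lambda v: v),
]


def build_mediator(correspondences: List[Correspondence]) -> Callable[[], Dict[str, float]]:
    """Return a mediator function answering the view Product(id, weight_kg)."""
    def mediator() -> Dict[str, float]:
        candidates: Dict[str, List[float]] = {}
        for c in correspondences:
            for row in c.rows:
                key = c.convert_key(row[c.key_field])          # matching
                value = c.convert_value(row[c.value_field])    # conversion
                candidates.setdefault(key, []).append(value)
        # Reconciliation (assumed policy): average values that disagree.
        return {k: sum(vs) / len(vs) for k, vs in sorted(candidates.items())}
    return mediator


product_view = build_mediator(CORRESPONDENCES)()
print(product_view)
```

In this sketch the correspondences are plain data, so introducing a new warehouse view would mean calling `build_mediator` with a different set of correspondences rather than hand-writing extraction code. A real system would derive the mediator by query rewriting rather than by the simple loop used here.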

