Three-dimensional projection pursuit

By GUY NASON
Department of Mathematics, University of Bristol, University Walk, Bristol, BS8 1TW, UK. Email: G.P.Nason@bristol.ac.uk
SUMMARY
The development and usage of an approach to three-dimensional projection pursuit is discussed. The well-established Jones and Sibson moments index is chosen as a computationally efficient projection index to extend to 3D. The 3D index was initially developed to find interesting linear combinations of spectral bands in a multispectral image. Computer algebraic methods are used extensively to handle the complex formulae that make up the index, and these methods are explained in detail. The description of the index is completed by a discussion of important practical issues: interpreting projection solutions, dealing with outliers and optimization techniques. An artificial tetrahedral data set is used to demonstrate how 3D projection pursuit can produce better clusters than principal components analysis. The main example shows how 3D projection pursuit can successfully combine bands to discover clusters that differ from those produced by, say, principal components.
Keywords: projection pursuit; multispectral images; clustering; computer algebra; varimax rotation
1 Introduction
This article discusses various aspects of projection pursuit into three dimensions. The aim of projection pursuit is to find interesting linear combinations of variables in a multivariate data set. The precise definition of “interesting” is given later, but clusters and other forms of non-linear structure are interesting. One- and two-dimensional projection pursuit have been dealt with extensively in the literature, and some excellent software implementations are available. The benefit of projection into three dimensions is that it can identify more complex structures than lower-dimensional projections can. Projection pursuit into three dimensions is particularly attractive for two further perceptual reasons. Firstly, colours naturally correspond to 3-vectors, for example through the RGB representation. Secondly, point clouds and other objects in three dimensions can be investigated on computer screens, for example through spinning 3D plots, which are immediately comprehensible because of our 3D intuition. These reasons are important when applying 3D projection pursuit to multispectral images (colour) and multivariate data sets (intuition).
Section 2 briefly describes projection pursuit and includes details on projection indices and the process of sphering. Section 3 explains why we have chosen to extend Jones and Sibson’s (1987) well-known moments index into three dimensions: its computational efficiency. The formulae for the moments index were computed analytically by the computer algebra package REDUCE (see Section 3.3). Section 3 also covers the differentiation and optimization of the moments index. It examines how outliers can be treated to give better projection solutions, and it discusses how optimal projections can be rotated to give solutions that are easier to interpret.
Section 4 gives two examples of projection pursuit in action. The first example applies the pursuit to an artificial data set with a three-dimensional structure embedded in a noisy six-dimensional data set. No single one- or two-dimensional projection clearly shows the structure, and with these data the principal components are contaminated with noise. The projection pursuit method clearly isolates the three-dimensional structure and defines the clusters more clearly than principal components analysis does. Many real multivariate data sets are of this type (for example, the Lubischew (1962) beetle data).
The second example, which was the main reason for developing three-dimensional pursuit, is discussed in Section 4.3. It shows how projection pursuit may be applied to some real multispectral image data to produce low-dimensional projections that exhibit clustering. We argue that tighter clusters in variable-space give better contrasts between differing land-use types, and we give an example where this is so.
2 A brief description of projection pursuit
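Suppose $X$ is an $n \times p$ data matrix of $n$ observations on $p$ variables, with $i$th row $x_i$. Define the $p$-dimensional multivariate mean of $X$ by:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .$$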
The sample variance matrix, $S$, of $X$ is obtained from the centred data by

2.1 Projection indices
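The choice of projection index is important in projection pursuit. The projected sample variance (1) could be used as a projection index, but there would be little point: the problem has an analytical solution, because the variance is maximized by the leading principal components.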
Successful projection indices are designed to respond to interesting or clustered variation, not just to the large variation discovered by PCA. Early work by Friedman and Tukey (1974) developed a projection index to search for non-linear structure. Subsequently Huber (1985) and Jones and Sibson (1987) considered the population case rather than the sample case. They assumed that the projected data, $a^{T}X$, have a density, $f$. Their projection indices were based on measuring the departure of $f$ from $\phi$, the standard normal density. This rests on the heuristic that normality is the least interesting density on the line. In practice, departures can be measured by forming a density estimate, $\hat{f}$, of the projected data and comparing it with standard normality.
Huber (1985) and Jones and Sibson (1987) suggested a projection index based on the Shannon entropy:

$$I_E(f) = \int f(x) \log f(x) \, dx \qquad (4)$$

The entropy index (4) is sometimes used in projection pursuit because, among densities with unit variance, the standard normal density uniquely minimizes it. There are several other possible choices for a projection index (see Cook et al. (1993) for a list). Projection into two dimensions is common and has been promoted extensively in the literature and in implementations. An excellent implementation for running two-dimensional pursuit is the XGobi program described by Swayne et al. (1991; 1990).
2.2 Centring and sphering
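A data set is sphered by using a linear transform to give the transformed data zero mean and an identity variance matrix. Let $S$ be the sample variance matrix of $X$. If a linear transformation $A$ is applied to the centred data $X_c = X - \mathbf{1}\bar{x}^{T}$, then the result $Y = X_c A^{T}$ is also centred and has variance $A S A^{T}$.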

One convenient choice of $A$ that makes $A S A^{T}$ the identity matrix is $A = S^{-1/2}$, the inverse of the symmetric square root of $S$.
interpretation of the final projection solution.
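The sphering step can be sketched in Python as follows. This is a minimal sketch, not the PP3 implementation. It assumes the data arrive as a list of rows and uses the divisor $n-1$ for $S$; the divisor is an assumption, since it is not stated here. It takes $A = L^{-1}$, where $S = LL^{T}$ is the Cholesky factorization. That choice also gives $ASA^{T} = I$, and it needs no eigen-decomposition. The helper names `variance_matrix` and `sphere` are hypothetical.

```python
import math


def variance_matrix(X):
    """Return (mean vector, sample variance matrix) of a list of rows; divisor n - 1."""
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    centred = [[row[j] - mean[j] for j in range(p)] for row in X]
    S = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
         for i in range(p)]
    return mean, S


def sphere(X):
    """Centre X and apply A = L^{-1}, where S = L L^T, so that A S A^T = I."""
    p = len(X[0])
    mean, S = variance_matrix(X)
    # Cholesky factorization S = L L^T with L lower triangular.
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            acc = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("variance matrix is not positive definite")
                L[i][i] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    # Solve L y = (x - mean) by forward substitution for every observation.
    Y = []
    for row in X:
        xc = [row[j] - mean[j] for j in range(p)]
        y = [0.0] * p
        for i in range(p):
            y[i] = (xc[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        Y.append(y)
    return Y
```

Any $A$ with $ASA^{T} = I$ spheres the data. Different choices differ only by an orthogonal rotation, which is why rotationally invariant indices do not depend on the choice.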
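Many two-dimensional indices could be extended to a three-dimensional form. We choose one in particular because it requires less computational effort than other indices. The one-dimensional index was devised by Jones and Sibson (1987). It is based on the following approximation to the difference between the Shannon entropies (4) of the projected data density, $f$, and the standard normal density, $\phi$:

$$\int f \log f - \int \phi \log \phi \approx \frac{1}{12}\left(\kappa_3^2 + \frac{1}{4}\kappa_4^2\right) \qquad (6)$$

where $\kappa_3$ and $\kappa_4$ are the third and fourth cumulants of $f$. Our extension of Jones and Sibson’s index to three dimensions, in terms of the three-dimensional cumulants $\kappa_{rst}$, is:

$$I = \frac{1}{12}\left\{ \sum_{r+s+t=3} \frac{3!}{r!\,s!\,t!}\,\kappa_{rst}^2 + \frac{1}{4} \sum_{r+s+t=4} \frac{4!}{r!\,s!\,t!}\,\kappa_{rst}^2 \right\} \qquad (7)$$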
One important property of moments indices of the type (6) and (7) is that they are rotationally invariant with respect to any choice of axes for the projection space spanned by $a$, $b$ and $c$. Another set of vectors $a'$, $b'$, $c'$ can be chosen that represents the same projection space as $a$, $b$, $c$ but differs from it by a rotation. We would want the index to take the same value on $a'$, $b'$, $c'$ as on $a$, $b$, $c$, since it is the projection space that matters, not the way in which we choose to represent it. Surprisingly, many indices do not have this property (Friedman and Tukey (1974) and Friedman (1987)), although more recent indices do (for example Morton (1989) and Cook et al. (1993)). The invariance is important during the optimization sequence, since otherwise optimizers can spend time changing the representation when they should be changing the projection space. It is also sometimes useful to rotate the representing axes to make the projection solution easier to interpret, without changing the projection index. We discuss this procedure in Section 3.5.
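Rotational invariance can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's estimator. It computes $\frac{1}{12}\{\sum_{ijk}\kappa_{ijk}^2 + \frac{1}{4}\sum_{ijkl}\kappa_{ijkl}^2\}$ over all entries of the third- and fourth-order cumulant tensors of the projected data. This is the rotation-invariant form of a Jones and Sibson type index, and grouping equal entries gives the multinomial weights. The cumulants are plug-in moment estimates rather than k-statistics, and the projected data are treated as sphered, so the second-order cumulants are taken to be the identity. Rotating $a$ and $b$ within their plane should leave the value unchanged up to rounding error. All function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def delta(i, j):
    return 1.0 if i == j else 0.0


def project(Y, a, b, c):
    """Coordinates of each observation on the projection vectors a, b and c."""
    return [(dot(a, y), dot(b, y), dot(c, y)) for y in Y]


def moments_index_3d(Z):
    """Plug-in 3-D moments index for projected data Z treated as sphered."""
    n = len(Z)
    axes = range(3)
    third = 0.0
    for i in axes:
        for j in axes:
            for k in axes:
                kappa = sum(z[i] * z[j] * z[k] for z in Z) / n
                third += kappa ** 2
    fourth = 0.0
    for i in axes:
        for j in axes:
            for k in axes:
                for q in axes:
                    moment = sum(z[i] * z[j] * z[k] * z[q] for z in Z) / n
                    # Fourth cumulant of a zero-mean, identity-variance vector.
                    kappa = (moment - delta(i, j) * delta(k, q)
                             - delta(i, k) * delta(j, q) - delta(i, q) * delta(j, k))
                    fourth += kappa ** 2
    return (third + fourth / 4.0) / 12.0


def rotate_in_plane(a, b, theta):
    """Rotate the pair (a, b) by the angle theta within the plane they span."""
    cs, sn = math.cos(theta), math.sin(theta)
    a2 = [cs * ai + sn * bi for ai, bi in zip(a, b)]
    b2 = [-sn * ai + cs * bi for ai, bi in zip(a, b)]
    return a2, b2


def invariance_demo(seed=0, n=300, p=5, theta=0.7):
    """Index for (a, b, c) and for a rotated basis (a2, b2, c) of the same space."""
    rng = random.Random(seed)
    # Skewed data (centred exponentials) so that the index is clearly non-zero.
    Y = [[rng.expovariate(1.0) - 1.0 for _ in range(p)] for _ in range(n)]
    a, b, c = ([1.0 if i == k else 0.0 for i in range(p)] for k in range(3))
    a2, b2 = rotate_in_plane(a, b, theta)
    return (moments_index_3d(project(Y, a, b, c)),
            moments_index_3d(project(Y, a2, b2, c)))
```

The third- and fourth-order moment tensors transform as tensors under an orthogonal change of axes, and the Kronecker-delta terms are unchanged by it. So the sums of squares, and hence the index, do not depend on the representation.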
Estimation of the projection index
Kendall and Stuart (1969) described a class of unbiased estimators for cumulants of any order, known as k-statistics. The k-statistics are computed from the projected sphered data and depend on the projection vectors. Jones and Sibson’s one-dimensional sample index was therefore obtained by replacing cumulants with k-statistics in equation (5):
(9)
and
(10)
In the two-dimensional case the bivariate cumulants $\kappa_{rs}$ are replaced by bivariate k-statistics $k_{rs}$, and similarly for the three-dimensional index. The formulae for $k_3$ and $k_4$ were given by Kendall and Stuart (1969, page 280) and are modified for sphered data in (9) and (10). Kendall and Stuart gave only a few of the formulae for trivariate k-statistics, but they also gave an algorithm for deriving a k-statistic of arbitrary order from the univariate ones.
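The univariate k-statistics can be computed directly from power sums. The sketch below uses the standard general formulae for $k_2$, $k_3$ and $k_4$ in terms of power sums $s_r = \sum_i x_i^r$. It does not simplify them for sphered data, as (9) and (10) do. It then substitutes $k_3$ and $k_4$ for $\kappa_3$ and $\kappa_4$ in the one-dimensional moments index $\frac{1}{12}(k_3^2 + \frac{1}{4}k_4^2)$. The function names are illustrative.

```python
def power_sums(xs, max_order=4):
    """[s_0, s_1, ..., s_max_order] with s_r = sum of x**r."""
    return [sum(x ** r for x in xs) for r in range(max_order + 1)]


def k_statistics(xs):
    """Unbiased k-statistics k2, k3, k4 written in terms of power sums."""
    n = len(xs)
    if n < 4:
        raise ValueError("k4 needs at least 4 observations")
    _, s1, s2, s3, s4 = power_sums(xs)
    k2 = (n * s2 - s1 ** 2) / (n * (n - 1))
    k3 = (n ** 2 * s3 - 3 * n * s2 * s1 + 2 * s1 ** 3) / (n * (n - 1) * (n - 2))
    k4 = ((n ** 3 + n ** 2) * s4 - 4 * (n ** 2 + n) * s3 * s1
          - 3 * (n ** 2 - n) * s2 ** 2 + 12 * n * s2 * s1 ** 2
          - 6 * s1 ** 4) / (n * (n - 1) * (n - 2) * (n - 3))
    return k2, k3, k4


def moments_index_1d(projected):
    """One-dimensional moments index with k-statistics in place of cumulants."""
    _, k3, k4 = k_statistics(projected)
    return (k3 ** 2 + k4 ** 2 / 4) / 12
```

For example, `k_statistics([-2, -1, 0, 1, 2])` gives $k_2 = 2.5$, $k_3 = 0$ and $k_4 = -7.5$. Shifting the data leaves all three unchanged.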
Automating Kendall’s algorithm using computer algebra
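We repeat Kendall’s algorithm here because it is a good example of a procedure that can be automated using computer algebra. Suppose that we wished to obtain the bivariate k-statistic $k_{21}$. We would start with the formula for $k_3$ in terms of the power sums $s_r = \sum_i x_i^r$:

$$k_3 = \frac{n^2 s_3 - 3n s_2 s_1 + 2 s_1^3}{n(n-1)(n-2)} \qquad (11)$$

To produce the bivariate formula we must operate on (11) with the operator $\sum_i y_i\,\partial/\partial x_i$, which introduces a new variable $y$. Applied to a power sum $\sum_i x_i^r$, this operator replaces one power of $x$ by $y$ and multiplies by $r$, giving $r\sum_i x_i^{r-1} y_i$. Applying the operator to both sides of (11) gives

$$3k_{21} = \frac{3n^2 \sum x^2 y - 3n\left(2\sum xy \sum x + \sum x^2 \sum y\right) + 6\left(\sum x\right)^2 \sum y}{n(n-1)(n-2)}.$$

Finally, replacing powers by subscripts (writing $s_{rq}$ for $\sum_i x_i^r y_i^q$) and dividing both sides by three, we obtain:

$$k_{21} = \frac{n^2 s_{21} - 2n s_{11} s_{10} - n s_{20} s_{01} + 2 s_{10}^2 s_{01}}{n(n-1)(n-2)} \qquad (12)$$

Trivariate k-statistics are obtained in the same way by operating on (12), which would introduce a further new variable. To automate the procedure we need to apply the operator to each power sum. In REDUCE this operation is programmed in as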

FOR ALL U,R,N,KK LET OP(U,SF(KK*R**N),R) = N*SF(KK*U*R**(N-1));

This line says that, for all instances of the variables U, R, N and KK, whenever the operator OP is applied to the arguments U, SF(KK*R**N) and R, the result is the right-hand side of the = sign. In this example we have effectively built our own differentiation operator. REDUCE does have its own differentiation operator, DF, which we could have used here. We believe that merely producing the trivariate k-statistics is reason enough to use REDUCE, but there are other, more compelling reasons. REDUCE can produce both typesetting instructions and FORTRAN code for the formulae, which makes incorporating them into documents and computer programs easy and error free.
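The operator defined by the REDUCE rule can be mimicked in a few lines of Python. This sketch is our own illustration, not the REDUCE code. A power sum is a tuple of exponents, e.g. `(2, 1)` for $\sum x^2 y$, and a product of power sums is a sorted tuple of such tuples. Each coefficient is a polynomial in $n$ stored as `{power: Fraction}`. Applying the operator to the numerator of $k_3$ and dividing by three reproduces the numerator of $k_{21}$. The function `evaluate` can then be checked against the location-invariant form $k_{21} = n\sum(x-\bar{x})^2(y-\bar{y})/\{(n-1)(n-2)\}$. All names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Numerator of k3: n^2 s_30 - 3 n s_20 s_10 + 2 s_10^3 (denominator n(n-1)(n-2)).
K3_NUMERATOR = {
    ((3, 0),): {2: Fraction(1)},
    ((1, 0), (2, 0)): {1: Fraction(-3)},
    ((1, 0), (1, 0), (1, 0)): {0: Fraction(2)},
}


def polarize(expr, src=0, dst=1):
    """Apply sum_i y_i d/dx_i (x = variable src, y = variable dst): each power
    sum loses one power of src, gains one power of dst and is multiplied by
    the exponent it lost (product rule over the factors of each term)."""
    out = defaultdict(lambda: defaultdict(Fraction))
    for factors, coef in expr.items():
        for idx, f in enumerate(factors):
            e = f[src]
            if e == 0:
                continue
            g = list(f)
            g[src] -= 1
            g[dst] += 1
            new_key = tuple(sorted(factors[:idx] + (tuple(g),) + factors[idx + 1:]))
            for power, c in coef.items():
                out[new_key][power] += e * c
    return {key: {pw: c for pw, c in poly.items() if c != 0}
            for key, poly in out.items() if any(c != 0 for c in poly.values())}


def scale(expr, factor):
    return {key: {pw: c * factor for pw, c in poly.items()} for key, poly in expr.items()}


def evaluate(numerator, xs, ys):
    """Exact value of numerator / (n(n-1)(n-2)) for paired data xs, ys."""
    n = len(xs)

    def s(f):
        return sum(x ** f[0] * y ** f[1] for x, y in zip(xs, ys))

    total = Fraction(0)
    for factors, coef in numerator.items():
        prod = Fraction(1)
        for f in factors:
            prod *= s(f)
        total += sum(c * n ** p for p, c in coef.items()) * prod
    return total / (n * (n - 1) * (n - 2))


k21_numerator = scale(polarize(K3_NUMERATOR), Fraction(1, 3))
```

Applying `polarize` again with a third exponent slot would give trivariate statistics such as $k_{111}$, which mirrors the repeated use of the REDUCE rule.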
3.3 Computing the projection index
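The algorithm that we use is that of Jones (1983), modified for three dimensions. The logic of the algorithm is depicted in Figure 1. The rationale for computing the k-statistics in this way stems from the way in which Kendall and Stuart (1969) represent k-statistics in terms of power sums, and power sums in terms of the data.

Figure 1 here

First the third and fourth order product moment tensors are computed from the sphered data $Y = (y_{mi})$ by

$$T_{ijk} = \sum_{m=1}^{n} y_{mi}\, y_{mj}\, y_{mk} \qquad (13)$$

$$U_{ijkl} = \sum_{m=1}^{n} y_{mi}\, y_{mj}\, y_{mk}\, y_{ml} \qquad (14)$$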
All evaluations of the moments index and its derivatives are made using only $T$ and $U$. The link between the moments index and these tensors is the basis of the moments index’s computational efficiency and is discussed later in this section.
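Next the power sums $s_{rst} = \sum_{m} (a^{T}y_m)^r (b^{T}y_m)^s (c^{T}y_m)^t$ are computed from the current projection, using the product moment tensors $T$ and $U$ of (13) and (14) in formulae such as

$$s_{300} = \sum_{i,j,k} T_{ijk}\, a_i a_j a_k \qquad (15)$$

$$s_{211} = \sum_{i,j,k,l} U_{ijkl}\, a_i a_j b_k c_l \qquad (16)$$

The computation of the projection index requires 10 third order power sums and 15 fourth order sums. A complete list was presented by Nason (1992).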
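The tensor route to the power sums can be sketched as follows. The sketch assumes that $T$ and $U$ are raw, unnormalized sums over observations. It also assumes that a power sum $s_{rst}$ is obtained by contracting $T$ (if $r+s+t=3$) or $U$ (if $r+s+t=4$) with $r$ copies of $a$, $s$ copies of $b$ and $t$ copies of $c$. Once $T$ and $U$ are built, each power sum costs a number of operations that depends on $p$ but not on $n$. This is presumably the efficiency referred to in the text. The sketch checks the contraction against a direct sum over the data and confirms the counts of 10 and 15 distinct power sums. All names are hypothetical.

```python
import itertools
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def product_moment_tensors(Y):
    """Third- and fourth-order product moment tensors, as dicts keyed by index tuples."""
    p = len(Y[0])
    T = {idx: sum(y[idx[0]] * y[idx[1]] * y[idx[2]] for y in Y)
         for idx in itertools.product(range(p), repeat=3)}
    U = {idx: sum(y[idx[0]] * y[idx[1]] * y[idx[2]] * y[idx[3]] for y in Y)
         for idx in itertools.product(range(p), repeat=4)}
    return T, U


def exponent_triples(order):
    """All (r, s, t) with r + s + t == order."""
    return [(r, s, order - r - s) for r in range(order + 1) for s in range(order + 1 - r)]


def power_sum_from_tensors(T, U, rst, a, b, c):
    """Contract T or U with r copies of a, s copies of b and t copies of c."""
    r, s, t = rst
    vectors = [a] * r + [b] * s + [c] * t
    if len(vectors) not in (3, 4):
        raise ValueError("only third and fourth order power sums are supported")
    tensor = T if len(vectors) == 3 else U
    total = 0.0
    for idx, value in tensor.items():
        term = value
        for v, i in zip(vectors, idx):
            term *= v[i]
        total += term
    return total


def power_sum_direct(Y, rst, a, b, c):
    r, s, t = rst
    return sum(dot(a, y) ** r * dot(b, y) ** s * dot(c, y) ** t for y in Y)


def check(seed=1, n=40, p=4):
    """Largest relative discrepancy between the tensor and direct routes."""
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    a, b, c = ([rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(3))
    T, U = product_moment_tensors(Y)
    worst = 0.0
    for order in (3, 4):
        for rst in exponent_triples(order):
            fast = power_sum_from_tensors(T, U, rst, a, b, c)
            slow = power_sum_direct(Y, rst, a, b, c)
            worst = max(worst, abs(fast - slow) / (1.0 + abs(slow)))
    return worst
```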

Finally, after the k-statistics have been computed, the projection index and its derivatives with respect to the projection directions are computed. The derivatives are computed because they supply two useful pieces of information: they tell us how close we are to a local maximum, and they indicate which direction to follow to increase the index.

Optimization
Most optimization methods find local optima rather than the global optimum. This is an advantage, because a local optimum indicates some departure from normality, and the projection solution can be examined quickly for any possible structure. For projection pursuit we believe that any reasonable optimizer is likely to be of use.
Many optimization methods have been used previously, for example: steepest ascent (Jones and Sibson (1987)); genetic algorithms (Crawford (1991)); a hybrid of coarse stepping and Newton's method (Friedman (1987)); and methods based on the grand tour (Posse (1990)).
We use the method of conjugate gradients, as implemented in the NetLib archive.
The projection pursuit optimization problem is constrained: the projection vectors must remain orthonormal. Most of the optimizers mentioned above are designed for unconstrained problems. To allow for the constraint, Friedman (1987) maintained orthogonal projection vectors by modifying the derivative of the projection index so that a step in the modified direction did not violate the orthogonality constraints. We use the cruder method of Jones and Sibson (1987), which reorthogonalizes the projection vectors after each optimization step.
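The "step, then reorthogonalize" strategy can be sketched as below. The sketch restores orthonormality with modified Gram–Schmidt. Whether Jones and Sibson's method uses exactly this orthogonalization is an assumption, and the fixed-length gradient step stands in for the conjugate gradient step used here. The names are hypothetical.

```python
import math


def reorthonormalize(vectors):
    """Gram-Schmidt: orthonormal vectors spanning the same space, in the same order."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm < 1e-12:
            raise ValueError("projection vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis


def ascent_step(vectors, gradients, step):
    """Take an unconstrained step along the gradients, then restore orthonormality."""
    moved = [[vi + step * gi for vi, gi in zip(v, g)]
             for v, g in zip(vectors, gradients)]
    return reorthonormalize(moved)
```

Because the step ignores the constraint, the index at the reorthogonalized vectors can differ slightly from what the raw step would suggest. Friedman's projected-gradient approach avoids this at the cost of a more complicated derivative.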
Computing the derivatives
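To optimize the projection index efficiently we need its derivatives. Given the projection space spanned by $a$, $b$ and $c$, we need the derivatives of the index with respect to each component of $a$, $b$ and $c$. These follow from the derivatives of the power sums via the chain rule. Each power sum involved in the computation must be differentiated with respect to each component of the three projection vectors. For example, the power sum $s_{211} = \sum_{i,j,k,l} U_{ijkl}\, a_i a_j b_k c_l$, where $U$ is the fourth order product moment tensor (14), is a component of the projection index. Since $U$ is symmetric, its derivatives with respect to the projection vectors are:

$$\frac{\partial s_{211}}{\partial a_i} = 2\sum_{j,k,l} U_{ijkl}\, a_j b_k c_l, \qquad \frac{\partial s_{211}}{\partial b_k} = \sum_{i,j,l} U_{ijkl}\, a_i a_j c_l, \qquad \frac{\partial s_{211}}{\partial c_l} = \sum_{i,j,k} U_{ijkl}\, a_i a_j b_k \qquad (18)$$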
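A quick way to check derivative formulae such as (18) is to compare them with finite differences. The sketch below works directly from the data rather than from $U$; the two are algebraically equivalent. It differentiates a general power sum $s_{rst} = \sum_m (a^{T}y_m)^r (b^{T}y_m)^s (c^{T}y_m)^t$ with respect to every component of $a$, $b$ and $c$, and reports the largest relative discrepancy from central differences. The function names and the step size `h` are illustrative choices.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def power_sum(Y, rst, abc):
    """s_rst = sum over observations of (a.y)^r (b.y)^s (c.y)^t."""
    total = 0.0
    for y in Y:
        z = [dot(v, y) for v in abc]
        total += z[0] ** rst[0] * z[1] ** rst[1] * z[2] ** rst[2]
    return total


def power_sum_gradient(Y, rst, abc):
    """Analytic derivatives [ds/da, ds/db, ds/dc], each a list of length p."""
    p = len(Y[0])
    grads = [[0.0] * p for _ in range(3)]
    for y in Y:
        z = [dot(v, y) for v in abc]
        for which in range(3):
            e = rst[which]
            if e == 0:
                continue
            # Chain rule: d/dv (v.y)^e = e (v.y)^(e-1) y; the other factors are constants.
            factor = float(e)
            for other in range(3):
                power = rst[other] - 1 if other == which else rst[other]
                factor *= z[other] ** power
            for i in range(p):
                grads[which][i] += factor * y[i]
    return grads


def max_gradient_error(rst=(2, 1, 1), seed=2, n=30, p=4, h=1e-6):
    """Largest relative gap between analytic and central-difference derivatives."""
    rng = random.Random(seed)
    Y = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    abc = [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(3)]
    analytic = power_sum_gradient(Y, rst, abc)
    worst = 0.0
    for which in range(3):
        for i in range(p):
            up = [list(v) for v in abc]
            down = [list(v) for v in abc]
            up[which][i] += h
            down[which][i] -= h
            numeric = (power_sum(Y, rst, up) - power_sum(Y, rst, down)) / (2 * h)
            gap = abs(numeric - analytic[which][i]) / (1.0 + abs(analytic[which][i]))
            worst = max(worst, gap)
    return worst
```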

cluster. To alleviate the outlying projection problem we either remove outliers or trim them. Trimming shrinks an outlier's distance, $d$, to the centroid of the sphered data to a new, smaller distance, $d^{*}$. We implement two possible choices for $d^{*}$, suggested by Tukey (1987); they are labelled L and S in (20).
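Trimming moves each outlying point radially toward the centroid. The exact rules L and S of (20) are not reproduced here, so the sketch below assumes a hypothetical logarithmic rule, $d^{*} = \ell\{1 + \log(d/\ell)\}$ for $d > \ell$, where $\ell$ is a trimming limit. This rule is continuous at $d = \ell$ and always shrinks the distance, since $1 + \log x < x$ for $x > 1$. Points within the limit are left alone, and each trimmed point keeps its direction from the centroid. The function name is hypothetical.

```python
import math


def trim_log(Y, limit):
    """Pull points farther than `limit` from the centroid in to the
    hypothetical distance limit * (1 + log(d / limit)), keeping direction."""
    n, p = len(Y), len(Y[0])
    centroid = [sum(y[j] for y in Y) / n for j in range(p)]
    trimmed = []
    for y in Y:
        diff = [yj - cj for yj, cj in zip(y, centroid)]
        d = math.sqrt(sum(v * v for v in diff))
        if d <= limit:
            trimmed.append(list(y))
        else:
            new_d = limit * (1.0 + math.log(d / limit))
            trimmed.append([cj + v * new_d / d for cj, v in zip(centroid, diff)])
    return trimmed
```

For sphered data the centroid is the origin, so the sketch reduces to rescaling the long observation vectors.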

find the orthogonal matrix that maximizes the criterion:
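The rotation criterion itself is not reproduced here. As an illustration only, the sketch below assumes the raw varimax criterion of Kaiser (1958): the sum over columns of the variance of the squared loadings. It maximizes this criterion with Kaiser's pairwise-rotation algorithm. For each pair of columns $x$ and $y$, let $u_i = x_i^2 - y_i^2$, $v_i = 2x_iy_i$, $A = \sum u_i$, $B = \sum v_i$, $C = \sum(u_i^2 - v_i^2)$ and $D = 2\sum u_iv_i$. The pair is rotated by the angle $\phi$ with $\tan 4\phi = (D - 2AB/p)/\{C - (A^2 - B^2)/p\}$. Each plane rotation is orthogonal, so the row sums of squares are preserved and the criterion cannot decrease. This may differ from the criterion used in the paper, and the names are hypothetical.

```python
import math


def varimax_criterion(L):
    """Raw varimax: sum over columns of the variance of the squared loadings."""
    p, k = len(L), len(L[0])
    total = 0.0
    for j in range(k):
        sq = [L[i][j] ** 2 for i in range(p)]
        mean = sum(sq) / p
        total += sum(v * v for v in sq) / p - mean ** 2
    return total


def varimax(L, sweeps=50, tol=1e-10):
    """Kaiser's pairwise-rotation algorithm for the raw varimax criterion."""
    L = [list(row) for row in L]
    p, k = len(L), len(L[0])
    for _ in range(sweeps):
        largest = 0.0
        for j1 in range(k - 1):
            for j2 in range(j1 + 1, k):
                u = [L[i][j1] ** 2 - L[i][j2] ** 2 for i in range(p)]
                v = [2.0 * L[i][j1] * L[i][j2] for i in range(p)]
                A, B = sum(u), sum(v)
                C = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                D = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the two-column criterion.
                phi = math.atan2(D - 2.0 * A * B / p, C - (A * A - B * B) / p) / 4.0
                largest = max(largest, abs(phi))
                cs, sn = math.cos(phi), math.sin(phi)
                for i in range(p):
                    x, y = L[i][j1], L[i][j2]
                    L[i][j1] = x * cs + y * sn
                    L[i][j2] = -x * sn + y * cs
        if largest < tol:
            break
    return L
```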

the data projected onto the optimal projection plane (in both sphered and original coordinate systems);
a record of the projection index for each iteration;
the number of iterations expended and the maximum possible;
the sphered data;
the sphering transformation matrix and its inverse;
the modulus of the gradient of the projection index at termination and the tolerance used to decide convergence;
the initial and final projection vectors.
4.2 A tetrahedral example
Although the data set in this example is artificial, it resembles real multivariate data: it has loose clusters representing different populations, and other variables that do not discriminate between populations. For example, the well-known Lubischew (1962) beetle data set is of this type, but its structure is usually clear in one dimension, so it is not challenging enough for three-dimensional pursuit. We create a 6-dimensional data set containing 400 observations, with a tetrahedral structure in the first three dimensions and noise in the remaining three. No single one- or two-dimensional projection gives a complete idea of the true three-dimensional structure. The squares of the elements of the principal components of the tetrahedral data appear in Table 1. The squares are shown to emphasize the contribution of each element; consequently each column sums to 1. Each row of the table corresponds to one of the variables on which the tetrahedral data were originally recorded.

Table 1 here

Even knowing that the clustering is concentrated in the first three variables is no help here. The only PCs that might help in discerning the clusters are 2, 3 and 4, but even these contain some proportion of the noise variables. It is better to examine the data with respect to these three PCs using a three-dimensional data viewer such as brush (in S-PLUS) or XGobi. We obviously cannot show the three-dimensional principal components picture here, but the clustering is very difficult to see without giving each point a group label. Labelling defeats the aim of exploratory methods, where you may not know any structure but are trying to find it.
Three-dimensional projection pursuit does much better. However, for this data set trimming was required to obtain a good solution. It is difficult to know when to trim data and by how much. Generally we take Tukey’s advice from Section 3.3: the sphered data are trimmed if their distance to the centroid is larger than 1. This can sometimes be relaxed, as in this example, where points are trimmed if their distance exceeds 2.4.
The tetrahedral data were put into an S matrix called tetra. Table 2 shows squares of elements of the optimal projection solution obtained after issuing the command

> results <- pp3(tetra, trim.action="log", limit=2.4)

followed by the axes rotation procedure described in Section 3.5.

Table 2 here

Table 2 shows plainly that the three-dimensional pursuit has extracted the tetrahedral structure. For example, the first column in Table 2 has most of its weight on the second original variable, the second column on the first original variable and the third column on the third original variable. Once more, the three-dimensional projection solution cannot be shown properly on the two-dimensional page. Figure 2 shows the solution by displaying two variables on a scatterplot and coding the third as square size. The data in Figure 2 separate into four groups.

Figure 2 here

The largest squares appear in the top left-hand portion and are overlaid with the smallest squares, which look like dots. These are two groups separated in the third dimension. The other two groups are in the top right-hand portion (medium-sized squares) and the lower half of the plot (next smallest squares).
Finally, suppose three-dimensional projection pursuit is applied to the tetrahedral data using the second, third and fourth PCs as a starting projection. The algorithm then converges to the projection pursuit solution shown here. The moments index was initially 8.47 and increased to 9.73. The norm of the gradient was initially 0.81 and then decreased. Some PCs often provide a reasonable starting projection space.
4.3 Analyzing multispectral data
The three-dimensional algorithm and software were developed primarily for multispectral image data. Multispectral image data record the same scene scanned at many different frequencies. All the real images that we use in our examples show Chew Valley Lake in Somerset, UK. They were scanned by a Daedalus AADS1268 thematic mapper from an aeroplane at an altitude of 2500 metres. The Daedalus scanned eleven frequencies, which are listed in Table 3. Each image at each frequency consists of 1254 × 715 pixels (896610 pixels in all), and each pixel value ranges from 0 to 255. The data can be thought of as an image framework, that is, 11 images each of dimension 1254 by 715. Alternatively they can be thought of as a standard multivariate data set with 11 dimensions and 896610 observations. We refer to these two aspects as “image-space” and “variable-space”. Clustering can occur in both spaces. Spatial clusters in image-space (fields, lakes, roads and so on) usually correspond to (parts of) clusters in variable-space. Conversely, clusters in variable-space usually correspond to a collection of spatial features in image-space. For example, several wheat fields will occupy one area in variable-space, but they could be spread as a patchwork across the landscape in image-space.
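The two views of the data are related by a simple reshape. In this sketch the images are held as nested lists of pixel values, and the names are illustrative. Each pixel becomes one observation with one value per band, and any column of the resulting matrix, such as a projection, can be mapped back to an image. For the Chew Valley data this gives 1254 × 715 = 896610 rows and 11 columns.

```python
def image_stack_to_matrix(bands):
    """Image-space to variable-space: one row per pixel (row-major order),
    one column per scanner frequency. All bands must have the same size."""
    rows, cols = len(bands[0]), len(bands[0][0])
    for band in bands:
        if len(band) != rows or any(len(r) != cols for r in band):
            raise ValueError("all bands must have the same dimensions")
    return [[band[i][j] for band in bands] for i in range(rows) for j in range(cols)]


def matrix_to_images(matrix, rows, cols):
    """Variable-space to image-space: turn each column back into an image."""
    k = len(matrix[0])
    return [[[matrix[i * cols + j][band] for j in range(cols)] for i in range(rows)]
            for band in range(k)]
```

A projection computed in variable-space (one value per pixel) can be displayed by passing a one-column matrix to `matrix_to_images`.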
Two of the main objectives for the analysis of multispectral data are:
the visual examination of the images;
classification of pixels into land types.
Visual examination of the images can be carried out in several ways. Each frequency in the image can be viewed separately as a grey-scale image. Alternatively, three images can be combined into a colour image by assigning one scanner frequency to each of the red, green and blue guns of a colour display. These two methods are analogous to examining variables separately (perhaps as a density estimate) or as pairwise scatterplots. Both are simple methods, but their usefulness should not be underestimated.
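Scanner frequencies may be combined in several ways to provide colour images. One is the simple assignment mentioned above, but with $p$ scanner frequencies there are $p(p-1)(p-2)$ possible assignments of frequencies to the three colour guns. With $p = 11$ there are already 990 different assignments. Clearly, with many more scanner frequencies the problem quickly becomes severe.

Table 3 here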
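The count is the number of ordered choices of three distinct frequencies for the red, green and blue guns, $p(p-1)(p-2)$. A short check using `math.perm` (available from Python 3.8):

```python
import math


def colour_assignments(p):
    """Ordered assignments of 3 distinct scanner frequencies to the red, green and blue guns."""
    return math.perm(p, 3)  # equals p * (p - 1) * (p - 2)


print(colour_assignments(11))  # 990
print(colour_assignments(20))  # 6840
```

The count grows roughly as $p^3$, which is why an automatic way of choosing band combinations becomes attractive as the number of frequencies grows.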
One well-known approach to viewing image data displays the data with respect to their PCs. In this guise PCA acts as a dimension reduction technique. Dimension reduction is especially useful here because images from scanner frequencies close in frequency are usually highly correlated. For example, Table 4 displays the correlation between some of the scanner frequencies for a small subsection of the main Chew Valley image.
Clustering in the multidimensional space can and does appear when the data are projected onto their principal components. For viewing purposes, tight clustering in variable-space corresponds to homogeneous colouring of areas of land in image-space. What is required is not only a dimension reduction technique but one that preserves or seeks out clustering in low dimensions. If clustering exists in higher dimensions, we do not want to lose it through dimension reduction, because that would cause a loss of contrast in image-space. The other objective, classifying pixels, is also aided by dimension reduction techniques that search out clusters. Huber (1985) noted how the performance of various classification techniques deteriorated in high dimensions, so good cluster-preserving dimension reduction techniques are necessary.
Large variation is quite often due to separated clusters, but not always, as the tetrahedral example in the previous section showed. We therefore propose three-dimensional projection pursuit as a complement to PCA. We do not reject PCA, which is a useful method that is rapidly computed and widely understood. Finally, we propose the three-dimensional moments index because the image data sets are large and require a computationally efficient index.

An example using the Chew Valley data

To illustrate and compare the methods, a small pixel section of the Chew Valley image is used.

Table 4 here

The image that we have selected is centred on the sailing club on the lake. It includes water, buildings, roads, trees and jetties! (Approximate OS Map reference ST568168.) Colour images cannot be displayed here, but grey-scale images can. In the following example both PCA and three-dimensional projection pursuit are performed on the image section. We perform PCA on the correlation matrix. We could have used the covariance matrix, but we wish to concentrate on clustering and are not really interested in large variance in any particular direction. Performing PCA on the correlation matrix is valid, and indeed recommended, when the individual sample variances differ substantially in order of magnitude (Chatfield and Collins (1980) [Section 4.4]).
For the projection pursuit a slightly more elaborate procedure is adopted. The pursuit produces a three-dimensional data set, and each of its dimensions could be assigned to a colour. Alternatively, the representation could be rotated, for example by varimax, and each variable then assigned to a colour. This would relate colours to the original variables, which may aid interpretation. What we have actually done may be surprising: we apply principal components to the three-dimensional pursuit solution. A successful pursuit solution typically contains well-defined clusters, and its first principal component exhibits the best-defined cluster. With a colour display, one possible rule would assign the first PC of the pursuit solution to red, the second PC to green and the third to blue. This would apply the maximum contrast to the colour that (most) human eyes are most sensitive to (Feynman (1963)). Clearly this is not the only possible assignment, and eye sensitivities vary dramatically from person to person. It is this first principal component (of the pursuit solution) that we display below. We emphasize that this is not the same as the first PC of the data.
Figures 3 and 4 show normal kernel density estimates of the intensities from, respectively, the first principal component of the projection pursuit solution and the first true PC. These are the most multimodal projections among all the standard PCs and components of pursuit solutions. The estimate derived from projection pursuit is more multimodal than the first PC estimate. This supports the claim that projection pursuit can find more interesting projections than PCA. The mono images corresponding to the density estimates of Figures 3 and 4 are shown as the left-hand images in Figures 5 and 6.

Figure 3 here
Figure 4 here
We used the local minima of the density estimates to divide the images into different land-use types. Each of the right-hand images in Figures 5 and 6 is divided into regions defined by the density estimate divisions, and each pixel is shaded grey according to its intensity in the pursuit or PCA solution. For example, the very bright patches in Figure 5 correspond to the mode at the extreme right of the density estimate in Figure 3.

Figure 5 here
Figure 6 here
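Dividing an intensity axis at the local minima of a density estimate can be sketched as follows. This is an illustration under assumptions. It uses a normal kernel with a fixed, hand-chosen bandwidth, because the bandwidth used for Figures 3 and 4 is not given. The density is evaluated on the integer intensity grid 0 to 255. A pixel's class is the number of cut points below its intensity, so class 0 is the darkest. The intensities in the usage lines are made-up values, and all names are hypothetical.

```python
import bisect
import math


def normal_kde(data, grid, bandwidth):
    """Normal kernel density estimate of `data` evaluated at each point of `grid`."""
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in data)
            for g in grid]


def local_minima(grid, density):
    """Interior grid points where the density dips below its left neighbour
    and does not exceed its right neighbour."""
    return [grid[i] for i in range(1, len(grid) - 1)
            if density[i] < density[i - 1] and density[i] <= density[i + 1]]


def classify(values, cuts):
    """Class label = number of cut points strictly below the value."""
    cuts = sorted(cuts)
    return [bisect.bisect_left(cuts, v) for v in values]


# Two well-separated groups of intensities (made-up values for illustration).
pixels = [45 + i for i in range(11)] + [195 + i for i in range(11)]
grid = list(range(256))
cuts = local_minima(grid, normal_kde(pixels, grid, bandwidth=10.0))
labels = classify(pixels, cuts)
```

Here the single cut falls at intensity 125, midway between the two groups. A density with one more mode yields one more cut, and hence one more grey level in the classified image.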
Because the projection pursuit solution has one more mode, we can identify another type of land with a new shade of grey. There are 5 grey shades in the projection pursuit classification and 4 in the PCA picture. The differences between the classifications are most striking on the shore. There, projection pursuit has subdivided the white area of the PCA picture into two groups and shaded them white and light grey (right-hand pictures in Figures 5 and 6). Indeed, no other PC makes this distinction; it is visible only with the pursuit solution. Even more fascinating is what we see on referring back to the left-hand pictures in Figures 5 and 6. The areas differentiated by the extra grey level do seem to correspond to different ground types. One grey level corresponds to a grid-like network aligned with the jetties and the other to material in between. The regularity of the network suggests that it is probably man-made and that projection pursuit has discovered a real feature. However, projection pursuit can only fulfil an exploratory role, and a ground visit would be necessary to confirm that such features are real.
Naturally, other PCs show interesting spectral band combinations that projection pursuit does not find. We claim only that projection pursuit is an extra tool for finding such combinations. The interest here lies in the greater multimodality of the pursuit solution compared with the first (or any) PC. Projection pursuit would therefore be valuable as an automatic band combination and selection tool, because it is tuned for clusters and not just large variation.
5 Conclusions and further work
This article shows the development and application of a three-dimensional projection pursuit package based on a three-dimensional extension of the Jones and Sibson (1987) moments index. Computer algebra greatly reduced the work involved in developing the index, because it allowed any trivariate k-statistic to be computed. We have described how to use the pursuit within the statistical package S, using freely available software (Nason (1994)). We have demonstrated the potential of pursuit on real and simulated data and compared its performance with that of principal components.
Further work will need to investigate the choice of outlier trimming and limit, as these sometimes determine the quality of the projection solutions.
Acknowledgments
The work reported here was supported partly by a grant from the UK Science and Engineering Research Council (SERC). The author was a grateful recipient of a SERC Research Studentship. The multispectral images described in Section 4.3 were supplied by NERC Computer Services, UK. The author is grateful to Robin Sibson for helpful comments and advice, and to Merrilee Hurn and Bernard Silverman for many helpful comments on an earlier version of this article.
References
Becker, R.A., Chambers, J.M. and Wilks, A.R. (1988). The New S Language. Pacific Grove, CA: Wadsworth and Brooks/Cole.
Chatfield, C. and Collins, A.J. (1980). Introduction to Multivariate Analysis. London: Chapman and Hall.
Cook, D., Buja, A. and Cabrera, J. (1993). Projection pursuit indices based on expansions with orthonormal functions. J. Comput. Graph. Statist., 2, 225–250.
Crawford, S.L. (1991). Genetic optimization for exploratory projection pursuit. In Computer Science and Statistics: Proc. 23rd Symp. Interface (ed. E.M. Keramidas), pp. 318–321. Fairfax Station, VA: Interface Foundation.
Feynman, R.P. (1963). The Feynman Lectures on Physics, Vol. 1. Reading, Mass.: Addison-Wesley.
Friedman, J.H. (1987). Exploratory projection pursuit. J. Am. Statist. Ass., 82, 249–266.
Friedman, J.H. and Tukey, J.W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput., 23, 881–890.
Hall, P. (1989). On polynomial-based projection indices for exploratory projection pursuit. Ann. Statist., 17, 589–605.
Huber, P.J. (1985). Projection pursuit (with discussion). Ann. Statist., 13, 435–525.
Jones, M.C. (1983). The Projection Pursuit Algorithm for Exploratory Data Analysis. PhD Thesis, University of Bath.
Jones, M.C. and Sibson, R. (1987). What is projection pursuit? (with discussion). J. R. Statist. Soc. A, 150, 1–36.
Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Kendall, M.G. and Stuart, A. (1969). The Advanced Theory of Statistics, 3rd edn, Vol. 1. London: Griffin.
Lubischew, A.A. (1962). On the use of discriminant functions in taxonomy. Biometrics, 18, 455–477.
Mardia, K.V. (1987). Discussion of the paper by Dr Jones and Professor Sibson. J. R. Statist. Soc. A, 150, 22.
Morton, S.C. (1989). Interpretable Projection Pursuit. Technical Report 106, Department of Statistics, Stanford University, Stanford, California.
Nason, G.P. (1992). Design and Choice of Projection Indices. PhD Thesis, University of Bath.
Nason, G.P. (1994). PP3: Three-dimensional projection pursuit in S. Available via anonymous FTP from ftp.stats.bris.ac.uk in the directory /pub/software/pp3/ as the file pp3.shar.gz.
Posse, C. (1990). An effective two-dimensional projection pursuit algorithm. Comm. Statist. Simul. Comput., 19, 1143–1164.
Swayne, D.F. and Cook, D. (1990). XGobi. Available from the StatLib archive. Anonymous FTP from lib.stat.cmu.edu.
Swayne, D.F., Cook, D. and Buja, A. (1991). User’s Manual for XGobi, a dynamic graphics program for Data Analysis Implemented in the X Window System (Release 2). Available from the StatLib archive. Anonymous FTP from lib.stat.cmu.edu.
Tukey, J.W. (1987). Discussion of the paper by Dr Jones and Professor Sibson. J. R. Statist. Soc. A, 150, 33.
Tukey, P.A. and Tukey, J.W. (1981). Preparation; prechosen sequences of views. In Interpreting Multivariate Data (ed. V. Barnett), pp. 189–213. Chichester: Wiley.

List of Figures
1 The projection pursuit algorithm
2 Projection pursuit solution from tetrahedral data. The data with respect to the third projection direction is coded as the size of each square.
3 Kernel density estimate of projection pursuit solution (1st PC) of sailing club image
4 Kernel density estimate of 1st PC of sailing club image
5 Projection pursuit solution, first PC (left), classification from density estimate (right)
6 Real first PC (left), classification from density estimate (right)

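Table 1: Squares of elements of principal components of tetrahedral data
Rows: original variables; columns: principal component number 1 to 6
PC:   1      2      3      4      5      6
      0.020  0.518  0.119  0.241  0.008  0.093
      0.664  0.013  0.003  0.073  0.086  0.162
      0.064  0.334  0.001  0.097  0.148  0.357

Table 2: Squares of elements of the optimal projection solution for the tetrahedral data
Rows: original variables (1, 2, 3 and the rest); columns: projection vectors 1 to 3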
Table 3: Spectral frequencies sensed by NERC Daedalus thematic mapper
Columns: channel (1 to 11), wavelength (μm), colour band (violet, blue, green, yellow, orange, red, near IR, thermal IR)

Table 4: Correlations between some of the scanner frequencies (channels 2, 4, 6, 9 and 11) for a small subsection of the Chew Valley image

Figure 1: The projection pursuit algorithm. (Flowchart: data X → sphered data Y → product moment tensors T, U; together with the initial projection directions (a, b, c) → power sums s → k-statistics k → projection index and derivatives → optimality? If no, modify the projection directions (a, b, c) and recompute the power sums; if yes, projection solution.)

Figure 2: Projection pursuit solution from tetrahedral data. The data with respect to the third projection direction is coded as the size of each square. (Plot axes: Axis 2 and Axis 3.)

Figure 3: Kernel density estimate of projection pursuit solution (1st PC) of sailing club image. (Panel: "Projection pursuit: first PC"; horizontal axis 0 to 250; vertical axis: density estimate.)

Figure 4: Kernel density estimate of 1st PC of sailing club image. (Panel: "First PC"; horizontal axis 0 to 250; vertical axis: density estimate.)

Figure 5: Projection pursuit solution, first PC (left), classification from density estimate (right)
Figure 6: Real first PC (left), classification from density estimate (right)
