Quantitative Evaluation of Machine Translation Systems: Sentence Level
Palmira Marrafa¹ and António Ribeiro²
¹ Universidade de Lisboa, Faculdade de Letras
Group of Lexical and Grammatical Knowledge Computation (CLUL)
Avenida 5 de Outubro, 85 – 5º
P–1050–050 Lisboa, Portugal
Palmira.Marrafa@netcabo.pt
² Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia, Departamento de Informática
Quinta da Torre, Monte da Caparica
P–2829–516 Caparica, Portugal
ambar@di.fct.unl.pt
Abstract
This paper reports the first results of ongoing research on the evaluation of Machine Translation quality. The starting point for this work was the framework of ISLE (the International Standards for Language Engineering), which provides a classification for evaluation of Machine Translation. In order to make a quantitative evaluation of translation quality, we pursue a more consistent, fine-grained and comprehensive classification of possible translation errors and we propose metrics for sentence level errors, specifically lexical and syntactic errors.
Keywords
Machine Translation evaluation, translation quality metrics
Introduction
Much work has been done on evaluation of Machine Translation in the last ten years (see, for example, Balkan, 1991; Arnold et al., 1993; Vasconcellos, 1994; White et al., 1994; EAGLES, 1996; White and O’Connell, 1996; White, forthcoming). A common goal has been the design of evaluation techniques in order to reach a more objective evaluation of the quality of Machine Translation systems.
However, the evaluation of Machine Translation has been subjective to a great extent. ISLE (the International Standards for Language Engineering) aims at reducing subjectivity in this domain. It provides a classification of internal and external characteristics of Machine Translation systems to be evaluated in conformity with the ISO/IEC 9126 standard (ISO 1991), which concerns quality characteristics of software products. It assumes the need for a quantitative evaluation, leading to the definition of metrics.
However, that classification is not fine-grained enough to evaluate the quality of machine translated texts regarding the possible types of translation errors. Thus, in this work, we propose a more consistent, fine-grained and comprehensive classification at the individual sentence level. Our classification takes into account the internal structure of lexical units and syntactic constituents. Moreover, we propose metrics to make an objective quantitative evaluation. These metrics are based on the number of errors found and the total number of possible errors. The structural complexity of the possible errors is also considered in the metrics.
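As an informal illustration of the idea of relating errors found to possible errors, the following sketch computes a weighted ratio for a single sentence. It is only a sketch under stated assumptions and not the metric defined later in this paper: the names `ErrorType` and `weighted_error_ratio`, the use of a numeric `weight` as a stand-in for structural complexity, and all counts and weights in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorType:
    """One category of possible error (hypothetical structure, not from the paper)."""
    name: str
    weight: float  # assumed stand-in for the structural complexity of the error
    possible: int  # number of places in the sentence where this error could occur
    found: int     # number of errors of this type actually found


def weighted_error_ratio(error_types):
    """Return weighted errors found divided by weighted possible errors."""
    total_possible = sum(e.weight * e.possible for e in error_types)
    if total_possible == 0:
        # No possible errors were counted, so no error can be reported.
        return 0.0
    total_found = sum(e.weight * e.found for e in error_types)
    return total_found / total_possible


# Illustrative counts for one translated sentence (invented values).
sentence_errors = [
    ErrorType("lexical", weight=1.0, possible=8, found=2),
    ErrorType("syntactic", weight=2.0, possible=3, found=1),
]
score = weighted_error_ratio(sentence_errors)
```

Under these assumptions a score of 0 means that no errors were found and a score of 1 means that every possible error occurred; how the weights should actually reflect structural complexity is left open here.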
We selected some pertinent characteristics from the ISLE classification to measure the quality of sentence level translations, concerning lexical and syntactic errors, including collocations, fixed and semi-fixed expressions for lexical evaluation. As for syntactic errors, we built a typology of errors.
Our methodology was motivated by English, French and Portuguese parallel texts from the European Parliament sessions and also by translations obtained from two commercial Machine Translation systems.
In the next section, we present a motivation for the refinement of the taxonomy with some examples. After that, we summarise the classification and define the metrics used for the evaluation. In the following section, we discuss some previous