Research & summaries project; I need support to help me learn.
I have a template and I need to summarize a 6-page article into 2 pages, with a small table, in Word.
Requirements: .doc file
DigiFace-1M: 1 Million Digital Face Images for Face Recognition

Gwangbin Bae, University of Cambridge, gb585@cam.ac.uk
Martin de La Gorce, Microsoft, madelago@microsoft.com
Tadas Baltrušaitis, Microsoft, tabaltru@microsoft.com
Charlie Hewitt, Microsoft, chewitt@microsoft.com
Dong Chen, Microsoft, doch@microsoft.com
Julien Valentin, Microsoft, juvalen@microsoft.com
Roberto Cipolla, University of Cambridge, rc10001@cam.ac.uk
Jingjing Shen, Microsoft, jinshen@microsoft.com

Abstract

State-of-the-art face recognition models show impressive accuracy, achieving over 99.8% on Labeled Faces in the Wild (LFW) dataset. Such models are trained on large-scale datasets that contain millions of real human face images collected from the internet. Web-crawled face images are severely biased (in terms of race, lighting, make-up, etc) and often contain label noise. More importantly, the face images are collected without explicit consent, raising ethical concerns. To avoid such problems, we introduce a large-scale synthetic dataset for face recognition, obtained by rendering digital faces using a computer graphics pipeline¹. We first demonstrate that aggressive data augmentation can significantly reduce the synthetic-to-real domain gap. Having full control over the rendering pipeline, we also study how each attribute (e.g., variation in facial pose, accessories and textures) affects the accuracy. Compared to SynFace, a recent method trained on GAN-generated synthetic faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). By fine-tuning the network on a smaller number of real face images that could reasonably be obtained with consent, we achieve accuracy that is comparable to the methods trained on millions of real face images.

¹ DigiFace-1M dataset can be downloaded from https://github.com/microsoft/DigiFace1M

1. Introduction

Learning-based face recognition models [29, 23, 33, 35, 8, 15, 24, 18] use Deep Neural Networks (DNNs) to encode the given face image into an embedding vector of fixed dimension (e.g., 512). These embeddings can then be used for various tasks, such as face identification (who is this person) and verification (are they the same person). To learn diverse, discriminative embeddings, the training dataset should contain a large number of unique identities. To learn robust embeddings, i.e., which are not sensitive to the changes in pose, expression, accessories, camera and lighting, the dataset should also contain a sufficient number of images per identity with these variations.

Publicly available face recognition datasets satisfy both. MS1MV2 [8] contains 5.8M images of 85K identities (approx. 68 images per ID). Recently released WebFace260M [43] contains 260M images of 4M identities (approx. 65 images per ID). While such datasets have driven recent advances in face recognition models, there are several problems associated with them.

(1) Ethical issues. Large-scale face recognition datasets are often criticized for ethical issues including privacy violation and the lack of informed consent. For example, datasets like [39, 12, 8, 43] are obtained by crawling web images of celebrities without consent. To increase the number of identities, some datasets exploited the term “celebrities” to include anyone with online presence. Datasets like [17, 26] collected face images of the general public (including children) from Flickr [3]. Projects like MegaPixels [4] are exposing the ethical problems of such web-crawled face recognition datasets. Following severe criticism, public access to several datasets has been removed [2].

(2) Label noise. Web images collected by searching the names of celebrities often contain label errors. For example, the Labeled Faces in the Wild (LFW) dataset [14] contains several known errors including: (1) mislabeled images; (2) distinct persons with the same name labeled as the same person; and (3) the same person that goes by different names labeled as different persons.

(3) Data bias. Face recognition models are generally trained and tested on celebrity faces, many of which are taken with strong lighting and make-up. Celebrity faces also have imbalanced racial distribution (e.g., 84.5% of the faces in CASIA-WebFace [39] are Caucasian faces [34]), leading to poor recognition accuracy for the under-represented racial groups [34].

Figure 1. Examples of synthetic face images in our dataset. Our dataset captures a wide variety of facial geometry, pose, textures, expressions, accessories and environments.

In order to circumvent all these issues that affect the existing real face datasets, we introduce a new large-scale face recognition dataset consisting only of photo-realistic digital face images rendered using a computer graphics pipeline and make this dataset available to the community. Specifically, we build upon the face generation pipeline introduced by Wood et al. [36], tailoring the amount of variability for each attribute (e.g., pose and accessories) for our recognition task, and generate 1.22M images with 110K unique identities. Each identity is generated by randomizing the facial geometry and texture as well as the hairstyle. The generated face is then rendered with different poses, expressions, hair color, hair thickness and density, accessories (including clothes, make-ups, glasses, and head/face wear), cameras and environments, to encourage the network to learn a robust embedding. Figure 1 shows examples of synthetic face images in this new dataset. We generated 1.22M images, but in practice the number of identities and images you can generate with synthetics pipeline is only limited by the cost of generating and storing these images.

Digital synthetic faces can solve the aforementioned problems associated with the real face datasets. Firstly, the generated faces are free of label noise. Secondly, the bias in lighting, make-up and skin color can be reduced as we have full control over those attributes. Most importantly, the face generation pipeline does not rely on any privacy-sensitive data obtained without consent.

This is a critical difference from the GAN-generated synthetic faces; face GANs rely (either directly or indirectly) on large-scale real face datasets to train some components of their pipeline, leaving unresolved ethical problems. For example, a recent method called SynFace [28] was trained on synthetic faces generated using DiscoFaceGAN [9]. While the generated face images are free of label noise, millions of real face images were used for training DiscoFaceGAN. The GANs may also inherit any bias that exists in the real face images used to train them. For our dataset, only 511 face scans, obtained with consent, were used to build a parametric model of face geometry and texture library [36]. From this limited source data, we can generate infinite number of identities, making our approach easily scalable.

Our contributions can be summarized as below:

• We release a new large-scale synthetic dataset for face recognition that is free from privacy violations and lack of consent. To the best of our knowledge, our dataset, containing 1.22M images of 110K identities, is the largest public synthetic dataset for face recognition.

• Compared to SynFace [28], which is trained on GAN-generated faces, we reduce the error rate on LFW by 52.5% (accuracy from 91.93% to 96.17%). For five popular benchmarks [14, 30, 41, 25, 42], the average error rate is reduced by 46.0% (accuracy from 74.75% to 86.37%).

• We demonstrate how the proposed synthetic dataset can be used in conjunction with a small number of real face images to substantially improve the accuracy. This simulates a scenario where a small number of curated (i.e., no label noise and reduced bias) real face images are collected with consent. By fine-tuning our network with only 120K real face images (i.e., 2% of the commonly-used MS1MV2 dataset [8]), we achieve 99.33% accuracy on LFW and 93.61% on average across the five benchmarks, which is comparable to the methods trained on millions of real face images.

• Having full control over the rendering pipeline, we perform extensive experiments to study how each attribute (e.g., variation in facial pose, accessories and textures) affects the face recognition accuracy.
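
The relative error-rate reductions quoted in the contributions can be recomputed from the accuracies alone, since the error rate is simply 100% minus the accuracy. The following is a minimal sketch of that arithmetic; the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative reduction in error rate (percent), given accuracies in percent."""
    err_before = 100.0 - acc_before  # e.g. 100 - 91.93 = 8.07
    err_after = 100.0 - acc_after    # e.g. 100 - 96.17 = 3.83
    return 100.0 * (err_before - err_after) / err_before


# Accuracies quoted in the text: SynFace vs. the proposed dataset.
lfw = relative_error_reduction(91.93, 96.17)   # LFW
avg = relative_error_reduction(74.75, 86.37)   # average over five benchmarks

print(f"LFW error reduction: {lfw:.1f}%")      # prints 52.5%
print(f"Average error reduction: {avg:.1f}%")  # prints 46.0%
```

Note that these are relative reductions in error, not absolute gains in accuracy: the LFW accuracy rises by 4.24 percentage points, which removes a little over half of the original 8.07% error.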
2. Related Work

Face recognition datasets with real face images. Major tech companies can utilize private data to train their face recognition models. Google used 100M-200M images of 8M identities to train FaceNet [29], and Facebook used 500M images of 10M identities [31]. It is challenging to construct datasets of comparable size using face images that are publicly available. Public datasets generally rely on celebrity images [14, 39, 12, 43] or web images that are posted with Creative Commons license [17, 26]. As discussed in section 1, such datasets have ethical issues and suffer from label noise and data bias.

Synthetic faces generated using deep generative models. Deep generative models such as GANs [11] can produce photo-realistic images and have been used to generate synthetic data to train face recognition [32, 28]. While traditional generators (e.g., [16]) generate a face image from a single latent vector that changes both the identity and its appearance, DiscoFaceGAN [9] learned disentangled latent representations for identity, pose, expression and illumination. SynFace [28] used DiscoFaceGAN to generate a synthetic dataset for face recognition, consisting of 10K identities and 500K images. SynFace achieved 91.93% accuracy on LFW dataset [14], and by mixing the synthetic dataset with 2K real identities (20 images each), the accuracy was pushed up to 97.23%. However, their performance is poor for large-pose-variation datasets (e.g., 75.03% on CFP-FP [30] and 70.43% on CPLFW [41]). This is mainly because it is challenging to train a 2D GAN to produce images that preserve 3D geometric consistency [10].

Synthetic faces generated using 3D parametric models. Classical 3D parametric face models such as morphable models [5] explicitly model the identity independently from other parameters which makes them well suited for generating face recognition datasets. However, previous results obtained with this kind of synthetic images have shown limited performance [20, 19] unless combined with a large number of real images. This can be due to the lack of realism and variability in the models that have been used to generate the faces. Wood et al. [36] introduced a pipeline for generating and rendering diverse and photo-realistic 3D face models. A generative face model, learned from the 3D scans of 511 individuals, is used to generate a random 3D face. The face is then combined with artist-created assets (e.g., texture, hair, accessories) and is rendered under a random environment (simulated with HDRIs - high dynamic range images). The rendered synthetic face images (and the corresponding auto-generated ground truth annotations) were used to learn various face analysis tasks such as face parsing [36], landmark localization [36, 37] and face reconstruction [37], demonstrating state-of-the-art performance. In this paper, we aim to demonstrate that such photo-realistic rendered synthetic faces can be used to tackle face recognition.

Figure 2. Each row shows the same identity rendered with different accessory setups. Accessories include clothes, glasses, make-up (e.g., eye shadow and eyeliner), face-wear and head-wear. The color, density and thickness of facial and head hair are also randomized. The hairstyle is modified only when the sampled accessory conflicts with the original hairstyle.

Figure 3. Randomizing the hairstyle makes the problem unnecessarily difficult (see the bottom row), as most people maintain similar hairstyles. Therefore, we only randomize the color, density and thickness of the hair as shown in the top row (the hair is also randomly flipped horizontally).

3. Digital Faces for Face Recognition

This section explains how the proposed dataset is generated. We first explain how digital faces are controlled, rendered and aligned to create the dataset (subsection 3.1). After providing the dataset statistics (subsection 3.2), we introduce the data augmentation details which help in minimizing the synthetic-to-real domain-gap (subsection 3.3).

3.1. Face Rendering

We build upon the face generation and rendering pipeline introduced by Wood et al. [36]. In this section, we explain the modifications we made to the original pipeline to create a large-scale dataset for face recognition.

We define identity as a unique combination of facial geometry, texture (albedo and displacement), eye color and hairstyle. For each identity, we render a number of images where all other parameters are varied to encourage the network to learn robust embeddings. While hairstyle can change for an individual, most people maintain similar hairstyle (for both facial and head hair) which makes hairstyle an important cue for the person’s identity. Consequently, for the same identity, we randomize only the color, density and thickness of the hair (see Figure 3 for examples), and the hairstyle is only changed when the added head-wear is not compatible with the original hairstyle to avoid intersection (e.g., third image of top row in Figure 2). For sampling facial geometry, texture and eye color we follow [36].

Figure 4. Examples of images rendered for the same identity and accessory setup. The same face can look very different depending on the pose, expression, environment (lighting and background) and camera, encouraging the network to learn robust embedding.

For a given identity, we sample different accessories including clothing, make-up, glasses, face-wear (e.g., face masks) and head-wear (e.g., hats). After selecting the clothing randomly from the digital wardrobe, other accessories are added with probability $p = \{0.15, 0.15, 0.01, 0.15\}$ respectively. We also add hands and secondary faces with a small probability ($p = 0.01$) to simulate the case when (1) the face is occluded by hands and when (2) there are multiple faces in the image. Figure 2 shows examples of the sampled identities rendered with different sets of accessories.

For each accessory setup, we vary the pose, expression, camera and environment (lighting and background) to render multiple images. The camera is rotated around the face, both horizontally and vertically. Horizontal angle is sampled from a truncated zero-mean normal distribution with support $\theta_{\mathrm{hori}} \in [-90^\circ, 90^\circ]$. The variance is set such that the probability density $p(\theta_{\mathrm{hori}} = 90^\circ)$ equals to $10^{-3} \times p(\theta_{\mathrm{hori}} = 0^\circ)$. Vertical angle is sampled from a similar truncated normal distribution with support $\theta_{\mathrm{vert}} \in [-30^\circ, 30^\circ]$ and $p(\theta_{\mathrm{vert}} = 30^\circ) = 10^{-3} \times p(\theta_{\mathrm{vert}} = 0^\circ)$. This allows us to render a wide range of poses while making sure that frontal views are rendered more often. Lastly, the face is randomly translated within the viewing frustum to add additional perspective distortion. For pose, expression, and environment sampling, we follow [36]. Figure 4 shows the impact of varying the pose, expression, environment and camera for the same identity and accessory setup.

Face alignment. The input to the face embedding network should be an aligned crop around the face. Instead of detecting facial landmarks using pre-trained DNNs (such as MTCNN [40] and RetinaFace [7]), we align the faces using the ground truth landmarks (see Figure 5), which enable robust alignment even when some landmarks are not visible.

Figure 5. For synthetic faces, it is trivial to extract the locations of ground-truth facial landmarks (e.g., eyes, nose-tip and mouth corners) and align the crop around the face. This enables robust face alignment, even when some of the landmarks are not visible.

Limitations. The face generation pipeline [36] we build upon has a number of limitations resulting in domain-gap to real face images. Particularly relevant to face recognition is that we cannot generate the same person at different ages. While we simulate aging to some extent by randomizing the color, density and thickness of the hair (as hair typically becomes grayer, sparser and thinner during aging), more work should be done to faithfully simulate aging. Lack of coverage (e.g., no jewelry and tattoos) may also mean that the distribution of the synthetic data does not match reality.

3.2. Dataset Statistics

The proposed dataset consists of two parts. The first part contains 720K images with 10K identities. For each identity, 4 different sets of accessories are sampled and 18 images are rendered for each set (i.e., 72 images-per-identity). Since many views of the same face are available, the network can learn embedding that is robust to the changes in accessories, camera, pose, expression, and environment. The second part contains 500K images with 100K identities. For each identity, only one set of accessories is sampled and only 5 images are rendered. This part was added to substantially increase the total number of identities with small rendering cost. Ensuring sufficient number of identities is important since the network should learn to distinguish between similar-looking faces of different identities. We show in the experiments that mixing the two parts leads to better accuracy than using one of them (Table 3).

3.3. Data Augmentation

The quality of in-the-wild face images can vary significantly. Certain parts of the face may be occluded, and the images are subject to distortion and noise that are specific to each camera. As our synthetic faces are rendered with controlled quality using a perfect pinhole camera, aggressive data augmentation is needed to reduce the synthetic-to-real domain-gap. We first apply random horizontal flipping and cropping, following [18]. Then, we apply two sets of augmentations - appearance and warping. Figure 6 shows training images with these augmentations. Note that we apply the data augmentation on-the-fly during training, i.e., each epoch sees different random augmentations. For each type of augmentation, we indicate its probability $p$ to be applied on a sample image.

Figure 6. Synthetic face images at different stages of data augmentation. Aggressive augmentation helps to simulate effects such as motion blur and distortion common in real-world images and thus improve the robustness of DNNs trained on synthetic images.

Appearance augmentation. We apply random Gaussian blur ($p = 0.05$) and Gaussian noise ($p = 0.035$). By applying the Gaussian blur along a random direction using an anisotropic covariance, we also simulate motion blur ($p = 0.05$). Brightness, contrast, hue and saturation are randomized with $p = \{0.15, 0.3, 0.1, 0.1\}$. Images are converted into grayscale with $p = 0.01$. Lastly, the image quality is randomized by downsampling-and-upsampling ($p = 0.01$) and JPEG compression ($p = 0.05$).

Warping augmentation. Warping is performed by randomly shifting the four corners of the image. Firstly, the aspect ratio is randomized with $p = 0.1$. Then, all images undergo random scaling, rotation and shift. Lastly, the four corners are shifted differently for additional distortion.

4. Experimental Setup

Implementation details. Synthetic faces are rendered using Cycles renderer [1], with 256 samples per pixel. The rendering of the full dataset took approximately 10 days, using 300 NVIDIA M60 GPUs. The images are rendered at 256×256 resolution, and the aligned crop around the face is resized into 112×112. We use ResNet-50 [13] backbone for the experiments in subsection 5.1, 5.2 and 5.3. For comparison against the state-of-the-art methods in subsection 5.4, we use their encoder architecture to ensure fair comparison. For all experiments, the networks are implemented with PyTorch [27] and are trained for 40 epochs using SGD. The batch size is set to 256 and the networks are trained on four NVIDIA P100 GPUs. We follow the learning rate scheduling of [28], and use the training loss from [18]. Note that all networks are trained from scratch (not pre-trained on, e.g., ImageNet [6]), to make sure that no real images are used.

Evaluation protocol. Following state-of-the-art methods [15, 21, 24, 22, 18], we report the face verification accuracy on five benchmark datasets - LFW [14], CFP-FP [30], CPLFW [41], AgeDB [25] and CALFW [42]. LFW contains 6,000 pairs of in-the-wild face images. CFP-FP and CPLFW have larger pose variation (CFP-FP specifically compares frontal views to profile views). AgeDB and CALFW have larger age variation.

5. Experiments

We run a series of experiments to demonstrate the usefulness of the proposed dataset. Subsection 5.1 compares different data augmentations. In subsection 5.2, we train the network on various different subsets of the full dataset to understand how each attribute sampling in rendering affects the accuracy. In subsection 5.3, we show that our synthetic faces can be used in conjunction with a small number of real faces to substantially improve the accuracy. Lastly, we provide comparison against the state-of-the-art methods in subsection 5.4.

5.1. Data Augmentation

In subsection 3.3, we introduced appearance and warping augmentations. As shown in Table 1, both lead to significant improvement across all datasets. We also compare against the augmentation used by AdaFace [18], which includes horizontal flipping, cropping and mild color augmentation. For our synthetic face images which are free of imperfection, more aggressive data augmentation is needed to reduce the domain-gap. Notice that the warping augmentation improves the performance especially for the large-pose-variation datasets (CFP-FP and CPLFW).

Experiment | Method | LFW | CFP-FP | CPLFW | AgeDB | CALFW | Avg
Data augmentation | No augmentation | 88.07 | 70.99 | 66.73 | 60.92 | 69.23 | 71.19
Data augmentation | Augmentation from AdaFace [18] | 90.12 | 76.41 | 71.33 | 67.17 | 74.13 | 75.83
Data augmentation | Ours (appearance) | 94.32 | 80.00 | 74.83 | 75.82 | 76.92 | 80.38
Data augmentation | Ours (appearance + warping) | 94.55 | 84.86 | 77.08 | 76.97 | 77.20 | 82.13

Table 1. The proposed aggressive data augmentation significantly improves the accuracy across all datasets.

5.2. Dataset Composition

Having full control over the rendering pipeline, we can create a dataset with desired statistics to study how each attribute affects the face recognition accuracy. The results are provided in Table 2.

Experiment | Method | LFW | CFP-FP | CPLFW | AgeDB | CALFW | Avg
Accessory sampling | Fix accessory | 93.50 | 82.16 | 75.75 | 73.05 | 73.83 | 79.66
Accessory sampling | Randomize accessory | 94.23 | 82.04 | 75.18 | 76.43 | 77.22 | 81.02
Pose sampling | Minimize horizontal angle | 93.42 | 67.19 | 66.48 | 76.78 | 77.22 | 76.22
Pose sampling | Minimize vertical angle | 93.67 | 81.13 | 74.57 | 76.57 | 76.68 | 80.52
Pose sampling | Random pose | 94.23 | 82.04 | 75.18 | 76.43 | 77.22 | 81.02
Texture sampling (# textures to select from) | 50 | 89.63 | 75.04 | 69.72 | 69.47 | 70.10 | 74.79
Texture sampling (# textures to select from) | 100 | 90.83 | 74.84 | 70.30 | 70.62 | 70.57 | 75.43
Texture sampling (# textures to select from) | 150 | 90.03 | 73.01 | 69.63 | 71.48 | 70.27 | 74.89
Texture sampling (# textures to select from) | 200 | 89.82 | 73.37 | 69.37 | 71.45 | 70.50 | 74.90

Table 2. Dataset composition experiments to study how the sampling of each attribute affects the accuracy.

Accessory sampling. For 10K synthetic identities, we sampled 4 accessory setups and rendered 18 images for each setup (i.e., 720K images in total). These 18 images have variations in pose, expression, camera, and environment (see Figure 4). From this, we can create a subset of 180K images by selecting 18 images per ID with fixed accessory. Similarly, we can select 18 images randomly so that images with different accessories are used during training. When randomizing the accessories, we also randomized the color, thickness and density of the hair to simulate aging (Figure 3). As a result, the accuracy is improved especially for the large-age-variation datasets (AgeDB and CALFW). For CFP-FP and CPLFW, which has smaller age gap (i.e., positive pairs capture the identity at similar age), fixing the accessory and hair leads to slightly better accuracy.

Pose sampling. Similar to the accessory sampling, we can select 18 images for each of the 10K identities by selecting the ones with the smallest horizontal/vertical angles. Then, we can compare them against the 18 images selected randomly. For the randomly selected images, the standard deviation in horizontal and vertical angles were $(\sigma_{\mathrm{hori}}, \sigma_{\mathrm{vert}}) = (24.13^\circ, 9.20^\circ)$. For the images with the smallest horizontal/vertical angles, they were $(4.71^\circ, 8.06^\circ)$ and $(22.02^\circ, 1.72^\circ)$ respectively. As shown in Row 3-5 in Table 2, increasing the variation in horizontal and vertical angles improved the accuracy especially for the large-pose-variation datasets (CFP-FP and CPLFW). For AgeDB and CALFW, which consists mainly of frontal faces, the accuracy was similar.

Texture sampling. While we can create infinite number of unique facial geometries, the texture is sampled from a library built from 208 scans of real human faces (obtained with consent). Since we generated 110K identities in total, many of them share the same texture. To see how the number of textures affects the accuracy, we created a dataset of 1200 identities with N textures, by generating 1200/N identities for each texture. As shown in Row 6-9 of Table 2, increasing the number of textures did not lead to a meaningful improvement in the accuracy. This is contrary to the intuition that small number of textures and lack of texture generative model are limitations of synthetic data for face recognition. We believe that the appearance variability is a combination of geometry, texture, hair, accessories, environment and image quality. In Figure 7, we show that (1) the texture library already covers diverse skin color and age, (2) an arbitrary number of unique identities can be generated with the same texture, and (3) skin appearance is greatly affected by the environment. Also, the image quality for face recognition task is in general limited due to low resolution and data augmentation. Thus, the contribution of texture variation is likely less important than that of geometry and environment.

Figure 7. Left: 40 textures selected randomly from the texture library. The library covers diverse skin color and age. Right top row: various identities (facial geometry and hairstyle) sampled with the same texture. Right bottom row: same identity with the same texture under different environments (taken from [36]). With large variations in geometry, hairstyle and environments, rich appearance variation could be achieved with limited textures.

Balance between # IDs and # images/ID. Ensuring large number of IDs is important for learning diverse discriminative embedding. On the other hand, large number of images per ID (referred to as images/ID) is needed for learning robust embedding (that is not affected by the changes in pose, accessories, expressions, camera and environment). Mixing the two datasets with different number of images/ID can be considered as an efficient way of getting the best of both. This also simulates the long-tailed distribution of real face datasets (i.e., most identities have small number of images). The result in Table 3 shows that mixing the two datasets leads to better accuracy than using one of them.
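
The camera-angle sampling described in subsection 3.1 specifies the variance only indirectly, through the density ratio between the edge of the support and the frontal view. For a zero-mean normal, $p(\theta = L) / p(\theta = 0) = \exp(-L^2 / (2\sigma^2))$, so setting this ratio to $10^{-3}$ gives $\sigma = L / \sqrt{2 \ln 1000}$. The sketch below works this out and draws angles by rejection sampling. It is an illustration under stated assumptions, not the authors' implementation: the paper does not say how the truncated distribution is sampled, and the function names are hypothetical.

```python
import math
import random


def sigma_from_density_ratio(limit_deg: float, ratio: float = 1e-3) -> float:
    """Sigma of the untruncated normal such that p(limit) = ratio * p(0).

    exp(-limit^2 / (2 sigma^2)) = ratio  =>  sigma = limit / sqrt(2 ln(1/ratio))
    """
    return limit_deg / math.sqrt(2.0 * math.log(1.0 / ratio))


def sample_truncated_normal(limit_deg: float, rng: random.Random,
                            ratio: float = 1e-3) -> float:
    """Draw one angle from a zero-mean normal truncated to [-limit, limit].

    Rejection sampling is an assumption made for this sketch; redraws are
    rare because the support spans roughly +/- 3.7 sigma.
    """
    sigma = sigma_from_density_ratio(limit_deg, ratio)
    while True:
        angle = rng.gauss(0.0, sigma)
        if -limit_deg <= angle <= limit_deg:
            return angle


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sigma_hori = sigma_from_density_ratio(90.0)  # horizontal support [-90, 90]
sigma_vert = sigma_from_density_ratio(30.0)  # vertical support [-30, 30]
poses = [(sample_truncated_normal(90.0, rng), sample_truncated_normal(30.0, rng))
         for _ in range(5)]

print(f"sigma_hori = {sigma_hori:.2f} deg")  # prints 24.21
print(f"sigma_vert = {sigma_vert:.2f} deg")  # prints 8.07
```

The horizontal value is close to the 24.13° reported for randomly selected images in the pose sampling experiment. The vertical standard deviation reported there (9.20°) is somewhat larger than this derivation gives, so the sketch should be read as illustrative rather than as a reproduction of the rendering pipeline.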
Figure 8. Comparison between training with our synthetic data only (black dashed line), with small amount of real data only (red line), with the mixture of the two (blue line), and pre-training on synthetics and fine-tuning with the real data (black line). The number of real identities varies from 200 to 2000, and 20 images are sampled for each identity. When only a small number of real face images are available (e.g., due to ethical issues), the proposed synthetic dataset can substantially improve the accuracy.

# IDs × # images/ID | LFW | CFP-FP | CPLFW | AgeDB | CALFW | Avg
10K × 50 + 0 × 5 | 94.38 | 84.07 | 76.53 | 75.93 | 76.72 | 81.53
8K × 50 + 20K × 5 | 94.80 | 84.79 | 77.52 | 76.47 | 77.65 | 82.24
6K × 50 + 40K × 5 | 95.22 | 85.24 | 77.15 | 77.52 | 78.32 | 82.69
4K × 50 + 60K × 5 | 95.45 | 84.83 | 77.70 | 77.68 | 79.10 | 82.95
2K × 50 + 80K × 5 | 94.82 | 84.09 | 77.75 | 77.55 | 78.37 | 82.51
0 × 50 + 100K × 5 | 94.45 | 83.34 | 76.77 | 76.33 | 77.28 | 81.64

Table 3. Number of IDs and number of images/ID should both be high to learn diverse and robust embedding. Mixing two datasets with large/small number of images/ID can be an efficient way of satisfying both. The overall accuracy becomes higher than relying on one of the two datasets.

5.3. Mixing with Real Faces

The main problems associated with large-scale real face datasets are ethical issues, label noise and data bias. In this study, we assume a scenario where a small number of real face images are collected with consent. For small number of images, it would also be possible to remove (or reduce) the label noise and data bias.

For synthetic data, we used 10K identities with 72 images per identity. For real face images, we varied the number of identities from 200 to 2000, with 20 images sampled for each identity (the identities and images were sampled randomly from CASIA-WebFace [39]).

We first tried training only on the synthetic data. Secondly, we tried training only on the real data. Then, we explored two different strategies for using both real and synthetic images: (1) dataset mixing and (2) pre-training on synthetic data and fine-tuning on the real data. For fine-tuning, we reduced the learning rate by 1/10 for the prediction head, and 1/100 for the encoder to avoid catastrophic forgetting.

The results are provided in Figure 8. When the network is trained only on a small number of real face images, the accuracy is worse than the network trained only on our synthetic dataset. Both dataset mixing and pre-training can lead to significantly higher accuracy, especially for the large-pose-variation datasets (CFP-FP and CPLFW). Compared to dataset mixing, pre-training on synthetics followed by fine-tuning on real images led to better accuracy. This can be due to the imbalance between the number of images (we use 720K synthetic images, and a lot fewer real images).

5.4. Comparison to the State-of-the-Art

Comparison to SynFace. SynFace [28] is the current state-of-the-art for face recognition model trained on synthetic faces. They used DiscoFaceGAN [9] to generate 500K synthetic faces (10K identities & 50 images/ID). To ensure a fair comparison, we trained the same encoder (LResNet50E-IR) with same number of images. We also trained using our full dataset (1.22M images). The results are provided in Row 1-3 of Table 4. In the second scenario, we additionally used 40K real face images from CASIA-WebFace [39]. While SynFace mixed their synthetic dataset with the real faces, we instead adopted the two-stage method of pre-training and fine-tuning as discussed in subsection 5.3. The results are provided in Row 4-6 of Table 4.

Method | # Synthetic images (# IDs × # imgs/ID) | # Real images (# IDs × # imgs/ID) | LFW | CFP-FP | CPLFW | AgeDB | CALFW | Avg | Avg†
SynFace [28] | 500K (10K × 50) | 0 | 91.93 | 75.03 | 70.43 | 61.63 | 74.73 | 74.75 | 79.13
Ours | 500K (10K × 50) | 0 | 95.40 | 87.40 | 78.87 | 76.97 | 78.62 | 83.45 | 87.22
Ours | 1.22M (10K × 72 + 100K × 5) | 0 | 95.82 | 88.77 | 81.62 | 79.72 | 80.70 | 85.32 | 88.74
SynFace [28] | 500K (10K × 50) | 40K (2K × 20) | 97.23 | 87.68 | 80.32 | 81.42 | 85.08 | 86.35 | 88.41
Ours | 500K (10K × 50) | 40K (2K × 20) | 99.05 | 94.01 | 87.27 | 89.77 | 90.08 | 92.04 | 93.44
Ours | 1.22M (10K × 72 + 100K × 5) | 40K (2K × 20) | 99.17 | 94.63 | 88.10 | 90.50 | 90.97 | 92.67 | 93.97

Table 4. Comparison to SynFace using the same encoder architecture (LResNet50E-IR [28]). For both scenarios - training only on synthetic faces & using a small number of real faces - we significantly outperform SynFace across all datasets. Avg† shows average of LFW, CFP-FP and CPLFW, excluding the large-age-variation datasets.

For both scenarios, we significantly outperform SynFace across all datasets. This suggests that our rendered synthetic faces are better than GAN-generated faces for learning face recognition. While GANs like [9] can generate realistic face images, the data they generate is not ideal for face recognition, due to following reasons: (1) Identity change. While [9] is encouraged to preserve the identity when changing other latent variables, there is no guarantee that the identity will be preserved during data generation. (2) Geometric inconsistency. As pointed out by [10], the images generated by [9] for same identity and different poses lack 3D consistency. (3) Lack of accessory change. [9] cannot randomize accessories. (4) Unresolved ethical concerns. Training the GAN model itself requires large-scale real face dataset. For example, 70K images are used to train [9]. To learn to preserve identity, they also used a perceptual loss based on [38], which is trained on 3M real face images.

In Row 2 and 3 of Table 4, we increase our synthetics dataset size from 500K to 1.22M, and achieve better accuracy. This indicates that the accuracy may not have converged yet and could be improved further by generating more synthetic data.

Comparison to methods trained on real faces. Lastly, we compare the accuracy against the methods that are trained on real face images. In Table 5, we provide the accuracy of six methods that use ResNet100 as the embedding network and MS1MV2 [8] as the training data. We trained the same architecture on our synthetic dataset (Row 1). We also tried fine-tuning the network on a small number of real face images (Row 2). When trained only with the proposed synthetic dataset, the network can achieve 96.17% on LFW. For LFW, CFP-FP and CPLFW (excluding the high-age-variation datasets), the average accuracy is 89.40%. By fine-tuning the network on just 120K images (2.0% of MS1MV2), the accuracy becomes comparable to the methods trained on MS1MV2 (e.g., average accuracy on LFW, CFP-FP and CPLFW becomes higher than that of SV-AM-Softmax [35]).

Method | # Synthetic images | # Real images | LFW | CFP-FP | CPLFW | AgeDB | CALFW | Avg | Avg†
Ours (SX best) | 1.22M | 0 | 96.17 | 89.81 | 82.23 | 81.10 | 82.55 | 86.37 | 89.40
Ours (SX+Real best) | 1.22M | 120K | 99.33 | 95.93 | 89.47 | 91.55 | 91.78 | 93.61 | 94.91
SV-AM-Softmax [35] | 0 | 5.8M | 99.50 | 95.10 | 89.48 | 95.68 | 94.38 | 94.83 | 94.69
SphereFace [23] | 0 | 5.8M | 99.67 | 96.84 | 91.27 | 97.05 | 95.58 | 96.08 | 95.93
CosFace [33] | 0 | 5.8M | 99.78 | 98.26 | 92.18 | 98.17 | 96.18 | 96.91 | 96.74
ArcFace [8] | 0 | 5.8M | 99.81 | 98.40 | 92.72 | 98.05 | 95.96 | 96.99 | 96.98
MagFace [24] | 0 | 5.8M | 99.83 | 98.46 | 92.87 | 98.17 | 96.15 | 97.10 | 97.05
AdaFace [18] | 0 | 5.8M | 99.82 | 98.49 | 93.53 | 98.05 | 96.08 | 97.19 | 97.28

Table 5. Comparison to the state-of-the-art methods trained on real face images (MS1MV2 [8]). We use the same backbone (ResNet100) for fair comparison. By only using 120K real face images (2% of MS1MV2 [8]), we achieve accuracy that is comparable to the methods trained on millions of real face images. Since we do not model aging explicitly, our accuracy is worse for large-age-variation datasets (AgeDB and CALFW). Avg† shows average of LFW, CFP-FP and CPLFW, and on these, we outperform [35] and are similar to [23].

The performance of our method on AgeDB [25] and CALFW [42] has a significantly larger gap than for the other datasets evaluated. This is expected given the lack of aging simulation in our synthetic data. We suspect that other causes of domain-gap, as described at the end of subsection 3.1, are the primary reason for the remaining performance gap for other evaluation datasets. Reducing this domain-gap remains an area of ongoing work for our synthetic data and is likely to result in improved performance for all downstream tasks, including face recognition. We leave this as future work.

6. Conclusion

In this paper, we introduced a new large-scale synthetic dataset for face recognition by rendering digital faces using a graphics pipeline. We ran extensive experiments to study how data augmentation and various other attributes affect the accuracy. We demonstrated that our synthetic faces are significantly better than the GAN-generated faces for learning face recognition. With a small number of real face images, we achieve accuracy that is comparable to the methods trained on millions of web-crawled face images. We hope this dataset would be a meaningful step towards developing socially responsible face recognition models that do not depend on privacy-sensitive data obtained without consent.
References

[1] Blender foundation. cycles renderer. https://www.cycles-renderer.org/. Accessed: 2022-10-03.
[2] The ethical questions that haunt facial-recognition research. https://www.nature.com/articles/d41586-020-03187-3. Accessed: 2022-10-03.
[3] flickr creative commons. https://www.flickr.com/creativecommons/. Accessed: 2022-10-03.
[4] Megapixels. https://ahprojects.com/megapixels-glassroom/. Accessed: 2022-10-03.
[5] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3d faces. In Proc. SIGGRAPH, 1999.
[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[7] Jiankang Deng, Jia Guo, Evangelos Ververas, Irene Kotsia, and Stefanos Zafeiriou. Retinaface: Single-shot multi-level face localisation in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[8] Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[9] Yu Deng, Jiaolong Yang, Dong Chen, Fang Wen, and Xin Tong. Disentangled and controllable face image generation via 3d imitative-contrastive learning. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[10] Yu Deng, Jiaolong Yang, Jianfeng Xiang, and Xin Tong. Gram: Generative radiance manifolds for 3d-aware image generation. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), 2014.
[12] Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In Proc. European Conference on Computer Vision (ECCV), 2016.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[14] Gary B Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on faces in ’Real-Life’ Images: detection, alignment, and recognition, 2008.
[15] Yuge Huang, Yuhan Wang, Ying Tai, Xiaoming Liu, Pengcheng Shen, Shaoxin Li, Jilin Li, and Feiyue Huang. Curricularface: adaptive curriculum learning loss for deep face recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[16] Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of gans for improved quality, stability, and variation. In International Conference on Learning Representations (ICLR), 2018.
[17] Ira Kemelmacher-Shlizerman, Steven M Seitz, Daniel Miller, and Evan Brossard. The megaface benchmark: 1 million faces for recognition at scale. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[18] Minchul Kim, Anil K Jain, and Xiaoming Liu. Adaface: Quality adaptive margin for face recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[19] Adam Kortylewski, Bernhard Egger, Andreas Morel-Forster, Andreas Schneider, Thomas Gerig, Clemens Blumer, Corius Reyneke, and Thomas Vetter. Can synthetic faces undo the damage of dataset bias to face recognition and facial landmark detection? arXiv preprint arXiv:1811.08565, 2018.
[20] Adam Kortylewski, Bernhard Egger, Andreas Schneider, Thomas Gerig, Andreas Morel-Forster, and Thomas Vetter. Analyzing and reducing the damage of dataset bias to face recognition with synthetic data. In Proc. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[21] Bi Li, Teng Xi, Gang Zhang, Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, and Wenyu Liu. Dynamic class queue for large scale face recognition in the wild. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[22] Shen Li, Jianqing Xu, Xiaqing Xu, Pengcheng Shen, Shaoxin Li, and Bryan Hooi. Spherical confidence learning for face recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[23] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, and Le Song. Sphereface: Deep hypersphere embedding for face recognition. In Proc. IEEE Conference on Computer Vi
sionandPatternRecognition(CVPR),2017.[24]QiangMeng,ShichaoZhao,ZhidaHuang,andFengZhou.Magface:Auniversalrepresentationforfacerecognitionandqualityassessment.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2021.[25]StylianosMoschoglou,AthanasiosPapaioannou,Chris-tosSagonas,JiankangDeng,IreneKotsia,andStefanosZafeiriou.Agedb:thefirstmanuallycollected,in-the-wildagedatabase.InProc.IEEEConferenceonComputerVisionandPatternRecognitionWorkshops,2017.[26]AaronNechandIraKemelmacher-Shlizerman.Levelplay-ingfieldformillionscalefacerecognition.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2017.[27]AdamPaszke,SamGross,FranciscoMassa,AdamLerer,JamesBradbury,GregoryChanan,TrevorKilleen,Zem-ingLin,NataliaGimelshein,LucaAntiga,etal.Py-torch:Animperativestyle,high-performancedeeplearn-inglibrary.InAdvancesinNeuralInformationProcessingSystems(NeurIPS),2019.[28]HaiboQiu,BaoshengYu,DihongGong,ZhifengLi,WeiLiu,andDachengTao.Synface:Facerecognitionwithsyntheticdata.InProc.IEEEInternationalConferenceonComputerVision(ICCV),2021.
[29]FlorianSchroff,DmitryKalenichenko,andJamesPhilbin.Facenet:Aunifiedembeddingforfacerecognitionandclus-tering.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2015.[30]SoumyadipSengupta,Jun-ChengChen,CarlosCastillo,VishalMPatel,RamaChellappa,andDavidWJacobs.Frontaltoprofilefaceverificationinthewild.InProc.IEEEWinterConferenceonApplicationsofComputerVision(WACV),2016.[31]YanivTaigman,MingYang,Marc’AurelioRanzato,andLiorWolf.Web-scaletrainingforfaceidentification.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2015.[32]DanielS´aezTrigueros,LiMeng,andMargaretHartnett.Generatingphoto-realistictrainingdatatoimprovefacerecognitionaccuracy.NeuralNetworks,134:86–94,2021.[33]HaoWang,YitongWang,ZhengZhou,XingJi,DihongGong,JingchaoZhou,ZhifengLi,andWeiLiu.Cos-face:Largemargincosinelossfordeepfacerecognition.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2018.[34]MeiWang,WeihongDeng,JianiHu,XunqiangTao,andYaohaiHuang.Racialfacesinthewild:Reducingracialbiasbyinformationmaximizationadaptationnetwork.InProc.IEEEInternationalConferenceonComputerVision(ICCV),2019.[35]XiaoboWang,ShuoWang,ShifengZhang,TianyuFu,HailinShi,andTaoMei.Supportvectorguidedsoftmaxlossforfacerecognition.arXivpreprintarXiv:1812.11317,2018.[36]ErrollWood,TadasBaltruˇsaitis,CharlieHewitt,SebastianDziadzio,ThomasJCashman,andJamieShotton.Fakeittillyoumakeit:faceanalysisinthewildusingsyntheticdataalone.InProc.IEEEInternationalConferenceonComputerVision(ICCV),2021.[37]ErrollWood,TadasBaltruˇsaitis,CharlieHewitt,MatthewJohnson,JingjingShen,NikolaMilosavljevic,DanielWilde,StephanGarbin,TobySharp,IvanStojiljkovic,etal.3dfacereconstructionwithdenselandmarks.InProc.EuropeanConferenceonComputerVision(ECCV),2022.[38]JiaolongYang,PeiranRen,DongqingZhang,DongChen,FangWen,HongdongLi,andGangHua.Neuralaggre-gationnetworkforvideofacerecognition.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2017.[39]DongYi,ZhenLei,ShengcaiLiao,andStanZLi.Learningfacerepresenta
tionfromscratch.arXivpreprintarXiv:1411.7923,2014.[40]KaipengZhang,ZhanpengZhang,ZhifengLi,andYuQiao.Jointfacedetectionandalignmentusingmultitaskcascadedconvolutionalnetworks.IEEEsignalprocessingletters,23(10):1499–1503,2016.[41]TianyueZhengandWeihongDeng.Cross-poselfw:Adatabaseforstudyingcross-posefacerecognitioninun-constrainedenvironments.BeijingUniversityofPostsandTelecommunications,Tech.Rep,5:7,2018.[42]TianyueZheng,WeihongDeng,andJianiHu.Cross-agelfw:Adatabaseforstudyingcross-agefacerecognitioninun-constrainedenvironments.arXivpreprintarXiv:1708.08197,2017.[43]ZhengZhu,GuanHuang,JiankangDeng,YunYe,JunjieHuang,XinzeChen,JiagangZhu,TianYang,JiwenLu,Da-longDu,etal.Webface260m:Abenchmarkunveilingthepowerofmillion-scaledeepfacerecognition.InProc.IEEEConferenceonComputerVisionandPatternRecognition(CVPR),2021.