Use of dedicated business websites to enhance the statistical business register in the Netherlands
Sharing experiences

Arnout van Delden, Nick de Wolf, Naomi Schalken, Sander Scholtus, Olav ten Bosch and others
Feb 4-5 2025, Gdansk
Icon by zero_wing

Introduction

Automatic use of information on websites
to reduce manual labour for maintaining variables in an SBR (units, contact information, NACE).

Experiences by Statistics Netherlands:
1. Finding URLs using data from an external company
2. Development of a model to predict NACE misclassifications

URL finding

Third-party DataProvider (DP) scrapes URLs (and contact information) in many countries and makes a selection of Dutch businesses.

Source               Population                           Frequency   Linkage
Chamber of commerce  Registration of (new) legal units    Continuous  Legal unit ID number
ICT survey           Sample of enterprises                Yearly      Enterprise ID number
DataProvider         Dutch websites that are not blocked  Monthly     ID numbers, name, email address and so on

URLs collected by third parties are a potentially useful source for NSIs, but:
- the collected URLs need to be linked to legal/statistical units in the SBR;
- values of identifying variables need to be present in both sources.

URL finding: linkage of DP

URL finding: contribution of COC versus DP

Groups   URL from COC  URL from DP  DP URL-LU linkage probability
                                    0%         10-50%     65%        75%        85%        95%        100%
Total                               4 630 836  4 630 836  4 630 836  4 630 836  4 630 836  4 630 836  4 630 836
Group A  +             +            700 973    670 528    656 672    644 217    635 936    424 151    389 165
Group B  -             +            671 011    213 781    123 765    29 265     1 109      1          1
Group C  +             -            221 605    252 050    265 906    278 361    286 642    498 427    533 413
Group D  -             -            3 037 247  3 494 477  3 584 493  3 678 993  3 707 149  3 708 257  3 708 257

Number of Legal Units in the SBR (Oct 2020)

With websites scraped by third parties: considerable effort is needed to build and maintain a probabilistic linkage function to link non-unique identifiers, or li