Published by Arron Warren. Modified over 9 years ago.
1
Researchers’ needs for transnational access to confidential microdata: survey preliminary results
Marie Cros (1), Frédérique Cornuau (1), Roxane Silberman (2)
(1) Université Lille 1; (2) CNRS, Réseau Quetelet
DwB workshop, NTTS conference, satellite event, Brussels, 4 March 2013
2
Why survey researchers?
DwB objective: enhancing transnational access to official microdata, particularly confidential microdata
- Increasing number of RDCs offering remote access at national level
- Access to confidential microdata also increasingly available to non-resident researchers, though in many cases “on site”
Building a European Remote Access Network (ERAN) would improve the situation
- No (or less) travelling required
- Would also allow research teams located in different sites/countries to work together and to conduct comparative projects
Crucial to understand
- How do the different solutions used by RDCs impact the researchers?
- What would the researchers require from an ERAN?
3
Survey perimeter and bases for the survey Users of 5 European RDCs offering remote access: CASD/GENES, CBS, IAB, ONS, SDS (DwB partners) Bases for the survey: DwB survey on the different remote access solutions (D4.1); previous analysis of the different phases a researcher has to go through for a project; preliminary discussions with researchers involved in an international project/team, and DwB/RDC experience
4
Structure and rationale for the survey Researcher and research project characteristics that may impact their needs Selected phases of a research project For each phase, selected components of the remote access security solutions that may impact the researchers Researchers’ expectations
5
Structure and rationale for the survey 1. Researcher’s profile Assumption that researchers’ needs and the way they are impacted by various solutions may vary depending on: Discipline, type of institution and context of the institution Differences in type of outputs needed, complexity of analysis, way of working, support from colleagues, support from local IT … Structure of research team and organisation of work in research teams Work alone or not Work with colleagues located in other places Previous experience of remote access Type of datasets used Business, households, persons : different issues for outputs anonymisation More or less large data files : storage capacity
6
Structure and rationale for the survey Selection of phases of a research project We selected 4 of the 8 phases identified, focusing on phases the RDCs are involved in: Information phase, Accreditation phase, Access phase, Data phase, Support phase, Output checking phase, Closure phase. Although we did not investigate the information and accreditation phases (DwB WP8 and 3), it must be underlined that most researchers spontaneously complained about the time needed before data could be accessed
7
Structure and rationale for the survey Selection of the components of the security solutions All RDCs providing remote access aim at guaranteeing security, though with different solutions for the different components of security (DwB D4.1) Different technical solutions as well as different interpretations of and requirements for security We divided these components into 2 subsets based on their possible impact (neutral or non-neutral) on researchers’ performance, depending on the type of solution adopted
8
Access phase Components that may impact the researchers Access from where and secure environment (physically) Regular PC or Thin client, What operating system (OS) is supported for the user’s workstation installations that have to be done on the user’s side of the connection Researcher authentication
9
Access phase Component: place for access The requirements concerning the place from which access is possible (researcher's office, locked room, dedicated space in researcher’s institution, space only in national institute …) and the security regulations for such a place May impact the researchers to various degrees, depending on the nature and intensity of these constraints and on the support the researchers may have from their institution Researcher’s office: more friendly Dedicated space in researcher’s institution: not available in all institutions Only in RDCs in NSI: may require travelling inside the country
10
Access phase Starting point for the connection Own computer (regular PC) vs dedicated equipment (thin client) Regular PC more friendly Yet secure access requires some installation on the user’s workstation Different problems may arise depending on the support needed (external IT staff, local IT) and on costs Fewer problems may be expected with a thin client solution Yet the researcher has to deal with a solution that differs from his routine way of working.
11
Data phase: “Work with data” Components that may impact the researcher’s work Different methods for authentication: smartcard, login/password, biometric Frequency of authentication, methods of diverse levels of complexity, and success of the authentication may impact the researcher How researchers organize their work Constraints when travel to a specific place is required or when fees are high Need to concentrate/shorten the work Working with other researchers located in different places/countries? Combining datasets from different RDCs?
12
Support phase Components that may impact the researcher’s work Available data, metadata and support from the RDC team Available software for the researcher User surveillance Upload data in the user’s workspace at the server of the data provider
13
Output checking phase A major issue for both sides, RDCs and researchers Similar principles for output checking, yet differences in procedures as well as SDC rules More or less time consuming Assumption that impact may depend on: Checking intermediary outputs or final outputs Disciplines (if more descriptive and detailed outputs such as demography, urban sociology)
14
In addition we investigated … Language issues Anticipated time Fees issues And Many questions allowed free comments Final questions about general feelings about their experience and expectations
15
Survey administration Web-based survey (a few paper-based) Sent by the RDCs to researchers who had used existing remote access solutions in France, Germany, the Netherlands, and the United Kingdom Researchers completed the questionnaire anonymously and submitted it directly online to the survey design team 90 researchers completed the questionnaire No feedback yet from the RDCs about how many researchers received the questionnaire Not a “representative” survey, yet providing useful information
16
Some preliminary results Not all issues
17
Respondents’ profile (n=90)
Institution:
- Public university: 55%
- Public research center: 27%
- Private research center: 9%
- Other: 7%
% economists:
- CASD: 65%
- VML: 54%
- IAB: 93%
- SDS: 88%
- CBS: 25% (n=8)
- ALL: 73%
Also: geography, sociology, health …
Remote access system described in survey:
- CASD: 28%
- ONS: 14%
- FDZ: 16%
- SDS: 33%
- CBS: 9%
Who do researchers work with? *
- Alone: 17%
- With other researchers from the same institution: 57%
- With other researchers from other national institutions: 22%
- With other researchers from other countries: 8%
* Multiple answers possible
Not surprisingly, most researchers used the remote access solution of their own country, for national (non-comparative) projects
18
Points of access
A majority of researchers could access data from their own institution. Researchers who couldn’t had to go to accredited points of access.
Points of access:
- Researcher’s own institution: 71%
- Data centre of a National Statistical Institute: 22%
- Another research institution: 6%
Material:
- Dedicated equipment: 73%
- Own computer: 27%
19
Comments from researchers who had to go to special points of access in their country outside their institution Material conditions: travelling, time-consuming and losing money “Pre-booking was needed, Needed time and other resources to travel.” “Coming to the location to use data is expensive, time-consuming, and leads to inaccurate research (inability to check things later).” “Time lost travelling to and from the data centre. Inefficiency in having to spend solid chunks of time out of the office.” Work organisation constraints “(…) not all info (books, papers) around, no chance to ask colleagues for help,....” “(…) need to be organized and prepared, you don't have all your files, literature, software, less spontaneous - in particular difficult, because you never know what to find in the data, need time to get to know the data etc.” “It’s not very problematic but annoying that you don't have the usual tools and settings that are there in your own office” Only possible during office hours
20
Comments from researchers working on own computer or dedicated equipment Own computer: most comments “Friendly”; a few had some installation problems Dedicated equipment: no installation problems, yet “Personal files not available” “Time consuming, need to be organized and prepared” “Because we could not go back and forth easily between the drafting and empirical work on the data” “Limited choice of software” And more problems when dedicated equipment is combined with a place of access outside the researcher’s institution
21
Authentication 93% of researchers approved the type of authentication required and 86% the frequency with which they had to log in. Success in authentication procedures: opinions more mixed. For 60% it worked each time, but others experienced problems, with differences between the RDCs.
22
Researchers’ comments on authentication “Authentication was no problem for me. But for one of my colleagues on the project it would fail more than 50% of the time. It seems that his skin was too thin...!” “It was stressing as if it was not working for a certain number of time in a row I would have had to change the authentication card and it was time consuming.” “Passwords frequently expired and could only be reset by an assistant, who was not always available.” “There were problems of authentication due to the sensitivity of the fingerprint sensor.” “Difficult at the very beginning, but it was solve.”
23
Researchers’ organisation of work A large majority of researchers used remote access during a long period (several months) > mainly a choice A minority worked on a shorter period & on a daily basis. Most of them (2/3) indicate this organisation was a constraint Constraints linked to: monthly costs location of the point of access and need to travel (time and costs) organisation issues (need time to work on other projects, teaching, other activities…) For those working with researchers from other institutions in the country or outside the country
24
Checking of outputs: what about delays? Delays are generally a few days but can range from a few minutes to a few days, depending on - the RDC - whether intermediate outputs are checked or only final outputs - complexity of analysis 73% of researchers happy with delays for output checking Understanding / positive comments “Ideally it would be quicker of course, but I understand the labour constraints.” “A few years ago output checking took several days. The service is improved, and takes 1 day nowadays” “Understand it is a trade off” Researchers with previous, different experiences were happier
25
Yet negative comments on delays and on the way output checking works “Such delay is not convenient for quality research - such delay is not acceptable for a paid service - we are not kept informed about why this takes sometimes so long and what is checked” “It’s expensive as we have very limited research resources” “RAs are generally not experienced enough to judge that output is correct.” “I understand that it is time consuming but (…) waiting just to learn that the program crashed is very frustrating.”
26
Output restrictions ¾ of researchers did not experience annoying restrictions on outputs. Several comments show that they understand that these restrictions have to apply and they adapt their work to these constraints. Kind of “learning process”. However… Objections about methodology when checking also applies to some intermediary outputs; it should apply only to final outputs Some restrictions are judged excessive
27
Researchers’ comments on outputs restrictions Frequencies in cells “Outputs on max, mean, min were not allowed as they could breach confidentiality, which I felt was OK”. “Some cells in output contains less than 10 persons. This is forbidden, even if there is no risk of violating confidentiality” “Unable to report summary statistics (maxima and minima) and scatterplots because of (excessive?) “confidentiality requirements” -”which rarely really an indicated actual possibility to identify single entire, though, they acted partly too strict” Software for outputs, complexity of analysis, methodology “We were not allowed to produce and use certain graphics using Stata because the RDC could not check this thoroughly enough” “…Because our methodology was not accepted; but this is not what the remote access should be about: they should not check our methodology” Delays “Because of delays, we left some works” “We needed detailed information for mapping and further analysis; this wasn't always possible. Even though we did not intend to publish the data”.
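The minimum cell frequency rule mentioned in the comments above can be illustrated with a minimal Python sketch. It assumes a threshold of 10, taken from the researcher's comment; actual SDC rules differ between RDCs. The function names and the table layout (a dictionary mapping cell labels to counts) are hypothetical and do not describe any RDC's actual checking tool.

```python
from typing import Dict, List, Optional

# Assumed threshold, taken from the researcher's comment above; real RDC rules vary.
MIN_CELL_COUNT = 10


def flagged_cells(table: Dict[str, int], threshold: int = MIN_CELL_COUNT) -> List[str]:
    """Return the labels of cells whose count is below the threshold, sorted."""
    return sorted(cell for cell, count in table.items() if count < threshold)


def suppress_small_cells(
    table: Dict[str, int], threshold: int = MIN_CELL_COUNT
) -> Dict[str, Optional[int]]:
    """Return a copy of the table with counts below the threshold replaced by None."""
    return {
        cell: (count if count >= threshold else None)
        for cell, count in table.items()
    }


if __name__ == "__main__":
    # Hypothetical frequency table: region -> number of persons.
    counts = {"north": 124, "south": 9, "east": 37, "west": 3}
    print(flagged_cells(counts))
    print(suppress_small_cells(counts))
```

A rule this simple is applied mechanically to every cell, which is consistent with the researchers' complaint that a cell can be blocked even when they see no real disclosure risk.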
28
Overall comments Positive “Remote access systems have improved substantially over the last years” Negative Timing issue taking into consideration the overall process “ We work for organizations that always want the result ASAP. It is simply unacceptable for them that you have to wait for sometimes a few months to get a project going and access the files” Particularly bureaucracy, accreditation process, time for checking outputs Costs “Prices for a small project too high” “Because of the slow process the analysis cost about ten times more that expected. Cost increased” “Restriction in the number of outputs”
29
Conclusions Preliminary conclusions Remote access friendly Security constraints mostly accepted by researchers Yet interpretation and solutions differently impact the researchers Work organization also problematic, even more for researchers involved in teams with other researchers from other institutions in the country or across borders
30
Results to be refined and complemented Only preliminary results, to be refined Only a few researchers involved in multi-institution/multi-country projects/teams Get more researchers involved in such multinational projects/teams to complete the online survey Further in-depth interviews to be conducted with researchers involved in such teams (important for the Virtual Research Environment for the DwB ERAN project) Idea to focus more on how researchers would work together: access to microdata for all; access to intermediary outputs; comparing outputs from analyses conducted in different RDCs; combining datasets from different RDCs to run a single analysis
31
Thanks for listening Contact: contact@dwbproject.org Website: http://www.dwbproject.org/