- The EUI Library supports (i) data discovery (ii) data use and (iii) data preservation and sharing.
- Write to Thomas Bourke, [email protected] for support
- For ICT support, Ethics Committee review and DPO enquiries, use the email contacts on the Research Data Lifecycle page.
The EUI Library Data Portal provides access to Library-licensed macroeconomic, micro-socioeconomic and Europe-related databases. Its 72 data resource guides provide information on coverage, variables, user networks, online manuals and terms and conditions of use. The Data Portal also provides information about the Library's restricted micro data server.
Click below for information about data discovery, data use, data management plans, data protection and copyright, data preservation and sharing, data in EU-funded projects, and the EUI Research Data collection in the Cadmus repository. For assistance, please write to [email protected] or visit one of the Library information desks: Badia Fiesolana BF-085 or Villa La Fonte VLF-026.
EUI members can submit their research datasets for inclusion in the EUI Research Data collection in the Cadmus repository. To submit a dataset for inclusion, please complete the Library's online form. Library staff will make an appointment for transfer of data and documentation.
1. Data discovery and the EUI Library Data Portal
2. Data generation, data protection, data processing and ethical use
3. Data management plans
4. Research data management in Horizon Europe, ERC and other EU programmes
5. Managing data during the research project cycle
6. Repositing and preserving data in the EUI research repository Cadmus
7. Open Data and Open Science
8. Qualitative data and ancillary materials in the humanities and social sciences
9. EUI infrastructure, software, protocols and support
10. International research data guidelines
This page is the hypertext version of the 2022 EUI Library Research Data Guide (10th edition).
1. DATA DISCOVERY AND THE EUI LIBRARY DATA PORTAL
This section describes access to EUI-licensed data resources; access to restricted micro-socioeconomic data; terms and conditions of access and use; access to data hosted by other institutions; and resources for open data discovery.
1.a The EUI Library Data Portal
The Library maintains a Data Portal providing access to licensed resources for EUI members and introducing open data resources of relevance to EUI research themes. All 80 resources indexed in the Data Portal have an online guide with (i) data description and scope (ii) time-period coverage and release/wave information (iii) supporting resources (online manuals, software transfer routines, user networks) and (iv) terms and conditions of access and use. Modes of access for each database are described in the individual resource guides linked from the Data Portal. There are five sub-directories:
Databases indexed in the EUI Library Data Portal have diverse modes of access, which are described in detail on each resource page: proxied internet HTML interface (mostly for macroeconomic, political economic and historical resources); Library restricted data server (mostly for micro-socioeconomic resources); and desktop application (eg. Datastream via Programmes and Compustat via Stata). Macroeconomic data providers include Databanks, ECB, Eurostat, Global Financial Data, IMF, OECD, Refinitiv-LSEG, Standard & Poor’s, Statista GmbH, World Bank, UN and WIIW. Micro-socioeconomic data providers include DIW, Eurostat, GESIS, ICPSR, Moody’s Analytics-BvD and UKDS.
1.b Registration for access to Library-hosted micro-socioeconomic data
EUI members can register for access to micro data hosted by the EUI Library as follows:
- Complete the micro-socioeconomic data registration form (selecting the required dataset from the drop-down menu)
- A separate registration form is required for each micro dataset
- Registered users are given access to the Library’s restricted data server
- For some providers, parallel registration with data issuers is required
- Specific instructions are provided under 'How to access' on the individual micro data description pages linked from the Data Portal
- Short-term EUI visitors do not have access to EUI-hosted micro data.
1.c Access to Eurostat micro-socioeconomic data
The EUI is a Eurostat-approved research data facility. Access to Eurostat collated data (EU Adult Education Survey, EU Labour Force Survey, EU Statistics on Income and Living Conditions and European Community Household Panel) requires individual end-user contracts with Eurostat:
- Complete the Library's micro-socioeconomic data registration form, selecting the Eurostat dataset required from the dropdown menu (EU-AES, EU-LFS, EU-SILC, ECHP)
- EUI scholars must create an EU Login to apply for Eurostat micro data access
- Special note: Eurostat advises that EUI members must use a non-institutional (non-EUI) personal email account to create the EU Login
- It is not necessary for all members of a research team to create an EU Login; only the person submitting the application needs one
- Before completing the application, consult the relevant Eurostat micro data resource page for details of variables, reference years and population
- Applications for data access are processed via the online Microdata Access Workflow Tool
- When completing the application, EUI researchers should enter the name of their thesis supervisor as Principal Investigator (P.I.)
- For institutional Contact Person indicate 'Dr. Thomas Bourke'
- Write to [email protected] requesting (i) the Research Entity ID and (ii) the standard description of the EUI's secure data infrastructure and protocols
- The names and email contact details of all scholars on the project must be included in the form (researchers, fellows, professors)
- The applicant uses the Microdata Access Workflow Tool to initiate the application procedure. Full instructions are online
- [After receipt of the application, the Eurostat Technical Unit may provide feedback and/or requests for further information about the project]
- The submitter should monitor their EU Login account for updates regarding the application. (Library staff do not have access to individual EU Login accounts)
- If the application is approved by Eurostat, the applicant should print the research project proposal (RPP) and confidentiality declaration(s)
- The Principal Investigator initials all pages of the RPP, and signs and dates the final page
- The application is sent to the institutional Contact Person for signature: [email protected]
- Separate confidentiality declarations must be signed by (i) the Principal Investigator (ii) the Applicant(s) and (iii) the Contact Person
- The applicant scans the signed documents and uploads to the EU Login account
- Eurostat confirms receipt; this confirmation must be copied to [email protected]
- The Library provides access to the restricted data server.
Eurostat stipulates that no persons other than those named in the contract may access the data. Breach of contract conditions may lead to the withdrawal of the EUI's research entity recognition, and the potential loss of access to Eurostat micro data by other EUI scholars. There is no off-campus access to Eurostat micro data. Short-term visitors to the EUI Library do not have access to Eurostat micro data. Member State statistical agencies and/or Eurostat reserve the right to decline access to Eurostat micro data. At the end of the project contract period, Eurostat will contact the P.I. requesting a signed confirmation of destruction of local copies/sub-sets of the data (Section 5.e below). Publications based on the data should be reported via the Eurostat online form.
1.d Terms and conditions of use and database copyright
Access to, and use of, databases provided by the EUI Library are subject to contractual license agreements and database copyright terms. Full details are on the Library’s Terms and Conditions web page. Data users are individually responsible for compliance with terms and conditions of access and use. Violation of license terms puts at risk other EUI members’ future access to data resources and may expose the university to financial penalties. EUI users may not distribute data provided under license, or allow any other party to access it. Users may not use licensed materials for commercial purposes. In some cases, the sharing of micro-socioeconomic data with journal referees is not permitted. Please check individual data contracts for details.
1.e Annual call for EUI early stage researcher funding applications
The EUI Dean of Research issues an annual call for early stage researcher (ESR) funding support. EUI researchers and fellows can apply for financial support to cover some, or all, of the costs incurred in data collection and data generation (surveys, interviews, fieldwork, experiments &c.). ESRs should check the funding opportunities intranet page under 'Internal (EUI) funding opportunities', for the text of the call, application form, FAQs and updated deadline. Applicants must take, and pass, the online CITI training module 'Essentials of Grant Proposal Development' and (for empirical projects dealing with human-subject non-secondary data) 'GDPR for Research and Higher Ed' before the deadline of the ESR call. Some projects may require EUI Ethics Committee approval (Section 2.b below). ESR call applicants may request clearance after the funding decision - but before the commencement of data collection and processing. For data acquisition requests, applicants should first check the Library Data Portal for availability. If the required data is not available, write to [email protected] asking whether the Library can contribute to the cost of data acquisition. The reply of the Library Director should be included in the ESR funding application. The application form and full details are provided to all EUI researchers and fellows by email at the launch of the call. Each academic unit ranks the proposals and a Selection Committee decides on financial support.
1.f Access to data at other institutions and open data resources
EUI members who require access to restricted data at another institution (data centre, statistics agency, university, archive) should apply for access early in their research projects because application periods can be lengthy. Some sociological, qualitative and micro geo-coded data can only be accessed at issuers’ secure on-site facilities. See Section 2.f below for details of the Library’s third-party data contract support. EUI members who require access to unpublished datasets (eg. underlying data associated with a publication) should contact the Library before writing to data creators/owners. In some instances, it may be possible for EUI members to obtain access via library consortia. Scholars, government agencies and international organisations increasingly share data, codebooks and software via the internet. Open research datasets can be located via Google Dataset Search and the re3data registry of research data repositories, which is indexed by discipline, sub-discipline, data type and host location.
2. DATA GENERATION, DATA PROTECTION, DATA PROCESSING AND ETHICAL USE
This section provides information on data protection requirements for scholars who generate and process sensitive data; the EUI Ethics Committee review procedure; anonymisation and pseudonymisation techniques; the Library Secure Data Room; and Library support for third-party data contracts.
2.a Data protection
Data Protection at the EUI is governed by President’s Decision No.10/2019 which was introduced following the adoption of the EU General Data Protection Regulation (GDPR). The EUI Guide to Good Data Protection Practice in Research provides further details of data protection at the EUI. Special terms and conditions apply to access and use of personal data, including micro-socioeconomic and qualitative data. This reflects the sensitive nature of observations about human subjects, families and households. Data protection provisions apply to (i) personal data generated by scholars for research purposes (ii) licensed micro-socioeconomic data provided to EUI members by the Library and (iii) data provided directly to EUI members under third-party contract.
EUI principal investigators are responsible for compliance with the laws and regulations governing data protection and database copyright. In collaborative projects, the P.I. may delegate such responsibilities to a member of the research team. Scholars are subject to personal data protection regulations when generating, processing, collecting and storing personal data. When using pre-existing datasets, scholars must not attempt to identify any individual, family or household. The processing of sensitive data relating to health, racial/ethnic origin, religious beliefs, political opinions, genetic/biometric observations, sexual orientation, memberships &c. requires even stricter safeguards. The EUI Guide to Good Data Protection Practice in Research states: “As a rule, the processing of sensitive data is prohibited. However, Article 8 of the EUI’s Data Protection Policy provides for specific circumstances which allow for the processing of sensitive data. The most common in research is upon the data subject’s explicit consent” (p.9). Persons, families and households must not be identifiable in any subsequently published research dataset. It must not be possible to identify human subjects by combining observations with external sources (eg. using triangulation techniques). Anonymisation and pseudonymisation are described in Section 2.d below.
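A common screening heuristic for the triangulation risk described above is k-anonymity: every combination of indirectly identifying variables (quasi-identifiers) should be shared by at least k records. The Python sketch below is illustrative only; the variable names, sample records and threshold are hypothetical, and passing a k-anonymity check does not by itself guarantee that subjects cannot be re-identified.

```python
from collections import Counter

def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(record[q] for q in quasi_identifiers) for record in records)

def smallest_group(records, quasi_identifiers):
    """Return the dataset's 'k': the size of the smallest group of records."""
    return min(group_sizes(records, quasi_identifiers).values())

def risky_combinations(records, quasi_identifiers, k):
    """List quasi-identifier combinations shared by fewer than k records."""
    counts = group_sizes(records, quasi_identifiers)
    return [combo for combo, n in counts.items() if n < k]

# Hypothetical survey extract: region, birth year and occupation could be
# linked to external sources to single out a respondent.
survey = [
    {"region": "Tuscany", "birth_year": 1980, "occupation": "teacher"},
    {"region": "Tuscany", "birth_year": 1980, "occupation": "teacher"},
    {"region": "Tuscany", "birth_year": 1980, "occupation": "teacher"},
    {"region": "Liguria", "birth_year": 1955, "occupation": "surgeon"},
]
qi = ["region", "birth_year", "occupation"]
print(smallest_group(survey, qi))         # 1
print(risky_combinations(survey, qi, 3))  # [('Liguria', 1955, 'surgeon')]
```

Combinations flagged in this way can then be coarsened with the anonymisation techniques in Section 2.d before any dataset is shared.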
When collecting, processing, using and storing personal data, scholars are responsible for obtaining and documenting the informed consent of subjects. Consent must be intelligible; it should refer clearly and precisely to the scope and the consequences of the data processing, as well as the retention period. EUI scholars can use the sample consent form. Where written consent is not possible, audio or video recording may be used. Human subjects must be informed of their rights, and scholars should consider risks to data subjects before embarking on personal data gathering (eg. via interviews or questionnaires).
2.b EUI Ethics Committee review: processing and use of sensitive data
The collection and processing of personal data for research purposes may require EUI scholars to apply for Ethics Committee review before data collection and processing can begin. Principal investigators should consult the online Initial Checklist and the supporting documentation on the EUI information page for Ethics and Integrity in Academic Research. EUI members should initiate the review via the link on that page (intranet login required) before data collection begins, because approval cannot be granted retroactively. The Ethics Committee review can take up to six weeks.
EUI members requiring access to the survey tool Qualtrics should write to the HelpDesk to request an individual account. Users should read the EUI Declaration on the Processing of Data via Surveys before collecting personal information. In some cases, Ethics Committee review may be necessary before a survey can begin.
Since March 2022, EUI members have had access to the online Collaborative Institutional Training Initiative (CITI) platform, which provides online training on research ethics including: Human subjects research; Responsible conduct of research; Observational research protocols; General Data Protection Regulation (GDPR) for research; and Essentials of grant proposal development. EUI members can create a CITI account (username and password) via the EUI intranet (login required), selecting ‘European University Institute’ from the drop-down menu.
2.c Information Security Risk Assessment
EUI scholars and principal investigators who require ICT Service approval for information security should complete the Information Risk Management Questionnaire (EUI sign-on required) and send it to the Data Security Officer (DSO) via [email protected]. Information Security Guidelines regarding compliance with data protection requirements, and the checklist of security controls, are also provided. The DSO uses this information to assess the project risk level and to advise on the appropriate safeguards to implement.
2.d Anonymisation and pseudonymisation
Throughout the research data cycle, scholars are responsible for preserving the confidentiality of human subjects, families and households. Sensitive data must be anonymised so that persons, families and households cannot be directly or indirectly identified. Anonymisation techniques include: the removal of direct identifiers from datasets (names, workplaces, telephone numbers &c.); the aggregation, banding or reduction of the precision of variables (eg. year of birth rather than full date of birth); the reporting of general levels of observation (eg. replacing specific geo-codes with general locations); the generalisation of categories (eg. sector of employment rather than precise employment function); and the removal of the upper or lower ranges of observations.
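The techniques listed above can be sketched in Python for a single survey record. The sketch assumes hypothetical variable names, a hypothetical job-title-to-sector lookup and an arbitrary income cap; for the upper range it caps values (top-coding) rather than dropping those observations, which is one common variant. Real anonymisation must be designed around the specific dataset and its re-identification risks.

```python
from datetime import date

# Hypothetical variable names; adapt to the dataset being anonymised.
DIRECT_IDENTIFIERS = {"name", "workplace", "telephone"}
# Hypothetical lookup from precise job titles to broad sectors.
SECTOR_OF = {"primary school teacher": "education", "cardiac surgeon": "health"}

def anonymise(record, income_cap=150_000):
    """Return an anonymised copy of one survey record."""
    # 1. Remove direct identifiers.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # 2. Reduce precision: full date of birth -> year of birth.
    out["birth_year"] = out.pop("date_of_birth").year
    # 3. Report a more general location: NUTS 3 code -> NUTS 2 parent
    #    (NUTS codes are hierarchical, so the parent is the first 4 characters).
    out["nuts2"] = out.pop("nuts3")[:4]
    # 4. Generalise categories: precise job title -> sector of employment.
    out["sector"] = SECTOR_OF.get(out.pop("job_title"), "other")
    # 5. Top-code the upper range so extreme incomes cannot single anyone out.
    out["income"] = min(out["income"], income_cap)
    return out

respondent = {
    "name": "Jane Doe", "workplace": "Example Hospital", "telephone": "055 000000",
    "date_of_birth": date(1971, 4, 23), "nuts3": "ITI14",
    "job_title": "cardiac surgeon", "income": 410_000,
}
print(anonymise(respondent))
# {'income': 150000, 'birth_year': 1971, 'nuts2': 'ITI1', 'sector': 'health'}
```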
Pseudonymisation can be used when it is deemed necessary to retain personal data for future processing (eg. a follow-up survey). Pseudonymisation entails the substitution of personally identifiable information with a unique identifier not connected to the human subject’s real identity, using unique, randomly generated, numeric or alphanumeric identifiers. The data and the key file must be kept separate and secure at all times. Pseudonymised data must be treated as personal data in accordance with GDPR. The use of cloud computing services must not reduce the level of data protection. Data can be encrypted with a tool such as VeraCrypt (the maintained successor to TrueCrypt, whose development was discontinued in 2014).
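A minimal pseudonymisation sketch in Python follows, assuming records are dictionaries with a hypothetical identifying field `name`. It uses the standard-library `secrets` module so that pseudonyms are random rather than derived from the identity (a plain hash of a name could be reversed by guessing names), and it accepts an existing key so that a follow-up survey reuses the same pseudonyms.

```python
import secrets

def pseudonymise(records, id_field="name", key=None):
    """Replace the identifying field of each record with a random pseudonym.

    Returns (pseudonymised_records, key), where key maps pseudonym -> real
    identity. Pass the key from an earlier wave to reuse its pseudonyms.
    The key must be stored separately from the data and kept secure."""
    key = dict(key or {})
    pseudonym_of = {real: pid for pid, real in key.items()}
    output = []
    for record in records:
        real = record[id_field]
        if real not in pseudonym_of:
            pid = secrets.token_hex(8)  # 16 random hexadecimal characters
            while pid in key:  # guard against an (unlikely) collision
                pid = secrets.token_hex(8)
            pseudonym_of[real] = pid
            key[pid] = real
        row = {k: v for k, v in record.items() if k != id_field}
        row["pid"] = pseudonym_of[real]
        output.append(row)
    return output, key

wave1 = [{"name": "Jane Doe", "answer": 4}, {"name": "Ali Rossi", "answer": 2}]
data1, key = pseudonymise(wave1)
# Follow-up survey: the same respondent keeps the same pseudonym.
wave2 = [{"name": "Jane Doe", "answer": 5}]
data2, key = pseudonymise(wave2, key=key)
print(data2[0]["pid"] == data1[0]["pid"])  # True
```

Only the pseudonymised records would go into the analysis files; the key would be written to a separate, encrypted location with restricted access.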
2.e Library Secure Data Room and security protocols
The EUI Secure Data Room is located on the upper floor of the Library, Badia Fiesolana (BF-169). The Room is only accessible to scholars processing sensitive data, or using data from providers whose access contracts explicitly require that data be accessed and used in a locked room. The room is not for ‘general’ data access and use. New EUI data users who require access to the Secure Data Room should send a copy of their data access contract to [email protected]. Authorised users can book a computer (Monday to Saturday) via the Library’s online form. New users can select from Workspaces #1, #2, #3 or #6. (Workspaces #4 and #5 are reserved for designated datasets.) To access the Secure Data Room, scholars should enter the Library via the Badia loggia, ascend to the first floor and show their EUI card at the Library Loan Desk, where the key will be issued. The key must be returned to the Loan Desk at the end of the working session.
2.f Third-party data access contract support
Many third-party data providers require individual end-user contracts. The Library provides support for data access applications, contract workflow and data management. Send the following information to [email protected]:
- Name of data issuer/publisher
- Title of dataset, with internet link (if available)
- Names of EUI and external research team members, with email contacts
- What special conditions are required for access to the data (eg. remote protocol)?
- What special conditions are required for use of the data?
- Does the data issuer require a contract counter-signature from the EUI DPO?
- Does the data issuer require that the data be used in the EUI’s Secure Data Room?
- Does the data issuer require that the data be used on a stand-alone, off-line computer?
- For projects with non-EUI collaborators: What controller/processing operations will be undertaken by research partners?
Contracts may require the counter-signature of an EUI administrator - normally the Library Information Specialist for Economics or the Data Protection Officer. The contact person for data protection enquiries is normally the Data Controller of the EUI Department or Centre (i.e. Head of Department/Centre). For highly sensitive data, the counter-signature of the EUI ‘authorised representative’ - President, Secretary General or delegate - may be required.
Data provided by third parties can be elaborated on a provider’s infrastructure, or (where permitted) can be transferred to the EUI. When sensitive personal data is transferred to the EUI, users have to complete the EUI ‘Notification of Processing Operations’ form, available from the office of the EUI Legal Advisor. In cases of joint research collaborations between EUI members and scholars at other institutions (eg. in the context of research consortia) it is important to distinguish between the ‘data controller’ function and the ‘data processor’ function - and to indicate the division of labour between scholars at the different institutions. The General Data Protection Regulation distinguishes between ‘data controller’ (eg. principal investigator) and ‘data processor’ (eg. PhD researcher). The GDPR defines ‘controller’ as a “natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data” (Article 4(7)). The GDPR defines ‘processor’ as a “natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller” (Article 4(8)). If required by the data issuer, the EUI ICT Data Security Officer can provide a description of ICT security protocols and infrastructure: [email protected]. Further information about the contract support service is available by writing to [email protected].
3. DATA MANAGEMENT PLANS
Data management plans (DMPs) are documents normally required by research funders at the beginning, mid-point and end of research projects: an initial DMP, a revised DMP at the mid-point and a definitive DMP as a project deliverable. EUI members who are required to submit a data management plan should contact the Library for assistance.
3.a Generating a data management plan (DMP)
The principal investigator, or project manager, or a delegated member of the research team, is responsible for writing and delivering the data management plan. The following supporting materials - provided by the EUI Library, the Legal Office, the Academic Service and the Data Protection Office - should be consulted during the preparation of EUI DMPs:
Scholars preparing a data management plan should carefully follow the instructions provided by the relevant science programme office (eg. Horizon Europe, European Research Council &c.). Where a DMP template is provided, this must be used, for example: the Horizon Europe DMP template in the Horizon Europe Programme Guide or the ERC DMP template in the ERC Work Programme.
The following sections provide advice and prompts which can be used when preparing a data management plan. These guidelines are general in nature. Every individual data management plan must be a unique document, grounded in the individual research project. It is the responsibility of the P.I. to write and deliver a comprehensive and accurate data management plan. The Library provides feedback and an editing service for DMPs. Data management plans describe:
- How data is collected and generated
- How data is used, elaborated and organised
- How data, and data subjects, are protected
- How data, code and ancillary elements are described and documented
- How data is stored and secured, and how long it will be retained
- How dataset authorship and credit are assigned
- How data is preserved
- Whether, how and under what terms project data outputs can be shared.
The FAIR data principles - to make data findable, accessible, interoperable and reusable - must be considered during the preparation and revision of data management plans. In international collaborative projects, the name of the person who has final authority regarding decisions on credit for dataset creation and the sharing of data outputs should be indicated. Unless otherwise stated, the principal investigator undertakes this role. It is important to update DMPs throughout the research project, incorporating new information about data collection, data generation, methodologies and relevant changes to the composition of the research team or consortium.
3.b EUI support for principal investigators and project managers
EUI principal investigators and project managers can use the general prompts below when compiling a data management plan. If the science funder provides a DMP template, this should be used. The data management plan must include detailed, project-specific information before submission to the science funder. During preparation of the DMP, EUI members may need to liaise with Library staff, the Data Protection Office, the Academic Service (if Ethics Committee approval is required) and/or the Legal Office. The Library provides feedback and an editing service for DMPs: [email protected]. The following elements should be provided in a data management plan:
Overview of research project
- Provide an overview of the research project, indicating discipline/sub-discipline, scope, methodology and primary research questions
- List the types of data being collected, generated, processed and used.
Anticipated data outputs and utility
- Describe expected data outputs
- Distinguish (where applicable) between (i) in-project data which will not be shared and (ii) post-project research dataset outputs (public or embargoed)
- Describe expected community utility, eg. for researchers, policymakers, media organisations and the general public
- Describe possible inter-disciplinary use
- Describe data quality assurance processes.
Resources required for data collection, generation, processing and use
- Provide a detailed description of data collection, generation, processing and use
- Describe anticipated technical and research assistance requirements and data support
- Give an overview of the data management responsibilities of research project leaders, team members and any institutional partners.
Data security, infrastructure and protocols
- EUI members should write to [email protected] to request the standard Library/ICT description of data security, infrastructure and protocols. This internal document describes layered physical and network security, authentication protocols and other elements for inclusion in data management plans.
- Describe data protection measures in place to scrupulously guarantee that persons, families and households are not identifiable in any dataset outputs
- Pre-project EUI Ethics Committee review may be required for survey, ethnographic, qualitative, experimental and other data collection, generation and processing
- The input of the EUI Data Protection Office and/or Legal Office may be required
- List pre-existing data resources to be used, and acknowledge database copyright compliance.
FAIR data principles: making data (i) Findable (ii) Accessible (iii) Interoperable and (iv) Reusable
Making data Findable
- Identify the repository where data outputs will be preserved and, where possible, made openly available; eg. the EUI research repository, Cadmus
- Describe how folders, files, variables and versions will be consistently named to aid discovery
- Datasets should be assigned accurate and consistent metadata to aid findability and machine-retrievability. EUI members should refer to Section 6.c of this Guide for metadata schema details. EUI Cadmus repository metadata are indexed for findability
- Datasets should have persistent, unique identifiers (eg. handles generated by the EUI research repository, Cadmus).
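As an illustration of consistent naming, the Python sketch below checks file names against a hypothetical convention, `<project>_<description>_<YYYYMMDD>_v<NN>.<ext>` in lower case without spaces, and derives the name of the next version. The convention itself is an assumption; any convention works as long as it is documented in the DMP and applied consistently.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<description>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<description>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)

def parse_name(filename):
    """Return the parts of a conforming file name as a dict, or None."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None

def next_version(filename, new_date):
    """Return the file name for the next version, dated new_date."""
    parts = parse_name(filename)
    if parts is None:
        raise ValueError(f"file name does not follow the convention: {filename}")
    version = int(parts["version"]) + 1
    return (f"{parts['project']}_{parts['description']}_"
            f"{new_date:%Y%m%d}_v{version:02d}.{parts['ext']}")

print(parse_name("migr_interviews_20220315_v02.csv"))
print(parse_name("Final data NEW.xlsx"))  # None
print(next_version("migr_interviews_20220315_v02.csv", date(2022, 6, 1)))
# migr_interviews_20220601_v03.csv
```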
Making data Accessible
- Distinguish between in-project data which will remain closed (or will be destroyed) after the project, and data outputs which will be preserved and (where possible) openly shared after the project
- For restricted data, state reasons for restricted status (eg. personal data protection and/or database copyright compliance)
- For sharable data, the status ‘open’ (accessible immediately) or ‘embargoed’ (accessible at a later date) should be assigned
- Describe data protection and anonymisation measures in detail
- State how the data can be accessed, indicating the software packages/tools required to access the data
- Metadata records are indexed in the EUI research repository, Cadmus, which embeds machine-readable metadata for effective indexing by major search engines and discovery services (eg. Google Scholar, OpenAIRE and CORE) and exposes metadata via application programming interfaces (APIs).
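The distinction between restricted, open and embargoed outputs can be recorded as a simple rule. In the Python sketch below, the function name, its parameters and the status labels are hypothetical; they follow the categories described above, and the fields actually used by Cadmus may differ.

```python
from datetime import date

def access_status(restricted, embargo_until=None, today=None):
    """Classify a data output as 'restricted', 'embargoed' or 'open'.

    restricted: True if the output cannot be shared (eg. personal data or
        licensed data subject to database copyright).
    embargo_until: first date on which a shareable output becomes
        accessible; None means it is accessible immediately.
    today: the reference date (defaults to the current date)."""
    today = today or date.today()
    if restricted:
        return "restricted"
    if embargo_until is not None and today < embargo_until:
        return "embargoed"
    return "open"

print(access_status(restricted=True))  # restricted
print(access_status(False, date(2025, 1, 1), today=date(2024, 6, 1)))  # embargoed
print(access_status(False, date(2025, 1, 1), today=date(2025, 1, 1)))  # open
```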
Making data Interoperable
- Provide details of any standard controlled vocabularies (ontologies, thesauri and taxonomies) from the discipline (social sciences, ethnography, economics &c.)
- Variable schemas should be readable by standard software packages to facilitate data interoperability
- Standard metadata and naming conventions should be used
- The EUI research repository, Cadmus, uses web standards (the underlying DSpace software is Java-based), providing both human- and machine-readable interfaces to search, discover and access reposited data.
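A plain CSV file with a header row can be read by standard packages such as R, Stata, SPSS and Python, and a separate codebook can document each variable. The Python sketch below, using only the standard library, exports both; the variable names, labels and value codes are hypothetical, and the JSON codebook layout is an assumption rather than a formal metadata standard.

```python
import csv
import io
import json

def export_with_codebook(rows, codebook):
    """Return (csv_text, codebook_json) for a list of row dictionaries.

    Every variable in the codebook must be present in every row, and
    csv.DictWriter raises ValueError for any variable not in the codebook."""
    columns = list(codebook)
    for row in rows:
        missing = [c for c in columns if c not in row]
        if missing:
            raise ValueError(f"row lacks documented variables: {missing}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # extrasaction='raise' is the default
    return buffer.getvalue(), json.dumps(codebook, indent=2)

codebook = {
    "pid": {"label": "Pseudonymous respondent identifier", "type": "string"},
    "trust_eu": {"label": "Trust in EU institutions", "type": "integer",
                 "values": {"1": "no trust", "5": "complete trust"}},
}
rows = [{"pid": "a1f3", "trust_eu": 4}, {"pid": "9bc0", "trust_eu": 2}]
csv_text, codebook_json = export_with_codebook(rows, codebook)
print(csv_text)
print(codebook_json)
```

Keeping the codebook in a separate, plain-text file means it can be reposited alongside the data and read without any particular statistics package.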
Making data Reusable
- Open data generated by the research project should be made available under an open licence that clearly states reuse conditions: either the Creative Commons Attribution 4.0 International licence (CC BY 4.0) or the Creative Commons CC0 1.0 Universal public domain dedication (CC0)
- Supporting documentation and codebooks (PDF/A documents and ‘readme.txt’ files) should be reposited with the data, to make the observations comprehensible and reusable by other scholars and stakeholders
- Provide information about any tools or instruments necessary to reuse or verify the data (software, algorithms, models, routines &c.)
- Note any data embargo period necessary after the research project
- Note any time limit on retention of the data.
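The reuse information listed above can also be captured in a plain 'readme.txt' generated alongside the data. The Python sketch below assumes a hypothetical set of fields and layout; it uses the SPDX short identifiers for the two licences named above (CC-BY-4.0 and CC0-1.0).

```python
from datetime import date

# SPDX identifiers and names for the two recommended open licences.
LICENCES = {
    "CC-BY-4.0": "Creative Commons Attribution 4.0 International",
    "CC0-1.0": "Creative Commons Zero v1.0 Universal",
}

def make_readme(title, creators, licence, tools, embargo_until=None, retain_until=None):
    """Return the text of a 'readme.txt' describing reuse conditions."""
    if licence not in LICENCES:
        raise ValueError(f"not one of the recommended open licences: {licence}")
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        f"Licence: {LICENCES[licence]} ({licence})",
        f"Tools needed to reuse or verify the data: {', '.join(tools)}",
        f"Embargo until: {embargo_until.isoformat() if embargo_until else 'none'}",
        f"Retain until: {retain_until.isoformat() if retain_until else 'no time limit'}",
    ]
    return "\n".join(lines) + "\n"

print(make_readme("Survey on trust in EU institutions, wave 1",
                  ["Doe, Jane", "Rossi, Ali"], "CC-BY-4.0", ["R 4.x", "Stata 17"],
                  embargo_until=date(2025, 1, 1)))
```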
EUI research data guidelines and standards
EUI members can cite these institutional guidelines in their data management plans:
3.c Data management planning tools
Online tools can be used to prepare structured data management plans - complying with EU Horizon Europe, European Research Council, and other science funder DMP requirements. The Library provides support for these tools:
- DMPonline (maintained by the Digital Curation Centre) can be accessed via ‘Create Account’ at the upper right of the DMPonline homepage. For organisation enter ‘other’. Select the science funder's template. The principal investigator (P.I.) should be identified in the data management plan. For research teams, the P.I. can assign DMP sharing rights by entering colleagues’ email addresses and assigning the status of ‘co-owner’, ‘editor’ or ‘read only.’ Enter project details (title, abstract, &c.) and click on ‘Initial DMP’ in the top menu. Complete the sub-fields to generate the data management plan. A Word document can be generated. If the science funder provides a DMP template (eg. with logo), the text generated in DMPonline can be transferred into the funder's template.
- The Research Data Management Organiser (RDMO) is an online tool, available in German and English, to support research data management. RDMO was developed with funding from the Deutsche Forschungsgemeinschaft (DFG).
- Argos is an online DMP creation tool maintained by OpenAIRE. Create an account on the Argos platform, click ‘Launch Wizard’ and follow the step-by-step instructions.
- The DMP Evaluation Rubric developed by Science Europe provides core criteria which can be used by principal investigators and project managers to evaluate data management plans.
- FAIRsharing is a data support service of the Data Readiness Group and the Bodleian Library, University of Oxford, providing resources for data and metadata standards and research data management.
- The F-UJI Automated FAIR Data Assessment Tool, maintained by the Fostering FAIR Data Practices in Europe project, helps scholars comply with the FAIR Guiding Principles for scientific data management, which ensure that data are, and remain, findable, accessible, interoperable and reusable (FAIR).
- FAIR-Aware maintained by the Data Archiving and Networked Services Institute is an online checklist tool to help researchers build compliance with the FAIR data principles.
- The DMP Catalogue maintained by LIBER, provides examples of research data management plans.
4. RESEARCH DATA MANAGEMENT IN HORIZON EUROPE, ERC AND OTHER EU PROGRAMMES
EUI principal investigators and project managers who are preparing EU Horizon Europe and European Research Council (ERC) grant proposals are required to submit preliminary information about data management provisions in the research project proposal. The Horizon Europe Programme Guide and the ERC Work Programme provide guidance for project proposers and details of data management obligations for principal investigators.
Data management sections of project proposals are evaluated by the European Commission under the criterion ‘impact’, which includes scientific and societal benefit. When completing the general information section of EU funding applications, principal investigators and project managers should address: (i) What types of data will the project generate/collect? (ii) What standards will be used? (iii) How will the data be exploited and/or shared for verification and re-use? (iv) If data outputs cannot be made available, why not? (v) How will the data be curated and preserved? The EUI Library assists principal investigators and project managers with both the data management components of funding proposals and the subsequent data management plans (Section 3 above).
In the Horizon Europe programme, open science - which includes open access to research data - is evaluated under the ‘excellence’ criterion. The European Commission’s recommendation for sharing research data outputs is: “As open as possible; as closed as necessary”. The Horizon Europe programme requires grant beneficiaries to deposit open research data outputs in a research data repository which is federated with the European Open Science Cloud (EOSC). The EUI research repository, Cadmus, can be used for the reposit of EUI Horizon Europe and ERC project data outputs. The EC guidelines state that “as far as possible, projects must then take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data. One way of doing this is to attach Creative Commons Licences (CC BY or CC0) to the data reposited” [Grants Manual p.10]. Project results can also be shared via the Horizon Results Platform.
The European Open Science Cloud portal was launched in 2018. The EOSC portal enables the discovery of data and provides tools for data analysis, as well as information about storage, computation, training and security. Content and services are available via the EOSC catalogue. The European Commission has published two supporting documents: Prompting an EOSC in Practice, which explains the governance of the service, and Turning FAIR into Reality, which explains how to make research data ‘findable, accessible, interoperable and reusable.’ The European University Institute endorsed the European Open Science Cloud Declaration in 2017. The Social Sciences and Humanities Open Cloud (SSHOC) project was launched in 2019 to explore solutions for greater interoperability between data repositories in the social sciences and the humanities. Forty-seven institutional partners are involved in the 2019-2022 project phase, under the coordination of the Consortium of European Social Science Data Archives (CESSDA).
5. MANAGING DATA DURING THE RESEARCH PROJECT CYCLE
Data should be carefully managed throughout the duration of the research project. Particular attention should be given to data input, quality control, dataset design, data protection, data security and backup during the project. Early in the research project, scholars should define dataset structure, file structure and variable naming. Documentation and codebooks should be updated throughout the research project. The following guides are useful for scholars working in the social sciences and humanities: the CESSDA Data Management Expert Guide, the Finnish Social Science Data Archive Guide, the JISC Research Data Management Toolkit and the UKDS Prepare and Manage Data Guide.
5.a Data input and quality control
The collection and generation of data should be informed by high standards of quality control. Datasets generated from fieldwork must be accurate, complete and consistent, and must adhere to data protection legislation and the ethical best practices of the discipline or sub-discipline. During the research project, scholars should keep an accurate record of data inputs and data generation. Scholars undertaking surveys and experiments should ensure that anonymisation techniques are documented. Where research datasets are partially based on pre-existing resources, scholars should ensure that the underlying sources are of high quality and that there are no infringements of copyright. Write to [email protected] for assistance.
Scholars must comply with the data protection requirements described in Section 2 above. Sensitive data must be anonymised so that persons, families and households cannot be directly or indirectly identified in any research data outputs. In those cases where it may be necessary to retain personal data for future research use, scholars may use pseudonymisation techniques (Section 2.d above). Scholars must comply with any end-of-project data destruction requirements indicated by providers (Section 5.e below).
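As an illustration of pseudonymisation, the minimal Python sketch below replaces a direct identifier with a keyed hash (HMAC-SHA256). It assumes that the secret key is stored separately from the research data, so that only the key holder can re-link records; the function names, the `P_` prefix and the example records are hypothetical, and the technique actually applied should follow Section 2.d. Note that pseudonymised data remain personal data under the GDPR.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for a direct identifier.

    The same identifier and key always give the same pseudonym, so records
    can still be linked across files; without the key the pseudonym cannot
    practically be reversed.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # A truncated hex digest keeps the pseudonym short enough for a dataset column.
    return "P_" + digest.hexdigest()[:12]


def pseudonymise_records(records, fields, secret_key):
    """Replace the named identifier fields in each record (a list of dicts)."""
    out = []
    for record in records:
        clean = dict(record)
        for field in fields:
            if field in clean:
                clean[field] = pseudonymise(str(clean[field]), secret_key)
        out.append(clean)
    return out


if __name__ == "__main__":
    # Hypothetical key: in practice, generate it randomly and keep it
    # in separate, access-controlled storage.
    key = b"replace-with-a-random-secret"
    respondents = [
        {"name": "Maria Rossi", "country": "IT", "age": 34},
        {"name": "Jan de Vries", "country": "NL", "age": 51},
    ]
    for row in pseudonymise_records(respondents, ["name"], key):
        print(row)
```

Truncating the digest to 12 characters makes accidental collisions very unlikely for survey-sized datasets; a longer slice can be kept if in doubt.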
5.b Secure storage and network backup during the research project
During research projects it is important to keep data secure at all times. The EUI ICT Service maintains a strong password policy to prevent the loss, exposure, or corruption of sensitive information. The ICT Service advises that the personal G:\ network drive should always be used to store work-related data. The drive is backed up centrally, and can be accessed off-campus via VPN. When collaborating with other EUI scholars, a shared drive may be used (eg. S:\). Contact the local ICT Site Office for details of drive paths, which vary across EUI departments and centres. Full details about EUI drive quotas are provided by the ICT Service. EUI members also have a personal C:\ drive for storing non-work related data (photos &c.). Note that the C:\ drive is not backed up centrally. The EUI protects computing devices by applying patches and performing frequent data backups. Computers are protected by anti-malware solutions which are updated regularly to prevent malware infections. Preliminary data findings and associated documentation should be kept in locked storage when not in use.
5.c Structure, folders, files, variables, format and versioning
The design of research datasets should be carefully considered at the outset of the project. Dataset design varies by discipline, types of data and variables, medium of data, units of analysis, methodology, relationship between data elements, and whether or not the dataset is part of a series. Clear and consistent metadata for folders, files, variables and versioning helps make research data findable, accessible, interoperable and re-usable, in line with the FAIR data principles.
The folder structure of research datasets (eg. hierarchical/horizontal) should be considered early in the project. Qualitative datasets containing text, interviews, images &c. may require individual files for every element. File names should be standardised and consistent, eg. date, descriptor, version. Variables, such as age, country and sex, should be clearly tagged, avoiding special characters and spaces. Temporary identifiers should be removed from the schema. Files should be systematically named using a standardised and consistent versioning system, eg. YYYY-MM-DD. Changes to the structure should be documented. The software format of the dataset should facilitate flexible use of the data. Scholars using one format during a research project may consider a different format for preservation, taking open source accessibility into account. Where possible, proprietary formats should be avoided to prevent future file obsolescence, eg. .rtf instead of .doc/.docx; .tif instead of .jpg; .flac instead of .mp3 &c. Details of how to submit dataset outputs to the EUI research repository, Cadmus, are in Section 6 below.
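The naming conventions above can be sketched in Python. This is a minimal illustration, assuming a hypothetical file-name pattern of ISO date (YYYY-MM-DD), descriptor and two-digit version number (eg. `2022-03-15_survey-wave-1_v02.csv`); the pattern and the helper names are illustrative, not an EUI standard.

```python
import re
import unicodedata
from datetime import date


def clean_token(text: str) -> str:
    """Lower-case ASCII token: accents stripped, spaces and special characters replaced."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "_", ascii_text.lower()).strip("_")


def data_file_name(day: date, descriptor: str, version: int, extension: str) -> str:
    """Build a file name of the form YYYY-MM-DD_descriptor_vNN.ext."""
    descriptor = clean_token(descriptor).replace("_", "-")
    return f"{day.isoformat()}_{descriptor}_v{version:02d}.{extension.lstrip('.')}"


def variable_name(label: str) -> str:
    """Turn a human-readable label into a variable name without spaces or special characters."""
    name = clean_token(label)
    # Many statistical packages do not allow names that start with a digit.
    return name if name and not name[0].isdigit() else "v_" + name


print(data_file_name(date(2022, 3, 15), "Survey Wave 1", 2, "csv"))
print(variable_name("Année de naissance"))
```

Using the ISO date first means that files sort chronologically in any file browser, which supports the versioning practice described above.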
5.d Documentation and codebooks
Clear and accurate documentation should be provided about the purpose, context and methodology of the research project and the data outputs. Good documentation helps make datasets findable, accessible, interoperable and re-usable (FAIR principles). Documentation should include a detailed description of the dataset, providing information about structure, folders, files, variables and versioning. Information about definitions, variable names, problematic values, missing observations and weightings should be included. A concise note on methodology should be provided, along with information on how and when the data was collected or generated, and elaboration techniques (sub-setting, combining &c.). At the end of the research project, codebooks, questionnaires and data dictionaries should be reposited with the data.
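Part of a codebook can be generated directly from the data. The sketch below assumes a CSV file with a header row and treats empty cells and a hypothetical missing-value code (`-99`) as missing; for each variable it reports the number of valid and missing observations and an example value. Definitions, weightings and methodological notes still have to be written by the researcher.

```python
import csv
import io

MISSING_CODES = {"", "-99"}  # hypothetical missing-value codes; adapt to the dataset


def codebook_summary(csv_text: str):
    """Return one tuple per variable: (name, valid count, missing count, example value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = reader.fieldnames or []
    stats = {v: {"valid": 0, "missing": 0, "example": None} for v in variables}
    for row in reader:
        for v in variables:
            # Short rows yield None for absent cells; treat them as empty.
            value = (row.get(v) or "").strip()
            if value in MISSING_CODES:
                stats[v]["missing"] += 1
            else:
                stats[v]["valid"] += 1
                if stats[v]["example"] is None:
                    stats[v]["example"] = value
    return [(v, s["valid"], s["missing"], s["example"]) for v, s in stats.items()]


sample = "id,age,country\n1,34,IT\n2,-99,NL\n3,,\n"
for name, valid, missing, example in codebook_summary(sample):
    print(f"{name}: {valid} valid, {missing} missing (e.g. {example})")
```

For a file on disk, the text can be read with `open(path, newline="", encoding="utf-8")` and passed to the same function; the resulting table can then be included in the PDF/A documentation.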
5.e Deletion of sensitive data at the end of the research project
At the end of a research project, it may be necessary to permanently delete data used in the elaboration phase. Many micro-socioeconomic data providers (eg. Eurostat) require deletion of work-files and non-aggregate outputs. EUI members can use software such as Disk Wipe, which performs byte-by-byte deletion of data, ensuring that data on the drive cannot be retrieved. If the provider of the underlying data requires a compliance statement, the name of the software used, the location of the device, the name of the drive and the time of deletion should be recorded. Contact the EUI ICT Service for assistance with deletion software, and the Library for further information about compliance.
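Where a compliance statement is required, the four items listed above (software used, location of the device, drive name and time of deletion) can be recorded in a consistent form. The sketch below is a hypothetical helper that writes them as a JSON record with a timezone-aware timestamp; it does not delete anything itself (deletion should be done with the software recommended by the ICT Service), and the field names and example values are illustrative.

```python
import json
from datetime import datetime, timezone


def deletion_record(software: str, device_location: str, drive: str, deleted_at=None) -> str:
    """Return a JSON compliance record for a completed data deletion."""
    if deleted_at is None:
        deleted_at = datetime.now(timezone.utc)
    record = {
        "software": software,
        "device_location": device_location,
        "drive": drive,
        # ISO 8601 timestamp; a timezone-aware value makes the record unambiguous.
        "deleted_at": deleted_at.isoformat(timespec="seconds"),
    }
    return json.dumps(record, indent=2)


print(deletion_record("Disk Wipe", "Office BF-000 (hypothetical)", "D:\\"))
```

The printed record can be kept with the project files and, if the provider requests it, copied into the compliance statement.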
6. REPOSITING AND PRESERVING DATA IN THE EUI RESEARCH REPOSITORY CADMUS
EUI research data can be preserved in the research repository, Cadmus, which is an infrastructure for EUI research outputs, a platform for sharing research data and a solution for funder-mandated research data management compliance. The EUI workflow is explained in Bourke, T. (2022). Bibliographic Control of Research Datasets: reflections from the EUI Library, JLIS.It, 13(1), 321–334.
6.a Preparing data for reposit in Cadmus
Datasets presented for reposit in the EUI research repository, Cadmus, must be compliant with data protection regulations pertaining to persons, families and households. Research datasets which are partially based on pre-existing sources must not violate database copyright provisions. Library staff can advise on compliance. The European Commission published a report on Open Science and Intellectual Property Rights in 2022. EUI members submitting to the Cadmus repository should create a DATA folder and a DOCS folder. The DOCS folder should contain precise documentation about the methodology, context and discipline of the research project. Where applicable, codebooks should also be presented for reposit. Documentation should be in PDF/A format for preservation. Data should be submitted in the original file format version. Large datasets should be submitted in a compressed archival format (eg. .zip, .tgz). Subsets should be accommodated within the DATA folder, not submitted as multiple repository entries. If new iterations of a dataset are planned, scholars can consult with Library staff to determine whether these should be added to the original Cadmus entry (with updated date range) or reposited as a new Cadmus entry. Library staff upload the datasets and documentation after the online submission form has been submitted. The archiving of qualitative multi-media outputs (audio, transcripts &c.) is described in Section 8.e below.
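The folder layout described above (a DATA folder and a DOCS folder, compressed for large datasets) can be prepared with Python's standard library. This is a minimal sketch with hypothetical paths and a hypothetical function name; it only packages the files locally, and the transfer itself is arranged with Library staff.

```python
import shutil
from pathlib import Path


def package_dataset(data_files, doc_files, target_dir, archive_name):
    """Copy files into DATA/ and DOCS/ folders and compress them into one .zip archive.

    Returns the path of the created archive.
    """
    root = Path(target_dir) / archive_name
    (root / "DATA").mkdir(parents=True, exist_ok=True)
    (root / "DOCS").mkdir(parents=True, exist_ok=True)
    for f in data_files:
        shutil.copy2(f, root / "DATA")  # data files keep their original format
    for f in doc_files:
        shutil.copy2(f, root / "DOCS")  # e.g. PDF/A documentation and codebooks
    # make_archive appends ".zip"; root_dir/base_dir keep the top folder inside the archive.
    archive = shutil.make_archive(str(root), "zip", root_dir=target_dir, base_dir=archive_name)
    return Path(archive)
```

For example, `package_dataset(["survey.csv"], ["codebook.pdf"], "submission", "my_dataset")` would produce `submission/my_dataset.zip` containing `my_dataset/DATA/survey.csv` and `my_dataset/DOCS/codebook.pdf`. Subsets can be placed in sub-folders of DATA before packaging, so that the dataset remains a single repository entry.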
6.b Completing the dataset submission form
EUI researchers, fellows and professors, from any discipline, can submit research data outputs for reposit in the Cadmus repository. Datasets presented for inclusion must be the output of research by a current EUI member, or an EUI research team, or a team of researchers with at least one EUI member. The first step is to complete the online submission form. Library staff will use the submitted information to generate a formal metadata description in the Dublin Core schema, which will be used to generate the description page for the dataset in Cadmus. This process is essential to render the data findable, accessible, interoperable and reusable. Library staff will then make an appointment for data transfer. The name of the principal investigator, researcher(s), and - where appropriate - technical collaborator(s), must be provided. The email contacts of all collaborators must be provided. If the project is undertaken in the context of a consortium, the name of the data manager should be provided (if different from the principal investigator).
By submitting the online data submission form, the P.I. acknowledges that the dataset presented for inclusion in the Cadmus repository is the output of original data generation and/or collection, or the output of significant value-added elaboration of pre-existing sources. If the dataset is the output of original data collection and elaboration, details must be provided. If the dataset is derived from pre-existing sources, these sources must be listed (publisher, rights owner and/or institutional source). Creators of research data outputs which have been elaborated from pre-existing copyrighted sources may need to obtain the permission of rights owners before the research data output can be openly shared. Write to the Library for assistance. It is not possible to publish a dataset containing substantial portions of data sourced from pre-existing databases governed by contractual license. The EUI Library also assists scholars in repositing datasets in the discipline-specific data repositories indexed in the international re3data registry and by Google Dataset Search.
EUI members submitting data to the Cadmus repository should state whether or not the dataset can be shared as open data or will be subject to embargo. Library staff can assist. In some cases it may be possible to create a limited, or aggregated, public version of the dataset for open sharing. Embargo status can change over time.
6.c Metadata
Metadata are ‘data about data’ presented in a formal schema. Accurate metadata are necessary for the organisation, use, repositing, sharing and machine discovery of datasets. Throughout the research cycle, it is important to keep a detailed and updated record of data capture, generation and elaboration. An introduction to metadata standards for social science and humanities data is maintained by the Digital Curation Centre. Metadata should be consistent throughout the research project. Some research data outputs may require multi-lingual metadata. EUI Cadmus metadata are generated by Library staff, using information from the online submission form. These are the principal fields:
NAME(S) OF DATASET CREATOR(S)
The name, or names, of the scholars and technical collaborators who created the dataset must be provided. The name of the principal investigator must be given if the dataset has been created by a research team. If the project is undertaken in the context of a consortium, the name of the data manager must be provided (if different from the P.I.). Where researcher IDs are available (eg. ORCID) these should be provided.
The email contacts of the dataset creator(s) must be provided. Datasets presented for inclusion in the EUI Cadmus repository must be the output of research by a current EUI member, or an EUI research team, or a team of researchers with at least one EUI member.
TITLE OF DATASET
The title should precisely convey the nature and scope of the dataset. The title should not be identical to the project title, or a related publication title.
DESCRIPTION OF DATA
A meaningful abstract, describing the data, the purpose and scope of the research project, and the methodology, must be provided. The Library can help edit data abstracts.
SOURCE(S) OF DATA
The source(s) of the data must be provided. If the dataset has been generated during a research project, this should be indicated with details of data collection methods (eg. survey parameters). If the dataset is derived from pre-existing resources, these must be listed.
TYPE OF DATA
The type of data must be indicated, eg. statistical, textual, computational, experimental, simulational, observational &c.
YEAR OF COMPLETION OF DATASET
The year of completion of the dataset must be provided. If the dataset is part of a series, this should be indicated.
DATE-RANGE COVERAGE OF DATASET
The start- and end-dates of dataset coverage must be provided.
GEOGRAPHICAL COVERAGE OF DATASET
Where applicable, the geographical scope of the dataset (number of countries, sub-states, regions &c.) should be indicated.
FORMAT OF DATA
The file format and version must be given (.csv, Excel, .flac, .rtf, Stata 17, .tif, .txt &c.).
CODEBOOK / SUPPORTING DOCUMENTATION
Codebooks and supporting documentation should be provided in PDF/A format.
ACCESS STATUS OF DATA
The status of access to the data must be indicated. The status ‘open data’ should be assigned to datasets which can be shared on reposit. If the data is subject to embargo, the expiry date of the embargo should be indicated.
LICENSE
Dataset submitters should choose a license option from the drop-down menu: CC-BY international or CC0.
UNIQUE OBJECT IDENTIFIER
Library staff will assign a unique object identifier (handle) to the dataset for the purpose of discovery, linking and citation.
MULTI-LINGUAL DOCUMENTATION
Where applicable, multi-lingual documentation, tags, questionnaires and variable descriptions should be provided.
FUNDING
The name of funding bodies and research grant numbers should be provided, where applicable.
RELATED PUBLICATIONS
Bibliographical details of publications based on the dataset, if any, should be listed with links to abstracts and, where possible, full-texts. (The Library can also cross-index publications and datasets after reposit.)
PROJECTED FUTURE WAVES OF DATASET
In cases where it is intended to generate future iterations of the dataset, details should be provided.
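Library staff generate the formal Dublin Core description from the submission form, so submitters do not need to write it themselves. Purely to illustrate how the fields above relate to that schema, the sketch below maps some of them to Dublin Core elements in the `http://purl.org/dc/elements/1.1/` namespace. The field-to-element mapping, the field keys and the wrapping `metadata` element are assumptions for illustration, not the Cadmus schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from submission-form fields to Dublin Core elements.
FIELD_TO_DC = {
    "creators": "creator",
    "title": "title",
    "description": "description",
    "sources": "source",
    "type": "type",
    "year": "date",
    "coverage": "coverage",
    "format": "format",
    "license": "rights",
    "identifier": "identifier",
}


def to_dublin_core(fields: dict) -> str:
    """Serialise form fields as simple Dublin Core XML; list values repeat the element."""
    root = ET.Element("metadata")
    for key, dc_name in FIELD_TO_DC.items():
        values = fields.get(key)
        if values is None:
            continue  # fields not supplied are simply omitted
        if not isinstance(values, list):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_dublin_core({
    "creators": ["BADULESCU, Claudia", "TELLE, Stefan"],
    "title": "Differentiated integration and the EU member states",
    "year": 2021,
    "license": "CC BY",
}))
```

Repeating an element for each creator, rather than joining names into one string, keeps each name separately machine-readable, which is the point of a formal schema.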
The Cadmus repository generates standardised citations for datasets, eg:
BADULESCU, Claudia, TELLE, Stefan, Differentiated integration and the EU member states: salience and governmental positions, EUI Research Data, 2021, Robert Schuman Centre for Advanced Studies. Retrieved from Cadmus, European University Institute Research Repository, at: https://hdl.handle.net/1814/70895.
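The citation pattern in the example above can be reproduced from the metadata fields, which can be useful when citing several datasets in a publication. The function below is a hypothetical helper that mirrors the layout of that single example (surname in capitals, given name, title, ‘EUI Research Data’, year, EUI unit, handle URL); Cadmus generates the authoritative citation itself.

```python
def cadmus_style_citation(creators, title, year, unit, handle):
    """Format a dataset citation following the layout of the Cadmus example.

    creators: list of (surname, given_name) tuples, in order of authorship.
    handle: handle suffix such as "1814/70895".
    """
    names = ", ".join(f"{surname.upper()}, {given}" for surname, given in creators)
    return (
        f"{names}, {title}, EUI Research Data, {year}, {unit}. "
        "Retrieved from Cadmus, European University Institute Research Repository, "
        f"at: https://hdl.handle.net/{handle}."
    )


print(cadmus_style_citation(
    [("Badulescu", "Claudia"), ("Telle", "Stefan")],
    "Differentiated integration and the EU member states: salience and governmental positions",
    2021,
    "Robert Schuman Centre for Advanced Studies",
    "1814/70895",
))
```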
When submitting details of research publications to the EUI Cadmus repository, EUI scholars should cite datasets which support the research findings. Further information about data citation is provided by DataCite.
7. OPEN DATA AND OPEN SCIENCE
Scholars, government agencies and international organisations increasingly share datasets, codebooks and software via the internet. Major data repositories are indexed in the international re3data registry and Google Dataset Search. The EUI Library provides comprehensive support for open access publishing and open science. In 2021, the OECD updated its recommendations on access to research data from public funding.
Accurate dataset metadata help scholars determine whether, when, how, where and under what terms research data outputs can be shared as open data. Not all research data outputs can be openly shared at the conclusion of a research project. The two most important considerations when determining whether a research dataset can be made openly available are the protection of data pertaining to human subjects (Section 2.a) and database copyright (Section 1.d). Library staff can help EUI members determine whether and when a dataset can be openly shared. Data access status may change over time. Data can be made openly available on reposit; embargoed for a defined period; restricted to on-site (‘safe room’) access; made subject to pre-access registration terms; or made subject to a user contract (sometimes requiring a project proposal). In cases where underlying data observations cannot be shared, it may be possible to create a limited or aggregated public version. Data can also be reposited solely for preservation purposes (‘dark archive’).
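Because access status may change over time, it can help to record it as an explicit value together with any embargo expiry date. The sketch below is a hypothetical model: the status names follow the options listed above, and the rule that an embargoed dataset becomes open data on its expiry date is an assumption, since the status after an embargo should be confirmed with Library staff.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    OPEN = "open data"
    EMBARGOED = "embargoed"
    SAFE_ROOM = "on-site (safe room) access"
    REGISTRATION = "pre-access registration"
    USER_CONTRACT = "user contract"
    DARK_ARCHIVE = "preservation only (dark archive)"


def effective_access(status: Access, today: date, embargo_until: Optional[date] = None) -> Access:
    """Return the access status in force on a given day.

    Assumption: an embargo with a known expiry date lapses to open data on that date.
    """
    if status is Access.EMBARGOED:
        if embargo_until is None:
            raise ValueError("an embargoed dataset needs an embargo expiry date")
        return Access.OPEN if today >= embargo_until else Access.EMBARGOED
    return status


print(effective_access(Access.EMBARGOED, date(2023, 6, 1), date(2024, 1, 1)).value)
```

Requiring an expiry date for every embargo mirrors the Cadmus metadata practice of indicating when an embargo ends.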
Scholars may pre-register research plans before substantive work on the project commences. The Berkeley Initiative for Transparency in the Social Sciences provides resources for research data openness and reproducibility. The GitHub hosting service (a subsidiary of Microsoft Corp.) provides resources for software development, version control and code hosting and sharing.
8. QUALITATIVE DATA AND ANCILLARY MATERIALS IN THE HUMANITIES & SOCIAL SCIENCES
This section treats qualitative data generation, use and access in the humanities and social sciences. Examples of qualitative data include: audio and audio-visual interviews, transcripts, images, minable text, survey diaries, field notes and multimedia materials. The definition of ‘data’ varies across academic disciplines (eg. history, ethnography, anthropology, sociology &c.), especially where there is a mix of qualitative and quantitative methods. Data collection methodology, dataset structure and data documentation should conform to the best practices of the discipline in which the research is undertaken. Learned societies, academies and international research associations provide guidelines by discipline.
8.a Protection and ethical use of qualitative data
Scholars generating, processing and sharing qualitative data should pay particular attention to the data protection and ethical use provisions described in Section 2 of this Guide. The EUI Guide to Good Data Protection Practice in Research also provides details of data protection at the EUI. Scholars are subject to data protection legislation when collecting, generating, processing, using and storing personal data. The processing of sensitive data relating to health, racial/ethnic origin, religious beliefs, political opinions, genetic/biometric observations, sexual orientation, memberships &c. may require EUI Ethics Committee clearance before data collection and processing begins. Ethics Committee approval cannot be sought retroactively. The Ethics Committee review can take up to six weeks. Principal investigators should consult the online Initial Checklist and the supporting documentation on the EUI information page for Ethics and Integrity in Academic Research. When collecting, processing, using and storing personal data, scholars are responsible for obtaining and documenting the informed consent of subjects. EUI scholars can use the sample consent form. Throughout the research data lifecycle, scholars are responsible for preserving the confidentiality of data observations pertaining to human subjects, families and households. Details of anonymisation and pseudonymisation techniques are at Section 2.d above. Software such as the open source QualiAnon anonymisation tool can be used for the anonymisation and redaction of interview transcripts.
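As a simple illustration of redaction, and not a substitute for QualiAnon or for a careful manual reading of each transcript, the sketch below replaces a researcher-supplied list of names and places with numbered placeholders and returns the key separately, so that it can be stored apart from the redacted text. The function name and placeholder format are hypothetical; indirect identifiers (occupations, events, dates) are not detected and still need human review.

```python
import re


def redact(text: str, terms):
    """Replace each listed term (whole words, case-insensitive) with a numbered placeholder.

    Returns the redacted text and a key mapping placeholders to the original terms.
    """
    key = {}
    # Longer terms first, so "Maria Rossi" is replaced before "Maria";
    # ties are broken alphabetically so the numbering is reproducible.
    ordered = sorted(set(terms), key=lambda t: (-len(t), t))
    for i, term in enumerate(ordered, start=1):
        placeholder = f"[REDACTED_{i}]"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text, count = pattern.subn(placeholder, text)
        if count:
            key[placeholder] = term
    return text, key


transcript = "Maria Rossi moved to Prato. Maria said Prato was quiet."
redacted, key = redact(transcript, ["Maria", "Maria Rossi", "Prato"])
print(redacted)
```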
Scholars who intend to generate images, videos or photocopies of materials and objects in museums and archives should seek the rights owners’ permission in advance. Scholars who wish to reproduce pre-existing images or multimedia of such materials (eg. photographs in catalogues raisonnés) should seek copyright clearance before doing so.
8.b Support, software and infrastructure for qualitative data
Support for qualitative data use and elaboration is provided by the EUI Library. Software support is provided by the EUI ICT Service. ArcGIS and ATLAS.ti can be used for analysis, mapping and visualisation of qualitative non-numerical data such as audio, video and graphics. The Coding Analysis Toolkit (CAT) can be used for content and discourse analysis. Tools for data backup (SyncToy), file zipping (7-Zip), data encryption (TrueCrypt) and image adjustment (Resizer) are also available. Many of the tools used for the elaboration and analysis of quantitative data (eg. Excel, Julia, Python, R, Stata) can be used to generate aggregate statistical observations from qualitative data. The analysis of restricted personal data may require scholars to work in a safe-room environment. If this is required by a data provider, contact the Library.
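For example, once interview segments have been coded, aggregate observations such as the number of interviews in which each code appears can be produced in Python without exposing the underlying text. The data layout below (pairs of interview identifier and code label) and the code labels themselves are hypothetical.

```python
from collections import Counter


def code_frequencies(coded_segments):
    """Count, for each code, the number of distinct interviews in which it occurs.

    coded_segments: iterable of (interview_id, code) pairs, one per coded segment.
    """
    unique_pairs = set(coded_segments)  # several segments of one interview count once
    return Counter(code for _, code in unique_pairs)


segments = [
    ("int01", "trust"), ("int01", "trust"), ("int01", "migration"),
    ("int02", "trust"), ("int03", "migration"), ("int03", "housing"),
]
for code, n in sorted(code_frequencies(segments).items()):
    print(f"{code}: {n} interview(s)")
```

Counting interviews rather than segments prevents a single talkative respondent from dominating the aggregate figures.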
8.c Research data management and data management plans for qualitative data
Although research data management for qualitative data is similar to research data management for quantitative data, there are some additional considerations. Scholars completing data management plans should describe how unique and confidential qualitative data materials (eg. primary sources) will be handled and secured. Due to the heterogeneous, multi-media and complex nature of qualitative data in the humanities and social sciences, it is particularly important for scholars to keep a record of data sources, including notebooks, questionnaires, codebooks and multilingual thesauri. Supporting documentation serves as the basis for accurate metadata, and facilitates future retrieval and reuse. In the case of non-repeatable, time-sensitive, socio-political research, data management plans should describe the qualitative research methods used.
8.d Metadata for qualitative data
The metadata schema used for quantitative data outputs (6.c above) can also be used for qualitative data outputs, with some additional considerations. As well as the name(s) of scholars and technical collaborators who generate the research dataset, it may be necessary to acknowledge the authors/creators of pre-existing subsidiary resources. Multi-lingual documentation should be provided where relevant. The format and version of software used to elaborate the data should also be indicated. The Oral History Association (OHA) maintains an online Manual of Best Practices for Archiving Oral History describing good practices for metadata and description. The Consortium of European Social Science Data Archives (CESSDA) provides guidelines for data management in the social sciences.
8.e Qualitative data archiving, preservation and access
The EUI Library and the Historical Archives of the European Union provide support services for the archiving and preservation of qualitative, multi-media and ancillary research outputs.
- From winter 2022, semi-structured and unstructured research materials (audio-visual and other qualitative data, transcripts, ancillary materials) can be securely archived at the HAEU.
- Structured qualitative datasets not requiring embargo or access control (eg. matrix datasets of text, laws, observations &c.) can be reposited in Cadmus, the EUI research repository, by completing the online submission form.
- Structured quantitative datasets (eg. economic historical datasets) can be reposited in Cadmus, the EUI research repository, by completing the online submission form.
Qualitative data presented for preservation must be accompanied by documentation describing the resource and, where applicable, indices or codebooks describing the material.
9. EUI INFRASTRUCTURE, SOFTWARE, PROTOCOLS AND SUPPORT
The EUI ICT Service provides infrastructure, software, hardware and connectivity support. Research software programmes are listed on the ICT web site. Technical support is provided at the site offices of the ICT Service.
The EUI ICT Service provides and supports Fortran, Gauss, MATLAB, OxMetrics, Python, R, Stata 17, Stat/Transfer, WinEdt, WinRATS - and the High Performance Cluster. A guide to the HPC is available by writing to [email protected] (EUI members only). Advice on the use of statistical software is provided by the ICT Service and software tutors. For assistance with functionalities of statistical databases such as Amadeus, Compustat, Datastream, Global Financial Data, S&P Capital IQ Pro and other Library-licensed resources, write to [email protected].
The Library’s subscription to SAGE Campus provides self-paced, interactive, learning modules for research methods in the social sciences. The companion SAGE Research Methods provides access to over 1,000 eBooks on all aspects of methodology in the social sciences - including searchable, full-text access to the Quantitative Applications in the Social Sciences series and the Qualitative Research Methods series. The Data Management Training Clearinghouse provides online learning resources for Python, R and other software.
A directory of online research data software manuals, with links to full-text, is available on the Library web site. Data software manuals are available in the Badia Library and the Economics Departmental Library (Villa La Fonte) at shelfmarks 001 to 006. The Library holds a substantial collection of works on statistical theory, data science, applied mathematics, probability, calculus, data mining, modelling, optimisation, regression analysis, experimental design, survey methods, time series, artificial intelligence and machine learning (shelfmarks 500-519). Books, ebooks and manuals in any language may be suggested for acquisition by writing to [email protected].
Data support is provided at the Badia Library (BF-085) every morning and on Tuesday and Thursday afternoons. The Economics Departmental Library information desk is open on Monday, Wednesday and Friday afternoons (VLF-026). Every Friday during term, the Library issues the ‘Weekly Bulletin of Statistical Data and Economic Research Resources.’ EUI members can sign up by sending a message with ‘subscribe’ in the title to [email protected]. The Bulletin can only be distributed to EUI mail accounts. Data updates are also disseminated via the EUI Library Blog and Twitter. Other useful Library resources include the Publishing and Open Science web page.
10. INTERNATIONAL RESEARCH DATA GUIDELINES
This guide draws on these international sources, which are strongly recommended to scholars working with research data.
This page is based on the 2022 (10th edition) EUI Library Research Data Guide.
Contact: Thomas Bourke at [email protected]
Page last updated on 23 March 2023