Research Data Services
For data support please contact Thomas Bourke
1. DATA DISCOVERY AND ACCESS
The Library maintains a Data Portal providing access to both (i) licensed, restricted access, data resources for EUI members and (ii) open data resources relevant to EUI research themes.
There are three sub-directories:
Macroeconomic databases provide statistics on national, regional and global economic and political-economic developments from major international institutions and data publishers, including Thomson Reuters, Eurostat, the ECB, OECD, IMF, World Bank and WIIW.
Micro-socioeconomic databases provide individual, family and household-level data observations. Major providers include GESIS, Bureau van Dijk, ICPSR, UKDS, Eurostat and DIW. Access to micro data provided to EUI members under Library license requires additional registration. Special terms and conditions relating to micro-socioeconomic data access and use are described in Section 2 below.
European, EU and Euro Area data resources provide macroeconomic, financial, monetary and micro-socioeconomic data for research on pan-European topics, EU states and sub-state regions, and Europe in the global context.
All resources in the EUI Data Portal have a dedicated resource page providing:
- Data description
- Time period and release/wave information
- Support links (online manuals, software transfer routines, user networks)
Data provided to EUI members under Library license is accessible via internet protocol (IP) or - for micro-socioeconomic data - via restricted server. The Datastream (Thomson Reuters) financial database is run directly from the EUI desktop ‘programmes’ menu. Data from the Inter-university Consortium for Political and Social Research (ICPSR) archive can only be downloaded by Library staff on request.
Access to data at other institutions
EUI scholars who require access to data held at another facility should apply for access early in their research projects, as applications can be lengthy. Some micro-socioeconomic geo-coded data can only be accessed at the issuer's secure facility. Contact the Library for assistance with access applications. Scholars who require access to unpublished datasets (eg. underlying data associated with a journal article) should contact the Library before writing to data owners/authors. In certain instances, it may be possible for EUI scholars to obtain access via Library consortia. See Section 2 on data user undertakings below.
Access to open data resources
Open data resources are available on the internet. Major data repositories are indexed by subject, data type and host location in the re3data.org research data registry and the data repositories section of the Open Access Directory. Section 7 below provides information on how research data outputs can be shared as open data.
2. TERMS AND CONDITIONS OF USE
This section gives an overview of terms and conditions of data access and use; special security measures for micro-socioeconomic data; individual data user undertakings; and terms and conditions of use for open data.
The use of data provided by the EUI Library is subject to license agreements, copyright terms and data protection provisions. Full details are on the Library's Terms and Conditions page. Data users are individually responsible for compliance with terms of access and use. Violation of terms and conditions puts at risk other EUI members’ future access to data resources. All users are requested to scrupulously abide by the terms and conditions of access to, and use of, data. Open data, freely accessible via internet, are also subject to terms and conditions of use and restrictions on re-publishing.
Under the EUI Library's license agreements for data resources: Users may not distribute or allow any other party to have access to data which is provided under license; users may not modify or create a derivative work of the licensed materials without the permission of the licensor; users may not remove, obscure or modify any copyright or other proprietary notices included in licensed materials; users may not use licensed materials for commercial purposes; users may not retain or distribute substantial portions of a database, and must comply with any post-project data destruction undertakings in the license. The University of Michigan maintains a database of data user agreements.
Micro-socioeconomic data access and use
Special terms and conditions apply to access and use of micro-socioeconomic data, reflecting the sensitive nature of data observations about individuals, families and households. Such terms and conditions apply to both (i) micro data hosted by the Library for EUI members and (ii) micro data hosted by third-parties provided directly to EUI members under individual license. Terms and conditions for each micro dataset are given in the 'full details' section of the resource pages on the Library web site. The EUI guide to Good Data Protection Practice in Research gives further information on data protection.
Micro data users must preserve the confidentiality of observations pertaining to dataset subjects: individuals, families and households. Users must not attempt to identify any individual or other data subject in the dataset. Micro data users must also comply with any post-project data destruction undertakings in the license.
EUI members must use the Library micro data access form when applying for access to micro-socioeconomic data hosted by the Library. The application involves: (i) selecting one dataset from the dropdown menu (a separate form is required for each dataset); (ii) signing the EUI paper form 'Terms and Conditions of Use of Micro Data Provided by the EUI Library'; and (iii) signing the data provider's terms and conditions paper form. Forms can be signed at the Economics Information Office (Badia Library, 085) or at the Economics Departmental Library (Villa La Fonte). For access to some datasets, providers require parallel application procedures. In some cases (eg. Eurostat) it is necessary to present a project proposal when applying for access to micro data.
There is no off-campus access, VPN access, or laptop access to micro-socioeconomic data hosted by the EUI Library. Non-EUI members and short-term visitors do not have access to EUI-hosted micro data, and should contact their institution's data librarian, statistics manager, or data publishers directly for information about access at their home institutions.
Data User Undertakings: individual agreements with data providers
Some data issuers require that access contracts be established directly with the user(s) of the data. EUI members who require access to data provided by third parties are required to sign a data user agreement with the provider/licensor. Some data issuers also request that an EUI administrator co-sign individual user agreements. In such cases the Library will require the data user(s) to sign an internal undertaking that they will abide by the terms and conditions of use and access of the data.
Open Data: terms and conditions of use
In 2016, the Organisation for Economic Cooperation and Development (OECD) published new Guidelines on Research Ethics and New Forms of Data for Social and Economic Research.
3. SUPPORT, SOFTWARE AND INFRASTRUCTURE
This section outlines EUI Library and EUI ICT Service support for data users. Data support is available at the Library, Badia Fiesolana, office 085, on weekday mornings and Tuesday and Thursday afternoons. Data support in the Economics Department (Villa La Fonte) is provided on Monday, Wednesday and Friday afternoons from 14:30 to 18:30. Training sessions take place throughout the year.
The Library maintains the Data Portal and the Micro Data Restricted Server, and provides assistance for the discovery, access and use of EUI-licensed digital databases and internet-hosted open data. The Library also advises users on research data management (Section 4) and open data options (Section 7).
Data software manuals (print versions) are available in the Badia Library and the Economics Departmental Library, at shelfmarks 001 to 005. The Library maintains a directory of online research data software manuals. The Library also provides a comprehensive collection of monographic works on data methodology, ontology and statistical theory. Books and manuals in any language can be suggested for acquisition by the Library.
Every Friday during term, the EUI Library issues a data e-Bulletin with updates on new statistical data releases and information on how to use Library and internet resources. EUI members can sign up with an @eui.eu account. Send a message with 'subscribe' in the title to email@example.com
Data news is also disseminated via blog and Twitter.
Software, infrastructure and connectivity support is provided by the EUI ICT Service. Research software programmes are listed on the ICT web site.
Technical support is provided by the site offices of the ICT Service. Advice on the use of statistical software is provided by the EUI research software tutors. The EUI provides access to major data software including: Stata, Gauss, MATLAB, OxMetrics, Stat/Transfer, WinEdt, WinRATS, R for Windows and other programmes.
4. RESEARCH DATA MANAGEMENT AND DATA MANAGEMENT PLANS
This section provides an introduction to research data management, the security of data during research projects and data management plans. Research data management (RDM) during the project cycle encompasses the control of data inputs, the use of data, the protection of data, and the creation of data outputs. RDM covers the description of data and tools; the storage of data during analysis; the provision of clear and accurate metadata; the preservation of data; and - where possible - making research data outputs available to other researchers. Research data management is carried out by individual scholars and research teams throughout the duration of a research project.
Security of data during research projects
During the project cycle, it is important to keep data safe and secure. Researchers are encouraged to use a desktop computer for data elaboration, and to make regular backups on the EUI network server, or on a safely-secured external memory device. In accordance with contractual agreements, micro data should only be elaborated on a desktop computer in a secure location. Data DVDs and associated documentation should be kept in locked storage when not in use.
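As an illustration of the backup advice above, the following minimal Python sketch copies a working file to a backup folder and confirms that the copy is byte-identical by comparing SHA-256 checksums. The file and folder names are hypothetical, and the functions are not an EUI tool. It is intended for a researcher's own working files only; licensed micro data must stay where the license terms require.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> bool:
    """Copy `source` into `backup_dir` and check the copy is byte-identical."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    return sha256_of(source) == sha256_of(target)


if __name__ == "__main__":
    # Hypothetical file, created in a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        work_file = Path(tmp) / "survey_wave1.csv"
        work_file.write_bytes(b"id,age\n1,34\n2,51\n")
        print(backup_and_verify(work_file, Path(tmp) / "backup"))
```

Storing the checksum alongside the backup also makes it possible to detect later corruption of either copy.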
Data management plans
Data management plans (DMPs) are short documents outlining how data are handled, stored, documented, preserved, and - where possible - made available for sharing. DMPs provide information on:
- The generation of data and the sources of data
- How data is elaborated, collated and organised
- How data is documented; which metadata scheme is used
- Where data is stored during the project
- How data is preserved
- Which researcher(s) - if a team - decide(s) the terms and conditions of sharing data outputs
- How data authorship and credit are assigned.
It is important to keep an accurate record of dataset changes, variables, characteristics, software versioning and - in the case of survey and experimental data - pre-agreed terms of disclosure, so that decisions about data sharing can be made at the end of a project. 'Metadata' refers to the descriptors or 'tags' that identify a dataset (Section 6).
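One possible way to keep such a record is a simple, machine-readable change log maintained alongside the dataset. The Python sketch below is illustrative only: the field names (`when`, `author`, `description`, `software`) are assumptions, not a prescribed EUI or repository format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass
class ChangeEntry:
    """One entry in a dataset change log (field names are illustrative)."""
    when: str         # ISO date of the change
    author: str       # who made the change
    description: str  # what changed: variables added, recoded, dropped
    software: str     # software and version used, eg. "Stata 14"


def add_entry(log: list, author: str, description: str,
              software: str, when: date) -> ChangeEntry:
    """Append a new entry to the in-memory log and return it."""
    entry = ChangeEntry(when.isoformat(), author, description, software)
    log.append(entry)
    return entry


def to_json_lines(log: list) -> str:
    """Serialise the log as one JSON object per line, ready to save."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in log)


if __name__ == "__main__":
    log = []
    add_entry(log, "A. Researcher", "Recoded income into quintiles",
              "Stata 14", date(2017, 3, 1))
    print(to_json_lines(log))
```

A plain-text log of this kind can be reposited with the dataset and helps when compiling the metadata described in Section 6.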
Data management plans can be used as the basis for deciding whether, when, how, where and under what terms, research data outputs can be openly shared - or shared under defined terms and conditions. Open data refers to the trend among international organisations, scholars and government agencies to share data outputs, usually via the internet.
EUI members who are required to submit a data management plan - either as part of a funding proposal or early in a research project - should contact the Library for assistance. Tools such as DMPonline - the Digital Curation Centre's data management planning tool - can be used to construct a tailored data management plan. To use this tool, enter an email address and the name of the organisation, and create a password. EUI users of DMPonline should select ‘other organisation’ from the drop-down menu. First-time users are taken to the ‘edit profile’ section of the platform. When using the tool to create a data management plan, it is possible to select a funder template (eg. European Commission, Horizon 2020) which generates the relevant matrix. The author(s) complete each free-text section of the plan, responding to the prompts. It is possible to share editing rights with project collaborators by entering their emails and assigning status as ‘co-owner’, ‘editor’ or ‘read only’. For assistance using DMPonline, contact the EUI Library.
The name of the Principal Investigator (P.I.) should be clearly stated in data management plans. For research teams, this should be the contact person for all decisions regarding whether, when, how, where and under what terms, research data outputs emanating from the research project might be openly shared. If the Principal Investigator is not the same person as the project Data Manager, this should be clearly stated. In international collaborative projects, the name of the person who has authority with regard to decisions on the sharing of data outputs should be clearly indicated. Unless otherwise indicated, the Principal Investigator undertakes this role. Many data repositories require a 'data ownership statement' from the data depositor.
Data management plans should be updated over the course of the research project, incorporating descriptions of any new data, changes in project policy, and changes in the composition of the research team or consortium.
5. DATA MANAGEMENT IN EU HORIZON 2020
Horizon 2020 is the European Union Research and Innovation funding programme for the period 2014-2020. EUI members preparing applications for Horizon 2020 funding are required to submit general information on data management as part of their proposal. This is evaluated by the European Commission under the criterion 'impact.' Social science and economic research projects are included in the 'Societal Challenges' cluster, which includes projects in the 'Europe in a changing world - inclusive, innovative and reflective societies' category. The EUI Library can assist project leaders with the data sections of funding applications as well as in-project data management plans.
Project managers should address the following questions - provided by the European Commission - when completing the general information section (p.2) of H-2020 applications:
- What types of data will the project generate/collect?
- What standards will be used?
- How will this data be exploited and/or shared/made accessible for verification and re-use? If data cannot be made available, explain why.
- How will this data be curated and preserved?
In the context of Horizon 2020, the European Commission has launched a research data pilot. Project managers must provide a data management plan (DMP) within six months of the official start of the research project. The European Commission mandates two further versions of the DMP; one at the mid-point and one at the completion of the funding period. The EC's DMP template is on p.5 of the document Guidelines on Data Management in Horizon 2020. The DMPonline tool can also be used for H-2020 data management plans - by selecting the H-2020 template in the Funder section.
H-2020 grant beneficiaries must "deposit in a research data repository and take measures to make it possible for third parties to access, mine, exploit, reproduce and disseminate - free of charge for any user - the following: (i) the data, including associated metadata, needed to validate the results presented in scientific publications as soon as possible; (ii) other data, including associated metadata, as specified and within the deadlines laid down in the data management plan." (European Commission Guidelines, p.3.) The EC suggests attaching a license, eg. CC-BY. The EC does not specify or recommend specific repositories. Further details are provided in these official documents:
- Guidelines on Data Management in Horizon 2020
- Guidelines on FAIR Data Management in Horizon 2020
- Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020
- Fact Sheet: Open access to publications and data in Horizon 2020
6. METADATA: DESCRIBING DATA OUTPUTS
Metadata are data about data, presented in a systematic scheme. Throughout the research project it is important to keep an updated record describing data capture, use and elaboration. There are many different schemata for describing datasets. An introduction to metadata standards for social science and humanities data (eg. Dublin Core, DDI) is maintained by the Digital Curation Centre.
The headings below can be used as a guide when creating metadata. Different repositories require different combinations of metadata fields. Not all of the fields suggested below will be required by data funders or repository managers. Metadata can be used as a 'checklist' to determine whether, when, how, where and under what terms, research data outputs can be shared as open data. Metadata should be updated as research projects evolve. Some research data outputs may also require multi-lingual metadata and data tags.
Title of dataset: The title should succinctly convey the nature and scope of the dataset. Dossiers and files should be assigned consistent tags.
Name(s) of dataset creator(s): The name, or names, of the Principal Investigator, researchers, and technical collaborators, who created/elaborated the dataset should be indicated - preferably with institutional contacts. Guidelines on authorship and credit for research outputs are provided by CASRAI. If the project is undertaken in the context of a consortium, the name of the Data Manager should be provided (if different from the P.I.). Where researcher ID numbers are available, these should be provided (eg. ORCID).
Description of data: A meaningful 'data abstract', indicating the kind of data, the scope of the research context and a note on data methodology should be provided. The abstract should be updated as the research project evolves. This information is useful if a technical paper about the data is submitted to an open data journal.
Source of data: The source(s) of the data should be indicated. If the dataset has been generated during the research project, this should be stated with details of data collection (eg. survey parameters). If the dataset is derived from a pre-existing database, the source(s) should be clearly cited; eg. institution, publisher and/or data author. If there are multiple streams of input - all should be cited.
Date of creation of dataset: Give the date of the completion of the dataset. If part of a data series, this should be indicated.
Date-range coverage of dataset: Indicate the start- and end-dates of data observations contained in the dataset.
Spatial coverage of dataset: Indicate the geographical scope of the dataset (national, regional, global &c.) as applicable.
Format: The software format and version should be given (eg. Stata 14, csv, &c.). Software version changes and transfers should be noted.
Language: The language of data variables, notes and codebooks should be indicated. Multi-lingual documentation, tags, questionnaires and variable descriptions should be provided, with thesaurus if applicable.
Location of data: During the research cycle, researchers should carefully store and protect their data. After the research project, the location of the data should be given (eg. subject or institutional repository) - even if the data is not openly available.
Digital object identifier: Reposited datasets should be assigned a digital object identifier (DOI). Most data repositories automatically assign DOIs, which is useful for linking to, and citing, datasets.
Access status and embargo: After the project, the status of access to the data should be indicated. The status 'Open Data' should be assigned to datasets that are publicly accessible on the internet. If the data is subject to embargo, the end-date of the embargo period should be indicated. If the data can only be accessed at a secured facility this should be indicated. If the data can only be accessed by requesting permission from the Principal Investigator, contact information should be provided.
License: Terms and conditions of access and use of research data outputs by other persons should be stated. It is advisable to provide a license.
Funding statement: The name of funding bodies, and research grant numbers, should be provided where applicable.
Related publication(s): Bibliographical details of publications based on the dataset should be listed, with links to abstracts and, where possible, full texts.
Dataset citation: A short ‘ready-to-use’ citation reference for the dataset should be provided, incorporating core descriptive elements.
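The headings above can be treated as a checklist. The Python sketch below is a minimal illustration, not a repository schema: the field names, the example values (including the placeholder DOI `10.1234/example`) and the citation layout are all assumptions. It reports which fields are still empty and assembles a short citation from the core descriptive elements.

```python
# Checklist keys mirroring the headings above (names are illustrative).
METADATA_FIELDS = [
    "title", "creators", "description", "source", "date_created",
    "date_range", "spatial_coverage", "format", "language", "location",
    "doi", "access_status", "license", "funding", "related_publications",
]


def missing_fields(record: dict) -> list:
    """Return the checklist fields that are absent or empty in `record`."""
    return [name for name in METADATA_FIELDS if not record.get(name)]


def build_citation(record: dict) -> str:
    """Assemble a short citation; the layout is an assumed example style."""
    creators = "; ".join(record["creators"])
    year = record["date_created"][:4]  # expects an ISO date, eg. 2017-05-01
    parts = [f"{creators} ({year}). {record['title']} [dataset]."]
    if record.get("location"):
        parts.append(f"{record['location']}.")
    if record.get("doi"):
        parts.append(f"https://doi.org/{record['doi']}")
    return " ".join(parts)


# Hypothetical record; every value below is a placeholder.
example_record = {
    "title": "Example Household Survey",
    "creators": ["Rossi, M.", "Dupont, C."],
    "description": "Illustrative survey of household expenditure.",
    "source": "Generated during the project (survey).",
    "date_created": "2017-05-01",
    "date_range": "2010/2015",
    "spatial_coverage": "national",
    "format": "csv",
    "language": "en",
    "location": "Zenodo",
    "doi": "10.1234/example",
    "access_status": "Open Data",
    "license": "",
}

print(missing_fields(example_record))
print(build_citation(example_record))
```

Repositories differ in the fields they require, so the list of keys should be adapted to the chosen repository or funder template.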
7. DATA PRESERVATION, REPOSITING AND OPEN DATA
During the course of research projects, scholars should safely preserve their data and associated documentation, code, software and tools. Researchers are encouraged to use a desktop computer for data elaboration, and to make regular backups on the EUI network server or a safely-secured external memory device.
Preservation and repositing
The Library can assist EUI members to preserve research datasets in a data repository. Major data repositories are indexed in the re3data.org registry and the data repositories section of the Open Access Directory.
The Library can assist researchers to deposit datasets in Zenodo (CERN) and other data repositories. EUI researchers and principal investigators who would like to submit data for inclusion in the EUI's Zenodo community should write to firstname.lastname@example.org. EUI-generated research data is indexed in the EUI Research Data Registry.
Reposited data should contain at least one human-readable manual file. The UKDS provides guidelines on preparing data for deposit. The Digital Curation Centre maintains a Where to Keep Research Data checklist of repositories aiming “to assist research support staff whose task is to help researchers make informed choices about where to deposit data.”
There is a growing trend among scholars, government agencies and international organisations to share data and associated documentation, code, software and tools. By carefully noting the metadata elements suggested in Section 6 above, scholars will have a ready checklist for determining whether, when, how, where and under what terms, research data outputs can be shared as open data. Datasets that are made available as open data should be the product of original research. Outputs should be either (i) original datasets generated during the research project or (ii) datasets which are the product of significant, value-added elaboration of pre-existing data.
Not all research data outputs can be openly shared. The two most significant considerations when determining whether a research dataset can be made available on an open data basis are:
- Data protection: It is not permitted to openly publish datasets in which persons, households or families are identifiable. In such instances, it may be possible (pending terms) to generate an anonymised version of the dataset for sharing.
- Database copyright: It is normally not possible to publish a dataset containing significant portions of data sourced from pre-existing databases provided under contractual license. The EUI Library can help scholars determine whether, and how, derivative datasets can be made available as open data.
Levels of access
Research data outputs can be shared under a variety of access models. Datasets can be made openly available for all users via the internet; can be subject to registration prior to access and use; can be subject to user contracts (sometimes requiring a project plan); can be embargoed for a defined period (or indefinitely); can be restricted to on-site access and use; or can be subject to an access request directly to the data owner(s) for case-by-case access authorisation. Data can also be reposited solely for preservation purposes (dark archive). Access status can change over time.
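The access models above, and the fact that access status can change over time, can be sketched as a small Python enumeration. This is an illustrative model only: the level names, the `effective_access` function, and the rule that an embargo lifts on the day after its end date are assumptions rather than a repository standard.

```python
from datetime import date
from enum import Enum
from typing import Optional


class AccessLevel(Enum):
    """Access models listed above (labels are illustrative)."""
    OPEN = "open"
    REGISTRATION = "registration required"
    USER_CONTRACT = "user contract required"
    EMBARGOED = "embargoed"
    ON_SITE = "on-site access only"
    ON_REQUEST = "request to data owner"
    DARK_ARCHIVE = "preservation only"


def effective_access(level: AccessLevel, embargo_end: Optional[date],
                     after_embargo: AccessLevel, today: date) -> AccessLevel:
    """Return the access level that applies on `today`.

    An embargo with no end date (None) is treated as indefinite.
    Assumption: the embargo lifts on the day after `embargo_end`.
    """
    if level is not AccessLevel.EMBARGOED:
        return level
    if embargo_end is None or today <= embargo_end:
        return AccessLevel.EMBARGOED
    return after_embargo


if __name__ == "__main__":
    status = effective_access(AccessLevel.EMBARGOED, date(2018, 12, 31),
                              AccessLevel.OPEN, today=date(2019, 1, 1))
    print(status.value)
```

Recording the post-embargo level explicitly in the metadata avoids ambiguity about what happens when the embargo ends.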
Where data are provided on an open basis, it is normally not possible to re-publish freely. Open data are normally subject to licensed terms and conditions of access. Even where there is no explicit license for the re-use of open data resources, there are frequently restrictions on re-distributing data. Contact the EUI Library for assistance.
8. QUALITATIVE DATA IN HUMANITIES AND SOCIAL SCIENCES
Special considerations are necessary regarding access to, and use of, qualitative data in the social sciences and humanities.
Examples of qualitative data include: audio and video recordings; survey diaries and archival material; field notes; free-text answers to surveys; images; minable text; and transcripts of interviews. The definition of ‘data’ varies across academic disciplines - especially when there is a mix of qualitative and quantitative methods. It is important that project data management planning be located in the culture of the discipline in which the research is undertaken. See, for example, this EUI SPS directory and the 2016 Humanities at Scale project report State of the Art Report on Open Access Publishing of Research Data in the Humanities (Stefan Buddenbohm et al). Non-numerical data is subject to most of the same terms and conditions of access, and use, that apply to quantitative data. Many research projects incorporate a mix of both qualitative and quantitative approaches, and in many research projects qualitative data can be processed and expressed numerically.
Access and terms and conditions of use: qualitative data
The handling, use and sharing of qualitative data in the social sciences and humanities is subject to strong ethical considerations and standards. When accessing qualitative data, it is important for scholars to familiarise themselves with the terms and conditions of access, and use, as indicated by the holding institution and/or rights holder. If a digital database is being generated from non-digital materials, it is important to obtain the consent of subjects or rights holders in advance of inclusion.
Qualitative data can be generated during the course of a research project - eg. from surveys, free-text responses to interview questions, focus group recordings or experimental simulations. In all cases, subjects should be informed of their rights as established by jurisdictional data protection legislation, and best-practice guidelines from scholarly societies in the discipline(s). Appropriate disclosure forms should be signed by participants who become subjects in databases of photographic, voice-recording or video content. Some multi-media data may only be accessed at the data holders' secure facility.
Anonymised qualitative data is frequently available on an Open Data basis. Major data repositories are indexed by subject, data type and host location in the global, multi-disciplinary re3data research data registry.
Due to the frequently personal nature of qualitative data observations, scholars should pay particular attention to ethical standards when handling such data. In many cases it is necessary to anonymise data in public versions and sub-sets of qualitative datasets.
The linking of variables on gender, religion, etc. to individuals, families or households is governed by data protection legislation and academic discipline best-practice.
Support, software and infrastructure: qualitative data
Support for qualitative data content is provided by the EUI Library. Software support is provided by the EUI ICT Service. Many of the tools used for the analysis of quantitative data (eg. Gauss, MATLAB, Python, R, Stata) can also be used for qualitative data analysis – especially if the data is given a numerical expression – or if aggregate statistical observations are drawn from underlying qualitative data. ArcGIS and ATLAS.ti can be used for analysis, mapping and visualisation of qualitative non-numerical data such as audio, graphics, text and video. Coding Analysis Toolkit (CAT) can be used for content and discourse analysis. Tools for data backup (SyncToy), file zipping (7-Zip), data encryption (TrueCrypt) and image adjustment (Resizer) are also available.
The use of some of these tools for the analysis and presentation of restricted personal data may require scholars to work in a 'safe-room' environment. If this is stipulated by a data provider or project funder, contact the EUI Library for support.
Research data management and data management plans
Research data management includes the control of data inputs, the elaboration, collation and protection of data, and the creation of data outputs. Research data management for qualitative data is similar to provisions for quantitative data - but there are some specific additional considerations. Special considerations for the humanities are included in the National Endowment for the Humanities (USA) 2016 Data Management Plans for NEH Office of Digital Humanities Proposals and Awards. During data analysis work, qualitative data materials should be carefully handled and secured. Materials being analysed in scholars' working areas should be carefully secured, either in a locked storage unit or in a locked room. This is particularly important for confidential and unique material being collated and/or digitised for the creation of a dataset.
Data Management Plans are required by funding authorities for each individual project. Due to the heterogeneous, multi-media and complex nature of qualitative data in the humanities and social sciences, it is particularly important that scholars keep a record of data sources, code and tools used throughout the research project cycle. Such documentation - created by scholars and research teams during the project - helps with the creation of accurate metadata for the dataset output (whether shared or restricted) and with the creation of project-by-project Data Management Plans.
In the case of non-repeatable, time-sensitive, socio-political research, Data Management Plans may require detailed explanation of the qualitative methods used, and the kinds of data observed. In some cases it may be necessary to modify, or append to, a funder's DMP to explain qualitative methods and data outputs. Scholars should refer to the EUI Library for further information.
Metadata are descriptors for data, which help present outputs in a structured way for ease of retrieval, use and sharing. Metadata fields for quantitative data outputs can also be used for qualitative data outputs. However, there are some additional considerations.
In addition to the name(s) of scholars and technical collaborators who generate a dataset, it may be appropriate to include the authors/creators of subsidiary qualitative data. The sources of all data in a qualitative dataset should be cited with particular care, and should comply with the informed consent undertakings identified at the outset of the project. The dates of creation of subsidiary works included in the qualitative dataset should be indicated, as well as the date of creation of the dataset. Linguistic, national and regional designations should be indicated where relevant (eg. multi-lingual surveys). The format and version of software used to elaborate the data should also be provided.
Data preservation, repositing and open data
Qualitative data outputs in the humanities and social sciences can be reposited in subject repositories or multi-disciplinary repositories. Access control is particularly important for qualitative data relating to human subjects, families and households. Dataset authors should adhere to the standards of the repository, project funder and discipline best-practice when setting conditions for the sharing of qualitative data outputs. The sharing of data - either via Open Data terms and conditions (eg. CC-BY) or more restrictive access terms and conditions - must be explicitly described. For example, if the data is subject to embargo, or if the data can only be accessed at a secured facility; such terms should be indicated.
Scholars who use and produce qualitative datasets should also be aware that most funding agencies provide guidelines regarding data gathering, handling, outputs and sharing. Scholars should refer to funding contracts and scholarly societies in their discipline when deciding whether, and under what terms, qualitative research data outputs can be shared.
9. EUI LIBRARY RESEARCH DATA SERVICES
Research data management assistance is provided by the EUI Library in the following areas:
- Data discovery: EUI-licensed resources and open data
- Terms and conditions of data access and use
- Data user undertakings for access to third-party resources
- Metadata and ontology for research datasets
- Data documentation
- Data security during research projects
- Data repositing and preservation
- Data sharing and guidelines for open data.
Write to Thomas Bourke for data support: email@example.com
10. INTERNATIONAL RESOURCES AND GUIDELINES
Further information on research data management and open data is available from these sources: