Research Data Services
- The EUI Library supports (i) data discovery (ii) data use and (iii) data preservation and sharing
- For research data support contact Thomas Bourke
The EUI Library Data Portal provides access to Library-licensed macroeconomic, micro-socioeconomic and Europe-related databases. 72 data resource guides provide information on coverage, variables, user networks, online manuals and terms and conditions of use. The Data Portal also provides information about the Library's restricted micro data server.
The EUI Library provides support with data management plans (DMPs); data protection and database copyright; metadata; data preservation and sharing; and data support for Horizon 2020 projects. For assistance, please write to [email protected] or visit one of the Library information desks: Badia Fiesolana 085 or Villa La Fonte 035.
EUI members can submit their research datasets for inclusion in the EUI ResData repository - a solution for preserving and sharing data outputs. To submit a dataset for inclusion in ResData, complete the Library's online form. For support with the preparation of datasets for submission, write to [email protected]
1. LIBRARY DATA PORTAL AND DATA DISCOVERY
The Library maintains an indexed Data Portal which (i) provides access to licensed resources for EUI members and (ii) introduces open data resources of relevance to EUI research themes. All resources in the Data Portal have an online guide with:
- Data description and scope
- Time-period coverage and release/wave information
- Support links (online manuals, software transfer routines, user networks)
- Terms and conditions of access and use.
Data provided to EUI members under Library license are accessible by internet protocol (IP) proxy, or via the Library’s restricted data server for micro-socioeconomic data. Modes of access for every resource are described in the Data Portal. There are four sub-directories:
- Macro data directory providing access to national, regional and global economic, political and historical resources
- Micro data directory providing access to household, family, individual and company-level resources
- European, EU and Euro Area directory providing data for research on pan-European topics, EU states and European sub-state regions
- Data Topics directory helping match research topics with data resources.
Access to data at other institutions
EUI members who require access to restricted data at another facility should apply for access early in their research projects because application periods can be lengthy. Some sociological, qualitative and micro geo-coded data can only be accessed at issuers’ secure, on-site facilities. Contact the Library for assistance with access applications. EUI members who require access to unpublished datasets (eg. underlying data associated with a publication) should contact the Library before writing to data creators/owners. In some instances, it may be possible for EUI members to obtain access via library consortia.
Access to open data resources
Open data refers to the trend among scholars, government agencies and international organisations to share datasets, codebooks and software via the internet. (In this Guide, the term ‘scholar’ includes researchers, fellows, professors and all other scientific collaborators on research projects.) Open research datasets can be located via the re3data registry of research data repositories, which is indexed by discipline, sub-discipline, data type and host location; and via Google Dataset Search. Open data is further discussed in Section 7.
2. DATA PROTECTION, DATABASE COPYRIGHT AND ETHICAL USE
This section provides an overview of data protection and ethical data use; database copyright; and individual data user undertakings. Further information is provided in the EUI Guide to Good Data Protection Practice in Research.
Special terms and conditions apply to access and use of qualitative and micro-socioeconomic data. This reflects the sensitive nature of observations about human subjects, families and households. Such terms and conditions apply to (i) data gathered and used during research projects, (ii) data hosted by the Library for EUI members and (iii) data provided directly to EUI members under third-party license. Terms and conditions of access and use for every dataset hosted by the Library are provided in the ‘full details’ section of the individual resource guides in the Library’s Data Portal.
Persons, families and households cannot be identifiable in any dataset. When using micro datasets, users must not attempt to identify any individual, family or household. The storage, transfer and use of personal data is subject to data protection rules. EUI members should be familiar with the President's Decision on Privacy and, with a particular view on research, the Guide to Good Data Protection Practice in Research. The processing of particularly sensitive data (eg. pertaining to ethnicity, health, religion, political or sexual orientation, and biometrics) requires even stricter safeguards (p.7 of the Guide to Good Data Protection Practice in Research). Those rules are currently under revision in light of the new European Union General Data Protection Regulation which came into effect on 25 May 2018 and the new rules on data protection concerning EU institutions and agencies. As stated in the EUI Guide on Good Data Protection Practice: “As a general rule, the processing of such personal data is prohibited. However, Article 7 of the EUI’s Data Protection Policy does allow it to be lawfully processed in specific circumstances. The most common in research is upon the data subject’s explicit consent.”
When collecting (eg. via interviews or forms), storing, using or transferring personal data, scholars are responsible for obtaining the specific informed consent of subjects (Section 4 below). Scholars are also responsible for preserving the confidentiality of data observations pertaining to human subjects, families and households. Consent must be intelligible: it should refer clearly and precisely to the scope and the consequences of the data processing, and the retention period. Blanket consent that does not specify the exact purpose of the processing is not acceptable. Consent must also be documented by the researcher.
EUI members must use the online micro data request form when applying for access to micro-socioeconomic data hosted on the Library's restricted server. A separate form is required for every dataset requested. Applicants must also sign (i) the EUI ‘Terms and Conditions of Use of Micro Data’ paper form, and (ii) the data provider’s terms and conditions paper form. Both forms can be signed at the Badia Library (085) or the Economics Departmental Library (Villa La Fonte, 035). Some providers (eg. Eurostat) require the submission of a detailed project proposal, and parallel registration, before access is granted.
There is no off-campus access, VPN access, or laptop access to micro-socioeconomic data hosted by the EUI Library. Non-EUI members and short-term visitors do not have access to EUI-hosted micro data. Further information is provided in the EUI Guide to Good Data Protection Practice in Research and the OECD’s guidelines on Research Ethics and New Forms of Data for Social and Economic Research.
License agreements and database copyright
Access to, and use of, databases provided by the EUI Library are subject to contractual license agreements and copyright terms. Full details are on the Library’s Terms and Conditions web page. Data users are individually responsible for compliance with terms and conditions of access and use. Violation of license terms puts at risk other EUI members’ future access to data resources. All EUI users must scrupulously abide by the terms and conditions of access to, and use of, data provided under license to the EUI community.
Users may not:
- Distribute, or allow any other party to have access to, data which is provided under license
- Modify or create a derivative work of the licensed materials without the permission of the licensor
- Remove, obscure or modify any copyright or other proprietary notices included in licensed materials
- Use licensed materials for commercial purposes
- Retain or distribute substantial portions of a database.
Users must also comply with any post-project data destruction undertakings in the license.
Agreements between individual users and data providers
Some data issuers require that access contracts be established directly with end-users. If a data issuer also requests the counter-signature of an EUI administrator (‘guarantor’), contact the Library: [email protected]. Upon completion of the access application, the Library will request that the EUI data user sign an internal undertaking to abide by the terms and conditions of access and use.
3. DATA MANAGEMENT PLANS (DMPs)
Data management plans are short documents, normally required by science funding agencies. Research data management is carried out by scholars throughout the duration of a research project. Data management plans should address:
- How data is generated and/or sourced
- How data is used, elaborated and organised
- How data, and data subjects, are protected
- How data, code and ancillary elements are described and documented
- How data is stored and secured, and how long it will be retained
- How data authorship and credit are assigned
- How data is preserved
- How, whether, and under what terms, research data outputs can be shared.
During research projects it is important to keep an accurate record of changes to dataset design, folders, files, variables and versioning. Scholars undertaking experiments should ensure that pre-agreed terms of disclosure are documented and preserved. The data and methodology should be clearly documented. Where applicable, code should be preserved for reposit and sharing. Metadata – ‘data about data’ – should be carefully assigned (Section 5 below).
Data management plans (DMPs) can be used as the basis for determining whether, when, how, where and under what terms, research data outputs can be openly shared – or shared under more restrictive terms and conditions. Open data refers to the trend among scholars, government agencies and international organisations to share data outputs and ancillary materials via the internet. FAIR data principles (findable, accessible, interoperable and re-usable) must be considered during the preparation, and revision, of data management plans.
EUI members who are required to submit a data management plan – either as part of a funding proposal or during a research project – should contact the EUI Library for assistance. Tools such as DMPonline – the Digital Curation Centre’s data management planning tool – can be used to prepare a structured data management plan, complying with (eg.) EU Horizon 2020 requirements. To use DMPonline, enter an email address, name of organisation and create a password. EUI users should select ‘other organisation’ from the drop-down menu. First-time users are taken to the ‘edit profile’ section of the DMPonline platform. The Principal Investigator (P.I.) should be identified in the data management plan. For research teams, the P.I. can assign co-author rights to the data management plan, by entering email addresses and assigning status as ‘co-owner’, ‘editor’ or ‘read only.’
The Principal Investigator should be the contact person for decisions regarding whether, when, how, where and under what terms, research data outputs might be openly shared. If the Principal Investigator is not the same person as the project Data Manager, this should be stated. In international collaborative projects, the name of the person who has final authority regarding decisions on the sharing of data outputs should be indicated. Unless otherwise stated, the Principal Investigator undertakes this role.
Science funders normally require a revised DMP at the mid-point of the research project and a definitive DMP as a project deliverable. It is important to update plans throughout the research project by incorporating changes to the scope of the project; new data generation and use; and changes in the composition of the research team or consortium.
4. MANAGING DATA DURING THE RESEARCH PROJECT CYCLE
Data should be carefully managed throughout the duration of the research project. Particular attention should be given to data input; quality control; dataset design; folder structure; file structure; file naming; variable naming; software format; documentation and codebooks; security and backup during the project; data protection; informed consent of data subjects; anonymisation; copyright; preservation; archiving; data citation and – where possible – sharing data outputs as open data. The following guides are particularly useful for scholars working in the social sciences and humanities: Research Data Management Toolkit by JISC (2018); Expert Tour Guide on Data Management by the Consortium of European Social Science Data Archives (2017); Research Data Lifecycle by the UK Data Service (2017); and Guide to Social Science Data Preparation and Archiving by the Inter-university Consortium for Political and Social Research (2012).
Data input and quality control
The generation and/or sourcing of data should be informed by high standards of quality control. Datasets which are generated from empirical work during a research project should adhere to the ethical best-practices of the discipline or sub-discipline. In cases where research datasets are partially based on pre-existing resources, scholars should ensure that the underlying sources are of high quality, and that there are no infringements of copyright. Contact the Library for assistance. The ingestion of observations into a research dataset should be undertaken with close attention to accuracy, completeness and consistency. Data codes and variables should be assigned in a logical, accurate and sustainable manner.
Folders, files, variables, format and versioning
The design of research datasets should be carefully considered at the outset of the project. Dataset design varies by conventions of disciplines and sub-disciplines, medium of data, types of variables, units of analysis, methodology, relationship between data elements, and whether or not the dataset is part of a series. Clear and consistent metadata for folders, files, variables and versioning helps facilitate future data retrieval, reuse and replicability.
The folder structure of the research dataset should be considered at the beginning of the project (eg. hierarchical / horizontal). Qualitative datasets containing text, interviews, images &c. may require individual files for every element. File names should be standardised and consistent, eg: date, descriptor, version. Variables such as age, country and sex should be clearly tagged, avoiding special characters and spaces. Temporary identifiers should be removed from the schema. Files should be systematically named, using a standardised date system (YYYY-MM-DD) or other consistent versioning. Further details are in this UKDS guide.
The software format of the dataset should facilitate flexible use of the data. Scholars using one format during a research project may consider a different format for preservation – taking into consideration accessibility via open source software (if the dataset can be shared as open data). Details of how to submit datasets to the EUI ResData repository are in Section 5 below.
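As an illustration of these naming conventions, the following Python sketch builds a file name from a date, a descriptor and a version, and turns a free-text label into a variable name without spaces or special characters. The function names, the separators and the `v03` version style are assumptions made for this example; they are not Library requirements.

```python
import re
from datetime import date


def build_file_name(day: date, descriptor: str, version: int, extension: str) -> str:
    """Return a name of the form YYYY-MM-DD_descriptor_vNN.ext (hypothetical pattern)."""
    # Lower-case the descriptor and collapse anything that is not a-z or 0-9 into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", descriptor.lower()).strip("-")
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension.lstrip('.')}"


def clean_variable_name(label: str) -> str:
    """Turn a free-text label into a variable name without spaces or special characters."""
    name = re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")
    # Prefix names that would otherwise start with a digit.
    return f"v_{name}" if name[:1].isdigit() else name


print(build_file_name(date(2019, 3, 14), "Household survey: wave 2", 3, ".csv"))
print(clean_variable_name("Country of birth (ISO)"))
```

Placing the ISO date first means that files sort chronologically in an ordinary directory listing. Note that this simple pattern also strips accented letters, which may or may not be desirable for a given project.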
Documentation and codebooks
Clear and accurate documentation should be provided about the purpose, context and methodology of the research project; and about the research data output. Good documentation makes datasets findable, accessible, interoperable and re-usable – FAIR data principles. Documentation should include a detailed description of the dataset, providing information about folders, files, variables, versioning, and – where applicable – information about problematic values, missing observations and weightings. Codebooks, questionnaires and data dictionaries should be included. Recommendations are in this ICPSR guide. A concise note on methodology/methodologies should be provided, along with information on how and when the data was generated/collected; how the data was collated and elaborated during the research project; and how the dataset might be used for future research.
Security and backup during the research project
During research projects it is important to keep data secure at all times. Scholars should use a desktop computer for data elaboration, and make regular backups on the EUI network server or on a secured external memory device. In accordance with contractual agreements, micro-socioeconomic data hosted by the EUI Library can only be accessed and elaborated on desktop computers. Preliminary findings and associated documentation should be kept in locked storage when not in use. The EUI ICT Service maintains a strong password policy to prevent the loss, exposure, or corruption of sensitive information.
Data protection, informed consent of data subjects and anonymisation
The data protection requirements for data access, described in Section 2 above, also apply to data gathering and use throughout the research cycle. The mode of consent obtained from subjects depends on the nature of the research project, the kind of data collected, and how and when the data will be used. Where possible, it is recommended to obtain written consent, using a template appropriate to the discipline or sub-discipline in which the research is being conducted.
Dataset creators are responsible for the anonymisation of sensitive data observations. Anonymisation techniques include: data masking (partial data removal and data quarantining); pseudonymisation; aggregation (cell suppression, inference control, rounding, sampling &c.); and banding. Anonymisation techniques are described in the Anonymisation Code of Practice, UK Information Commissioner's Office (2012, Appendix 2). The UKDS provides guidelines to anonymisation for both quantitative and qualitative data.
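Three of the techniques named above (pseudonymisation, banding and cell suppression) can be sketched in a few lines of Python. This is a minimal illustration only: the record layout, the secret key, the band width and the suppression threshold are all hypothetical, and keyed hashing is just one possible pseudonymisation approach. A sketch like this does not by itself make a dataset anonymous; the guidance cited above should be followed for real data.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256), truncated for readability."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def band_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def suppress_small_cells(counts: dict, threshold: int = 5) -> dict:
    """Replace counts below the threshold with None so that rare combinations are not published."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical records: (respondent id, age, region)
records = [("r001", 34, "North"), ("r002", 37, "North"), ("r003", 61, "South")]
key = b"replace-with-a-project-secret"  # illustrative only; store real keys outside the dataset

masked = [(pseudonymise(rid, key), band_age(age), region) for rid, age, region in records]
table = Counter((band, region) for _, band, region in masked)
print(suppress_small_cells(dict(table), threshold=2))
```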
It is not possible to publish a dataset containing significant portions of data sourced from pre-existing copyrighted databases governed by contractual license. The Library provides advice on data protection and copyright.
5. EUI RESDATA REPOSITORY: PRESERVING AND SHARING DATA
EUI ResData is a solution for the preservation and sharing of research data outputs, and a complement to the EUI publications repository Cadmus. This section provides information on preparing data for long-term preservation and, where possible, sharing as open data. The metadata section below explains how to create metadata – which are essential for reposit, retrieval and reuse.
Preparing data for reposit in EUI ResData
EUI members who wish to submit a dataset for inclusion in the EUI ResData repository should first complete the ResData submission form. Library staff will use the information provided in the form to generate metadata in the Dublin Core schema. An appointment for data transfer will be made by Library staff.
When preparing submissions, it is advisable to create two folders: one for data and one for documentation. Large datasets should be submitted in archive format (eg. .zip). Data should be submitted in its original file format and version. Subsets must be accommodated within the folder structure – not as multiple ResData entries. New entries can be created for subsequent iterations of the dataset. Documentation should include a concise overview of the research project and methodology. Codebooks and other relevant documentation should be provided in PDF/A format, for preservation in an adjacent ‘Docs’ folder.
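A Dublin Core record is a flat set of simple elements such as title, creator and date. The sketch below shows how answers from a submission form might be serialised as Dublin Core XML with Python's standard library. The values are placeholders and the mapping from form answers to elements is an assumption for illustration; it does not describe how Library staff actually generate ResData records.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: dict) -> ET.Element:
    """Build a flat XML record with one dc:* element per value."""
    record = ET.Element("record")
    for element, values in fields.items():
        # Accept a single string or a list of strings (eg. several creators).
        for value in ([values] if isinstance(values, str) else values):
            ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record


# Placeholder values; the element names come from the Dublin Core element set.
form_answers = {
    "creator": ["Family name, First name", "Second creator"],
    "title": "Example dataset title",
    "description": "Abstract describing the data, purpose and methodology.",
    "type": "Dataset",
    "date": "2018",
    "coverage": "Europe; 2000-2015",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

print(ET.tostring(to_dublin_core(form_answers), encoding="unicode"))
```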
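The packaging step described above could be scripted along the following lines. This is a sketch under stated assumptions: the ‘Data’ folder name, the function name and the throwaway demonstration files are hypothetical, and only the ‘Docs’ folder name and the .zip format come from the guidance.

```python
import tempfile
import zipfile
from pathlib import Path


def package_submission(data_dir: Path, docs_dir: Path, archive_path: Path) -> list:
    """Write data files under 'Data/' and documentation under 'Docs/' in one .zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for top_level, source in (("Data", data_dir), ("Docs", docs_dir)):
            for path in sorted(source.rglob("*")):
                if path.is_file():
                    # Keep the original sub-folder structure inside the archive.
                    archive.write(path, Path(top_level, path.relative_to(source)).as_posix())
        return archive.namelist()


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "wave1").mkdir(parents=True)
    (root / "docs").mkdir()
    (root / "data" / "wave1" / "survey.csv").write_text("id,age\n1,34\n")
    (root / "docs" / "codebook.pdf").write_bytes(b"%PDF-placeholder")
    names = package_submission(root / "data", root / "docs", root / "submission.zip")
    print(names)
```

Keeping subsets as sub-folders inside a single archive is consistent with the requirement that subsets be accommodated within the folder structure rather than as separate entries.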
Scholars presenting datasets for inclusion in the ResData repository should pay particular attention to data quality control, dataset structure and data protection. It is not possible to reposit unstructured data elements and observations in ResData.
Completing the online EUI ResData submission form
EUI members wishing to reposit data should complete the ResData submission form. By submitting the form, EUI members acknowledge that the dataset for reposit is the result of original data generation; or is the output of significant, value-added, elaboration of pre-existing sources. Datasets presented for inclusion in the EUI ResData repository must be the output of research by a current EUI member – or at least one EUI member of a research team. The name of the Principal Investigator, researcher(s), and – where applicable – technical collaborator(s), must be provided. EUI email contacts must be given. If the project is undertaken in the context of a consortium, the name of the Data Manager should be provided (if different from the Principal Investigator).
By completing the ResData submission form, EUI dataset creators certify that their work complies with the Code of Ethics in Academic Research of the European University Institute. The source(s) of the data must be indicated. If the dataset is the output of original data generation, details must be provided. If the dataset is derived from pre-existing sources, those sources must be clearly indicated (eg. data creator, institutional source and/or publisher).
EUI members submitting data to ResData should state whether or not the dataset can be shared as open data or be subject to embargo. The Library can offer guidance on data protection and database copyright. In some cases it may be possible to create a limited public version of a larger restricted dataset. Embargo status can change over time.
Persons, families and households cannot be identifiable in any dataset (Section 4 above). The collection and use of observations relating to ethnicity, health, religion, orientation, biometrics &c. are subject to data protection laws. Creators of research data outputs which have been elaborated from pre-existing copyrighted sources may need to obtain the permission of rights owners before open data sharing. It is not possible to publish a dataset containing substantial portions of data sourced from pre-existing databases governed by contractual license. The EUI Library can provide advice, and can also assist scholars to reposit datasets in discipline-specific data repositories and the multi-disciplinary Zenodo. Major data repositories are indexed in the international re3data registry and Google Dataset Search.
Metadata are 'data about data' presented in a systematic schema. Accurate metadata are necessary for the organisation, use, repositing, sharing and machine-discovery of datasets. Throughout the research cycle, it is important to keep a detailed and updated record of data capture, input, use and elaboration. An introduction to metadata standards for social science and humanities data is available from the Digital Curation Centre.
Metadata elements can be used as a ‘checklist’ to determine whether, when, how, where and under what terms, research data outputs can be shared as open data. Metadata should be consistent throughout the research project. Some research data outputs may require multi-lingual metadata.
EUI members submitting a dataset for inclusion in the EUI ResData repository should first complete the Library’s online submission form. These are the principal metadata fields:
NAME(S) OF DATASET CREATOR(S)
The name, or names, of the scholars and technical collaborators who created the dataset must be provided. The name of the Principal Investigator must be given if the dataset has been created by a research team. If the project is undertaken in the context of a consortium, the name of the Data Manager must be provided (if different from the P.I.). Where researcher IDs are available, eg. ORCID, these should be provided. The CRediT framework provides guidelines on authorship and credit for research outputs.
The EUI email address of the dataset creator(s) must be provided.
TITLE OF DATASET
The title should succinctly convey the nature and scope of the dataset.
DESCRIPTION OF DATA
A meaningful abstract, describing the data, the purpose of the research project and the methodology, must be provided. The Library can help edit abstracts.
SOURCE(S) OF DATA
The source(s) of the data must be clearly indicated. If the dataset has been generated during a research project, this should be indicated with details of data collection methods (eg. survey parameters). If the dataset is derived from a pre-existing database, all source(s) must be clearly cited; eg. publisher, issuer. If there are multiple sources, all must be cited.
TYPE OF DATA
The type of data must be indicated, eg. statistical; textual; observational; computational; experimental; simulational &c.
YEAR OF COMPLETION OF DATASET
The date of completion of the dataset must be provided. If part of a data series/wave, this should be indicated.
DATE-RANGE COVERAGE OF DATASET
The start- and end-dates of dataset coverage must be provided.
GEOGRAPHICAL COVERAGE OF DATASET
Where applicable, the geographical scope of the dataset (national, regional, global &c.) should be indicated.
FORMAT OF DATA
The software format and version must be given (eg. Stata 15, .csv, Excel, .txt, &c.).
CODEBOOK / SUPPORTING DOCUMENTATION
Codebooks and supporting documentation should be provided in PDF/A format.
The status of access to the data must be indicated. The status 'open data' should be assigned to datasets that are to be made publicly available via the internet. If the data is subject to embargo, the expiry date of the embargo should be indicated, where applicable.
Submitters should choose a license option from the drop-down menu: CC-BY (international) or CC0.
The Library will assign a unique object identifier to the dataset, for the purpose of locating, linking and citation.
Where applicable, multi-lingual documentation, tags, questionnaires and variable descriptions should be provided.
The name of funding bodies, and research grant numbers, should be provided where applicable.
Bibliographical details of publications based on the dataset, if any, should be listed with links to abstracts and, where possible, full-texts.
PROJECTED FUTURE WAVES OF DATASET
In cases where it is intended to generate future iterations of the dataset, details should be provided.
A short ‘ready-to-use’ data citation should be provided, incorporating core descriptive elements: family name; first name; year; dataset title; ‘EUI ResData’. For example: ‘Steinmo, Sven; D’Attoma, John; Noyon, Sanne Maria; Zhang, Nan (2018): Willing to Pay? Dataset. EUI ResData.’
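The citation pattern in the example above can be assembled mechanically from its core elements. The following Python sketch is one way to do so; the function name and argument layout are hypothetical, and the separators are inferred from the single example given rather than from a published specification.

```python
def format_citation(creators: list, year: int, title: str, repository: str = "EUI ResData") -> str:
    """Join (family name, first name) pairs and the other core elements into one citation string."""
    names = "; ".join(f"{family}, {first}" for family, first in creators)
    return f"{names} ({year}): {title}. {repository}."


citation = format_citation(
    [("Steinmo", "Sven"), ("D’Attoma", "John"), ("Noyon", "Sanne Maria"), ("Zhang", "Nan")],
    2018,
    "Willing to Pay? Dataset",
)
print(citation)
```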
Data sources should be cited in EUI theses, working papers, books and other publications (eg. by using the citation format above). When submitting details of research publications to the EUI Cadmus repository, scholars should cite datasets which support the research findings. Further information about data citation is provided by DataCite and the Data Citation Synthesis Group.
6. RESEARCH DATA IN EU HORIZON 2020
EUI project managers who are preparing EU Horizon 2020 applications are required to submit preliminary information about data management in the proposed project. This information is evaluated by the European Commission under the criterion 'impact.' When completing the general information section of the Horizon 2020 application, EUI project applicants should address the following questions: (i) What types of data will the project generate/collect? (ii) What standards will be used? (iii) How will the data be exploited and/or shared for verification and re-use? (iv) If data cannot be made available, explain why (v) How will the data be curated and preserved? Further information is in the European Commission Guidelines on Data Management in Horizon 2020. The EUI Library assists project managers with the data management sections of funding applications and data management plans.
When a project proposal is approved, a data management plan (DMP) must be provided within six months of the start of contract. The European Commission mandates two further versions of the DMP; one at mid-point and one as a final project deliverable. The EC provides further details in the Guidelines on Data Management in Horizon 2020. The DMPonline tool (Section 3 above) can be used for Horizon 2020 data management plans, by selecting the Horizon 2020 template in the Funder section.
The European Commission’s policy (p.8) is for research data to be “as open as possible, as closed as necessary.” A list of opt-outs is provided. The Horizon 2020 research data pilot requires grant beneficiaries to deposit research data outputs in a research data repository. The EC does not recommend specific repositories. The guidelines (p.10) state that “as far as possible, projects must then take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data. One straightforward and effective way of doing this is to attach Creative Commons Licences (CC BY or CC0) to the data deposited.”
European Open Science Cloud (EOSC)
The European Open Science Cloud portal was launched in November 2018. The EOSC portal enables the discovery of data; provides tools for data analysis; and gives information about storage, computation, training and security. Content and services are available via the EOSC catalogue. To mark the launch of the EOSC Portal, the European Commission published two supporting documents: Prompting an EOSC in Practice, which explains the governance of the new service, and Turning FAIR into Reality, which explains how to make research data ‘Findable, Accessible, Interoperable and Reusable.’ The European University Institute endorsed the European Open Science Cloud Declaration in 2017.
Social Sciences and Humanities Open Cloud (SSHOC)
The Social Sciences and Humanities Open Cloud project was launched in January 2019. SSHOC will constitute the social sciences and humanities cluster of the European Open Science Cloud. SSHOC has been established “as part of the European Union’s Horizon 2020 research and innovation programme… It aims to bring together existing and new infrastructures from social science and humanities ERICs (European Research Infrastructure Consortiums) and foster interdisciplinary research and collaboration.” 47 partners are involved, under the coordination of the Consortium of European Social Science Data Archives (CESSDA).
7. OPEN DATA
There is a growing trend among scholars, government agencies and international organisations to share data and associated documentation, code, software and tools. Open data resources are available via the internet. Major data repositories are indexed in the international re3data registry and Google Dataset Search.
By carefully generating the metadata elements listed in Section 5 above, scholars will have a ready checklist for determining whether, when, how, where and under what terms, research data outputs can be shared as open data. Not all research data outputs can be openly shared. The two most significant considerations when determining whether a research dataset can be made available on an open data basis relate to data protection and database copyright (Section 2 above).
Data access status may change over time. Data can be made openly available for all users via the internet; data can be subject to pre-access registration terms; data can be subject to user contract (sometimes requiring a project proposal); data can be embargoed for a defined period (or indefinitely) and data can be restricted to on-site access and use. In some cases it may be possible to create a limited public version of a larger, restricted dataset. Data can also be reposited solely for preservation purposes (dark archive).
8. QUALITATIVE DATA IN THE HUMANITIES AND SOCIAL SCIENCES
This section treats access to, and use of, qualitative data in the humanities and social sciences. Examples of qualitative data include: minable text; interview transcripts; images; audio and video recordings; survey diaries; archival material; field notes; and free-text surveys. The definition of ‘data’ varies across academic disciplines and sub-disciplines – especially where there is a mix of qualitative and quantitative methods. It is important that dataset design be located in the culture of the discipline or sub-discipline in which the research is undertaken.
Access and terms and conditions of use: qualitative data
The generation, handling, use and sharing of qualitative data in the social sciences and humanities are subject to strong ethical considerations and standards. Qualitative data can be generated, for example, from surveys, free-text responses to interview questions, focus group recordings or experimental simulations.
In all cases, subjects should be informed of their rights as established by the data protection legislation of the relevant jurisdiction, and by best-practice guidelines from scholarly societies in the relevant discipline or sub-discipline. When collecting (eg. via interviews or forms), storing, using or transferring personal data, scholars are responsible for obtaining the specific informed consent of subjects. Scholars are also responsible for preserving the confidentiality of data observations pertaining to human subjects, families and households. Consent must be intelligible: it should refer clearly and precisely to the scope and consequences of the data processing, and to the retention period. Blanket consent that does not specify the exact purpose of the processing is not acceptable. Consent must also be documented by the researcher. Due to the sensitive personal nature of many qualitative data observations, scholars should pay particular attention to ethical standards when handling data relating to ethnicity, health, religion, sexual orientation, biometrics &c. Persons, families and households must not be identifiable in any dataset. The linking of variables on ethnicity &c. to individuals, families or households is governed by data protection legislation and academic best practice.
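The consent requirements above (a stated purpose, scope, consequences and retention period, with the consent itself documented) can be kept as a simple checklist per subject. The Python sketch below is a hypothetical record-keeping aid: the field names and messages are assumptions, and passing the check does not establish compliance with any data protection legislation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ConsentRecord:
    subject_code: str  # pseudonymous code, never a name
    purpose: str
    scope: str
    consequences: str
    retention_until: Optional[date]
    date_documented: Optional[date]

    def problems(self) -> List[str]:
        """List the elements named in the text that are missing from this record."""
        issues = []
        for name in ("purpose", "scope", "consequences"):
            if not getattr(self, name).strip():
                issues.append(f"{name} not specified")
        if self.retention_until is None:
            issues.append("retention period not specified")
        if self.date_documented is None:
            issues.append("consent not documented")
        return issues


# A hypothetical 'blanket' consent: no purpose, consequences or retention period.
blanket = ConsentRecord(
    subject_code="S-001",
    purpose="",
    scope="interview transcript",
    consequences="",
    retention_until=None,
    date_documented=date(2019, 3, 18),
)
for issue in blanket.problems():
    print(issue)
```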
When accessing pre-existing qualitative data, it is important for scholars to familiarise themselves with the terms and conditions of access and use, as indicated by the holding institution and/or rights owner. If a digital database is being created from non-digital materials, it is important to obtain the consent of owners and/or rights holders (eg. archives, libraries, publishers) before electronic versions are created.
Support, software and infrastructure for qualitative data
Support for qualitative data use and elaboration is provided by the EUI Library. Software support is provided by the EUI ICT Service. ArcGIS and ATLAS.ti can be used for analysis, mapping and visualisation of qualitative non-numerical data such as audio, graphics, text and video. Coding Analysis Toolkit (CAT) can be used for content and discourse analysis. Tools for data backup (SyncToy), file zipping (7-Zip), data encryption (TrueCrypt) and image adjustment (Resizer) are also available. Many of the tools used for the analysis of quantitative data (eg. Gauss, Julia, MATLAB, Python, R, Stata listed in Section 9 below) can also be used to generate aggregate statistical observations from qualitative data. The analysis of restricted personal data may require scholars to work in a ‘safe-room’ environment. If this is required by a data provider, contact the EUI Library.
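As an illustration of generating aggregate statistical observations from qualitative data, the Python sketch below counts thematic codes assigned to interview segments. The interview identifiers, code labels and function names are hypothetical, and the coding itself is assumed to have been done beforehand (eg. in ATLAS.ti or by hand).

```python
from collections import Counter

# Hypothetical coded interview segments: (interview_id, code) pairs.
coded_segments = [
    ("int01", "trust"),
    ("int01", "mobility"),
    ("int02", "trust"),
    ("int03", "trust"),
    ("int03", "housing"),
    ("int03", "trust"),
]


def code_frequencies(segments):
    """Count how many segments carry each code."""
    return Counter(code for _, code in segments)


def interviews_per_code(segments):
    """Count the distinct interviews in which each code appears."""
    unique_pairs = set(segments)
    return Counter(code for _, code in unique_pairs)


freq = code_frequencies(coded_segments)
spread = interviews_per_code(coded_segments)
for code in sorted(freq):
    print(f"{code}: {freq[code]} segments in {spread[code]} interviews")
```

Only the aggregate counts leave the analysis; the underlying segments, which may contain personal data, stay in the secured working environment.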
Research data management and data management plans for qualitative data
Research data management for qualitative data is similar to that for quantitative data, but there are some additional considerations. Research data management encompasses the control of data inputs, the handling and use of data, the protection of data, the creation of data outputs and the preservation of data. Research data management is carried out by scholars throughout the duration of a research project (Section 3 above). During data analysis, qualitative data materials should be carefully handled and secured, either in a locked storage unit or in a locked room. This is particularly important for confidential, unique and archival material.
Data management plans (DMPs) are normally required by science funding agencies. Due to the heterogeneous, multi-media and complex nature of qualitative data in the humanities and social sciences, it is particularly important for scholars to keep a record of data sources, including notebooks, questionnaires, codebooks and multilingual thesauri. Supporting documentation serves as the basis for accurate metadata, and facilitates future retrieval and reuse. In the case of non-repeatable, time-sensitive, socio-political research, data management plans may require a detailed explanation of the qualitative research methods used.
Metadata for qualitative data
Metadata are data about data, presented in a systematic schema. Accurate metadata are necessary for the organisation, use, repositing, sharing and machine-discovery of datasets (Section 5 above). The metadata fields used for quantitative data outputs can also be used for qualitative data outputs. However, there are additional considerations. As well as the name(s) of the scholars and technical collaborators who generate a dataset, it may be necessary to include the authors/creators of subsidiary qualitative data. The dates of creation of subsidiary works included in any new qualitative dataset should be clearly indicated. Linguistic, national and regional metadata should be provided where relevant (eg. multi-lingual surveys). The format and version of the software used to elaborate the data should also be indicated.
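The additional considerations above can be sketched as a small metadata record. The Python example below is a minimal illustration: the field names are assumptions chosen to mirror the paragraph, not the schema used by ResData or by any metadata standard, and all values are invented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SubsidiaryWork:
    """A pre-existing work included in the new dataset."""
    title: str
    creators: List[str]
    date_created: str  # year or ISO 8601 date


@dataclass
class QualitativeMetadata:
    title: str
    creators: List[str]
    languages: List[str] = field(default_factory=list)
    countries: List[str] = field(default_factory=list)
    software: str = ""
    software_version: str = ""
    file_format: str = ""
    subsidiary_works: List[SubsidiaryWork] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """List the descriptive fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical record for an interview collection.
record = QualitativeMetadata(
    title="Hypothetical interview collection",
    creators=["A. Researcher"],
    languages=["en", "it"],
    software="ATLAS.ti",
    subsidiary_works=[SubsidiaryWork("Hypothetical pamphlet", ["Unknown"], "1968")],
)
print(record.missing_fields())
print(json.dumps(asdict(record), indent=2))
```

Checking for empty fields before submission gives a quick indication of which elements (here the countries, software version and file format) still need to be documented.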
Data preservation, repositing and open qualitative data
Qualitative digital data outputs in the humanities and social sciences can be reposited in the EUI ResData repository, or in a subject repository or general repository. By carefully noting the metadata elements explained in Section 5, scholars will have a ready checklist for determining whether, when, how, where and under what terms research data outputs can be shared as open data. Qualitative datasets that are made available as open data should be the product of original research and should adhere to best practice regarding data protection and database copyright (Section 2 above).
9. INFRASTRUCTURE, SOFTWARE AND SUPPORT
The EUI ICT Service provides infrastructure, software and connectivity support. Research software programmes are listed on the ICT Service web site. Technical support is provided at the site offices of the ICT Service.
The EUI Library maintains the Data Portal, the Micro Data Restricted Server and the ResData repository. The Library provides support for data discovery, data access and data use, and helps EUI members prepare data management plans. The workflow for the submission and metadata enrichment of EUI data outputs is also managed by the Library.
In addition to the software programmes for qualitative research listed in Section 8 above, the EUI ICT Service provides Fortran, Gauss, MATLAB, OxMetrics, Python, R, Stata, Stat/Transfer, WinEdt and WinRATS, and supports the high-performance computing cluster. A guide to the High Performance Cluster (HPC) is available by writing to [email protected] (EUI members only). Advice on the use of statistical software is provided by the ICT Service and software tutors.
A directory of online research data software manuals, with links to full-text, is available on the Library web site. Data software manuals are available in the Badia Library and the Economics Departmental Library (shelfmarks 001 to 006). The Library holds a substantial collection of works on statistical science, data science, applied mathematics, probability, calculus, data mining, machine learning, modeling, optimisation, regression analysis, experimental design, survey methods and time series (shelfmarks 001-006 and 500-519). Books and manuals in any language may be suggested for acquisition by the Library.
Data support is provided at the Badia Library, and at the Economics Departmental Library, Villa La Fonte (Monday, Wednesday and Friday afternoons from 14:45 to 18:30). Every Friday during term, the Library issues the Bulletin of Economic Research and Statistical Data. The Bulletin can only be distributed to EUI mail accounts. Sign up by sending a message with 'subscribe' in the subject line to [email protected]. Data news is also disseminated via the EUI Library Blog and Twitter.
10. INTERNATIONAL RESEARCH DATA GUIDELINES
Further information is available from these sources:
This page is based on the 7th edition of the EUI Library Research Data Guide published on 18 March 2019.
Contact: Thomas Bourke at [email protected]