Library Data Portal, Data Services and ResData Repository
- Wednesday, 21 Nov., 13:30, Sala del Capitolo: Presentation of Data Management Plans, Open Data and the EUI ResData Repository
- For research data support please contact Thomas Bourke
The EUI Library Data Portal provides access to Library-licensed macroeconomic, micro-socioeconomic and Europe-related databases. 72 data resource guides provide information on coverage, variables, user networks, online manuals and terms and conditions of use. The Data Portal also provides information about the Library's restricted micro data server.
Click below for EUI Library support with data management plans (DMPs); data protection and database copyright; metadata; data preservation and sharing; and data support for Horizon 2020 projects. For assistance, please write to [email protected] or visit one of the Library information desks: (BF-085) or (VLF-035).
EUI members can submit their research datasets for inclusion in the EUI ResData repository - a solution for preserving and sharing data outputs. Full details - and the online form for submitting data for reposit - are on this Library page. For support with the preparation of datasets for submission, please write to [email protected]
1. LIBRARY DATA PORTAL AND DATA DISCOVERY
The Library maintains an indexed Data Portal which (i) provides access to licensed data resources for EUI members and (ii) introduces open data resources of relevance to EUI research themes. All resources in the Data Portal have an individual guide with:
- Data description and scope
- Time-period coverage and release/wave information
- Support links (online manuals, software transfer routines, user networks)
- Terms and conditions of access and use.
There are four sub-directories:
- Macroeconomic databases provide statistics on global, regional and national economic and political-economic developments from international organisations and major data publishers including: Databanks, ECB, Eurostat, Global Financial Data, IMF, OECD, Standard & Poor’s, Thomson Reuters, World Bank, UN and WIIW.
- Micro-socioeconomic databases provide individual, family, household and company observations. Major providers include: DIW, Eurostat, GESIS, ICPSR, Moody’s Analytics/BvD and UKDS. Access to micro data on the EUI Library restricted data server requires additional registration.
- European, EU and Euro area resources provide data for research on pan-European topics, EU states, European sub-state regions and Europe in a global context.
- The Topics Directory helps match research topics to Library databases.
Data provided to EUI members under Library license are accessible via internet protocol (IP) and – for micro-socioeconomic data – the Library’s restricted data server. The Datastream (Thomson Reuters) financial database is run directly from the EUI desktop ‘programmes’ menu. Data from the Inter-university Consortium for Political and Social Research (ICPSR) archive is downloaded by Library staff, upon request. Modes of access are described in the Data Portal resource guides.
Access to data at other institutions
EUI members who require access to restricted data at another facility should apply for access early in their research project, because applications can be lengthy. Some qualitative and micro geo-coded data can only be accessed at issuers’ secure facilities. Write to [email protected] for assistance with access applications. EUI members who require access to unpublished datasets (eg. underlying data associated with a publication) should contact the Library before writing to data creators/owners. In some instances, it may be possible for EUI members to obtain access via Library consortia.
Access to open data resources
Open data refers to the trend among scholars, government agencies and international organisations to share datasets, codebooks and software via the internet. (In this context, the term ‘scholar’ includes researchers, fellows, professors and all other scientific collaborators.) Open research datasets can be located via the re3data registry of research data repositories, which is indexed by discipline, sub-discipline, data type and host location, and via Google Dataset Search (beta) launched in September 2018. The Open Access Directory and internet search engines also help locate open data. Data management plans (Section 3 below) can be used to determine whether, when, how, where and under what terms, research data outputs can be shared as open data. The research data outputs of EUI members can be reposited in the ResData repository for preservation and sharing (Section 5).
2. DATA PROTECTION, DATABASE COPYRIGHT AND ETHICAL USE
This section provides an overview of data protection and ethical use requirements for micro-socioeconomic and qualitative data; database copyright; and individual data user undertakings. Further information is provided in the EUI Guide to Good Data Protection Practice in Research.
Special terms and conditions apply to access and use of micro-socioeconomic and qualitative data. This reflects the sensitive nature of data observations about human subjects, families and households. Such terms and conditions apply to both (i) micro-socioeconomic data hosted by the Library for EUI members and (ii) micro-socioeconomic data provided directly to EUI members under third-party license. Terms and conditions for each micro dataset hosted by the Library are given in the ‘full details’ section of the resource guides in the Library’s Micro Data Directory.
Persons, families and households cannot be identifiable in any dataset. When using micro datasets, users must not attempt to identify any individual, family or household.
The storage, transfer and use of personal data are subject to data protection rules. For the EUI, see the President's Decision on Privacy and, with particular reference to research, the Guide to Good Data Protection Practice in Research. The processing of particularly sensitive data (referring, for example, to ethnicity, health, political or sexual orientation, religion and biometrics) requires even stricter safeguards. Those rules are currently under revision in light of the European Union General Data Protection Regulation, which came into effect on 25 May 2018. As written in the EUI Guide to Good Data Protection Practice in Research: “As a general rule, the processing of such personal data is prohibited. However, Article 7 of the EUI’s Data Protection Policy does allow it to be lawfully processed in specific circumstances. The most common in research is upon the data subject’s explicit consent.”
When collecting (eg. via interviews or forms), storing, using or transferring personal data, scholars are responsible for obtaining the specific informed consent of subjects (Section 4(e) below) and for preserving the confidentiality of data observations pertaining to human subjects, families and households. Consent must be intelligible: it should refer clearly and precisely to the scope and the consequences of the data processing, and the retention period. Blanket consent without specifying the exact purpose of the processing is not acceptable.
EUI members must use the micro data request form when applying for access to micro-socioeconomic data hosted on the Library’s restricted server. A separate form is required for each dataset. Applicants must also sign (i) the EUI ‘Terms and Conditions of Use of Micro Data’ paper form, and (ii) the data provider’s terms and conditions paper form. Both forms can be signed at the Economics Information Office (Badia Library, 085) or at the Economics Departmental Library (Villa La Fonte, 035). Some providers (eg. Eurostat) require the submission of a detailed project proposal before access is granted.
There is no off-campus access, VPN access, or laptop access to micro-socioeconomic data hosted by the EUI Library. Non-EUI members and short-term visitors do not have access to EUI-hosted micro data.
License agreements and database copyright
Access to, and use of, databases provided by the EUI Library are subject to contractual license agreements and copyright terms. Full details are on the Library’s Terms and Conditions web page. Data users are individually responsible for compliance with terms and conditions of access and use. Violation of license terms puts at risk other EUI members’ future access to data resources. All EUI users must scrupulously abide by the terms and conditions of access to, and use of, data provided under license to the EUI community.
Users may not distribute or allow any other party to have access to data which is provided under license; users may not modify or create a derivative work of the licensed materials without the permission of the licensor; users may not remove, obscure or modify any copyright or other proprietary notices included in licensed materials; users may not use licensed materials for commercial purposes; users may not retain or distribute substantial portions of a database, and must comply with any post-project data destruction undertakings in the license.
Agreements between individual users and data providers
Some data issuers require that access contracts be established directly with end-users. If a data issuer also requests the counter-signature of an EUI administrator (‘guarantor’), contact the Library. Upon completion of the access application, the Library will request that the EUI data user sign an internal undertaking to abide by the terms and conditions of access and use.
3. DATA MANAGEMENT PLANS (DMPs)
Data management plans are short documents, normally required by science funding agencies. Research data management is carried out by scholars throughout the duration of a research project. Data management plans should address:
- How data is generated and/or sourced
- How data is used, elaborated and organised
- How data, and data subjects, are protected
- How data, tools and ancillary elements are described and documented
- How data is stored and secured, and how long it will be retained
- How data authorship and credit are assigned
- How data is preserved
- How, whether, and under what terms, research data outputs can be shared.
During the research project it is important to keep an accurate record of changes to dataset design, folders, files, variables and versioning. Scholars undertaking experiments should ensure that pre-agreed terms of disclosure are documented and preserved. Codebooks and other documentation should describe the data, the methodology and the metadata – also known as ‘data about data’ (Section 5 below).
Data management plans (DMPs) can be used as the basis for determining whether, when, how, where and under what terms, research data outputs can be openly shared – or shared under more restrictive terms and conditions. Open data refers to the trend among scholars, government agencies and international organisations to share data outputs and ancillary materials via the internet. The FAIR data principles (findable, accessible, interoperable and re-usable) should be considered during the preparation, and revision, of data management plans. Further information is in the European Commission guide to the FAIR data principles.
EUI members who are required to submit a data management plan – either as part of a funding proposal or during a research project – should write to [email protected] for assistance. Tools such as DMPonline – the Digital Curation Centre’s data management planning tool – can be used to prepare a structured data management plan, complying with (eg.) EU Horizon 2020 requirements. To use DMPonline, enter an email address, name of organisation and create a password. EUI users should select ‘other organisation’ from the drop-down menu. First-time users are taken to the ‘edit profile’ section of the DMPonline platform. The Principal Investigator (P.I.) should be identified in the data management plan. For research teams, the P.I. can assign co-author rights to the data management plan, by entering email addresses and assigning status as ‘co-owner’, ‘editor’ or ‘read only.’
The Principal Investigator should be the contact person for decisions regarding whether, when, how, where and under what terms, research data outputs might be openly shared. If the Principal Investigator is not the same person as the project Data Manager, this should be stated. In international collaborative projects, the name of the person who has final authority regarding decisions on the sharing of data outputs should be indicated. Unless otherwise stated, the Principal Investigator undertakes this role.
Science funders normally require a revised DMP at the mid-point of the research project and a definitive DMP as a project deliverable. It is important to update plans throughout the research project by incorporating information about changes in the scope of the project, new data generation and use, and changes in the composition of the research team or consortium.
4. MANAGING DATA DURING THE RESEARCH PROJECT CYCLE
Data should be carefully managed throughout the duration of the research project. Particular attention should be given to data input; quality control; dataset design; folder structure; file structure; file naming; variable naming; software format; documentation and codebooks; security and backup during the project; data protection (in particular, obtaining and documenting informed consent of data subjects and anonymisation); copyright; preservation; archiving; data citation and – where possible – sharing data outputs as open data. For further details see p.8 and p.16 of the EUI Guide to Good Data Protection Practice in Research. These international guides are also useful for scholars in the social sciences and humanities: Research Data Management Toolkit by JISC (2018), Expert Tour Guide on Data Management by the Consortium of European Social Science Data Archives (2017), Research Data Lifecycle by the UK Data Service (2017) and Guide to Social Science Data Preparation and Archiving by the Inter-university Consortium for Political and Social Research (2012).
Data input and quality control
The generation and/or sourcing of data should be informed by high quality-control standards. Datasets which are generated from empirical work during a research project should adhere to the ethical best-practices of the discipline or sub-discipline. In cases where research datasets are partially based on pre-existing resources, scholars should ensure that the underlying sources are of high quality, and that there are no infringements of copyright. Contact the Library for assistance: [email protected]. The ingestion of observations into a research dataset should be undertaken with close attention to accuracy, completeness and consistency. Data codes and variables should be assigned in a logical, accurate and sustainable manner.
Folders, files, variables, format and versioning
The design of the research dataset should be carefully considered at the outset of the project. Dataset design varies by conventions of the discipline and sub-discipline, medium, types of variables, units of analysis, methodology, relationship between data elements, and whether or not the dataset is part of a series. Clear and consistent metadata for folders, files, variables and versioning helps facilitate future data retrieval, reuse and replicability.
The folder structure of the research dataset should be considered at the beginning of the project (eg. hierarchical or horizontal). Qualitative datasets containing text, interviews, photographs &c. may require individual files for each element. File names should be standardised, eg. date, descriptor, version. Variables, such as age, country and sex, should be clearly tagged, avoiding special characters and spaces. Temporary identifiers should be removed from the schema. Different file versions should be systematically named, using a standardised date system (YYYY-MM-DD) or version numbering. Further details are in this UKDS guide.
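The naming conventions described above can be sketched in Python. This is an illustration only, not a Library tool: the function names, the underscore separator, the two-digit `v01` version style and the `v_` prefix for names beginning with a digit are all assumptions made for the example.

```python
import re
from datetime import date


def clean_variable_name(raw: str) -> str:
    """Lower-case a name and replace spaces and special characters with underscores."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", raw.strip()).strip("_").lower()
    # Assumption: prefix names that start with a digit, since many packages reject them.
    if name and name[0].isdigit():
        name = "v_" + name
    return name


def standard_file_name(day: date, descriptor: str, version: int, ext: str) -> str:
    """Build a file name of the form YYYY-MM-DD_descriptor_vNN.ext."""
    stem = f"{day.isoformat()}_{clean_variable_name(descriptor)}_v{version:02d}"
    return f"{stem}.{ext.lstrip('.')}"


print(standard_file_name(date(2018, 11, 21), "Interview transcripts", 2, "txt"))
# prints: 2018-11-21_interview_transcripts_v02.txt
print(clean_variable_name("Country (ISO code)"))
# prints: country_iso_code
```

Whichever pattern is chosen, applying it through a single helper of this kind keeps file and variable names consistent across the research team.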
The software format of the dataset should facilitate flexible use of the data. Scholars may consider a different format for final reposit, taking into consideration accessibility via open source (if the dataset can be shared as open data). For reposit in EUI ResData, large datasets should be submitted in archive format (eg. .zip). It is not possible to create multiple ResData entries for individual elements of a dataset. Subsequent waves of a dataset can be assigned new entries in ResData (Section 5 below).
Documentation and codebooks
Clear and accurate documentation should be provided about the purpose and context of the research project, and about the research data output. Good documentation helps make datasets findable, accessible, interoperable and re-usable (the FAIR data principles). Documentation should include a detailed description of the dataset, providing information about folders, files, variables, versioning, and – where applicable – information about problematic values, missing observations and weightings. Codebooks, questionnaires and data dictionaries should be included. Recommendations are in this ICPSR guide. A concise note on methodology, or methodologies, should be given. Information should be provided on how and when the data was generated and/or collected; how the data was collated and elaborated during the research project; and how the dataset might be used for future research. Documentation in the EUI ResData repository is included in a ‘Docs’ dossier in PDF/A format, or plain text, for long-term preservation.
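As a starting point for a codebook, a first draft of a data dictionary can be generated from the data file itself. The sketch below is a minimal example using only the Python standard library. The function name `draft_codebook` and the missing-value codes (`""`, `NA`, `-9`) are assumptions for illustration; the codes actually used in a dataset should be substituted, and the description of each variable still has to be written by the dataset creator.

```python
import csv
import io


def draft_codebook(csv_text: str, missing_codes=("", "NA", "-9")) -> list:
    """Return one entry per variable: name, number of observations, number of missing values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        # A value is None when a row has fewer fields than the header.
        missing = sum(1 for v in values if v is None or v.strip() in missing_codes)
        codebook.append({
            "variable": name,
            "observations": len(values),
            "missing": missing,
            "description": "",  # to be completed by the dataset creator
        })
    return codebook


sample = "id,age_band,country\n1,30-39,IT\n2,NA,FR\n3,,DE\n"
for entry in draft_codebook(sample):
    print(entry)
```

The output lists each variable with its count of missing observations, which can then be pasted into the plain-text or PDF/A documentation and completed by hand.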
Security and backup during the research project
During the research project it is important to keep data secure at all times. Scholars should use a desktop computer for data elaboration, and make regular backups on the EUI network server, or on a safely-secured external memory device to facilitate recovery. In accordance with contractual agreements, micro-socioeconomic data hosted at the EUI can only be accessed and elaborated on a desktop computer in a secure location, after completing the required Library registration protocols. Preliminary findings and associated documentation should be kept in locked storage when not in use. The EUI ICT Service provides a strong password policy to prevent the loss, exposure, or corruption of sensitive information.
Data protection, informed consent of data subjects and anonymisation
The data protection requirements for data use, described in Section 2 above, also apply to data generation during the research project cycle. OECD guidelines recommend: “The default position should be that personal data is not collected, processed or shared without informed consent.” The mode of consent obtained from subjects depends on the nature of the research project, the kind of data collected, and how and when the data will be used. Where possible, it is recommended to obtain written consent, using a template appropriate to the discipline or sub-discipline in which the research is being conducted.
Dataset creators are responsible for the anonymisation of sensitive data observations. Anonymisation techniques include: data masking (partial data removal and data quarantining); pseudonymisation; aggregation (cell suppression, inference control, rounding, sampling &c.); and derivative data items and banding. Anonymisation techniques are described in the Anonymisation Code of Practice, UK Information Commissioner's Office (2012, Appendix 2). The UKDS provides guidelines to anonymisation for both quantitative and qualitative data.
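Three of the techniques named above (pseudonymisation, banding and rounding) can be illustrated with a short Python sketch. It is not a complete anonymisation procedure, and applying these steps does not by itself guarantee that subjects cannot be identified. The function names, the 10-year band width, the rounding base and the placeholder key are assumptions made for the example.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex digits."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def age_band(age: int, width: int = 10) -> str:
    """Banding: report an age range instead of the exact age."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def round_to(value: float, base: int = 1000) -> int:
    """Rounding: coarsen a numeric value to the nearest multiple of `base`."""
    return int(round(value / base) * base)


record = {"name": "Example Person", "age": 37, "income": 28450}
key = b"hypothetical-project-key"  # placeholder; store the real key separately from the data

anonymised = {
    "pid": pseudonymise(record["name"], key),
    "age_band": age_band(record["age"]),               # "30-39"
    "income_rounded": round_to(record["income"]),      # 28000
}
print(anonymised)
```

A keyed hash is used rather than a plain hash so that pseudonyms cannot be reproduced by someone who knows the original identifiers but not the key. The guides cited above remain the reference for choosing appropriate techniques.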
Copyright considerations for data access and use, described in Section 2 above, also apply to in-project data management. It is not possible to publish a dataset containing significant portions of data sourced from pre-existing copyrighted databases governed by contractual license. Write to [email protected] for advice on data protection and database copyright.
5. EUI RESDATA REPOSITORY: PRESERVING AND SHARING DATA
The Library launched the EUI ResData repository in October 2017. ResData is a solution for the preservation and sharing of EUI research data outputs. This section provides information on preparing research data outputs for long-term preservation and – where possible – sharing with other users. The Metadata section below explains how to create data about data, which are essential for reposit, retrieval and reuse. EUI ResData is based on the DSpace infrastructure.
Preparing data for reposit in EUI ResData
Scholars presenting datasets for inclusion in the ResData repository should pay particular attention to data quality control and dataset structure. It is not possible to reposit unstructured data elements and observations in ResData. Large datasets should be submitted in archive format (eg. .zip). Data should be submitted in the original file format and version. It is not possible to create multiple ResData entries for sub-sections of a dataset. Sub-sections should be accommodated in the folder structure. Documentation should include a concise overview of the research project and methodology. Codebooks and other relevant documentation should be provided in sustainable PDF/A format, or plain text, for preservation in a ‘Docs’ dossier.
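One way to prepare such an archive is sketched below in Python, using only the standard library. The sketch assumes a dataset folder that contains a `Docs` sub-folder alongside the data files. The function name `package_dataset` and the example folder and file names are hypothetical, and the check for `Docs` is a convenience for the example rather than a ResData requirement.

```python
import tempfile
import zipfile
from pathlib import Path


def package_dataset(folder: Path, archive: Path) -> list:
    """Zip `folder`, preserving its sub-folders, and return the archived file names.

    `archive` should be located outside `folder` so that it is not added to itself.
    """
    if not (folder / "Docs").is_dir():
        raise ValueError("expected a 'Docs' sub-folder with codebooks and documentation")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the dataset folder, with forward slashes.
                zf.write(path, path.relative_to(folder).as_posix())
        return zf.namelist()


# Demonstration with a throw-away folder structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "survey_2018"
    (root / "Docs").mkdir(parents=True)
    (root / "data").mkdir()
    (root / "Docs" / "codebook.txt").write_text("variable descriptions", encoding="utf-8")
    (root / "data" / "wave1.csv").write_text("id,age_band\n1,30-39\n", encoding="utf-8")
    print(package_dataset(root, Path(tmp) / "survey_2018.zip"))
```

Because sub-sections are kept as folders inside a single archive, the whole dataset can be described by one ResData entry.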
Completing the online EUI ResData submission form
EUI members submitting a dataset for inclusion in the EUI ResData repository should first complete the Library’s online metadata form. Library staff will use information from the form to generate dataset-level metadata in the Dublin Core schema. An appointment for data transfer will subsequently be made by Library staff.
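To give an idea of what dataset-level metadata in Dublin Core can look like, the following Python sketch turns a handful of form answers into a flat Dublin Core record. The mapping from form fields to Dublin Core elements, the field names and the `metadata` wrapper element are assumptions made for illustration; the mapping actually applied by Library staff may differ. Only the Dublin Core element names and namespace are taken from the standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from form fields to Dublin Core elements (an assumption).
FIELD_TO_DC = {
    "creators": "creator",
    "title": "title",
    "description": "description",
    "sources": "source",
    "data_type": "type",
    "year_completed": "date",
    "geographical_coverage": "coverage",
    "data_format": "format",
    "access_status": "rights",
}


def to_dublin_core(form: dict) -> str:
    """Serialise form answers as XML; list values become repeated elements."""
    root = ET.Element("metadata")
    for field, element in FIELD_TO_DC.items():
        values = form.get(field, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


form = {
    "creators": ["Hutcheson, Derek", "Jeffers, Kristen"],
    "title": "Integration Indicators (CITINT)",
    "year_completed": "2013",
}
print(to_dublin_core(form))
```

Each creator becomes a separate `dc:creator` element, which is how repeated values are normally expressed in Dublin Core.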
By submitting the online form, EUI members acknowledge that the dataset for reposit is the result of original data generation; or is the output of significant, value-added, elaboration of pre-existing sources. Datasets presented for inclusion in the EUI ResData repository must be the output of research by a current EUI member – or at least one EUI member of a research team. The name of the Principal Investigator, researcher(s), and – where applicable – technical collaborator(s), must be provided. EUI email contacts must be given. If the project is undertaken in the context of a consortium, the name of the Data Manager should be provided (if different from the Principal Investigator).
By completing the ResData submission form, EUI dataset creators certify that their work complies with the Code of Ethics in Academic Research of the European University Institute. The source(s) of the data must be indicated. If the dataset is the output of original data generation, details must be provided. If the dataset is derived from pre-existing sources, those sources must be clearly indicated (eg. data creator, institutional source and/or publisher).
EUI members submitting to ResData should state whether or not the dataset can be shared as open data. The Library can offer guidance on data protection and database copyright. Submitters should indicate if datasets presented for reposit are to be subject to embargo. In some cases it may be possible to create a limited public version of a larger restricted dataset.
Persons, families and households cannot be identifiable in any dataset. The collection and use of observations relating to ethnicity, health, orientation, religion, biometrics &c. are subject to data protection laws. Depositors are responsible for obtaining the informed consent of subjects for the collection and processing of personal data. Dataset creators are responsible for the anonymisation of data observations (Section 4 above). Creators of research data outputs which have been elaborated from pre-existing copyrighted sources may need to obtain the permission of rights owners before open data sharing. It is not possible to publish a dataset containing substantial portions of data sourced from pre-existing databases governed by contractual license. The EUI Library can provide advice, and can also assist scholars to reposit datasets in discipline-specific data repositories and the multi-disciplinary Zenodo repository. Major data repositories are indexed in the international re3data registry, Google Dataset Search (beta, September 2018) and the data repositories section of the Open Access Directory.
Metadata are data about data, presented in a systematic schema. Accurate metadata are necessary for the organisation, use, repositing, sharing and machine-discovery of datasets. Throughout the research cycle, it is important to keep a detailed and updated record of data capture, input, use and elaboration. An introduction to metadata standards for social science and humanities data has been published by the Digital Curation Centre.
Metadata elements can be used as a ‘checklist’ to determine whether, when, how, where and under what terms, research data outputs can be shared as open data. Metadata should be consistent during the research project. Some research data outputs may require multi-lingual metadata.
EUI members submitting a dataset for inclusion in the EUI ResData repository should first complete the Library’s online submission form. These are the principal metadata fields:
NAME(S) OF DATASET CREATOR(S)
The name, or names, of the scholars and technical collaborators who created the dataset must be provided. The name of the Principal Investigator must be given if the dataset has been created by a research team. If the project is undertaken in the context of a consortium, the name of the Data Manager must be provided (if different from the P.I.). Where researcher ID numbers are available, eg. ORCID, these should be provided. Guidelines on authorship and credit for research outputs are provided by CASRAI.
The EUI email of the dataset creator(s) must be provided.
TITLE OF DATASET
The title should succinctly convey the nature and scope of the dataset.
DESCRIPTION OF DATA
A meaningful abstract, describing the data, the purpose of the research project and the methodology, must be provided.
SOURCE(S) OF DATA
The source(s) of the data must be clearly indicated. If the dataset has been generated during a research project, this should be indicated with details of data collection methods (eg. survey parameters). If the dataset is derived from a pre-existing database, all source(s) must be clearly cited; eg. publisher, issuer. If there are multiple sources, all must be cited.
TYPE OF DATA
The type of data must be indicated, eg. statistical, textual, observational, computational, experimental or simulational.
YEAR OF COMPLETION OF DATASET
The date of completion of the dataset must be provided. If part of a data series/wave, this should be indicated.
DATE-RANGE COVERAGE OF DATASET
The start- and end-dates of dataset coverage must be provided.
GEOGRAPHICAL COVERAGE OF DATASET
Where applicable, the geographical scope of the dataset (national, regional, global &c.) should be indicated.
FORMAT OF DATA
The software format and version must be given (eg. STATA 15, .csv, Excel, .txt, &c.).
CODEBOOK / SUPPORTING DOCUMENTATION
Codebooks and supporting documentation should be provided in PDF/A or simple text format.
The status of access to the data must be indicated. The status ‘open data’ should be assigned to datasets that are to be made publicly available via the internet. If the data is subject to embargo, the expiry date of the embargo should be indicated.
It is advisable to provide a license (eg. CC-BY International).
The Library will assign a unique object identifier to the dataset, for the purpose of locating, linking and citation.
Where applicable, multi-lingual documentation, tags, questionnaires and variable descriptions should be provided.
The name of funding bodies, and research grant numbers, should be provided where applicable.
Bibliographical details of publications based on the dataset, if any, should be listed with links to abstracts and, where possible, full-texts.
PROJECTED FUTURE WAVES OF DATASET
In cases where it is intended to generate future iterations of the dataset, details should be provided.
A short ‘ready-to-use’ data citation should be provided, incorporating core descriptive elements: family name; first name; year; dataset title; ‘EUI ResData’. For example: ‘Hutcheson, Derek; Jeffers, Kristen (2013): Integration Indicators (CITINT). EUI ResData.’
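The citation pattern above is simple enough to assemble programmatically. The Python sketch below follows the order of elements given in the example citation; the function name `data_citation` and its parameters are hypothetical.

```python
def data_citation(creators, year, title):
    """Build a citation: 'Family, First; Family, First (year): Title. EUI ResData.'

    `creators` is a sequence of (family_name, first_name) pairs.
    """
    names = "; ".join(f"{family}, {first}" for family, first in creators)
    return f"{names} ({year}): {title}. EUI ResData."


print(data_citation(
    [("Hutcheson", "Derek"), ("Jeffers", "Kristen")],
    2013,
    "Integration Indicators (CITINT)",
))
# prints: Hutcheson, Derek; Jeffers, Kristen (2013): Integration Indicators (CITINT). EUI ResData.
```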
6. RESEARCH DATA IN EU HORIZON 2020
EUI project managers who are preparing EU Horizon 2020 applications are required to submit preliminary information about data management in the proposed project. That section of the application is evaluated by the European Commission under the criterion ‘impact.’ When completing the general information section of the Horizon 2020 application, EUI project applicants should address the following questions: (i) What types of data will the project generate/collect? (ii) What standards will be used? (iii) How will the data be exploited and/or shared for verification and re-use? (iv) If data cannot be made available, why not? (v) How will the data be curated and preserved? The EUI Library assists project managers with the data management sections of funding applications and data management plans. Write to [email protected]
If the project proposal is approved, a data management plan (DMP) must be provided within six months of the start of the contract period. The European Commission mandates two further versions of the DMP; one at mid-point and one as a final project deliverable. The EC provides further details on p.5 of Guidelines on Data Management in Horizon 2020. The DMPonline tool can be used for Horizon 2020 data management plans, by selecting the Horizon 2020 template in the Funder section. The EUI ResData repository is a Horizon 2020 compliant solution for the preservation and sharing of research project outputs.
The European Commission’s policy is for research data to be “as open as possible, as closed as necessary”. A list of opt-outs is provided by the EC. The Horizon 2020 research data pilot requires grant beneficiaries to deposit research data outputs in a research data repository. The EC does not recommend specific repositories. The guidelines (p.10) state that “as far as possible, projects must then take measures to enable third parties to access, mine, exploit, reproduce and disseminate (free of charge for any user) this research data. One straightforward and effective way of doing this is to attach Creative Commons Licences (CC BY or CC0) to the data deposited.”
7. OPEN DATA
There is a growing trend among scholars, government agencies and international organisations to share data and associated documentation, code, software and tools. Open data resources are available via the internet. Major data repositories are indexed in the international re3data registry, Google Dataset Search (beta) and the data repositories section of the Open Access Directory.
By carefully noting the metadata elements explained in Section 5 above, scholars will have a ready checklist for determining whether, when, how, where and under what terms, research data outputs can be shared as open data. Datasets that are made available as open data should be the product of original research. Outputs should be either (i) the result of original data generation, or (ii) the output of significant, value-added, elaboration of pre-existing sources.
Not all research data outputs can be openly shared. The two most significant considerations when determining whether a research dataset can be made available on an open data basis are:
- Data protection: Persons, families and households cannot be identifiable in any dataset. Depositors are responsible for obtaining and documenting the informed consent of subjects for the collection and processing of personal data. Dataset creators are responsible for the anonymisation of data observations (Section 4 above). The EUI Guide to Good Data Protection Practice in Research provides further information on data protection.
- Database copyright: Research data outputs which are elaborated from pre-existing copyrighted sources may require permission to reposit and share. It is not possible to publish a dataset containing substantial portions of data sourced from pre-existing databases governed by contractual license. The Library can provide advice on copyright and data protection.
Research data outputs can be preserved under a variety of access terms and conditions, and access status may change over time. Data can be made openly available to all users via the internet; data can be subject to pre-access registration terms; data can be subject to user contract (sometimes requiring a project proposal); data can be embargoed for a defined period (or indefinitely); and data can be restricted to on-site access and use. In some cases it may be possible to create a limited public version of a larger, restricted dataset. Data can also be reposited solely for preservation purposes (dark archive).
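The access conditions listed above, and the fact that access status may change over time, can be illustrated with a short Python sketch. This is only an illustration: the names AccessStatus, AccessRecord and effective_status are hypothetical and do not describe the schema of the EUI ResData repository or of any other repository.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class AccessStatus(Enum):
    """Hypothetical labels for the access conditions described in the text."""
    OPEN = "open"
    REGISTRATION = "pre-access registration"
    CONTRACT = "user contract"
    EMBARGOED = "embargoed"
    ON_SITE = "on-site access only"
    DARK_ARCHIVE = "preservation only"


@dataclass
class AccessRecord:
    status: AccessStatus
    # None means the embargo is indefinite (assumption made for this sketch).
    embargo_end: Optional[date] = None
    # Status that applies once a dated embargo has expired.
    status_after_embargo: AccessStatus = AccessStatus.OPEN


def effective_status(record: AccessRecord, today: date) -> AccessStatus:
    """Return the access status that applies on the given day."""
    if record.status is AccessStatus.EMBARGOED:
        if record.embargo_end is not None and today >= record.embargo_end:
            return record.status_after_embargo
    return record.status


# A dataset embargoed until 1 January 2020, then released openly.
record = AccessRecord(AccessStatus.EMBARGOED, embargo_end=date(2020, 1, 1))
print(effective_status(record, date(2019, 6, 1)).value)
print(effective_status(record, date(2020, 6, 1)).value)
```

Recording the embargo end date alongside the status means the change of access terms does not depend on anyone remembering to update the record by hand.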
8. QUALITATIVE DATA IN THE HUMANITIES AND SOCIAL SCIENCES
This section treats access to, and use of, qualitative data in the humanities and social sciences. Examples of qualitative data include: minable text; transcripts of interviews; images; audio and video recordings; survey diaries; archival material; field notes; and free-text surveys. The definition of ‘data’ varies across academic disciplines and sub-disciplines – especially where there is a mix of qualitative and quantitative methods. It is important that dataset design be located in the culture of the discipline or sub-discipline in which the research is undertaken.
Access and terms and conditions of use: qualitative data
The handling, use and sharing of qualitative data in the social sciences and humanities are subject to strong ethical considerations and standards. When accessing qualitative data, it is important for scholars to familiarise themselves with the terms and conditions of access and use, as indicated by the holding institution and/or rights owner. If a digital database is being generated from non-digital materials, it is important to obtain the consent of subjects and/or rights holders in advance of digitisation.
The generation of qualitative data is also subject to strong ethical considerations and standards. Qualitative data can be generated from surveys, free-text responses to interview questions, focus group recordings or experimental simulations. In all cases, subjects should be informed of their rights as established by jurisdictional data protection legislation, and best-practice guidelines from scholarly societies in the relevant discipline or sub-discipline. Scholars are responsible for obtaining the informed consent of subjects for the collection and processing of personal data. The mode of consent obtained from subjects depends on the nature of the research project, the kind of data collected, and how and when the data will be used. Where possible, it is recommended to obtain written consent, using a template appropriate to the discipline or sub-discipline in which the research is undertaken. Dataset creators are responsible for the anonymisation of sensitive data observations (see above).
Due to the sensitive personal nature of many qualitative data observations, scholars should pay particular attention to ethical standards when handling data relating to ethnicity, health, orientation, religion, biometrics, etc. Persons, families and households cannot be identifiable in any dataset. The linking of variables on ethnicity, etc. to individuals, families or households is governed by data protection legislation and academic best practice.
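One simple screening step for identifiability is to count how many records share each combination of indirectly identifying variables, and to flag combinations shared by very few records. The Python sketch below shows such a count (a basic k-anonymity style check). It is not a method prescribed by this guide; the function name, the variable names and the threshold are hypothetical, and passing the check does not by itself establish that a dataset is anonymous.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def small_groups(
    rows: List[Dict[str, str]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> Dict[Tuple[str, ...], int]:
    """Return value combinations of the quasi-identifiers shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical observations: two respondents share a profile, one is unique.
rows = [
    {"region": "North", "age_band": "30-39"},
    {"region": "North", "age_band": "30-39"},
    {"region": "South", "age_band": "60-69"},
]

# With k=2, the single "South", "60-69" respondent is flagged.
print(small_groups(rows, ["region", "age_band"], k=2))
# prints {('South', '60-69'): 1}
```

Flagged combinations can then be handled by coarsening categories (for example, wider age bands) or by suppressing the observations concerned before the dataset is shared.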
Support, software and infrastructure for qualitative data
Support for qualitative data use and elaboration is provided by the EUI Library. Software support is provided by the EUI ICT Service. ArcGIS and ATLAS.ti can be used for analysis, mapping and visualisation of qualitative non-numerical data such as audio, graphics, text and video. Coding Analysis Toolkit (CAT) can be used for content and discourse analysis. Tools for data backup (SyncToy), file zipping (7-Zip), data encryption (TrueCrypt) and image adjustment (Resizer) are also available. Many of the tools used for the analysis of quantitative data (e.g. Gauss, Julia, MATLAB, Python, R, Stata) can also be used for qualitative data analysis, where aggregate statistical observations are required. The analysis of restricted personal data may require scholars to work in a ‘safe-room’ environment. If this is stipulated by a data provider or project funder, write to [email protected].
Research data management and data management plans for qualitative data
Although research data management for qualitative data is similar to research data management for quantitative data, there are some additional considerations. Research data management encompasses the control of data inputs, the handling and use of data, the protection of data, the creation of data outputs and the preservation of data. Research data management is carried out by scholars throughout the duration of a research project (Section 3 above). During data analysis, qualitative data materials should be carefully handled and secured, either in a locked storage unit or in a locked room. This is particularly important for confidential, unique and archival material being collated or prepared for digitisation.
Data management plans (DMPs) are normally required by science funding agencies. Due to the heterogeneous, multi-media and complex nature of qualitative data in the humanities and social sciences, it is particularly important for scholars to keep a record of data sources, including notebooks, questionnaires, codebooks and multilingual thesauri. Supporting documentation serves as the basis for accurate metadata, and facilitates future retrieval and reuse. In the case of non-repeatable, time-sensitive, socio-political research, data management plans may require a detailed explanation of the qualitative research methods used.
Metadata for qualitative data
Metadata are data about data, presented in a systematic schema. Accurate metadata are necessary for the organisation, use, repositing, sharing and machine-discovery of datasets (see above). The metadata fields used for quantitative data outputs can also be used for qualitative data outputs. However, there are additional considerations. As well as the name(s) of scholars and technical collaborators who generate a dataset, it may be necessary to include the authors/creators of subsidiary qualitative data. The dates of creation of subsidiary works included in any new qualitative dataset should be clearly indicated. Linguistic, national and regional metadata should be provided where relevant (e.g. multilingual surveys). The format and version of software used to elaborate the data should also be indicated.
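The additional metadata considerations for qualitative data can be sketched as a simple record with a completeness check. The Python example below is illustrative only: the class and field names (QualitativeMetadata, SubsidiaryWork, missing_fields) are hypothetical and do not reproduce the metadata schema used by the EUI ResData repository or by any metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubsidiaryWork:
    """A pre-existing work (e.g. an interview or image) included in the dataset."""
    title: str
    creator: str
    date_created: str  # free text or ISO 8601, e.g. "1998-05"


@dataclass
class QualitativeMetadata:
    title: str
    creators: List[str]
    subsidiary_works: List[SubsidiaryWork] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    regions: List[str] = field(default_factory=list)
    software: str = ""
    software_version: str = ""


def missing_fields(record: QualitativeMetadata) -> List[str]:
    """List the elements that still need to be documented before deposit."""
    missing = []
    if not record.title.strip():
        missing.append("title")
    if not record.creators:
        missing.append("creators")
    if not record.software.strip() or not record.software_version.strip():
        missing.append("software and version")
    for i, work in enumerate(record.subsidiary_works):
        if not work.creator.strip():
            missing.append(f"subsidiary_works[{i}].creator")
        if not work.date_created.strip():
            missing.append(f"subsidiary_works[{i}].date_created")
    return missing


# Hypothetical record: the interview date and the software details are not yet documented.
record = QualitativeMetadata(
    title="Interview transcripts (example)",
    creators=["A. Scholar"],
    subsidiary_works=[SubsidiaryWork("Interview 1", "B. Interviewer", "")],
    languages=["it", "en"],
)
print(missing_fields(record))
```

Language and region fields are left optional in this sketch because the text asks for them only where relevant.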
Data preservation, repositing and open qualitative data
Qualitative digital data outputs in the humanities and social sciences can be reposited in the EUI ResData repository, in a subject repository or in a multi-disciplinary repository. By carefully noting the metadata elements explained above, scholars will have a ready checklist for determining whether, when, how, where and under what terms research data outputs can be shared as open data. Qualitative datasets that are made available as open data should be the product of original research and should adhere to best practice regarding data protection and database copyright.
9. EUI INFRASTRUCTURE, SOFTWARE AND SUPPORT
Infrastructure, software and connectivity are provided by the EUI ICT Service. Research software programmes are listed on the EUI ICT Service web site. Technical support is provided at the site offices of the ICT Service. The EUI Library maintains the Data Portal, the Micro Data Restricted Server and the ResData repository. Advice on the use of statistical software is provided by EUI research software tutors. The Data and Methods Clinic organises workshops and data consultations for EUI members.
The EUI ICT Service provides research data software, including Fortran, Gauss, MATLAB, OxMetrics, Python, R, Stata, Stat/Transfer, WinEdt and WinRATS, and supports the high-performance computing (HPC) cluster at the EUI. A guide to the HPC cluster is available by writing to [email protected] (EUI members only).
The EUI Library provides assistance with data discovery, data access and use of licensed digital databases, and open data. The Library also helps EUI members prepare data management plans. The workflow for the submission, preparation and metadata enrichment of EUI data outputs is also managed by the Library.
A directory of online research data software manuals, with full-text links, is available on the Library web site. Data software manuals are available in the Badia Library and the Economics Departmental Library (shelfmarks 001 to 006). The Library holds a substantial collection of works on statistical science and data methodology (shelfmarks 500 to 519). Books and manuals in any language may be suggested for acquisition by the Library.
Data support is provided at the Badia Library (office 085) on weekday mornings and on Tuesday and Thursday afternoons. Data support in the Economics Department is provided on Monday, Wednesday and Friday afternoons from 15:00 to 19:00 (Departmental Library, Villa La Fonte).
Every Friday during term, the Library issues an e-Bulletin with updates on new data releases, information on how to use Library and internet data resources, and developments in statistical science. EUI members can sign up for the weekly e-Bulletin with an @eui.eu account. Send a message with ‘subscribe’ in the subject line to [email protected]. Data news is also disseminated via the EUI Library Blog and Twitter.
10. INTERNATIONAL RESEARCH DATA GUIDELINES
Information on data use and management, data protection, data preservation and open data is available from these sources:
The 6th edition EUI Library Research Data Guide was published on 1 March 2018.
Contact: Thomas Bourke at [email protected]