Best Practices for the Documentation of Data Collection

This page contains best practices for the Documentation of Data Collection, including instructions and examples.

What is a Documentation of Data Collection?

Discuss Data understands itself as part of initiatives in the social sciences and humanities which aim to make the collection or production and analysis of research data transparent in order to allow for a critical assessment of the validity of the conclusions which have been drawn from them. Moreover, Discuss Data aims to offer the academic community the chance to benefit from Data Collection efforts made by others. Secondary data analysis can provide huge benefits to researchers:

  • It can save costs, which is especially important for early-stage researchers who lack funding
  • It can add further evidence to original data, supporting or challenging one’s own results
  • It can allow for comparisons across cases by providing data on additional cases
  • It can offer access to unique information, if, for instance, historical data can no longer be found or generated

However, secondary data analysis crucially relies on precise documentation to ensure that the data is not misinterpreted. Therefore, all Data Collections submitted to Discuss Data should be accompanied by such documentation: the Documentation of Data Collection.

The Documentation of Data Collection explains the broader context of the Data Collection, e.g. how, when and where the data was compiled and what the data means. It may also address ethical issues, empirical challenges, and epistemological approaches.

What should be included in the Documentation of Data Collection?

Discuss Data does not offer strict guidelines for the Documentation of Data Collection. As an interdisciplinary forum covering the whole range of quantitative and qualitative methods in the social sciences and the humanities, Discuss Data is open to different ways of documenting data collections. However, Discuss Data aims to develop some minimum requirements for the Documentation of Data Collection in order to allow for a sensible and meaningful discussion and to enable secondary data analysis. In compiling the Documentation of Data Collection, Discuss Data suggests following the FAIR Data principles, which are aimed at making data Findable, Accessible, Interoperable, and Reusable.

In general, Discuss Data expects a Documentation of Data Collection to indicate (in at least a brief description) the methodology of the Data Collection:

  • Title of the Data Collection
  • Creator / Author of the Data Collection
  • Guiding research interest and paradigm (i.e., epistemological approach)
  • The sources of the data
  • Date of the creation of the Data Collection
  • Time period covered by the Data Collection
  • Data Collection Methods: Sampling, Data Collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage
  • Detailed description of the Data Collection
  • Context of the data: Project history, aim, objectives and hypotheses
  • Handling of empirical challenges (as encountered during data collection), validity of the collected data, reliability of sources which provided the data, cleaning and quality assurance procedures carried out
  • Handling of ethical issues (as appropriate)
  • Changes made to data over time since their original creation and identification of different versions of data files
  • Information on access and use conditions or data confidentiality
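
As a purely illustrative aid, the checklist above can be turned into a small self-check that authors run on their own notes before submission. The sketch below is written in Python and rests on assumptions: Discuss Data does not prescribe a machine-readable schema, so the field names, the function name and the sample draft are all hypothetical.

```python
# Hypothetical field names derived from the checklist above.
# Discuss Data does not define a machine-readable schema; this is only a sketch.
EXPECTED_FIELDS = [
    "title",
    "creator",
    "research_interest_and_paradigm",
    "data_sources",
    "creation_date",
    "time_period_covered",
    "collection_methods",
    "description",
    "context",
    "empirical_challenges",
    "changes_and_versions",
    "access_and_use_conditions",
]

# "Handling of ethical issues" is listed "as appropriate", so it is not checked.
OPTIONAL_FIELDS = ["ethical_issues"]


def missing_fields(doc):
    """Return the expected checklist items that are absent or empty in doc."""
    missing = []
    for field in EXPECTED_FIELDS:
        value = doc.get(field)
        # Treat None, empty strings and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Invented sample draft, for illustration only.
draft = {
    "title": "Elite interviews on energy policy",
    "creator": "Jane Doe",
    "data_sources": "Semi-structured interviews",
    "creation_date": "2019-05-01",
    "description": "   ",  # whitespace only, so it counts as missing
}

for field in missing_fields(draft):
    print("Still to document:", field)
```

The check only reports gaps; whether a brief description is sufficient for a given item remains a judgement for the author and the responsible curator.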

See our checklist for a short overview of what should be included in the Documentation of Data Collection.

The Documentation of Data Collection is submitted together with the Data itself and the Metadata and, like the Metadata, is always accessible, even if access to the Data itself is restricted.

Examples

The form of the Documentation of Data Collection crucially depends on the methods used. While a Documentation of Data Collection concerning statistical data should allow the replication of the respective analysis, a replication study of qualitative elite interviews is not feasible, because the passage of time affects the result. If the interview is repeated after some time, the same interview partner may not be able or willing to repeat the earlier views, especially if events related to the interview topic have unfolded in unexpected ways.

Best practices also depend on the epistemological approach of the responsible researcher. The neo-positivist tradition usually documents qualitative elite interviews with a list indicating the name and position of the interview partner as well as the time and place of the interview. The questionnaire or interview guide is added, and interview recordings or transcripts are archived. In the interpretive approach, however, the focus is on contextuality, and evidence is seen as co-generated in interaction with the research participants. Accordingly, the Documentation of Data Collection takes the form of field notes describing the setting of an interview, important events prior to the interview which may have affected the persons taking part in the interview process (including the researcher), and any other details deemed relevant by the researcher.

In order to support the quality and standardisation of Documentations of Data Collection, Discuss Data aims to offer, for as many methods of data collection as possible, a reference either to an actual Documentation of Data Collection that fulfils the requirements of Discuss Data or to a brief discussion, by an experienced researcher, of the requirements and challenges of such documentation. Below you will find examples which may help you draft your own Documentation of Data Collection.

In case you need further help or information, please contact the responsible curator or the Discuss Data team (info@discuss-data.net).

Case studies

The collection of materials of all sorts related to case studies or process tracing as part of a neo-positivist research design should be documented following the rules for active citations, as laid out by Andrew Moravcsik; see

Andrew Moravcsik (2010): Active Citation: A Precondition for Replicable Qualitative Research, in: PS January 2010, pp. 29-35, doi:10.1017/S1049096509990783, available online at https://www.princeton.edu/~amoravcs/library/ps.pdf

Andrew Moravcsik (2014): Transparency: The Revolution in Qualitative Research, in: PS January 2014, pp. 48-53, doi:10.1017/S1049096513001789, available online at https://www.princeton.edu/~amoravcs/library/transparency.pdf


Concerning the collection of materials for case studies based on a social anthropological, ethnographic or interpretivist approach, Discuss Data recommends the transparency standards presented by

Katherine Cramer (2015): Transparent explanations, yes. Public transcripts and fieldnotes, no: Ethnographic research on public opinion, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893069#.W8X0yPaYTGg

Content analysis of mass media reporting

The following documentation of the creation and content analysis of a text corpus of media reporting can serve as a sample which has to be adjusted to individual projects:

Andreas Heinrich, Heiko Pleines et al. (2014): Analysis of mass media reports on export pipelines (Azerbaijan, Kazakhstan, Turkmenistan). Part I. Documentation of the creation of the text corpus, Part II. Full codebook; available online at: https://www.forschungsstelle.uni-bremen.de/UserFiles/file/Pipelines-Caspian_media-list+codebook.pdf

For quantitative computer-assisted text analysis the following instructions are also relevant:

David Romney, Brandon Stewart, Dustin Tingley (2015): Plain text? Transparency in computer-assisted text analysis, in: Qualitative & Multi-Method Research, 13 (1), pp. 32-38; available online at: https://zenodo.org/record/893085#.W8X5o_aYTGg


Concerning the documentation of text analysis following the hermeneutic or interpretivist approach, Discuss Data accepts the transparency standards presented by

Andrew Davidson (2015): Hermeneutics and the question of transparency, in: Qualitative & Multi-Method Research, 13 (1), pp. 43-47; available online at: https://zenodo.org/record/893073#.W8X6f_aYTGg

Content analysis of official documents

For a documentation of the creation of a text corpus of official (i.e. published) documents and analyses (concerning primary as well as secondary sources) all documents or publications should be presented following the rules for active citations:

Andrew Moravcsik (2010): Active Citation: A Precondition for Replicable Qualitative Research, in: PS January 2010, pp. 29-35, doi:10.1017/S1049096509990783, available online at https://www.princeton.edu/~amoravcs/library/ps.pdf

Andrew Moravcsik (2014): Transparency: The Revolution in Qualitative Research, in: PS January 2014, pp. 48-53, doi:10.1017/S1049096513001789, available online at https://www.princeton.edu/~amoravcs/library/transparency.pdf

The content analysis should be documented with the help of a codebook following the example given for content analysis of mass media.

For quantitative computer-assisted text analysis the following instructions are also relevant:

David Romney, Brandon Stewart, Dustin Tingley (2015): Plain text? Transparency in computer-assisted text analysis, in: Qualitative & Multi-Method Research, 13 (1), pp. 32-38; available online at: https://zenodo.org/record/893085#.W8X5o_aYTGg


Concerning the documentation of text analysis following the hermeneutic or interpretivist approach, Discuss Data accepts the transparency standards presented by

Andrew Davidson (2015): Hermeneutics and the question of transparency, in: Qualitative & Multi-Method Research, 13 (1), pp. 43-47; available online at: https://zenodo.org/record/893073#.W8X6f_aYTGg

Content analysis of social media

For quantitative computer-assisted text analysis the following instructions are relevant:

David Romney, Brandon Stewart, Dustin Tingley (2015): Plain text? Transparency in computer-assisted text analysis, in: Qualitative & Multi-Method Research, 13 (1), pp. 32-38; available online at: https://zenodo.org/record/893085#.W8X5o_aYTGg.

We are grateful for any suggestions of samples or instructions concerning the documentation of corpus creation and content analysis of social media posts that could be added here as examples.

Interviews

The documentation of expert or elite interviews conducted in a neo-positivist research design should follow the rules for documentation laid out by

Erik Bleich, Robert J. Pekkanen (2015). Data Access, Research Transparency, and Interviews: The Interview Methods Appendix, in: Qualitative & Multi-Method Research, 13 (1), pp. 8-13; available online at: https://zenodo.org/record/892386#.W8XsavaYTGg

If the submitted interview transcripts have not been completely and irreversibly anonymized, scanned consent forms signed by the respondents have to be submitted to Discuss Data during the upload of the Data Collection. For more information, please see our FAQs on informed consent.
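
The consent-form rule above can be expressed as a simple pre-upload check. The following Python sketch is an assumption-laden illustration, not part of the Discuss Data upload process: the function name, the respondent identifiers and the file names are hypothetical.

```python
def missing_consent_forms(respondents, consent_forms, irreversibly_anonymized):
    """Return the respondents for whom a scanned consent form is still missing.

    If the transcripts are completely and irreversibly anonymized,
    no consent forms are needed and the result is an empty list.
    """
    if irreversibly_anonymized:
        return []
    return [r for r in respondents if r not in consent_forms]


# Hypothetical respondent identifiers and scan file names.
respondents = ["R01", "R02", "R03"]
consent_forms = {"R01": "consent_R01.pdf", "R03": "consent_R03.pdf"}

print(missing_consent_forms(respondents, consent_forms, irreversibly_anonymized=False))
```

With the sample values above, the check reports that the form for respondent R02 is still missing; with irreversibly_anonymized=True it reports nothing.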


Concerning the documentation of interviews conducted in an interpretivist way, Discuss Data accepts the transparency standards presented by

Katherine Cramer (2015): Transparent explanations, yes. Public transcripts and fieldnotes, no: Ethnographic research on public opinion, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893069#.W8X0yPaYTGg

Process tracing

The documentation of data collection for process tracing follows the logic described above for case studies.

Additionally, for a critical discussion of research transparency in the case of process tracing you can consult:

Tasha Fairfield (2015): Reflections on analytic transparency in process tracing research, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893075#.W8bnJfaYTGg

Protest-event databases

The following documentation of the compilation of a protest-event database can serve as a sample which has to be adjusted to individual projects:

Beissinger, Mark R. (2003): Codebook for Disaggregated Event Data. “Mass Demonstrations and Mass Violent Events in the Former USSR, 1987-1992 [Event databases used for the analysis in Nationalist Mobilization and the Collapse of the Soviet State]”, available online at: https://scholar.princeton.edu/mbeissinger/publications/mass-demonstrations-and-mass-violent-events-former-ussr-1987-1992-these, copy at http://www.tinyurl.com/yafecbne

Participant observation

Concerning the documentation of participant observation in social anthropological or ethnographic research, Discuss Data accepts the transparency standards presented by

Katherine Cramer (2015): Transparent explanations, yes. Public transcripts and fieldnotes, no: Ethnographic research on public opinion, in: Qualitative & Multi-Method Research, 13 (1), pp. 17-20; available online at: https://zenodo.org/record/893069#.W8X0yPaYTGg

Representative opinion polls

The documentation of representative opinion polls should follow the guidelines of one of the respected international polling organisations. Discuss Data recommends the following example of guidelines for the documentation of representative opinion polls:

British Polling Council: Statement of Disclosure, available online at http://www.britishpollingcouncil.org/statement-of-disclosure/

Official statistics

If official statistics are used, it is important to give a complete reference to the data source used, including the date of collection (or, if relevant, a record of later revisions to the data). Additionally, the validity of the statistical data should be discussed, including an explicit reference to the definitions and collection methods used, highlighting any related inconsistencies over time, between units of analysis or between sources.

When discussing the collection and validity of statistical data, please consider the experiences described by

Francesca Refsum Jensenius (2014): The Fieldwork of Quantitative Data Collection, in: PS April 2014, pp. 402-404, doi:10.1017/S1049096514000298, available online at www.francesca.no/wp-content/2014/04/PS_Jensenius.pdf

Focus groups

We are grateful for any suggestions related to samples or instructions concerning the documentation of focus group discussions.

Network analysis

We are grateful for any suggestions related to samples or instructions concerning the documentation of network analysis.

Your method is missing?

Please write to us at info@discuss-data.net