Published by Adrian Perkins. Modified over 9 years ago.
1
Ethics considerations for Corpus Linguistics studies using internet resources
Ansgar Koene, Svenja Adolphs, Elvira Perez, Christopher J. Carter, Ramona Statache, Claire O’Malley, Tom Rodden, and Derek McAuley
HORIZON Digital Economy Research, University of Nottingham
2
Growing appeal of the Web as source of corpus data
Rising popularity of public and semi-public online communication channels:
  Written form: Blogs (late 1990s); Wikipedia (2001); Facebook (2004); Reddit (2005); Twitter (2006)
  Spoken form: Podcasts (2004); YouTube (2005)
Low effort and cost of data collection
Unobtrusive ‘behind the scenes’ data collection using application programme interfaces (APIs) or web scraping techniques (e.g. Twitter; Blogs)
3
BAAL Recommendations on Good Practice in Applied Linguistics (2006)
Section 2.9 ‘Internet research’:
  In the case of an open-access site, where contributions are publicly archived, and informants might reasonably be expected to regard their contributions as public, individual consent may not be required. In other cases it normally would be required.
OK, easy: unless a site blocks access (e.g. password required), no consent is needed from the observed public.
… Or is it?
4
BAAL Recommendations on Good Practice in Applied Linguistics (2006)
Online data collection is often undetectable to the public (i.e. covert) unless they are explicitly informed about it.
Section 2.5 ‘covert research’:
  Observation in public places is a particularly problematic issue. If observations or recordings are made of the public at large, it is not possible to gain informed consent from everyone. However, post-hoc consent should be negotiated if the researcher is challenged by a member of the public.
Unless explicitly informed about the data collection, the public has no chance to challenge and demand post-hoc consent.
5
BAAL Recommendations on Good Practice in Applied Linguistics (2006)
Section 2.5 ‘covert research’ (concluding part):
  A useful criterion by which to judge the acceptability of research is to anticipate or elicit, post hoc, the reaction of informants when they are told about the precise objectives of the study. If anger or other strong reactions are likely or expressed, then such data collection is inappropriate.
Researchers should, at the end of the data collection period, post a message about the research, offering some form of ‘opt-out’ procedure for any participant who wishes to do so.
6
Public – Private distinction
When is online information ‘public’, so that it can be “freely quoted and analyzed […] without consent”? [Bruckman, 2002]
  It is officially, publicly archived
  No password is required for access
  No site policy prohibits it
  The topic is not highly sensitive
With Google caching, retweeting, ‘Like’ buttons etc., what is the true meaning of “officially, publicly archived”?
Software default settings produce publicly accessible archives without users making a conscious decision (e.g. Blogs)
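The four criteria above can be read as a simple checklist. The sketch below is a hypothetical illustration, not a tool from the presentation: the `SiteProfile` fields and both function names are assumptions, and the researcher would have to fill in the answers by hand. Passing the checklist does not settle the ethical question, since the slide itself questions what “officially, publicly archived” means.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Hypothetical description of an online source, filled in by the researcher."""
    officially_publicly_archived: bool
    password_required: bool
    policy_prohibits_quoting: bool
    topic_highly_sensitive: bool


def unmet_criteria(site: SiteProfile) -> list[str]:
    """Return the criteria (after Bruckman, 2002) that the site fails."""
    problems = []
    if not site.officially_publicly_archived:
        problems.append("not officially, publicly archived")
    if site.password_required:
        problems.append("password required for access")
    if site.policy_prohibits_quoting:
        problems.append("site policy prohibits it")
    if site.topic_highly_sensitive:
        problems.append("topic is highly sensitive")
    return problems


def meets_bruckman_criteria(site: SiteProfile) -> bool:
    """True only when all four criteria are satisfied."""
    return not unmet_criteria(site)
```

Returning the list of unmet criteria, instead of a bare yes or no, keeps the reasons visible for an ethics review.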
7
Public – Private expectation
“… where contributions are publicly archived, and informants might reasonably be expected to regard their contributions as public, individual consent may not be required.” [section 2.9, BAAL Recommendations on Good Practice in Applied Linguistics]
It is not always easy to determine which online spaces people perceive as ‘private’ or ‘public’.
Participants may consider their publicly accessible internet activity to be private despite agreeing to the site User License Agreements.
Communication may have been private when it was first conducted, even if it is now publicly available.
8
AoIR Ethics Working Committee (version 2.0)
Public/Private:
  People may operate in public spaces but maintain strong perceptions or expectations of privacy.
  The substance of their communication may be public, but the context in which it appears implies restrictions on how that information is -- or ought to be -- used.
  Social, academic, or regulatory delineations of public and private as a clearly recognizable binary no longer holds in everyday practice.
9
Factors affecting user behaviour and expectations of privacy
Communication on the Internet has characteristics that are different from communication in other channels (Boyd, 2008):
  Persistence: postings on the Internet are automatically registered and stored.
  Replicability: content in digital form can be duplicated without cost.
  Invisible audiences: we do not know who sees our postings.
  Searchability: content in the networked public sphere is very easily accessible by conducting a search.
People therefore do not have an intuitive sense of the level of privacy that they should expect from internet communication.
10
Responsibility when dealing with online communication
Consent was given when the “Terms and Conditions” of the site were click-signed.
  T&Cs are rarely read
  T&Cs are too vague and incomprehensible to gain true informed consent (Luger, 2013)
  Having other people read your conversation is different from having your past conversations made into a corpus for analysis.
In a climate where ethically questionable social media analytics for commercial and security gain are increasing, academia has a responsibility to enter into the discussion of what constitutes good, ethical conduct.
11
Questionnaire study regarding conditions for consent
Purpose: obtain first-hand data concerning the conditions under which participants would be willing to consent to having their data used for research purposes
Targeted at a wide cross-section of the population
How do conditions for consent change as a function of:
  Participant demographics
  The type of social network platform
  The type of organization doing the study
  The type of question being studied
http://casma.wp.horizon.ac.uk/casma-projects/ccasmd
12
The greater good
‘Respect for the autonomy and dignity of persons’, e.g. privacy, is not the only factor determining ethics of data collection or analysis.
  Scientific value
  Social responsibility
  Maximizing benefits and minimizing harm
Research conducted to prevent socially unacceptable behaviour, e.g. bullying, may require using data from perpetrators without their consent.
13
Conclusions
There is no binary divide between private and public.
Expectations of privacy differ from official site policy.
The ‘public’ nature of a platform does not provide carte blanche for accessing the data hosted on it.
Maximize transparency of research as much as possible.
If opt-in is not possible, opt-out should be offered.
If contacting all subjects is not possible, at least contact some to get a sense of the subjective response to the study and methods.
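One way an opt-out could be honoured in practice is to drop every contribution by an author who withdraws before any analysis is run. This is a minimal sketch, assuming each corpus record is a dictionary with a hypothetical `author_id` key; the record layout and the function name are illustrative and do not come from the presentation.

```python
def apply_opt_outs(corpus, opted_out_authors):
    """Return only the records whose author has not opted out.

    `corpus` is a list of dicts with an assumed "author_id" key;
    `opted_out_authors` is any iterable of author identifiers.
    """
    blocked = set(opted_out_authors)
    return [record for record in corpus if record["author_id"] not in blocked]


# Illustrative data only.
corpus = [
    {"author_id": "u1", "text": "first post"},
    {"author_id": "u2", "text": "a reply"},
    {"author_id": "u1", "text": "another post"},
]
remaining = apply_opt_outs(corpus, ["u1"])
```

Filtering by author and not by individual post means a single opt-out request removes all of that person's contributions.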
14
Thank you for your attention
ansgar.koene@nottingham.ac.uk
Project blog: http://casma.wp.horizon.ac.uk
Consent survey: http://casma.wp.horizon.ac.uk/casma-projects/ccasmd
Twitter: @CaSMaResearch