
This work was supported by the TRUST Center (NSF award number CCF-0424422).

Introduction
Recent advances in technology have increased the quantity of information available in the public domain, which raises concerns about individuals' right to privacy. Our team is interested in understanding the public's concerns about information privacy in general. To study this issue, we looked for publicly available data. After exploring several sources, we chose Yahoo! Answers as an initial source of privacy complaint data because it offered a useful, free API and a large amount of publicly available data. Relying only on public data avoided any violation of personal privacy that could otherwise arise. To collect this data, we wrote a Python script that serves as a command-line tool: it queries Yahoo! Answers for specified keywords and stores selected attributes of the matching questions in a MySQL database. My focus on the team was adding command-line flags, including additional parameters in the Yahoo! Answers URL, and creating a cronjob to run the script automatically.

Methods
The flowchart below illustrates the design of the overall script. The process consists of connecting to Yahoo! Answers, querying it for a specified keyword, and storing the results in a database. My contributions are highlighted in purple.

[Figure: Process Overview]

Script Refinement
The following refinements increased the script's flexibility, its autonomy, and the quantity of data collected:
- Command-line flags
- URL parameters: start, sort
- Cronjob
- While loop (illustrated below)

[Figure: While Loop Flowchart]

The flowchart above illustrates the while loop refinement: Yahoo! Answers is queried repeatedly, and the 'start' parameter is incremented until Yahoo! returns an error message.

Results
After the script had run automatically every two hours for three days, more than seven thousand questions had been added to the database.
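As a minimal sketch (not the team's actual script), the collection loop described above might look like the Python below. Several pieces are assumptions. The endpoint URL and the parameter names appid, query, results, and output are modeled on the retired Yahoo! Answers V1 questionSearch service and are illustrative only. The flag names, the question attributes (id, subject, content, date), and the step size for 'start' are hypothetical. SQLite from Python's standard library stands in for MySQL so the example runs without a database server. HTTP access is abstracted as a fetch function that returns a list of questions, or None when the service replies with an error.

```python
import argparse
import sqlite3
from typing import Callable, Iterable, Iterator, List, Optional
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names modeled on the retired Yahoo! Answers
# V1 questionSearch service; illustrative only, the service no longer exists.
BASE_URL = "http://answers.yahooapis.com/AnswersService/V1/questionSearch"

# A fetcher takes a URL and returns a list of question dicts, or None on error.
Fetcher = Callable[[str], Optional[List[dict]]]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line flags (all flag names here are hypothetical)."""
    parser = argparse.ArgumentParser(description="Collect Yahoo! Answers questions")
    parser.add_argument("--keyword", required=True, help="search keyword")
    parser.add_argument("--sort", default="date_desc", help="value of the 'sort' URL parameter")
    parser.add_argument("--page-size", type=int, default=50, help="results per request")
    parser.add_argument("--db", default="questions.db", help="SQLite file (stand-in for MySQL)")
    return parser.parse_args(argv)


def build_url(keyword: str, start: int, sort: str, page_size: int,
              appid: str = "YOUR_APP_ID") -> str:
    """Build a query URL that carries the 'start' and 'sort' parameters."""
    params = {"appid": appid, "query": keyword, "start": start,
              "results": page_size, "sort": sort, "output": "json"}
    return BASE_URL + "?" + urlencode(params)


def collect(fetch: Fetcher, keyword: str, sort: str, page_size: int) -> Iterator[dict]:
    """Query repeatedly, advancing 'start' until the service returns an error."""
    start = 0
    while True:
        questions = fetch(build_url(keyword, start, sort, page_size))
        if not questions:  # None models an error reply; an empty page also stops the loop
            break
        yield from questions
        start += page_size  # assumption: 'start' is a result offset


def store(conn: sqlite3.Connection, questions: Iterable[dict]) -> int:
    """Save selected attributes, skipping questions already stored; return the row total."""
    conn.execute("CREATE TABLE IF NOT EXISTS questions ("
                 "id TEXT PRIMARY KEY, subject TEXT, content TEXT, date TEXT)")
    conn.executemany(
        "INSERT OR IGNORE INTO questions (id, subject, content, date) VALUES (?, ?, ?, ?)",
        ((q["id"], q.get("subject", ""), q.get("content", ""), q.get("date", ""))
         for q in questions))
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]


def run(argv: List[str], fetch: Fetcher) -> int:
    """Entry point: parse flags, page through results, and store them."""
    args = parse_args(argv)
    conn = sqlite3.connect(args.db)
    try:
        return store(conn, collect(fetch, args.keyword, args.sort, args.page_size))
    finally:
        conn.close()
```

Because a cronjob reruns the same query every two hours, this sketch makes id a PRIMARY KEY and uses INSERT OR IGNORE so that repeated questions are not stored twice. A crontab entry such as 0 */2 * * * python3 /path/to/collect.py --keyword privacy would run a command-line wrapper around run() every two hours. That wrapper, which must supply a real HTTP fetcher, and the script path are assumptions.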
[Figures: Quantitative Analysis; Visualization Analysis]

Conclusions and Next Steps
Both types of analysis reveal useful patterns in the collected data. They show which keywords are most effective at retrieving large numbers of questions from Yahoo! Answers. In addition, the more qualitative Many Eyes visualization shows not only the most common words in the questions but also how the search keyword relates to the other words in the analyzed text. The next steps for this research include additional natural language processing and further visualizations like those available on the Many Eyes website. This research also contributes to the preliminary data collection stage of a larger project at the UC Berkeley School of Information, whose ultimate goal is to produce a taxonomy of privacy terms.

Acknowledgments
I would like to thank the team with which I built the command-line tool discussed in this research: Christopher Castillo, German Gomez, Rafael Negron, and Anand Sonkar. I would also like to thank my graduate student mentors, Nick Doty, MS, and Jen King, and my faculty mentor, Professor Deirdre Mulligan. Finally, I would like to thank Dr. Kristen Gates, TRUST (The Team for Research in Ubiquitous Secure Technology), the NSF, and UC Berkeley for the opportunity to conduct this research.

Investigating Privacy Complaints
Jennifer Felder (1), Jennifer King (2), Nick Doty (2), Prof. Deirdre Mulligan (2)
(1) North Carolina State University; (2) University of California, Berkeley, School of Information
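The Many Eyes visualizations were produced on that website, so its algorithms are not reproduced here. As a rough, hypothetical stand-in, the two views described in the conclusions (the most common words in the questions, and how the search keyword relates to neighboring words) can be approximated in Python. The sketch counts word frequencies, and it counts the words that immediately follow the keyword, which corresponds to the first branch of a word tree. The stopword list and the function names are illustrative assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small illustrative stopword list (assumption; real analyses use larger lists).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "or", "my", "i", "at",
             "it", "in", "on", "for", "can", "do", "how", "what", "about", "be"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Most common non-stopword words across all question texts (a tag-cloud view)."""
    counts = Counter(w for t in texts for w in tokenize(t) if w not in STOPWORDS)
    return counts.most_common(n)


def following_words(texts: Iterable[str], keyword: str, n: int = 10) -> List[Tuple[str, int]]:
    """Words that immediately follow the keyword (the first branch of a word tree)."""
    keyword = keyword.lower()
    counts: Counter = Counter()
    for t in texts:
        tokens = tokenize(t)
        for i in range(len(tokens) - 1):  # examine each adjacent pair of tokens
            if tokens[i] == keyword:
                counts[tokens[i + 1]] += 1
    return counts.most_common(n)
```

Counting only the immediate successor keeps the sketch short; a fuller word tree would extend each branch by following longer phrases after the keyword.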

