PROJECT: PRIVACY IN A DEMOGRAPHIC DATABASE PROJECT PLAN Online access to statistical demographic information has many benefits, as it allows the study of various social issues. The US Census Bureau, for example, publishes such information for use by researchers and laymen alike. Because the underlying individual records are often sensitive, care must be taken to prevent attackers from using the published statistics to infer information about individuals. Attacks aimed at obtaining such personal information have already been attempted in the past.
US CENSUS BUREAU HACKED BY R00TW0RM AND INJ3CT0R "The official website of the US Census Bureau (census.gov), the government organization that gathers demographic and economic data, was hacked by members of r00tw0rm and inj3ct0r, the hackers obtaining what they call a secret zip file. A Pastebin file reveals a sample of the data they obtained from the agency's servers, including table names and columns...."
MAIN POINT The main point of this project is to analyze how the information a statistics bureau *intentionally* publishes can create privacy violations through statistical and logical inference. A successful attack would give a strong indication that the measures taken to protect the underlying private data were insufficient, and will hopefully contribute to an improvement in the way such data is handled.
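One simple form of logical inference on published statistics is a differencing attack: subtracting two published counts that differ by a very small group. The sketch below is only an illustration of that idea. The table cells, group labels, and numbers are made up for the example and are not taken from any real database.

```python
# Hypothetical published counts (made-up numbers, not real survey data).
# Each key is (population filter, answer to a sensitive question).
published = {
    ("all respondents", "total"): 120,
    ("all respondents", "yes"): 45,
    ("respondents under 90", "total"): 119,
    ("respondents under 90", "yes"): 44,
}


def difference(table, broad, narrow):
    """Subtract the narrow group's counts from the broad group's counts.

    Returns (size of the leftover group, number of "yes" answers in it).
    """
    size = table[(broad, "total")] - table[(narrow, "total")]
    yes = table[(broad, "yes")] - table[(narrow, "yes")]
    return size, yes


group_size, yes_count = difference(
    published, "all respondents", "respondents under 90"
)

# If exactly one person is left over, their answer is fully determined.
if group_size == 1:
    answer = "yes" if yes_count == 1 else "no"
    print("The single respondent aged 90 or over answered:", answer)
```

Neither published table reveals an individual on its own; the leak appears only when the two are combined.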
MILESTONE #1: Understand the specific techniques that the Israel Central Bureau of Statistics (CBS) uses for its website, and collect as much information as we can about the system and the software it runs on. We will also begin the privacy analysis of the system, looking for weaknesses that help us determine whether secret information can be deduced. In addition, we will start deciding which tools to use and how to build our script.
MILESTONE #2: By the second milestone we should already have convincing examples and some quantitative analysis. We will use those examples to continue writing and improving the script.
MILESTONE #3: Continue analyzing the data and, if needed, collect more information to support the analysis. In this milestone we will try to answer our biggest question: how does the information the CBS *intentionally* publishes create privacy violations? We want to know whether it is possible to identify the actual people who answered the survey. This is the weakness we are trying to find, so that we can ask the CBS to improve its protection.
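A first, rough way to look for identifiable respondents is to scan a published cross-tabulation for cells with very small counts, since a cell of size one describes a single person. The following is a minimal sketch under that assumption. The table contents, the category names, and the threshold of 3 are hypothetical choices for illustration.

```python
# Hypothetical cross-tabulation: (region, age group) -> number of respondents.
# The values are invented for this example.
crosstab = {
    ("north", "20-39"): 57,
    ("north", "40-64"): 31,
    ("north", "65+"): 1,
    ("south", "20-39"): 64,
    ("south", "40-64"): 2,
    ("south", "65+"): 0,
}


def small_cells(table, threshold=3):
    """Return the non-empty cells whose count is below the threshold, sorted."""
    return sorted(cell for cell, count in table.items() if 0 < count < threshold)


risky = small_cells(crosstab)
for cell in risky:
    print(cell, "has only", crosstab[cell], "respondent(s)")
```

Cells flagged this way are only candidates for re-identification; whether a real person can be found depends on what else is known about them.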
REQUISITE TOOLS, RESOURCES AND KNOWLEDGE We need to improve our scripting skills, learn how to retrieve data from the web, and build a script suited to our goal. We also need to understand quantitative notions of privacy, statistical analysis, parsing HTTP queries, and manipulating databases. Our main resource will be the CBS website and its web-accessible database: the Israel Central Bureau of Statistics Social Survey Table Generator.
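As a starting point for parsing HTTP queries, the sketch below uses Python's standard `urllib.parse` module to read the parameters of a query URL and to build a variant with one parameter changed. The URL, the host `example.org`, and the parameter names `row`, `col`, and `year` are placeholders; the real parameters of the table generator are not known to us yet. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_params(url, **changes):
    """Return a copy of url with the given query parameters replaced or added."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)  # maps each name to a list of values
    for name, value in changes.items():
        params[name] = [str(value)]
    query = urlencode(params, doseq=True)
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )


# Placeholder query URL (hypothetical, not the real table generator).
base = "https://example.org/table?row=age&col=region&year=2010"

print(parse_qs(urlsplit(base).query))
variant = with_params(base, col="income")
print(variant)
```

Generating systematic variants of one query in this way is what would later let a script request many related tables and compare them.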
DISCUSSION OF RISK FACTORS AND CONTINGENCY PLANS Since the CBS is a government body, we assume that its server and site operate in a highly secure environment and may be protected by a firewall or a policy that limits our work. For this reason, we do not expect it to be easy to succeed in our mission.