Multi-Layer Network Representation of the NTC Environment
Lili Sun, Proof School
Arijit Das, Computer Science


Introduction
The United States Army's National Training Center (NTC), based at Fort Irwin, is a training facility that simulates realistic battlefield environments. These simulations generate a large amount of data, which this project analyzes through network science. A multilayer network is created from the database and analyzed using centrality measures and other techniques to find features such as influential nodes and communities. As the data grows, the analysis must scale up, since the computing power of a laptop is limited. After cleaning and initial processing, the data will be analyzed in more depth through R programs running on Hadoop clusters, allowing larger data sets to be analyzed and processed more quickly.

Results
Preliminary results come from appending different layers together; in total there are around 40 layers. Gephi is used for graph visualization.

Approach
1. Extract data from the large data set.
2. Use the data to create a complex multi-layer network.
3. Analyze the different layers using centrality measures, modularity, etc.

The data set was given as approximately 40 Excel files. The columns held attributes such as date of birth, name, town, primary occupation, and secondary occupation; the last column held a very long biography of the person. First, all the Excel files were converted to CSV files for easier processing. Then all the files were merged and duplicates were deleted, all with Python programs. Every column except the last was simple, each containing a word or short phrase; the last column of each file was a biography, a huge chunk of text. The hardest part of the data mining was extracting useful information from these biographies.
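The merge-and-deduplicate step might look like the following sketch. The file contents and columns here are illustrative stand-ins (the real project read roughly 40 Excel files, which a tool such as pandas.read_excel could convert to CSV first):

```python
import csv
import io

def merge_csv_rows(csv_texts):
    """Merge several CSV files (given here as strings), keeping one
    header row and dropping exact duplicate data rows."""
    header = None
    seen = set()
    merged = []
    for text in csv_texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue
        if header is None:
            header = rows[0]
            merged.append(header)
        for row in rows[1:]:
            key = tuple(row)       # exact-match duplicate detection
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged

# Two toy "files" with one overlapping person.
file_a = "name,town,occupation\nAlice,Barstow,medic\nBob,Irwin,driver\n"
file_b = "name,town,occupation\nBob,Irwin,driver\nCara,Barstow,cook\n"

merged = merge_csv_rows([file_a, file_b])
# merged holds the header plus the three unique rows
```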
The biographies were not scanned manually: there were approximately 3,000 people and each biography was very long, so a Python program extracted the data instead, using key words and Python's regular expression module. For each person, the program returned a dictionary whose keys were attributes found in the biography and whose values were typically True or False. After extraction, a new CSV file was created. A separate CSV file was also created as an edge list of direct connections between people, as opposed to shared interests.

After these CSV files were made, the different layers of the network were generated and then appended. The layer for each attribute is also generated with a dictionary: looping through the rows and columns of the CSV, an edge is added between two people whenever they share the attribute. Since not all people in the data have ID numbers, the nodes are simply numbered from 1 to the number of people.

[Figure: size distribution of the modularity classes of one layer generated from the final CSV file.]

[Figure: part of one layer generated from the final CSV file.] This layer is a union of complete graphs. Because each layer represents a single attribute, everyone who shares a given value of that attribute is connected to everyone else with that value, so each possible value yields a complete graph.

Future Research Plans
Analysis of the graphs is ongoing and will continue by examining different sets of layers together, as well as the layer of direct connections.

Acknowledgements
Thank you to Mr. Das, Dr. Gera, and LTC Roginski for their help and guidance.

The function update_dictionary(row, i) is part of the program that deletes duplicates from the data set. Deduplication uses a dictionary whose key is essentially the last 15 columns concatenated; the value is a tuple containing the row and its number.
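A minimal sketch of this deduplication scheme, under the stated design (key = last 15 columns concatenated, value = the row and its index, collisions resolved in favor of the row carrying more data). The column layout and the count-non-empty heuristic are illustrative assumptions:

```python
def update_dictionary(dictionary, row, i, key_width=15):
    """Record `row` (with index `i`) under the key formed from its last
    `key_width` columns; on a collision, keep whichever row has more
    non-empty fields, i.e. more data."""
    key = "".join(str(c) for c in row[-key_width:])
    filled = sum(1 for c in row if str(c).strip())
    if key in dictionary:
        old_row, _ = dictionary[key]
        old_filled = sum(1 for c in old_row if str(c).strip())
        if filled > old_filled:          # new duplicate is fuller: replace
            dictionary[key] = (row, i)
    else:
        dictionary[key] = (row, i)
    return dictionary

# Toy rows: same trailing columns, but the second copy also has a
# date-of-birth filled in, so it should win.
rows = [
    ["",     "Alice", "a", "b", "c"],
    ["1990", "Alice", "a", "b", "c"],
]
d = {}
for i, row in enumerate(rows):
    update_dictionary(d, row, i, key_width=3)
# d now holds a single entry: the fuller second row
```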
To update the dictionary, the program compares how much data each duplicate row carries and replaces the dictionary value with the row that has more data.

Lili Sun, lili.sun@gmail.com
SEAP (Science and Engineering Apprenticeship Program) at the Naval Postgraduate School
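The keyword-based biography extraction and per-attribute layer construction described earlier can be sketched as follows. The attribute patterns, biography texts, and node numbering are illustrative only; the real program used a much larger set of key words:

```python
import re
from itertools import combinations

# Illustrative keyword patterns; the actual project used many more.
ATTRIBUTE_PATTERNS = {
    "farmer":  re.compile(r"\bfarm(er|ing)?\b", re.IGNORECASE),
    "veteran": re.compile(r"\b(army|veteran|soldier)\b", re.IGNORECASE),
}

def extract_attributes(biography):
    """Return {attribute: True/False} for one biography text."""
    return {attr: bool(pat.search(biography))
            for attr, pat in ATTRIBUTE_PATTERNS.items()}

def build_layer(people, attribute):
    """One network layer: an edge between every pair of people who
    share the attribute, so each shared value forms a complete graph."""
    members = [node for node, attrs in people.items() if attrs[attribute]]
    return list(combinations(members, 2))

# Nodes are numbered 1..n, since not everyone in the data has an ID.
bios = {
    1: "Grew up farming outside Barstow.",
    2: "Retired army veteran, later a farmer.",
    3: "Runs a shop in town.",
}
people = {node: extract_attributes(text) for node, text in bios.items()}
farmer_layer = build_layer(people, "farmer")   # person 1 and person 2 share it
```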