Published by Fay Sheena Miller. Modified over 6 years ago.
Multi-Layer Network Representation of the NTC Environment

Lili Sun, Proof School
Arijit Das, Computer Science

Introduction

The United States Army’s National Training Center (NTC), based at Fort Irwin, is a training facility that simulates realistic battlefield environments. These simulations generate a large amount of data. This project analyzes that data through network science. A multilayer network is created from the database and analyzed using centrality measures and other techniques to find features such as influential nodes and communities. As the data grows, the analysis must scale up, because the computing power of a laptop is limited. After cleaning and initial processing, the data will be analyzed in more depth with R programs running on Hadoop clusters, which allows larger data sets to be processed more quickly.

Results

Preliminary analysis has produced the following graphs by appending different layers together. In total there are around 40 layers. Gephi is used for graph visualization.

Approach

1. Extract data from the large data set.
2. Use the data to create a complex multi-layer network.
3. Analyze the different layers using centrality measures, modularity, etc.

The data set was provided as approximately 40 Excel files. The columns held attributes such as date of birth, name, town, primary occupation, and secondary occupation. The last column held a very long biography of the person. First, all the Excel files were converted to CSV files for easier processing. The files were then merged and duplicates were deleted. This was all done using Python programs. Every column except the last was simple, containing a word or a phrase or two. The biography in the last column was a large block of text, and the hardest part of the data mining was extracting useful information from it.
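The merge step described above can be sketched as follows. This is a minimal sketch, not the project's actual program: it assumes the Excel files have already been exported to CSV, that every file has a header row, and that files may differ in which columns they contain. The function name merge_csv_files and the sample column names in the usage note are hypothetical.

```python
import csv


def merge_csv_files(paths, out_path):
    """Merge several CSV files into one, taking the union of their columns.

    Assumption: every input file starts with a header row.
    Returns the number of data rows written.
    """
    rows = []
    fieldnames = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Collect column names in first-seen order across all files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills columns that a given input file did not have.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling merge_csv_files(["file1.csv", "file2.csv"], "merged.csv") writes one file whose header is the union of the input headers. Duplicate removal is a separate step and is not handled here.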
The biographies were not scanned manually: there were approximately 3000 people and each biography was very long, so we used a Python program to extract the useful data. The extraction relied on keywords and the Python regular expression module. For each person in the data, a dictionary was returned whose keys were attributes found in the biography and whose values were typically True or False. After the extraction, a new CSV file was created. A separate CSV file was also created that was essentially an edge list of direct connections between people, rather than shared interests.

After these CSV files were made, the different layers of the network were generated and then appended. The layer for each attribute is also generated with a dictionary. Looping through the rows and columns of the CSV, we add an edge between two people if they share an attribute. Since not all the people in the data have ID numbers, the nodes are simply numbered from 1 up to the total number of people.

Figure: Size distribution of the modularity classes of one of the layers generated from the final CSV file.

Figure: Part of one of the layers generated from the final CSV file. The layer is a union of complete graphs. This is because it represents a single attribute: all people who share a value of that attribute are connected to each other, so each possible value yields a complete graph.

Future Research Plans

Analysis of the graphs is ongoing and will continue by examining different sets of layers together, as well as the layer of direct connections.

Acknowledgements

Thank you to Mr. Das, Dr. Gera, and LTC Roginski for their help and guidance.

The function update_dictionary(row, i) is part of the program that deletes duplicates from the data set. It uses a dictionary whose key is essentially the last 15 columns concatenated. The value is a tuple of the row and its number.
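The original poster showed this function as an image, which is not reproduced here. The following is a hedged reconstruction of the idea, not the author's code. It assumes that the dictionary is passed in explicitly, that "how much data" a row has means its number of non-empty cells, and that the key columns are joined with a separator character. The names count_filled and deduplicate and the key_columns parameter are hypothetical.

```python
def count_filled(row):
    """Number of non-empty cells in a row (assumed measure of 'how much data')."""
    return sum(1 for cell in row if cell.strip())


def update_dictionary(seen, row, i, key_columns=15):
    """Record row number i in seen, keeping the fuller of two duplicates.

    The key is the last key_columns cells joined together; a separator
    character avoids accidental collisions between adjacent cells.
    """
    key = "\x1f".join(row[-key_columns:])
    if key not in seen or count_filled(row) > count_filled(seen[key][0]):
        seen[key] = (row, i)


def deduplicate(rows, key_columns=15):
    """Return the rows with duplicates removed, in first-seen key order."""
    seen = {}
    for i, row in enumerate(rows, start=1):
        update_dictionary(seen, row, i, key_columns)
    return [row for row, _ in seen.values()]
```

With key_columns=2, the rows ["", "Ana", "farmer"] and ["12", "Ana", "farmer"] share a key, and only the second survives because it has more filled cells.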
To update the dictionary, the program compares how much data each duplicate contains and replaces the dictionary value with the row that has more data.

Lili Sun
SEAP: Science and Engineering Apprenticeship Program at the Naval Postgraduate School
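The biography extraction and attribute-layer steps from the Approach can be sketched as follows. This is an illustration under stated assumptions, not the project's program. The keyword patterns are invented, since the poster does not list the real keywords. The function names extract_attributes and build_layer are hypothetical, and the sketch assumes that a False or missing value creates no edges.

```python
import re
from itertools import combinations

# Hypothetical keyword patterns; the real keywords are not given in the poster.
ATTRIBUTE_PATTERNS = {
    "speaks_english": re.compile(r"\bspeaks?\s+english\b", re.IGNORECASE),
    "owns_vehicle": re.compile(r"\bowns?\s+a\s+(car|truck|vehicle)\b", re.IGNORECASE),
}


def extract_attributes(biography):
    """Return a dictionary mapping each attribute name to True or False."""
    return {
        name: bool(pattern.search(biography))
        for name, pattern in ATTRIBUTE_PATTERNS.items()
    }


def build_layer(people_attributes, attribute):
    """Build the edge list of one layer for a single attribute.

    Nodes are numbered from 1 in row order. Every pair of people sharing
    the same value is connected, so each value yields a complete graph.
    Assumption: missing, empty, or False values create no edges.
    """
    groups = {}
    for node, attrs in enumerate(people_attributes, start=1):
        value = attrs.get(attribute)
        if value is None or value is False or value == "":
            continue
        groups.setdefault(value, []).append(node)

    edges = []
    for members in groups.values():
        edges.extend(combinations(members, 2))
    return edges
```

For four biographies in which people 1, 2, and 4 match the speaks_english pattern, build_layer returns the three edges of a triangle on those nodes. The same function works for a multi-valued column such as town, where each distinct town produces its own complete graph.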