Session 7 – Data aggregation and visualisation
Agenda
Data aggregation and visualisation (TUV + external speakers), 09:00 – 10:30
Presentation of data aggregation approach and visualisation approach: TUV
Practice report on challenges regarding “Quality of Experience” data:
  Mr. Manner, Netradar (Finland)
  Mrs. Teixeira, INRIA (France)
  Mr. Rood, Stratix (Netherlands)
Survey and discussion (all), inter alia:
  Aggregation and visualisation rules
  Representativeness of data
The process of data aggregation and visualisation
Step 1: Selection and processing of raw data
Step 2: Aggregation
Step 3: Conversion to data model
Step 4: Data visualisation
Step 1: Selection and processing of raw data
What is raw data and which choices have to be made?

What is raw data?
  Best case: information on QoS at a specific location / geo-coordinate
  Other cases: information on QoS linked to a geographical area

Choices to be made
  What should be the purpose / expressiveness of my aggregated data?
  Which information do I want to share?
  Which single QoS data sets should be used to build up the aggregated data sets?

NOTE: IP addresses of single measurements are “personal data” and cannot be collected within this project. Data has to be transformed into geo-coordinates or aggregated to address or grid level before data provision!
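The aggregation to grid level mentioned in the note can be sketched as follows. This is a minimal illustration only: the cell size, the field names and the sample record are assumptions, and the project's actual grid definition is not given on the slide.

```python
import math

# Hypothetical cell size in degrees; the project's real grid is not specified here.
CELL_DEG = 0.01

def to_grid_cell(lat, lon, cell_deg=CELL_DEG):
    """Return the (row, col) index of the grid cell that contains a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def prepare_for_provision(measurement):
    """Keep only the grid cell and the QoS value; drop IP address and exact position."""
    return {
        "cell": to_grid_cell(measurement["lat"], measurement["lon"]),
        "download_mbit": measurement["download_mbit"],
    }

# Invented sample record (192.0.2.0/24 is a documentation-only address range).
raw = {"ip": "192.0.2.17", "lat": 48.2082, "lon": 16.3738, "download_mbit": 10.0}
shared = prepare_for_provision(raw)
```

Only the `shared` record would leave the data supplier; the IP address and the exact coordinate stay behind.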
Step 1: Selection and processing of raw data
Selection / elimination of single data based on:
  Location accuracy
  Plausibility of values
  Multiple measurements from same user in same area
[Figure: example measurement labelled “10 Mbit/s”]
Description in meta data on what has been done
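The three elimination criteria could be applied roughly as in the sketch below. The thresholds, the field names and the rule of keeping only the latest measurement per user and area are assumptions for illustration; the slide does not prescribe them.

```python
def select_measurements(measurements, max_accuracy_m=100.0, max_mbit=10_000.0):
    """Filter single measurements; all thresholds are hypothetical defaults."""
    kept = {}
    for m in measurements:
        # 1. Location accuracy: drop fixes with a large error radius.
        if m["accuracy_m"] > max_accuracy_m:
            continue
        # 2. Plausibility: drop non-positive or absurdly high values.
        if not (0.0 < m["mbit"] <= max_mbit):
            continue
        # 3. Multiple measurements from the same user in the same area:
        #    keep only the most recent one (an assumed rule).
        key = (m["user"], m["area"])
        if key not in kept or m["ts"] > kept[key]["ts"]:
            kept[key] = m
    return list(kept.values())

# Invented sample data.
sample = [
    {"user": "u1", "area": "A", "mbit": 10.0, "accuracy_m": 20.0, "ts": 1},
    {"user": "u1", "area": "A", "mbit": 12.0, "accuracy_m": 15.0, "ts": 2},
    {"user": "u2", "area": "A", "mbit": 8.0, "accuracy_m": 500.0, "ts": 3},
    {"user": "u3", "area": "B", "mbit": -1.0, "accuracy_m": 10.0, "ts": 4},
]
selected = select_measurements(sample)
```

Whatever rules are chosen, the parameter values used would belong in the meta data description.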
Step 1: Selection and processing of raw data
Discussion: who should carry this out?

Option 1: The initiative as owner of data
  + Knows context (focus and intention) of collected data best
  + Data supplier has sovereignty of interpretation
  - Every data supplier carries out individual measures, resulting in numerous approaches to processing raw data

Option 2: The contractor
  Requirement: close cooperation with data suppliers / giving instructions on selection / elimination choices
  + More homogeneous approach to processing raw data, as it is only carried out by the contractor
  - Contractor does not know the context of collected data
  - Risk of misinterpretation and data privacy concerns
Step 2: Aggregation
What is aggregated data and which choices have to be made?

What is aggregated data?
  Combination of all values of QoS data sets that derive from the same region and have the same content into an aggregated value (e.g. min, max etc.)

Choices to be made
  Which aggregated values should be provided?
  Which quality criteria have to be fulfilled to provide a value for a region?
  Which significance should the values have?
Step 2: Aggregation
Definition of aggregation rules and information on which values will be supplied for which regions:
  Sample sizes
  Spatial distribution of samples
  Type of additional information that needs to be displayed in order to explain values
Example: rural area with 8 samples, urban area with 1,300 samples
Description in meta data on what has been done
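A sample-size rule of this kind can be sketched as below. The minimum of 30 samples is an assumed quality criterion, not one stated on the slide, and the input values are invented.

```python
from statistics import mean, median

def aggregate_region(values, min_samples=30):
    """Aggregate the QoS values of one region, or flag it as under-sampled.

    min_samples is a hypothetical quality criterion.
    """
    n = len(values)
    if n < min_samples:
        # Too few samples: report the count only so the gap can be explained.
        return {"samples": n, "sufficient": False}
    return {
        "samples": n,
        "sufficient": True,
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }

rural = aggregate_region([4.0, 6.0, 5.0, 7.0, 3.0, 6.0, 5.0, 4.0])  # 8 samples
urban = aggregate_region([2.0, 4.0, 9.0], min_samples=3)            # lowered threshold for the demo
```

Returning the sample count in both cases keeps the additional information needed to explain a value, or its absence, next to the value itself.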
Step 2: Aggregation
Discussion: who should carry this out?

Option 1: The initiative as owner of data
  + Data supplier has sovereignty of interpretation; knows the intended significance of the values and is able to assess which values are crucial
  - Every data supplier carries out individual measures, resulting in numerous different aggregation rules

Option 2: The contractor
  Requirement: close cooperation with data suppliers / giving instructions on selection / elimination choices
  + Homogeneous approach to aggregation rules, as it is only carried out by the contractor
  - Contractor does not know the context of collected data
  - Risk of misinterpretation
Step 3: Conversion to data model
Possible issues that could complicate the transfer of data into the data model:
  Data provided at a spatial resolution level which is not compliant with the templates
  Divergent value categories
Step 3: Conversion to data model – Possible issues
Data provided at a spatial resolution level which is not compliant with the templates
  By means of statistical data (e.g. population or number of households), aggregation can be carried out by the contractor for data which is supplied in non-grid format
  Missing statistical data: if this statistical data is missing, aggregation cannot be carried out by the contractor
  Missing geo-data: data is provided for postal codes without geo-reference codes to NUTS levels
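One way the contractor could use statistical data for this is a population-weighted average per NUTS region, sketched below. The weighting scheme, the function name and all figures are assumptions for illustration. Postal codes without a NUTS reference or without population data are reported as skipped, mirroring the two "missing data" cases above.

```python
def aggregate_to_regions(postal_values, postal_to_region, population):
    """Population-weighted mean per region; returns (result, skipped_codes)."""
    sums = {}      # region -> [weighted sum of values, total population]
    skipped = []   # postal codes that cannot be aggregated
    for code, value in postal_values.items():
        region = postal_to_region.get(code)   # missing geo-data -> None
        pop = population.get(code)            # missing statistical data -> None
        if region is None or pop is None:
            skipped.append(code)
            continue
        entry = sums.setdefault(region, [0.0, 0])
        entry[0] += value * pop
        entry[1] += pop
    result = {r: wsum / pop for r, (wsum, pop) in sums.items() if pop > 0}
    return result, skipped

# Invented example values (Mbit/s) and population figures.
values = {"1010": 20.0, "1020": 10.0, "9999": 5.0}
lookup = {"1010": "AT130", "1020": "AT130"}          # "9999" has no NUTS reference
people = {"1010": 1000, "1020": 3000, "9999": 500}
by_region, skipped = aggregate_to_regions(values, lookup, people)
```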
Step 3: Conversion to data model
Divergent value categories
  Different technologies clustered into technology groups: NGA, Overall fixed, Mobile, Wired, Wireless
  Different speeds clustered into speed groups: >35 Mbit/s, >50 Mbit/s, 16-24 Mbit/s, <1 Gbit/s etc.
  Statistical value differences: Max, Min, Average, Median etc.
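Where a supplier delivers individual speed values, mapping them onto a common set of speed groups is straightforward, as in this sketch. The group boundaries and labels are hypothetical and are not the categories of the project's data model.

```python
from bisect import bisect_right

# Hypothetical group boundaries in Mbit/s and the matching labels.
BOUNDS = [2.0, 30.0, 100.0]
LABELS = ["<2 Mbit/s", "2-30 Mbit/s", "30-100 Mbit/s", ">=100 Mbit/s"]

def speed_group(mbit):
    """Return the label of the speed group that contains the given value."""
    return LABELS[bisect_right(BOUNDS, mbit)]

groups = [speed_group(v) for v in (1.0, 16.0, 30.0, 250.0)]
```

The harder case is a supplier who delivers only pre-clustered categories such as ">35 Mbit/s": if such a category straddles a boundary of the target groups, it cannot be reassigned without the underlying values.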
Step 4: Data visualisation
General approach
Data will be visualised strictly according to data suppliers' intention:
  Without modification, or only modified in coordination with data supplier
  According to agreement in Memorandum of Understanding:
    Publication on public version
    Publication on expert version
    Publication via data feed
  Specifications given in meta data will be displayed
Step 4: Data visualisation
Visualisation of data sets from different initiatives

Same value combination / different areas
  If collection approaches are similar, data from different data suppliers can be visualised in one layer (“national initiatives”)
  Data suppliers decide
  This decision can be indicated in data model: “Solitary visualisation” or “Simultaneous visualisation”

Same value combination / same areas
  As collection approaches are very heterogeneous, it is not possible to visualise several initiatives on one layer
  Challenge to ensure that each data set is represented according to data suppliers' intention
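How such a flag in the data model might drive the layer structure is sketched below. The field names, flag values and data set names are assumptions; only the two options "solitary" and "simultaneous" and the shared layer name come from the slide.

```python
SHARED_LAYER = "national initiatives"

def build_layers(datasets):
    """Group data sets into map layers according to the supplier's choice."""
    layers = {SHARED_LAYER: []}
    for d in datasets:
        if d["visualisation"] == "simultaneous":
            # Supplier agreed to be shown together with other initiatives.
            layers[SHARED_LAYER].append(d["name"])
        else:
            # "solitary": the data set gets a layer of its own.
            layers[d["name"]] = [d["name"]]
    return layers

# Invented data set names.
layers = build_layers([
    {"name": "Initiative A", "visualisation": "simultaneous"},
    {"name": "Initiative B", "visualisation": "simultaneous"},
    {"name": "Initiative C", "visualisation": "solitary"},
])
```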