Internet-bigdata processing techniques / Amogh Dhamdere / kc claffy
We propose a project that targets three ongoing challenges in Internet research. First, many Internet measurement projects produce large volumes of time series data that they need to analyze, correlate, and visualize efficiently via public-facing query interfaces. Second, applications in network measurement, security, and network operations involve mining text-based data consisting of billions of records to track changes or detect anomalies. However, the tools used to process text-based measurement data are fairly rudimentary and do not take advantage of advances in document search and indexing. Finally, an important class of Internet measurement data consists of dynamic, multi-resolution graphs with annotations on nodes, links, and subgraphs along various dimensions (geography, performance, economics). The Internet measurement community lacks the tools to visualize these datasets. In this project, we will build data processing and analysis systems that combine heterogeneous data from diverse datasets to enable agile analytics on millions of time series in streaming and batch modes. Additionally, we plan to apply recent advances in text search to mine insights from Border Gateway Protocol (BGP), Domain Name System (DNS), and other measurement datasets that can be represented as text documents. Finally, we will develop novel techniques to visualize large, multi-resolution, annotated, evolving graph structures, allowing many users to browse the data interactively and intuitively. By integrating these different aspects, we will build a novel set of tools for data ingestion, storage, and analytics that are designed to leverage HPC resources, accelerators, and emerging memory technologies.