The Science of Complex Networks and the Internet: Lies, Damned Lies, and Statistics

Dr. Walter Willinger
AT&T Labs - Research

25 hours, 6 credits

September 7 - September 11, 2009

Area della Ricerca di Pisa, National Research Council, via G. Moruzzi 1, Pisa, room 7, entrance 3

Contacts: Prof. Stefano Giordano

This activity is part of the Pisa International School on the Next Generation Internet

Abstract

In recent years, a prime application area for the Science of Complex Networks (also known as Network science) has been the Internet and the various types of connectivity structures that result from its designed nature (e.g., router-level topology, autonomous system-level topology, the Web graph, Peer-to-Peer networks, Online Social Networks). Unfortunately, the Internet has also become a textbook example of how Network science can go wrong: a classic lesson in how errors of various forms occur and add up to produce results and claims that create excitement among non-experts but quickly collapse when scrutinized by domain experts. While these opposite reactions have naturally become a source of great confusion, the main conclusion is neither controversial nor should it come as a big surprise: in its present form, Network science is largely incapable of dealing with highly engineered or designed systems (e.g., the Internet or other technological networks) in a way that advances our understanding of these systems. In fact, the Internet example demonstrates the dire need to develop an intellectually stronger Network science that can pass the more demanding and scientifically more challenging validation criteria required by a more engineering-oriented and less physics-inspired application domain.

By carefully tracing and documenting the main sources of errors in applying the current Network science approach to the Internet, we find that many of the most popular complex network concepts are severely lacking in rigor. The main problems include (i) a dismal attitude towards data hygiene, (ii) a largely ignored mismatch between the rigor of statistical data analysis and the quality of the available data, and (iii) an outdated and completely inadequate approach to modeling and model validation. Fortunately, the Internet application also suggests an alternative approach that highlights the sort of paradigm shifts needed in our quest for an intellectually stronger, mathematically more solid, and scientifically more rigorous Science of Complex Networks.

Syllabus

  • The Internet as a highly engineered system
  • Internet Measurements: Know your Data!
  • Analysis of Internet Data: Know your Statistics!
  • Internet Modeling: From an Exercise in Data-fitting towards an Exercise in Reverse-Engineering
  • New Challenges in Internet Modeling and Model Validation