Network structure from rich but noisy data

Newman, M. E. J.

doi:10.1038/s41567-018-0076-1

Letter
Published: 12 March 2018

M. E. J. Newman ORCID: orcid.org/0000-0002-0907-1660¹

Nature Physics volume 14, pages 542–545 (2018)

Abstract

Driven by growing interest across the sciences, a large number of empirical studies have been conducted in recent years of the structure of networks ranging from the Internet and the World Wide Web to biological networks and social networks. The data produced by these experiments are often rich and multimodal, yet at the same time they may contain substantial measurement error^{1,2,3,4,5,6,7}. Accurate analysis and understanding of networked systems requires a way of estimating the true structure of networks from such rich but noisy data^{8,9,10,11,12,13,14,15}. Here we describe a technique that allows us to make optimal estimates of network structure from complex data in arbitrary formats, including cases where there may be measurements of many different types, repeated observations, contradictory observations, annotations or metadata, or missing data. We give example applications to two different social networks, one derived from face-to-face interactions and one from self-reported friendships.
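The abstract does not spell out the estimation procedure, so the following is only a rough illustration of how repeated and possibly contradictory observations can be combined into edge probabilities. It is a minimal sketch, assuming a deliberately simple model that is not taken from this page: every node pair is measured `n_trials` times, a true edge is reported with probability `alpha`, a non-edge is falsely reported with probability `beta`, and each pair is connected with prior probability `rho`. The function name `em_edge_posteriors`, its parameters and the toy data are all hypothetical. The parameters are fitted with an expectation-maximization (EM) loop of the kind introduced by Dempster, Laird and Rubin in the reference list.

```python
from itertools import combinations


def clamp(x: float, eps: float = 1e-9) -> float:
    """Keep a probability strictly inside (0, 1) to avoid 0/0 in the E-step."""
    return min(max(x, eps), 1.0 - eps)


def em_edge_posteriors(counts, n_trials, iters=100, alpha=0.6, beta=0.1, rho=0.2):
    """Hypothetical helper illustrating EM under the assumed model above.

    counts[(i, j)] is how many of the n_trials measurements reported an edge i-j.
    alpha, beta and rho are starting guesses for the true-positive rate,
    false-positive rate and prior edge probability.
    Returns (Q, alpha, beta, rho); Q[(i, j)] is the posterior edge probability.
    """
    pairs = list(counts)
    tiny = 1e-12  # guards the M-step denominators against division by zero
    Q = {}
    for _ in range(iters):
        # E-step: posterior probability of an edge for each pair (Bayes' rule).
        for p in pairs:
            e = counts[p]
            edge = rho * alpha**e * (1 - alpha) ** (n_trials - e)
            no_edge = (1 - rho) * beta**e * (1 - beta) ** (n_trials - e)
            Q[p] = edge / (edge + no_edge)
        # M-step: re-estimate the three parameters from the current posteriors.
        sum_q = sum(Q.values())
        sum_qe = sum(Q[p] * counts[p] for p in pairs)
        sum_e = sum(counts.values())
        alpha = clamp(sum_qe / (n_trials * max(sum_q, tiny)))
        beta = clamp((sum_e - sum_qe) / (n_trials * max(len(pairs) - sum_q, tiny)))
        rho = clamp(sum_q / len(pairs))
    return Q, alpha, beta, rho


# Toy data (invented for illustration): 5 nodes, each pair measured 5 times.
observed = {pair: 0 for pair in combinations(range(5), 2)}
observed.update({(0, 1): 5, (1, 2): 4, (2, 3): 5, (0, 4): 4, (1, 3): 1, (2, 4): 1})

Q, alpha, beta, rho = em_edge_posteriors(observed, n_trials=5)
for pair in sorted(observed):
    print(pair, observed[pair], round(Q[pair], 3))
```

In this toy run, pairs reported in most measurements end up with posterior probabilities close to one, and pairs reported once or never end up close to zero. The starting guesses matter: they are chosen with `alpha > beta` so the loop does not settle on the mirror-image solution in which edges and non-edges swap roles.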


**Fig. 1: Application of the methods described here to two example networks.**


References

Killworth, P. D. & Bernard, H. R. Informant accuracy in social network data. Hum. Organ. 35, 269–286 (1976).
Marsden, P. V. Network data and measurement. Annu. Rev. Sociol. 16, 435–463 (1990).
Lakhina, A., Byers, J., Crovella, M. & Xie, P. Sampling biases in IP topology measurements. In Proc. 22nd Annual Joint Conf. of the IEEE Computer and Communications Societies (Institute of Electrical and Electronics Engineers, New York, NY, 2003).
Clauset, A. & Moore, C. Accuracy and scaling phenomena in Internet mapping. Phys. Rev. Lett. 94, 018701 (2005).
Wodak, S. J., Pu, S., Vlasblom, J. & Séraphin, B. Challenges and rewards of interaction proteomics. Mol. Cell. Proteom. 8, 3–18 (2009).
Handcock, M. S. & Gile, K. J. Modeling social networks from sampled data. Ann. Appl. Stat. 4, 5–25 (2010).
Lusher, D., Koskinen, J. & Robins, G. Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications (Cambridge Univ. Press, Cambridge, 2012).
Butts, C. T. Network inference, error, and informant (in)accuracy: A Bayesian approach. Soc. Netw. 25, 103–140 (2003).
Clauset, A., Moore, C. & Newman, M. E. J. Hierarchical structure and the prediction of missing links in networks. Nature 453, 98–101 (2008).
Guimerà, R. & Sales-Pardo, M. Missing and spurious interactions and the reconstruction of complex networks. Proc. Natl Acad. Sci. USA 106, 22073–22078 (2009).
Namata, G. M., Kok, S. & Getoor, L. Collective graph identification. In Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Association of Computing Machinery, New York, 2011).
Allen, J. D., Xie, Y., Chen, M., Girard, L. & Xiao, G. Comparing statistical methods for constructing large scale gene networks. PLoS One 7, e29348 (2012).
Han, X., Shen, Z., Wang, W.-X. & Di, Z. Robust reconstruction of complex networks from sparse data. Phys. Rev. Lett. 114, 028701 (2015).
Martin, T., Ball, B. & Newman, M. E. J. Structural inference for uncertain networks. Phys. Rev. E 93, 012306 (2016).
Casiraghi, G., Nanumyan, V., Scholtes, I. & Schweitzer, F. From relational data to graphs: Inferring significant links using generalized hypergeometric ensembles. In Proc. International Conf. on Social Informatics (SocInfo 2017), no. 10540 in Lecture Notes in Computer Science (eds Ciampaglia, G. et al.) 111–120 (Springer, Berlin, 2017).
Uetz, P. et al. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000).
Ito, T. et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA 98, 4569–4574 (2001).
Giot, L., Bader, J. S. & Brouwer, C. et al. A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736 (2003).
Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).
Rapoport, A. & Horvath, W. J. A study of a large sociogram. Behav. Sci. 6, 279–291 (1961).
Resnick, M. D. et al. Protecting adolescents from harm: Findings from the National Longitudinal Study on Adolescent Health. J. Am. Med. Assoc. 278, 823–832 (1997).
Bernard, H. R. & Killworth, P. D. Informant accuracy in social network data II. Human. Commun. Res. 4, 3–18 (1977).
Liu, Y., Liu, N. J. & Zhao, H. Y. Inferring protein–protein interactions through high-throughput interaction data from diverse organisms. Bioinformatics 21, 3279–3285 (2005).
Angulo, M. T., Moreno, J. A., Lippner, G., Barabási, A.-L. & Liu, Y.-Y. Fundamental limitations of network reconstruction from temporal data. J. Royal Soc. Interface 14, 20160966 (2017).
Overbeek, R. et al. WIT: Integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res. 28, 123–125 (2000).
Forster, J., Famili, I., Fu, P., Palsson, B. O. & Nielsen, J. Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res. 13, 244–253 (2003).
Schafer, J. & Strimmer, K. An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21, 754–764 (2005).
Margolin, A. A. et al. ARACNE: An algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7, S7 (2006).
Langfelder, P. & Horvath, S. WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
Liben-Nowell, D. & Kleinberg, J. The link-prediction problem for social networks. J. Assoc. Inf. Sci. Technol. 58, 1019–1031 (2007).
Huisman, M. Imputation of missing network data: Some simple procedures. J. Social Struct. 10, 1–29 (2009).
Kim, M. & Leskovec, J. The network completion problem: Inferring missing nodes and edges in networks. In Proc. 2011 SIAM International Conf. on Data Mining (eds Liu, B. et al.) 47–58 (Society for Industrial and Applied Mathematics: Philadelphia, PA, 2011).
Smalheiser, N. R. & Torvik, V. I. Author name disambiguation. Annu. Rev. Inf. Sci. Technol. 43, 287–313 (2009).
D’Angelo, C. A., Giuffrida, C. & Abramo, G. A heuristic approach to author name disambiguation in bibliometrics databases for large-scale research assessments. J. Assoc. Inf. Sci. Technol. 62, 257–269 (2011).
Ferreira, A. A., Goncalves, M. A. & Laender, A. H. F. A brief survey of automatic methods for author name disambiguation. SIGMOD Rec. 41, 15–26 (2012).
Tang, J., Fong, A. C. M., Wang, B. & Zhang, J. A unified probabilistic framework for name disambiguation in digital library. IEEE Trans. Knowl. Data Eng. 24, 975–987 (2012).
Brugere, I., Gallagher, B. & Berger-Wolf, T. Y. Network structure inference, a survey: Motivations, methods, and applications. ACM Comput. Surv. 1, 1 (2016).
Dempster, A. P., Laird, N. M. & Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Stat. Soc. B 39, 185–197 (1977).
Eagle, N. & Pentland, A. Reality mining: Sensing complex social systems. J. Personal Ubiquitous Comput. 10, 255–268 (2006).

Acknowledgements

The author thanks E. Bruch, G. Cantwell, T. Martin, G. Reinert and M. Riolo for useful comments. This work was funded in part by the US National Science Foundation under grants DMS–1407207 and DMS–1710848. This work uses data from Add Health, a programme project designed by J. R. Udry, P. S. Bearman and K. Mullan Harris, and funded by a grant P01–HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. A special acknowledgment is due to R. R. Rindfuss and B. Entwisle for assistance in the original design. Anyone interested in obtaining data files from Add Health should contact Add Health, Carolina Population Center, 123 W. Franklin Street, Chapel Hill, NC 27516-2524 (addhealth@unc.edu). No direct support was received from grant P01-HD31921 for this analysis.

Author information

Authors and Affiliations

Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI, USA
M. E. J. Newman

Contributions

M.E.J.N. designed and conducted the research and wrote the paper.

Corresponding author

Correspondence to M. E. J. Newman.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Supplementary notes, supplementary figures 1–3

About this article

Cite this article

Newman, M.E.J. Network structure from rich but noisy data. Nature Phys 14, 542–545 (2018). https://doi.org/10.1038/s41567-018-0076-1

Received: 12 September 2017
Accepted: 02 February 2018
Published: 12 March 2018
Issue Date: June 2018
DOI: https://doi.org/10.1038/s41567-018-0076-1
