Data Models and Languages

  • Codd, E. F., A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, 1970

  • J. Ullman. Database and Knowledge Base Systems, vol. I. Chapter 3 (Logic as a Data Model).

  • Stonebraker, M., Inclusion of New Types In Relational Data Base Systems, Proceedings of the International Conference on Data Engineering, 1986.

  • Stonebraker M., and Hellerstein J., What Goes Around Comes Around.

Database Theory

  • S. Abiteboul, R. Hull, V. Vianu. Foundations of Databases. Available for free from: http: //

    • Conjunctive Queries (Chapters 3,4)

    • Chapter 6, Sections 6.2 and 6.4

    • Datalog (Chapter 12, Sections 12.1 - 12.3, Chapter 13, Section 13.1 - 13.3)

  • T. J. Green, S. Huang, B. T. Loo, W. Zhou, Datalog and Recursive Query Processing, Foundations and Trends in Databases, Vol. 5, 2012.

  • H. Ngo, C. Re, A. Rudra, Skew Strikes Back: New Developments in the Theory of Join Algorithms, SIGMOD Record, 2013.

  • Cheney, Chiticariu, Tan, Provenance in Databases: Why, How and Where, Foundations and Trends in Databases, 2009.

  • C. Dwork, A Firm Foundation for Private Data Analysis, Communications of the ACM, 2011.

DBMS Architecture

  • Chamberlin, D. D., Astrahan, M. M., Blasgen, M. W., Gray, J. N., King, W. F., Lindsay, B. G., Lorie, R., Mehl, J. W., Price, T. G., Putzolu, F., Selinger, P. G., Schkolnick, M., Slutz, D. R., Traiger, I. L., Wade, B. W. and Yost, R. A. A History and Evaluation of System R, Communications of the ACM 24(10), 1981.

  • Stonebraker, M., Wong E., Kreps P., and Held G., The Design and Implementation of INGRES, ACM Transactions on Database Systems 1(3), 1976.

  • Hellerstein J. M., Stonebraker M. and Hamilton, J. R., Architecture of a Database System, Foundations and Trends in Databases 1(2), 2007.

Operating System Issues

  • Chou, H., and DeWitt, D., An Evaluation of Buffer Management Strategies for Relational Database Systems, Proceedings of the International Conference on Very Large Data Bases (VLDB), 1985.

  • O’Neil, E. J., O’Neil, P. E., Weikum G., The LRU-K Page Replacement Algorithm For Database Disk Buffering, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1993.

  • Stonebraker, M., Operating System Support for Database Management, Communications of the ACM 24(7), 1981.

File Organizations and Access Methods

  • Comer, D., The Ubiquitous B-Tree, ACM Computing Surveys 11(2), June 1979.

  • Guttman, A., R-Trees: A Dynamic Index Structure for Spatial Searching, SIGMOD Conference, 1984.

  • Patrick E. O’Neil, Dallan Quass: Improved Query Performance with Variant Indexes. SIGMOD Conference, 1997: 38-49.

Query Complexity

  • Optimal implementation of conjunctive queries in relational databases, Chandra, Merlin, STOC 1977

  • The Complexity of Relational Query Languages, Vardi, STOC 1982

  • Algorithms for acyclic database schemes, Yannakakis, VLDB 1981.

  • Size bounds and query plans for relational joins, Atserias, Grohe, Marx, FOCS 2008

  • Hypertree Decompositions and Tractable Queries, Gottlob, Leone, Scarcello, JCSS 2002

  • Leapfrog Triejoin: a worst-case optimal join algorithm, Veldhuizen, ICDT 2014

  • Skew Strikes Back: New Developments in the Theory of Join Algorithms, Ngo, Re, Rudra, SIGMOD RECORD 2013

Query Processing and Optimization

  • Shapiro, L. D., Join Processing in Database Systems with Large Main Memories, ACM Transactions on Database Systems 11(3), 1986.

  • Selinger, P., et al., Access Path Selection in a Relational Database Management System, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1979.

  • Surajit Chaudhuri: An Overview of Query Optimization in Relational Systems. PODS 1998: 34-43

  • Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170 (1993)

Concurrency Control and Recovery

  • Bernstein, P.A., Hadzilacos, V., and Goodman, N., Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987; can be freely downloaded from Bernstein’s webpage. (Chapters 1 and 2)

  • Gray, J., Lorie, R. A., Pulzolu, G. R., Traiger, I. L., Granularity of Locks and Degrees of Consistency in a Shared Data Base, Proceedings of the IFIP Working Conference on Modeling of Data Base Management Systems, 1979.

  • Kung, H., and Robinson, J., On Optimistic Methods for Concurrency Control, ACM Transactions on Database Systems 6(2), June 1981.

  • Berenson, H., Bernstein, P. A., Gray, J., Melton, J., O’Neil, E. J., O’Neil, P. E., A Critique of ANSI SQL Isolation Levels, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1995.

  • Lehman, P. and Yao, S., Efficient Locking for Concurrent Operations on B-Trees, ACM Transactions on Database Systems, 6(4): 650-670.

  • Mohan, C., Haderle, D., Lindsay, B., Pirahesh, H., Schwarz, P., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, ACM Transactions on Database Systems, 17(1): 94-162.

Distributed and Parallel Data Processing

  • MapReduce: simplified data processing on large clusters, Dean, Ghemawat, OSDI 2004

  • MapReduce and parallel DBMSs: friends or foes?, Stonebraker et al., CACM 2010

  • Optimizing Joins in a Map-Reduce Environment, Afrati, Ullman, EDBT 2010

  • A Guide to Formal Analysis of Join Processing in Massively Parallel Systems, Koutris, Suciu, SIGMOD Record 2016

  • Streaming

    • Models and issues in data stream systems, Babcock, Babu, Datar, Motwani, Widom, PODS 2002

    • The space complexity of approximating the frequency moments, Alon, Matias, Szegedy, STOC 1996

Data Extraction and Integration

  • Uncertain Data

    • Probabilistic Databases, Suciu, Olteanu, Re, Koch

    • Probabilistic Databases: Diamonds in the Dirt, Dalvi, Re, Suciu, CACM 2008

    • The dichotomy of probabilistic inference for unions of conjunctive queries, Dalvi, Suciu, JACM 2012

    • Consistent Query Answering: Five Easy Pieces, Chomicki, ICDT 2007

    • Consistent Query Answers in Inconsistent Databases, Arenas, Bertossi, Chomicki, PODS 1999

  • S. Sarawagi, Information Extraction, Foundations and Trends in Databases. Vol. 1, No. 3 (2007): read only Chapters 1-3.

  • Levy, Alon, Logic-based Techniques in Data Integration, available on the Web at http: // read up to and including Section 5.1.

  • Doan, A., Halevy, A., Ives, Z., Principles of Data Integration, Chapter 1, Chapter 5, Chapter 4: read 4.1, 4.2.1 (only Edit Distance), 4.2.2 (only Overlap, Jaccard, and TF/IDF), 4.2.4, and 4.3 (only Inverted Index and Size Filtering). Chapter 7: up to and including 7.5.3. Chapter 9: read 9.1, 9.2, and 9.3.1. Chapters available from courses/dibook-chapters.

Data Analysis and Decision Support

  • Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart D., Venkatrao, M., Pellow, F., and Pirahesh, H., Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub- Totals. In Data Mining and Knowledge Discovery, 1(1): 29?53.

  • Agrawal, R., and R. Srikant, Fast Algorithms for Mining Association Rules. In Proceedings of the 20th International Conference on Very Large Data Bases, 487?499.

  • Zhang, T., Ramakrishnan R., and Livny M., BIRCH: A Clustering Algorithm for Large Multidimensional Datasets, Proceedings of the ACM SIGMOD International Conference on Management of Data, 1996

  • Graham Cormode, Minos Garofalakis, Peter J. Haas and Chris Jermaine, Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches, Foundations and Trends in Databases, Vol. 4, 2011.

DBMS and Search Engines

  • Brin, S. and Page, L., The Anatomy of a Large-Scale Hypertextual Web Search Engine, Proceedings of Computer Networks and ISDN Systems, 1998.

  • Singhal, A., Modern Information Retrieval: a Brief Overview. IEEE Data Engineering Bulletin, 24(4), 35- 43, 2001.

  • Page, L. and Brin, S. and Motwani, R. and Winograd, T., The Pagerank Citation Ranking: Bringing Order to the Web, Technical Report, 1999.

Emerging Topics

Last updated