# Data Management

[DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling](https://www.usenix.org/conference/osdi21/presentation/khan)

[CLP: Efficient and Scalable Search on Compressed Text Logs](https://www.usenix.org/conference/osdi21/presentation/rodrigues)

[Polyjuice: High-Performance Transactions via Learned Concurrency Control](https://www.usenix.org/conference/osdi21/presentation/wang-jiachen)

[Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing](https://www.usenix.org/conference/osdi21/presentation/shen)

### DMon

Motivation

* A single search query (\~700ms) uses 8 processor cores
  * Millions of cores in aggregate
  * High management and energy costs
* CPU performance breakdown
  * Only 32% of cycles do useful work
  * The rest: fetching, decoding, bad speculation, execution units unavailable, memory stalls (poor data locality)

Existing techniques

* Compiler optimizations
  * Improve data locality via program transformation
  * Con: rely on static heuristics, can sometimes hurt performance
* Dynamic profilers
  * Collect accurate execution information to identify and resolve poor locality
  * Con: mostly manual repair, high profiling overhead

Contribution

* Selective profiling
* Apply specific compiler optimizations based on profiling results
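The selective-profiling idea can be sketched as a two-stage loop: cheap, always-on monitoring of aggregate counters, with detailed (expensive) profiling enabled only for code regions that look locality-bound. All names and thresholds below are illustrative, not DMon's actual implementation:

```python
# Hypothetical sketch of selective profiling: lightweight always-on
# monitoring, with detailed profiling enabled only when a locality
# problem is suspected. Names and thresholds are illustrative.

STALL_THRESHOLD = 0.3  # fraction of cycles stalled on memory


def detailed_profile(func):
    # Placeholder: in a real tool this would collect fine-grained
    # locality metrics (e.g., cache-miss addresses) for this region only.
    return {"function": func, "action": "apply targeted locality optimization"}


def selective_profile(samples):
    """samples: list of (function, cycles, mem_stall_cycles) tuples."""
    suspects = []
    for func, cycles, stalls in samples:
        # Stage 1: lightweight check using aggregate counters only.
        if cycles and stalls / cycles > STALL_THRESHOLD:
            suspects.append(func)
    # Stage 2: detailed profiling restricted to the suspects, so the
    # overhead stays proportional to the problem, not the program.
    return {func: detailed_profile(func) for func in suspects}


report = selective_profile([
    ("parse", 1000, 100),    # healthy: 10% memory stalls
    ("lookup", 1000, 450),   # suspect: 45% memory stalls
])
print(sorted(report))  # only "lookup" gets detailed profiling
```

This captures why the overhead stays low: the expensive stage runs only on the small fraction of code that actually exhibits poor locality.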


![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MeaBWv6pVgHu48_qeSs%2F-MeaTnEfYbpaoWon_K01%2Fimage.png?alt=media\&token=fa3a09d9-5dea-4659-a596-ee0b4551b15c)

Evaluation

* Efficiency
* Effectiveness
* Real-world case studies

Q: Are there data locality problems not solved by the optimizations?

* Tail latency problems
  * DMon cannot identify and repair all of them

### CLP

Workload / problems:

* Logs --> search tools --> compress (e.g., gzip)
  * Problem
    * Petabytes of logs
    * Uncompressed logs consume a lot of resources
  * Current solutions
    * Build indexes: adds storage overhead
      * Total storage stays the same order of magnitude as the raw logs
    * Can only retain indexed logs for a few weeks
    * Compress (to 2.27% of the original size)
      * Unsearchable once compressed
      * Requires decompression before search

Key ideas:

* Lossless log compression: better ratio than general-purpose compressors
* Can search compressed logs
  * Without full decompression
    * Only decompress matching results
  * With good performance

Method:

* Domain-specific algorithm to extract the overlapping parts
  * De-duplicate into three components:
    * Log Type Dictionary
    * Variable Dictionary
    * Encoded Messages
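The de-duplication scheme above can be sketched as follows: each message is split into a log type (the constant text with variables replaced by placeholders) and its variable values; repeated log types and variables are stored once in dictionaries, and each message becomes a compact tuple of dictionary IDs. This is a simplified illustration, not CLP's actual encoding (its variable detection and binary format are more sophisticated):

```python
import re

# Simplified sketch of dictionary-based log encoding: split each message
# into a log type (constant text with placeholders) and variable values,
# then de-duplicate both through dictionaries.

logtype_dict, var_dict = {}, {}

def intern(d, value):
    """Return an existing ID for value, or assign the next one."""
    return d.setdefault(value, len(d))

def encode(message):
    variables = re.findall(r"\S*\d\S*", message)   # crude variable detector
    logtype = re.sub(r"\S*\d\S*", "\x11", message)  # placeholder byte
    return (intern(logtype_dict, logtype),
            tuple(intern(var_dict, v) for v in variables))

msgs = [
    "Task task_12 assigned to container container_7",
    "Task task_98 assigned to container container_3",
]
encoded = [encode(m) for m in msgs]
# Both messages share one log type; only the variable IDs differ.
print(len(logtype_dict))  # 1
```

Searching compressed logs then amounts to matching the query against the (small) dictionaries first, and only touching encoded messages whose IDs match.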

Comparison:

* Search tools: ripgrep, Elasticsearch
* Archive tools: lzma, ...

Q: How are structural changes handled, and how are the dictionaries kept from blowing up?

* Compress logs into multiple archives
  * Each with its own set of dictionaries
* Q: Does searching the dictionaries affect performance?
  * That time is negligible compared to the time to read the data
* Q: Is an accurate schema needed?
  * Yes
  * Schemas are based on the variables and tend to be quite general (generic schemas)
    * These aren't necessarily the best for search performance

### Polyjuice

Motivation:

* Concurrency control (CC): controls how concurrent executions interleave
  * More interleaving --> better performance
* No single CC algorithm is best for every workload (high- vs. low-contention)

Current solutions:

* Coarse-grained approaches combine a few known CC algorithms
* Cons
  * Cumbersome: require manual partitioning of the workload
  * Suboptimal: use a single CC algorithm within each partition

Approach:

* Reinforcement learning (RL)
  * State
    * Differentiates states that require different CC actions
    * Two parts
      * Type of transaction
      * Data accessed
  * Action
    * Should be able to encode most existing CC algorithms
    * Exerts control over the interleaving
    * Solution
      * Whether to wait, and for how long?
        * OCC: no wait
        * 2PL: wait until dependent transactions commit
        * IC3, RP, DRP: wait until ... finish execution up to some point
      * Which version of data to read?
        * Latest committed version
        * Latest published dirty version
      * Whether to make a dirty write visible?
        * Keep the write in a private buffer
        * Publish the dirty write
      * Whether to validate now, prior to commit?
        * Whether or not to validate after this access
        * Based on Silo's protocol
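The state/action design above amounts to a policy table keyed by (transaction type, access index), where each entry bundles the four per-access choices; RL then searches over this table. A minimal sketch, with hypothetical field and transaction names:

```python
from dataclasses import dataclass

# Sketch of the learned policy as a table: state = (transaction type,
# access index); action = the four fine-grained CC choices. Field names
# and example entries are hypothetical, not Polyjuice's actual encoding.

@dataclass(frozen=True)
class CCAction:
    wait_for: str           # "none" (OCC-like), "commit" (2PL-like), or a pipeline stage
    read_version: str       # "committed" or "dirty"
    publish_dirty_write: bool
    validate_now: bool      # validate after this access, before commit

policy = {
    # Low-contention access: behave like OCC.
    ("payment", 0): CCAction("none", "committed", False, False),
    # Hot record: read dirty data and expose writes early to raise interleaving.
    ("payment", 1): CCAction("stage_1", "dirty", True, True),
}

def action_for(txn_type, access_idx):
    # Fall back to a conservative OCC-like default for unseen states.
    return policy.get((txn_type, access_idx),
                      CCAction("none", "committed", False, False))

print(action_for("payment", 1).read_version)  # dirty
```

Because the table can express per-access combinations of waiting, dirty reads, dirty-write visibility, and early validation, it subsumes OCC and 2PL as special cases while allowing hybrids neither can express.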

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MeaBWv6pVgHu48_qeSs%2F-MeacE2AHKFx0XuYYmGE%2Fimage.png?alt=media\&token=d0659dc6-7ddd-40b8-a643-7e8b50cb5fdc)

### VEGITO

Workload:

* Typical workloads
  * OLTP: short-lived transactions
  * OLAP: long-running analytical queries
* New type of workload: HTAP (hybrid transaction/analytical processing)

Requirements:

* Performance
  * Minimize performance degradation
  * Millions of transactions per second
* Freshness
  * Small delay before committed data becomes visible to analytical queries

Data analysis pipeline: TP + ETL + AP

* HTAP alternatives
* Alternative 1: dual system
  * Good performance
  * Large time delay
* Alternative 2: single layout
  * Short time delay
  * Huge performance degradation
* Alternative 3: dual layout
  * Lightweight synchronization

Idea: VEGITO

* High availability for fast in-memory OLTP
  * Replication-based
  * Synchronous log shipping
* Challenges
  * Log cleaning
    * Needs to be consistent and parallel
  * Epoch-based design
    * Gossip-style protocol
      * Guarantees epoch assignment is consistent
  * Multi-version column store (MVCS)
    * Different locality for reads & writes
      * column-wise vs. row-wise
    * Use a block-based MVCS
      * Row-split
      * Col-merge
  * Tree-based index
    * Write-optimized tree index
    * Read-optimized index
    * Tree insert = in-place insert + balance (costly)
      * Balance in batches
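The block-based MVCS idea can be sketched as a column split into fixed-size blocks, where each cell keeps a small version chain tagged with the epoch that wrote it: OLTP writes append versions, while OLAP readers scan at a snapshot epoch without blocking writers. This is an illustrative toy, not VEGITO's actual layout (block size and structure are assumptions):

```python
# Toy block-based multi-version column store: each column is split into
# fixed-size blocks; each cell holds a version chain of (epoch, value).
# Writers append; readers pick the newest version visible at their
# snapshot epoch, so reads never block writes.

BLOCK_SIZE = 4  # hypothetical block size

class MVColumn:
    def __init__(self):
        self.blocks = []  # each block: BLOCK_SIZE version chains

    def write(self, row, value, epoch):
        while row >= len(self.blocks) * BLOCK_SIZE:
            self.blocks.append([[] for _ in range(BLOCK_SIZE)])
        chain = self.blocks[row // BLOCK_SIZE][row % BLOCK_SIZE]
        chain.append((epoch, value))  # newest version last

    def read(self, row, snapshot_epoch):
        chain = self.blocks[row // BLOCK_SIZE][row % BLOCK_SIZE]
        # Newest version whose epoch is visible at the snapshot.
        for epoch, value in reversed(chain):
            if epoch <= snapshot_epoch:
                return value
        return None

col = MVColumn()
col.write(0, "a", epoch=1)
col.write(0, "b", epoch=3)
print(col.read(0, snapshot_epoch=2))  # a  (epoch-3 write not yet visible)
print(col.read(0, snapshot_epoch=3))  # b
```

This shows why epochs matter for freshness: an analytical query pinned to epoch *e* sees a consistent snapshot that is at most one epoch behind the OLTP writers.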
