# Data Management

[DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling](https://www.usenix.org/conference/osdi21/presentation/khan)

[CLP: Efficient and Scalable Search on Compressed Text Logs](https://www.usenix.org/conference/osdi21/presentation/rodrigues)

[Polyjuice: High-Performance Transactions via Learned Concurrency Control](https://www.usenix.org/conference/osdi21/presentation/wang-jiachen)

[Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing](https://www.usenix.org/conference/osdi21/presentation/shen)

### DMon

Motivation

* A single search query (\~700ms) uses 8 processor cores
  * Millions of cores in aggregate
  * High management and energy costs
* CPU performance breakdown
  * Only 32% of cycles do useful work
  * The rest: fetching, decoding, bad speculation, execution units unavailable, memory stalls (poor data locality)

Existing techniques

* Compiler optimizations
  * Improve data locality via program transformation
  * Con: rely on static heuristics, can sometimes hurt performance
* Dynamic profilers
  * Collect accurate execution information to identify and resolve poor locality
  * Con: mostly manual repair, high profiling overhead

Contribution

* Selective profiling
* Apply specific compiler optimizations based on profiling results
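The selective-profiling idea can be sketched as a two-stage loop: cheap, always-on monitoring of aggregate counters, with detailed (expensive) profiling enabled only for code regions that look locality-bound. All names and thresholds below are illustrative, not DMon's actual implementation:

```python
# Hypothetical sketch of selective profiling: lightweight always-on
# monitoring, with detailed profiling enabled only when a locality
# problem is suspected. Names and thresholds are illustrative.

STALL_THRESHOLD = 0.3  # fraction of cycles stalled on memory


def detailed_profile(func):
    # Placeholder: in a real tool this would collect fine-grained
    # locality metrics (e.g., cache-miss addresses) for this region only.
    return {"function": func, "action": "apply targeted locality optimization"}


def selective_profile(samples):
    """samples: list of (function, cycles, mem_stall_cycles) tuples."""
    suspects = []
    for func, cycles, stalls in samples:
        # Stage 1: lightweight check using aggregate counters only.
        if cycles and stalls / cycles > STALL_THRESHOLD:
            suspects.append(func)
    # Stage 2: detailed profiling restricted to the suspects, so the
    # overhead stays proportional to the problem, not the program.
    return {func: detailed_profile(func) for func in suspects}


report = selective_profile([
    ("parse", 1000, 100),    # healthy: 10% memory stalls
    ("lookup", 1000, 450),   # suspect: 45% memory stalls
])
print(sorted(report))  # only "lookup" gets detailed profiling
```

This captures why the overhead stays low: the expensive stage runs only on the small fraction of code that actually exhibits poor locality.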


![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MeaBWv6pVgHu48_qeSs%2F-MeaTnEfYbpaoWon_K01%2Fimage.png?alt=media\&token=fa3a09d9-5dea-4659-a596-ee0b4551b15c)

Evaluation

* Efficiency
* Effectiveness
* Real-world case studies

Q: Are there data locality problems not solved by the optimizations?

* Tail latency problems
  * DMon cannot identify and repair all of them

### CLP

Workload / problems:

* Logs --> search tools --> compress (e.g., gzip)
  * Problem
    * Petabytes of logs
    * Uncompressed logs consume a lot of resources
  * Current solutions
    * Build indexes: adds storage overhead
      * Total storage stays the same order of magnitude as the raw logs
    * Can only retain indexed logs for a few weeks
    * Compress (to 2.27% of the original size)
      * Unsearchable once compressed
      * Requires decompression before search

Key ideas:

* Lossless log compression: better ratio than general-purpose compressors
* Can search compressed logs
  * Without full decompression
    * Only decompress matching results
  * With good performance

Method:

* Domain-specific algorithm to extract the overlapping parts
  * De-duplicate into three components:
    * Log Type Dictionary
    * Variable Dictionary
    * Encoded Messages
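The de-duplication scheme above can be sketched as follows: each message is split into a log type (the constant text with variables replaced by placeholders) and its variable values; repeated log types and variables are stored once in dictionaries, and each message becomes a compact tuple of dictionary IDs. This is a simplified illustration, not CLP's actual encoding (its variable detection and binary format are more sophisticated):

```python
import re

# Simplified sketch of dictionary-based log encoding: split each message
# into a log type (constant text with placeholders) and variable values,
# then de-duplicate both through dictionaries.

logtype_dict, var_dict = {}, {}

def intern(d, value):
    """Return an existing ID for value, or assign the next one."""
    return d.setdefault(value, len(d))

def encode(message):
    variables = re.findall(r"\S*\d\S*", message)   # crude variable detector
    logtype = re.sub(r"\S*\d\S*", "\x11", message)  # placeholder byte
    return (intern(logtype_dict, logtype),
            tuple(intern(var_dict, v) for v in variables))

msgs = [
    "Task task_12 assigned to container container_7",
    "Task task_98 assigned to container container_3",
]
encoded = [encode(m) for m in msgs]
# Both messages share one log type; only the variable IDs differ.
print(len(logtype_dict))  # 1
```

Searching compressed logs then amounts to matching the query against the (small) dictionaries first, and only touching encoded messages whose IDs match.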

Comparison:

* Search tools: ripgrep, Elasticsearch
* Archive tools: lzma, ...

Q: How are structural changes handled, and how are the dictionaries kept from blowing up?

* Compress logs into multiple archives
  * Each with its own set of dictionaries
* Q: Does searching the dictionaries affect performance?
  * That time is negligible compared to the time to read the data
* Q: Is an accurate schema needed?
  * Yes
  * Schemas are based on the variables and tend to be quite general (generic schemas)
    * These aren't necessarily the best for search performance

### Polyjuice

Motivation:

* Concurrency control (CC): controls how concurrent executions interleave
  * More interleaving --> better performance
* No single CC algorithm is best for every workload (high- vs. low-contention)

Current solutions:

* Coarse-grained approaches combine a few known CC algorithms
* Cons
  * Cumbersome: require manual partitioning of the workload
  * Suboptimal: use a single CC algorithm within each partition

Approach:

* Reinforcement learning (RL)
  * State
    * Differentiates states that require different CC actions
    * Two parts
      * Type of transaction
      * Data accessed
  * Action
    * Should be able to encode most existing CC algorithms
    * Exerts control over the interleaving
    * Solution
      * Whether to wait, and for how long?
        * OCC: no wait
        * 2PL: wait until dependent transactions commit
        * IC3, RP, DRP: wait until ... finish execution up to some point
      * Which version of data to read?
        * Latest committed version
        * Latest published dirty version
      * Whether to make a dirty write visible?
        * Keep the write in a private buffer
        * Publish the dirty write
      * Whether to validate now, prior to commit?
        * Whether or not to validate after this access
        * Based on Silo's protocol
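The state/action design above amounts to a policy table keyed by (transaction type, access index), where each entry bundles the four per-access choices; RL then searches over this table. A minimal sketch, with hypothetical field and transaction names:

```python
from dataclasses import dataclass

# Sketch of the learned policy as a table: state = (transaction type,
# access index); action = the four fine-grained CC choices. Field names
# and example entries are hypothetical, not Polyjuice's actual encoding.

@dataclass(frozen=True)
class CCAction:
    wait_for: str           # "none" (OCC-like), "commit" (2PL-like), or a pipeline stage
    read_version: str       # "committed" or "dirty"
    publish_dirty_write: bool
    validate_now: bool      # validate after this access, before commit

policy = {
    # Low-contention access: behave like OCC.
    ("payment", 0): CCAction("none", "committed", False, False),
    # Hot record: read dirty data and expose writes early to raise interleaving.
    ("payment", 1): CCAction("stage_1", "dirty", True, True),
}

def action_for(txn_type, access_idx):
    # Fall back to a conservative OCC-like default for unseen states.
    return policy.get((txn_type, access_idx),
                      CCAction("none", "committed", False, False))

print(action_for("payment", 1).read_version)  # dirty
```

Because the table can express per-access combinations of waiting, dirty reads, dirty-write visibility, and early validation, it subsumes OCC and 2PL as special cases while allowing hybrids neither can express.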

![](https://2097630930-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MVORxAomcgtzVVUqmws%2F-MeaBWv6pVgHu48_qeSs%2F-MeacE2AHKFx0XuYYmGE%2Fimage.png?alt=media\&token=d0659dc6-7ddd-40b8-a643-7e8b50cb5fdc)

### VEGITO

Workload:

* Typical workloads
  * OLTP: short-lived transactions
  * OLAP: long-running analytical queries
* New type of workload: HTAP (hybrid transaction/analytical processing)

Requirements:

* Performance
  * Minimize performance degradation
  * Millions of transactions per second
* Freshness
  * Small delay before committed data becomes visible to analytical queries

Data analysis pipeline: TP + ETL + AP

* HTAP alternatives
* Alternative 1: dual system
  * Good performance
  * Large time delay
* Alternative 2: single layout
  * Short time delay
  * Huge performance degradation
* Alternative 3: dual layout
  * Lightweight synchronization

Idea: VEGITO

* High availability for fast in-memory OLTP
  * Replication-based
  * Synchronous log shipping
* Challenges
  * Log cleaning
    * Needs to be consistent and parallel
  * Epoch-based design
    * Gossip-style protocol
      * Guarantees epoch assignment is consistent
  * Multi-version column store (MVCS)
    * Different locality for reads & writes
      * column-wise vs. row-wise
    * Use a block-based MVCS
      * Row-split
      * Col-merge
  * Tree-based index
    * Write-optimized tree index
    * Read-optimized index
    * Tree insert = in-place insert + balance (costly)
      * Balance in batches
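The block-based MVCS idea can be sketched as a column split into fixed-size blocks, where each cell keeps a small version chain tagged with the epoch that wrote it: OLTP writes append versions, while OLAP readers scan at a snapshot epoch without blocking writers. This is an illustrative toy, not VEGITO's actual layout (block size and structure are assumptions):

```python
# Toy block-based multi-version column store: each column is split into
# fixed-size blocks; each cell holds a version chain of (epoch, value).
# Writers append; readers pick the newest version visible at their
# snapshot epoch, so reads never block writes.

BLOCK_SIZE = 4  # hypothetical block size

class MVColumn:
    def __init__(self):
        self.blocks = []  # each block: BLOCK_SIZE version chains

    def write(self, row, value, epoch):
        while row >= len(self.blocks) * BLOCK_SIZE:
            self.blocks.append([[] for _ in range(BLOCK_SIZE)])
        chain = self.blocks[row // BLOCK_SIZE][row % BLOCK_SIZE]
        chain.append((epoch, value))  # newest version last

    def read(self, row, snapshot_epoch):
        chain = self.blocks[row // BLOCK_SIZE][row % BLOCK_SIZE]
        # Newest version whose epoch is visible at the snapshot.
        for epoch, value in reversed(chain):
            if epoch <= snapshot_epoch:
                return value
        return None

col = MVColumn()
col.write(0, "a", epoch=1)
col.write(0, "b", epoch=3)
print(col.read(0, snapshot_epoch=2))  # a  (epoch-3 write not yet visible)
print(col.read(0, snapshot_epoch=3))  # b
```

This shows why epochs matter for freshness: an analytical query pinned to epoch *e* sees a consistent snapshot that is at most one epoch behind the OLTP writers.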
