LEMMA-RCA

Multi-modal Datasets for Root Cause Analysis


Root cause analysis (RCA) is the task of identifying the underlying causes of system faults or failures by analyzing system monitoring data. LEMMA-RCA is a collection of multi-modal datasets containing various real system faults, intended to facilitate future research in RCA. It is multi-domain, covering real-world applications such as microservice systems and water treatment/distribution systems. The datasets are released under the CC BY-NC 4.0 license and hosted on Hugging Face; the code is available on GitHub.


Multiple Domains

LEMMA-RCA covers two domains: IT Operations (Product Review and Cloud Computing) and OT Operations (Water Treatment/Distribution). Each domain contains two datasets.

Real System Faults

Each dataset contains various system faults simulated from real-world scenarios. For details, please see our GitHub page.

Unified Evaluation

LEMMA-RCA datasets are evaluated with eight causal learning baselines in four settings: online or offline, each with single-modal or multi-modal data.


Citation

If you use LEMMA-RCA in your work, please cite our paper:


      @misc{zheng2024lemmarca,
        title={LEMMA-RCA: A Large Multi-modal Multi-domain Dataset for Root Cause Analysis},
        author={Lecheng Zheng and Zhengzhang Chen and Dongjie Wang and Chengyuan Deng and Reon Matsuoka and Haifeng Chen},
        year={2024},
        eprint={2406.05375},
        archivePrefix={arXiv},
        primaryClass={cs.AI}
      }
		    			

ⓒ2024 LEMMA-RCA