LEMMA-RCA

Multi-modal Datasets for Root Cause Analysis

Download


Root cause analysis (RCA) is a task of identifying the underlying causes of system faults/failures by analyzing the system monitoring data. LEMMA-RCA is a collection of multi-modal datasets with various real system faults to facilitate future research in RCA. It is multi-domain, consisting of real-world applications such as microservice and water treatment/distribution systems. The datasets are released under the CC BY-NC 4.0 license and hosted on Huggingface, the codes are available on Github.

Placeholder for teaser


Multiple Domains

LEMMA-RCA covers two domains: IT Operations (Product Review and Cloud Computing) and OT Operations (Water Treatment/Distribution). Each domain contains two datasets.

Real System Faults

Each dataset contains various system faults simulated from real-world scenarios. For details, please check our Github page.

Unified Evaluation

LEMMA-RCA datasets are evaluated with eight causal learning baselines in four settings: online/offline with single/multiple modality data.



ⓒ2024 LEMMA-RCA