

Research & Project
Data-Driven Decisions: Smarter Financial Planning via Data Visualization and AI-Driven Insights
This project equips DMRF with a robust data architecture to enhance transparency and strategic planning. By analyzing, visualizing, and forecasting financial data (2018–2024), we deliver actionable insights through advanced analytics and predictive modeling.

Integrating MedCLIP and Cross-Modal Fusion for Automatic Radiology Report Generation
We propose a novel cross-modal framework that uses MedCLIP as both a vision extractor and a retrieval mechanism to improve medical report generation. An attention-based extraction module extracts features from the retrieved reports and from the image, and a fusion module integrates them, which improves the coherence and clinical relevance of the generated reports.

Confidence Bounded Replica Currency Estimation
Replicas of the same data often show varying consistency levels during read and write operations due to network and system limitations. Estimating the currency (staleness) of data from responding replicas without accessing others is crucial for applications needing timely updates. Depending on the confidence in this estimation, queries can decide to use the retrieved replicas or wait for additional responses. Our approach provides theoretical bounds on the confidence of such estimations, ensuring accuracy with minimal overhead. We implement a confidence-bounded replica currency estimation system in Cassandra, introducing a novel DYNAMIC read consistency level.
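
The decision between using the responses received so far and waiting for more can be illustrated with a minimal sketch. It is not the bound derived in this project: it assumes, purely for illustration, that each replica independently holds the latest version with a known probability `p_fresh`, and the names `confidence_current` and `decide` are hypothetical.

```python
def confidence_current(p_fresh: float, responded: int) -> float:
    """Probability that at least one responding replica is current.

    Assumption (not from the project): replicas are independent and each
    is current with the same probability p_fresh.
    """
    if not 0.0 <= p_fresh <= 1.0:
        raise ValueError("p_fresh must lie in [0, 1]")
    if responded < 0:
        raise ValueError("responded must be non-negative")
    return 1.0 - (1.0 - p_fresh) ** responded


def decide(p_fresh: float, responded: int, total: int, threshold: float) -> str:
    """Return "return" if the confidence meets the threshold or no replica
    is left to wait for; otherwise return "wait"."""
    if responded >= total:
        return "return"
    if confidence_current(p_fresh, responded) >= threshold:
        return "return"
    return "wait"
```

Under these assumptions, a query that has heard from one of three replicas with `p_fresh = 0.5` and a 0.9 threshold would wait, while one that has heard from four of five would return.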

Contextual Data Cleaning and Ontological Dependencies
Functional Dependencies (FDs) rely on syntactic equality and often mislabel semantically equivalent values as errors in data cleaning. To address this, we introduce Ontology Functional Dependencies (OFDs), which use ontologies to capture semantic relationships such as synonyms. We establish the foundations of OFDs, including axioms, a linear-time inference procedure, and an algorithm for discovering OFDs, including those that hold with exceptions. We develop FastOFD, a contextual data cleaning framework that computes minimal repairs to data and ontologies under OFDs.
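
The difference between an FD and a synonym-based OFD can be sketched on a toy table. This is a simplified reading, not the FastOFD implementation: the ontology, the column names, and the helper functions are all hypothetical, and the check treats a dependency as satisfied when every group of tuples that agree on the left-hand side shares at least one ontology sense for its right-hand-side values.

```python
from collections import defaultdict

# Hypothetical toy ontology: each sense maps to the names that are synonyms under it.
ONTOLOGY = {
    "country:US": {"USA", "United States", "America"},
    "country:IN": {"India", "Bharat"},
}


def senses(value):
    """Return the set of ontology senses under which `value` appears."""
    return {sense for sense, names in ONTOLOGY.items() if value in names}


def group_by(rows, lhs):
    """Group rows by their values on the left-hand-side attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[attr] for attr in lhs)].append(row)
    return groups


def fd_holds(rows, lhs, rhs):
    """Classical FD: rows equal on lhs must be syntactically equal on rhs."""
    return all(len({r[rhs] for r in g}) == 1 for g in group_by(rows, lhs).values())


def synonym_ofd_holds(rows, lhs, rhs):
    """Synonym OFD (simplified): rows equal on lhs must share a common sense on rhs."""
    for group in group_by(rows, lhs).values():
        values = {r[rhs] for r in group}
        if len(values) == 1:
            continue  # syntactic equality already satisfies the dependency
        if not set.intersection(*(senses(v) for v in values)):
            return False
    return True


ROWS = [
    {"code": "US", "country": "USA"},
    {"code": "US", "country": "United States"},
    {"code": "IN", "country": "India"},
    {"code": "IN", "country": "Bharat"},
]
```

On `ROWS`, `fd_holds(ROWS, ["code"], "country")` is false because the spellings differ, whereas `synonym_ofd_holds(ROWS, ["code"], "country")` is true because each group shares a sense in the toy ontology.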

CurrentClean: Spatio-Temporal Cleaning of Stale Data
Data currency is essential for up-to-date and accurate data analysis. Identifying and repairing stale data requires more than timestamps, because individual entities each have their own update patterns in both space and time. We develop CurrentClean, a probabilistic system for identifying and cleaning stale values. We propose a spatio-temporal probabilistic model with inference rules that captures database update patterns, identifies stale values, and recommends repairs based on past update trends.
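
The idea of using an entity's past update pattern to flag a value as likely stale can be illustrated with a small sketch. It is not CurrentClean's model: it ignores the spatial dimension and the inference rules, and simply assumes that the empirical distribution of past gaps between updates predicts the next one. The function name `staleness_probability` is hypothetical.

```python
def staleness_probability(update_times, now):
    """Estimate how likely a value is stale from its own update history.

    Returns the fraction of historical gaps between consecutive updates
    that are no longer than the time elapsed since the last update.
    Assumption (illustrative only): future gaps follow the same empirical
    distribution as past gaps.
    """
    if len(update_times) < 2:
        raise ValueError("need at least two past updates to form a gap")
    times = sorted(update_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    elapsed = now - times[-1]
    return sum(1 for gap in gaps if gap <= elapsed) / len(gaps)
```

For an entity updated at times 0, 10, 20, and 40, the past gaps are 10, 10, and 20. At time 55, the 15 units elapsed since the last update exceed two of the three gaps, so the estimate is 2/3.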

Collaborators







