We introduce MIRAI, a benchmark for evaluating LLM agents on temporal forecasting of international events with tool use and complex reasoning. Drawing on 59,161 unique events and 296,630 unique news articles, we curate a test set of 705 forecasting query-answer pairs.
Jul 1, 2024
We introduce STIC (Self-Training on Image Comprehension), which enhances the understanding and reasoning capabilities of LVLMs through self-generated data. Our experiments across seven benchmarks (ScienceQA, TextVQA, ChartQA, LLaVA-Bench, MMBench, MM-Vet, and MathVista) demonstrate a notable average accuracy gain of 4.0% from self-training.
May 30, 2024
Jan 2, 2024