We introduce MIRAI, a benchmark designed to evaluate LLM agents on temporal forecasting of international events, a task that requires tool use and complex reasoning. From a corpus of 59,161 unique events and 296,630 unique news articles, we curate a test set of 705 forecasting query-answer pairs.
Jul 1, 2024
We introduce STIC (Self-Training on Image Comprehension), an approach that enhances the understanding and reasoning capabilities of large vision language models (LVLMs) through self-generated data. Our experiments across seven benchmarks (ScienceQA, TextVQA, ChartQA, LLaVA-Bench, MMBench, MM-Vet, and MathVista) demonstrate a notable average accuracy gain of 4.0% from self-training.
May 30, 2024
Feb 13, 2024
Jan 1, 2024
Nov 7, 2023
Oct 1, 2023
Jan 1, 2022