Large Language Models

MIRAI: Evaluating LLM Agents for Event Forecasting

We introduce MIRAI, a benchmark for evaluating LLM agents on temporal forecasting of international events with tool use and complex reasoning. Drawing on 59,161 unique events and 296,630 unique news articles, we curate a test set of 705 forecasting query-answer pairs.

Jul 1, 2024

Enhancing Large Vision Language Models with Self-Training on Image Comprehension

We introduce STIC (Self-Training on Image Comprehension), an approach that enhances the understanding and reasoning capabilities of large vision language models (LVLMs) through self-generated data. Our experiments across seven benchmarks (ScienceQA, TextVQA, ChartQA, LLaVA-Bench, MMBench, MM-Vet, and MathVista) show an average accuracy gain of 4.0% from self-training.

May 30, 2024

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Jan 2, 2024
