DEV Community

Takara Taniguchi

404 bio not found

[memo]Generating Distractors for Reading Comprehension Questions from Real Examinations

[memo]WHEN AN LLM IS APPREHENSIVE ABOUT ITS ANSWERS - AND WHEN ITS UNCERTAINTY IS JUSTIFIED

[memo]The Internal State of an LLM Knows When It’s Lying

[memo]Droid: A large-scale in-the-wild robot manipulation dataset

[memo]A Vision-Language-Action Flow Model for General Robot Control

[memo]RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

[memo]SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

[memo]LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning

[memo]Visual Instruction Tuning

[memo]Training-free Regional Prompting for Diffusion Transformers

memo clearsight

[memo]OpenVLA: An Open-Source Vision-Language-Action Model

[memo]MMBench: Is Your Multi-modal Model an All-around Player?

[memo]Enhancing Distractor Generation Retrieval Augmented Pretraining and Knowledge Graph Integration

[memo]FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space

[memo]Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning

[memo]Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction

[memo]Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

[memo]Unified Hallucination Detection for Multimodal Large Language Models

[memo]OminiControl: Minimal and Universal Control for Diffusion Transformer

[memo]Scalable Diffusion Models with Transformers

[memo]A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge

[memo] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

[memo]VIDHAL: Benchmarking Temporal Hallucinations in Vision LLMs

[memo]AMBER: An Adversarial Multimodal Benchmark for Robustness Evaluation

[memo]mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality

[memo] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

[memo]Flamingo: a Visual Language Model for Few-Shot Learning

[memo]AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

[memo]VideoVista: A Versatile Benchmark for Video Understanding and Reasoning

[memo]Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

CoCa: Contrastive Captioners are Image-Text Foundation Models

[memo]AN IMAGE IS WORTH 16X16 WORDS: TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE

[memo]Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

About Python Docker images

[For me] AI-toolkit bug report

[memo]VITED: Video Temporal Evidence Distillation

Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Image Difference Captioning with Pre-training and Contrastive Learning

VideoPrism: A Foundational Visual Encoder for Video Understanding

Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

Learning Transferable Visual Models From Natural Language Supervision

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models

TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Video Instruction Tuning With Synthetic Data

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Transparent Image Layer Diffusion using Latent Transparency

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations (1)

Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Denoising Diffusion Probabilistic Models

ANYTEXT: MULTILINGUAL VISUAL TEXT GENERATION AND EDITING

PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation

CARAT: Contrastive Feature Reconstruction and Aggregation for Multi-Modal Multi-Label Emotion Recognition

Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

Cross-Covariate Gait Recognition: A Benchmark

DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection

Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature and Bayesian Optimization
