[{"data":1,"prerenderedAt":252},["ShallowReactive",2],{"studies-en-animegen":3},{"id":4,"title":5,"body":6,"category":230,"description":81,"extension":231,"featured":232,"links":233,"meta":234,"navigation":232,"order":235,"path":236,"period":237,"role":238,"seo":239,"slug":240,"stack":241,"stem":249,"subtitle":250,"summary":233,"__hash__":251},"projects_en\u002Fprojects\u002Fen\u002Fanimegen.md","AnimeGen",{"type":7,"value":8,"toc":221},"minimark",[9,14,31,34,38,45,67,71,74,85,89,95,105,115,133,147,156,160,203,207,214],[10,11,13],"h2",{"id":12},"the-context","The context",[15,16,17,18,22,23,26,27,30],"p",{},"The ",[19,20,21],"strong",{},"Computer Vision"," exam posed a concrete challenge: given two consecutive manga panels, generate a short anime-style transition video that connects them. The starting point is ",[19,24,25],{},"ToonCrafter",", a recent diffusion-based image-to-video model designed for cartoon interpolation, applied to the ",[19,28,29],{},"Manga109"," dataset.",[15,32,33],{},"The interesting problem isn't \"make the model run\" — it's that ToonCrafter is built for Western colour animation, and manga are a different domain. Applied as-is, it produces mediocre results for specific reasons.",[10,35,37],{"id":36},"three-domain-specific-weaknesses","Three domain-specific weaknesses",[15,39,40,41,44],{},"Before touching the model, I isolated ",[19,42,43],{},"why"," the baseline failed on manga:",[46,47,48,55,61],"ol",{},[49,50,51,54],"li",{},[19,52,53],{},"Text balloons"," — comics contain speech bubbles that don't exist in anime. The generator reads them as objects to animate and gets confused.",[49,56,57,60],{},[19,58,59],{},"Screentone noise"," — scanned line-art carries regular textures the model mistakes for semantic structure.",[49,62,63,66],{},[19,64,65],{},"Low frame rate"," — generated clips have few frames, with temporal incoherence that's perceptible to the eye.",[10,68,70],{"id":69},"the-pipeline","The pipeline",[15,72,73],{},"A pre\u002Fpost-processing chain that addresses all three weaknesses:",[75,76,82],"pre",{"className":77,"code":79,"language":80,"meta":81},[78],"language-text","Manga109\n  → balloon removal      (LaMa inpainting, guided by Manga109 ground-truth bounding boxes)\n  → LAB preprocessing    (bilateral → CLAHE → unsharp: denoise + edge sharpening)\n  → ToonCrafter          (16-frame generation)\n  → RIFE 4×              (temporal frame-rate upsampling)\n","text","",[83,84,79],"code",{"__ignoreMap":81},[10,86,88],{"id":87},"key-technical-decisions","Key technical decisions",[15,90,91,94],{},[19,92,93],{},"1. Balloon removal with ground-truth, not heuristics."," LaMa inpainting is guided by the speech-bubble bounding boxes already annotated in Manga109 — no custom detector to train, just using the dataset for what it offers.",[15,96,97,100,101,104],{},[19,98,99],{},"2. Preprocessing in LAB colour space."," Denoise and sharpening operate on luminance separated from colour: the ",[83,102,103],{},"bilateral → CLAHE → unsharp"," chain cleans the screentone without destroying the line-work contours.",[15,106,107,110,111,114],{},[19,108,109],{},"3. A rethought style metric."," Prior work used a 3-channel Gram-matrix style metric — uninformative on grayscale manga. I replaced it with a ",[19,112,113],{},"VGG19","-based formulation, which makes more sense for monochrome line-art.",[15,116,117,120,121,124,125,128,129,132],{},[19,118,119],{},"4. Manual model-sharding to generate at 16 frames."," The canonical 16-frame configuration didn't fit in the memory of a ",[19,122,123],{},"Kaggle T4 × 2"," environment. I implemented manual model-sharding: OpenCLIP and the VAE on ",[83,126,127],{},"cuda:0",", the diffusion U-Net on ",[83,130,131],{},"cuda:1",". A hardware constraint solved with explicit allocation instead of cutting quality.",[15,134,135,138,139,142,143,146],{},[19,136,137],{},"5. Evaluation on two axes, quantitative and perceptual."," A ",[19,140,141],{},"2×2×2 ablation"," over N=40 pairs from 5 titles measures each stage's contribution; a ",[19,144,145],{},"2-AFC pairwise user study"," (forced choice between two clips, with a \"no preference\" option) validates perceptual preference with a two-sided binomial test on the decisive votes.",[15,148,149,138,152,155],{},[19,150,151],{},"6. Interactive demo.",[19,153,154],{},"Streamlit"," web app lets you pick any of the 40 pairs and compare its video under all 8 ablation configurations, with the per-clip metrics alongside.",[10,157,159],{"id":158},"by-the-numbers","By the numbers",[161,162,163,173,185,190,196],"ul",{},[49,164,165,166,169,170],{},"Full pipeline vs vanilla ToonCrafter baseline: ",[19,167,168],{},"−61.3% LPIPS",", ",[19,171,172],{},"−34.5% Warping Error",[49,174,175,176,179,180,184],{},"The dominant contribution comes from ",[19,177,178],{},"temporal frame interpolation"," — preprocessing and balloon removal interact ",[181,182,183],"em",{},"non-monotonically"," with the downstream metrics",[49,186,187,189],{},[19,188,141],{}," · N=40 pairs · 5 titles",[49,191,192,195],{},[19,193,194],{},"2-AFC user study",": 10 respondents, 40 decisive votes per comparison axis",[49,197,198,199,202],{},"~",[19,200,201],{},"1,900 lines"," of Python (core modules: preprocessing, balloon removal, metrics, panel extraction + pipeline scripts)",[10,204,206],{"id":205},"what-i-took-away","What I took away",[15,208,209,210,213],{},"Adapting a model outside its domain teaches two things. First: ",[19,211,212],{},"the biggest gain came from the least glamorous part"," — frame interpolation, not fine-tuning the diffusion model. Second: a wrong metric lies with confidence. The 3-channel style metric gave precise, useless numbers on grayscale images; until I replaced it, the ablation was telling a false story.",[15,215,216,217,220],{},"Project developed at the ",[19,218,219],{},"University of Bari Aldo Moro",".",{"title":81,"searchDepth":222,"depth":222,"links":223},2,[224,225,226,227,228,229],{"id":12,"depth":222,"text":13},{"id":36,"depth":222,"text":37},{"id":69,"depth":222,"text":70},{"id":87,"depth":222,"text":88},{"id":158,"depth":222,"text":159},{"id":205,"depth":222,"text":206},"studies","md",true,null,{},11,"\u002Fprojects\u002Fen\u002Fanimegen","2026","Computer Vision · MSc curriculum AI",{"title":5,"description":81},"animegen",[242,243,244,245,246,247,248,154],"Python","PyTorch","ToonCrafter (diffusion)","LaMa inpainting","RIFE","OpenCV","VGG19 · LPIPS","projects\u002Fen\u002Fanimegen","A diffusion-interpolation pipeline that generates short anime-style transitions between two consecutive manga panels — ToonCrafter adapted to the domain with balloon removal, preprocessing and frame interpolation.","NRoYhAU0BM6G5gIbu5KQxVV8zrkm2a3RHDuES7QSLSc",1781346783483]