<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Krakiun Blog]]></title><description><![CDATA[On this blog I write about my day-to-day activity.]]></description><link>https://blog.krakiun.com</link><image><url>https://blog.krakiun.com/img/substack.png</url><title>Krakiun Blog</title><link>https://blog.krakiun.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 13 Apr 2026 10:59:20 GMT</lastBuildDate><atom:link href="https://blog.krakiun.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Krakiun]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[krakiun@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[krakiun@substack.com]]></itunes:email><itunes:name><![CDATA[Craciun Florentin]]></itunes:name></itunes:owner><itunes:author><![CDATA[Craciun Florentin]]></itunes:author><googleplay:owner><![CDATA[krakiun@substack.com]]></googleplay:owner><googleplay:email><![CDATA[krakiun@substack.com]]></googleplay:email><googleplay:author><![CDATA[Craciun Florentin]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why I collect Nvidia datasets on LTO-8]]></title><description><![CDATA[Today is the second day of Easter.]]></description><link>https://blog.krakiun.com/p/de-ce-colectionez-dataseturi-nvidia</link><guid isPermaLink="false">https://blog.krakiun.com/p/de-ce-colectionez-dataseturi-nvidia</guid><dc:creator><![CDATA[Craciun Florentin]]></dc:creator><pubDate>Mon, 13 Apr 2026 10:25:48 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!iIAq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iIAq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iIAq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!iIAq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!iIAq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!iIAq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iIAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d897f334-199d-4939-b991-e52c81690f46_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2861621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.krakiun.com/i/194054937?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iIAq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!iIAq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!iIAq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!iIAq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd897f334-199d-4939-b991-e52c81690f46_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><br>Today is the second day of Easter. The family is at the table. My PC is writing to magnetic tape.</p><p>It's not a glitch. It's a deliberate decision.</p><div><hr></div><h2><strong>What is actually happening</strong></h2><p>I'm backing up two datasets that Nvidia published on Hugging Face:</p><p><strong>Nemotron-CC-v2</strong>: a massive dataset of processed, cleaned web content. Billions of tokens extracted from Common Crawl, filtered, deduplicated, and prepared for pretraining.</p><p><strong>Nemotron-Pretraining-Code-v2</strong>: the equivalent for code. Everything Nvidia used to train its coding models.</p><p>Both are public right now. Free to download. 
On Hugging Face.</p><p>The question is: for how long?</p><div><hr></div><h2><strong>Data vs. models: what is truly valuable</strong></h2><p>Everyone talks about models. GPT-5, Claude, Gemini. Which is better, which is faster, which costs less.</p><p>Nobody talks about the data behind them.</p><p>I have a different perspective, shaped by 20 years in digital: <strong>in any industry, whoever controls the raw material controls the industry.</strong></p><p>AI models are the finished product. Pretraining data is the raw material.</p><p>Models change every 6 months. They improve, get replaced, and become obsolete.</p><p>High-quality pretraining data stays valuable. A clean, well-processed dataset represents hundreds of millions of dollars in collection and processing infrastructure. Nvidia did not publish Nemotron-CC-v2 out of generosity. It published it because, at the time, it calculated that the dataset was more useful public than private.</p><p>That calculation can change.</p><div><hr></div><h2><strong>Why LTO-8 and not the cloud</strong></h2><p>The logical question is: why not put everything on a hard disk or in the cloud?</p><p><strong>Cloud</strong>: you pay monthly. You depend on a provider. If tomorrow Amazon, Google, or Microsoft decides that you are storing something inconvenient, the data disappears or becomes inaccessible. That is not paranoia; those are standard contract clauses.</p><p><strong>Hard disk</strong>: a lifespan of 3-5 years in good conditions. Fragile. Not designed for long-term archiving.</p><p><strong>LTO-8</strong>: magnetic tape. A lifespan of 30+ years if stored correctly. 12 TB native per cartridge. 
A far lower cost per TB over the long term. A technology proven over decades in broadcast and professional archiving.</p><p>It is the solution that TV stations, national archives, and Hollywood studios use to preserve content for the long term.</p><p>It makes sense for me to use it too, for datasets I want to have available 10 years from now.</p><div><hr></div><h2><strong>What I do with them</strong></h2><p>Honestly? Right now, nothing special.</p><p>But I have a clear conviction: techniques for working with large datasets are evolving quickly. What requires datacenter infrastructure today will be within reach of a home server in 3-5 years.</p><p>When that moment comes, I want to have the data. I don't want to go looking for it after the fact and discover that it is no longer available.</p><p>It is the same principle I have applied to every digital opportunity over the last 20 years: <strong>get in before it is obvious, not after.</strong></p><div><hr></div><h2><strong>The conclusion</strong></h2><p>I don't know exactly how I will use these datasets in the future. I know that Nvidia invested massive resources in collecting and processing them. I know that they are public now and that this could change. I know that LTO-8 magnetic tape will keep the data intact longer than any other solution accessible today.</p><p>Sometimes the best strategic decision is to collect before you know exactly why.</p><p>The question I leave you with: <strong>what are you doing with the data you can access today that might no longer exist tomorrow?</strong></p>]]></content:encoded></item></channel></rss>