This page intentionally left blank.

dummy slide

From idea to product:
creating Spare Cores

Gergely Daróczi

Jan 10, 2025

Slides: sparecores.com/talks

Press Space or click the green arrow icons to navigate the slides ->

>>> from sparecores import why

Data Science / Machine Learning batch jobs:

run SQL
run R or Python script

train a simple model, reporting, API integrations etc.
train hierarchical models/GBMs/neural nets etc.

>>> from sparecores import why

Data Science / Machine Learning batch jobs:

run SQL
run R or Python script

train a simple model, reporting, API integrations etc.
train hierarchical models/GBMs/neural nets etc.

Scaling (DS) infrastructure.

>>> from sparecores import why

AWS ECS

AWS Batch

Kubernetes

>>> from sparecores import why

Source: xkcd

>>> from sparecores import why

Other use-cases:

stats/ML/AI model training,
ETL pipelines,
traditional CI/CD workflows for compiling and testing software,
building Docker images,
rendering images and videos,
etc.

>>> from sparecores import why

>>> from sparecores import support

>>> from sparecores import intro

Open-source tools, database schemas and documentation to inspect and inventory cloud vendors and their compute resource offerings.

Managed infrastructure, databases, APIs, SDKs, and web applications to make these data sources publicly accessible.

Helpers to start and manage instances in your own environment.

SaaS to run containers in a managed environment without direct vendor engagement.

so Spare Cores is an open-source ecosystem, including software, database schemas, guides,
and actual databases if you don’t want to run the ETL tooling yourself .. also providing APIs, SDKs etc to make it easier to query data
unified CLI to start machines
and working on an an optional SaaS offering built on the top of the open-source tooling for folks who would rather avoid registering with all cloud providers etc: give us a Docker image, a command to run, and you credit card .. all set, we will run it wherever it’s cheapest.

We help DevOps, DS, ML, AI, ETL, AV, and other engineering teams to find optimal instances for their batch jobs (e.g. “8 CPU cores, 64 GB of RAM, and a TPU needed in an EU datacenter to train ML models for 6 hours”) by providing:

Open-source tools, database schemas and documentation to monitor cloud and flexible VPS/dedicated server vendors and their compute resource offerings in an innovative and truly comparative way, including vendor details (e.g. location, certificates, green power), compute capabilities (e.g. CPU, memory, GPU/TPU), pricing (especially of spot instances), and performance (by running task-specific benchmarks).
Managed infrastructure, databases, APIs, SDKs, and web applications to make these continuously and transparently tracked data sources publicly available and comparable in a validated, unbiased, structured, and searchable manner.
Helpers to easily start and manage instances at all the supported vendors with a standardized API.
SaaS

BUT let’s focus on the open-source and open-data components ..

>>> from sparecores import intro

Source: sparecores.com

>>> from sparecores import intro

>>> from rich import print as pp
>>> from sc_crawler.tables import Server
>>> from sqlmodel import create_engine, Session, select
>>> engine = create_engine("sqlite:///sc-data-all.db")
>>> session = Session(engine)
>>> server = session.exec(select(Server).where(Server.server_id == 'g4dn.xlarge')).one()
>>> pp(server)
Server(
    server_id='g4dn.xlarge',
    vendor_id='aws',
    display_name='g4dn.xlarge',
    api_reference='g4dn.xlarge',
    name='g4dn.xlarge',
    family='g4dn',
    description='Graphics intensive [Instance store volumes] [Network and EBS optimized] Gen4 xlarge',

    status=<Status.ACTIVE: 'active'>,
    observed_at=datetime.datetime(2024, 6, 6, 10, 18, 4, 127254),

    hypervisor='nitro',
    vcpus=4,
    cpu_cores=2,
    cpu_allocation=<CpuAllocation.DEDICATED: 'Dedicated'>,
    cpu_manufacturer='Intel',
    cpu_family='Xeon',
    cpu_model='8259CL',
    cpu_architecture=<CpuArchitecture.X86_64: 'x86_64'>,
    cpu_speed=3.5,
    cpu_l1_cache=None,
    cpu_l2_cache=None,
    cpu_l3_cache=None,
    cpu_flags=[],

    memory_amount=16384,
    memory_generation=<DdrGeneration.DDR4: 'DDR4'>,
    memory_speed=3200,
    memory_ecc=None,

    gpu_count=1,
    gpu_memory_min=16384,
    gpu_memory_total=16384,
    gpu_manufacturer='Nvidia',
    gpu_family='Turing',
    gpu_model='Tesla T4',
    gpus=[
        {
            'manufacturer': 'Nvidia',
            'family': 'Turing',
            'model': 'Tesla T4',
            'memory': 15360,
            'firmware_version': '535.171.04',
            'bios_version': '90.04.96.00.A0',
            'graphics_clock': 1590,
            'sm_clock': 1590,
            'mem_clock': 5001,
            'video_clock': 1470
        }
    ],

    storage_size=125,
    storage_type=<StorageType.NVME_SSD: 'nvme ssd'>,
    storages=[{'size': 125, 'storage_type': 'nvme ssd'}],

    network_speed=5.0,
    inbound_traffic=0.0,
    outbound_traffic=0.0,
    ipv4=0,
)

>>> sparecores.dir()

>>> input(“Biggest challenge?”)

>>> input(“Why not use ChatGPT?!”)

>>> input(“Burned 150k on this?!”)

>>> from sc_crawler import pricing

No way to find SKUs by filtering in the API call. Get all, search locally.

f1-micro is one out of 2 instances with simple pricing.
- For other instances, lookup SKUs for CPU + RAM and do the math.

Match instance family with SKU via search in description, e.g. C2D.

Except for c2, which is called “Compute optimized”.

And m2 is actually priced at a premium on the top of m1.

The n1 resource group is not CPU/RAM, but N1Standard, extract if it’s CPU or RAM price from description.

>>> import sc_data

Source: dbhub.io/sparecores

>>> from sparecores import team

@bra-fsn

@palabola

@daroczig

>>> from sparecores import team

@bra-fsn

Infrastructure and Python veteran.

@palabola

Guardian of the front-end and Node.js tools.

@daroczig

Hack of all trades, master of NaN.

Slides: sparecores.com/talks

dummy slide

From idea to product:creating Spare Cores

Gergely Daróczi

Jan 10, 2025

Slides: sparecores.com/talks

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import why

>>> from sparecores import support

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> from sparecores import intro

>>> sparecores.__dir__()

>>> input(“Biggest challenge?”)

>>> input(“Biggest challenge?”)

>>> input(“Why not use ChatGPT?!”)

>>> input(“Why not use ChatGPT?!”)

>>> input(“Burned 150k on this?!”)

>>> from sc_crawler import pricing

>>> import sc_data

>>> from sparecores import team

>>> from sparecores import team

From idea to product:
creating Spare Cores

>>> sparecores.dir()