Lorenzo Pastore

Data Info Knowledge

I build probabilistic ML systems and much more

Pietrasanta, IT

01 About

Probability over certainty, end-to-end over hand-off.

About

I'm a Data Scientist and ML Engineer with a statistics background. BSc in Statistical Sciences from Bologna (110 cum laude, 2018), MSc in Data Science from Milano-Bicocca (108/110, 2022).
That order matters: I think in distributions first, in models second.

My main line of work is probabilistic ML and LLM agent systems, but I don't stop at the model. I've shipped products end-to-end: from ingestion and warehouse modeling, through training and evaluation, up to the UI a user actually clicks. The reason is selfish: a model that never reaches a user is a paper, not a product, and I prefer products.

Right now I'm a Data Scientist at Menumal, designing and implementing the company's data platform: heterogeneous ingestion with Airbyte, transforms in dbt, orchestration on Temporal.io, probabilistic record linkage with Splink, Postgres underneath, Metabase on top. Production decisions are documented, reconciliation is audited, and security is audited.
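
Splink is built on the Fellegi-Sunter model of probabilistic record linkage. The sketch below is a minimal hand-rolled illustration of that idea, not Splink's API and not the Menumal pipeline: the field names and m/u probabilities are made up, and each field is reduced to a binary agree/disagree (Splink supports several comparison levels per field). Each compared field contributes a log2 Bayes factor, and their sum plus the prior log-odds is the match weight.

```python
import math

# Hypothetical m/u probabilities per field (illustrative values, not estimated).
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
PARAMS = {
    "name": (0.95, 0.01),
    "postcode": (0.90, 0.05),
    "phone": (0.80, 0.001),
}


def match_weight(agreements, prior_match_prob):
    """Fellegi-Sunter match weight in bits: prior log2-odds plus one
    log2 Bayes factor per compared field (binary agree/disagree only)."""
    w = math.log2(prior_match_prob / (1 - prior_match_prob))
    for field, agrees in agreements.items():
        m, u = PARAMS[field]
        bayes_factor = m / u if agrees else (1 - m) / (1 - u)
        w += math.log2(bayes_factor)
    return w


def match_probability(weight):
    """Convert a match weight (log2 odds) back into a probability."""
    odds = 2.0 ** weight
    return odds / (1 + odds)


if __name__ == "__main__":
    pair = {"name": True, "postcode": True, "phone": False}
    w = match_weight(pair, prior_match_prob=0.001)
    print(f"weight={w:.2f} bits, p(match)={match_probability(w):.4f}")
```

The additive form is the point of the design: every field's evidence is an independent term (under the model's conditional-independence assumption), so a pair's score can be inspected field by field.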

Before Menumal, I spent close to three years at Tomato AI as a Full-Stack Developer and AI Engineer, rewriting a hospitality SaaS from legacy PHP to Next.js + Django + PostgreSQL with a team of three, migrating active clients to the new platform, and building AI modules for revenue management and supplier price monitoring on the OpenAI API. Deployment was hybrid AWS + Vercel. Independent consulting since 2019 keeps my market research, segmentation, and lightweight modeling muscles in shape.

On the side I'm building vague, a Python library that represents an LLM agent's memory as a Gaussian Mixture Model in embedding space instead of a list of retrievable chunks. It has two primitives, benchmarked on LongBench (3 tasks × 2 models). GaussianBelief matches naive RAG on F1; its value is structural: belief states compose, update incrementally, and merge between agents in closed form as mixture parameters. SummaryBelief (experimental) compresses the injected context 15–40×: on small/quantized models (Qwen3-8B-4bit) it recovers +30–65% F1 over the best baseline; on a frontier model (Haiku 4.5) it costs 15%. A real engineering trade-off, surfaced empirically. Integrations with the Anthropic SDK and LangGraph. Pre-release: the numbers are real, but the API isn't frozen yet.
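
Why a mixture merges "in closed form": if each agent's belief is a normalized mixture, pooling two agents is just the union of their components with rescaled weights, with no refitting. The sketch below illustrates that property only; it is not vague's API. The names `Component`, `merge_beliefs`, and `log_density` are hypothetical, and diagonal covariances and weighting each agent by an "evidence mass" are assumptions made for the example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Component:
    weight: float     # mixture weight; weights within one belief sum to 1
    mean: list[float]  # centre of the component in embedding space
    var: list[float]   # diagonal covariance: one variance per dimension


def merge_beliefs(a, b, mass_a=1.0, mass_b=1.0):
    """Closed-form merge: union of both agents' components, with weights
    rescaled by each agent's (assumed) evidence mass so they still sum to 1."""
    total = mass_a + mass_b
    merged = [Component(c.weight * mass_a / total, c.mean, c.var) for c in a]
    merged += [Component(c.weight * mass_b / total, c.mean, c.var) for c in b]
    return merged


def log_density(belief, x):
    """log p(x) under a diagonal-covariance Gaussian mixture (log-sum-exp)."""
    logs = []
    for c in belief:
        lp = math.log(c.weight)
        for xi, mu, v in zip(x, c.mean, c.var):
            lp += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
        logs.append(lp)
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))


if __name__ == "__main__":
    agent_a = [Component(1.0, [0.0, 0.0], [1.0, 1.0])]
    agent_b = [Component(1.0, [4.0, 4.0], [0.5, 0.5])]
    shared = merge_beliefs(agent_a, agent_b, mass_a=1.0, mass_b=3.0)
    print([round(c.weight, 2) for c in shared], log_density(shared, [0.0, 0.0]))
```

Concatenation is exact but grows the component count; a practical system would likely also prune or fuse near-duplicate components, which this sketch does not attempt.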

There's a consistent line through the academic work too: my MSc thesis was on a framework for unbiased word embeddings, and I have a separate project on adversarial fair classification. Responsible AI isn't a sticker I add later; it's how I was trained to think.

  • MSc Data Science
  • BSc Statistics
  • 4Y+ production ML
  • Open source
02 Stack

Honest fingerprint of what I use in production.

ML / DL

6

Probabilistic and deep models.

  • Python
  • PyTorch
  • scikit-learn
  • Gaussian Mixture Models
  • Probabilistic ML
  • Embeddings

LLM / Agents

6

Agentic systems, retrieval, evaluation.

  • Anthropic SDK
  • OpenAI API
  • LangGraph
  • RAG
  • LongBench
  • Needle-in-haystack

Data Engineering

7

Ingestion, modeling, orchestration.

  • Airbyte
  • dbt
  • Temporal.io
  • Splink
  • Postgres
  • SQL
  • pandas

Web / Infra

7

Product surface, APIs, and infrastructure.

  • Next.js
  • Django
  • Docker
  • AWS
  • Vercel
  • Railway
  • Git

Practices: production deployment · data quality & reconciliation · security audit documentation

03 Contact

Fastest way to reach me.