mlops | startups

Mypy and Doltpy

This is a cross-post from DoltHub’s Blog. Dolt: Dolt is a SQL database with Git-style versioning. The goal of Doltpy, in concert with Dolt, is to solve reproducibility and versioning problems for data and machine learning engineers using Python. Mypy: Mypy was created by Jukka Lehtosalo and championed by Guido van Rossum, the creator of the Python language, as a way to apply the type-hinting PEP standards to Python source code....

Lambda MR

What problems do Spark and MapReduce solve? Was Spark the next generation of MapReduce? Do they solve the same problem? Is Spark quick and failure-prone, while MapReduce is slow and reliable? To play devil’s advocate, I think MapReduce was the right abstraction for the wrong problem....

MLOps Segmentation

I was doing some market analysis on data science tools, and broke the categories down along a “hand-holding” metric. I think this falls back to the same “correctness principle” that I mentioned in the workflow article: there is a balance between lightweight tools that flexibly bend to the user’s needs and heavyweight tools that do “everything” in a narrow and opinionated way....

Unrealistic DS - Spark Tensorflow

People aren’t afraid of autocrats. People are afraid of being different from their neighbors. -Jacob Snell (Ozark) These are a few ideas that might not be possible to develop right now (or might not even be good ideas in practice), but originate from pain points in my work....

Unrealistic DS - Minimalist Workflows

Right-thing philosophy is based on letting the experts do their expert thing all the way to the end before users get their hands on it. [But] in some cases, the software system that succeeds starts with a kernel of good technology, but it is not committed to fully realizing that technology....

Unrealistic DS - Pandas/Spark Subset

Design doesn’t have to be new, but it has to be good. Research doesn’t have to be good, but it has to be new… The best design surpasses its predecessors by using new ideas, and the best research solves problems that are not only new, but worth solving....