An App Builder for the Data Science Team – The New Stack


Companies are collecting large amounts of data, and new types of data, such as geographic data and sentiment analysis, are being used along with machine learning not only to trace the past but to predict the future.

Yet companies haven’t been able to take full advantage of the data they have, because building the kinds of apps needed to share it internally has taken too much time and too many people.

Enter Streamlit, an open source framework for data scientists to quickly build web applications to access and explore machine learning models, advanced algorithms, and complex data types.

“There is this whole new category of business intelligence problems that weren’t there five years ago, and the traditional way, using Tableau or Microsoft Power BI, of just saying, ‘Let’s set up a dashboard, and let’s set up some charts and graph this data,’ no longer works in this world,” said Adrien Treuille, co-founder and CEO of Streamlit.

He and his co-founders Amanda Kelly and Thiago Teixeira met while working at the Google X Innovation Lab in 2013.

They started with the question: what if we could make building tools as easy as writing Python scripts?

They wanted data scientists and machine learning engineers to be able to create applications for interacting with data without having to call on a tools team or handle back-end data engineering tasks.

Today, the San Francisco-based company, which open-sourced the technology in 2019, has more than 16,000 GitHub stars and a community of more than 30,000 developers around the world. It is used by Delta Dental, Caterpillar, 7-Eleven, Uber, Ford, and Pfizer.

“Building a small web app in Streamlit takes me 10% of the time it would take to build the same thing with a conventional app-building approach. Streamlit is an even bigger win for data scientists unfamiliar with JavaScript, because Streamlit allows them to create everything in Python,” said Dan Becker, a former Google data scientist and founder of Kaggle Learn who is now vice president of product, decision intelligence at DataRobot.

“Historically, I had to manage the frontend code, the backend code and the communication between them. With Streamlit, I can specify how I want the page to work in Python, and it takes care of everything. The pages look nice by default, so I don’t have to write CSS. Streamlit is particularly easy to learn. It takes about 10 minutes to learn enough to be productive.”

Part of the existing workflow

Rather than creating a single tool, the idea was to create Lego-like building blocks that let users assemble their own ways of making sense of their data. This could mean creating sliders for different variables or pulling subsets of data into sidebars to examine them in different ways.

These apps are data visualizations written as a few lines of Python code, the backbone of data scientists’ existing workflow. React is the front-end framework used to display data on the screen.

Streamlit treats widgets like variables. Each interaction simply reruns the script from top to bottom.
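
The rerun model described above can be illustrated with a minimal sketch in plain Python. This is a conceptual imitation under stated assumptions, not Streamlit's implementation: `run_script`, `widget_values` and the `slider` helper are hypothetical names invented for the example.

```python
# Conceptual model only: a plain-Python imitation of "widgets are variables"
# and "every interaction reruns the script". All names here are hypothetical.

def slider(widget_values, label, default):
    """Return the widget's current value, or its default on the first run."""
    return widget_values.get(label, default)


def run_script(widget_values):
    """The whole 'app': executed from top to bottom on every run."""
    output = []
    # The widget reads like an ordinary variable assignment.
    threshold = slider(widget_values, "threshold", default=2)
    data = [1, 2, 3, 4, 5]
    selected = [x for x in data if x >= threshold]
    output.append(f"{len(selected)} values at or above {threshold}")
    return output


# First page load: no interaction yet, so the widget yields its default.
first_render = run_script({})

# The user moves the slider to 4: the new value is recorded and the
# same script simply runs again from the top.
second_render = run_script({"threshold": 4})

print(first_render)
print(second_render)
```

In a real app the hypothetical `slider` helper would be replaced by a Streamlit widget call such as `st.slider`, and the framework, rather than the caller, would trigger each rerun.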

It downloads data only once, using a cache primitive that acts as a persistent, immutable data store so the application can safely reuse information across reruns. This eliminates redundant data retrieval and calculations.
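
Because the script reruns on every interaction, caching is what keeps expensive steps from repeating. The sketch below shows the idea, with the standard library's `functools.lru_cache` swapped in for Streamlit's own cache primitive; the `load_data` function, the `"example-dataset"` name and the `load_count` counter are assumptions made for illustration.

```python
import functools

load_count = 0  # counts how often the expensive load actually happens


@functools.lru_cache(maxsize=None)
def load_data(source):
    """Stand-in for a slow download. Returns a tuple so the cached value is immutable."""
    global load_count
    load_count += 1
    return tuple(range(5))


def run_script(threshold):
    """Simulates one top-to-bottom rerun of an app."""
    data = load_data("example-dataset")  # hypothetical source name
    return [x for x in data if x >= threshold]


# Three reruns with different widget values, but only one real load.
results = [run_script(t) for t in (0, 2, 4)]
print(results, load_count)
```

Returning an immutable value matters here: if one rerun could mutate the cached object, later reruns would see the modified data instead of the original.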

The product deploys apps directly from private Git repositories and updates instantly on commits.

It integrates with popular Python libraries used in data science such as NumPy, Pandas, Matplotlib, Scikit-learn, and others.

“From my perspective, Streamlit is by far the fastest way to turn an interesting analysis, machine learning model, or intelligent visualization into a data product that you can easily show other people online,” said Tyler Richards, a data scientist at Facebook who also wrote a book on Streamlit.

“I constantly have this problem where I have an impressive result at work or on a personal project, and I either have to reduce it to something that I can easily paste into a dashboard or a Word document (a static chart or basic performance stats on my model) or spend a lot of time building a custom Flask/Django app. Streamlit is the best of both worlds because I can directly build a fully functional web application from my already created Python script and use their tools to easily host it.”

Hours, not weeks

Treuille drew on his experience working with students on machine learning projects as a professor at Carnegie Mellon University and as vice president of autonomous vehicle startup Zoox.

With Streamlit, a project that previously took weeks can be completed in hours, he said.

“[The data science] group faces unique challenges that the business has never seen before, especially when it comes to making the information we produce available, in a scalable way, so that the marketing team can directly benefit from a model that we’ve built that predicts the future, or so that the product team themselves can look at all this filtered geographic data in a way that isn’t traditionally possible, then step in and see the sentiment analysis applied to this or that country,” he said.

“So these are the kinds of next-gen challenges that data scientists and machine learning engineers are very good at solving, but which haven’t been systematically shared more widely across the enterprise.”

On top of the open source technology, the company adds enterprise-grade data security and authentication as well as collaboration features for data scientists and their customers.

“Literally in an afternoon, in the work you are already doing, you can go from an analysis that was primarily for yourself… to something interactive and shareable with someone else,” Kelly said.

“People have said to us all the time, ‘It would have been 10,000 lines of code if I had to put it in a different language like Flask, and it was about 100 lines [in Streamlit].’ Or, ‘It took another team three and a half months to build; I reproduced the exact same thing in six hours.’”

New features in 1.0

While Streamlit can be deployed anywhere, the company recently announced Streamlit Cloud to handle containers, authentication, scaling, security, and more.

The company’s physical infrastructure is hosted and managed on Google Cloud Platform (GCP), leveraging its built-in security, privacy and redundancy features.

User permission levels are those assigned in GitHub. Users with write access to a particular application can make changes, but only those with administrator access can deploy or remove an application.

The technology recently reached the 1.0 milestone.

“We’ve spent pretty much all of 2020, and a good chunk of 2021, both adding these features and also hardening, making sure we’re really testing with the community, really finding out and being able to say, ‘This isn’t just the fastest way to go out and build an app, but the best way to do it in terms of primitives and ease of use,’” Kelly said.

Among the new features:

  • Improved caching that leverages Apache Arrow for serialization and memory management, which increases speed and responsiveness.
  • More customization, with layout primitives and app themes that let users match their company brand.
  • Session state and forms, which let users build more complex applications.
  • Components and integrations, which let users write their own components or integrate libraries such as SpaCy, HiPlot or Folium. These also include the ability to send and receive video or draw on a canvas.
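
To see why session state and forms matter in a framework that reruns the whole script on each interaction, consider the following plain-Python sketch. It is a conceptual model, not Streamlit code: in Streamlit these roles are played by `st.session_state` and `st.form`, while the `run_script` function, the `submitted_form` argument and the field names below are hypothetical.

```python
# Conceptual model only. Local variables are rebuilt on every rerun;
# the session_state dictionary is the one thing that persists.

def run_script(session_state, submitted_form=None):
    """Simulates one rerun. submitted_form is None unless a form was just submitted."""
    session_state.setdefault("entries", [])

    # A form delivers all of its fields at once, on submit, instead of
    # triggering a separate rerun for every keystroke or widget change.
    if submitted_form is not None:
        session_state["entries"].append(
            (submitted_form["name"], submitted_form["score"])
        )

    return f"{len(session_state['entries'])} entries stored"


session_state = {}
renders = [
    run_script(session_state),                            # initial load
    run_script(session_state, {"name": "a", "score": 1}), # first submit
    run_script(session_state, {"name": "b", "score": 2}), # second submit
    run_script(session_state),                            # unrelated rerun
]
print(renders)
```

Without the persistent dictionary, the list of entries would be reset to empty at the top of every rerun, which is the limitation this feature addresses.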

Its roadmap includes plans to add to its library of widgets, improve the developer experience, and make it easier to share code, components, and apps.

In a blog post, Crystal Huang, who describes herself as an aspiring data scientist, described her project using Streamlit to apply face mask detection to photos using deep learning algorithms.

Streamlit has raised $62 million, most recently in a $35 million Series B round announced in April from Sequoia and previous investors Gradient Ventures and GGV Capital.
