Using Swift for Data Science Workflows

Why Swift? Data science is dominated by Python and R, with some usage of Julia, Scala, Java, and C++. While Swift may not be the most popular choice, it offers several notable benefits, especially for developers already invested in the Apple ecosystem. Key Advantages: Performance Considerations. As a compiled language, Swift often runs faster than languages like Python or R. This can be especially beneficial when handling large datasets or complex computations....

March 4, 2025 · 5 min · Konrad Zdeb

On Structuring Python Projects

Types of Projects: The term Python project can be somewhat misleading. While languages like Swift are designed for specific purposes such as generating macOS/iOS apps, components, and frameworks, Python is used in a much more versatile manner. A Python project might range from an analytical solution developed across multiple Jupyter notebooks, to a standalone script querying a database API and extracting results, to an application or package facilitating the deployment of models....

March 16, 2022 · 4 min · Konrad Zdeb

Why regex is not fuzzy matching

Recently, I came across an interesting discussion on StackOverflow^[SO discussion on: Fuzzy Join with Partial String Match in R] pertaining to an approach to fuzzy matching tables in R. A good answer, contributed by one of the most resilient and excellent contributors, to whom I owe a lot of thanks for help, suggested relying on regular expressions, combined with basic string removal and transformations like toupper, to deterministically match the tables. The solution solved the problem and was accepted....

June 29, 2021 · 7 min · Konrad Zdeb

Inserting Data into Partitioned Table

Rationale: Maintaining partitioned Hive tables is a common practice in business settings. Properly structured tables improve performance by speeding up query execution (see Costa, Costa, and Santos 2019). A frequent use case is creating tables with a hierarchical partition structure. For data that is refreshed daily, a commonly used partition structure reflects years, months and dates. Creating a partitioned table: In HiveQL we would create the table with the following structure using the syntax below....

February 26, 2021 · 8 min · Konrad Zdeb