On Structuring Python Projects

Types of Projects The term Python project can be somewhat misleading. While languages like Swift are designed for specific purposes such as generating macOS/iOS apps, components, and frameworks, Python is used in a much more versatile manner. A Python project might range from an analytical solution developed across multiple Jupyter notebooks to a standalone script querying a database API and extracting results to an application or package facilitating the deployment of models....

March 16, 2022 · 4 min · Konrad Zdeb

Why regex is not fuzzy matching

Recently, I cam across an interesting discussion on StackOverflow^[SO discussion on: Fuzzy Join with Partial String Match in R] pertaining to approach to fuzzy matching tables in R. Good answer contributed by one of the most resilient and excellent contributors to whom I owe a lot of thanks for help suggested relying on regular expression, combining this with basic sting removal and transformations like toupper to deterministically match the tables. The solution solved the problem and was accepted....

June 29, 2021 · 7 min · Konrad Zdeb

Inserting Data into Partitioned Table

Rationale Maintaining partitioned Hive tables is a frequent practice in a business. Properly structured tables are conducive to achieving robust performance through speeding up query execution (see Costa, Costa, and Santos 2019). Frequent use cases pertain to creating tables with hierarchical partition structure. In context of a data that is refreshed daily, the frequently utilised partition structure reflects years, months and dates. Creating partitioned table In HiveQL we would create the table with the following structure using the syntax below....

February 26, 2021 · 8 min · Konrad Zdeb