Using R for File Manipulation

Challenge File manipulation is a frequent task unavoidable in almost every IT business process. Traditionally, file manipulation tasks are accomplished within the ramifications of specific tools native to a given system. As such, the one may consider writing and scheduling shell script to undertake frequent file operations or using more specific purpose-built tools like logrotate in order to archive logs or tools like Kafka are used to build streaming-data pipelines. R is usually though of as a statistical programming language or as an environment for a statistical analysis....

March 29, 2021 · 6 min · Konrad Zdeb

Inserting Data into Partitioned Table

Rationale Maintaining partitioned Hive tables is a frequent practice in a business. Properly structured tables are conducive to achieving robust performance through speeding up query execution (see Costa, Costa, and Santos 2019). Frequent use cases pertain to creating tables with hierarchical partition structure. In context of a data that is refreshed daily, the frequently utilised partition structure reflects years, months and dates. Creating partitioned table In HiveQL we would create the table with the following structure using the syntax below....

February 26, 2021 · 8 min · Konrad Zdeb

Poor Man's Robust Shiny App Deployment (Part II)

Introduction This article draws on the past post concerned with utilisation of golem for robust deployment of analytical and reporting solutions. For this article, we will assume that we are working with defined working requirements that utilise some of the Labour Market Statistics disseminated through the nomis portal. Change Plan What we have Reporting requirements Past scripts we used to create reports with accompanying instructions What we want Stronger business continuity - we want to be able to give some access to this project and don’t be concerned with missing files, outdated unavailable documentation and questions on how to produce updated reports....

February 12, 2021 · 3 min · Konrad

Poor Man's Robust Shiny App Deployment

Not so uncommon problem… RStudio Connect and more modest Shiny Proxy come to mind as most obvious solutions for deploying Shiny applications in production. Application servers are ideal for deploying applications that are to be consumed on a regular basis by larger audiences. In addition to serving the application, managing dependencies and user access or logging user activity are common tasks we would expect for a publishing platform to address. Frequently, however, deployment of Shiny application is directed at smaller audiences and less frequent usage....

July 23, 2020 · 5 min · Konrad

Installing Hortonworks Sanbox on Mac with Docker

Background The post covers installation of Hortonworks Sandbox (HD) on Mac using Docker. In software development, sandbox describes a testing environment that can be used to isolate untested code changes from a production code. Hortonworks Sandbox provides such an environment with the Hortonworks Data Platform installed. Hortonworks Data Platform is an open source framework facilitating distributed storage and processing large volumes of data. Deploying system for distributed processing within a single computer may seem like a counter-intuitive idea but it’s actually a very common practice....

February 23, 2019 · 2 min · Konrad