Using RScript for R Installation Management

Most users carry out common R installation and management tasks from within an R session, using commands such as install.packages, update.packages or old.packages to install, update or check packages. Those tasks can also be accomplished via the GUI offered by RStudio, which provides an effortless mechanism for basic package management. This approach is sufficient for the vast majority of cases; however, there are some cases when working within the REPL^[REPL stands for Read-Eval-Print Loop and is usually delivered in the form of an interactive shell....

January 3, 2022 · 5 min · Konrad Zdeb

R-based metaprogramming strategies for handling Hive/CSV interaction (Part I, imports)

Background Handling Hive/CSV interaction is a common reality in many analytical and data environments. The question of exporting data from Hive to CSV and other formats is frequently raised on online forums, with answers often suggesting the use of sed, which, combined with nifty regular expressions, pipes Hive output into flat CSV files. Import of large amounts of data is best handled by suitable tools like Apache Flume....

August 13, 2021 · 9 min · Konrad Zdeb

Installing Hortonworks Sandbox on Mac with Docker

Background The post covers installation of Hortonworks Sandbox (HD) on Mac using Docker. In software development, a sandbox is a testing environment used to isolate untested code changes from production code. Hortonworks Sandbox provides such an environment with the Hortonworks Data Platform installed. Hortonworks Data Platform is an open source framework facilitating distributed storage and processing of large volumes of data. Deploying a system for distributed processing on a single computer may seem like a counter-intuitive idea but it’s actually a very common practice....

February 23, 2019 · 2 min · Konrad