Databases
Most Recent
September 1, 2017

What is big data and Hadoop

Organizations large and small use the term ‘big data’ a lot, but that doesn’t always mean they have a firm grasp of the concept or its benefits. The ideal starting point for this post is therefore to discuss the concept in a little more detail, so that we share a common understanding of the subject before delving any further.

To quote SAS (source), “Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. Why? More data may lead to more accurate [...]

May 3, 2017

Creating a new table from file in Hadoop Hive

Today, I was provided with a beta version of a data feed that will be consumed by the Hadoop platform. As the feed hadn’t yet been configured to load into the platform, there was no way to query the data and check that we had all the raw data we’d eventually need to extract insights and run vital reports for business consumers.

The data came from the test environment in CSV format. To make it useful, I wanted to load it into the Hadoop cluster and run some queries, calculations and aggregations on it using Hue. To do this, I needed to create a new table and populate it with my shiny new data set. So, I created a new folder in my user [...]
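As a rough illustration of the table-creation step, the sketch below builds a Hive `CREATE EXTERNAL TABLE` statement from a CSV header row. It is an assumption about how this could be done, not the method used in the post: the function name, table name, sample CSV and HDFS location are all hypothetical, and every column is typed as `STRING` for simplicity.

```python
import csv
import io


def build_create_table(table_name: str, csv_text: str, location: str) -> str:
    """Build a Hive DDL statement from the header row of a CSV sample.

    Hypothetical helper: every column is declared STRING, and the file is
    assumed to be plain comma-delimited text with one header line.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ",\n  ".join(f"`{name.strip()}` STRING" for name in header)
    return (
        f"CREATE EXTERNAL TABLE {table_name} (\n  {columns}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical sample feed and HDFS folder, for illustration only.
sample_csv = "dt,customer_id,postcode\n2017-05-03,42,SW1A 1AA\n"
ddl = build_create_table("feed_test", sample_csv, "/user/example/feed_test")
print(ddl)
```

The generated statement can be pasted into the Hue query editor once the CSV file sits in the named folder. Note that a plain delimited table splits on every comma, so a feed with quoted fields containing commas would need a CSV-aware SerDe instead.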

May 1, 2017

Optimizing database queries

When you’re querying 10,000 rows of data you can be sloppy. It doesn’t actually matter how inefficiently you write your queries; they’ll run in a reasonable amount of time and you’ll extract the insight you need. That’s because 10,000 rows is tiny, and you don’t need much compute power to crunch those numbers.

However, when you start querying 1 billion rows, things get interesting. A ‘SELECT *’ statement is a big no-no when you’ve got 100 columns and 1 billion rows; you need to think smart. You need to really streamline your queries to ensure reasonable execution time & resource [...]
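To make the contrast concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in for a real cluster. The `events` table, its columns and its data are hypothetical, and SQLite behaves very differently from Hive at scale; the point is only the shape of the two queries.

```python
import sqlite3

# In-memory database with a hypothetical 'events' table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, region TEXT, amount REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [(i, "north" if i % 2 else "south", i * 1.5, "x" * 100) for i in range(1000)],
)

# Broad query: every column of every row is returned to the client.
wide = conn.execute("SELECT * FROM events").fetchall()

# Narrow query: name only the columns needed, filter early, and let the
# engine aggregate so that a single summary row comes back.
narrow = conn.execute(
    "SELECT region, SUM(amount) FROM events WHERE region = ? GROUP BY region",
    ("north",),
).fetchall()

print(len(wide), "rows x", len(wide[0]), "columns returned by SELECT *")
print(len(narrow), "row returned by the narrow query")
```

On a 1,000-row table the difference is invisible, but the same habit (explicit columns, early filters, aggregation in the engine) is what keeps execution time reasonable as row and column counts grow.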

April 27, 2017

Using Regex in Hadoop Hive queries

I was working on a query today, one that could be executed against the Hadoop cluster using Hive and visualised in Tableau. While writing it, I found that a few of the string functions I’d usually use in SQL weren’t valid in Hive and raised ‘unknown function’ errors. So, I worked through each part of the query that produced an error until I had a working query. That query is below:

SELECT table1.dt, field2, table2.postcode as mgpcode, table3.postcode as lookuppcode, lat, long, REGEXP_REPLACE(table2.postcode, '\\s+', '') as newpostcode, REGEXP_REPLACE(table3.postcode, '\\s+', '') as normalizedpostcode FROM [...]
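The two `REGEXP_REPLACE` calls in the query strip all whitespace from each postcode, presumably so that values from different tables can be compared in a consistent form. The same normalisation can be sketched in Python to check the pattern before running it on the cluster; the function name and sample postcodes below are hypothetical.

```python
import re

# One or more whitespace characters: the same pattern the Hive query uses.
# In the Hive string literal it is written '\\s+' because the backslash
# has to be escaped there.
WHITESPACE = re.compile(r"\s+")


def normalize_postcode(postcode: str) -> str:
    """Remove every run of whitespace from a postcode string."""
    return WHITESPACE.sub("", postcode)


# Hypothetical sample values showing different spacing of one postcode.
samples = ["SW1A 1AA", " SW1A1AA", "SW1A  1AA "]
normalized = [normalize_postcode(p) for p in samples]
print(normalized)
```

After normalisation all three spellings collapse to the same string, which is what allows postcodes from the two source tables to be matched against each other.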

Top Ten (Heat Index)

1. AWS Security Concepts
2. Technology operating procedures (SOPs, MOPs, EOPs and SCPs)
3. A detailed look at AWS S3
4. Cloud HSM & KMS Services
5. Qlikview Lookup() Function
6. Schedule data reload on Qlikview Desktop
7. The ultimate AWS exam guide
8. Why do mergers & acquisitions fail so frequently?
9. Entrepreneurs & the lean start up
10. The ultimate on-page SEO guide