BigQuery is revolutionizing the way Big Data is stored and processed. Its "serverless" architecture is not just enterprise friendly but also developer friendly, as it takes care of hardware provisioning and scaling on its own.
As a developer, you only need to know how to extract the data; all the performance-related heavy lifting is handled by Google BigQuery.
Before you begin, I would suggest that you read the following article:
It will give you the full picture of BigQuery and why you should be using it in the first place to deal with Big Data.
Create a New Project
To do anything, you first need to create a Project under Google Cloud Platform.
Navigate to the Google Cloud Credentials page and select your project from the top navbar, as shown in the image below:
If your project is not visible, then you will have to create it.
Click on the project drop-down and then on New Project. Take a look at the snapshot below.
Upon clicking the New Project button, a new page will open where you will be asked to enter the project details and create the new project.
Once your project is created, navigate back to the Credentials page and look for the Manage service accounts link at the bottom. Your application should use a service account to interact with the different Google services.
Before moving forward, let me give you a quick overview of service accounts and their significance, so the rest of the tutorial is easier to follow.
Understanding Service Account
The first thing I tell everyone: do not confuse a service account with a user account.
Service accounts are not user specific; in fact, they are a special type of Google account that belongs to your application instead of to a user. A service account is a way to access Google services with proper authentication and authorization.
Your application assumes the identity of a service account while calling Google APIs. Service accounts exist precisely so that users do not have to access the resources directly.
How does a service account handle authentication and authorization?
Service accounts have special keys that they use to authenticate themselves to Google while accessing Google services.
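To make this concrete, here is a minimal sketch, separate from the sample project built later in this article, of how such a key can be used from Java with the same google-api-client library. The file path is only a placeholder for the key you will download in the next section; the tutorial's own code relies on Application Default Credentials instead of loading the file explicitly.
    import java.io.FileInputStream;
    import java.util.Collections;

    import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;

    public class ServiceAccountKeyExample {

        public static void main(String[] args) throws Exception {
            // Placeholder path to the JSON key downloaded for your service account.
            try (FileInputStream keyStream = new FileInputStream("/path/to/service-account-key.json")) {
                GoogleCredential credential = GoogleCredential.fromStream(keyStream)
                        .createScoped(Collections.singleton("https://www.googleapis.com/auth/bigquery"));
                // Exchanging the key for an access token is what authenticates the service account to Google.
                credential.refreshToken();
                System.out.println("Access token: " + credential.getAccessToken());
            }
        }
    }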
When you create a service account, you decide:
- Which resources must be accessible by the service account?
- What permissions does the Service Account need in order to access a specific resource?
- Where will the code that assumes the identity of the service account be running: on Google Cloud Platform or on-premises?
Use the flowchart below to figure out how to configure your service account:
How to Create Service Account
Log in to your Google Cloud Platform dashboard. Navigate to IAM & admin in the left sidebar and click on the Service accounts link.
Alternatively, you can click on the Manage service accounts link mentioned earlier, which will lead you to the service account creation page.
After selecting your project, you will see all the existing service accounts in a list. On the top navbar there is an option to create a new service account. Click it and provide the necessary information.
Make sure to choose a BigQuery role under the Project role option. Check out the image below:
Do not forget to check the Furnish a new private key checkbox. This will give you a key file to download, which we will be using later to access data from the BigQuery tables.
Click Save and your service account will be created.
Note: Download the private key and save it on your local machine.
Great, you are done with the boring service account creation part. Now let's jump into development. Here, you will be fetching data from BigQuery tables in JSON format.
Environment Setup
Set the path to the private key file in your system environment. Follow these steps (they are for Windows; on other platforms, set the variable in the equivalent way):
- Right-click on This PC in Windows Explorer and click on Properties.
- Click on Advanced System Settings.
- Click on the Environment Variables button and create a new environment variable with the key GOOGLE_APPLICATION_CREDENTIALS and, as its value, the full path to the JSON key file you downloaded earlier (a quick sanity check is sketched after this list).
- You are done setting up the environment and can now make calls to Google BigQuery.
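If you want to verify that the variable is actually visible to your JVM before moving on, a quick check along these lines will do. This class is just a throwaway helper, not part of the sample project:
    import java.io.File;

    public class CredentialsCheck {

        public static void main(String[] args) {
            // GOOGLE_APPLICATION_CREDENTIALS must point to the service account key file downloaded earlier.
            String credentialsPath = System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
            if (credentialsPath == null || !new File(credentialsPath).isFile()) {
                throw new IllegalStateException(
                        "GOOGLE_APPLICATION_CREDENTIALS is not set or does not point to an existing file");
            }
            System.out.println("Using credentials file: " + credentialsPath);
        }
    }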
Java Project Setup for BigQuery Data Access Using REST APIs
In this project, you will query the Hacker News comments dataset on BigQuery. This dataset is hosted publicly on BigQuery and is accessible to everyone. To give you an overview, below is the query that you will be firing against BigQuery:
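    SELECT * FROM [bigquery-public-data:hacker_news.comments] LIMIT 1000
(The bracketed table name is BigQuery's legacy SQL syntax, which is what this sample uses.)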
You will first need to set up a Java project and include all the required dependencies to make HTTP calls and retrieve data from BigQuery.
For this project you will be using the pom.xml file provided by Google in their sample projects. You can copy and paste the file contents from the following link: https://github.com/vslala/BigQueryRestSample/blob/master/pom.xml.
The main dependencies included in the pom.xml file are:
- Google Cloud BigQuery Library (1.35.0)
- Google API Client (1.23.0)
- Google HTTP Client (1.23.0)
- Google OAuth Client (1.23.0)
You can explore the pom.xml file in more detail later.
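For quick reference, the Maven coordinates behind that list look roughly like the snippet below. Treat it as a sketch: the linked pom.xml is the authoritative source, and the classes below also use log4j and the Jackson 1.x annotations, so refer to that file for the complete dependency list.
    <dependencies>
        <dependency>
            <groupId>com.google.cloud</groupId>
            <artifactId>google-cloud-bigquery</artifactId>
            <version>1.35.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.api-client</groupId>
            <artifactId>google-api-client</artifactId>
            <version>1.23.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.http-client</groupId>
            <artifactId>google-http-client</artifactId>
            <version>1.23.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.oauth-client</groupId>
            <artifactId>google-oauth-client</artifactId>
            <version>1.23.0</version>
        </dependency>
    </dependencies>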
Once you have created the project and imported all the dependencies into your classpath, it's time for some action 😉
Project Structure
For this project, I would suggest that you use a simple structure while trying out the application.
- I assume you have already created a project with all the required dependencies.
- Create Package Structure: com.example.bigquery
- For simplicity, keep all the code inside this package; the resulting layout is sketched below.
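Assuming the standard Maven directory layout, the project ends up looking roughly like this (each of the four classes is described in the next section):
    src/main/java/com/example/bigquery/
        BigQueryDemo.java
        BigQueryResponseVO.java
        Dataset.java
        LabelsSample.java
    pom.xml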
Classes Used
Let’s take each class one-by-one and see what it does.
Dataset.java
Go through the below code:
package com.example.bigquery;

import java.util.HashMap;
import java.util.Map;

/**
 * Thin wrapper around a map of labels. In this sample, the map is used as the
 * JSON request body ({"query": "..."}) that is posted to the BigQuery REST API.
 */
public class Dataset {

    private Map<String, Object> labels;

    public Dataset() {
        labels = new HashMap<>();
    }

    public Dataset addLabel(String labelKey, String labelValue) {
        labels.put(labelKey, labelValue);
        return this;
    }

    public Dataset removeLabel(String labelKey) {
        labels.remove(labelKey);
        return this;
    }

    public Map<String, Object> getLabels() {
        return labels;
    }
}
BigQueryResponseVO.java
This class is the response value object for the BigQuery response. Take a look at its properties:
package com.example.bigquery;

import java.util.List;

import org.codehaus.jackson.annotate.JsonProperty;

import com.google.api.services.bigquery.model.JobReference;
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.Schema;

/**
 * Value object mirroring the fields of the JSON response returned by the
 * BigQuery jobs.query REST endpoint.
 */
public class BigQueryResponseVO {

    @JsonProperty
    String kind;

    @JsonProperty
    String etag;

    @JsonProperty
    Schema schema;

    @JsonProperty
    JobReference jobReference;

    @JsonProperty
    long totalRows;

    @JsonProperty
    String pageToken;

    @JsonProperty
    boolean cacheHit;

    @JsonProperty
    long totalBytesProcessed;

    @JsonProperty
    List<BigQueryError> errors;

    @JsonProperty
    long numDmlAffectedRows;
}
LabelsSample.java
This class is the bootstrap class for the project. It contains the main() method, which calls the BigQueryDemo.implicit() method.
Take a look at the class:
/*
 * Copyright 2016 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.example.bigquery;

import java.io.IOException;

/** Bootstrap class for the sample: hands control over to BigQueryDemo.implicit(). */
public class LabelsSample {

    public static void main(String[] args) throws IOException, InterruptedException {
        BigQueryDemo.implicit();
    }
}
BigQueryDemo.java
Go through the below code and then read further:
package com.example.bigquery;

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Arrays;

import org.apache.log4j.Logger;

import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import com.google.api.client.http.GenericUrl;
import com.google.api.client.http.HttpContent;
import com.google.api.client.http.HttpHeaders;
import com.google.api.client.http.HttpRequest;
import com.google.api.client.http.HttpRequestFactory;
import com.google.api.client.http.HttpResponse;
import com.google.api.client.http.HttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.http.json.JsonHttpContent;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.common.io.CharStreams;

public class BigQueryDemo {

    private static final String QUERY_URL_FORMAT = "https://www.googleapis.com/bigquery/v2/projects/%s/queries" + "?access_token=%s";

    private static final String QUERY = "query";

    private static final String QUERY_HACKER_NEWS_COMMENTS = "SELECT * FROM [bigquery-public-data:hacker_news.comments] LIMIT 1000";

    private static final Logger logger = Logger.getLogger(BigQueryDemo.class);

    static GoogleCredential credential = null;
    static final HttpTransport HTTP_TRANSPORT = new NetHttpTransport();
    static final JsonFactory JSON_FACTORY = new JacksonFactory();

    static {
        // Authenticate requests using Google Application Default credentials
        // (the key file pointed to by GOOGLE_APPLICATION_CREDENTIALS).
        try {
            credential = GoogleCredential.getApplicationDefault();
            credential = credential.createScoped(Arrays.asList("https://www.googleapis.com/auth/bigquery"));
            credential.refreshToken();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void implicit() {
        String projectId = credential.getServiceAccountProjectId();
        String accessToken = generateAccessToken();

        // Set the content of the request.
        Dataset dataset = new Dataset().addLabel(QUERY, QUERY_HACKER_NEWS_COMMENTS);
        HttpContent content = new JsonHttpContent(JSON_FACTORY, dataset.getLabels());

        // Send the request to the BigQuery API.
        GenericUrl url = new GenericUrl(String.format(QUERY_URL_FORMAT, projectId, accessToken));
        logger.debug("URL: " + url.toString());
        String responseJson = getQueryResult(content, url);
        logger.debug(responseJson);
    }

    private static String getQueryResult(HttpContent content, GenericUrl url) {
        String responseContent = null;
        HttpRequestFactory requestFactory = HTTP_TRANSPORT.createRequestFactory();
        HttpRequest request = null;
        try {
            request = requestFactory.buildPostRequest(url, content);
            request.setParser(JSON_FACTORY.createJsonObjectParser());
            request.setHeaders(
                    new HttpHeaders().set("X-HTTP-Method-Override", "POST").setContentType("application/json"));
            HttpResponse response = request.execute();
            InputStream is = response.getContent();
            responseContent = CharStreams.toString(new InputStreamReader(is));
        } catch (IOException e) {
            logger.error(e);
        }
        return responseContent;
    }

    private static String generateAccessToken() {
        String accessToken = null;
        try {
            // If the current access token is missing or has expired, refresh it before using it.
            if (credential.getExpirationTimeMilliseconds() == null
                    || System.currentTimeMillis() > credential.getExpirationTimeMilliseconds()) {
                credential.refreshToken();
            }
            accessToken = credential.getAccessToken();
        } catch (IOException e) {
            logger.error(e);
        }
        System.out.println(accessToken);
        return accessToken;
    }
}
This is the main class that does all the work.
The very first thing that executes in this class is the static block.
static {
    // Authenticate requests using Google Application Default credentials
    // (the key file pointed to by GOOGLE_APPLICATION_CREDENTIALS).
    try {
        credential = GoogleCredential.getApplicationDefault();
        credential = credential.createScoped(Arrays.asList("https://www.googleapis.com/auth/bigquery"));
        credential.refreshToken();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The static block picks up the service account's secret JSON key file from the GOOGLE_APPLICATION_CREDENTIALS environment variable (that is what Application Default Credentials resolve to here). It then sends a request to Google's OAuth service to obtain an access token scoped to `https://www.googleapis.com/auth/bigquery`; that token is what authorizes the calls to BigQuery.
Form Request Body
Once the static block has executed successfully, the implicit() method is invoked. implicit() is the starting point of this class and is called from the main() method of LabelsSample.java.
Next, you have to create the request body that will contain the query, along with the BigQuery POST URL.
public static void implicit() {
    String projectId = credential.getServiceAccountProjectId();
    String accessToken = generateAccessToken();

    // Set the content of the request.
    Dataset dataset = new Dataset().addLabel(QUERY, QUERY_HACKER_NEWS_COMMENTS);
    HttpContent content = new JsonHttpContent(JSON_FACTORY, dataset.getLabels());

    // Send the request to the BigQuery API.
    GenericUrl url = new GenericUrl(String.format(QUERY_URL_FORMAT, projectId, accessToken));
    logger.debug("URL: " + url.toString());
    String responseJson = getQueryResult(content, url);
    logger.debug(responseJson);
}
Send Request and Get Query Result
Once you have formed the request content, it is time to call BigQuery's REST API and fetch the results in JSON format.
Use request = requestFactory.buildPostRequest(url, content) (the request factory) to create the POST request, passing the URL and the content to it.
Set the appropriate headers (mandatory) and call the execute method.
request.setHeaders(
        new HttpHeaders().set("X-HTTP-Method-Override", "POST").setContentType("application/json"));
The response is provided as an InputStream. Convert the InputStream into characters to retrieve the response as JSON.
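These are the corresponding lines inside getQueryResult(...). The sample relies on the platform's default charset; if you want to be explicit, a variant like the following (note the extra java.nio.charset.StandardCharsets import) is a safer sketch:
    InputStream is = response.getContent();
    // Read the whole response body into a String, decoding it as UTF-8.
    String responseJson = CharStreams.toString(new InputStreamReader(is, StandardCharsets.UTF_8));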
That is all you need to do in order to fetch the data from BigQuery in JSON format.
Conclusion
Great! I hope you were able to get the response by following this tutorial.
This article was only intended to show how to fetch the JSON response from BigQuery. Do not use the example as it is; play around with it and modify the code to suit your needs.
I will be modifying the code to make it friendlier and let it act as a framework for your project, so do check the master branch of the repository as well.
If you have any questions, or if there is anything that you do not understand, please comment below. I would be more than happy to interact with you and resolve your queries.