Latest Posts
Amazon AWS Certified Data Analytics Specialty – Domain 6: Security Part 2
Cloud HSM Overview Let’s talk about another way to perform encryption in your cloud, one that is different from KMS. This one is called CloudHSM. So with KMS, AWS is the one that manages the software for the encryption. But with CloudHSM, AWS will just provision the encryption hardware for you, and you have to use your own client to perform the encryption. So an HSM is hardware dedicated to you. HSM stands for Hardware Security Module. The idea is that AWS gives you the hardware,…
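To make that split concrete, here is a minimal boto3 sketch of the provisioning side, which is the part AWS handles; the subnet IDs and Availability Zone are placeholders. Note that the actual encryption is then performed by your own client software (for example the CloudHSM PKCS#11 or JCE libraries), not by an AWS API call.

```python
import boto3

hsm = boto3.client("cloudhsmv2")

# AWS provisions dedicated HSM hardware into your VPC subnets
# (subnet IDs below are placeholders, not real resources).
cluster = hsm.create_cluster(
    HsmType="hsm1.medium",
    SubnetIds=["subnet-0example1", "subnet-0example2"],
)
cluster_id = cluster["Cluster"]["ClusterId"]

# Add an HSM instance to the cluster in a chosen Availability Zone.
hsm.create_hsm(ClusterId=cluster_id, AvailabilityZone="us-east-1a")
```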
Amazon AWS Certified Data Analytics Specialty – Domain 6: Security
Encryption 101 Welcome to this section on security and encryption. These are not necessarily the most fun sections to deal with, but they are super important for the exam. The exam will definitely ask you a lot of security and encryption questions. And so KMS, the encryption SDK, the Parameter Store, IAM, all these things are a central piece of the exam, and I want to make this as easy as possible, because security, I know, may not be your area of expertise, but it’s…
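Since KMS comes up so often on the exam, here is a hedged boto3 sketch of its most basic use; the key alias is a placeholder. One detail worth knowing: the direct Encrypt API only accepts payloads up to 4 KB, so larger data is handled with envelope encryption via generated data keys.

```python
import boto3

kms = boto3.client("kms")

# Encrypt a small payload (max 4 KB) under a KMS key;
# the alias here is hypothetical.
ciphertext = kms.encrypt(
    KeyId="alias/my-app-key",
    Plaintext=b"secret database password",
)["CiphertextBlob"]

# Decrypt; KMS identifies the key from metadata embedded in the ciphertext.
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
```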
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis Part 7
[Exercise] Redshift Spectrum, Pt. 1 So let’s continue building out our data warehousing requirements here for Cadabra.com. This time, we’re going to use Amazon Redshift instead of Athena, and that’s a managed solution that requires you to spin up a cluster with a fixed number of servers. So there’s a little more work, and a little more thinking you have to do about capacity. But at the end of the day, it’s going to operate very similarly. Now, just to tie things together even more and mix things…
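As a rough sketch of where this exercise is headed: Redshift Spectrum lets the cluster query data in S3 in place by registering a Glue Data Catalog database as an external schema. The endpoint, credentials, database, and IAM role below are all placeholders, not the exercise's actual values.

```python
import redshift_connector  # AWS's Python driver for Redshift

# Hypothetical cluster endpoint and credentials.
conn = redshift_connector.connect(
    host="cadabra.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="REPLACE_ME",
)
cur = conn.cursor()

# Register the Glue Data Catalog database as an external schema so the
# cluster can query S3 data in place (this is Redshift Spectrum).
cur.execute("""
    CREATE EXTERNAL SCHEMA spectrum
    FROM DATA CATALOG DATABASE 'orderlogs'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS
""")

# Query the external table exactly like a local one.
cur.execute("SELECT COUNT(*) FROM spectrum.orders")
print(cur.fetchone())
```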
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis Part 6
Redshift Data Flows and the COPY command The exam is going to expect a lot of depth from you on importing and exporting data to and from your Redshift cluster. Now, the most efficient way to import or load data into your Redshift table is the COPY command. Using the COPY command, you can read from multiple data files or multiple data streams simultaneously. You can import that data from S3, EMR, DynamoDB, or some remote host using SSH. For access control, you can…
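Here is a minimal sketch of the S3 case, assuming hypothetical table, bucket, and IAM role names. Pointing COPY at an S3 prefix containing multiple files is what lets the cluster load across slices in parallel, which is why it is the most efficient load path.

```python
import redshift_connector

# Hypothetical endpoint and credentials.
conn = redshift_connector.connect(
    host="cadabra.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="awsuser",
    password="REPLACE_ME",
)
cur = conn.cursor()

# COPY reads every file under the prefix in parallel across the
# cluster's slices; credentials come from the attached IAM role.
cur.execute("""
    COPY orders
    FROM 's3://cadabra-orders/2023/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    CSV GZIP
""")
conn.commit()
```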
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis Part 5
Redshift Durability and Scaling Now, let’s talk about the specifics of Redshift’s durability and scalability. Redshift automatically replicates all of the data within the data warehouse cluster when it is loaded. Also, your data is continuously backed up to S3 for you. Three copies of the data are maintained: the original, a replica on the compute nodes, and a backup in S3. So your data is stored in three different places. There’s the original copy within your cluster. There’s a backup replica copy within your…
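On the scaling side, a hedged boto3 sketch of the two operations you would typically reach for; the cluster and snapshot identifiers are placeholders. Elastic resize (Classic=False) changes the node count with only a brief period of unavailability, whereas a classic resize provisions a whole new cluster and copies data over.

```python
import boto3

redshift = boto3.client("redshift")

# Take a manual snapshot (on top of the automatic, continuous S3 backups).
redshift.create_cluster_snapshot(
    SnapshotIdentifier="cadabra-before-resize",
    ClusterIdentifier="cadabra-dw",
)

# Elastic resize: change the node count with minimal downtime.
redshift.resize_cluster(
    ClusterIdentifier="cadabra-dw",
    NumberOfNodes=4,
    Classic=False,
)
```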
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis Part 4
[Exercise] AWS Glue and Athena Let’s start building out our data warehousing and visualization requirements for Cadabra, using Amazon Athena at first. We’re going to do this in a couple of different ways in this course: once using Redshift and once using Athena. Let’s start with Athena, because that one’s easier. All we have to do is set up AWS Glue to infer a schema from our data lake in S3, which houses all of our order data. And we’ve already done the work of importing all that…
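The exercise does this through the console, but the same two steps look like this in boto3; the crawler name, role, database, and bucket paths are placeholders. A Glue crawler scans the S3 data, infers a schema into the Data Catalog, and Athena then queries that catalog directly.

```python
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Crawl the S3 data lake so Glue infers a table schema into its catalog.
glue.create_crawler(
    Name="order-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="orderlogs",
    Targets={"S3Targets": [{"Path": "s3://cadabra-orders/"}]},
)
glue.start_crawler(Name="order-data-crawler")

# Once the crawler has populated the catalog, Athena queries it in place.
athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM orders",
    QueryExecutionContext={"Database": "orderlogs"},
    ResultConfiguration={"OutputLocation": "s3://cadabra-athena-results/"},
)
```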
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis Part 3
[Exercise] Amazon Elasticsearch Service, Part 2 Let’s go back to the Firehose configuration screen here. We can close out of this, choose the Lambda function we just made, log transform, and move on to the next page. Now we need a destination. The destination in our case will be the Amazon Elasticsearch Service domain that we just set up. Let’s choose our domain, hopefully it’s there: Cadabra. We need to specify an index to put this stuff into; let’s call it weblogs. Let’s rotate it, I don’t…
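For reference, a transformation Lambda like the "log transform" function wired up here follows Firehose's record-transformation contract: decode each base64 record, transform it, and hand back a record with the same recordId. This is a skeleton; the parsing logic is illustrative only, not the exercise's actual function.

```python
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Firehose delivers each record base64-encoded.
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = json.dumps(parse_log_line(payload)) + "\n"
        output.append({
            "recordId": record["recordId"],   # must echo back unchanged
            "result": "Ok",                   # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}

def parse_log_line(line):
    # Placeholder: turn a raw web log line into structured JSON.
    return {"raw": line}
```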
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis Part 2
Intro to Elasticsearch Let’s dive into Amazon’s Elasticsearch Service. Elasticsearch is a pretty exciting technology, I think, for doing large-scale analysis and reporting, at petabyte scale, in fact. And what’s interesting is that even though Elasticsearch started out as a search engine, and that’s fundamentally what it was made for originally, it’s not just for search anymore. It’s primarily for analysis and reporting these days. And for some applications, it can actually analyze massive datasets a lot faster than something like Apache Spark could. So for the…
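To illustrate "analysis, not just search," here is a sketch of an aggregation query against a hypothetical domain endpoint; a real Amazon ES domain would also require SigV4 request signing or an access policy permitting this caller, which is omitted here.

```python
import requests

# Hypothetical Amazon Elasticsearch Service domain endpoint.
endpoint = "https://search-cadabra-xxxxxxxx.us-east-1.es.amazonaws.com"

# An aggregation rather than a text search:
# count web log hits per HTTP status code.
query = {
    "size": 0,  # we want only the aggregation, not matching documents
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}
resp = requests.get(f"{endpoint}/weblogs/_search", json=query)
print(resp.json()["aggregations"]["by_status"]["buckets"])
```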
Amazon AWS Certified Data Analytics Specialty – Domain 4: Analysis
Intro to Kinesis Analytics As we start our journey into the analysis domain of big data, let’s start off with Kinesis Analytics. It’s another system for querying streams of data continuously, very similar in spirit to Spark Streaming, but it is specific to AWS Kinesis. So conceptually it’s pretty simple. Kinesis Data Analytics can basically receive data from either a Kinesis data stream or a Kinesis Data Firehose stream. And just like Spark Streaming, you can set up windows of time that you can look back on and aggregate…
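A sketch of what such a windowed query looks like in the SQL flavor of Kinesis Data Analytics, under assumed stream and column names; the input/output wiring (the Kinesis stream or Firehose source) is omitted for brevity. STEP over ROWTIME is the documented way to express a tumbling window.

```python
import boto3

kda = boto3.client("kinesisanalytics")

# Tumbling one-minute window: count orders per item
# (stream and column names are placeholders).
application_code = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (item_id VARCHAR(32), order_count INTEGER);
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
  SELECT STREAM item_id, COUNT(*)
  FROM "SOURCE_SQL_STREAM_001"
  GROUP BY item_id, STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '1' MINUTE);
"""

kda.create_application(
    ApplicationName="order-rate-monitor",
    ApplicationCode=application_code,
)
```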
Amazon AWS Certified Data Analytics Specialty – Domain 3: Processing Part 7
[Exercise] Elastic MapReduce, Part 1 For our next hands-on activity, we’re going to build a product recommendation system for Cadabra.com. And the good news is that we’ve already built out most of this system way back in Exercise One. So we already have an EC2 instance that is populating server logs that get consumed by Kinesis Data Firehose, which in turn dumps that data into an Amazon S3 bucket. And this has already been set up. And you might recall that we already put 500,000 rows…